Node Monitoring
Set up monitoring for your blockchain infrastructure with Prometheus, Grafana, and Alertmanager so that sync, resource, and network problems are caught before they affect uptime and performance.
The Monitoring Stack
Prometheus
Metrics Collection: Time-series database for collecting and storing metrics from your nodes
Grafana
Visualization: Dashboard platform for creating beautiful visualizations of your metrics
Alertmanager
Alerting: Handles alerts from Prometheus and routes them to various notification channels
Node Exporter
System Metrics: Collects hardware and OS metrics like CPU, memory, and disk usage
Key Metrics to Monitor
Node Health
Is the node fully synced?
Number of connected peers
Current block vs network head
Number of chain reorganizations
System Resources
Processor utilization percentage
RAM consumption
Read/write operations per second
Available storage remaining
Network
Network traffic volume
Peer connection latency
API response latency
Failed connection attempts
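Several of the system and network metrics above can be derived from Node Exporter data with Prometheus recording rules. The following is a minimal sketch rather than a definitive rule set: the file name recording_rules.yml and the recorded series names are assumptions for illustration, and the file would need to be listed under rule_files in prometheus.yml. Client-specific metrics such as sync status, peer latency, and API latency depend on what your node software exports and are not covered here.

```yaml
# recording_rules.yml (hypothetical file name; add it to rule_files in prometheus.yml)
groups:
  - name: node_resource_recordings
    rules:
      # Processor utilization percentage, averaged across all cores per instance
      - record: instance:cpu_utilization:percent
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

      # RAM consumption as a percentage of total memory
      - record: instance:memory_utilization:percent
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

      # Read/write operations per second, summed over all block devices
      - record: instance:disk_iops:rate5m
        expr: >-
          sum by (instance) (
            rate(node_disk_reads_completed_total[5m])
            + rate(node_disk_writes_completed_total[5m])
          )

      # Available storage remaining on the root filesystem, in bytes
      # (assumes chain data lives on "/"; adjust the mountpoint label otherwise)
      - record: instance:root_fs_avail:bytes
        expr: node_filesystem_avail_bytes{mountpoint="/"}

      # Network traffic volume in bytes per second, excluding the loopback device
      - record: instance:network_bytes:rate5m
        expr: >-
          sum by (instance) (
            rate(node_network_receive_bytes_total{device!="lo"}[5m])
            + rate(node_network_transmit_bytes_total{device!="lo"}[5m])
          )
```

Recording these once keeps dashboards and alert expressions short, since both can refer to the precomputed series instead of repeating the full query.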
Quick Setup with Docker
version: "3.8"

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # Rule file referenced by rule_files in prometheus.yml; it must exist on the host
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      # Default credentials; change the password before exposing Grafana
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      # Read-only host mounts so the exporter reports host metrics, not container metrics
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'

volumes:
  prometheus-data:
  grafana-data:

Prometheus Configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - "alert_rules.yml"

scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter (system metrics)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Geth metrics
  - job_name: 'geth'
    static_configs:
      - targets: ['geth:6060']
    metrics_path: /debug/metrics/prometheus

  # Lighthouse metrics
  - job_name: 'lighthouse'
    static_configs:
      - targets: ['lighthouse:5054']

Essential Alert Rules
Node Out of Sync
sync_status == false for 5m
Action: Check node logs, restart if needed
Low Peer Count
peer_count < 10 for 10m
Action: Check network connectivity, firewall rules
High CPU Usage
cpu_usage > 90% for 15m
Action: Investigate processes, consider scaling
Disk Space Critical
disk_free < 50GB
Action: Prune data or expand storage immediately
Memory Pressure
memory_usage > 85% for 10m
Action: Check for memory leaks, adjust limits
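The conditions above are shorthand rather than valid PromQL. Below is a minimal sketch of how they might be written in the alert_rules.yml file that prometheus.yml loads. It rests on several assumptions. The CPU, disk, and memory rules use Node Exporter metric names. sync_status and peer_count are kept as the placeholders used above and must be replaced with whatever your client actually exports. The severity labels, the group name, and the root mountpoint are illustrative choices. PromQL has no boolean literal, so "false" is expressed as 0.

```yaml
# alert_rules.yml (sketch; metric names marked PLACEHOLDER are not real exporter metrics)
groups:
  - name: node_alerts   # hypothetical group name
    rules:
      - alert: NodeOutOfSync
        expr: sync_status == 0          # PLACEHOLDER metric: substitute your client's sync gauge
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} is out of sync"
          description: "Check node logs, restart if needed."

      - alert: LowPeerCount
        expr: peer_count < 10           # PLACEHOLDER metric: substitute your client's peer gauge
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} has fewer than 10 peers"
          description: "Check network connectivity and firewall rules."

      - alert: HighCpuUsage
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 90
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 90% on {{ $labels.instance }}"
          description: "Investigate processes, consider scaling."

      - alert: DiskSpaceCritical
        # 50e9 bytes = 50 GB; assumes chain data is stored on the root filesystem
        expr: node_filesystem_avail_bytes{mountpoint="/"} < 50e9
        labels:
          severity: critical
        annotations:
          summary: "Less than 50 GB free on {{ $labels.instance }}"
          description: "Prune data or expand storage immediately."

      - alert: MemoryPressure
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 85% on {{ $labels.instance }}"
          description: "Check for memory leaks, adjust limits."
```

If promtool is installed, running `promtool check rules alert_rules.yml` validates the syntax before Prometheus is reloaded.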
Recommended Grafana Dashboards
Node Overview
High-level view of sync status, peer count, and block height across all nodes.
Dashboard ID: 13473
Geth Metrics
Geth-specific metrics including chain data, transaction pool, and RPC stats.
Dashboard ID: 13856
Monitoring Best Practices
Do
- Set up alerts for all critical metrics
- Use multiple notification channels (Slack, PagerDuty, email)
- Retain metrics for at least 30 days for trend analysis
- Create runbooks for each alert type
- Test alert routing regularly
- Monitor the monitoring system itself
Don't
- Create too many alerts (alert fatigue)
- Ignore warning-level alerts
- Skip documentation for dashboards
- Rely on a single notification channel
- Forget to monitor disk growth rate
- Set thresholds too tight (false positives)
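The advice on multiple notification channels can be illustrated with an alertmanager.yml routing tree. This is a sketch under stated assumptions, not a drop-in configuration. The receiver names, email addresses, SMTP host, Slack webhook URL, channel, and PagerDuty key are all placeholders. The routing assumes alerts carry a severity label with the values warning or critical. The matchers syntax requires a reasonably recent Alertmanager release.

```yaml
# alertmanager.yml (sketch; every address, URL, and key below is a placeholder)
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alertmanager@example.com'

route:
  # Fallback receiver for anything not matched by the child routes
  receiver: team-email
  group_by: ['alertname', 'instance']
  group_wait: 30s        # wait briefly so related alerts arrive in one notification
  group_interval: 5m
  repeat_interval: 4h    # re-notify for unresolved alerts without flooding channels
  routes:
    # Critical alerts page the on-call engineer, then continue to the next route
    - matchers:
        - severity="critical"
      receiver: oncall-pagerduty
      continue: true
    # Warning and critical alerts are both posted to Slack
    - matchers:
        - severity=~"warning|critical"
      receiver: ops-slack

receivers:
  - name: team-email
    email_configs:
      - to: 'ops@example.com'
  - name: ops-slack
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'
        channel: '#node-alerts'
        send_resolved: true
  - name: oncall-pagerduty
    pagerduty_configs:
      - routing_key: 'REPLACE_ME'
```

With this layout a critical alert reaches both PagerDuty and Slack, so a failure in one channel does not silence the alert. Running `amtool check-config alertmanager.yml` is one way to validate the file before deploying it.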
Want Pre-Built Monitoring?
ChainLens provides built-in monitoring and alerting for all your blockchain infrastructure.