Node Monitoring
Set up monitoring for your Ethereum nodes with Prometheus, Grafana, and Alertmanager so that sync, peer, and resource problems are caught around the clock.
Monitoring Stack
Docker Compose Setup
version: "3.8"
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./rules:/etc/prometheus/rules:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:-admin}
      - GF_INSTALL_PLUGINS=grafana-clock-panel
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
volumes:
  prometheus-data:
  grafana-data:
  alertmanager-data:
Prometheus Configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
rule_files:
  - /etc/prometheus/rules/*.yml
scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # Node exporter (system metrics)
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
  # Geth execution client
  - job_name: 'geth'
    static_configs:
      - targets: ['geth:6060']
    metrics_path: /debug/metrics/prometheus
  # Prysm beacon node
  - job_name: 'prysm-beacon'
    static_configs:
      - targets: ['prysm:8080']
  # Prysm validator (if running)
  - job_name: 'prysm-validator'
    static_configs:
      - targets: ['prysm-validator:8081']
Enable Metrics on Nodes
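Geth: Add --metrics --metrics.addr 0.0.0.0 (metrics are served on port 6060 by default, under /debug/metrics/prometheus)
Prysm: Metrics are enabled by default on port 8080 for the beacon node and port 8081 for the validator. If Prometheus runs in a separate container, the endpoint also has to listen on a reachable address (Prysm's --monitoring-host flag).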
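Before pointing Prometheus at a node, it can help to confirm that the metrics endpoint actually exposes the series you plan to alert on. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text format; the helper names (`parse_metric_names`, `missing_metrics`, `fetch_metrics`) are made up for this example, and the URL in the usage comment simply mirrors the `geth` scrape target above.

```python
import urllib.request


def parse_metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="x"} 1.0` or `name 1.0`.
        names.add(line.split("{", 1)[0].split()[0])
    return names


def missing_metrics(exposition_text, expected):
    """Return the expected metric names that the endpoint did not expose."""
    return sorted(set(expected) - parse_metric_names(exposition_text))


def fetch_metrics(url, timeout=5.0):
    """Download the raw metrics page (needs network access to the node)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (hypothetical host name, taken from the scrape config above):
#   text = fetch_metrics("http://geth:6060/debug/metrics/prometheus")
#   print(missing_metrics(text, ["chain_head_block", "p2p_peers"]))
```

An empty list means every expected series is present; any name that is printed is either disabled or spelled differently by your client version.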
Key Metrics to Monitor
Node Health
eth_syncing: Sync status - should be false when synced
eth_blockNumber: Current block number
eth_peerCount: Number of connected peers
chain_head_block: Latest block on the chain
Performance
process_cpu_seconds_total: CPU usage of the node process
process_resident_memory_bytes: Memory usage
p2p_peers: Connected P2P peers count
rpc_duration_seconds: RPC request latency
Consensus
beacon_head_slot: Current beacon chain slot
beacon_finalized_epoch: Last finalized epoch
validator_count: Total validators known
attestation_count: Attestations processed
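Note that beacon_finalized_epoch is an epoch number, not a timestamp. Turning it into "seconds since the last finalized checkpoint" takes the epoch length (32 slots of 12 seconds, so 384 seconds) and the chain's genesis time. The sketch below works through that arithmetic; it assumes Ethereum mainnet, whose beacon chain genesis time is 1606824023, and the function names are illustrative only.

```python
import time

SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
SECONDS_PER_EPOCH = SECONDS_PER_SLOT * SLOTS_PER_EPOCH  # 384

# Assumption: Ethereum mainnet. Other networks have a different genesis time.
MAINNET_GENESIS_TIME = 1606824023


def epoch_start_time(epoch, genesis_time=MAINNET_GENESIS_TIME):
    """Unix timestamp at which the given epoch began."""
    return genesis_time + epoch * SECONDS_PER_EPOCH


def finality_lag_seconds(finalized_epoch, now=None, genesis_time=MAINNET_GENESIS_TIME):
    """Seconds elapsed since the start of the last finalized epoch."""
    if now is None:
        now = time.time()
    return now - epoch_start_time(finalized_epoch, genesis_time)
```

On a healthy chain the finalized epoch trails the current one by about two epochs, so a lag of roughly 13 to 19 minutes is normal and a 30-minute threshold leaves some headroom.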
Recommended Alert Rules
Node Out of Sync
critical: Node has been syncing for too long or fell behind
eth_syncing == 1 for > 10m
Low Peer Count
warning: Node has fewer peers than recommended
eth_peerCount < 10
High Memory Usage
warning: Node memory approaching system limits
process_resident_memory_bytes > 28e9 (28 GB)
Block Production Stalled
critical: No new blocks received in 5 minutes
rate(eth_blockNumber[5m]) == 0
Beacon Chain Not Finalized
critical: Beacon chain has not finalized in 30+ minutes
time() - (1606824023 + beacon_finalized_epoch * 384) > 1800, where 1606824023 is the mainnet beacon chain genesis time and 384 is the epoch length in seconds
groups:
  - name: ethereum-node
    rules:
      - alert: NodeOutOfSync
        expr: eth_syncing == 1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Ethereum node is syncing"
          description: "Node has been syncing for more than 10 minutes"
      - alert: LowPeerCount
        expr: eth_peerCount < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low peer count"
          description: "Node has {{ $value }} peers (< 10)"
      - alert: HighMemoryUsage
        expr: process_resident_memory_bytes > 28e9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Node memory usage is {{ $value | humanize }}"
      - alert: BlockProductionStalled
        # The 5m window is already part of the expression, so no "for" is set.
        expr: rate(eth_blockNumber[5m]) == 0
        labels:
          severity: critical
        annotations:
          summary: "Block production stalled"
          description: "No new blocks received in 5 minutes"
      - alert: BeaconNotFinalized
        # 1606824023 is the mainnet beacon chain genesis time; change it for other networks.
        # 384 = 32 slots * 12 seconds per epoch.
        expr: time() - (1606824023 + beacon_finalized_epoch * 384) > 1800
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Beacon chain not finalizing"
          description: "No finalization for 30+ minutes"
Grafana Dashboards
Import these community dashboards into Grafana for instant visibility:
Geth Dashboard
Comprehensive Geth monitoring with sync status, peers, and performance
Prysm Dashboard
Beacon chain metrics, validator performance, and attestation tracking
Node Exporter
System metrics: CPU, memory, disk I/O, network
Ethereum Validator
Validator earnings, effectiveness, and attestation tracking
Alertmanager Configuration
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'critical'
receivers:
  # Alertmanager does not expand ${VAR} placeholders itself; substitute the
  # real values before loading this file (for example with envsubst).
  - name: 'default'
    slack_configs:
      - api_url: '${SLACK_WEBHOOK_URL}'
        channel: '#alerts'
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'
  - name: 'critical'
    slack_configs:
      - api_url: '${SLACK_WEBHOOK_URL}'
        channel: '#alerts-critical'
    pagerduty_configs:
      - service_key: '${PAGERDUTY_KEY}'
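Alertmanager reads its configuration file literally and, to my knowledge, does not substitute environment variables, so the ${SLACK_WEBHOOK_URL} and ${PAGERDUTY_KEY} placeholders have to be filled in before the file is mounted. One way to do that is a small render step; this is a sketch only, and the function name `render_config` and the template file name in the usage comment are hypothetical.

```python
import os
from string import Template


def render_config(template_text, env=None):
    """Replace ${NAME} placeholders with values from env (default: os.environ).

    Template.substitute raises KeyError when a placeholder has no value, so an
    unset secret fails loudly instead of producing a config with a blank URL.
    A literal dollar sign in the template must be written as "$$".
    """
    values = os.environ if env is None else env
    return Template(template_text).substitute(values)


# Usage (hypothetical file names):
#   with open("alertmanager.yml.tmpl") as src:
#       rendered = render_config(src.read())
#   with open("alertmanager.yml", "w") as dst:
#       dst.write(rendered)
```

Alertmanager's own {{ ... }} notification templates contain no dollar signs in this config, so they pass through the render step unchanged.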