Prerequisites
Before you begin, make sure the following components are available:
- Prometheus 2.x or later, running and accessible
- Grafana 10.x or later, running and accessible
- Reva deployed with the /api/metrics endpoint reachable from Prometheus
Verify the metrics endpoint is working:

```shell
curl http://<reva-host>:3978/api/metrics
```

You should see output in the Prometheus text exposition format, with metric names prefixed with reva_.
No authentication is required for the metrics endpoint. Ensure network-level access from your Prometheus instance to the Reva host on port 3978.
Prometheus Configuration
Copy or merge config/prometheus/prometheus.yml into your Prometheus configuration.
The default static_configs target reva:3978 assumes Prometheus runs in the same Docker network as Reva.
```yaml
# prometheus.yml (Docker Compose)
scrape_configs:
  - job_name: "reva"
    metrics_path: "/api/metrics"
    scrape_interval: 30s
    static_configs:
      - targets: ["reva:3978"]
```
On Kubernetes, replace static_configs with kubernetes_sd_configs so that Prometheus auto-discovers the Reva pods in the reva namespace. The relabel rule keeps only pods labeled app=reva.
```yaml
# prometheus.yml (Kubernetes)
scrape_configs:
  - job_name: "reva"
    metrics_path: "/api/metrics"
    scrape_interval: 30s
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ["reva"]
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: reva
        action: keep
```
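With role: pod, Prometheus creates one target per declared container port, plus a port-less target for each container that declares no ports. If the Reva pod declares several ports, the following rule keeps only the metrics port. This is a minimal sketch that assumes the Reva container declares containerPort 3978 in its pod spec. If the container declares no ports, this rule drops the target instead.

```yaml
# relabel_configs for the "reva" job (sketch; assumes the Reva container
# declares containerPort 3978 in its pod spec)
relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    regex: reva
    action: keep
  # Keep only the target generated for port 3978. Targets for other
  # declared ports, and port-less targets, do not match and are dropped.
  - source_labels: [__meta_kubernetes_pod_container_port_number]
    regex: "3978"
    action: keep
```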
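Reload Prometheus after updating the config. Use the command that matches your deployment:

```shell
# Docker Compose: SIGHUP makes Prometheus (PID 1 in the container) reload its configuration
docker compose exec prometheus kill -HUP 1

# Kubernetes: restart the Prometheus deployment so it loads the updated configuration
# (assumes Prometheus runs as deployment/prometheus in the monitoring namespace)
kubectl rollout restart deployment/prometheus -n monitoring
```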
Verification: Navigate to Status > Targets in the Prometheus UI and confirm the reva target shows state UP.
Grafana Dashboard Import
Reva ships with a pre-built Grafana dashboard at config/grafana/reva-dashboard.json. Import it in four steps:
- Open Grafana and go to Dashboards > Import.
- Click Upload JSON file and select config/grafana/reva-dashboard.json.
- In the import dialog, select your Prometheus datasource from the dropdown.
- Click Import.
Dashboard Panels
The dashboard "Reva — Release Virtual Assistant" organizes its panels into four rows:
| Row | Panels |
|---|---|
| Request Performance | Latency percentiles, request rate, error rate |
| Agent & LLM | Agent loop steps, LLM inference time |
| MCP Tools | Tool duration by server (table) |
| Active Sessions & Infrastructure | Sessions, message count, memory/DB/Redis |
The dashboard auto-refreshes every 30 seconds. Some panels may show "No data" until Reva receives its first request.
Alert Configuration
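Copy config/prometheus/alerts.yml to your Prometheus rules directory (for example, the directory that contains prometheus.yml). Make sure the rule_files setting in prometheus.yml points to it, then reload Prometheus so the rules are loaded.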
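For reference, the rule_files entry could look like the following. This assumes alerts.yml sits in the same directory as prometheus.yml. Prometheus resolves relative rule file paths against the directory of its configuration file.

```yaml
# prometheus.yml (excerpt): load the Reva alert rules
rule_files:
  - "alerts.yml"
```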
Alert Rules
The following alerts are pre-defined:
| Alert | Condition | Severity |
|---|---|---|
| RevaHighErrorRate | Error rate > 10% for 5 min | critical |
| RevaMcpDisconnected | MCP reconnect count increases | warning |
| RevaSlowLlm | LLM p95 latency > 60s for 5 min | warning |
| RevaHighMemory | RSS > 2 GB for 10 min | warning |
| RevaNoRequests | 0 req/min for 15 min during business hours | info |
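The shipped config/prometheus/alerts.yml remains the authoritative definition. As an illustrative sketch only, two of these rules could be written against the counters in the Metric Reference roughly as follows. The expressions, windows, and annotation text are assumptions, not copied from the shipped file. Use this to compare against or adapt the shipped rules, not to load alongside them, or the alerts will fire twice.

```yaml
# Illustrative sketch, not the shipped alerts.yml
groups:
  - name: reva.rules
    rules:
      # Error share of all requests over 5 minutes, sustained for 5 minutes.
      # With zero traffic the ratio is NaN, so the alert does not fire.
      - alert: RevaHighErrorRate
        expr: |
          sum(rate(reva_requests_total{status="error"}[5m]))
            / sum(rate(reva_requests_total[5m])) > 0.10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Reva error rate above 10%"
          description: "More than 10% of Reva requests failed over the last 5 minutes."
      # Fires whenever the MCP reconnect counter grew within the last 5 minutes.
      - alert: RevaMcpDisconnected
        expr: increase(reva_mcp_reconnects_total[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Reva reconnected to an MCP server"
          description: "reva_mcp_reconnects_total increased in the last 5 minutes."
```

The summary and description annotations are the fields that the Slack receiver templates read via .CommonAnnotations.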
AlertManager Routing
To receive alert notifications, configure AlertManager with a receiver. Example for Slack:
```yaml
# alertmanager.yml
route:
  receiver: "slack"
  routes:
    - match:
        severity: critical
      receiver: "slack"
      repeat_interval: 1h
receivers:
  - name: "slack"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/..."
        channel: "#reva-alerts"
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'
```
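Alertmanager only receives alerts if Prometheus is told where to send them. The following minimal prometheus.yml excerpt assumes Alertmanager is reachable as alertmanager:9093 (Alertmanager's default port) on the same Docker network. Adjust the target for your environment.

```yaml
# prometheus.yml (excerpt): forward firing alerts to Alertmanager
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]
```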
You can also route alerts to Microsoft Teams, email, or PagerDuty. See the AlertManager documentation for all receiver types.
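For example, to page on critical alerts while keeping everything else in Slack, a route could send severity="critical" to a PagerDuty receiver. This is a sketch: the receiver name pagerduty and the routing key are placeholders, and the slack receiver from the Slack example must still be defined. The matchers syntax requires Alertmanager 0.22 or later.

```yaml
# alertmanager.yml (excerpt, sketch)
route:
  receiver: "slack"            # default: everything not matched below
  routes:
    # Critical alerts go to PagerDuty instead of Slack.
    - matchers:
        - severity="critical"
      receiver: "pagerduty"
receivers:
  # ...keep the existing "slack" receiver here...
  - name: "pagerduty"
    pagerduty_configs:
      # Events API v2 integration key (placeholder)
      - routing_key: "<pagerduty-integration-key>"
```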
Verification
After completing the setup, walk through these four checks to confirm everything is working:
- Prometheus target: Navigate to http://<prometheus>:9090/targets and confirm reva shows status UP.
- Raw query: Run reva_uptime_seconds in the Prometheus query UI. It should return a positive value.
- Grafana panels: Open the Reva dashboard. Panels should show data; some may show zero or "No data" until Reva has handled its first request.
- Alert rules: Navigate to http://<prometheus>:9090/rules and verify the reva.rules group is loaded.
If all four checks pass, your monitoring stack is fully operational: Prometheus scrapes Reva's metrics continuously and fires alerts whenever a threshold is breached.
Metric Reference
Gauges
| Metric | Description |
|---|---|
| reva_uptime_seconds | Process uptime in seconds |
| reva_request_duration_p50_seconds | Request duration 50th percentile |
| reva_request_duration_p95_seconds | Request duration 95th percentile |
| reva_request_duration_p99_seconds | Request duration 99th percentile |
| reva_requests_per_minute | Requests received in the last 60 seconds |
| reva_agent_steps_p50 | Agent loop steps per request, 50th percentile |
| reva_agent_steps_p95 | Agent loop steps per request, 95th percentile |
| reva_agent_tool_calls_p50 | Tool calls per request, 50th percentile |
| reva_agent_tool_calls_p95 | Tool calls per request, 95th percentile |
| reva_llm_step_duration_p50_seconds | LLM step duration 50th percentile |
| reva_llm_step_duration_p95_seconds | LLM step duration 95th percentile |
| reva_llm_step_duration_p99_seconds | LLM step duration 99th percentile |
| reva_mcp_tool_duration_p50_seconds{server="..."} | MCP tool call duration p50, per server |
| reva_mcp_tool_duration_p95_seconds{server="..."} | MCP tool call duration p95, per server |
| reva_webhook_events_per_hour | Webhook events received in the last hour |
| reva_active_sessions | Currently active conversation sessions |
| reva_messages_last_24h | Messages processed in the last 24 hours |
| reva_db_pool_checked_out | Database connections currently checked out |
| reva_db_pool_size | Total database connection pool size |
| reva_process_memory_rss_bytes | Process resident set size in bytes |
| reva_redis_connected | Redis connection status (1 = connected, 0 = disconnected) |
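Gauges can be combined directly in PromQL. As a sketch, a recording rule for database pool utilization could look like this. The group name reva.recording.gauges and the rule name reva:db_pool_utilization:ratio are hypothetical.

```yaml
# Sketch: hypothetical recording rule over Reva gauges
groups:
  - name: reva.recording.gauges        # hypothetical group name
    rules:
      # Fraction of the database connection pool currently in use (0.0 to 1.0).
      - record: reva:db_pool_utilization:ratio
        expr: reva_db_pool_checked_out / reva_db_pool_size
```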
Counters
| Metric | Labels | Description |
|---|---|---|
| reva_requests_total | status (success, error) | Total messages received |
| reva_agent_max_steps_aborts_total | — | Times the agent hit the max-steps limit |
| reva_llm_tokens_input_total | — | Estimated input tokens consumed |
| reva_llm_tokens_output_total | — | Estimated output tokens generated |
| reva_mcp_reconnects_total | — | MCP server reconnection events |
| reva_webhook_events_total | source (release, jira) | Incoming webhook events by source |
| reva_notification_deliveries_total | status (success, failed) | Notification delivery attempts |
| reva_messages_by_language_total | lang (de, en) | Messages by detected language |
| reva_router_classifications_total | role (release, ...) | Router intent classifications by role |
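The percentile gauges and the windowed gauges (such as reva_requests_per_minute and reva_messages_last_24h) are computed over a sliding window; the remaining gauges report current state. Counter metrics increase monotonically and reset to zero on process restart. Use the Prometheus rate() or increase() functions for counter-based queries; both account for counter resets.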
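Applying rate() and increase() to the counters above, a few recording rules might look like the following sketch. The group name and the rule names are hypothetical, and the windows are arbitrary choices.

```yaml
# Sketch: hypothetical recording rules over Reva counters
groups:
  - name: reva.recording.counters      # hypothetical group name
    rules:
      # Webhook events per second, split by source (release, jira), over 5 minutes.
      - record: reva:webhook_events:rate5m
        expr: sum by (source) (rate(reva_webhook_events_total[5m]))
      # Estimated LLM tokens (input + output) consumed over the last hour.
      - record: reva:llm_tokens:increase1h
        expr: |
          sum(increase(reva_llm_tokens_input_total[1h]))
            + sum(increase(reva_llm_tokens_output_total[1h]))
      # Failed notification deliveries per second, averaged over 15 minutes.
      - record: reva:notification_failures:rate15m
        expr: sum(rate(reva_notification_deliveries_total{status="failed"}[15m]))
```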