Monitoring Setup

Prerequisites

Before you begin, make sure the following components are available:

Prometheus 2.x or later, running and accessible
Grafana 10.x or later, running and accessible
Reva deployed with the /api/metrics endpoint reachable from Prometheus

Verify the metrics endpoint is working:

curl http://<reva-host>:3978/api/metrics

You should see output in the Prometheus text exposition format, with metric names prefixed with reva_.

No authentication is required for the metrics endpoint. Ensure network-level access from your Prometheus instance to the Reva host on port 3978.

Prometheus Configuration

Copy or merge config/prometheus/prometheus.yml into your Prometheus configuration.

The default static_configs target reva:3978 assumes Prometheus runs in the same Docker network as Reva.

# prometheus.yml (Docker Compose)
scrape_configs:
  - job_name: "reva"
    metrics_path: "/api/metrics"
    scrape_interval: 30s
    static_configs:
      - targets: ["reva:3978"]
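Here is a minimal sketch of a Docker Compose file that satisfies the "same Docker network" assumption. Services defined in one Compose project share a default network, so Prometheus can resolve the service name reva used in the target above. The Reva image name, the published ports, and the location of the Compose file (repository root) are assumptions. They are not part of Reva's shipped configuration.

```yaml
# docker-compose.yml (sketch): Prometheus and Reva in one Compose project.
# Both services join the project's default network, so the hostname "reva"
# in the scrape target resolves from inside the Prometheus container.
services:
  reva:
    image: reva:latest            # hypothetical image name; substitute your own
    ports:
      - "3978:3978"               # optional: lets you run the curl check from the host
  prometheus:
    image: prom/prometheus:latest
    volumes:
      # Mount the whole directory so prometheus.yml and alerts.yml sit side by side.
      # Edits made on the host are also picked up more reliably than with a
      # single-file bind mount.
      - ./config/prometheus:/etc/prometheus:ro
    ports:
      - "9090:9090"
```

If Reva and Prometheus run in separate Compose projects, they do not share a default network. In that case you would attach both to a common external network instead.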

For Kubernetes, replace static_configs with kubernetes_sd_configs to auto-discover Reva pods (those labeled app=reva) in the reva namespace.

# prometheus.yml (Kubernetes)
scrape_configs:
  - job_name: "reva"
    metrics_path: "/api/metrics"
    scrape_interval: 30s
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ["reva"]
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: reva
        action: keep
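With role: pod, Prometheus generates one target for each port a container declares. A Reva pod that exposes several ports would therefore appear as several reva targets, and those not serving /api/metrics would fail to scrape. The sketch below adds a second keep rule so only port 3978 remains. It assumes the Reva container declares containerPort: 3978 in its pod spec.

```yaml
# prometheus.yml (Kubernetes, sketch): pin the scrape to port 3978.
# Assumes the Reva container declares containerPort: 3978.
scrape_configs:
  - job_name: "reva"
    metrics_path: "/api/metrics"
    scrape_interval: 30s
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ["reva"]
    relabel_configs:
      # Keep only pods labeled app=reva
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: reva
        action: keep
      # Drop targets generated for any other declared container port
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "3978"
        action: keep
```

For pod discovery to work at all, Prometheus's service account also needs permission to get, list, and watch pods in the reva namespace.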

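Reload Prometheus after updating the config.

Docker Compose (SIGHUP makes Prometheus reload its configuration without restarting):

docker compose exec prometheus kill -HUP 1

Kubernetes (this restarts the Prometheus pods, which then start with the updated configuration):

kubectl rollout restart deployment/prometheus -n monitoring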

Verification: Navigate to Status > Targets in the Prometheus UI and confirm the reva target shows state UP.

Grafana Dashboard Import

Reva ships with a pre-built Grafana dashboard at config/grafana/reva-dashboard.json. Import it in four steps:

1. Open Grafana and go to Dashboards > Import.
2. Click Upload JSON file and select config/grafana/reva-dashboard.json.
3. In the import dialog, select your Prometheus datasource from the dropdown.
4. Click Import.
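For deployments that are recreated often, you can use Grafana's file-based provisioning to load the dashboard at startup instead of importing it by hand. Below is a minimal sketch of a dashboard provider. The provider name, folder name, and dashboard directory are assumptions. You would mount reva-dashboard.json into that directory yourself.

```yaml
# /etc/grafana/provisioning/dashboards/reva.yml (sketch)
# /etc/grafana/provisioning is the default provisioning path in the official
# Grafana image; the provider name, folder, and dashboard path are assumptions.
apiVersion: 1
providers:
  - name: "reva"
    folder: "Reva"
    type: file
    disableDeletion: false
    options:
      # Directory that contains reva-dashboard.json inside the Grafana container
      path: /var/lib/grafana/dashboards/reva
```

The import dialog asks you to pick a datasource, which suggests the JSON uses a datasource input placeholder. File provisioning does not fill in such inputs. If the dashboard shows datasource errors after provisioning, replace the placeholder in the JSON with your Prometheus datasource's UID.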

Dashboard Panels

The dashboard "Reva — Release Virtual Assistant" organizes its panels into four rows:

Row	Panels
Request Performance	Latency percentiles, request rate, error rate
Agent & LLM	Agent loop steps, LLM inference time
MCP Tools	Tool duration by server (table)
Active Sessions & Infrastructure	Sessions, message count, memory/DB/Redis

The dashboard auto-refreshes every 30 seconds. Some panels may show "No data" until Reva receives its first request.
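The dashboard's actual queries are defined in reva-dashboard.json. To illustrate how an error-rate panel can be derived from the documented reva_requests_total counter, here is a sketch of two recording rules. The group name and rule names are hypothetical.

```yaml
# recording-rules.yml (sketch; group and rule names are hypothetical)
groups:
  - name: reva.recording
    rules:
      # Fraction of requests that ended in error over the last 5 minutes (0.0 to 1.0)
      - record: reva:requests_error_ratio:rate5m
        expr: |
          sum(rate(reva_requests_total{status="error"}[5m]))
            /
          sum(rate(reva_requests_total[5m]))
      # Request throughput in requests per second, derived from the counter
      - record: reva:requests:rate5m
        expr: sum(rate(reva_requests_total[5m]))
```

Until Reva has handled requests, the ratio can be empty or NaN (0/0). Grafana renders either case as "No data".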

Alert Configuration

Copy config/prometheus/alerts.yml to your Prometheus rules directory (typically the same directory as prometheus.yml). Make sure the rule_files section of prometheus.yml lists the file, then reload Prometheus.
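As a sketch, the relevant part of prometheus.yml could look like the excerpt below. The rule_files entry loads the alert rules. The alerting block tells Prometheus where to send firing alerts. Without it, alerts still appear in the Prometheus UI, but no notifications go out. The AlertManager address alertmanager:9093 is an assumption; 9093 is AlertManager's default port.

```yaml
# prometheus.yml (excerpt, sketch)
rule_files:
  - "alerts.yml"        # relative paths are resolved against this file's directory

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]   # assumed AlertManager host and port
```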

Alert Rules

The following alerts are pre-defined:

Alert	Condition	Severity
`RevaHighErrorRate`	Error rate > 10% for 5 min	critical
`RevaMcpDisconnected`	MCP reconnect count increases	warning
`RevaSlowLlm`	LLM p95 latency > 60s for 5 min	warning
`RevaHighMemory`	RSS > 2 GB for 10 min	warning
`RevaNoRequests`	0 req/min for 15 min during business hours	info
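Below is a sketch of how four of these alerts could be expressed using the metrics listed in the Metric Reference. It is an approximation for reading and adapting. It is not the contents of the shipped alerts.yml, whose expressions, thresholds, and annotations may differ. The group name matches the reva.rules group mentioned under Verification. The summary and description annotations are what the Slack template below reads.

```yaml
# alerts.yml (sketch, not the shipped file)
groups:
  - name: reva.rules
    rules:
      - alert: RevaHighErrorRate
        expr: |
          sum(rate(reva_requests_total{status="error"}[5m]))
            / sum(rate(reva_requests_total[5m])) > 0.10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Reva error rate above 10%"
          description: "More than 10% of Reva requests have failed for 5 minutes."
      - alert: RevaMcpDisconnected
        # Fires whenever the reconnect counter grew in the last 5 minutes
        expr: increase(reva_mcp_reconnects_total[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Reva MCP server reconnected"
          description: "reva_mcp_reconnects_total increased in the last 5 minutes."
      - alert: RevaSlowLlm
        expr: reva_llm_step_duration_p95_seconds > 60
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Reva LLM p95 latency above 60s"
          description: "LLM step p95 latency has exceeded 60 seconds for 5 minutes."
      - alert: RevaHighMemory
        expr: reva_process_memory_rss_bytes > 2e9   # 2 GB; use 2147483648 for 2 GiB
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Reva memory usage above 2 GB"
          description: "Reva RSS has exceeded 2 GB for 10 minutes."
```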

AlertManager Routing

To receive alert notifications, configure AlertManager with a receiver. Example for Slack:

# alertmanager.yml
route:
  receiver: "slack"
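  routes:
    # Critical alerts use the same Slack receiver but repeat every hour while firing.
    # matchers replaces the deprecated match field (AlertManager 0.22 and later).
    - matchers:
        - severity="critical"
      receiver: "slack"
      repeat_interval: 1h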

receivers:
  - name: "slack"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/..."
        channel: "#reva-alerts"
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'

You can also route alerts to Microsoft Teams, email, or PagerDuty. See the AlertManager documentation for all receiver types.
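For example, you might page on-call staff for critical alerts and send everything else to Slack. A sketch of that routing follows. The PagerDuty integration key is a placeholder you must supply. pagerduty_configs with routing_key uses PagerDuty's Events API v2. The matchers syntax requires AlertManager 0.22 or later.

```yaml
# alertmanager.yml (sketch): critical alerts go to PagerDuty, all others to Slack.
route:
  receiver: "slack"               # default for anything not matched below
  group_by: ["alertname"]
  routes:
    - matchers:
        - severity="critical"
      receiver: "pagerduty"

receivers:
  - name: "slack"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/..."
        channel: "#reva-alerts"
  - name: "pagerduty"
    pagerduty_configs:
      - routing_key: "<pagerduty-integration-key>"   # placeholder
```

To send critical alerts to both PagerDuty and Slack, add continue: true to the critical route so that evaluation falls through to the default receiver.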

Verification

After completing the setup, walk through these four checks to confirm everything is working:

Prometheus target: Navigate to http://<prometheus>:9090/targets and confirm reva shows status UP.
Raw query: Run reva_uptime_seconds in the Prometheus query UI. It should return a positive value.
Grafana panels: Open the Reva dashboard. Panels should populate with data; some may read zero or show "No data" until Reva has received traffic.
Alert rules: Navigate to http://<prometheus>:9090/rules and verify the reva.rules group is loaded.

If all four checks pass, your monitoring stack is operational. Prometheus scrapes Reva's metrics every 30 seconds, evaluates the alert rules, and sends firing alerts to AlertManager when thresholds are breached.

Metric Reference

Gauges

Metric	Description
`reva_uptime_seconds`	Process uptime in seconds
`reva_request_duration_p50_seconds`	Request duration 50th percentile
`reva_request_duration_p95_seconds`	Request duration 95th percentile
`reva_request_duration_p99_seconds`	Request duration 99th percentile
`reva_requests_per_minute`	Requests received in the last 60 seconds
`reva_agent_steps_p50`	Agent loop steps per request, 50th percentile
`reva_agent_steps_p95`	Agent loop steps per request, 95th percentile
`reva_agent_tool_calls_p50`	Tool calls per request, 50th percentile
`reva_agent_tool_calls_p95`	Tool calls per request, 95th percentile
`reva_llm_step_duration_p50_seconds`	LLM step duration 50th percentile
`reva_llm_step_duration_p95_seconds`	LLM step duration 95th percentile
`reva_llm_step_duration_p99_seconds`	LLM step duration 99th percentile
`reva_mcp_tool_duration_p50_seconds{server="..."}`	MCP tool call duration p50, per server
`reva_mcp_tool_duration_p95_seconds{server="..."}`	MCP tool call duration p95, per server
`reva_webhook_events_per_hour`	Webhook events received in the last hour
`reva_active_sessions`	Currently active conversation sessions
`reva_messages_last_24h`	Messages processed in the last 24 hours
`reva_db_pool_checked_out`	Database connections currently checked out
`reva_db_pool_size`	Total database connection pool size
`reva_process_memory_rss_bytes`	Process resident set size in bytes
`reva_redis_connected`	Redis connection status (1 = connected, 0 = disconnected)
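These gauges can be queried directly or combined arithmetically without rate(). As a sketch, the recording rules below derive database pool utilization and the hourly peak of active sessions. The group name and rule names are hypothetical.

```yaml
# recording-rules.yml (sketch; group and rule names are hypothetical)
groups:
  - name: reva.gauges
    rules:
      # Fraction of the database connection pool in use (0.0 to 1.0)
      - record: reva:db_pool_utilization:ratio
        expr: reva_db_pool_checked_out / reva_db_pool_size
      # Highest number of concurrently active sessions seen in the last hour
      - record: reva:active_sessions:max1h
        expr: max_over_time(reva_active_sessions[1h])
```

The percentile gauges (the _p50, _p95, and _p99 metrics) are precomputed by each Reva process. Averaging them across several instances does not produce a true fleet-wide percentile, so compare them per instance instead.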

Counters

Metric	Labels	Description
`reva_requests_total`	`status` (success, error)	Total messages received
`reva_agent_max_steps_aborts_total`	—	Times the agent hit the max-steps limit
`reva_llm_tokens_input_total`	—	Estimated input tokens consumed
`reva_llm_tokens_output_total`	—	Estimated output tokens generated
`reva_mcp_reconnects_total`	—	MCP server reconnection events
`reva_webhook_events_total`	`source` (release, jira)	Incoming webhook events by source
`reva_notification_deliveries_total`	`status` (success, failed)	Notification delivery attempts
`reva_messages_by_language_total`	`lang` (de, en)	Messages by detected language
`reva_router_classifications_total`	`role` (release, ...)	Router intent classifications by role

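The windowed gauges (the percentile metrics and the per-minute, per-hour, and last-24-hours counts) are computed by Reva over a sliding window. The remaining gauges report current values. Counter metrics increase monotonically and reset to zero on process restart. Use the Prometheus rate() or increase() functions, which account for these resets, for counter-based queries.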
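Here is a sketch of counter-based recording rules using rate() and increase() on the counters above: estimated hourly token usage, webhook throughput per source, and the notification failure ratio. The group name and rule names are hypothetical.

```yaml
# recording-rules.yml (sketch; group and rule names are hypothetical)
groups:
  - name: reva.counters
    rules:
      # Estimated LLM tokens (input plus output) consumed over the last hour.
      # increase() compensates for counter resets when Reva restarts.
      - record: reva:llm_tokens:increase1h
        expr: |
          sum(increase(reva_llm_tokens_input_total[1h]))
            + sum(increase(reva_llm_tokens_output_total[1h]))
      # Webhook events per second, broken down by source (release, jira)
      - record: reva:webhook_events:rate5m
        expr: sum by (source) (rate(reva_webhook_events_total[5m]))
      # Fraction of notification deliveries that failed over the last 15 minutes
      - record: reva:notification_failure_ratio:rate15m
        expr: |
          sum(rate(reva_notification_deliveries_total{status="failed"}[15m]))
            / sum(rate(reva_notification_deliveries_total[15m]))
```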