Contents
  1. Prerequisites
  2. Prometheus Configuration
  3. Grafana Dashboard Import
  4. Alert Configuration
  5. Verification
  6. Metric Reference

Prerequisites

Before you begin, make sure a running Reva instance, a Prometheus server, and a Grafana instance are available.

Verify the metrics endpoint is working:

curl http://<reva-host>:3978/api/metrics

You should see Prometheus text format output with metrics prefixed reva_.

No authentication is required for the metrics endpoint. Ensure network-level access from your Prometheus instance to the Reva host on port 3978.
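The curl check above can also be scripted. The following is a minimal sketch, assuming only that the endpoint returns standard Prometheus text format; the function names parse_metric_names and fetch_reva_metric_names are illustrative and not part of Reva.

```python
import urllib.request


def parse_metric_names(exposition_text: str) -> set[str]:
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="v"} value
        # The metric name ends at the first "{" or at whitespace.
        name = line.split("{", 1)[0].split()[0]
        names.add(name)
    return names


def fetch_reva_metric_names(base_url: str, timeout: float = 5.0) -> set[str]:
    """Fetch <base_url>/api/metrics and return the metric names prefixed reva_."""
    with urllib.request.urlopen(f"{base_url}/api/metrics", timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    return {n for n in parse_metric_names(text) if n.startswith("reva_")}
```

Calling fetch_reva_metric_names("http://<reva-host>:3978") with your own host should return a non-empty set if the endpoint is reachable and serving Reva metrics.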

Prometheus Configuration

Copy or merge config/prometheus/prometheus.yml into your Prometheus configuration.

The default static_configs target reva:3978 assumes Prometheus runs in the same Docker network as Reva.

# prometheus.yml (Docker Compose)
scrape_configs:
  - job_name: "reva"
    metrics_path: "/api/metrics"
    scrape_interval: 30s
    static_configs:
      - targets: ["reva:3978"]

On Kubernetes, replace static_configs with kubernetes_sd_configs to auto-discover Reva pods in the reva namespace. The relabel rule keeps only pods labeled app=reva.

# prometheus.yml (Kubernetes)
scrape_configs:
  - job_name: "reva"
    metrics_path: "/api/metrics"
    scrape_interval: 30s
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ["reva"]
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: reva
        action: keep

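Reload Prometheus after updating the config, using the command that matches your environment:

# Docker Compose: send SIGHUP to the Prometheus process (PID 1) to reload the config
docker compose exec prometheus kill -HUP 1
# Kubernetes: restart the Prometheus deployment in the monitoring namespace
kubectl rollout restart deployment/prometheus -n monitoring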

Verification: Navigate to Status > Targets in the Prometheus UI and confirm the reva target shows state UP.
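The same target check can be done against the Prometheus HTTP API instead of the UI. This is a sketch only: it relies on the documented /api/v1/targets response shape (data.activeTargets with labels and health fields), and the helper names job_is_up and fetch_targets are hypothetical.

```python
import json
import urllib.request


def job_is_up(targets_response: dict, job: str = "reva") -> bool:
    """Return True if the job has at least one active target and all are healthy."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    job_targets = [t for t in active if t.get("labels", {}).get("job") == job]
    # An empty list means the job was not discovered at all.
    return bool(job_targets) and all(t.get("health") == "up" for t in job_targets)


def fetch_targets(prometheus_url: str, timeout: float = 5.0) -> dict:
    """Fetch the parsed JSON body of <prometheus_url>/api/v1/targets."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, job_is_up(fetch_targets("http://<prometheus>:9090")) should be True once the first scrape has succeeded.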

Grafana Dashboard Import

Reva ships with a pre-built Grafana dashboard at config/grafana/reva-dashboard.json. Import it in four steps:

  1. Open Grafana and go to Dashboards > Import.
  2. Click Upload JSON file and select config/grafana/reva-dashboard.json.
  3. In the import dialog, select your Prometheus datasource from the dropdown.
  4. Click Import.

Dashboard Panels

The dashboard "Reva — Release Virtual Assistant" contains seven panels across four rows:

Row                               Panels
Request Performance               Latency percentiles, request rate, error rate
Agent & LLM                       Agent loop steps, LLM inference time
MCP Tools                         Tool duration by server (table)
Active Sessions & Infrastructure  Sessions, message count, memory/DB/Redis

The dashboard auto-refreshes every 30 seconds. Some panels may show "No data" until Reva receives its first request.

Alert Configuration

Copy config/prometheus/alerts.yml to your Prometheus rules directory (the same directory as prometheus.yml or wherever rule_files points).

Alert Rules

The following alerts are pre-defined:

Alert                Condition                                   Severity
RevaHighErrorRate    Error rate > 10% for 5 min                  critical
RevaMcpDisconnected  MCP reconnect count increases               warning
RevaSlowLlm          LLM p95 latency > 60s for 5 min             warning
RevaHighMemory       RSS > 2 GB for 10 min                       warning
RevaNoRequests       0 req/min for 15 min during business hours  info

AlertManager Routing

To receive alert notifications, configure AlertManager with a receiver. Example for Slack:

# alertmanager.yml
route:
  receiver: "slack"
  routes:
    - match:
        severity: critical
      receiver: "slack"
      repeat_interval: 1h

receivers:
  - name: "slack"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/..."
        channel: "#reva-alerts"
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'

You can also route alerts to Microsoft Teams, email, or PagerDuty. See the AlertManager documentation for all receiver types.

Verification

After completing the setup, walk through these four checks to confirm everything is working:

  1. Prometheus target: Navigate to http://<prometheus>:9090/targets and confirm reva shows status UP.
  2. Raw query: Run reva_uptime_seconds in the Prometheus query UI. It should return a positive value.
  3. Grafana panels: Open the Reva dashboard. Panels should show data; some may show zero or "No data" if no traffic has occurred yet.
  4. Alert rules: Navigate to http://<prometheus>:9090/rules and verify the reva.rules group is loaded.

If all four checks pass, your monitoring stack is fully operational. Prometheus will now scrape Reva's metrics continuously and fire alerts when thresholds are breached.
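Checks 2 and 4 can be automated by reading the JSON that the Prometheus HTTP API returns from /api/v1/query and /api/v1/rules. The sketch below only interprets already-fetched responses, and its function names are illustrative assumptions rather than anything shipped with Reva.

```python
from typing import Optional


def instant_value(query_response: dict) -> Optional[float]:
    """Return the first sample value of an instant-vector result, or None if empty."""
    result = query_response.get("data", {}).get("result", [])
    if not result:
        return None
    # Each sample's "value" is [unix_timestamp, "<value as string>"].
    return float(result[0]["value"][1])


def rule_group_loaded(rules_response: dict, group: str = "reva.rules") -> bool:
    """Return True if a rule group with the given name appears in the response."""
    groups = rules_response.get("data", {}).get("groups", [])
    return any(g.get("name") == group for g in groups)
```

Check 2 passes when instant_value(...) for the query reva_uptime_seconds is a positive number; check 4 passes when rule_group_loaded(...) is True.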

Metric Reference

Gauges

Metric                                            Description
reva_uptime_seconds                               Process uptime in seconds
reva_request_duration_p50_seconds                 Request duration 50th percentile
reva_request_duration_p95_seconds                 Request duration 95th percentile
reva_request_duration_p99_seconds                 Request duration 99th percentile
reva_requests_per_minute                          Requests received in the last 60 seconds
reva_agent_steps_p50                              Agent loop steps per request, 50th percentile
reva_agent_steps_p95                              Agent loop steps per request, 95th percentile
reva_agent_tool_calls_p50                         Tool calls per request, 50th percentile
reva_agent_tool_calls_p95                         Tool calls per request, 95th percentile
reva_llm_step_duration_p50_seconds                LLM step duration 50th percentile
reva_llm_step_duration_p95_seconds                LLM step duration 95th percentile
reva_llm_step_duration_p99_seconds                LLM step duration 99th percentile
reva_mcp_tool_duration_p50_seconds{server="..."}  MCP tool call duration p50, per server
reva_mcp_tool_duration_p95_seconds{server="..."}  MCP tool call duration p95, per server
reva_webhook_events_per_hour                      Webhook events received in the last hour
reva_active_sessions                              Currently active conversation sessions
reva_messages_last_24h                            Messages processed in the last 24 hours
reva_db_pool_checked_out                          Database connections currently checked out
reva_db_pool_size                                 Total database connection pool size
reva_process_memory_rss_bytes                     Process resident set size in bytes
reva_redis_connected                              Redis connection status (1 = connected, 0 = disconnected)

Counters

Metric                              Labels                    Description
reva_requests_total                 status (success, error)   Total messages received
reva_agent_max_steps_aborts_total   -                         Times the agent hit the max-steps limit
reva_llm_tokens_input_total         -                         Estimated input tokens consumed
reva_llm_tokens_output_total        -                         Estimated output tokens generated
reva_mcp_reconnects_total           -                         MCP server reconnection events
reva_webhook_events_total           source (release, jira)    Incoming webhook events by source
reva_notification_deliveries_total  status (success, failed)  Notification delivery attempts
reva_messages_by_language_total     lang (de, en)             Messages by detected language
reva_router_classifications_total   role (release, ...)       Router intent classifications by role

All gauge metrics are computed over a sliding window. Counter metrics increase monotonically and reset to zero on process restart. Use Prometheus rate() or increase() functions for counter-based queries.
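To illustrate why raw counter values should not be compared directly, here is a simplified sketch of reset-aware increase logic. It is not Prometheus's implementation (which also extrapolates to the window boundaries), and the error_ratio helper is a hypothetical way to derive an error rate from reva_requests_total; the exact expression used in alerts.yml is not shown in this guide.

```python
def counter_increase(samples: list[float]) -> float:
    """Total increase across consecutive counter samples, tolerating resets."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The counter dropped, so assume a restart from zero:
            # everything counted since the restart is new increase.
            total += curr
    return total


def error_ratio(success_samples: list[float], error_samples: list[float]) -> float:
    """Share of requests that failed over the sampled window (0.0 if no traffic)."""
    errors = counter_increase(error_samples)
    total = errors + counter_increase(success_samples)
    return errors / total if total else 0.0
```

With samples 10, 15, 20, 3, 8 the counter restarted between 20 and 3, and counter_increase adds the post-restart value instead of a negative difference.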
