## Health Check
The health endpoint verifies database connectivity, Teams adapter initialization, and MCP server status. Docker and Kubernetes healthchecks point here — the pod is marked unhealthy if any MCP server is disconnected.
```bash
# Local (Docker Compose)
curl -s http://localhost:3978/api/health | python3 -m json.tool

# Production (behind your domain)
curl -s https://your-domain.example.com/api/health | python3 -m json.tool
```
Expected response (all systems operational):
```json
{
  "status": "ok",
  "adapter_initialized": true,
  "db": true,
  "mcp": {
    "release": { "connected": true, "tools": 38, "error": null },
    "jira": { "connected": true, "tools": 29, "error": null }
  }
}
```
| Field | Meaning |
|---|---|
| status | "ok" (HTTP 200) or "degraded" (HTTP 503) |
| adapter_initialized | Teams Bot Framework adapter is ready |
| db | PostgreSQL connection successful |
| mcp.*.connected | MCP server is connected and responding |
| mcp.*.tools | Number of tools registered by the MCP server |
After a deployment, MCP sidecars need 20–30 seconds to connect. The health check returns 503 during this startup window.
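During a rollout, a deployment gate can poll the endpoint and proceed only once status flips back to "ok". A minimal parsing sketch, using the sample response above in place of a live curl call:

```shell
# Sample response (from the expected-response example above);
# in a real gate, replace the here-doc with:
#   response=$(curl -s http://localhost:3978/api/health)
response=$(cat <<'EOF'
{"status": "ok", "adapter_initialized": true, "db": true}
EOF
)

# Extract the top-level status field
status=$(printf '%s' "$response" \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["status"])')

# "ok" means healthy; "degraded" (HTTP 503) means keep waiting
echo "health status: $status"
```

Wrapped in a retry loop with a 20–30 second budget, this covers the MCP sidecar startup window described above.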
## Metrics & Monitoring
Reva exposes metrics in two formats. No authentication is required for metrics endpoints.
### Prometheus Format

```
GET /api/metrics    # text/plain; version=0.0.4
```
Scrape this endpoint with Prometheus, Grafana Agent, or any compatible collector. Key metrics:
| Metric | Type | Description |
|---|---|---|
| reva_request_duration_p50_seconds | Gauge | Message handling time (p50) |
| reva_requests_total | Counter | Total messages by status (success/error) |
| reva_llm_step_duration_p50_seconds | Gauge | LLM inference time (p50) |
| reva_agent_max_steps_aborts_total | Counter | Agent loops that hit the iteration limit |
| reva_mcp_tool_duration_p50_seconds | Gauge | MCP tool call duration by server |
| reva_active_sessions | Gauge | Currently active conversation sessions |
| reva_notification_deliveries_total | Counter | Proactive notification deliveries |
### JSON Format

```
GET /api/stats    # application/json
```
Returns the same data as structured JSON — useful for dashboards, scripts, or ad-hoc inspection.
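Without a full Prometheus stack, a single gauge can be pulled out of the text exposition with a one-line awk filter. A sketch with invented sample values; in practice, feed it `curl -s http://localhost:3978/api/metrics` instead of the here-doc:

```shell
# Invented sample of the Prometheus text exposition
metrics=$(cat <<'EOF'
reva_request_duration_p50_seconds 0.42
reva_requests_total{status="success"} 1305
reva_requests_total{status="error"} 7
reva_active_sessions 3
EOF
)

# Print the value of the p50 request-duration gauge
p50=$(printf '%s\n' "$metrics" \
  | awk '$1 == "reva_request_duration_p50_seconds" { print $2 }')
echo "p50 request duration: ${p50}s"
```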
## Support Bundle
The support bundle collects GDPR-safe diagnostic data for remote troubleshooting. It is designed for customer deployments where X-idra has no SSH access. Two collection methods are available: an API endpoint (when the application is running) and a shell script (when it is not).
### Setup
Set a shared secret to enable the support bundle endpoint:
**Docker Compose.** Add to your .env file:

```bash
REVA_SUPPORT_SECRET=your-secret-here
```

Then restart:

```bash
docker compose up -d reva
```
**Kubernetes.** Patch the ConfigMap and restart:

```bash
kubectl patch configmap reva-env -n reva \
  --type merge -p '{"data":{"REVA_SUPPORT_SECRET":"your-secret-here"}}'
kubectl rollout restart deployment/reva -n reva
```
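Any high-entropy string works as the shared secret; the your-secret-here placeholder above should never ship. One way to generate a strong value, assuming openssl is available:

```shell
# 32 random bytes, hex-encoded: a 64-character secret
SECRET=$(openssl rand -hex 32)
echo "REVA_SUPPORT_SECRET=$SECRET"
```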
### API Endpoint (Application Running)
Collect diagnostics as JSON or a downloadable ZIP archive:
```bash
# JSON response
curl -H "X-Support-Secret: $REVA_SUPPORT_SECRET" \
  https://your-domain.example.com/api/support-bundle

# ZIP archive (one file per section); quote the URL so the shell
# does not treat "?" as a glob character
curl -H "X-Support-Secret: $REVA_SUPPORT_SECRET" \
  "https://your-domain.example.com/api/support-bundle?format=zip" \
  -o support-bundle.zip
```
### What It Collects
The bundle runs 11 independent collectors. If one fails, the others still complete.
| Section | Data Collected |
|---|---|
| system_info | OS, kernel, architecture, hostname, RAM, CPU, disk, Python version, installed packages |
| config | Reva + Renfield settings (secrets masked), relevant environment variables |
| health | Database, MCP servers, Teams adapter, Ollama, Redis connectivity |
| metrics | Full metrics snapshot (request performance, agent loop, LLM, MCP, webhooks) |
| mcp_status | MCP server connectivity, tool list per server |
| router_state | Agent router roles, models, descriptions, MCP bindings |
| db_stats | Table row counts, active connections, database size, connection pool stats |
| ollama | Available models, running models with VRAM usage |
| network | DNS resolution + TCP connectivity tests for all configured services |
| logs | Last 500 log lines + last 200 error/warning lines (sanitized) |
| error_summary | Error counts, agent aborts, MCP failures, notification failures |
### GDPR Sanitization
The support bundle is designed to be safe for sharing across organizational boundaries:
| Data Type | Treatment |
|---|---|
| Passwords, tokens, API keys | Replaced with `***` |
| Environment variables matching PASSWORD, SECRET, TOKEN, KEY, APP_ID, TENANT | Values replaced with `***` |
| User names in logs | Replaced with [REDACTED] |
| Session / conversation IDs in logs | Truncated to 8 characters |
| Database content (messages, memories) | Never queried — only pg_stat_user_tables row counts |
**Authentication:** The endpoint returns 403 if REVA_SUPPORT_SECRET is not configured, and 401 if the X-Support-Secret header does not match. Without the correct secret, no diagnostic data is exposed.
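For local testing, the masking rules in the table can be approximated with sed. This is an illustrative sketch only; the sanitize helper and its patterns are not the application's actual sanitizer:

```shell
# Hypothetical approximation of the bundle's masking rules:
#  1. mask values of variables whose names contain PASSWORD/SECRET/TOKEN/KEY/APP_ID/TENANT
#  2. truncate hex session IDs to 8 characters
sanitize() {
  sed -E \
    -e 's/(([A-Z_]*(PASSWORD|SECRET|TOKEN|KEY|APP_ID|TENANT)[A-Z_]*)=)[^[:space:]]+/\1***/g' \
    -e 's/(session_id=)([a-f0-9]{8})[a-f0-9]*/\1\2/g'
}

masked=$(echo 'DB_PASSWORD=hunter2 session_id=0123456789abcdef0123' | sanitize)
echo "$masked"
```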
### Offline Shell Script (Application Not Running)
When the application cannot start or the API is unreachable, use the shell script. It auto-detects whether the deployment uses Docker Compose or Kubernetes.
```bash
# Without API access (collects system info, logs, container status, DB stats)
./bin/support-bundle.sh

# With API access (also pulls the full API bundle)
REVA_SUPPORT_SECRET=your-secret ./bin/support-bundle.sh
```
The script produces a support-bundle-YYYY-MM-DD-HHMMSS.tar.gz archive in the project root directory containing:
- System information (OS, memory, disk, Docker/K8s version)
- Container/pod status and resource usage
- Application logs (last 1000 lines)
- Health check response (if reachable)
- Configuration with secrets masked
- Database statistics (row counts, connections, size)
- Full API bundle (if secret provided and API reachable)
- GDPR_NOTICE.txt documenting what was sanitized
Send the resulting .tar.gz or .zip file to info@x-idra.de for analysis. The bundle contains no personal data, conversation content, or credentials.
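The archive naming convention maps directly onto `date` format strings. A sketch of the assumed naming and packing logic (not the script itself), packing a throwaway directory as a stand-in for the collected sections:

```shell
# support-bundle-YYYY-MM-DD-HHMMSS.tar.gz, as described above
stamp=$(date +%F-%H%M%S)
bundle="support-bundle-${stamp}.tar.gz"

# Throwaway directory standing in for the collected sections
workdir=$(mktemp -d)
echo "example sanitization notice" > "$workdir/GDPR_NOTICE.txt"
tar -czf "$bundle" -C "$workdir" .
echo "created $bundle"
```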
## Database Backups
**Docker Compose**

```bash
# Manual backup
./bin/backup-db.sh

# Automated daily backup (add this line to crontab)
0 2 * * * /path/to/reva/bin/backup-db.sh
```
Backups are saved as gzipped SQL dumps to backups/reva_YYYY-MM-DD_HHMMSS.sql.gz with 30-day automatic retention.
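The 30-day retention presumably comes down to a `find ... -mtime +30 -delete` over the dump directory. A self-contained sketch of that pruning step (the real backup-db.sh may differ; GNU `touch -d` is used here to simulate an old dump):

```shell
backup_dir=$(mktemp -d)

# Simulate one expired and one fresh dump
touch -d '40 days ago' "$backup_dir/reva_2024-01-01_020000.sql.gz"
touch "$backup_dir/reva_2024-02-10_020000.sql.gz"

# Prune dumps older than 30 days
find "$backup_dir" -name 'reva_*.sql.gz' -mtime +30 -delete

ls "$backup_dir"
```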
**Kubernetes**

```bash
# Automated: CronJob runs daily at 02:00 UTC
kubectl get cronjob db-backup -n reva

# Manual backup
kubectl create job --from=cronjob/db-backup db-backup-manual -n reva

# Check backup logs
kubectl logs -n reva job/db-backup-manual
```
Backups are stored on the persistent volume. For off-cluster backup, mount an additional PVC or configure an S3 upload in the CronJob.
Backups use the postgres superuser account (not the restricted reva app user) to ensure all schemas and permissions are captured.
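For the S3 route, one option is an extra container in the CronJob pod that syncs the dump directory after the backup finishes. A hypothetical fragment; the image tag, bucket name, Secret, and volume names are placeholders, not part of the shipped manifests:

```yaml
# Hypothetical second container in the db-backup CronJob pod spec
- name: s3-upload
  image: amazon/aws-cli:2.15.0         # placeholder tag
  command: ["sh", "-c",
            "aws s3 cp /backups s3://your-backup-bucket/reva/ --recursive"]
  envFrom:
    - secretRef:
        name: aws-credentials          # placeholder Secret holding AWS keys
  volumeMounts:
    - name: backup-volume              # the CronJob's existing backup PVC
      mountPath: /backups
```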
## Log Management
**Docker Compose**

```bash
# Follow Reva logs
docker compose logs -f reva

# Search for errors and warnings
docker compose logs reva | grep -iE "error|warning|critical"

# Filter the combined stream for MCP server activity
docker compose logs -f reva 2>&1 | grep -i mcp
```
### Log Rotation
Docker log rotation is pre-configured in docker-compose.yml (json-file driver):
| Service | Max Size | Max Files | Total |
|---|---|---|---|
| reva | 50 MB | 5 | 250 MB |
| postgres | 20 MB | 3 | 60 MB |
| redis | 10 MB | 3 | 30 MB |
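The limits in the table correspond to per-service logging blocks in docker-compose.yml. A sketch of what the reva entry is assumed to look like; check the shipped compose file for the authoritative values:

```yaml
services:
  reva:
    logging:
      driver: json-file
      options:
        max-size: "50m"   # per log file
        max-file: "5"     # 5 files x 50 MB = 250 MB cap
```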
**Kubernetes**

```bash
# Reva application
kubectl logs -n reva -l app=reva -c reva -f

# MCP sidecars
kubectl logs -n reva -l app=reva -c release-mcp -f
kubectl logs -n reva -l app=reva -c jira-mcp -f

# Previous pod logs (after a crash)
kubectl logs -n reva -l app=reva -c reva --previous

# Search for errors and warnings
kubectl logs -n reva -l app=reva -c reva | grep -iE "error|warning"
```
Kubernetes manages log rotation through the container runtime. For long-term log retention, configure a log aggregator (Loki, Elasticsearch, etc.).
## Updates & Rollback
### Update (Docker Compose)

```bash
# 1. Update the version in .env
REVA_VERSION=1.0.5

# 2. Pull the new image and restart
docker compose pull reva
docker compose up -d reva
```
### Rollback (Docker Compose)

```bash
# Revert REVA_VERSION in .env to the previous version
REVA_VERSION=1.0.4
docker compose up -d reva
```
### Update (Kubernetes)

```bash
# Build the new image and import it into k3s
docker build -t reva:latest .
docker save reva:latest | sudo k3s ctr images import -

# Restart the deployment
kubectl rollout restart deployment/reva -n reva

# Watch the rollout
kubectl rollout status deployment/reva -n reva
```
### Rollback (Kubernetes)

```bash
# Roll back to the previous revision
kubectl rollout undo deployment/reva -n reva

# Check rollout history
kubectl rollout history deployment/reva -n reva
```
**Before updating:** Always create a database backup first. The application runs database migrations automatically at startup, and some migrations may not be reversible.