Team Dashboard
production-sre
Watch OpenSRE Investigate a Real Incident
Pick a production scenario below. OpenSRE dispatches four subagents in parallel to analyze Kubernetes, logs, metrics, and traces, then synthesizes a root-cause report.
Payment Service Failure
payment-service returning 500s on /api/v1/charge endpoint with 45% error rate. Customers unable to complete purchases.
Auth Gateway Latency Spike
auth-gateway p99 latency jumped from 120ms to 2.8s. Downstream services experiencing timeouts.
Order Processing Cascade
order-processing queue depth growing without bound, with a consumer lag of 45k messages. Orders not being fulfilled.
Each investigation takes ~60 seconds. You can skip to the end at any time.
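For readers curious how the parallel dispatch described above might look, here is a minimal sketch of a fan-out and gather pattern using Python's asyncio. It is not OpenSRE's actual implementation: the names `SUBAGENTS`, `run_subagent`, and `investigate`, and the shape of the report, are hypothetical, and each subagent is a stub that returns a placeholder instead of querying a real data source.

```python
import asyncio

# Hypothetical subagent names, taken from the four domains listed above.
SUBAGENTS = ("kubernetes", "logs", "metrics", "traces")


async def run_subagent(name: str, incident: str) -> dict:
    """Stub subagent (assumption): a real one would query its data source."""
    await asyncio.sleep(0)  # stand-in for network or API I/O
    return {"agent": name, "finding": f"placeholder finding from {name}"}


async def investigate(incident: str) -> dict:
    """Run all subagents concurrently and merge their results into one report."""
    results = await asyncio.gather(
        *(run_subagent(name, incident) for name in SUBAGENTS),
        return_exceptions=True,  # one failing subagent does not cancel the rest
    )
    findings, failures = [], []
    for name, result in zip(SUBAGENTS, results):
        if isinstance(result, BaseException):
            failures.append({"agent": name, "error": str(result)})
        else:
            findings.append(result)
    # A real synthesis step would reason over the findings; here they are only collected.
    return {"incident": incident, "findings": findings, "failures": failures}


report = asyncio.run(investigate("payment-service 500s on /api/v1/charge"))
```

Collecting exceptions rather than raising them is a design choice in this sketch: if one data source is unreachable, the report can still be built from the remaining subagents and the failure recorded alongside them.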
Team Overview
Recent Activity
Investigation completed: Redis connection timeout in cache-service
Investigation completed: Disk pressure on monitoring-worker-3
Investigation failed: Unable to reach metrics endpoint
Agent topology updated: Added traces subagent
Investigation completed: API gateway 504s on /api/orders
Knowledge base updated: Added runbook for database failover
Investigation completed: Certificate expiry warning on ingress-controller
Investigation completed: Memory leak in notification-service
