Incident Response Demo
Alert → incident → timeline → root cause → postmortem.
Full incident lifecycle with AI-assisted root cause and postmortem generation.
data flow
scenario architecture
alert ──promote──▶ incident
│
timeline (deploys + configs + logs joined on correlationId)
│
root-cause hypothesis (Incident Engineer)
│
▼
two-person approval ──tip──▶ mitigation
│
postmortem auto-drafted
Alert becomes incident
Auto-promotion when severity + duration thresholds cross.
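A minimal sketch of what such a promotion rule might look like. The `Alert` and `PromotionPolicy` shapes, the severity ordering, and the requirement that both thresholds hold at once are assumptions made for illustration, not the platform's actual schema or policy.

```typescript
// Hypothetical shapes: field names are illustrative assumptions.
type Severity = "low" | "medium" | "high" | "critical";

interface Alert {
  metric: string;
  severity: Severity;
  firingForSeconds: number; // how long the alert condition has held
}

interface PromotionPolicy {
  minSeverity: Severity;
  minDurationSeconds: number;
}

const SEVERITY_RANK: Record<Severity, number> = {
  low: 0,
  medium: 1,
  high: 2,
  critical: 3,
};

// Assumption: an alert is promoted only when BOTH thresholds are met.
function shouldPromote(alert: Alert, policy: PromotionPolicy): boolean {
  return (
    SEVERITY_RANK[alert.severity] >= SEVERITY_RANK[policy.minSeverity] &&
    alert.firingForSeconds >= policy.minDurationSeconds
  );
}

// Sample values modeled on the demo alert (cpu_p95 > 92% for 5m, severity=high).
const policy: PromotionPolicy = { minSeverity: "high", minDurationSeconds: 300 };
const cpuAlert: Alert = { metric: "cpu_p95", severity: "high", firingForSeconds: 300 };

console.log(shouldPromote(cpuAlert, policy));
```

Keeping the rule a pure function of the alert and the policy makes the promotion decision easy to replay against historical alerts.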
incident timeline · corr_ck98zxa9
- T-0 · Alert fired · cpu_p95 > 92% for 5m
- T+12s · Promoted to incident · severity=high
- T+22s · Recent deploys queried · 3 in last 30m
- T+34s · Deploy correlation hit · sha 4a2b8c (12m ago)
- T+58s · Trace surge identified · /orders @ 4.1s p95
- T+1m · Root-cause hypothesis · missing index on user_id
- T+2m · Mitigation proposed · awaiting approval
Timelines are stitched, not narrated
The incident timeline is built from real platform events (deploys, audit rows, config changes, alert ingestions) joined on correlationId. The AI proposes a root-cause hypothesis that cites the underlying rows.
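The join described above can be sketched roughly as follows. `PlatformEvent`, `Hypothesis`, `stitchTimeline`, and `citationsResolve` are hypothetical names introduced here; the only details taken from the page are the event kinds and the correlationId join key.

```typescript
// Hypothetical event shape; `source` mirrors the event kinds named above.
interface PlatformEvent {
  id: string;
  source: "deploy" | "audit" | "config" | "alert";
  correlationId: string;
  at: number; // milliseconds relative to the alert firing
  summary: string;
}

interface Hypothesis {
  statement: string;
  citedEventIds: string[];
}

// Keep only events that share the incident's correlationId, oldest first.
function stitchTimeline(events: PlatformEvent[], correlationId: string): PlatformEvent[] {
  return events
    .filter((e) => e.correlationId === correlationId)
    .sort((a, b) => a.at - b.at);
}

// Assumption: a hypothesis is usable only if it cites at least one row
// and every cited id resolves to a row in the stitched timeline.
function citationsResolve(h: Hypothesis, timeline: PlatformEvent[]): boolean {
  const ids = new Set(timeline.map((e) => e.id));
  return h.citedEventIds.length > 0 && h.citedEventIds.every((id) => ids.has(id));
}

// Sample rows loosely modeled on the demo timeline.
const events: PlatformEvent[] = [
  { id: "e3", source: "alert", correlationId: "corr_ck98zxa9", at: 0, summary: "cpu_p95 > 92% for 5m" },
  { id: "e1", source: "deploy", correlationId: "corr_ck98zxa9", at: -720_000, summary: "deploy sha 4a2b8c" },
  { id: "e2", source: "config", correlationId: "corr_other", at: -100_000, summary: "unrelated config change" },
];

const timeline = stitchTimeline(events, "corr_ck98zxa9");
const hypothesis: Hypothesis = {
  statement: "missing index on user_id",
  citedEventIds: ["e1"],
};

console.log(timeline.map((e) => e.id), citationsResolve(hypothesis, timeline));
```

Checking citations against the stitched rows is one way to keep a generated hypothesis tied to recorded events rather than to narration.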
safety invariants in play
- ✓ Blast radius capped: each change is risk-tiered against the number of resources it touches; high tiers force extra approvers.
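One way the blast-radius invariant might be expressed in code. The tier thresholds (5 and 50 resources), the approver counts, and the rule that a proposer cannot approve their own change are all illustrative assumptions; only the idea of tiering by resources touched and the two-person approval baseline come from the page.

```typescript
type RiskTier = "low" | "medium" | "high";

// Thresholds are illustrative assumptions, not documented values.
function riskTier(resourcesTouched: number): RiskTier {
  if (resourcesTouched <= 5) return "low";
  if (resourcesTouched <= 50) return "medium";
  return "high";
}

// Assumption: two approvers is the baseline (the two-person approval step),
// and the high tier adds one more.
function requiredApprovers(tier: RiskTier): number {
  return tier === "high" ? 3 : 2;
}

// A mitigation may proceed only with enough distinct approvers,
// none of whom is the proposer (assumption).
function canApply(resourcesTouched: number, approverIds: string[], proposerId: string): boolean {
  const distinct = new Set(approverIds.filter((id) => id !== proposerId));
  return distinct.size >= requiredApprovers(riskTier(resourcesTouched));
}

console.log(canApply(3, ["alice", "bob"], "carol"));
console.log(canApply(200, ["alice", "bob"], "carol"));
```

Deduplicating approver ids before counting means a repeated approval from the same person cannot satisfy the gate.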
expected result
Incident page opens with metadata.
engineering principle
Incidents persist their timeline and root-cause hypothesis as Prisma rows, not in-memory state. A process restart never loses a postmortem in flight. Approval-gated mitigations show up in the same approvals queue as everything else.
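The write-through idea can be sketched without committing to a schema. `IncidentRow`, `IncidentStore`, `MapStore`, and `IncidentService` are hypothetical; the `Map` below only stands in for a database table so the sketch runs on its own, and a Prisma-backed store would presumably fill the same role.

```typescript
// Hypothetical row shape; the real Prisma model is not shown on the page.
interface IncidentRow {
  id: string;
  correlationId: string;
  hypothesis: string | null;
  postmortemDraft: string | null;
}

// Minimal persistence port (an assumption made for this sketch).
interface IncidentStore {
  upsert(row: IncidentRow): void;
  find(id: string): IncidentRow | undefined;
}

// Stand-in for a durable table: the Map outlives any one service instance.
class MapStore implements IncidentStore {
  private readonly table: Map<string, IncidentRow>;
  constructor(table: Map<string, IncidentRow>) {
    this.table = table;
  }
  upsert(row: IncidentRow): void {
    this.table.set(row.id, { ...row });
  }
  find(id: string): IncidentRow | undefined {
    const row = this.table.get(id);
    return row ? { ...row } : undefined;
  }
}

// The service keeps no incident state of its own; every step is written through.
class IncidentService {
  private readonly store: IncidentStore;
  constructor(store: IncidentStore) {
    this.store = store;
  }
  recordHypothesis(id: string, correlationId: string, hypothesis: string): void {
    const existing = this.store.find(id);
    this.store.upsert({
      id,
      correlationId,
      hypothesis,
      postmortemDraft: existing?.postmortemDraft ?? null,
    });
  }
  resume(id: string): IncidentRow | undefined {
    return this.store.find(id);
  }
}

// Simulated restart: a fresh service over the same table still sees the hypothesis.
const table = new Map<string, IncidentRow>();
new IncidentService(new MapStore(table)).recordHypothesis(
  "inc_1",
  "corr_ck98zxa9",
  "missing index on user_id",
);
const afterRestart = new IncidentService(new MapStore(table)).resume("inc_1");

console.log(afterRestart?.hypothesis);
```

Because the service object carries no incident fields, discarding and recreating it loses nothing that was already written to the store.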