Category: Coding · Level: Intermediate · Model: Any

Blameless Incident Postmortem Generator

Turn incident notes into a structured, blameless postmortem with timeline, root cause chain, and concrete action items.

incident-response · postmortem · reliability · blameless · sre

What it does

Takes your raw incident notes — Slack messages, on-call logs, timestamps, whatever you have — and produces a structured postmortem document. The key differentiator: it enforces blameless language, separates contributing factors from the root cause, and generates action items that are specific enough to actually track.
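As a sketch, wiring this into a script is mostly template assembly. The helper and field names below are illustrative, not part of the prompt itself; the template is abridged to the input fields.

```python
# Minimal sketch: assemble the postmortem prompt from raw incident data.
# Function and field names are illustrative; only the input sections are shown.

PROMPT_TEMPLATE = """Generate a blameless incident postmortem from the following incident notes.

Incident notes:
{notes}

Service/system affected:
{services}

Impact duration:
{duration}

User impact:
{user_impact}
"""

def build_postmortem_prompt(notes: str, services: str, duration: str, user_impact: str) -> str:
    """Fill the template. Raw, messy notes are fine as-is."""
    return PROMPT_TEMPLATE.format(
        notes=notes.strip(),
        services=services,
        duration=duration,
        user_impact=user_impact,
    )

prompt = build_postmortem_prompt(
    notes="14:02 alert fired\n14:05 on-call paged\n14:20 rollback started",
    services="checkout-api",
    duration="14:00 -> 14:02 -> 14:20 -> 14:45 UTC",
    user_impact="5xx errors on checkout, ~12% of requests",
)
```

The full prompt (structure, rules, and output sections) would be appended after the input fields; keeping it as one template string makes it easy to version alongside your runbooks.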

The Prompt

Generate a blameless incident postmortem from the following incident notes.

Incident notes:
[PASTE RAW NOTES — Slack messages, timeline entries, on-call logs, whatever you have. Messy is fine.]

Service/system affected:
[WHICH SYSTEMS WERE IMPACTED]

Impact duration:
[START TIME → DETECTION TIME → MITIGATION TIME → RESOLUTION TIME, or best estimates]

User impact:
[WHAT USERS EXPERIENCED — errors, latency, data issues, complete outage]

Structure the postmortem as follows:

## Summary
One paragraph: what happened, how long, what was impacted. Written so someone outside the team understands it.

## Timeline
Chronological table: Time | Event | Actor/System
Start from the triggering event (not from when on-call was paged). Include automated system responses (alerts firing, auto-scaling, circuit breakers) alongside human actions.

## Root Cause Chain
Do NOT write a single root cause. Write a CHAIN of contributing factors:
- Triggering event: The specific change or condition that initiated the incident
- Enabling condition: What made the system vulnerable to this trigger (missing guard, untested path, stale config)
- Propagation factor: Why the impact spread beyond the initial failure point
- Detection gap: Why it took [X minutes] to detect (if detection was slow)

## What Went Well
At least 2 things. Detection speed, team response, containment effectiveness, communication quality. Be specific.

## What Went Poorly
At least 2 things. Detection gaps, missing runbooks, unclear ownership, tooling gaps. Be specific.

## Action Items
For each action item:
- Action: Specific, completable task (not "improve monitoring")
- Type: PREVENT (stop recurrence) / DETECT (catch it faster) / MITIGATE (reduce blast radius)
- Owner: [TO BE ASSIGNED]
- Priority: P0 (this week) / P1 (this sprint) / P2 (this quarter)

Rules:
- NEVER use phrases like "X failed to" or "Y should have." Describe what happened, not who failed.
- Write action items as engineering tasks, not behavioral changes ("add circuit breaker on service X" not "be more careful with deployments").
- If something is unclear from the notes, flag it as "[UNCLEAR — verify with team]" rather than guessing.

Usage Notes

  • Feed in raw, messy notes. The prompt is designed to handle unstructured input — you don’t need to clean it up first.
  • The root cause chain format is more useful than a single root cause. Incidents almost never have one cause; they have a trigger that hits an enabling condition.
  • The “never use ‘failed to’” rule produces genuinely blameless documents. Without it, AI defaults to finger-pointing language.
  • Action items typed as PREVENT/DETECT/MITIGATE help you balance your reliability investment. If all your action items are PREVENT, you’re neglecting detection and mitigation.
  • Run the output past the incident responders before publishing. The model structures the document well but may misinterpret sequence or causality in ambiguous notes.
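If you publish postmortems at any volume, the blameless-language rule and the PREVENT/DETECT/MITIGATE balance are both easy to lint automatically before review. A minimal sketch, assuming plain-text output; the pattern list and function names are hypothetical, not part of the prompt:

```python
import re
from collections import Counter

# Illustrative blame-style phrases to flag; extend for your team's vocabulary.
BLAME_PATTERNS = [
    r"\bfailed to\b",
    r"\bshould have\b",
    r"\bneglected to\b",
    r"\bforgot to\b",
]

def find_blame_language(text: str) -> list[str]:
    """Return the lines of a postmortem that contain blame-style phrasing."""
    flagged = []
    for line in text.splitlines():
        if any(re.search(p, line, re.IGNORECASE) for p in BLAME_PATTERNS):
            flagged.append(line)
    return flagged

def action_item_balance(text: str) -> Counter:
    """Count PREVENT/DETECT/MITIGATE labels to spot lopsided action items."""
    return Counter(re.findall(r"\b(PREVENT|DETECT|MITIGATE)\b", text))

doc = (
    "- Action: add circuit breaker on checkout-api\n"
    "- Type: PREVENT\n"
    "The deploy failed to roll back automatically."
)
print(find_blame_language(doc))  # flags the "failed to" line
print(action_item_balance(doc))  # all counts are PREVENT: a detection/mitigation gap
```

A lint like this catches regressions when someone edits the prompt or the model drifts; it does not replace the human review pass above.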