Category: Coding · Level: Intermediate · Model: Any

Blameless Incident Postmortem Generator

Turn incident notes into a structured, blameless postmortem with timeline, root cause chain, and concrete action items.

incident-response · postmortem · reliability · blameless · sre

What it does

Takes your raw incident notes — Slack messages, on-call logs, timestamps, whatever you have — and produces a structured postmortem document. The key differentiator: it enforces blameless language, separates contributing factors from the root cause, and generates action items that are specific enough to actually track.
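As a sketch, wiring this into a script is mostly template assembly. The helper and field names below are illustrative, not part of the prompt itself; the template is abridged to the input fields.

```python
# Minimal sketch: assemble the postmortem prompt from raw incident data.
# Function and field names are illustrative; only the input sections are shown.

PROMPT_TEMPLATE = """Generate a blameless incident postmortem from the following incident notes.

Incident notes:
{notes}

Service/system affected:
{services}

Impact duration:
{duration}

User impact:
{user_impact}
"""

def build_postmortem_prompt(notes: str, services: str, duration: str, user_impact: str) -> str:
    """Fill the template. Raw, messy notes are fine as-is."""
    return PROMPT_TEMPLATE.format(
        notes=notes.strip(),
        services=services,
        duration=duration,
        user_impact=user_impact,
    )

prompt = build_postmortem_prompt(
    notes="14:02 alert fired\n14:05 on-call paged\n14:20 rollback started",
    services="checkout-api",
    duration="14:00 -> 14:02 -> 14:20 -> 14:45 UTC",
    user_impact="5xx errors on checkout, ~12% of requests",
)
```

The full prompt (structure, rules, and output sections) would be appended after the input fields; keeping it as one template string makes it easy to version alongside your runbooks.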

The Prompt

Generate a blameless incident postmortem from the following incident notes.

Incident notes:
[PASTE RAW NOTES — Slack messages, timeline entries, on-call logs, whatever you have. Messy is fine.]

Service/system affected:
[WHICH SYSTEMS WERE IMPACTED]

Impact duration:
[START TIME → DETECTION TIME → MITIGATION TIME → RESOLUTION TIME, or best estimates]

User impact:
[WHAT USERS EXPERIENCED — errors, latency, data issues, complete outage]

Structure the postmortem as follows:

## Summary
One paragraph: what happened, how long, what was impacted. Written so someone outside the team understands it.

## Timeline
Chronological table: Time | Event | Actor/System
Start from the triggering event (not from when on-call was paged). Include automated system responses (alerts firing, auto-scaling, circuit breakers) alongside human actions.

## Root Cause Chain
Do NOT write a single root cause. Write a CHAIN of contributing factors:
- Triggering event: The specific change or condition that initiated the incident
- Enabling condition: What made the system vulnerable to this trigger (missing guard, untested path, stale config)
- Propagation factor: Why the impact spread beyond the initial failure point
- Detection gap: Why it took [X minutes] to detect (if detection was slow)

## What Went Well
At least 2 things. Detection speed, team response, containment effectiveness, communication quality. Be specific.

## What Went Poorly
At least 2 things. Detection gaps, missing runbooks, unclear ownership, tooling gaps. Be specific.

## Action Items
For each action item:
- Action: Specific, completable task (not "improve monitoring")
- Type: PREVENT (stop recurrence) / DETECT (catch it faster) / MITIGATE (reduce blast radius)
- Owner: [TO BE ASSIGNED]
- Priority: P0 (this week) / P1 (this sprint) / P2 (this quarter)

Rules:
- NEVER use phrases like "X failed to" or "Y should have." Describe what happened, not who failed.
- Write action items as engineering tasks, not behavioral changes ("add circuit breaker on service X" not "be more careful with deployments").
- If something is unclear from the notes, flag it as "[UNCLEAR — verify with team]" rather than guessing.

Usage Notes

  • Feed in raw, messy notes. The prompt is designed to handle unstructured input — you don’t need to clean it up first.
  • The root cause chain format is more useful than a single root cause. Incidents almost never have one cause; they have a trigger that hits an enabling condition.
  • The “never use ‘failed to’” rule produces genuinely blameless documents. Without it, AI defaults to finger-pointing language.
  • Action items typed as PREVENT/DETECT/MITIGATE help you balance your reliability investment. If all your action items are PREVENT, you’re neglecting detection and mitigation.
  • Run the output past the incident responders before publishing. The model structures the document well but may misinterpret sequence or causality in ambiguous notes.
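If you publish postmortems at any volume, the blameless-language rule and the PREVENT/DETECT/MITIGATE balance are both easy to lint automatically before review. A minimal sketch, assuming plain-text output; the pattern list and function names are hypothetical, not part of the prompt:

```python
import re
from collections import Counter

# Illustrative blame-style phrases to flag; extend for your team's vocabulary.
BLAME_PATTERNS = [
    r"\bfailed to\b",
    r"\bshould have\b",
    r"\bneglected to\b",
    r"\bforgot to\b",
]

def find_blame_language(text: str) -> list[str]:
    """Return the lines of a postmortem that contain blame-style phrasing."""
    flagged = []
    for line in text.splitlines():
        if any(re.search(p, line, re.IGNORECASE) for p in BLAME_PATTERNS):
            flagged.append(line)
    return flagged

def action_item_balance(text: str) -> Counter:
    """Count PREVENT/DETECT/MITIGATE labels to spot lopsided action items."""
    return Counter(re.findall(r"\b(PREVENT|DETECT|MITIGATE)\b", text))

doc = (
    "- Action: add circuit breaker on checkout-api\n"
    "- Type: PREVENT\n"
    "The deploy failed to roll back automatically."
)
print(find_blame_language(doc))  # flags the "failed to" line
print(action_item_balance(doc))  # all counts are PREVENT: a detection/mitigation gap
```

A lint like this catches regressions when someone edits the prompt or the model drifts; it does not replace the human review pass above.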