Date:
Incident summary
Summarize the incident in a few sentences. Briefly describe who discovered the incident, what happened, how severe it was, and how long the impact lasted.
Breakdown
Timeline: This makes up most of the post-mortem. List, in chronological order, every significant change in incident status or customer impact, along with any major actions taken by responders, engineers, or subject matter experts. For each entry, include a supporting data source or metric.
Analysis: A concise explanation of what happened and why. Capture the underlying cause of the incident, how many customers were affected, and the overall impact on those customers (e.g., which functionality was degraded or unavailable).
Action items: List the actions identified and taken during the incident, along with any necessary follow-up tasks. Record them in the post-mortem so they can be assigned later.
External messaging: If this was a major incident, draft the external message to customers, recapping the relevant details above.
Stakeholder recap
Alignment on the timeline. Briefly review the timeline and make sure everyone agrees on it.
Discussion of how the problem could have been caught earlier. Capture any new action items along the way.
Discussion of customer impact and the external messaging, if needed.
Review and assignment of action items, each with an ETA.