Incident Commander Role

Serve as the incident commander during an active production incident.

The incident commander is the role that holds the incident together. Engineers investigate root cause; the IC coordinates the investigation, decides what to communicate to whom, and keeps the team's energy directed. Without an IC, every engineer in the channel is simultaneously coordinating, investigating, and communicating, and all three suffer.

The IC role is a discipline. The person playing it is not necessarily the most technical engineer in the room. The IC is the engineer who can hold the situation in their head, decide who is doing what, and keep the response coherent.

When to Name an IC

Name an incident commander when:

- The incident is SEV-1 or SEV-2 (always)
- The incident has more than three engineers active
- The incident has crossed the 30-minute mark with no clear path to resolution
- Stakeholders outside engineering are starting to ask questions

For SEV-3 and below, the on-call engineer typically plays both IC and investigator. For SEV-1 and SEV-2, separating the roles is what allows the response to scale.
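The triggers above amount to a simple rule: any one of them is enough. The sketch below encodes that rule in Python purely as an illustration. The `IncidentSnapshot` fields and the `should_name_ic` helper are hypothetical names, not part of any particular incident tool, and reading "crossed the 30-minute mark" as `>= 30` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class IncidentSnapshot:
    # Hypothetical fields; map them to whatever your incident tooling records.
    severity: int                      # 1 = SEV-1, 2 = SEV-2, 3 = SEV-3, ...
    active_engineers: int
    minutes_elapsed: int
    clear_path_to_resolution: bool
    outside_stakeholders_asking: bool


def should_name_ic(s: IncidentSnapshot) -> bool:
    """True if any trigger for naming a dedicated incident commander applies."""
    if s.severity <= 2:                # SEV-1 and SEV-2: always
        return True
    if s.active_engineers > 3:         # more than three engineers active
        return True
    if s.minutes_elapsed >= 30 and not s.clear_path_to_resolution:
        return True                    # at or past 30 minutes, no clear path
    return s.outside_stakeholders_asking  # non-engineering stakeholders asking


if __name__ == "__main__":
    quiet_sev3 = IncidentSnapshot(severity=3, active_engineers=2, minutes_elapsed=10,
                                  clear_path_to_resolution=True,
                                  outside_stakeholders_asking=False)
    print(should_name_ic(quiet_sev3))  # False: on-call plays both IC and investigator
```

The severity check comes first because it applies regardless of team size, elapsed time, or who is asking.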

The IC's Job

The IC has three responsibilities. Keep them separate from investigation: an IC who also tries to investigate root cause will drop one of them.

1. Coordinate the Response

The IC tracks who is doing what. If the team is splitting into investigation streams (one engineer on the database, one on the application, one on the deploy pipeline), the IC writes those assignments down — in the incident channel, in a pinned message, in a shared doc — and keeps the assignments current.

The IC also makes the calls about what to try next. "We've ruled out a deploy. Let's check the database. Alice, take the connection pool. Bob, take query times. Report back in 10 minutes." The IC is making decisions; the engineers are executing.
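One way to keep assignments current is to treat them as a small table keyed by engineer, with a report-back deadline for each stream. The sketch below is a hypothetical in-memory version; `AssignmentBoard` and its methods are illustrative names, and the 10-minute default simply mirrors the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Assignment:
    engineer: str
    stream: str        # investigation stream, e.g. "connection pool"
    due: datetime      # when the engineer is expected to report back


class AssignmentBoard:
    """Hypothetical in-memory record the IC keeps current (e.g. mirrored to a pinned message)."""

    def __init__(self) -> None:
        self._by_engineer: dict[str, Assignment] = {}

    def assign(self, engineer: str, stream: str, now: datetime,
               report_back_min: int = 10) -> None:
        # Keyed by engineer: reassigning someone replaces their old stream.
        self._by_engineer[engineer] = Assignment(
            engineer, stream, now + timedelta(minutes=report_back_min))

    def overdue(self, now: datetime) -> list[str]:
        """Engineers whose report-back time has passed, so the IC can follow up."""
        return sorted(a.engineer for a in self._by_engineer.values() if now > a.due)

    def render(self) -> str:
        """Plain-text list to paste into the incident channel."""
        rows = sorted(self._by_engineer.values(), key=lambda a: a.engineer)
        return "Current assignments:\n" + "\n".join(
            f"- {a.engineer}: {a.stream} (report by {a.due:%H:%M})" for a in rows)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 3, 0)
    board = AssignmentBoard()
    board.assign("Alice", "connection pool", t0)
    board.assign("Bob", "query times", t0)
    print(board.render())
```

Keying by engineer means a reassignment overwrites the old stream instead of leaving a stale entry behind, which is the point of keeping the list current.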

2. Track Status

The IC keeps a running summary of the incident's state. Every 15 minutes, the IC posts a status update to the incident channel:

- What we know
- What we're investigating
- What we've tried that didn't work
- What we're going to try next
- Customer impact (current)

This status update has multiple audiences: the engineers who joined the incident late, the leaders who are watching, the support team waiting to update customers. The status update is a courtesy to all of them.
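The five-part update lends itself to a fixed template, so every update has the same shape and late joiners know where to look. A minimal sketch, assuming the IC fills in the fields by hand; `StatusUpdate`, `next_update_due`, and the `(none)` placeholder are illustrative choices, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

STATUS_INTERVAL = timedelta(minutes=15)  # cadence from the text


@dataclass
class StatusUpdate:
    known: list[str]
    investigating: list[str]
    tried_failed: list[str]
    next_steps: list[str]
    customer_impact: str

    def render(self, at: datetime) -> str:
        def section(title: str, items: list[str]) -> str:
            # An empty section is shown explicitly rather than dropped.
            body = "\n".join(f"  - {item}" for item in items) or "  - (none)"
            return f"{title}:\n{body}"

        return "\n".join([
            f"Status update {at:%H:%M}",
            section("What we know", self.known),
            section("What we're investigating", self.investigating),
            section("What we've tried that didn't work", self.tried_failed),
            section("What we're going to try next", self.next_steps),
            f"Customer impact (current): {self.customer_impact}",
        ])


def next_update_due(last_posted: datetime) -> datetime:
    """When the next update should be posted."""
    return last_posted + STATUS_INTERVAL
```

Rendering an empty section as `(none)` instead of omitting it tells readers the IC considered the question, which matters most for what has already been tried.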

3. Communicate with Stakeholders

The IC is the single voice from the incident to the rest of the company. Engineering leadership wants updates; the IC provides them. Support wants to know what to tell customers; the IC tells them. Marketing wants to know if the status page should be updated; the IC decides.

This single-voice principle prevents the engineers in the channel from being interrupted by leadership questions every five minutes. The IC handles the questions; the engineers stay focused on root cause.

The communication is calibrated. To engineering leadership: "we're investigating, no ETA, will update in 30 minutes." To support: "the symptoms users are seeing are [X]; please tell them we're aware and working on it." To the status page: "Investigating: payment provider degraded for some users." Each audience gets the communication appropriate to them.
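Calibration can be made partly mechanical with one template per audience, all filled from the same set of facts the IC is tracking. The audience keys and wording below are hypothetical, adapted from the three examples above; this is a sketch of the idea, not a recommended message catalog.

```python
# Hypothetical audience keys and wording, adapted from the examples in the text.
TEMPLATES = {
    "leadership": "We're investigating, {eta}, will update in {next_update_min} minutes.",
    "support": ("The symptoms users are seeing are {symptoms}; "
                "please tell them we're aware and working on it."),
    "status_page": "Investigating: {public_summary}.",
}


def message_for(audience: str, **facts: object) -> str:
    """Render the message for one audience from the IC's current facts."""
    if audience not in TEMPLATES:
        raise ValueError(f"no template for audience {audience!r}")
    # str.format raises KeyError if a required fact is missing.
    return TEMPLATES[audience].format(**facts)


if __name__ == "__main__":
    print(message_for("leadership", eta="no ETA", next_update_min=30))
    print(message_for("status_page",
                      public_summary="payment provider degraded for some users"))
```

Because `str.format` raises `KeyError` when a placeholder has no value, a message cannot go out with a required fact silently left blank.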

What the IC Doesn't Do

The IC does not investigate root cause. The temptation is strong, especially if the IC is technical and has ideas. Resist. The moment the IC starts hypothesizing about the database, they have stopped being the IC.

If the IC has technical ideas, they tell the engineers to consider them. They do not pursue them personally.

The IC also does not type into the production system. They do not run commands. They do not deploy. The IC's job is to coordinate; the moment they take action they are an investigator and someone else needs to be the IC.

The Handoff

Long incidents need IC handoffs. After two to three hours, the IC's judgment degrades. They have been holding too much state for too long. Hand off to a fresh IC.

The handoff is a structured artifact:

- Current state of the investigation
- What's been ruled out
- What's being tried now
- Outstanding decisions the new IC will need to make
- Communication threads that are open with stakeholders

Take 10 minutes for the handoff. The new IC reads the running summary, asks questions, and confirms they have the picture before the old IC drops off. Bad handoffs lose state; the new IC re-investigates things the old team already ruled out, and progress regresses.
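The handoff artifact can be kept as a structured record so that no section is skipped under fatigue. In this sketch `HandoffBrief` and `handoff_due` are hypothetical names, and the 2.5-hour threshold is an assumed point inside the two-to-three-hour window described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed threshold: a point inside the two-to-three-hour window in the text.
HANDOFF_AFTER = timedelta(hours=2, minutes=30)


@dataclass
class HandoffBrief:
    current_state: str
    ruled_out: list[str]
    in_progress: list[str]
    open_decisions: list[str]
    open_stakeholder_threads: list[str]

    def missing_sections(self) -> list[str]:
        """Sections left empty; write "none" explicitly if that is the answer."""
        return [name for name, value in vars(self).items() if not value]


def handoff_due(ic_started: datetime, now: datetime) -> bool:
    """True once the current IC has held the role past the threshold."""
    return now - ic_started >= HANDOFF_AFTER
```

Treating an empty section as missing forces the outgoing IC to write "none" when nothing is outstanding, rather than leaving the new IC to guess whether it was forgotten.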

The Decision Authority

The IC is the decision-maker during the incident. Engineers debating whether to roll back, restart a service, page another team, or update the status page bring the decision to the IC. The IC decides.

This authority is granted explicitly when the IC role is named. "Carol is IC; her decisions stand." Without explicit authority, decisions get debated by committee in the channel and the response slows.

The IC's decisions can be appealed to the on-call manager or to engineering leadership, but only after the incident, in the postmortem. During the incident, the IC's call is final.

The Calm Voice

The IC's voice is calm in writing and in audio. Not artificially calm; not so calm it suggests the IC doesn't grasp the severity. But not panicked. The team mirrors the IC's tone; if the IC is anxious, the team becomes anxious; if the IC is collected, the team is collected.

The calm voice is harder during a SEV-1 at 03:00 than at 14:00. Practice it. An IC who has been in this seat before is more valuable than one who knows the system better but has never run an incident.

The IC's Notes

The IC keeps detailed notes during the incident. Timestamps, decisions, what was tried, what people said. The notes feed directly into the postmortem timeline. Without good IC notes, the postmortem author has to reconstruct from chat logs after the fact, and details are lost.

The notes are also a hedge against the IC's own forgetting. By the time the IC hands off or the incident is over, they will not remember which thing was tried at which time. The notes are the memory.
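Notes are most useful to the postmortem author when every entry carries a timestamp and a rough category. The sketch below assumes timezone-aware timestamps and invents four categories for illustration; `IncidentNotes` is a hypothetical helper, not a feature of any chat or incident tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Note:
    at: datetime   # timezone-aware timestamp
    kind: str      # illustrative categories: "decision", "action", "observation", "comms"
    text: str


class IncidentNotes:
    def __init__(self) -> None:
        self._notes: list[Note] = []

    def log(self, kind: str, text: str, at: Optional[datetime] = None) -> None:
        """Record a note, stamped with the current UTC time unless `at` is given."""
        stamp = at if at is not None else datetime.now(timezone.utc)
        self._notes.append(Note(stamp, kind, text))

    def timeline(self) -> str:
        """Chronological timeline in UTC, ready to paste into the postmortem."""
        ordered = sorted(self._notes, key=lambda n: n.at)
        return "\n".join(
            f"{n.at.astimezone(timezone.utc):%H:%M} UTC [{n.kind}] {n.text}"
            for n in ordered)
```

Sorting at render time lets the IC backfill an entry they forgot to log (by passing `at=` explicitly) without breaking the timeline's order.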

After the Incident

After resolution, the IC is responsible for:

- Confirming the incident is fully resolved (not just mitigated)
- Closing the incident channel with a final summary
- Initiating the postmortem (or naming an author)
- Thanking the engineers who responded

The thank-you is not a courtesy; it is part of building the team's incident response culture. Engineers who feel acknowledged for stepping up to incidents continue to step up. Engineers who don't, don't.

Anti-Patterns

IC investigates root cause. The IC starts typing commands. They have stopped being the IC. Find another IC or hand off.

No status updates. Engineering leadership is asking what's happening every five minutes because they don't have the running summary. Post status updates; the questions stop.

Multiple voices to stakeholders. Engineers in the channel are individually replying to leadership questions. The signal is fragmented. Route through the IC.

No handoff at the 3-hour mark. The IC's judgment is degrading. They are still on the call. Hand off; pick up the pace again.

Decisions by committee. The team debates whether to roll back. The IC's job is to decide. They decide; the team executes.

Panicked IC. The team mirrors the IC's tone. If the IC is anxious, the response is anxious. Train for calm in writing.
