Case study

IR Playbook Library: 10 structured playbooks built for analyst handoff quality

Ten incident response playbooks built around a consistent 7-section template. Every playbook ships with explicit escalation criteria, key analytical questions, and runnable investigation commands — designed so a second analyst can pick up mid-investigation without asking the first analyst what happened.

Verify count: (Get-ChildItem .\incident-response\playbooks -Filter *.md).Count

Context

A playbook that says "investigate the process tree" is not a playbook. It's a reminder that process trees exist. These playbooks were built to answer a specific question at each phase: at triage — is this alert real and what's the scope? At investigation — what commands produce the evidence that answers that question? At containment — what's the exact decision tree for when to isolate vs. not? At documentation — what does the next analyst need to pick this up cold? Structure is the deliverable, not the incident type.

Problem

Every playbook follows the same structure. No exceptions.

1. DETECTION: rule name, rule ID, log source, exact indicators that fired.
2. TRIAGE (5 min): validation checklist, key analytical questions, explicit escalation criteria (pass/fail, not judgment calls). Time-boxed to 5 minutes.
3. INVESTIGATION (30 min): copy-paste commands for process review, log correlation, and pivot steps. Time-boxed to 30 minutes for initial scope.
4. CONTAINMENT: a decision tree with justification for when to isolate, when not to, and the blast radius of each choice.
5. ERADICATION: verified removal steps per artifact type (processes, persistence mechanisms, registry keys, files).
6. RECOVERY: validation that the threat is gone before restoring connectivity.
7. DOCUMENTATION: structured handoff format, evidence inventory, timeline template.
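A fixed template like this can be checked mechanically. The following is a minimal sketch, assuming each playbook is a Markdown file whose headings contain the seven section names in upper case. The function name Test-PlaybookStructure and the heading pattern are illustrative assumptions, not taken from the repository.

```powershell
# Hypothetical helper: checks that playbook text contains all seven sections, in order.
# Assumes headings look like "## 1. DETECTION" or "# TRIAGE"; adjust the pattern to the real files.
function Test-PlaybookStructure {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Content
    )

    $required = 'DETECTION', 'TRIAGE', 'INVESTIGATION', 'CONTAINMENT',
                'ERADICATION', 'RECOVERY', 'DOCUMENTATION'

    $position   = -1     # index of the last section heading found in order
    $missing    = @()
    $outOfOrder = @()

    foreach ($name in $required) {
        # First Markdown heading line that mentions the section name (case-sensitive).
        $match = [regex]::Match($Content, "(?m)^#{1,6}[^\r\n]*\b$name\b")
        if (-not $match.Success) {
            $missing += $name
        }
        elseif ($match.Index -lt $position) {
            $outOfOrder += $name
        }
        else {
            $position = $match.Index
        }
    }

    [pscustomobject]@{
        IsValid    = ($missing.Count -eq 0 -and $outOfOrder.Count -eq 0)
        Missing    = $missing
        OutOfOrder = $outOfOrder
    }
}
```

A check of this kind could run over every file in the playbooks folder, for example `Test-PlaybookStructure -Content (Get-Content $file -Raw)`, and fail when IsValid is false.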

Approach

Playbook                    | MITRE ATT&CK        | Severity
LSASS Process Access        | T1003.001           | Critical
Suspicious PowerShell       | T1059.001           | High
Ransomware Detected         | T1486, T1490, T1489 | Critical
Brute Force / Auth Attack   | T1110               | High
Malware Detected            | T1204, T1055        | High
Privilege Escalation        | T1068, T1548        | High
Active Directory Compromise | T1003.006, T1558    | Critical
Lateral Movement            | T1021, T1550        | High
Exfiltration                | T1041, T1048        | High
Supply Chain Compromise     | T1195, T1554        | Critical

Plus a condensed quick-reference checklist covering IR-004 through IR-030 for speed lookups during active incidents.

Evidence

Triage criteria that produce a binary decision — not analyst intuition:

  • Unknown SourceImage accessing lsass.exe → Escalate immediately
  • Multiple systems affected simultaneously → Escalate to Incident Commander
  • Mimikatz, ProcDump, or comsvcs.dll in the process name or command line → Critical incident
  • Known security tool (Defender, CrowdStrike) + authorized admin user → Document and close
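The criteria above can be written as a pure function that always returns exactly one decision. This is a sketch under stated assumptions: the function name, parameter names, the example allowlist of security-tool binaries, and the order in which the rules are evaluated are all hypothetical. It also inspects the command line, because comsvcs.dll normally shows up as a rundll32 argument rather than as a process name.

```powershell
# Hypothetical sketch of the LSASS triage criteria; names and allowlist are illustrative.
function Get-LsassTriageDecision {
    param(
        [Parameter(Mandatory)] [string]$SourceImage,
        [string]$CommandLine = '',
        [int]$AffectedHostCount = 1,
        [bool]$AuthorizedAdmin = $false,
        # Example allowlist only; a real list would come from the environment's tooling inventory.
        [string[]]$KnownSecurityTools = @('MsMpEng.exe', 'CSFalconService.exe')
    )

    # Take the file name from the full path (handles both \ and / separators).
    $imageName = ($SourceImage -split '[\\/]')[-1]

    # Known credential-dumping indicators in the image name or command line.
    if ("$imageName $CommandLine" -match 'mimikatz|procdump|comsvcs\.dll') {
        return 'Critical incident'
    }
    # Simultaneous activity on several systems.
    if ($AffectedHostCount -gt 1) {
        return 'Escalate to Incident Commander'
    }
    # Known security tool run by an authorized admin.
    if (($KnownSecurityTools -contains $imageName) -and $AuthorizedAdmin) {
        return 'Document and close'
    }
    # Anything else counts as an unknown SourceImage.
    return 'Escalate immediately'
}
```

Every input lands in one of four outcomes, so there is no "it depends" branch for the analyst to resolve during the 5-minute triage window.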

Investigation commands (built in)

Every investigation section ships with runnable commands. No "go look at logs" vagueness:

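# LSASS: identify the process that accessed lsass.exe
# Assumes Sysmon is installed and logging ProcessAccess (Event ID 10)
Get-WinEvent -FilterHashtable @{ LogName = 'Microsoft-Windows-Sysmon/Operational'; Id = 10 } |
  Where-Object { $_.Message -match 'TargetImage:.*\\lsass\.exe' } |
  Select-Object TimeCreated, Message

# Ransomware: check for shadow copy deletion
# Assumes process-creation auditing (Event ID 4688) with command-line logging enabled
Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4688 } |
  Where-Object { $_.Message -match 'vssadmin.*delete' }

# AD Compromise: list principals holding replication (DCSync) rights on the domain root
# GUIDs are the DS-Replication-Get-Changes and DS-Replication-Get-Changes-All extended rights
Import-Module ActiveDirectory
$replicationRights = '1131f6aa-9c07-11d1-f79f-00c04fc2dcd2', '1131f6ad-9c07-11d1-f79f-00c04fc2dcd2'
(Get-Acl "AD:\$((Get-ADDomain).DistinguishedName)").Access |
  Where-Object { $_.ObjectType -in $replicationRights } | Select-Object IdentityReference, ObjectType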

Outcome

Critical-incident triage has different defaults than standard triage:

  • Do not shut down affected systems, so that volatile memory is preserved for forensics.
  • Do not pay ransom — FBI guidance, no exceptions in the playbook.
  • Immediately isolate from network before any other step.
  • Check backups offline before any recovery conversation.
  • Identify the ransomware variant from the ransom note before any further containment decisions.

These are pre-decided defaults. The analyst under pressure has one job: follow the steps, not re-derive incident response doctrine in the moment.

Verify the count

The playbook count is verified by CI and reproducible from repo root:

# Count playbooks (PowerShell)
(Get-ChildItem .\incident-response\playbooks -Filter *.md).Count

# Expected: 10
# Source of truth: PROOF_PACK/VERIFIED_COUNTS.md

The count is not self-reported. It's generated from file system state by scripts/verify/verify-counts.ps1 on every commit.
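A count check of this kind can be very small. The sketch below is hypothetical and is not the contents of scripts/verify/verify-counts.ps1; the function name Test-PlaybookCount and its parameters are assumptions made for illustration.

```powershell
# Hypothetical sketch of a file-count check; not the repository's actual verify script.
function Test-PlaybookCount {
    param(
        [Parameter(Mandatory)] [string]$Path,
        [Parameter(Mandatory)] [int]$Expected
    )

    # @() forces an array so .Count is reliable for zero or one matching file.
    $actual = @(Get-ChildItem -Path $Path -Filter *.md -File).Count

    [pscustomobject]@{
        Expected = $Expected
        Actual   = $actual
        Passed   = ($actual -eq $Expected)
    }
}

# Example CI usage from repo root: fail the build on a mismatch.
# $result = Test-PlaybookCount -Path .\incident-response\playbooks -Expected 10
# if (-not $result.Passed) { throw "Playbook count is $($result.Actual), expected $($result.Expected)" }
```

Throwing on a mismatch makes the CI step exit non-zero, which is what keeps a published count from drifting away from the files on disk.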

Lessons + next hardening step

  • Time-boxed phases — Triage is 5 minutes. Investigation scope is 30 minutes. These are not aspirational; they force the analyst to reach a decision state rather than investigate indefinitely.
  • Explicit non-actions: the ransomware playbook lists what not to do before it lists what to do. Critical incidents often go wrong because someone does something well-intentioned that destroys forensic evidence.
  • Binary escalation criteria — criteria that produce yes/no decisions, not criteria that produce "it depends." Unknown SourceImage → escalate. Period.
  • Commands over descriptions — Every investigation section has runnable commands. An analyst should not have to know the command from memory under pressure.
  • Handoff format — Documentation section is a structured template, not a blank field. The next analyst gets a timeline, evidence inventory, and open questions — not a wall of notes.
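To illustrate the handoff idea, a structured record might look like the sketch below. The function name New-HandoffRecord and every field name are hypothetical, and the sample values are placeholders rather than data from a real incident or from the playbooks' actual template.

```powershell
# Hypothetical handoff record; field names are illustrative, not the playbooks' real template.
function New-HandoffRecord {
    param(
        [Parameter(Mandatory)] [string]$PlaybookName,
        [Parameter(Mandatory)] [string]$Analyst,
        [object[]]$Timeline = @(),
        [object[]]$Evidence = @(),
        [string[]]$OpenQuestions = @()
    )

    [pscustomobject]@{
        Playbook      = $PlaybookName
        Analyst       = $Analyst
        CreatedUtc    = [datetime]::UtcNow.ToString('o')   # ISO 8601 timestamp
        Timeline      = $Timeline
        Evidence      = $Evidence
        OpenQuestions = $OpenQuestions
    }
}

# Placeholder values for illustration only.
$record = New-HandoffRecord -PlaybookName 'LSASS Process Access' -Analyst 'analyst1' `
    -Timeline @([pscustomobject]@{ TimeUtc = '<timestamp>'; Event = 'Alert fired' }) `
    -Evidence @([pscustomobject]@{ Item = 'Sysmon Event ID 10 export'; Location = '<case share>' }) `
    -OpenQuestions @('Was the source process signed?')

$record | ConvertTo-Json -Depth 4
```

Serializing to JSON keeps the timeline, evidence inventory, and open questions as separate fields, so the next analyst can read or query each one instead of parsing free-form notes.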