LN: (2023) — Incident Management in the Age of AI: A Survey

Bibliographic Reference

Citation: (2023). Incident management in the age of AI: A survey. arXiv:2312.14411. https://arxiv.org/abs/2312.14411

Pass Status

This note is at Pass 1 — Bird’s Eye only. Full three-pass analysis pending.


Pass 1 — Bird’s Eye View (5 Cs)

C | Assessment
Category | Survey paper
Context | Comprehensive survey of AI/ML applications to the IT service management incident lifecycle
Correctness | Survey methodology; scope covers detection → triage → RCA → mitigation
Contributions | (1) Taxonomy of AI techniques per incident phase; (2) Dataset catalogue for incident management ML; (3) Open challenges in automated incident resolution
Clarity | To be assessed on full read

Relevance: ⭐⭐⭐⭐⭐

Directly aligned with PUMA’s core task: issue triage is the triage phase of incident management. This survey maps the state-of-the-art landscape that PUMA’s SLR must cover.


Pass 1 Notes (Preliminary)

Incident Management Lifecycle (AI Applications)

Phase | AI/ML Technique | Relevance to PUMA
Detection | Anomaly detection, log analysis | Upstream of PUMA scope
Triage | Classification (type, priority, routing) | PUMA H1 core task
Root Cause Analysis | Graph-based, LLM-assisted | Extends PUMA Stage 4
Mitigation | Action recommendation, playbook retrieval | SmartPMO Stage 5
Post-Incident Review | Summary generation, pattern mining | Future PUMA work

Key Metrics in Incident Management

  • MTTD (Mean Time to Detect): time from incident start to alert
  • MTTR (Mean Time to Resolve): time from alert to resolution
  • Triage accuracy: F1 for type, priority, and assignee prediction
  • SLA compliance rate: share of incidents resolved within the agreed SLA
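A minimal Python sketch of how these four metrics could be computed from incident records. The `Incident` record, its field names, and the choice to measure the SLA window from alert to resolution are hypothetical assumptions for illustration, not definitions taken from the survey. MTTR follows this note's alert-to-resolution definition; some organisations measure it from incident start instead. Triage F1 is shown as macro-averaged F1, which is one assumed aggregation among several (micro and weighted averaging are also common).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative assumptions.
    started: datetime   # when the incident actually began
    alerted: datetime   # when monitoring raised the alert (detection)
    resolved: datetime  # when the incident was resolved
    sla: timedelta      # assumed SLA window, measured from alert to resolution


def mttd(incidents):
    """Mean Time to Detect: average of (alert - start). Raises on an empty list."""
    return timedelta(seconds=mean((i.alerted - i.started).total_seconds() for i in incidents))


def mttr(incidents):
    """Mean Time to Resolve: average of (resolution - alert), per the note's definition."""
    return timedelta(seconds=mean((i.resolved - i.alerted).total_seconds() for i in incidents))


def sla_compliance(incidents):
    """Fraction of incidents resolved within their (assumed) SLA window."""
    met = sum(1 for i in incidents if i.resolved - i.alerted <= i.sla)
    return met / len(incidents)


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Illustrative toy data (invented for this sketch, not from the survey).
t0 = datetime(2024, 1, 1, 9, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=70), timedelta(hours=1)),
    Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=200), timedelta(hours=2)),
]
type_true = ["bug", "bug", "feature", "task"]
type_pred = ["bug", "feature", "feature", "task"]

print(mttd(incidents))                            # 0:15:00
print(mttr(incidents))                            # 2:00:00
print(sla_compliance(incidents))                  # 0.5
print(round(macro_f1(type_true, type_pred), 3))   # 0.778
```

The same `macro_f1` call would be repeated per triage field (type, priority, assignee) to get the three F1 scores listed above.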

PUMA Connection

PUMA Analogy

This survey positions PUMA’s triage task (H1) within the broader incident management lifecycle. PUMA addresses the “Triage” phase, automating issue type, priority, and component classification. The survey provides the academic framing for PUMA’s contribution.

Permanent Notes Generated

Reading Log

MOCs