LN: Spichkova (2025) — Cognitive Agents for Agile PM
Bibliographic Reference
Citation: Spichkova, M., Georgievski, I., & Čizmić, B. (2025). Cognitive agents for Agile software project management. In Proceedings of EASE 2025 [Preprint]. arXiv:2508.16678.
Pass 1 Summary (5 Cs)
| C | Assessment |
|---|---|
| Category | Empirical evaluation + system proposal |
| Context | Builds on CoGEE; related to PM+LLM emerging body of work |
| Correctness | Uses proprietary models (GPT-4) — limited reproducibility |
| Contributions | Evaluates 5 Agile PM tasks with LLM agents. Shows CoT helps for structured tasks. |
| Clarity | Good. Task definitions clear. |
Relevance: ⭐⭐⭐⭐⭐ (5/5)
Direct competitor / predecessor to PUMA
Pass 2 Key Points
Five PM tasks evaluated:
- Story point estimation
- Issue triage (priority classification)
- Sprint planning
- Risk identification
- Status reporting
Key limitations (relevant to PUMA): Uses GPT-4 via API, so results cannot be reproduced without paid access. No systematic prompting strategy comparison. No carbon measurement.
PUMA’s differentiation:
- Local models (no API cost)
- Systematic 4-strategy prompting comparison
- CodeCarbon measurement
- Open-source reproducible benchmark
Metrics used: F1 (triage), MAE (estimation), BLEU/ROUGE (text tasks). PUMA uses the same metrics for consistency.
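To make the two numeric metrics above concrete, here is a minimal pure-Python sketch of F1 for triage and MAE for story point estimation. It assumes macro-averaged F1; these notes do not record which averaging the paper uses. The function names and the sample labels and story points are hypothetical, not data from the paper or from PUMA.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: mean of |true - predicted| over all items (e.g. story points)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1 (assumed averaging mode)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical example data, for illustration only.
true_points = [3, 5, 8]
pred_points = [2, 5, 13]
true_priority = ["high", "low", "high", "medium"]
pred_priority = ["high", "low", "medium", "medium"]

mae = mean_absolute_error(true_points, pred_points)
f1 = macro_f1(true_priority, pred_priority)
```

Macro averaging weights every priority class equally, which matters when triage labels are imbalanced. If the paper reports micro or weighted F1, the comparison in the Pass 3 TODO would need the same mode.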
PUMA Integration
Used in: Section 1.1 (gap: reproducibility + systematic prompting), SLR evidence table
Supports: Research gap justification (Limitation 2: no systematic prompting strategy comparison)
Pass 3 TODO
- Reconstruct their triage prompting approach
- Compare their F1 results to PUMA baselines when available
- Generate permanent note: insight about task-specific prompt design
🔗 Connected Notes
Superseded by: LN-Cinkusz-2025-CognitiveAgentsAgilePM (corrected author metadata — Cinkusz is first author)
Permanent notes:
- PN-IssueTriage-StoryPoints — core PM tasks evaluated
- PN-CoT-FewShot-Prompting — CoT effectiveness shown here
- PN-KeyConcepts-Agents-Reproducibility-RedTeam — reproducibility gap this paper exhibits
PUMA project:
- PR-PUMA-Ch1-Introduction — cited as Gap 2 (systematic prompting)
- EX-Hypotheses-H1-H2 — F1 and MAE baselines
- LN-Datasets-JiraSR-TAWOS — datasets used
MOCs: