LN: Cinkusz et al. (2025) — Cognitive Agents Powered by LLMs for Agile Software PM

Bibliographic Reference

Citation: Cinkusz, K., Barański, M., Brodowski, M., Kowalczyk, R., & Spichkova, M. (2025). Cognitive agents powered by large language models for agile software project management. arXiv:2508.16678. EASE 2025. https://arxiv.org/abs/2508.16678

Important Note

Overview

The bibliography entry lists “Spichkova, M., Georgievski, I., & Čizmić, B.” as the author list. The verified first author is Konrad Cinkusz. Spichkova is the last author. Georgievski and Čizmić do not appear in the verified paper.

(Also exists in the vault as LN-Spichkova-2025-CognitiveAgents. This note supersedes that one and carries the corrected author metadata.)


Pass 1 — Bird’s Eye View (5 Cs)

Category: Empirical evaluation + system proposal
Context: Evaluates LLM agents on 5 Agile PM tasks using GPT-4 and others
Correctness: Uses proprietary models; limited reproducibility
Contributions: (1) Evaluation of 5 Agile PM tasks (triage, estimation, sprint planning, risk, reporting); (2) CoT improves structured task performance; (3) First systematic multi-task PM benchmark with agents
Clarity: Good. Tasks clearly defined.

Relevance: ⭐⭐⭐⭐⭐

Direct predecessor to PUMA. PUMA’s three differentiating contributions address the three limitations of this paper.


Pass 3 — Virtual Reconstruction

Three limitations PUMA addresses (drawn from this paper’s gaps):

  1. Reproducibility: Cinkusz et al. use GPT-4 via a paid API. Results cannot be reproduced without payment and may change when the model is updated. PUMA runs local models through Ollama with a fixed seed (seed=42).

  2. No systematic prompting comparison: The paper tests CoT but does not systematically compare zero-shot vs. few-shot-3 vs. few-shot-6 vs. CoT. PUMA’s 4-strategy design fills this gap.

  3. No environmental measurement: The paper reports no carbon footprint data. PUMA integrates CodeCarbon as a first-class experimental variable.
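
A minimal sketch of how the three design choices above could fit together, not PUMA’s actual code. The four strategy names and seed=42 come from this note. The helper names (build_prompt, ollama_payload, run_with_tracker), the "Input:/Output:" prompt layout, the CoT trigger phrase, and temperature=0 are assumptions made for illustration. The payload fields follow Ollama’s /api/generate request format, and the tracker is expected to expose start() and stop() the way CodeCarbon’s EmissionsTracker does.

```python
# Strategy names are from the note; the number of in-context examples per
# strategy is inferred from those names.
STRATEGIES = {"zero-shot": 0, "few-shot-3": 3, "few-shot-6": 6, "cot": 0}
SEED = 42  # fixed seed mentioned in the note


def build_prompt(strategy, task_text, examples):
    """Assemble a prompt for one strategy.

    `examples` is a list of (input, output) pairs. The layout is hypothetical.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    shots = examples[: STRATEGIES[strategy]]
    parts = [f"Input: {x}\nOutput: {y}" for x, y in shots]
    parts.append(f"Input: {task_text}\nOutput:")
    prompt = "\n\n".join(parts)
    if strategy == "cot":
        # Assumed chain-of-thought trigger; the note does not specify one.
        prompt += " Let's think step by step."
    return prompt


def ollama_payload(model, prompt, seed=SEED):
    """Build a request body for Ollama's /api/generate with a fixed seed."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # temperature=0 is an assumption, added to reduce sampling variance.
        "options": {"seed": seed, "temperature": 0},
    }


def run_with_tracker(tracker, fn, *args):
    """Run fn between tracker.start() and tracker.stop().

    Returns (result, emissions). With CodeCarbon, `tracker` would be an
    EmissionsTracker(), whose stop() returns emissions in kg CO2-eq.
    """
    tracker.start()
    try:
        result = fn(*args)
    finally:
        emissions = tracker.stop()
    return result, emissions
```

In a real run, the payload would be POSTed to a local Ollama server (by default http://localhost:11434/api/generate) inside run_with_tracker, so that each strategy yields both an output and an emissions figure.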

Q3 (What if?): What if we apply the same 5-task evaluation framework but with local models? PUMA’s Stages 1–3 cover 3 of those 5 tasks, enabling direct comparison with this paper’s results.


PUMA Integration

MOCs