LN: Cinkusz et al. (2025) — Cognitive Agents Powered by LLMs for Agile Software PM

Bibliographic Reference

Citation: Cinkusz, K., Barański, M., Brodowski, M., Kowalczyk, R., & Spichkova, M. (2025). Cognitive agents powered by large language models for agile software project management. arXiv:2508.16678. EASE 2025. https://arxiv.org/abs/2508.16678

Important Note

Overview

The bibliography entry lists “Spichkova, M., Georgievski, I., & Čizmić, B.” as the authors. This is incorrect: the verified first author is Konrad Cinkusz, and Spichkova is the last author. Georgievski and Čizmić do not appear in the verified paper.

(Also exists in the vault as LN-Spichkova-2025-CognitiveAgents; this note supersedes it with corrected author metadata.)

Pass 1 — Bird’s Eye View (5 Cs)

C	Assessment
Category	Empirical evaluation + system proposal
Context	Evaluates LLM agents on 5 Agile PM tasks using GPT-4 and others
Correctness	Uses proprietary models — limited reproducibility
Contributions	(1) Evaluation of 5 Agile PM tasks (triage, estimation, sprint planning, risk, reporting); (2) CoT improves structured task performance; (3) First systematic multi-task PM benchmark with agents
Clarity	Good. Tasks clearly defined.

Relevance: ⭐⭐⭐⭐⭐

Direct predecessor to PUMA. PUMA’s three differentiating contributions address the three limitations of this paper.

Pass 3 — Virtual Reconstruction

Three limitations PUMA addresses (cited from this paper’s gaps):

Reproducibility: Cinkusz et al. use GPT-4 via paid API. Results cannot be reproduced without payment and may change with model updates. PUMA uses Ollama + local models + seed=42.
No systematic prompting comparison: The paper tests CoT but does not systematically compare zero-shot vs. few-shot-3 vs. few-shot-6 vs. CoT. PUMA’s 4-strategy design fills this gap.
No environmental measurement: No carbon footprint data. PUMA integrates CodeCarbon as a first-class experimental variable.
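
The reproducibility and prompting points above can be made concrete with a minimal sketch. It is not PUMA's actual code: the example issues, the label set, the task wording, and the model name `llama3` are all hypothetical placeholders. It shows how the four strategies (zero-shot, few-shot-3, few-shot-6, CoT) might be generated from one template, and how a fixed seed can be attached to an Ollama generation request.

```python
# Hypothetical labelled examples; a real study would sample these from its own dataset.
EXAMPLES = [
    ("App crashes when saving a file", "bug"),
    ("Add dark mode to settings", "feature"),
    ("How do I reset my password?", "question"),
    ("Login button unresponsive on mobile", "bug"),
    ("Support CSV export for reports", "feature"),
    ("Where is the API documentation?", "question"),
]

STRATEGIES = ("zero-shot", "few-shot-3", "few-shot-6", "cot")


def build_prompt(strategy: str, issue: str) -> str:
    """Build a triage prompt for one of the four strategies (assumed wording)."""
    task = "Classify the issue as bug, feature, or question."
    shots = []
    if strategy.startswith("few-shot-"):
        # "few-shot-3" -> first 3 examples, "few-shot-6" -> first 6 examples.
        shots = EXAMPLES[: int(strategy.rsplit("-", 1)[1])]
    elif strategy == "cot":
        task += " Think step by step, then give the label on the last line."
    elif strategy != "zero-shot":
        raise ValueError(f"unknown strategy: {strategy}")
    parts = [task]
    parts += [f"Issue: {text}\nLabel: {label}" for text, label in shots]
    parts.append(f"Issue: {issue}\nLabel:")
    return "\n\n".join(parts)


def build_request(model: str, prompt: str, seed: int = 42) -> dict:
    """Return a JSON-serializable body for Ollama's /api/generate endpoint.

    Fixing the seed and setting temperature to 0 makes repeated runs
    far more likely to return the same output on the same machine.
    """
    return {
        "model": model,  # assumed model name, e.g. "llama3"
        "prompt": prompt,
        "stream": False,
        "options": {"seed": seed, "temperature": 0},
    }


if __name__ == "__main__":
    for strategy in STRATEGIES:
        body = build_request("llama3", build_prompt(strategy, "Export fails on large files"))
        print(strategy, len(body["prompt"]))
```

The request body could then be POSTed to a locally running Ollama server (by default at `http://localhost:11434/api/generate`). A fixed seed narrows but does not eliminate run-to-run variation, since hardware and model-version differences can still change outputs.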

Q3 (What if?): What if we apply the same 5-task evaluation framework but with local models? PUMA’s Stages 1–3 cover 3 of those 5 tasks, enabling direct comparison with this paper’s results.

PUMA Integration

Section 1.1: Three limitations → PUMA’s three contributions → PR-PUMA-Ch1-Introduction
Section 2 (SLR): Key comparison paper in evidence table → MOC-Literature-Review
Results: F1-macro and MAE results directly comparable → EX-Hypotheses-H1-H2
Reproducibility gap: Underpins Gap 1 → LN-Angermeir-2025-Reproducibility
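
For reference, the two metrics named in the Results line can be computed as in the sketch below. It assumes F1-macro is used for classification tasks such as triage and MAE for numeric estimates such as story points; the sample labels and values are invented for illustration and are not results from either study.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); the denominator is nonzero because
        # each label occurs at least once in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted numeric values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Invented triage labels and story-point estimates.
    print(f1_macro(["bug", "bug", "feature", "question"],
                   ["bug", "feature", "feature", "question"]))
    print(mae([3, 5, 8], [2, 5, 13]))
```

Macro averaging weights every class equally, so rare issue types count as much as common ones, which matters when triage datasets are imbalanced.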

Related Permanent Notes

PN-IssueTriage-StoryPoints — core tasks this paper evaluates
PN-CoT-FewShot-Prompting — prompting strategies compared
PN-KeyConcepts-Agents-Reproducibility-RedTeam — reproducibility gap
PN-LLM-Local-vs-Cloud — local vs cloud gap
LN-Spichkova-2025-CognitiveAgents — related note (overlapping authorship)
SP-Architecture — PUMA’s architecture addresses these gaps
