LN: (2024) — LLM-based Multi-Agent Systems for Software Engineering: Vision and Challenges

Bibliographic Reference

Citation: (2024). LLM-based multi-agent systems for software engineering: Vision and challenges. ACM TOSEM. https://doi.org/10.1145/3712003

Pass 1 — Bird’s Eye View (5 Cs)

C	Assessment
Category	Vision/position paper + structured analysis
Context	SE community responds to the proliferation of LLM agent frameworks (MetaGPT, ChatDev) and benchmarks (SWE-bench) with a principled analysis
Correctness	Structured literature analysis; challenges grounded in published empirical results
Contributions	(1) Taxonomy of LLM-based MAS roles for SE (Developer, Tester, Reviewer, PM); (2) Challenge inventory: hallucination, coordination overhead, evaluation; (3) Research agenda for SE-specific MAS design
Clarity	Good. As a vision paper it is less technically precise than an empirical study, but it is conceptually clear.

Relevance: ⭐⭐⭐⭐⭐

This paper frames PUMA’s research contribution within the SE community agenda. PUMA addresses one specific MAS task (PM triage + estimation) from the broader landscape described here.

Pass 2 — Content

MAS Roles in SE

Agent Role	Primary Tasks	PUMA Mapping
Requirements Agent	Extract, clarify, validate requirements	Upstream of PUMA scope
Design Agent	Architecture decisions, API design	BMAD Architect agent
Developer Agent	Code generation, refactoring	Outside PUMA scope
Tester Agent	Test case generation, bug finding	BMAD QA agent
PM Agent	Issue triage, sprint planning, estimation	PUMA’s core contribution
Reviewer Agent	Code review, documentation	Adjacent to PUMA

Key Challenges Identified

Challenges for PUMA

Hallucination in structured outputs: Agents frequently produce syntactically invalid JSON or misformatted classification labels; PUMA’s “Successful Parsing Rate” metric measures this directly

Coordination overhead: Agent-to-agent communication adds latency — PUMA’s single-agent design in Stages 1-2 avoids this

Evaluation gap: No standard benchmark for PM-specific MAS tasks — PUMA fills this gap for triage + estimation

Reproducibility: MAS experiments are harder to reproduce due to non-determinism; PUMA’s constitution (temperature=0, seed=42) mitigates this by fixing the decoding parameters
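
A minimal sketch of how a “Successful Parsing Rate” could be computed for the structured-output challenge above. The note does not give PUMA’s exact definition, so everything here is an assumption: the metric is taken to be the fraction of raw agent outputs that parse as a JSON object and carry a label from an allowed set, and the `label` field name, the `ALLOWED_LABELS` set, and both function names are hypothetical.

```python
import json

# Hypothetical label set and field name; the note does not specify PUMA's schema.
ALLOWED_LABELS = {"bug", "feature", "question"}


def is_successfully_parsed(raw_output: str, allowed_labels=ALLOWED_LABELS) -> bool:
    """Return True if raw_output is a JSON object whose "label" is an allowed label."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # syntactically invalid JSON
    if not isinstance(parsed, dict):
        return False  # valid JSON, but not the expected object shape
    label = parsed.get("label")
    return isinstance(label, str) and label in allowed_labels


def successful_parsing_rate(raw_outputs, allowed_labels=ALLOWED_LABELS) -> float:
    """Fraction of outputs that parse and carry a valid label (0.0 for empty input)."""
    if not raw_outputs:
        return 0.0
    ok = sum(is_successfully_parsed(o, allowed_labels) for o in raw_outputs)
    return ok / len(raw_outputs)


if __name__ == "__main__":
    samples = [
        '{"label": "bug"}',         # valid
        '{"label": "Bug report"}',  # misformatted label
        '{"label": "feature"',      # invalid JSON (missing closing brace)
        '["bug"]',                  # valid JSON, wrong shape
    ]
    print(successful_parsing_rate(samples))  # prints 0.25
```

Counting label validity together with JSON validity is a design choice of this sketch; reporting the two failure modes separately would show whether errors come from syntax or from label drift.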

Research Agenda (relevant to PUMA)

Domain-specific agent specialization (PUMA: PM domain)
Hybrid architectures (PUMA: LLM + retrieval)
Human-in-the-loop integration (PUMA Ch.5 discussion)
Standardized evaluation protocols (PUMA contributes one for PM)

PUMA Integration

Ch.2 Literature Review: This paper provides the direct SE community framing for PUMA’s contribution — cite as the primary motivation for an SE-specific PM agent
Research gap: The “evaluation gap” finding justifies PUMA’s experiment design
Architecture: MAS role taxonomy maps to PUMA’s BMAD agent roster → BMAD-Agent-Roster
Challenges → PUMA solutions: Create a table in Ch.5 mapping each challenge to PUMA’s mitigation strategy

Related Notes

LN-Hong-2023-MetaGPT — MetaGPT SE-specific MAS
LN-Qian-2023-ChatDev — ChatDev SE MAS
LN-Jimenez-2023-SWEbench — SE benchmark
PN-MultiAgent-ArchitecturePatterns — MAS architecture synthesis
PR-PUMA-Ch2-Ch3-Ch4-Ch5 — Ch.2 SoA section
