Multi-agent systems outperform single agents on PM tasks when agent roles match task specialisation boundaries

A single general-purpose LLM accumulates error when acting simultaneously as domain expert, planner, executor, and quality checker. Decomposing these roles into specialised agents with narrow objectives reduces error propagation.

The Core Claim

Multiple specialist agents, each with a narrow objective, dedicated tools, and domain-specific context, outperform a single agent that tries to handle everything. This is not universally true, since coordination between agents adds overhead, but it holds when:

Tasks have distinct information requirements
Tasks have distinct decision criteria
Tasks have distinct failure modes

Evidence

MASAI (Arora et al., 2024): Modular sub-agents achieve 28.33% on SWE-bench Lite vs. single-agent baselines that achieve <15%. Each sub-agent (repository curator, fault localiser, patch generator) has one objective.

MetaGPT (Hong et al., 2023): Role-based agents (PM → Architect → Engineer → QA) with artifact-driven handoffs produce higher-quality software than unstructured agent conversations.
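
An artifact-driven handoff of this kind can be sketched as follows. This is a minimal illustration, not MetaGPT's actual API: the `Artifact`, `Role`, and `run_pipeline` names, the artifact kinds, and the lambda stand-ins for LLM calls are all hypothetical. The point is only that each role receives a typed artifact from the previous role rather than a free-form conversation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Artifact:
    """A typed work product passed between roles (hypothetical schema)."""
    kind: str      # e.g. "requirements", "design", "code", "test_report"
    content: str
    author: str


@dataclass(frozen=True)
class Role:
    """A role that consumes one artifact kind and produces the next."""
    name: str
    consumes: str
    produces: str
    work: Callable[[str], str]  # placeholder for an LLM call


def run_pipeline(roles: List[Role], brief: Artifact) -> List[Artifact]:
    """Pass artifacts down the chain, rejecting any handoff of the wrong kind."""
    trail = [brief]
    current = brief
    for role in roles:
        if current.kind != role.consumes:
            raise ValueError(
                f"{role.name} expects {role.consumes}, got {current.kind}"
            )
        current = Artifact(role.produces, role.work(current.content), role.name)
        trail.append(current)
    return trail


# Illustrative role chain mirroring PM -> Architect -> Engineer -> QA.
roles = [
    Role("PM", "idea", "requirements", lambda c: f"requirements for {c}"),
    Role("Architect", "requirements", "design", lambda c: f"design from {c}"),
    Role("Engineer", "design", "code", lambda c: f"code from {c}"),
    Role("QA", "code", "test_report", lambda c: f"report on {c}"),
]

trail = run_pipeline(roles, Artifact("idea", "todo app", "user"))
for artifact in trail:
    print(artifact.author, "->", artifact.kind)
```

Because every handoff is checked against the artifact kind the receiving role expects, a role cannot silently act on the wrong input, which is one plausible reason structured handoffs beat unstructured chat.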

Orchestrating Human-AI Teams (Dorri et al., 2025): GPT-5 as Manager Agent outperforms GPT-4.1 by decomposing tasks proactively. Reactive communication loops underperform.

Application to PUMA

PUMA Stages 1–3 test the single-agent hypothesis: can one correctly prompted LLM match a specialist? This establishes the baseline.

PUMA Stage 5 (Smart PMO) tests the multi-agent hypothesis: does decomposing the work across four specialist agents (Triage, Estimation, Planning, Risk) coordinated by a Manager Agent outperform the Stage 1–3 single agents?

Architecture rules for PUMA multi-agent design:

Each agent handles exactly one PM task type
Agents share no context directly; all shared context passes through the Manager Agent
Manager Agent decomposes backlog → assigns tasks → aggregates results
No agent modifies another agent’s output without Manager oversight (HITL principle)
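
The rules above can be sketched as a small routing loop. This is a minimal illustration under stated assumptions, not PUMA's implementation: the `Task`, `Result`, `Agent`, and `ManagerAgent` names, the task-type strings, and the stub handlers are all hypothetical, and a real agent would call an LLM where the stubs return fixed strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    """One backlog item tagged with a single PM task type (hypothetical schema)."""
    task_type: str   # e.g. "triage", "estimation", "planning", "risk"
    payload: str


@dataclass(frozen=True)
class Result:
    """Immutable agent output, so no other agent can modify it in place."""
    task_type: str
    output: str


@dataclass
class Agent:
    """A specialist that handles exactly one task type."""
    task_type: str
    handler: Callable[[str], str]  # stand-in for an LLM call

    def run(self, task: Task) -> Result:
        if task.task_type != self.task_type:
            raise ValueError(
                f"{self.task_type} agent cannot handle {task.task_type}"
            )
        # The agent sees only its own task payload, never another agent's context.
        return Result(self.task_type, self.handler(task.payload))


@dataclass
class ManagerAgent:
    """Sole channel between specialists: assigns tasks and aggregates results."""
    agents: Dict[str, Agent] = field(default_factory=dict)

    def register(self, agent: Agent) -> None:
        # Enforce one agent per task type.
        if agent.task_type in self.agents:
            raise ValueError(f"task type {agent.task_type} already has an agent")
        self.agents[agent.task_type] = agent

    def process(self, backlog: List[Task]) -> List[Result]:
        # Route each task to its specialist and collect the results in order.
        return [self.agents[task.task_type].run(task) for task in backlog]


manager = ManagerAgent()
manager.register(Agent("triage", lambda p: f"priority=high for {p}"))
manager.register(Agent("estimation", lambda p: f"5 points for {p}"))

results = manager.process([
    Task("triage", "login bug"),
    Task("estimation", "login bug"),
])
for r in results:
    print(r.task_type, "->", r.output)
```

Specialists never hold references to each other, so any cross-agent dependency has to be expressed as a new task issued by the manager, which keeps a single point where oversight can be applied.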

References

MOCs

Additional Links

SP-Architecture — PUMA 7-layer architecture
EX-Stages-Overview — Stage 5 multi-agent plan
PN-ReAct-AgentPattern — Reasoning + acting pattern
PN-SDD-Framework — BMAD multi-agent simulation
