Consensus - Evidence-Based Search for PUMA
Tool: Consensus (https://consensus.app)
Phase: Phase 1 - Research
Step: 01 - Literature Exploration
Methodology: Evidence-Based Research
Prompt A - Triage Effectiveness
What is the empirical evidence on the effectiveness of LLM agents for issue triage and ticket classification in software project management? Focus on studies from 2023 to 2026.
Prompt B - Effort Estimation with LLMs
Are large language models effective for story point estimation and software effort prediction in Agile projects? What metrics are used and what are the main findings?
Prompt C - Multi-Agent vs Single-Agent Performance
Do multi-agent systems outperform single-agent systems in software engineering tasks such as issue classification, prioritization, and project risk detection?
Prompt D - Reproducibility in LLM Research
What are the main reproducibility challenges in studies evaluating LLM-based agents for software engineering tasks? What datasets and protocols are most commonly used?
Prompt E - Human-in-the-Loop Governance
What evidence exists on the effectiveness of human-in-the-loop supervision and governance mechanisms in AI agent systems for project management?
Usage Notes
Consensus provides a "Consensus Meter" showing the level of agreement across papers. Use the filter: Empirical Studies only. Export citations to Zotero. The evidence quality indicator is crucial for PUMA's statistical validation section.
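To keep the searches traceable, the results of Prompts A to E can be recorded in a small log. The sketch below is one possible way to do that and is not part of the PUMA protocol: the `SearchRecord` fields, the `blank_log` and `to_csv` helpers, and the example date are all hypothetical, and the Consensus Meter reading is assumed to be copied in by hand rather than retrieved programmatically.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class SearchRecord:
    """One row per prompt run (hypothetical schema, not defined by PUMA)."""
    prompt_id: str        # "A" to "E", matching the prompts above
    topic: str            # prompt title
    run_date: str         # ISO date on which the query was run
    filters: str          # filter applied in the Consensus interface
    meter_summary: str    # Consensus Meter reading, transcribed manually
    papers_exported: int  # number of citations exported to Zotero


# Prompt titles taken from this document.
PROMPT_TOPICS = {
    "A": "Triage Effectiveness",
    "B": "Effort Estimation with LLMs",
    "C": "Multi-Agent vs Single-Agent Performance",
    "D": "Reproducibility in LLM Research",
    "E": "Human-in-the-Loop Governance",
}


def blank_log(run_date: str) -> list[SearchRecord]:
    """Create one empty record per prompt, pre-filled with the filter."""
    return [
        SearchRecord(pid, topic, run_date, "Empirical Studies only", "", 0)
        for pid, topic in PROMPT_TOPICS.items()
    ]


def to_csv(records: list[SearchRecord]) -> str:
    """Serialize the records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(SearchRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


if __name__ == "__main__":
    # The date is a placeholder; fill in meter_summary and
    # papers_exported after each search.
    print(to_csv(blank_log("2025-01-01")))
```

Filling in one row per prompt after each session gives a dated record of which filter was applied and how many citations were exported, which may help when describing the search procedure later.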
PUMA Relevance
Consensus directly validates PUMA's H1 and H2 hypotheses by surfacing existing empirical evidence. The Consensus Meter output can be cited in Section 2 (Materials and Methods) to justify the experimental design choices.