Consensus: Evidence-Based Search for PUMA

Tool: Consensus (https://consensus.app)
Phase: Phase 1 (Research)
Step: 01 (Literature Exploration)
Methodology: Evidence-Based Research


Prompt A: Triage Effectiveness

What is the empirical evidence on the effectiveness of LLM agents for issue triage and ticket classification in software project management? Focus on studies from 2023 to 2026.

Prompt B: Effort Estimation with LLMs

Are large language models effective for story point estimation and software effort prediction in Agile projects? What metrics are used and what are the main findings?

Prompt C: Multi-Agent vs Single-Agent Performance

Do multi-agent systems outperform single-agent systems in software engineering tasks such as issue classification, prioritization, and project risk detection?

Prompt D: Reproducibility in LLM Research

What are the main reproducibility challenges in studies evaluating LLM-based agents for software engineering tasks? What datasets and protocols are most commonly used?

Prompt E: Human-in-the-Loop Governance

What evidence exists on the effectiveness of human-in-the-loop supervision and governance mechanisms in AI agent systems for project management?

Usage Notes

Consensus provides a "Consensus Meter" showing the level of agreement across papers. Use the filter: Empirical Studies only. Export citations to Zotero. The evidence quality indicator is crucial for PUMA's statistical validation section.


PUMA Relevance

Consensus helps ground PUMA's H1 and H2 hypotheses by surfacing existing empirical evidence that bears on them. The Consensus Meter output can be cited in Section 2 (Materials and Methods) to justify the experimental design choices.

