π PUMA β Product Requirements Document (PRD)
Overview
BMAD Phase 2 artefact. This PRD defines WHAT the system must do and WHY. Reviewed by: Advisor (Architect role). Approved: 2026-04-01. Downstream: SP-Architecture
1. Problem Statement
ICT project management organisations face three measurable, high-cost operational problems:
| Problem | Evidence | Impact |
|---|---|---|
| Manual issue triage | Ortu et al. (2015): 50k+ Jira issues show systematic cognitive bias in priority assignment | Latency + inconsistency |
| Inconsistent effort estimation | Tawosi et al. (2022): teams systematically over/under-estimate on TAWOS dataset | Planning debt |
| Uniqueness Trap | Flyvbjerg & Gardner (2023): managers treat each project as unique, blocking statistical learning | 45% average cost overrun |
No existing solution combines: (1) reproducible local LLM evaluation, (2) systematic prompting strategy comparison, (3) environmental impact measurement.
2. Target Users
Primary: ICT project managers in multi-project organisations (consultancies, MSPs, software teams)
Academic: SE researchers studying LLM capabilities for PM tasks
Secondary: Open-source community wanting a replicable PM+LLM benchmark
3. Product Goals
G1: Reproducible benchmark β any researcher with standard hardware can replicate results
G2: Empirical evidence β statistically validated claims about model Γ strategy effects
G3: Environmental accountability β gCOβeq measured per experimental condition
G4: Practical tool β issue classification module usable in real PM contexts
4. Objectives (OE1βOE8)
| ID | Objective | Metric | Deadline |
|---|---|---|---|
| OE1 | SLR: β₯40 references, comparative table | Table with gap mapping | PEC1 β |
| OE2 | Define H1 + H2 with full operationalisation | Falsifiable hypotheses | PEC1 β |
| OE3 | Download + prepare Jira SR + TAWOS datasets | Reproducible scripts, bias doc | PEC2 |
| OE4 | Triage module: 4 strategies Γ 2 models | F1-macro table + Wilcoxon | PEC2 |
| OE5 | Estimation module: TAWOS + baselines | MAE vs Deep-SE / CoGEE | PEC3 |
| OE6 | Full experiment: stats + CodeCarbon | p-values, CI, gCOβeq table | PEC4 |
| OE7 | Discuss results vs H1/H2 + limitations | Discussion chapter | PEC4 |
| OE8 | Publish GitHub MIT repo with docs | README + notebook + v1.0 tag | PEC5 |
5. MVP Definition (Strategy C β Guaranteed)
MVP = Triage module (Stage 1) with:
- β₯1 model achieving F1-macro β₯ 0.55 (or documented failure with analysis)
- Wilcoxon test with p-value and effect size
- CodeCarbon measurement per condition
- Reproducible code in public GitHub
MVP is complete and independent academic contribution even if Stage 2 not reached.
6. Target Metrics
| Metric | Definition | Minimum | Desired |
|---|---|---|---|
| F1-macro (triage) | Macro-averaged F1 on Jira SR priority classes | β₯ 0.55 | β₯ 0.70 |
| MAE (estimation) | Mean Absolute Error vs TAWOS ground truth | β€ 3.0 SP | β€ 1.5 SP |
| Latency | Time per LLM call on 16GB RAM CPU | < 60s | < 20s |
| Reproducibility | Clean environment β identical results | 100% | 100% |
7. Non-Requirements (Out of Scope for MVP)
- Real-time integration with live Jira instances
- Paid API models (GPT-4, Claude Opus) in experiments
- Fine-tuning or RAG (Stage 4 β optional)
- Multi-agent Smart PMO (Stage 5 β optional / future work)
- Mobile or web interface
8. Technical Constraints
- Hardware: β€ 16GB RAM, CPU only (no GPU required)
- Models: open-weights, local via Ollama (Llama 3.2 8B, Mistral 7B)
- Datasets: public, DOI-stable (Jira SR, TAWOS), CC + Apache-2.0 licences
- Reproducibility: seed=42, temperature=0, fixed requirements.txt
- Language: Python 3.11+
- License: MIT
9. Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Inference latency > 60s | Medium | High | Fallback to Phi-3.5 Mini 3.8B |
| F1 < 0.55 for all conditions | Low | Medium | Deeper error analysis, additional baselines |
| Dataset access compromised | Very Low | High | Local archive before experiments start |
| Time overrun in F2 | Medium | Medium | 20% contingency buffer; rollback to Stage 1 only |
10. Downstream Artefacts
From this PRD:
- SP-Architecture β architecture spec
- SP-Triage-Agent β triage agent spec
- EX-Hypotheses-H1-H2 β experiment design
- Sprint-02 β sprint tasks
Related Notes
Datasets: LN-Datasets-JiraSR-TAWOS β Jira SR + TAWOS (OE3) Literature: LN-KeyPapers-CoGEE-Angermeir-Flyvbjerg β evidence for problem statement (Β§1) Reproducibility: PN-KeyConcepts-Agents-Reproducibility-RedTeam β reproducibility protocol (G1, Β§8) Uniqueness Trap: PN-KeyConcepts-Agents-Reproducibility-RedTeam (Uniqueness Trap section β Β§1 problem 3) PM Tasks: PN-IssueTriage-StoryPoints β F1-macro + MAE tasks (Β§6) Prompting: PN-CoT-FewShot-Prompting β 4 prompting strategies (OE4) Smart PMO: Smart-PMO-Vision β Stage 5 out-of-scope but future work BMAD Roster: BMAD-Agent-Roster MOC: MOC-PUMA-Master
PRD v1.2 β Approved 2026-04-01 β Next review: PEC2 delivery