📋 PUMA: Product Requirements Document (PRD)

Overview

BMAD Phase 2 artefact. This PRD defines WHAT the system must do and WHY. Reviewed by: Advisor (Architect role). Approved: 2026-04-01. Downstream: SP-Architecture


1. Problem Statement

ICT project management organisations face three measurable, high-cost operational problems:

| Problem | Evidence | Impact |
|---|---|---|
| Manual issue triage | Ortu et al. (2015): 50k+ Jira issues show systematic cognitive bias in priority assignment | Latency + inconsistency |
| Inconsistent effort estimation | Tawosi et al. (2022): teams systematically over/under-estimate on TAWOS dataset | Planning debt |
| Uniqueness Trap | Flyvbjerg & Gardner (2023): managers treat each project as unique, blocking statistical learning | 45% average cost overrun |

No existing solution combines: (1) reproducible local LLM evaluation, (2) systematic prompting strategy comparison, (3) environmental impact measurement.


2. Target Users

Primary: ICT project managers in multi-project organisations (consultancies, MSPs, software teams)
Academic: SE researchers studying LLM capabilities for PM tasks
Secondary: Open-source community wanting a replicable PM+LLM benchmark


3. Product Goals

G1: Reproducible benchmark - any researcher with standard hardware can replicate results
G2: Empirical evidence - statistically validated claims about model × strategy effects
G3: Environmental accountability - gCO₂eq measured per experimental condition
G4: Practical tool - issue classification module usable in real PM contexts


4. Objectives (OE1–OE8)

| ID | Objective | Metric | Deadline |
|---|---|---|---|
| OE1 | SLR: ≥40 references, comparative table | Table with gap mapping | PEC1 ✅ |
| OE2 | Define H1 + H2 with full operationalisation | Falsifiable hypotheses | PEC1 ✅ |
| OE3 | Download + prepare Jira SR + TAWOS datasets | Reproducible scripts, bias doc | PEC2 |
| OE4 | Triage module: 4 strategies × 2 models | F1-macro table + Wilcoxon | PEC2 |
| OE5 | Estimation module: TAWOS + baselines | MAE vs Deep-SE / CoGEE | PEC3 |
| OE6 | Full experiment: stats + CodeCarbon | p-values, CI, gCO₂eq table | PEC4 |
| OE7 | Discuss results vs H1/H2 + limitations | Discussion chapter | PEC4 |
| OE8 | Publish GitHub MIT repo with docs | README + notebook + v1.0 tag | PEC5 |

5. MVP Definition (Strategy C: Guaranteed)

MVP = Triage module (Stage 1) with:

  • ≥1 model achieving F1-macro ≥ 0.55 (or documented failure with analysis)
  • Wilcoxon test with p-value and effect size
  • CodeCarbon measurement per condition
  • Reproducible code in public GitHub

The MVP is a complete and independent academic contribution even if Stage 2 is not reached.
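The Wilcoxon requirement above can be illustrated with a minimal, standard-library sketch of the two-sided signed-rank test on paired scores (for example, per-fold F1 for two prompting strategies). It uses the normal approximation with a tie correction and reports the matched-pairs rank-biserial correlation as one common choice of effect size; the PRD does not say which effect size is intended, so that choice is an assumption. The function name and return keys are hypothetical, and a real run would more likely rely on a library routine such as `scipy.stats.wilcoxon`, with an exact test for small samples.

```python
import math
from statistics import NormalDist
from typing import Sequence


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> dict:
    """Two-sided Wilcoxon signed-rank test (normal approximation, tie-corrected).

    Illustrative sketch only: zero differences are dropped and ties are
    detected by exact equality, so integer or pre-rounded scores work best.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired samples of equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group_size = j - i + 1
        tie_term += group_size**3 - group_size
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "n": n,
        "w_plus": w_plus,
        "w_minus": w_minus,
        "z": z,
        "p_value": p_value,
        # Matched-pairs rank-biserial correlation, in [-1, 1].
        "rank_biserial": (w_plus - w_minus) / (w_plus + w_minus),
    }
```

Calling `wilcoxon_signed_rank(scores_a, scores_b)` on two equally long lists returns the test statistics, the p-value, and the effect size in one dictionary, which maps directly onto one row of a results table.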


6. Target Metrics

| Metric | Definition | Minimum | Desired |
|---|---|---|---|
| F1-macro (triage) | Macro-averaged F1 on Jira SR priority classes | ≥ 0.55 | ≥ 0.70 |
| MAE (estimation) | Mean Absolute Error vs TAWOS ground truth | ≤ 3.0 SP | ≤ 1.5 SP |
| Latency | Time per LLM call on 16GB RAM CPU | < 60s | < 20s |
| Reproducibility | Clean environment → identical results | 100% | 100% |
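To make the two accuracy metrics concrete, the following is a small dependency-free sketch of macro-averaged F1 and mean absolute error. The function names and the sample labels are illustrative assumptions, not part of the PUMA codebase; macro-F1 is averaged here over every label that appears in either the ground truth or the predictions.

```python
from collections import Counter
from typing import Hashable, Sequence


def f1_macro(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = set(y_true) | set(y_pred)
    # Per-class F1 = 2TP / (2TP + FP + FN); a class with no TP scores 0.
    scores = [
        2 * tp[label] / (2 * tp[label] + fp[label] + fn[label])
        for label in labels
    ]
    return sum(scores) / len(scores)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between true and predicted story points."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example data, for illustration only.
priorities_true = ["High", "High", "Low", "Medium"]
priorities_pred = ["High", "Low", "Low", "Medium"]
triage_f1 = f1_macro(priorities_true, priorities_pred)      # 7/9, about 0.78
estimation_mae = mean_absolute_error([3, 5, 8], [2, 5, 13])  # 2.0 SP
```

Because macro averaging weights every priority class equally, a model that ignores a rare class is penalised even when its overall accuracy looks high, which is presumably why it is the chosen triage metric.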

7. Non-Requirements (Out of Scope for MVP)

  • Real-time integration with live Jira instances
  • Paid API models (GPT-4, Claude Opus) in experiments
  • Fine-tuning or RAG (Stage 4, optional)
  • Multi-agent Smart PMO (Stage 5, optional / future work)
  • Mobile or web interface

8. Technical Constraints

  • Hardware: ≤ 16GB RAM, CPU only (no GPU required)
  • Models: open-weights, local via Ollama (Llama 3.2 8B, Mistral 7B)
  • Datasets: public, DOI-stable (Jira SR, TAWOS), CC + Apache-2.0 licences
  • Reproducibility: seed=42, temperature=0, fixed requirements.txt
  • Language: Python 3.11+
  • License: MIT
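As a sketch of how the reproducibility constraint (seed=42, temperature=0) might be applied to a local Ollama call, the snippet below builds a non-streaming request for Ollama's `/api/generate` endpoint using only the standard library. The helper names, the 60 s timeout, and the model tag in the usage note are assumptions for illustration; the PRD does not specify how the project wraps Ollama.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, seed: int = 42,
                  temperature: float = 0.0) -> dict:
    """Request body with fixed sampling options for repeatable output."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"seed": seed, "temperature": temperature},
    }


def generate(model: str, prompt: str, timeout_s: float = 60.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())["response"]
```

With a server running and a model pulled, a call would look like `generate("mistral:7b", "Classify the priority of this issue: ...")`, where the tag is a placeholder for whichever model tag is installed locally. Fixed options make runs repeatable on one machine, but identical output across different hardware or Ollama versions is not guaranteed by these settings alone, so pinning versions in `requirements.txt` and recording the Ollama version remain part of the protocol.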

9. Risks and Mitigations

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Inference latency > 60s | Medium | High | Fallback to Phi-3.5 Mini 3.8B |
| F1 < 0.55 for all conditions | Low | Medium | Deeper error analysis, additional baselines |
| Dataset access compromised | Very Low | High | Local archive before experiments start |
| Time overrun in F2 | Medium | Medium | 20% contingency buffer; rollback to Stage 1 only |
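The latency mitigation could be operationalised roughly as follows: time a handful of calls per model, then pick the first model in preference order that stays under the 60 s budget. This is only a sketch under stated assumptions; using the median as the summary statistic, the function names, and the model keys in the usage note are all hypothetical rather than taken from the PRD.

```python
import statistics
import time
from typing import Callable, Mapping, Optional, Sequence

LATENCY_BUDGET_S = 60.0  # minimum latency target stated in this PRD


def median_latency(call: Callable[[str], object],
                   prompts: Sequence[str]) -> float:
    """Median wall-clock seconds per call over a small probe set."""
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        call(prompt)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def choose_model(latency_by_model: Mapping[str, float],
                 preference: Sequence[str],
                 budget_s: float = LATENCY_BUDGET_S) -> Optional[str]:
    """First preferred model whose measured latency is within budget."""
    for model in preference:
        # Models without a measurement are treated as over budget.
        if latency_by_model.get(model, float("inf")) < budget_s:
            return model
    return None  # nothing fits: escalate rather than silently proceed
```

For instance, `choose_model({"primary": 75.0, "fallback": 18.0}, ["primary", "fallback"])` selects `"fallback"`, mirroring the switch to a smaller model when the primary one exceeds the budget.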

10. Downstream Artefacts

From this PRD:


  • Datasets: LN-Datasets-JiraSR-TAWOS - Jira SR + TAWOS (OE3)
  • Literature: LN-KeyPapers-CoGEE-Angermeir-Flyvbjerg - evidence for problem statement (§1)
  • Reproducibility: PN-KeyConcepts-Agents-Reproducibility-RedTeam - reproducibility protocol (G1, §8)
  • Uniqueness Trap: PN-KeyConcepts-Agents-Reproducibility-RedTeam (Uniqueness Trap section, §1 problem 3)
  • PM Tasks: PN-IssueTriage-StoryPoints - F1-macro + MAE tasks (§6)
  • Prompting: PN-CoT-FewShot-Prompting - 4 prompting strategies (OE4)
  • Smart PMO: Smart-PMO-Vision - Stage 5, out of scope but future work
  • BMAD Roster: BMAD-Agent-Roster
  • MOC: MOC-PUMA-Master


PRD v1.2 - Approved 2026-04-01 - Next review: PEC2 delivery