Sprint 02: Triage Module + Baseline
Goal: Functional triage module with 4 prompting strategies evaluated on a stratified Jira SR subset (200 issues), baseline implemented, Wilcoxon test applied.
Period: 2026-03-29 to 2026-04-08
PEC2 Deadline: 2026-04-08
Backlog
- Download and verify Jira SR dataset (DOI: 10.5281/zenodo.5901893) 📅 2026-03-30
- Implement stratified sampling script (50 issues × 4 priority classes = 200 total, seed=42) 📅 2026-03-31
- Implement heuristic baseline (majority class + TF-IDF+SVM) 📅 2026-04-01
- Implement Zero-Shot prompting template for triage agent 📅 2026-04-01
- Implement Few-Shot-3 prompting template 📅 2026-04-02
- Implement Few-Shot-6 prompting template 📅 2026-04-02
- Implement Chain-of-Thought (CoT) prompting template 📅 2026-04-02
- Run benchmark: Llama 3.2 8B × 4 strategies × 200 issues (temperature=0, seed=42) 📅 2026-04-03
- Run benchmark: Mistral 7B × 4 strategies × 200 issues 📅 2026-04-04
- Generate F1-macro / Precision / Recall results table 📅 2026-04-05
- Run Wilcoxon signed-rank test (α=0.05), compute effect size r 📅 2026-04-05
- Write error analysis by issue type (qualitative) 📅 2026-04-06
- Integrate CodeCarbon measurement per condition 📅 2026-04-06
- Update Ch.2 + Ch.3 with methodology description 📅 2026-04-07
- Commit reproducible code to GitHub (seed=42, pinned requirements.txt) 📅 2026-04-07
- Final review + PEC2 submission 📅 2026-04-08
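
The stratified sampling item above (50 issues × 4 priority classes = 200, seed=42) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's script: the `priority` field name, the `P1`..`P4` labels, and the list-of-dicts input are hypothetical, since the plan does not describe the dataset schema.

```python
import random
from collections import defaultdict


def stratified_sample(issues, label_key="priority", per_class=50, seed=42):
    """Draw `per_class` issues from every class with a fixed seed.

    `issues` is assumed to be a list of dicts; `label_key` is a
    hypothetical field name for the priority class.
    """
    rng = random.Random(seed)  # local RNG, so global random state is untouched

    # Group issues by their class label.
    by_class = defaultdict(list)
    for issue in issues:
        by_class[issue[label_key]].append(issue)

    sample = []
    # Iterate labels in sorted order so the draw does not depend on
    # the order in which classes first appear in the input.
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < per_class:
            raise ValueError(
                f"class {label!r} has only {len(pool)} issues, need {per_class}"
            )
        sample.extend(rng.sample(pool, per_class))  # sampling without replacement
    return sample


if __name__ == "__main__":
    # Synthetic stand-in data: 400 issues spread evenly over 4 made-up classes.
    demo = [{"id": i, "priority": f"P{i % 4 + 1}"} for i in range(400)]
    subset = stratified_sample(demo)
    print(len(subset))
```

Raising an error when a class has fewer than 50 issues is a design choice for the sketch; it makes a silent imbalance impossible, at the cost of requiring a decision about rare classes up front.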
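
For the Wilcoxon item above (α=0.05, effect size r), a standard-library sketch of the signed-rank test using the normal approximation is shown below. Several things here are assumptions rather than part of the plan: what gets paired (the plan does not say), the use of the normal approximation without continuity correction, dropping zero differences, and the convention r = |z| / √n with n the number of non-zero pairs. For small samples an exact test from a statistics library would be the safer choice.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns W+, W-, z, the p-value, and the effect size r = |z| / sqrt(n),
    where n counts the non-zero paired differences (an assumed convention).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")

    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| ascending; tied values share the average of their ranks.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t  # feeds the tie correction of the variance
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    effect_r = abs(z) / math.sqrt(n)

    return {"n": n, "w_plus": w_plus, "w_minus": w_minus,
            "z": z, "p_value": p_value, "r": effect_r}


if __name__ == "__main__":
    # Made-up paired scores for two conditions, for illustration only.
    cond_a = [10, 12, 15, 19, 24, 30, 37, 45]
    cond_b = [9, 10, 12, 15, 19, 24, 30, 37]
    result = wilcoxon_signed_rank(cond_a, cond_b)
    print(result, "significant at 0.05:", result["p_value"] < 0.05)
```

The effect size is reported alongside the p-value because a significant result on a small paired sample says little about magnitude on its own.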
In Progress
- Validate Ollama inference environment (Llama 3.2 8B + Mistral 7B running, latency < 60s) 📅 2026-03-30
Review
- Architecture spec updated (SP-Architecture-v1): awaiting advisor feedback
Done
- Environment setup: Ollama + models + test inference log ✅ 2026-03-08
- PEC1: Chapter 1 complete ✅ 2026-03-08
- H1 + H2 hypotheses formalised ✅ 2026-03-08
- SLR initial: ≥40 references reviewed ✅ 2026-03-08
Sprint Metrics
| Metric | Value |
|---|---|
| Total tasks | 16 backlog + 1 in progress |
| Completed | 4 |
| Carried over from Sprint 1 | 0 |
| Target velocity | 16 tasks / 10 days |
Sprint 1 Retrospective
What went well: PEC1 delivered on time with strong theoretical framing. 40+ references reviewed. Hypotheses formally falsifiable.
What to improve: Start coding earlier. Set up CodeCarbon before running any experiments.
Sprint 2 focus: Execution. Every day should produce measurable output (code / results / text).
Related
- EX-Hypotheses-H1-H2: H1 formal definition
- SP-Triage-Agent: Triage spec
- PT-PUMA-Experiment-Prompts: Prompting strategies
- Carbon-Tracking-Log: CodeCarbon log