Elicit - Structured Data Extraction for PUMA SLR
Tool: Elicit (https://elicit.org)
Phase: Phase 1 - Research
Step: 03 - Structured Extraction
Methodology: SLR / PRISMA Data Extraction
Prompt A - Core Extraction Schema
For each paper in the collection, extract the following fields in a structured table:
1. Research question / main objective
2. Study type (empirical, survey, position paper, benchmark)
3. AI/LLM method used (model name, framework, local/API)
4. Dataset name and size (number of issues, projects, samples)
5. Task performed (triage, estimation, risk detection, planning, other)
6. Evaluation metrics reported (F1, MAE, Precision, Recall, Accuracy, other)
7. Key numerical result (best metric value achieved)
8. Baseline compared against (heuristic, ML, prior LLM, human)
9. Reproducibility (code available Y/N, data available Y/N, seed reported Y/N)
10. Main limitation stated by authors
11. Relevance to PUMA (High/Medium/Low with one-sentence justification)
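Once Elicit returns the table, each row can be checked against the closed vocabularies the prompt defines. The sketch below is one possible shape for that check. The class name, field names, and the `validate` helper are illustrative assumptions, not part of Elicit or of the PUMA codebase.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values copied from fields 2, 5, and 11 of Prompt A.
STUDY_TYPES = {"empirical", "survey", "position paper", "benchmark"}
TASKS = {"triage", "estimation", "risk detection", "planning", "other"}
RELEVANCE_LEVELS = {"High", "Medium", "Low"}


@dataclass
class ExtractionRecord:
    """One row of the Prompt A table (hypothetical field names)."""
    research_question: str
    study_type: str
    ai_method: str
    dataset: str
    task: str
    metrics: List[str]
    key_result: str
    baseline: str
    code_available: Optional[bool]   # None = not stated in the paper
    data_available: Optional[bool]
    seed_reported: Optional[bool]
    main_limitation: str
    relevance: str
    relevance_justification: str


def validate(record: ExtractionRecord) -> List[str]:
    """Return a list of problems; an empty list means the row looks well-formed."""
    problems = []
    if record.study_type not in STUDY_TYPES:
        problems.append(f"unknown study type: {record.study_type!r}")
    if record.task not in TASKS:
        problems.append(f"unknown task: {record.task!r}")
    if record.relevance not in RELEVANCE_LEVELS:
        problems.append(f"unknown relevance level: {record.relevance!r}")
    if not record.relevance_justification.strip():
        problems.append("relevance needs a one-sentence justification")
    return problems
```

Rows that fail validation are candidates for a manual check against the paper rather than automatic rejection.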
Prompt B - Metrics-Focused Extraction
From each paper, extract only the quantitative evaluation results: (1) metric name, (2) metric value for the proposed method, (3) metric value for the baseline, (4) relative improvement percentage, (5) statistical significance test used (if any), (6) whether p-value < 0.05 was achieved. Return as a compact table for cross-study comparison.
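Field (4) depends on the metric's direction: for F1 a higher value is better, for MAE a lower one. A minimal sketch for recomputing the percentage and the p < 0.05 flag when checking Elicit's output follows. The function names are hypothetical, and the improvement is assumed to be measured relative to the baseline value.

```python
from typing import Optional


def relative_improvement(proposed: float, baseline: float,
                         higher_is_better: bool = True) -> float:
    """Improvement of the proposed method over the baseline, in percent.

    Positive means the proposed method is better, for both metric directions.
    """
    if baseline == 0:
        raise ValueError("baseline is zero; relative improvement is undefined")
    delta = proposed - baseline if higher_is_better else baseline - proposed
    return 100.0 * delta / abs(baseline)


def is_significant(p_value: Optional[float], alpha: float = 0.05) -> bool:
    """True only if a p-value was reported and it is strictly below alpha."""
    return p_value is not None and p_value < alpha


# Illustrative numbers only, not taken from any paper.
print(relative_improvement(0.75, 0.5))                        # F1-style metric
print(relative_improvement(3.0, 4.0, higher_is_better=False))  # MAE-style metric
```

Papers that report no significance test map to `is_significant(None)`, which returns `False`, so "not tested" and "not significant" should still be kept apart in the table via field (5).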
Prompt C - Dataset Registry Extraction
Identify all datasets mentioned across the literature collection. For each dataset: (1) name, (2) source (GitHub, Jira, synthetic, etc.), (3) size, (4) task it was used for, (5) whether it is publicly available, (6) URL or citation for access, (7) relevance to PUMA tasks (triage/estimation). Flag datasets that PUMA could directly reuse.
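The "directly reusable" flag can be applied mechanically once the registry exists. The sketch below assumes a dataset is reusable when it is publicly available, has an access URL or citation, and targets a PUMA task. That rule, the dictionary keys, and the sample entries are all assumptions made for illustration.

```python
from typing import Dict, List

# Tasks named in field (7) of Prompt C.
PUMA_TASKS = {"triage", "estimation"}


def flag_reusable(registry: List[Dict]) -> List[str]:
    """Return names of datasets that are public, accessible, and on a PUMA task."""
    return [
        entry["name"]
        for entry in registry
        if entry["public"] and entry["access"] and entry["task"] in PUMA_TASKS
    ]


# Placeholder entries; these are not real datasets.
registry = [
    {"name": "example-a", "source": "Jira", "task": "estimation",
     "public": True, "access": "citation or URL here"},
    {"name": "example-b", "source": "GitHub", "task": "triage",
     "public": False, "access": ""},
    {"name": "example-c", "source": "synthetic", "task": "planning",
     "public": True, "access": "citation or URL here"},
]
print(flag_reusable(registry))
```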
Prompt D - Prompt Strategy Extraction
From papers that describe using LLMs for issue triage or effort estimation, extract the prompt engineering strategy used: (1) zero-shot, (2) few-shot (how many examples?), (3) chain-of-thought, (4) retrieval-augmented, (5) fine-tuned, (6) system prompt described Y/N, (7) examples of actual prompts if available. Create a comparison table of strategies vs. performance outcomes.
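The comparison table can be summarised by grouping reported scores by strategy. The sketch below assumes each extracted row is reduced to a (strategy, score) pair on a single shared metric. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def summarize_by_strategy(
    rows: Iterable[Tuple[str, float]]
) -> Dict[str, Tuple[int, float]]:
    """Map each strategy to (number of reported results, mean score)."""
    grouped = defaultdict(list)
    for strategy, score in rows:
        grouped[strategy].append(score)
    return {name: (len(scores), mean(scores)) for name, scores in grouped.items()}


# Illustrative rows only, not taken from any paper.
rows = [("zero-shot", 0.5), ("zero-shot", 0.75), ("few-shot", 0.5)]
for name, (count, avg) in summarize_by_strategy(rows).items():
    print(f"{name}: n={count}, mean={avg}")
```

Averages across papers that use different datasets are only a rough guide, so the per-paper rows should stay available next to the summary.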
Integration with PUMA PRISMA Protocol
After extraction:
- Populate PRISMA-Log with extracted data
- Use metric values to set PUMA baseline targets (F1-macro ≥ 0.55, MAE ≤ 3.0)
- Dataset registry feeds LN-Datasets-JiraSR-TAWOS
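The two baseline targets above can be encoded as a simple check, for example to see how many extracted results would already clear them. This is a minimal sketch; the constant and function names are assumptions, and only the threshold values come from this note.

```python
from typing import Dict

F1_MACRO_TARGET = 0.55  # higher is better
MAE_TARGET = 3.0        # lower is better


def meets_targets(f1_macro: float, mae: float) -> Dict[str, bool]:
    """Report, per metric, whether a result satisfies the PUMA baseline target."""
    return {
        "f1_macro": f1_macro >= F1_MACRO_TARGET,
        "mae": mae <= MAE_TARGET,
    }


# Illustrative values only.
print(meets_targets(f1_macro=0.6, mae=3.5))
```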
PUMA Relevance
Elicit's automatic extraction eliminates the need to manually read all 40+ papers in the SLR corpus. The metrics extraction directly informs the justification for PUMA's success thresholds. The dataset registry prevents redundant data collection.