π¦ ChatGPT β Dataset Preparation Protocol for PUMA
Tool: ChatGPT (OpenAI) Phase: Phase 1 β Research Step: 06 β Experimental Design Methodology: Reproducible Data Engineering
Prompt A β Jira SR Stratified Sample Design
Role: Data engineer designing a reproducible experiment sample.
Context: PUMA uses the Jira Social Repository (Jira SR) for issue triage. The dataset contains issues from Apache projects with priority labels: Blocker, Critical, Major, Minor, Trivial. The distribution is highly imbalanced (Trivial and Minor dominate). We need a 200-issue stratified sample for the experiment.
Objective: Design the stratified sampling procedure for PUMA's triage experiment.
Instructions:
1. Define: stratification variable (priority label), sampling method (proportional stratified random sampling), random seed (seed=42), split rationale (train context for few-shot / test for evaluation).
2. Specify: minimum class representation to ensure F1-macro is meaningful (not dominated by majority class).
3. Generate Python code using pandas + sklearn.model_selection.StratifiedShuffleSplit for: (a) loading the dataset, (b) class distribution report, (c) stratified 200-issue sample, (d) saving with reproducible seed.
4. Define ground truth: what is the "correct" priority label? Is it the assigned priority or the resolved priority?
5. Identify: potential dataset issues (issues re-prioritized after creation, issues without resolution, cross-project labeling inconsistency). State how PUMA handles each.
Format: Sampling procedure description. Python code (complete). Data quality checklist.
Prompt B β TAWOS Dataset Preprocessing
Role: Data scientist preparing a story point estimation dataset.
Context: PUMA uses TAWOS (Tawosi et al.) for effort estimation. TAWOS contains story points from multiple Agile projects. We need to: (1) select projects with sufficient data, (2) clean outliers, (3) create a consistent input format for LLM prompting.
Objective: Design the preprocessing pipeline for PUMA's estimation experiment.
Instructions:
1. Define: project selection criteria (minimum N issues per project, story point scale consistency, time period 2016β2023).
2. Specify: outlier detection and removal method (IQR or z-score). Justify the choice.
3. Design: input prompt template for LLM β what issue fields to include (title, description, acceptance criteria, epic, sprint context) and what to exclude (assignee, resolution date, actual time spent).
4. Define: the historical average baseline (mean story points per project per issue type).
5. Generate Python code for: (a) loading TAWOS, (b) applying filters, (c) outlier removal, (d) creating prompt templates, (e) computing baseline MAE.
Format: Preprocessing pipeline description. Python code (complete). Baseline calculation.
Prompt C β Few-Shot Example Selection Protocol
Role: Prompt engineer designing few-shot example selection.
Context: PUMA's few-shot strategies use 3 or 6 historical examples per issue. The quality of example selection significantly impacts performance. For triage, examples should represent the priority distribution. For estimation, examples should be from the same project/domain.
Objective: Design the few-shot example selection protocol for both PUMA tasks.
Instructions:
1. For triage (3 examples): specify the selection strategy β (a) random from training set, (b) stratified (1 per priority range), (c) semantically similar (nearest neighbor in embedding space). For PUMA Stage 1β2: use fixed 3-example set; for Stage 4: retrieve dynamically from Qdrant.
2. For estimation (3 examples): specify β (a) same project, (b) similar story point range, (c) temporal proximity (most recent sprints first).
3. Generate the few-shot prompt template for triage: [SYSTEM] + [3 EXAMPLES] + [CURRENT ISSUE] + [INSTRUCTION].
4. Generate the few-shot prompt template for estimation: same structure.
5. Reproducibility: how are examples frozen for the baseline experiment (Stage 1β2)?
Format: Selection protocol per task. Python code for example retrieval. Prompt templates (complete, ready to use).
PUMA Relevance
These dataset preparation prompts generate the code and protocols used in PUMAβs scripts/prepare_experiment_dataset.py. The few-shot templates are the actual prompts used in Stages 1β3 of the triage experiment. All outputs feed LN-Datasets-JiraSR-TAWOS.