📊 Tools: Datasets, Benchmarks & Data Access

Overview

This page lists all datasets and data access platforms used in PUMA experiments. Reproducibility principle: every dataset has a stable DOI or permanent URL.


Primary Experiment Datasets

Jira Social Repository (Jira SR)

  • Source: Zenodo, DOI: 10.5281/zenodo.5901893
  • URL: https://zenodo.org/record/5901893
  • Alternative URL: https://mcislab.github.io/publications/2015/ortu_promise.pdf
  • Content: 50,000+ real Jira issues from Apache Software Foundation projects (Kafka, Spark, Hadoop, etc.) with manually assigned priority labels (Blocker/Critical/Major/Minor/Trivial)
  • License: CC BY 4.0
  • Phase: F2 – F4 (Stage 1: triage benchmark)
  • PUMA use: PRIMARY dataset for Stage 1 triage evaluation; ground truth = human-assigned priority labels verified by Apache community
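  • Sampling strategy: Stratified sample of 200 issues (50 per priority class, i.e. four classes assuming Trivial and Blocker are merged as in the key statistics), seed=42. StratifiedShuffleSplit preserves the source class proportions, so an exact 50-per-class sample requires per-class sampling instead.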
  • Key statistics: ~60% Major, ~25% Minor, ~10% Critical, ~5% Trivial/Blocker → imbalanced → F1-macro required
  • Paper: Ortu, M., et al. (2015). MSR 2015. → LN-Datasets-JiraSR-TAWOS
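
The sampling step can be sketched as follows. This is a minimal illustration on synthetic data, not the PUMA pipeline: the column name `priority`, the table size, and the class mix are assumptions chosen to mirror the key statistics, and the real Jira SR schema may differ. It contrasts a proportional stratified draw with a fixed-count draw per class.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

SEED = 42

# Synthetic stand-in for the Jira SR issue table. The column name "priority"
# and the class mix are assumptions for illustration only.
rng = np.random.default_rng(SEED)
labels = ["Major", "Minor", "Critical", "Trivial/Blocker"]
issues = pd.DataFrame({
    "issue_id": np.arange(5000),
    "priority": rng.choice(labels, size=5000, p=[0.60, 0.25, 0.10, 0.05]),
})

# Option A: StratifiedShuffleSplit keeps the original class proportions
# in the 200-issue sample.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=200, random_state=SEED)
_, sample_idx = next(splitter.split(issues, issues["priority"]))
proportional = issues.iloc[sample_idx]

# Option B: a fixed number of issues per class (balanced sample).
balanced = issues.groupby("priority").sample(n=50, random_state=SEED)

print(proportional["priority"].value_counts())
print(balanced["priority"].value_counts())
```

Option A yields a sample that is still dominated by Major issues, while Option B yields exactly 50 issues per class. Which one is appropriate depends on whether the evaluation should reflect the natural label distribution or weight all classes equally.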

TAWOS (Tawosi Agile Web-based Open-Source)

  • Source: figshare, DOI: 10.5522/04/21308124
  • GitHub: https://github.com/SOLAR-group/TAWOS
  • Content: 23,000+ real Agile user stories from 26 open-source projects with developer-assigned story point estimates
  • License: Apache 2.0
  • Phase: F3 (Stage 2: effort estimation benchmark)
  • PUMA use: PRIMARY dataset for Stage 2 story point estimation; ground truth = team planning estimates
  • Key projects: MESOS (baseline MAE ~2.9 SP), APSTUD (baseline MAE ~3.8 SP), XD (baseline MAE ~4.1 SP)
  • Paper: Tawosi, V., et al. (2022). MSR 2022. → LN-Datasets-JiraSR-TAWOS
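
The baseline figures above are mean absolute errors (MAE) in story points. As a minimal sketch of how such a figure is computed, the example below scores a set of predictions and a naive median baseline. All story-point values are invented for illustration and are not taken from TAWOS.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Toy story-point values, invented for illustration (not TAWOS data).
actual_sp = np.array([1, 2, 3, 5, 8, 3, 2, 13])
predicted_sp = np.array([2, 2, 3, 3, 5, 5, 1, 8])

# Naive baseline: always predict the median story-point value.
# Computed on the same toy array for brevity; in practice the median
# should come from the training split only.
median_baseline = np.full(actual_sp.shape, np.median(actual_sp))

mae_model = mean_absolute_error(actual_sp, predicted_sp)
mae_baseline = mean_absolute_error(actual_sp, median_baseline)

print(f"Model MAE:    {mae_model:.3f} SP")
print(f"Baseline MAE: {mae_baseline:.3f} SP")
```

A lower MAE means the estimates are closer to the team's planning estimates on average. Comparing against a trivial baseline shows whether a model adds anything beyond predicting a typical value.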

Comparative Benchmark Datasets

SWE-bench


Supplementary Data Sources

GitHub Archive

  • URL: https://www.gharchive.org / BigQuery public dataset
  • Content: Public GitHub events: commits, PRs, issues, releases (projects: Apache, VSCode, Linux)
  • Phase: F3 (optional; Stage 5 component)
  • PUMA use (optional): Development activity telemetry for Smart PMO Stage 5; commit velocity analysis; PR merge time tracking
  • Access: Google BigQuery (free tier: 1 TB of query processing per month) or direct download from gharchive.org
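
For the direct-download route, GH Archive publishes one gzip-compressed, newline-delimited JSON file per hour. The sketch below builds an hourly file URL and counts event types in one archive. The helper names `gharchive_url` and `count_event_types` are hypothetical, and the demo runs offline on a small in-memory payload rather than a real download.

```python
import gzip
import io
import json
from collections import Counter
from datetime import datetime


def gharchive_url(hour: datetime) -> str:
    """Build the hourly archive URL (the hour component is not zero-padded)."""
    return f"https://data.gharchive.org/{hour:%Y-%m-%d}-{hour.hour}.json.gz"


def count_event_types(gz_bytes: bytes) -> Counter:
    """Count events by type in one gzip-compressed, newline-delimited JSON archive."""
    counts = Counter()
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                counts[json.loads(line)["type"]] += 1
    return counts


# Offline demo: a tiny hand-made payload stands in for a downloaded archive.
sample_events = [
    {"type": "PushEvent"},
    {"type": "PullRequestEvent"},
    {"type": "PushEvent"},
]
payload = gzip.compress(
    "\n".join(json.dumps(event) for event in sample_events).encode("utf-8")
)
counts = count_event_types(payload)

print(gharchive_url(datetime(2024, 1, 1, 15)))
print(counts)
```

Real hourly files can be large, so streaming line by line as above avoids loading a whole archive into memory.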

PROMISE Repository

  • URL: http://openscience.us/repo
  • Content: NASA software quality metrics datasets (JM1, PC1, KC1) with defect labels
  • Phase: F3 (optional)
  • PUMA use (optional): Input features for risk prediction model component; defect prediction baselines
  • Note: Optional Stage 3+ extension; not required for MVP

Data Processing Stack

pandas

  • Install: pip install pandas
  • PUMA use: Loading Jira SR CSV; stratified sampling; results table generation
  • Key operations: read_csv, groupby, value_counts, DataFrame.to_latex for LaTeX tables; the stratified split itself uses scikit-learn's StratifiedShuffleSplit

scikit-learn

  • Install: pip install scikit-learn
  • PUMA use:
    • StratifiedShuffleSplit for the stratified 200-issue sample (it preserves class proportions; equal per-class counts need per-class sampling)
    • f1_score(average='macro') for triage metric
    • TfidfVectorizer + SVC for baseline classifier
    • confusion_matrix for error analysis
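
The baseline classifier and metrics listed above fit together roughly as follows. This is a sketch under stated assumptions: the issue texts and labels are invented, only three priority classes are used, and the linear kernel is an illustrative choice rather than a documented PUMA setting.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy issue summaries and labels, invented for illustration.
train_texts = [
    "Broker crashes with data loss on restart",
    "Cluster outage after leader election fails",
    "Security hole exposes credentials in logs",
    "Incorrect offset returned by consumer API",
    "Job fails when input path contains spaces",
    "Timeout too short for large shuffle stage",
    "Typo in configuration documentation",
    "Rename misleading variable in test helper",
    "Improve wording of startup log message",
]
train_labels = ["Critical"] * 3 + ["Major"] * 3 + ["Minor"] * 3

test_texts = [
    "Data loss after broker restart",
    "Typo in log message",
    "Consumer API returns wrong offset",
]
test_labels = ["Critical", "Minor", "Major"]

LABELS = ["Critical", "Major", "Minor"]

# TF-IDF features feeding a support vector classifier (kernel is an assumption).
baseline = make_pipeline(TfidfVectorizer(), SVC(kernel="linear", random_state=42))
baseline.fit(train_texts, train_labels)
predicted = baseline.predict(test_texts)

# Macro averaging weights every class equally, regardless of its frequency.
macro_f1 = f1_score(test_labels, predicted, labels=LABELS, average="macro", zero_division=0)

# Rows are true labels, columns are predicted labels, both in LABELS order.
matrix = confusion_matrix(test_labels, predicted, labels=LABELS)

print(f"F1-macro: {macro_f1:.3f}")
print(matrix)
```

Passing an explicit `labels` list keeps the confusion matrix shape stable even when a class is missing from the predictions, which matters for rare classes such as Trivial or Blocker.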

scipy

  • Install: pip install scipy
  • PUMA use: scipy.stats.wilcoxon for H1/H2 statistical tests (two-sided, α=0.05)
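
A two-sided Wilcoxon signed-rank test on paired scores can be sketched as below. The score arrays are hypothetical placeholders, not PUMA results, and the pairing (one score per run under each condition) is an assumption about how H1/H2 are evaluated.

```python
import numpy as np
from scipy.stats import wilcoxon

ALPHA = 0.05

# Hypothetical paired F1-macro scores (one pair per run); not PUMA results.
baseline_f1 = np.array([0.52, 0.48, 0.55, 0.50, 0.47, 0.53, 0.49, 0.51, 0.46, 0.54])
treatment_f1 = np.array([0.61, 0.56, 0.66, 0.54, 0.59, 0.60, 0.62, 0.57, 0.51, 0.64])

# Signed-rank test on the paired differences, two-sided alternative.
statistic, p_value = wilcoxon(treatment_f1, baseline_f1, alternative="two-sided")
reject_null = p_value < ALPHA

print(f"W = {statistic}, p = {p_value:.4f}, reject H0 at alpha={ALPHA}: {reject_null}")
```

The test is non-parametric and works on paired differences, so it suits small samples of per-run scores where normality cannot be assumed.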

matplotlib + seaborn

  • Install: pip install matplotlib seaborn
  • PUMA use: F1-macro bar charts by condition; MAE comparison plots; confusion matrices; carbon footprint bar charts

MOCs