Jira Social Repository (Jira SR)
One-sentence summary: A publicly available dataset of roughly 50,000 Jira issues from Apache Software Foundation projects with manually assigned priority labels; PUMA uses it as the ground truth for evaluating issue triage.
📋 Dataset Description
| Property | Value |
|---|---|
| Size | ~50,000 issues |
| Source | Apache Software Foundation projects |
| Label | Priority (Critical / High / Medium / Low) |
| Format | CSV with issue title, description, comments, priority |
| Period | 2002–2014 |
| Projects | Apache Hadoop, Spark, Kafka, Cassandra, and others |
| Download | zenodo.org/records/5901893 |
| License | CC BY 4.0 |
📊 Class Distribution
| Priority | Count (approx.) | % |
|---|---|---|
| Critical | ~4,000 | 8% |
| High | ~11,000 | 22% |
| Medium | ~22,500 | 45% |
| Low | ~12,500 | 25% |
Imbalance note: PUMA creates a balanced evaluation subset of 200 issues (50 per class) using stratified random sampling with seed=42.
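The percentages in the table, and how strongly the balanced subset under-samples each class, can be checked with a few lines of arithmetic. The sketch below uses only the approximate counts from the table; the variable names are illustrative and are not taken from the PUMA codebase.

```python
# Approximate class counts copied from the class distribution table.
counts = {"Critical": 4_000, "High": 11_000, "Medium": 22_500, "Low": 12_500}
PER_CLASS = 50  # balanced subset size per class, as stated in the imbalance note

total = sum(counts.values())

# Share of each class in the full dataset, in percent.
shares = {label: 100 * n / total for label, n in counts.items()}

# Fraction of each class that ends up in the balanced 200-issue subset.
sampling_fraction = {label: PER_CLASS / n for label, n in counts.items()}

for label in counts:
    print(
        f"{label:<8} {shares[label]:5.1f}% of data, "
        f"{100 * sampling_fraction[label]:.2f}% of class sampled"
    )
```

Because every class contributes the same 50 issues, the rarest class (Critical) is sampled at a much higher rate than the most common one (Medium), which is what makes the subset balanced.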
🔬 Preparation for PUMA
# Dataset preparation script (reproducible)
import pandas as pd
# Load full dataset
df = pd.read_csv("jira_sr.csv")
# Filter to 4-class priority
df = df[df['priority'].isin(['Critical', 'High', 'Medium', 'Low'])]
# Stratified sample: 50 per class, each class sampled with seed=42.
# Sampling group by group keeps the 'priority' column in the result
# regardless of how the installed pandas version treats grouping
# columns in groupby.apply.
subset = pd.concat(
    [group.sample(n=50, random_state=42) for _, group in df.groupby('priority')]
).reset_index(drop=True)
# Verify stratification
print(subset['priority'].value_counts())
# → Critical: 50, High: 50, Medium: 50, Low: 50
🧠 Key Findings from Literature on This Dataset
Studies using Jira SR report that human-assigned priorities are substantially inconsistent across projects, even for technically equivalent issues. This inconsistency is the core empirical motivation for automating triage in PUMA.
🔗 Connected Notes
- Defines task for: PN-IssueTriage-StoryPoints (Issue Triage section)
- Hypothesis: EX-Hypotheses-H1-H2 (H1)
- Experiment stages: EX-Stages-Overview
- Methods: PR-PUMA-Ch3-Methods (§3.2.1)
- Navigation: MOC-PUMA-Master
---
id: LN-TAWOS-Dataset
title: "TAWOS — The Agile Workflow Optimisation Suite"
type: literature-note
subtype: dataset
tags: [literature, dataset, tawos, story-points, estimation, agile]
authors: ["Tawosi, Vali", "Sarro, Federica", "Harman, Mark"]
year: 2022
venue: "MSR 2022"
doi: "10.1145/3524842.3528029"
url: "https://github.com/SOLAR-group/TAWOS"
zotero_key: "Tawosi2022"
dataset_size: "23,000+ user stories"
license: "Apache 2.0"
puma_task: "Effort Estimation (Stage 2 — H2)"
read_status: processed
created: 2026-03-01
---
TAWOS — The Agile Workflow Optimisation Suite
One-sentence summary: A dataset of 23,000+ real Agile user stories with story point estimates from diverse teams, used in PUMA Stage 2 to benchmark LLM effort estimation against a human baseline and published baselines (Deep-SE, CoGEE).
📋 Dataset Description
| Property | Value |
|---|---|
| Size | 23,000+ user stories |
| Source | Real software teams using Jira/Agile |
| Label | Story points (Fibonacci: 1,2,3,5,8,13,21) |
| Format | CSV with story title, description, acceptance criteria, SP |
| License | Apache 2.0 |
| GitHub | github.com/SOLAR-group/TAWOS |
📊 Story Point Distribution
| Story Points | % (approx.) |
|---|---|
| 1–2 | 18% |
| 3 | 22% |
| 5 | 28% |
| 8 | 18% |
| 13+ | 14% |
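A distribution like the one in the table can be reproduced by grouping raw story point values into the same buckets. The following is a minimal sketch under stated assumptions: the `bucket` helper and the sample values are hypothetical and for illustration only, the values are not TAWOS data, and the inputs are assumed to lie on the Fibonacci scale listed in the dataset description.

```python
from collections import Counter


def bucket(sp: int) -> str:
    """Map a story point value to the bucket labels used in the table."""
    if sp <= 2:
        return "1-2"
    if sp >= 13:
        return "13+"
    return str(sp)  # 3, 5 and 8 each have their own bucket


# Hypothetical story point values, for illustration only (not TAWOS data).
story_points = [1, 2, 3, 3, 5, 5, 5, 8, 13, 21]

bucket_counts = Counter(bucket(sp) for sp in story_points)
total = len(story_points)

# Percentage of stories per bucket.
distribution = {b: 100 * n / total for b, n in bucket_counts.items()}

for label in ["1-2", "3", "5", "8", "13+"]:
    print(f"{label:<4} {distribution.get(label, 0.0):5.1f}%")
```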
🔬 Reference Baselines
| System | MAE | Notes |
|---|---|---|
| Mean historical | ~3.5 SP | Per-project mean as predictor |
| Deep-SE | ~3.2 SP | Choetkiertikul et al. (2018) |
| CoGEE (GPT-4) | ~1.9 SP | Tawosi et al. (2024); state of the art |
PUMA H2 threshold: MAE ≤ 3.0 SP (minimum) / ≤ 1.5 SP (desired)
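To make the metric and the thresholds concrete, the sketch below computes the mean absolute error (MAE) for a set of predictions and for the "mean historical" baseline, then classifies each result against the two H2 thresholds. It is an illustration under stated assumptions, not PUMA's evaluation code: the function names and the sample values are hypothetical, and the baseline mean is computed over the same stories it is scored on, whereas a real evaluation would derive it from a separate historical split.

```python
from statistics import mean

# H2 thresholds from the note, in story points.
MAE_MINIMUM = 3.0
MAE_DESIRED = 1.5


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted story points."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def h2_verdict(mae):
    """Classify an MAE value against the two H2 thresholds."""
    if mae <= MAE_DESIRED:
        return "desired"
    if mae <= MAE_MINIMUM:
        return "minimum"
    return "not met"


# Hypothetical stories from one project, for illustration only (not TAWOS data).
actual = [1, 3, 5, 8, 13]
model_predictions = [2, 3, 5, 5, 8]  # stand-in for LLM estimates

# "Mean historical" baseline: predict the project's mean for every story.
project_mean = mean(actual)
baseline_predictions = [project_mean] * len(actual)

model_mae = mean_absolute_error(actual, model_predictions)
baseline_mae = mean_absolute_error(actual, baseline_predictions)

print(f"model MAE:    {model_mae:.2f} SP ({h2_verdict(model_mae)})")
print(f"baseline MAE: {baseline_mae:.2f} SP ({h2_verdict(baseline_mae)})")
```

With these made-up values the stand-in model reaches an MAE of 1.8 SP, which meets the minimum threshold but not the desired one, while the mean baseline lands at 3.6 SP and meets neither.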
📄 Dataset Citation
The TAWOS dataset is described and distributed through two sources:
- Conference paper (primary dataset description): Tawosi, V., Sarro, F., & Harman, M. (2022). TAWOS: The Agile Workflow Optimisation Suite. In Proceedings of the 19th International Conference on Mining Software Repositories (pp. 1–5). https://doi.org/10.1145/3524842.3528029
- GitHub dataset (data archive): Mousavi, S. H., & Giardino, C. (2023). TAWOS: The Agile Work of Stories dataset. GitHub. https://github.com/SOLAR-group/TAWOS
PUMA Usage
PUMA cites Tawosi et al. (2022) as the primary dataset reference and uses the GitHub archive (Mousavi & Giardino, 2023) for data access. Both citations are included in BIB-Master-APA7 for completeness.
🔗 Connected Notes
- Task defined in: PN-IssueTriage-StoryPoints (Story Points section)
- Hypothesis: EX-Hypotheses-H1-H2 (H2)
- Key paper: LN-KeyPapers-CoGEE-Angermeir-Flyvbjerg (CoGEE baseline)
- Experiment stages: EX-Stages-Overview
- Methods: PR-PUMA-Ch3-Methods (§3.2.2)
- Navigation: MOC-Literature-Review