PUMA Experiment Prompt — Issue Triage (Zero-Shot)

This is the actual prompt used in PUMA benchmark experiments. It runs inside the Ollama inference pipeline, not as a conversational prompt.

Experiment Context

Independent variable: This zero-shot strategy is the baseline condition for H1.


📋 The Prompt

TRIAGE_ZERO_SHOT_TEMPLATE = """You are an expert software project manager 
with 10+ years of experience triaging Jira issues.
 
Classify the priority of the following Jira issue into exactly one category:
- Critical: System down, data loss, security breach, complete blocker for all users
- High: Major feature broken, significant user impact, no acceptable workaround
- Medium: Feature degraded, minor data issues, workaround exists
- Low: Cosmetic issue, documentation, minor UX improvement, nice-to-have
 
Issue Title: {issue_title}
Issue Description: {issue_description}
 
Respond with ONLY the priority label. No explanation. No punctuation.
Valid responses: Critical | High | Medium | Low
 
Priority:"""

⚙️ Calling the Prompt

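from codecarbon import EmissionsTracker
import requests, time

def run_triage_zero_shot(issue: dict, model: str = "llama3.2:8b") -> dict:
    """Classify one issue with the zero-shot prompt; record latency and emissions.

    `model` must be a tag already pulled into the local Ollama instance.
    `parse_priority_label` is a pipeline helper not shown in this note.
    """
    prompt = TRIAGE_ZERO_SHOT_TEMPLATE.format(
        issue_title=issue["title"],
        # Truncate to 500 characters; `or ""` also covers a null description.
        issue_description=(issue.get("description") or "")[:500],
    )

    tracker = EmissionsTracker(project_name=f"puma-triage-zero-{model}")
    tracker.start()
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    try:
        response = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False,
                  # Fixed seed and zero temperature for repeatable output;
                  # num_predict caps generation at 10 tokens.
                  "options": {"seed": 42, "temperature": 0, "num_predict": 10}},
        )
    finally:
        # Stop the tracker even if the request raises.
        latency = time.perf_counter() - start
        emissions = tracker.stop()  # kg CO2-equivalent; may be None
    response.raise_for_status()  # surface HTTP errors such as an unknown model tag

    raw = response.json()["response"].strip()
    label = parse_priority_label(raw)  # map the raw text to a valid label

    return {
        "predicted": label,
        "raw_response": raw,
        "latency_s": latency,
        "emissions_gco2": None if emissions is None else emissions * 1000,  # kg → g
        "model": model,
        "strategy": "zero-shot",
    }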
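
The call above relies on `parse_priority_label`, which this note does not define. A minimal sketch of what such a helper might look like follows. It assumes the helper should return the first valid label found in the response, ignoring case, and `None` when nothing matches; both behaviors are assumptions, not the pipeline's confirmed logic.

```python
import re
from typing import Optional

# Labels taken from the prompt's "Valid responses" line.
VALID_LABELS = ("Critical", "High", "Medium", "Low")
_LABEL_RE = re.compile(r"\b(" + "|".join(VALID_LABELS) + r")\b", re.IGNORECASE)


def parse_priority_label(raw: str) -> Optional[str]:
    """Hypothetical helper: first valid label in `raw`, or None if absent."""
    match = _LABEL_RE.search(raw)
    # Normalize casing so "HIGH" and "high" both become "High".
    return match.group(1).capitalize() if match else None
```

Returning `None` instead of a default label keeps unparseable responses visible in the results, so they can be counted separately from misclassifications.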


id: PT-PUMA-Triage-FewShot3
title: “PUMA Experiment Prompt — Issue Triage (Few-Shot 3)”

PUMA Experiment Prompt — Issue Triage (Few-Shot k=3)

TRIAGE_FEW_SHOT_3_TEMPLATE = """You are an expert software project manager 
triaging Jira issues for priority.
 
Priority definitions:
- Critical: System down, data loss, security breach, complete blocker
- High: Major feature broken, significant user impact, no workaround
- Medium: Feature degraded, workaround exists, moderate impact
- Low: Cosmetic, documentation, minor UX, enhancement
 
Examples:
---
Issue: "Production database not responding — all API calls failing since 03:00 UTC"
Priority: Critical
 
Issue: "User search returns incorrect results when filtering by date range"
Priority: High
 
Issue: "Export to CSV button tooltip text is truncated on Firefox"
Priority: Low
---
 
Now classify:
Issue Title: {issue_title}
Issue Description: {issue_description}
 
Priority:"""


id: PT-PUMA-Triage-CoT
title: “PUMA Experiment Prompt — Issue Triage (Chain-of-Thought)”

PUMA Experiment Prompt — Issue Triage (CoT)

TRIAGE_COT_TEMPLATE = """You are an expert software project manager.
Classify this Jira issue's priority using step-by-step reasoning.
 
Priority levels: Critical | High | Medium | Low
 
Issue Title: {issue_title}
Issue Description: {issue_description}
 
Think step by step:
1. Scope: How many users are affected? (all / many / some / few)
2. Severity: What is the functional impact? (system-down / broken / degraded / cosmetic)
3. Urgency: Is there a workaround? (none / partial / yes)
4. Environment: Production, staging, or development?
 
Based on this analysis:
Priority: [state ONLY: Critical | High | Medium | Low]"""
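
Unlike the zero-shot prompt, the CoT prompt produces reasoning text before the final answer, so the label has to be pulled out of a longer response. The sketch below takes the last `Priority:` line that names a valid label. It assumes the CoT request raises `num_predict` well above the 10 tokens used for zero-shot so the reasoning is not cut off; the function name `extract_cot_priority` is hypothetical.

```python
import re
from typing import Optional

# Matches "Priority:" followed by optional non-word characters (spaces,
# brackets, markdown asterisks) and then one of the four labels.
_PRIORITY_LINE = re.compile(
    r"priority\s*:\W*(critical|high|medium|low)\b", re.IGNORECASE
)


def extract_cot_priority(response_text: str) -> Optional[str]:
    """Hypothetical helper: label from the last 'Priority:' statement, or None."""
    matches = _PRIORITY_LINE.findall(response_text)
    # The final statement is taken as the model's conclusion.
    return matches[-1].capitalize() if matches else None
```

Taking the last match rather than the first guards against a model that states a tentative priority mid-reasoning and then revises it.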


id: PT-PUMA-Estimation-FewShot
title: “PUMA Experiment Prompt — Effort Estimation (Few-Shot)”

PUMA Experiment Prompt — Effort Estimation (Few-Shot)

Stage Context

Used in Stage 2 (TAWOS dataset). This prompt tests H2.

ESTIMATION_FEW_SHOT_TEMPLATE = """You are an experienced Agile project manager 
estimating story points for user stories.
 
Story points follow the Fibonacci scale: 1, 2, 3, 5, 8, 13, 21
- 1 SP: Trivial change, under 1 hour, well-understood
- 2 SP: Simple task, 2-4 hours, minimal unknowns
- 3 SP: Moderate, half-day to 1 day, some complexity
- 5 SP: Complex, 1-2 days, several moving parts
- 8 SP: Very complex, 3-4 days, significant uncertainty
- 13 SP: Large, multiple days, high uncertainty
- 21 SP: Epic-level, break into smaller stories if possible
 
Examples from similar projects:
---
Story: "Add 'forgot password' link to login page that sends reset email"
Acceptance: Email sent within 30s, link expires in 24h, secure token
Story Points: 3
 
Story: "Implement real-time notifications using WebSockets for task updates"
Acceptance: <200ms delivery, supports 10K concurrent users, fallback to polling
Story Points: 8
 
Story: "Fix typo in error message on checkout form"
Acceptance: Typo corrected, no regression in form validation
Story Points: 1
---
 
Estimate this story:
Title: {story_title}
Description: {story_description}
Acceptance Criteria: {acceptance_criteria}
 
Respond with ONLY the numeric story point value from: 1, 2, 3, 5, 8, 13, 21
 
Story Points:"""
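
The prompt asks for a bare number from the Fibonacci scale, but a model may still add units or return an off-scale value. A minimal sketch of a parser for that response is shown below. The name `parse_story_points` and the choice to reject off-scale values instead of snapping them to the nearest point are assumptions for illustration.

```python
import re
from typing import Optional

# Scale copied from the prompt text.
FIBONACCI_SCALE = (1, 2, 3, 5, 8, 13, 21)


def parse_story_points(raw: str) -> Optional[int]:
    """Hypothetical helper: first integer in `raw` if it is on the scale."""
    match = re.search(r"\d+", raw)
    if match is None:
        return None
    value = int(match.group())
    # Off-scale answers (e.g. 4) are rejected, not rounded.
    return value if value in FIBONACCI_SCALE else None
```

Rejecting off-scale answers keeps invalid outputs measurable as a separate failure mode; rounding them would silently fold them into the error metric.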


🧪 PUMA Experiment Prompts — Master Reference

This note contains the canonical prompts used in PUMA’s four experimental strategies. Any change to these prompts must: (1) increment the version number, (2) be logged in AI-Use-Log, (3) trigger a new experiment run.
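
One way to notice an unlogged prompt change is to record a short hash of each template alongside its version and compare it before a run. The sketch below is an illustration only: `prompt_fingerprint` and `has_changed` are hypothetical names, and nothing in this note says the PUMA pipeline performs such a check.

```python
import hashlib


def prompt_fingerprint(template: str) -> str:
    """Short SHA-256 digest of the exact template text."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def has_changed(template: str, recorded_fingerprint: str) -> bool:
    """True when the template no longer matches the recorded digest."""
    return prompt_fingerprint(template) != recorded_fingerprint
```

Because the digest covers every character, even a whitespace edit to a template would register as a change that calls for a version bump and a new run.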


Strategy Registry

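| ID | Strategy | N Examples | Used In Stage |
|----|----------|------------|---------------|
| S1 | Zero-Shot | 0 | 1–5 |
| S2 | Few-Shot-3 | 3 | 1–5 |
| S3 | Few-Shot-6 | 6 | 1–5 |
| S4 | CoT (Chain-of-Thought) | 0 + reasoning | 1–5 |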

Full Prompt Specifications

See PT-P2-003-PromptEngineering-AgentPrompts for the complete prompt text for all four strategies.


Prompt Version History

| Version | Date | Changes | Experiment Re-run? |
|---------|------|---------|--------------------|
| v1.0 | 2026-04-07 | Initial prompts | Yes (baseline) |

MOCs


🔗 All Experiment Prompts

| Prompt | Stage | Strategy | Notes |
|--------|-------|----------|-------|
| PT-PUMA-Triage-ZeroShot | 1 | Zero-shot | Baseline condition |
| PT-PUMA-Triage-FewShot3 | 1 | Few-shot k=3 | 1 example each for Critical, High, and Low |
| PT-PUMA-Triage-FewShot6 | 1 | Few-shot k=6 | 2 examples per critical class |
| PT-PUMA-Triage-CoT | 1 | Chain-of-Thought | Zero-shot + reasoning |
| PT-PUMA-Estimation-FewShot | 2 | Few-shot | 3 examples across scale |
| PT-PUMA-Estimation-CoT | 2 | CoT | Structured reasoning |