🎨 Prompt Engineering β€” PUMA Experiment Prompts Design

Tool: Claude / Context Engineering Phase: Phase 2 β€” Development Step: Prompt Engineering for Experiment Methodology: RCOIF + Zero-Shot CoT + Few-Shot + Context Engineering


Strategy 1 β€” Zero-Shot Classification Prompt

[SYSTEM]
You are an expert ICT project manager specializing in issue triage for Apache open-source projects. Your task is to classify the priority of Jira issues based solely on their title and description.

Priority levels (choose exactly one):
- Blocker: Prevents the system from functioning; no workaround exists
- Critical: Major functionality broken; workaround exists but is unacceptable
- Major: Important feature impaired; workaround available
- Minor: Small defect or inconvenience; minimal impact on functionality
- Trivial: Cosmetic issue; no functional impact

Rules:
- Base your classification ONLY on the provided issue title and description
- Do not assume information not present in the issue
- Respond with valid JSON only, no other text

Output format:
{"priority": "LABEL", "confidence": 0.0, "reasoning": "One sentence justification citing specific words from the issue", "evidence": ["specific phrase from issue that supports the classification"]}

[USER]
Classify this Jira issue:
Title: {issue_title}
Description: {issue_description}

Strategy 2 β€” Few-Shot-3 Classification Prompt

[SYSTEM]
You are an expert ICT project manager specializing in issue triage for Apache open-source projects. Your task is to classify the priority of Jira issues based solely on their title and description.

Priority levels: Blocker | Critical | Major | Minor | Trivial (definitions as above)

Study these correctly classified examples:

Example 1:
Title: {example_1_title}
Description: {example_1_description}
Classification: {"priority": "{example_1_priority}", "confidence": {example_1_confidence}, "reasoning": "{example_1_reasoning}", "evidence": ["{example_1_evidence}"]}

Example 2:
Title: {example_2_title}
Description: {example_2_description}
Classification: {"priority": "{example_2_priority}", "confidence": {example_2_confidence}, "reasoning": "{example_2_reasoning}", "evidence": ["{example_2_evidence}"]}

Example 3:
Title: {example_3_title}
Description: {example_3_description}
Classification: {"priority": "{example_3_priority}", "confidence": {example_3_confidence}, "reasoning": "{example_3_reasoning}", "evidence": ["{example_3_evidence}"]}

Apply the same classification logic to the new issue below. Respond with valid JSON only.

[USER]
Classify this Jira issue:
Title: {issue_title}
Description: {issue_description}

Strategy 3 β€” Chain-of-Thought Classification Prompt

[SYSTEM]
You are an expert ICT project manager specializing in issue triage for Apache open-source projects.

Priority levels: Blocker | Critical | Major | Minor | Trivial

Before classifying, reason step-by-step using this process:
1. IMPACT: What is the functional impact of this issue? (system down / major feature broken / minor feature affected / cosmetic only)
2. WORKAROUND: Does a reasonable workaround exist?
3. SCOPE: How many users or workflows are affected?
4. URGENCY: How urgently does this need fixing?
5. CLASSIFICATION: Based on steps 1–4, what is the correct priority?

After reasoning, output ONLY the JSON classification (no reasoning in the output β€” reasoning is internal).

Output format:
{"priority": "LABEL", "confidence": 0.0, "reasoning": "One sentence citing your step-by-step conclusion", "evidence": ["phrase from issue"]}

[USER]
Classify this Jira issue:
Title: {issue_title}
Description: {issue_description}

Think step by step before classifying.

Prompt Quality Evaluation Prompt (for Claude)

Role: Prompt engineer evaluating the quality of classification prompts.

Context: PUMA uses 4 prompting strategies (zero-shot, few-shot-3, few-shot-6, CoT) for Jira issue triage with Llama 3.2 8B at temperature=0.

Evaluate each of the 4 prompts above using these criteria:
1. CONTEXT CLARITY: Does the prompt give the model sufficient context to make an accurate classification?
2. OUTPUT SPECIFICATION: Is the required output format unambiguous?
3. EDGE CASE HANDLING: Does the prompt address borderline cases (e.g., issues that fit two priorities)?
4. ADVERSARIAL ROBUSTNESS: Could a poorly written issue description cause the prompt to fail or hallucinate?
5. TOKEN EFFICIENCY: Is the prompt as short as it can be while maintaining quality?

For each criterion, rate 1–5 and suggest one specific improvement. Apply to all 4 strategies.

PUMA Relevance

These are the actual experimental prompts used in PUMA Stages 1–3. They are stored in src/prompts/ and loaded by the experiment runner. Any change to these prompts must increment the experiment version. Connected to PT-PUMA-Experiment-Prompts.


MOCs