Chain-of-Thought (CoT) Prompting
Atomic Claim
Overview
Instructing a language model to reason step-by-step before producing a final answer (CoT) consistently improves performance on tasks requiring multi-step inference — including issue triage and effort estimation — compared to direct answer prompting.
💡 The Concept
Chain-of-Thought prompting elicits intermediate reasoning steps from LLMs before they produce a final answer. Two main variants:
Zero-Shot CoT
Add a simple trigger to the prompt — no examples needed:
"Think step by step before giving your final answer."
"Let's reason through this carefully:"
"Walk me through your reasoning before classifying:"
Few-Shot CoT
Provide examples where the reasoning chain is shown:
Example 1:
Issue: "Login page returns 500 error for all users"
Reasoning: This affects all users (scope: entire system),
is a hard blocker (severity: critical),
is a production system (environment: prod).
Priority: Critical
Example 2:
Issue: "Dashboard graph legend overlaps text on mobile"
Reasoning: This is a visual issue (severity: minor),
affects only mobile users (scope: partial),
has a workaround (pinch to zoom).
Priority: Low
Now classify:
Issue: [NEW ISSUE]
Reasoning: [MODEL GENERATES HERE]
Priority: [FINAL ANSWER]
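The two variants above can be sketched as small prompt builders. This is a minimal illustration, not the PUMA implementation: the function names, the closing "Finish with a line..." instruction, and the output parser are assumptions, and the four-label set is taken from the triage template later in this document.

```python
import re

# Label set as listed in the issue-triage template of this document.
LABELS = ("Critical", "High", "Medium", "Low")
COT_TRIGGER = "Think step by step before giving your final answer."

# Demonstrations copied from the two worked examples above.
EXAMPLES = [
    {
        "issue": "Login page returns 500 error for all users",
        "reasoning": "This affects all users (scope: entire system), "
                     "is a hard blocker (severity: critical), "
                     "is a production system (environment: prod).",
        "priority": "Critical",
    },
    {
        "issue": "Dashboard graph legend overlaps text on mobile",
        "reasoning": "This is a visual issue (severity: minor), "
                     "affects only mobile users (scope: partial), "
                     "has a workaround (pinch to zoom).",
        "priority": "Low",
    },
]


def build_zero_shot_cot_prompt(issue, trigger=COT_TRIGGER):
    """Zero-shot CoT: task statement, issue, and a reasoning trigger; no examples."""
    return (
        f"Classify the priority of this issue as one of: {', '.join(LABELS)}.\n"
        f'Issue: "{issue}"\n'
        f"{trigger}\n"
        # Assumed instruction so the final label is easy to parse.
        "Finish with a line of the form 'Priority: <label>'."
    )


def build_few_shot_cot_prompt(issue, examples=EXAMPLES):
    """Few-shot CoT: each demonstration shows its reasoning chain before the label."""
    parts = []
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}:\n"
            f'Issue: "{ex["issue"]}"\n'
            f"Reasoning: {ex['reasoning']}\n"
            f"Priority: {ex['priority']}\n"
        )
    # End on "Reasoning:" so the model writes its chain before the label.
    parts.append(f'Now classify:\nIssue: "{issue}"\nReasoning:')
    return "\n".join(parts)


PRIORITY_RE = re.compile(
    r"^Priority:\s*(" + "|".join(LABELS) + r")\b",
    re.IGNORECASE | re.MULTILINE,
)


def parse_priority(completion):
    """Return the last 'Priority: <label>' line in a completion, or None."""
    matches = PRIORITY_RE.findall(completion)
    return matches[-1].capitalize() if matches else None
```

Taking the last match means a label mentioned inside the reasoning chain does not override the final answer. Completions with no parseable label return `None`, so they can be counted separately rather than silently mapped to a class.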
📊 Relevance to PUMA
| Strategy | H1 Test | Expected Effect |
|---|---|---|
| Zero-shot (no CoT) | Baseline comparison | Lower F1 |
| Zero-shot CoT | Experimental condition | Moderate improvement |
| Few-shot-3 | Experimental condition | Significant improvement |
| Few-shot-6 CoT | Experimental condition | Highest F1 expected |
Key research question: Does CoT help more with triage (classification) or estimation (regression proxy)?
Key reference: Wei et al. (2022) show that CoT gains are most pronounced for models of roughly 100B parameters or more, so the benefit may be limited for small local models (8B parameters or fewer) such as Llama 3.2 and Mistral 7B. This is a testable prediction in PUMA H1.
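The F1 comparison in the table above needs a scoring function that is applied identically to every strategy. The sketch below computes macro-averaged F1 in plain Python. Macro averaging is an assumption here, since the note only says "F1", and the function name is hypothetical.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores.

    A class with no true and no predicted instances scores 0.0,
    so pass only the labels that should count towards the average.
    """
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Running the same function over the predictions from each strategy (and separately for triage and estimation outputs) keeps the comparison across conditions like for like.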
🔗 Connected Ideas
- Extends: PN-CoT-FewShot-Prompting (Few-Shot section) | Used in: PT-PUMA-Experiment-Prompts
- Tested in: EX-Hypotheses-H1-H2 (S4 strategy)
- Contrasts with: Zero-shot direct prompting | Enabled by: PN-RCOIF-Framework (I component)
- PM target: PN-IssueTriage-StoryPoints | Structure: ST-Prompting-Strategies
- Grounded by: PN-ReAct-AgentPattern (Stage 4 extension)
id: PN-Few-Shot-Prompting
title: "Few-Shot Prompting"
type: permanent-note
category: concept
tags: [permanent, concept, prompting, few-shot, in-context-learning, llm]
aliases: ["Few-Shot", "In-Context Learning", "ICL", "k-shot"]
created: 2026-03-01
maturity: evergreen
Few-Shot Prompting
Atomic Claim
Providing k labelled examples (k=3,6) within the prompt context enables LLMs to adapt their classification behaviour to the target task distribution without gradient updates — a form of in-context learning that is the primary variable being manipulated in PUMA Stages 1 and 2.
💡 The Concept
Few-shot prompting places k (input, label) pairs in the prompt before the test instance. The model uses these as implicit demonstrations of the desired mapping.
Why it works: the model infers the pattern (task format, label space, reasoning style) from the demonstrations and applies it to novel instances, with no change to its weights.
Key design decisions for PUMA:
- Example selection: stratified sampling across the four priority classes. Because k=3 and k=6 do not divide evenly over four classes, coverage is necessarily uneven (the k=3 template below has no Medium example)
- Example ordering: randomised order to mitigate recency bias
- Example diversity: cover edge cases, not just easy examples
- Label format: consistent with the expected output format
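The selection and ordering decisions above can be sketched as follows. This assumes a pool of labelled issues stored as dicts with hypothetical `title` and `priority` keys. The round-robin strategy is one possible way to stratify, not necessarily the procedure used in PUMA.

```python
import random
from collections import defaultdict


def select_examples(pool, k, seed=0):
    """Pick k demonstrations spread across classes, then shuffle their order.

    Classes are visited round-robin, so no class gets a second example
    before every class with remaining items has been used once.
    """
    rng = random.Random(seed)  # fixed seed keeps the prompt reproducible

    by_label = defaultdict(list)
    for item in pool:
        by_label[item["priority"]].append(item)
    for items in by_label.values():
        rng.shuffle(items)

    # Randomise which classes are served first when k is not a multiple
    # of the number of classes.
    labels = sorted(by_label)
    rng.shuffle(labels)

    chosen = []
    while len(chosen) < k and any(by_label[label] for label in labels):
        for label in labels:
            if len(chosen) == k:
                break
            if by_label[label]:
                chosen.append(by_label[label].pop())

    # Random presentation order, to reduce the influence of the last example.
    rng.shuffle(chosen)
    return chosen
```

If the pool holds fewer than k items, the function returns everything it has rather than raising. Logging the seed alongside each run makes the chosen demonstrations recoverable later.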
📋 Template for Issue Triage (k=3)
You are an expert software project manager.
Classify the priority of Jira issues.
Priority classes: Critical | High | Medium | Low
Examples:
---
Issue: "Database connection pool exhausted — 100% of API requests failing"
Priority: Critical
Issue: "Search results take 8 seconds to load instead of 2 seconds"
Priority: High
Issue: "Tooltip text is cut off at the right edge in Firefox"
Priority: Low
---
Now classify this issue:
Issue: "{issue_title} — {issue_description}"
Priority:
🔗 Connected Ideas
- Complements: PN-CoT-FewShot-Prompting (CoT section) | Contrasts with: Zero-shot
- Key paper: LN-Calikli-2025-RequestFormats (request format affects estimation non-monotonically)
- Tested in: EX-Hypotheses-H1-H2 (S2, S3 strategies)
- PM application: PN-IssueTriage-StoryPoints | Framework: PN-RCOIF-Framework
- Datasets: LN-Datasets-JiraSR-TAWOS | Structure: ST-Prompting-Strategies
- MOC: MOC-Methods-Frameworks
- Foundational paper: LN-Wei-2022-ChainOfThought (Wei et al., 2022: original CoT paper, scaling analysis, zero-shot CoT)