LN: Wei et al. (2022) — Chain-of-Thought Prompting

Bibliographic Reference

Citation: Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems. https://arxiv.org/abs/2201.11903

Pass 1 — Bird’s Eye View (5 Cs)

C	Assessment
Category	Prompting technique paper
Context	Google Research (Brain Team). Demonstrates that including intermediate reasoning steps (a “chain of thought”) in few-shot examples substantially improves LLM performance on multi-step reasoning tasks
Correctness	Evaluated on arithmetic (e.g. GSM8K), commonsense (e.g. StrategyQA), and symbolic reasoning benchmarks; the gains appear mainly at model scales of roughly 100B parameters and above
Contributions	(1) Chain-of-thought (CoT) prompting as an ability that emerges with model scale; (2) Few-shot CoT: worked reasoning examples in the prompt; (3) Evidence that CoT is only effective at around 100B parameters or more. Note: zero-shot CoT (“Let’s think step by step”) comes from the follow-up work of Kojima et al. (2022), not from this paper
Clarity	Excellent — clear methodology, ablations, and discussion of failure modes

Relevance: ⭐⭐⭐⭐⭐

CoT is PUMA’s Strategy 4 (the most complex prompting condition). For effort estimation (H2), reasoning through “this issue is similar to X because…” explicitly mirrors the human PM estimation process, and Wei et al. provide empirical evidence that such intermediate reasoning improves performance on multi-step tasks.

Pass 2 — Key Concepts

What Is Chain-of-Thought?

Standard few-shot prompting provides input-output examples:

Input: "Bug: login fails after password reset"
Output: "Bug, High"

CoT prompting includes intermediate reasoning:

Input: "Bug: login fails after password reset"
Reasoning: "Login failure is a blocking user experience issue. Password reset
            affects authentication — a core system function. This warrants
            High priority, not Critical (no data loss or security breach)."
Output: "Bug, High"

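The reasoning chain is not just explanatory: the model generates it before the answer, so the output is conditioned on the stated decision path and is more likely to be consistent with it. Wei et al. caution that a generated chain is not guaranteed to be correct.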
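The two formats above differ only in whether each demonstration carries a reasoning field. A minimal sketch of a builder that renders both, assuming the Input/Reasoning/Output layout used in this note; the `Example` class, the `build_prompt` function, and the CSV-export query are hypothetical illustrations, not taken from Wei et al. or from PUMA's code:

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot demonstration; `reasoning` is only rendered for CoT prompts."""
    issue: str
    reasoning: str
    label: str


def build_prompt(examples: list[Example], query: str, with_reasoning: bool) -> str:
    """Render the demonstrations, then the unanswered query."""
    blocks = []
    for ex in examples:
        lines = [f'Input: "{ex.issue}"']
        if with_reasoning:
            lines.append(f'Reasoning: "{ex.reasoning}"')
        lines.append(f'Output: "{ex.label}"')
        blocks.append("\n".join(lines))
    # End on the field the model should produce first:
    # the reasoning for CoT, the label for standard few-shot.
    cue = "Reasoning:" if with_reasoning else "Output:"
    blocks.append(f'Input: "{query}"\n{cue}')
    return "\n\n".join(blocks)


demo = Example(
    issue="Bug: login fails after password reset",
    reasoning="Login failure blocks users and touches authentication, "
              "but there is no data loss or security breach.",
    label="Bug, High",
)
query = "Bug: export to CSV times out"  # hypothetical new issue
standard_prompt = build_prompt([demo], query, with_reasoning=False)
cot_prompt = build_prompt([demo], query, with_reasoning=True)
```

Ending the CoT prompt on `Reasoning:` nudges the model to write its chain before the label, which is the ordering the CoT format relies on.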

The Scaling Law for CoT

A critical finding: CoT is an emergent ability of model scale. For small models it does not improve performance and can even hurt it, because they produce chains of thought that are fluent but illogical. Consistent gains appear only at around 100B parameters and above.

Implication for PUMA: Smaller models (Llama 3.2 8B, Mistral 7B, Phi-3.5 Mini) may not benefit from CoT to the same degree as GPT-4o or Claude. This is a testable hypothesis in PUMA’s experimental design.

Zero-Shot CoT

Appending “Let’s think step by step” to a prompt elicits basic CoT behaviour without any examples. This result comes from Kojima et al. (2022), a follow-up to Wei et al., and it suggests that the reasoning structure is latent in the model rather than dependent on specific examples.
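A minimal sketch of the zero-shot variant, assuming a simple Q/A layout in which the trigger phrase opens the answer; the function name, the task instruction, and the sample issue are hypothetical:

```python
COT_TRIGGER = "Let's think step by step."


def zero_shot_cot_prompt(task: str, question: str) -> str:
    """Build a prompt with no demonstrations that ends on the CoT trigger."""
    # The trigger is placed at the start of the answer so the model
    # continues with its reasoning before committing to a result.
    return f"{task}\n\nQ: {question}\nA: {COT_TRIGGER}"


prompt = zero_shot_cot_prompt(
    task="Classify the issue type and priority.",    # hypothetical instruction
    question="Bug: login fails after password reset",
)
```

Because the model's reply is free-form reasoning, a second step is usually needed to pull the final label out of the generated text; that step is omitted here.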

PUMA’s Prompting Strategies

Strategy	CoT Role	Role in PUMA H1/H2
S1: Zero-Shot	No reasoning	Baseline
S2: Few-Shot-3	No reasoning, 3 examples	Pattern matching
S3: Few-Shot-6	No reasoning, 6 examples	Pattern matching
S4: CoT + Few-Shot	Reasoning chain + examples	Most complex

Wei et al. directly motivate Strategy S4 and provide the empirical basis for why PUMA expects CoT to outperform S1–S3 on estimation tasks.
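The four conditions can be written down as data so that one prompt builder serves all of them. This is a sketch only: the `Strategy` class and its field names are hypothetical, and the number of examples for S4 is left as `None` because this note does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    n_examples: Optional[int]  # None = not specified in this note
    with_reasoning: bool       # True only for the CoT condition


STRATEGIES = {
    "S1": Strategy("Zero-Shot", 0, False),
    "S2": Strategy("Few-Shot-3", 3, False),
    "S3": Strategy("Few-Shot-6", 6, False),
    "S4": Strategy("CoT + Few-Shot", None, True),
}

# Conditions whose demonstrations include a reasoning chain.
cot_conditions = [key for key, s in STRATEGIES.items() if s.with_reasoning]
```

Keeping the reasoning flag separate from the example count makes it explicit that S4 differs from S2 and S3 in two ways at once unless its example count is matched to one of them.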

PUMA Integration

Ch.2 Literature Review: CoT is the theoretical foundation for PUMA’s S4 prompting strategy
Ch.3 Methods: Experimental condition S4 implemented per Wei et al.’s few-shot CoT format
H2 hypothesis: Expect CoT to improve story point estimation (lower MAE) more than it improves triage classification
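For reference, MAE (mean absolute error) is the average absolute difference between estimated and actual story points, so lower is better. A minimal sketch with made-up numbers that are purely illustrative and not PUMA results:

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average of |actual - predicted| over paired estimates."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical story points for four issues (illustration only).
actual = [3, 5, 8, 2]
predicted = [5, 5, 3, 1]
mae = mean_absolute_error(actual, predicted)  # (2 + 0 + 5 + 1) / 4
```

Comparing this value across S1 to S4 on the same set of issues is one way to test whether the CoT condition reduces estimation error.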

Related Notes

PN-CoT-FewShot-Prompting: permanent note integrating CoT, few-shot, and zero-shot strategies
PN-COSTAR-SelfConsistency: self-consistency as an extension of CoT
LN-Yao-2023-TreeOfThoughts: ToT as a generalisation of CoT to tree search
LN-Shinn-2023-Reflexion: Reflexion as iterative CoT with self-critique
EX-Hypotheses-H1-H2: PUMA experimental conditions
