LN: Zelikman et al. (2024) — Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Bibliographic Reference

Citation: Zelikman, E., Harik, G., Shao, Y., Jayasiri, V., Haber, N., & Goodman, N. D. (2024). Quiet-STaR: Language models can teach themselves to think before speaking. arXiv:2403.09629. COLM 2024. https://arxiv.org/abs/2403.09629


Pass 1 — Bird’s Eye View (5 Cs)

C: Assessment
Category: Training methodology proposal
Context: Extends STaR (Zelikman et al., 2022); unsupervised chain-of-thought training
Correctness: Evaluated zero-shot on CommonsenseQA (36.3% → 47.2%) and GSM8K (5.9% → 10.9%). Improvements are consistent, but absolute accuracy remains low.
Contributions: (1) LLMs generate internal rationales for every token during training; (2) Rationales that improve predictions are reinforced; (3) Emergent reasoning without supervised CoT examples
Clarity: Complex implementation but well-explained theory.
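
Sketch of Contributions (1) and (2): a minimal toy illustration, not the paper's implementation. It assumes that several rationales are sampled at one token position and that each is scored by the log-probability it gives to the true next tokens. The reward is the improvement over the average of the sampled rationales, and only rationales with positive reward are reinforced. The function names and the example numbers are hypothetical.

```python
from statistics import mean


def rationale_rewards(logp_with_thought):
    """Reward each sampled rationale by how much it beats the average.

    logp_with_thought: log-probabilities assigned to the true next tokens,
    one value per rationale sampled at the same token position (assumed input).
    """
    baseline = mean(logp_with_thought)
    return [lp - baseline for lp in logp_with_thought]


def reinforced(logp_with_thought):
    """Indices of rationales with positive reward (the ones to reinforce)."""
    rewards = rationale_rewards(logp_with_thought)
    return [i for i, r in enumerate(rewards) if r > 0]


# Three hypothetical rationales; the second one helps prediction the most.
scores = [-2.0, -1.0, -3.0]
print(rationale_rewards(scores))
print(reinforced(scores))
```

Subtracting the average means a rationale is rewarded only for being more helpful than its siblings, so no labeled CoT examples are needed as a training signal.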

Relevance: ⭐⭐⭐

Relevant as background on why CoT works. Not directly applicable to PUMA MVP (no training, only prompting). Useful for future work (fine-tuning section).


PUMA Connection

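Quiet-STaR does not test prompting directly, but it shows that rationales generated before a prediction can improve that prediction. This offers indirect support for why adding “Think step by step” to prompts (Zero-Shot CoT) can improve classification quality, and so strengthens the theoretical justification for PUMA’s Strategy 4 (CoT prompting). Reference for Ch.2 (prompting strategies background).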
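
For orientation, a zero-shot CoT classification prompt might be assembled as below. This is a hypothetical sketch only: the function names, the label set, and the 'Label:' output convention are assumptions for illustration and are not taken from PUMA's actual Strategy 4.

```python
def build_cot_prompt(text, labels, cot_trigger="Think step by step"):
    """Build a zero-shot CoT classification prompt (hypothetical format)."""
    label_list = ", ".join(labels)
    return (
        f"Classify the following text into one of: {label_list}.\n\n"
        f"Text: {text}\n\n"
        f"{cot_trigger}, then give the final label on its own line "
        f"in the form 'Label: <label>'."
    )


def parse_label(response, labels):
    """Return the label from the last 'Label:' line, or None if absent."""
    for line in reversed(response.splitlines()):
        if line.strip().lower().startswith("label:"):
            candidate = line.split(":", 1)[1].strip().lower()
            for label in labels:
                if candidate == label.lower():
                    return label
    return None


labels = ["bug", "feature"]  # hypothetical label set
print(build_cot_prompt("App crashes on login.", labels))
print(parse_label("Crashes are defects.\nLabel: bug", labels))
```

Asking for the label on a separate final line keeps the reasoning text out of the parsed answer, which makes the CoT variant comparable with strategies that do not use CoT.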

MOCs