LN: Zelikman et al. (2024) — Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Bibliographic Reference
Citation: Zelikman, E., Harik, G., Shao, Y., Jayasiri, V., Haber, N., & Goodman, N. D. (2024). Quiet-STaR: Language models can teach themselves to think before speaking. arXiv:2403.09629. COLM 2024. https://arxiv.org/abs/2403.09629
Pass 1 — Bird’s Eye View (5 Cs)
| C | Assessment |
|---|---|
| Category | Training methodology proposal |
| Context | Extends STaR (Zelikman et al., 2022); unsupervised chain-of-thought training |
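| Correctness | Evaluated zero-shot on GSM8K and CommonsenseQA after continued pretraining of Mistral 7B on web text, with no task-specific fine-tuning. The gains are real, but absolute accuracy stays low: GSM8K 5.9% → 10.9%, CommonsenseQA 36.3% → 47.2%. Perplexity on difficult tokens in natural text also improves. |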
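| Contributions | (1) During training, the LM samples a short internal rationale (a "thought", delimited by learned start-of-thought and end-of-thought tokens) after every token of ordinary text, with thoughts generated in parallel; (2) thoughts that raise the likelihood of the true upcoming tokens are reinforced with REINFORCE, and a learned mixing head interpolates predictions with and without the thought; (3) reasoning ability emerges without supervised CoT examples or task-specific data |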
| Clarity | Implementation is complex (parallel thought sampling with custom attention masks), but the method is clearly explained. |
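Contribution (2) can be sketched in a few lines. This is a minimal sketch based on our reading of the paper, not the authors' code: at one text position several thoughts are sampled; each thought's reward is the log-likelihood it gives to the same `n_true` ground-truth upcoming tokens, minus the mean over all thoughts sampled at that position (a baseline); a REINFORCE loss then pushes up the probability of positive-reward thoughts. The function names, the plain-list inputs, and the demo numbers are hypothetical. A real implementation works on batched tensors with autodiff and also trains the mixing head and the thought tokens, which this sketch omits.

```python
from typing import List, Sequence


def thought_rewards(future_logprobs: Sequence[Sequence[float]]) -> List[float]:
    """Reward for each thought sampled at the same text position.

    future_logprobs[j] holds the log-probabilities the model assigned to the
    same n_true ground-truth upcoming tokens after emitting thought j.
    Reward = that thought's total log-likelihood minus the mean over all
    thoughts at this position (the baseline), so r_j > 0 means thought j
    helped predict the real continuation more than an average thought did.
    """
    scores = [sum(lps) for lps in future_logprobs]
    baseline = sum(scores) / len(scores)
    return [s - baseline for s in scores]


def reinforce_loss(rewards: Sequence[float], thought_logprobs: Sequence[float]) -> float:
    """REINFORCE surrogate loss averaged over thoughts: -mean_j(r_j * log p(thought_j)).

    With autodiff, rewards are treated as constants; minimising this loss
    raises the probability of positive-reward thoughts and lowers it for
    negative-reward ones.
    """
    return -sum(r * lp for r, lp in zip(rewards, thought_logprobs)) / len(rewards)


if __name__ == "__main__":
    # Hypothetical numbers: three thoughts, n_true = 2 future tokens each.
    future = [[-0.5, -0.7], [-1.0, -1.4], [-0.9, -0.9]]
    rewards = thought_rewards(future)
    print([round(x, 3) for x in rewards])
    # Hypothetical log-probabilities of the three thoughts themselves.
    print(round(reinforce_loss(rewards, [-3.0, -2.0, -4.0]), 3))
```

Using the mean over thoughts as the baseline means rewards at a position sum to zero, so only thoughts that beat their siblings are reinforced; this reduces variance compared with rewarding raw log-likelihood.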
Relevance: ⭐⭐⭐
Relevant as background on why intermediate reasoning helps LMs. Not directly applicable to the PUMA MVP, which uses prompting only (no training). Useful for future work (fine-tuning section).
PUMA Connection
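Quiet-STaR gives indirect support for why eliciting intermediate reasoning helps: training a model to generate internal rationales before predicting upcoming text improves its predictions and its zero-shot reasoning, without supervised CoT data. It does not mechanistically explain Zero-Shot CoT prompting (adding “Let's think step by step” to the prompt), but it is consistent with the expectation that CoT prompting can improve PUMA’s classification quality. This supports the theoretical justification for PUMA’s Strategy 4 (CoT prompting). Reference for Ch.2 (prompting strategies background).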
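As a hedged illustration of what Strategy 4 could look like, here is a two-stage Zero-Shot CoT prompt in the style of Kojima et al. (2022). The first stage is a reasoning prompt ending with the trigger phrase; the second appends the model's reasoning and asks for the final label. PUMA's actual prompt wording and label set are not given in this note: the function names, the label list, and the example text are hypothetical, and the model call itself is left out.

```python
from typing import Optional, Sequence

# Zero-Shot CoT trigger phrase from Kojima et al. (2022).
COT_TRIGGER = "Let's think step by step."


def build_reasoning_prompt(text: str, labels: Sequence[str]) -> str:
    """Stage 1: ask the model to reason before committing to a label."""
    return (
        f"Classify the following text as one of: {', '.join(labels)}.\n"
        f"Text: {text}\n"
        f"A: {COT_TRIGGER}"
    )


def build_answer_prompt(reasoning_prompt: str, reasoning: str) -> str:
    """Stage 2: append the model's reasoning and ask for the final label only."""
    return f"{reasoning_prompt} {reasoning.strip()}\nTherefore, the label is"


def parse_label(completion: str, labels: Sequence[str]) -> Optional[str]:
    """Return the label mentioned earliest in the completion, or None."""
    lowered = completion.lower()
    hits = [(lowered.find(lab.lower()), lab) for lab in labels if lab.lower() in lowered]
    return min(hits)[1] if hits else None


if __name__ == "__main__":
    labels = ["positive", "negative", "neutral"]  # hypothetical label set
    p1 = build_reasoning_prompt("The update broke my workflow.", labels)
    # p1 would be sent to the model; its completion is the reasoning text.
    p2 = build_answer_prompt(p1, "The user reports a regression, which is a complaint.")
    print(p2)
    print(parse_label(" negative.", labels))  # -> "negative"
```

Splitting reasoning from answer extraction keeps the final completion short, so parsing the label does not depend on how the free-form reasoning is phrased.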