🎬 A Visual Tour of Modern LLM Architectures
Video Details
Channel: Sebastian Raschka
URL: https://www.youtube.com/watch?v=CepbWmGie0E
Relevance: ⭐⭐⭐⭐
Summary
Sebastian Raschka (PhD, ML researcher) provides a visual walkthrough of modern LLM architectures: transformer internals, attention mechanisms, positional encodings, the decoder-only architecture (GPT-style), instruction fine-tuning, RLHF, and the transition to mixture-of-experts models. The video uses high-quality animations throughout.
PUMA Relevance
Essential background for understanding why PUMA’s local models (Llama 3.2 8B, Mistral 7B) behave the way they do. The instruction fine-tuning section explains why instruct-tuned variants outperform base models on PUMA’s classification task. The attention mechanism explanation provides intuition for why few-shot examples in the context window improve classification.
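To make the attention intuition concrete, here is a minimal sketch of causal scaled dot-product attention in plain Python. It is not taken from the video or from PUMA's code. The function names (`softmax`, `causal_attention`) and the toy 2-dimensional vectors are assumptions chosen for illustration, and the sketch omits learned projections, multiple heads, and positional encodings. It only shows that, in a decoder-only model, each position computes a weighted mix over every earlier position in the context window, which is the route by which few-shot examples placed earlier in the prompt can influence the final prediction.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position i may attend only to positions 0..i, as in a decoder-only model.
    Returns (outputs, weights); weights[i] has i + 1 entries that sum to 1.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, query in enumerate(queries):
        # Scaled dot product of this query with every key it is allowed to see.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        output = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Toy context (assumed values): positions 0 and 2 are similar, position 1 differs.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = causal_attention(tokens, tokens, tokens)

for position, row in enumerate(weights):
    print(position, [round(w, 3) for w in row])
```

In this toy run the last position puts more weight on position 0, which resembles it, than on position 1. A real model learns the query, key, and value projections, so which earlier tokens get attended to depends on training rather than on raw vector similarity as it does here.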