🎬 Mac Mini M4 básico para LLM: Probamos modelos de lenguaje para ver sus límites (Base Mac Mini M4 for LLMs: We Test Language Models to Find Their Limits)
Video Details
Channel: La Hora Maker
URL: https://www.youtube.com/watch?v=ODSqFVW_46A
Relevance: ⭐⭐⭐⭐⭐
Summary
Systematic benchmarking of several LLMs on a base Mac Mini M4 (16GB unified memory). The tests cover inference throughput (tokens/second), maximum context length before degradation, and model quality on coding and reasoning tasks. Models tested include Llama 3.2 8B, Mistral 7B, Phi-3.5 Mini, and Qwen 7B, all quantized at Q4_K_M.
PUMA Relevance
Directly validates PUMA’s hardware assumptions for local inference. The benchmark data shows Llama 3.2 8B Q4_K_M running at ~15 tokens/second on the M4 Mac Mini, which is sufficient for PUMA’s Stage 1 triage: a 50-token classification output takes about 3.3 seconds per issue (50 ÷ 15), not counting prompt processing. The Phi-3.5 Mini fallback (~40 tokens/second, about 1.25 seconds for the same output) is validated for the latency-critical path.
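The per-issue latency figures follow from dividing output length by decode throughput. A minimal sketch of that arithmetic is below, using only the approximate throughput numbers quoted in these notes. The function and constant names are hypothetical, and the estimate assumes a steady decode rate and ignores prompt processing (prefill) time, so real latency per issue will be somewhat higher.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time for a fixed-length output at a steady rate.

    Assumption: prompt processing time is ignored, so this is a lower bound.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Approximate throughput figures quoted in the notes (not re-measured here).
THROUGHPUT_TOK_PER_S = {
    "Llama 3.2 8B Q4_K_M": 15.0,
    "Phi-3.5 Mini": 40.0,
}

# Stage 1 triage output length stated in the notes.
TRIAGE_OUTPUT_TOKENS = 50


def triage_estimates(output_tokens: int = TRIAGE_OUTPUT_TOKENS) -> dict[str, float]:
    """Return the estimated decode time in seconds for each model."""
    return {
        name: generation_seconds(output_tokens, rate)
        for name, rate in THROUGHPUT_TOK_PER_S.items()
    }


if __name__ == "__main__":
    for name, seconds in triage_estimates().items():
        print(f"{name}: ~{seconds:.2f} s per issue")
```

Run as a script, this prints roughly 3.33 s for the Llama model and 1.25 s for the Phi-3.5 Mini fallback, matching the "about 3 seconds per issue" estimate for Stage 1 triage.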