🎬 Construyendo IA Fiable: Evals, Trazabilidad y Observabilidad (Building Reliable AI: Evals, Traceability, and Observability)
Video Details
Channel: LambdaLoopers
URL: https://www.youtube.com/watch?v=qZ2Eu3kqA_g
Relevance: ⭐⭐⭐⭐⭐
Summary
LambdaCast episode 34 covers the full stack for reliable LLM systems: evaluation frameworks (Promptfoo, RAGAS), observability tooling (Arize Phoenix, LangSmith), and traceability requirements for production. It also discusses the difference between offline evals (run before deployment) and online monitoring (run after deployment).
PUMA Relevance
Core reference for PUMA’s evaluation and observability design. The episode discusses Arize Phoenix, which PUMA uses, in detail. The offline/online eval distinction maps onto PUMA’s design: offline evaluation corresponds to Wilcoxon tests on experiment results, and online monitoring corresponds to Arize Phoenix traces collected during live triage. The traceability requirements align with PUMA Constitution Article 7 (transparency).
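To make the offline side of that mapping concrete, here is a minimal sketch of a Wilcoxon signed-rank test comparing paired eval scores from two experiment runs. It rests on several assumptions: the note only says "Wilcoxon tests", so the paired (signed-rank) variant is a guess; the function name `wilcoxon_signed_rank` and the scores are hypothetical; and the p-value uses a plain normal approximation without tie or continuity correction, which is rough for small samples. In practice a library routine such as `scipy.stats.wilcoxon` would likely be the better choice.

```python
import math


def wilcoxon_signed_rank(baseline, candidate):
    """Two-sided Wilcoxon signed-rank test on paired scores.

    Returns (w_plus, w_minus, p_value). The p-value comes from a normal
    approximation with no tie or continuity correction (an assumption
    made to keep this sketch short).
    """
    if len(baseline) != len(candidate):
        raise ValueError("paired samples must have equal length")

    # Zero differences carry no sign information and are dropped.
    diffs = [c - b for b, c in zip(baseline, candidate) if c != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under the null hypothesis W+ has this mean and standard deviation.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area
    return w_plus, w_minus, p_value


# Hypothetical per-case eval scores (0-100) for the same ten test cases.
baseline_scores = [60, 62, 55, 70, 48, 66, 59, 73, 51, 64]
candidate_scores = [61, 64, 58, 74, 53, 72, 66, 81, 60, 74]

w_plus, w_minus, p = wilcoxon_signed_rank(baseline_scores, candidate_scores)
print(f"W+ = {w_plus}, W- = {w_minus}, p = {p:.4f}")
```

A small p-value here would suggest the candidate run differs systematically from the baseline on the same test cases. Pairing by test case is the design choice that makes the signed-rank variant applicable; unpaired runs would call for a rank-sum test instead.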