🎬 Building Reliable AI: Evals, Traceability, and Observability

Video Details

Channel: LambdaLoopers
URL: https://www.youtube.com/watch?v=qZ2Eu3kqA_g
Relevance: ⭐⭐⭐⭐⭐


Summary

LambdaCast episode 34 covers the full stack for reliable LLM systems: evaluation frameworks (Promptfoo, RAGAS), observability tooling (Arize Phoenix, LangSmith), and traceability requirements for production. It also discusses the difference between offline evals (run before deployment) and online monitoring (done after deployment).
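The offline/online split described above can be illustrated with a toy Python sketch. An offline eval scores a fixed set of test cases before deployment, while online monitoring records a trace span for every live call. Everything here (`answer`, `run_offline_eval`, `traced`, the in-memory `TRACES` list) is hypothetical. It stands in for what tools like Promptfoo or Arize Phoenix do through their own APIs, which this sketch does not use.

```python
import time
from collections.abc import Callable

# Hypothetical stand-in for the LLM system under test (not from the episode).
def answer(question: str) -> str:
    return "Paris" if "France" in question else "unknown"

# --- Offline eval: a fixed dataset scored before deployment ---
def run_offline_eval(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose output contains the expected string."""
    passed = sum(1 for question, expected in cases if expected in model(question))
    return passed / len(cases)

# --- Online monitoring: one trace span per live call ---
# In-memory sink (hypothetical); a real setup would export spans to a tracing backend.
TRACES: list[dict] = []

def traced(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap fn so each call appends its input, output, and latency to TRACES."""
    def wrapper(question: str) -> str:
        start = time.perf_counter()
        output = fn(question)
        TRACES.append({
            "name": fn.__name__,
            "input": question,
            "output": output,
            "latency_s": time.perf_counter() - start,
        })
        return output
    return wrapper

cases = [("What is the capital of France?", "Paris"),
         ("What is the capital of Peru?", "Lima")]
print(run_offline_eval(answer, cases))  # 0.5

live_answer = traced(answer)
live_answer("Capital of France?")
print(TRACES[0]["output"])  # Paris
```

Wrapping the same callable in both paths is the point of the sketch: the offline score gates a release, and the online spans show how that same function behaves on real traffic.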


PUMA Relevance

Core reference for PUMA’s evaluation and observability design. Arize Phoenix (used in PUMA) is discussed in detail. The offline/online eval distinction maps to PUMA’s design: offline = Wilcoxon tests on experiment results, online = Arize Phoenix traces during live triage. The traceability requirements align with PUMA Constitution Article 7 (transparency).
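The offline side of that mapping can be sketched as a paired Wilcoxon signed-rank test over per-case scores from two runs of the same experiment (e.g. a baseline and a candidate configuration). This is a minimal pure-Python sketch that assumes integer rubric scores, drops zero differences, and uses the normal approximation with a tie correction. The function names and example scores are hypothetical, not taken from PUMA or the episode. `scipy.stats.wilcoxon` offers a complete implementation, including exact p-values for small samples.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(baseline, candidate):
    """Two-sided Wilcoxon signed-rank test (normal approximation, zeros dropped).

    Returns (w_plus, w_minus, z, p_value).
    """
    diffs = [c - b for b, c in zip(baseline, candidate) if c != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    # Tie correction: subtract (t^3 - t) / 48 for each group of t tied |differences|
    tie_counts = {}
    for a in abs_diffs:
        tie_counts[a] = tie_counts.get(a, 0) + 1
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_counts.values()) / 48
    z = (w_plus - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, z, p_value

# Hypothetical per-case scores for the same 10 test cases under two configurations
baseline = [6, 7, 5, 8, 6, 7, 5, 7, 6, 8]
candidate = [7, 8, 7, 8, 8, 7, 6, 9, 7, 7]
print(wilcoxon_signed_rank(baseline, candidate))
```

A paired, rank-based test fits this setting because both runs score the same cases and per-case score differences are rarely normally distributed.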


MOCs