🎬 Construyendo IA Fiable: Evals, Trazabilidad y Observabilidad — LambdaCast 34

Video Details

Channel: LambdaLoopers
URL: https://www.youtube.com/watch?v=qZ2Eu3kqA_g
Relevance: ⭐⭐⭐⭐⭐


Summary

LambdaCast episode 34 covers the reliability stack for LLM systems end to end: offline evaluation before deployment (Promptfoo, RAGAS), online monitoring in production (Arize Phoenix, LangSmith), traceability (logging every prompt→response pair with metadata), and incident response when agents misbehave. It includes case studies from production deployments.


PUMA Relevance

Primary reference for PUMA’s evaluation and observability design. The episode’s progression from offline evaluation to online monitoring maps directly onto PUMA’s stages: Promptfoo for offline strategy comparison (Stages 1–3) and Arize Phoenix for online trace recording (Stages 4–5). The traceability requirement (logging all prompt→response pairs) aligns with PUMA Constitution Article 7 (transparency for academic reproducibility).
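
The traceability requirement can be made concrete with a small sketch. The Python below is illustrative only: it appends each prompt→response pair, with metadata, to a JSON Lines file using the standard library. The names TraceRecord, append_trace, and read_traces, and the field layout, are hypothetical. They are not taken from the episode, from PUMA, or from the Arize Phoenix API, which a real Stage 4–5 setup would presumably use instead.

```python
import hashlib
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class TraceRecord:
    """One prompt→response pair plus the metadata needed to reproduce it.

    Hypothetical schema for illustration; not the PUMA or Phoenix format.
    """
    prompt: str
    response: str
    model: str
    metadata: dict = field(default_factory=dict)
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

    def prompt_hash(self) -> str:
        # A stable hash lets identical prompts be grouped across runs.
        return hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()


def append_trace(path: Path, record: TraceRecord) -> None:
    """Append one record as a single JSON line (append-only log)."""
    row = asdict(record)
    row["prompt_sha256"] = record.prompt_hash()
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_traces(path: Path) -> list[dict]:
    """Load every logged record back for offline analysis."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

An append-only file keeps the sketch simple and auditable: records are never rewritten, and the prompt hash makes it easy to compare responses to the same prompt across strategies or model versions.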


MOCs