ST: Reproducibility in LLM-SE Research — Structure Note

Theme: Why is reproducibility a crisis in LLM-SE research, and how does PUMA address it?

Core Claims

Reproducibility Crisis

Only 5 of 18 LLM-SE papers with published artefacts are actually executable (Angermeir 2025). PUMA is designed to be reproducible by construction, with a built-in mitigation for each failure type in the taxonomy below.

Only 5/18 LLM-SE papers with artefacts are executable (Angermeir 2025) → PN-KeyConcepts-Agents-Reproducibility-RedTeam
Local inference + pinned models = bit-identical reproduction on a fixed hardware/software stack → PN-LLM-Local-vs-Cloud
seed=42 + temperature=0 make decoding deterministic for a given model build → SP-PUMA-Constitution
RAG complicates reproducibility whenever the retrieval index changes → PN-RAG-Embeddings-VectorDB
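
One way to make the last three claims concrete is to record, for every run, the exact request sent to the local model and a fingerprint of the retrieval index. The sketch below assumes PUMA calls Ollama's `/api/generate` endpoint. The model tag `llama3:8b`, the embedding model `nomic-embed-text` and all helper names are hypothetical, because this note does not name them. Hashing the embedding model name together with every chunk id and text means that any re-indexing shows up as a changed digest, which is exactly the case where RAG output can drift.

```python
import hashlib
import json

# Hypothetical values: the note fixes seed=42 and temperature=0 but not the model tag.
MODEL_TAG = "llama3:8b"
SEED = 42


def build_generate_payload(prompt: str, model: str = MODEL_TAG) -> dict:
    """Body for an Ollama /api/generate request with fixed decoding options."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON response instead of a token stream
        "options": {"temperature": 0, "seed": SEED},
    }


def index_fingerprint(chunks: dict, embedding_model: str) -> str:
    """SHA-256 over the embedding model name and all (chunk_id, text) pairs.

    Pairs are sorted, so insertion order does not matter; adding, removing or
    editing any chunk, or switching embedding model, changes the digest.
    """
    canonical = json.dumps([embedding_model, sorted(chunks.items())], ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_manifest(prompt: str, chunks: dict, embedding_model: str) -> dict:
    """Everything a replication should compare before comparing model outputs."""
    return {
        "request": build_generate_payload(prompt),
        "embedding_model": embedding_model,
        "index_sha256": index_fingerprint(chunks, embedding_model),
    }


if __name__ == "__main__":
    # Hypothetical chunks, for illustration only.
    chunks = {"doc1#0": "Pinned dependencies.", "doc2#0": "Fixed seeds."}
    print(json.dumps(run_manifest("Summarise the failure taxonomy.", chunks, "nomic-embed-text"), indent=2))
```

Storing the manifest next to each result lets a replication check the digest before comparing outputs. Actually sending the payload (for example with `urllib.request` to `http://localhost:11434/api/generate`, Ollama's default address) is left out so the sketch runs without a server. Even with these options, outputs may still differ across hardware or Ollama versions.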

Failure Taxonomy (Angermeir 2025)

Failure type	PUMA mitigation
Missing dependencies	requirements.txt with pinned versions
Undocumented preprocessing	Documented scripts in repo
Non-deterministic LLM calls	temperature=0, seed=42, local Ollama
Absent random seeds	seed=42 everywhere
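
The first and last rows of the table can be checked mechanically. The following is a minimal sketch, not PUMA's actual tooling: `set_global_seeds`, `parse_pins` and `check_pins` are hypothetical helpers. It seeds Python's RNG (and NumPy or PyTorch when they are installed) with 42, and compares exact `name==version` pins from a requirements file against the versions installed in the current environment.

```python
import os
import random
from importlib import metadata

SEED = 42  # the note's "seed=42 everywhere"


def set_global_seeds(seed: int = SEED) -> None:
    """Seed Python's RNG, and NumPy / PyTorch if they happen to be installed."""
    random.seed(seed)
    # Only affects Python subprocesses started after this point, not the current interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def parse_pins(lines):
    """Split requirements lines into {name: version} pins and unpinned specifiers."""
    pins, unpinned = {}, []
    for raw in lines:
        # Drop comments and environment markers such as '; python_version >= "3.10"'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-"):  # blank lines, -r / -e / --index-url options
            continue
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
        else:
            unpinned.append(line)
    return pins, unpinned


def check_pins(pins):
    """Return one message per package whose installed version differs from its pin."""
    problems = []
    for name, wanted in sorted(pins.items()):
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (pinned {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, pinned {wanted}")
    return problems
```

For example, `check_pins(parse_pins(open("requirements.txt"))[0])` returns an empty list when the environment matches the pins, and any entry left in `unpinned` is itself a reproducibility gap. The comparison is an exact string match on plain `name==version` lines, so extras, version ranges and version normalisation are deliberately out of scope.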

→ LN-Angermeir-2025-Reproducibility → SP-PUMA-Constitution (Art. 1) → PR-PUMA-Ch3-Methods (§3.8) → MOC-Methods-Frameworks
