PUMA is an applied Agentic Science system positioned at LeCun’s Level 2 of AI scientific capability
Core Insight
LeCun (2026) identifies three levels of AI scientific capability: (1) assistant, (2) model, (3) agent. PUMA’s triage and estimation systems operate at Level 2: AI as a scientific model that captures domain regularities (PM issue patterns) better than simple baselines. PUMA’s Stage 5 Smart PMO approaches Level 3: autonomous orchestration of PM workflow cycles. This positioning places PUMA on the Agentic Science trajectory, so its claims rest on both a theoretical framework (the three levels) and empirical precedent (validated Level 2 systems).
Evidence Chain
| Level | General Science Example | PUMA PM Equivalent |
|---|---|---|
| L1: Assistant | Claude writing a literature review | PUMA using Claude/NotebookLM for the SLR |
| L2: Scientific Model | AlphaFold predicting protein structures | PUMA triage agent classifying issue priority |
| L2: Scientific Model | GraphCast forecasting weather | PUMA estimation agent predicting story points |
| L3: Autonomous Agent | AI Scientist generating ML papers | PUMA Stage 5 Smart PMO orchestrating sprints |
Why Level 2 is Sufficient for PUMA’s Claim
Felin & Holweg (2024) argue that AI cannot generate novel causal theories, and they are largely correct for Level 3 claims. PUMA’s claims, however, are Level 2:
- “LLM agents can classify Jira issue priority more accurately than a majority-class baseline”
- “LLM agents can estimate story points more accurately than the historical average”
These are prediction tasks with ground truth, exactly the domain where Level 2 AI models (AlphaFold, GraphCast, GNoME) have proven highly effective. Answering Felin & Holweg therefore does not require a Level 3 claim; PUMA’s validation rests on Level 2 alone.
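Both Level 2 claims reduce to a comparison against a trivial baseline on held-out data with known labels. The sketch below shows one way that comparison could be computed. It is illustrative only: the function names, the choice of accuracy and mean absolute error as metrics, and the toy data are assumptions, not PUMA’s actual evaluation code or results.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the ground-truth labels."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def majority_class_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return accuracy([majority_label] * len(test_labels), test_labels)


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and true story points."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def historical_average_mae(train_points, test_points):
    """MAE of always predicting the mean of past story points."""
    baseline = sum(train_points) / len(train_points)
    return mean_absolute_error([baseline] * len(test_points), test_points)


if __name__ == "__main__":
    # Toy data, invented for illustration (not PUMA results).
    train_priority = ["High", "Medium", "Medium", "Low", "Medium"]
    test_priority = ["Medium", "High", "Medium", "Low"]
    agent_priority = ["Medium", "High", "Low", "Low"]  # hypothetical agent output

    train_points = [3, 5, 8, 2, 2]
    test_points = [5, 3, 8, 1]
    agent_points = [5, 2, 5, 1]  # hypothetical agent output

    print("triage accuracy:", accuracy(agent_priority, test_priority),
          "vs baseline", majority_class_accuracy(train_priority, test_priority))
    print("estimation MAE:", mean_absolute_error(agent_points, test_points),
          "vs baseline", historical_average_mae(train_points, test_points))
```

On this toy data the hypothetical agent reaches 0.75 accuracy against a 0.5 majority-class baseline, and an MAE of 1.0 against 2.25 for the historical average. A claim of this form is supported only when the agent beats the baseline on held-out issues it did not see during prompting or tuning.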
Thesis Section Mapping
| PUMA Section | PEC2 Connection |
|---|---|
| 1.1 Context and Justification | AI science trajectory → PM automation as same pattern |
| 1.3 Ethical-Social Impact | Klinger (2025): social requirements for responsible AI in science |
| 2. Materials and Methods | LeCun three-level framework as theoretical grounding |
| 4. Conclusions and Future Work | Stage 5 approaching Level 3; reproducibility as scientific contribution |
Related Notes
- PN-AI-Scientific-Knowledge-Generation — AI discovery examples (AlphaFold, GraphCast)
- PN-Agentic-Science-Paradigm — Closed-loop research paradigm
- PN-IssueTriage-StoryPoints — PUMA’s Level 2 prediction tasks
- PN-MultiAgent-ArchitecturePatterns — Stage 5 multi-agent foundation
- LN-Zhang-2025-AgenticScienceSurvey — theoretical framework
- LN-Felin-2024-TheoryIsAllYouNeed — counter-argument (Level 2 suffices)
- LN-Jumper-2021-AlphaFold — L2 analogue (prediction)
- LN-Lam-2023-GraphCast — L2 analogue (prediction)
- PR-PUMA-Ch1-Introduction — §1.1 context + §1.3 ethics
- PR-PUMA-Ch2-Ch3-Ch4-Ch5
- EX-Hypotheses-H1-H2 — H1+H2 as Level 2 predictions
- Smart-PMO-Vision — Stage 5 approaching Level 3