LN — Plasma Control with Deep RL (Degrave et al., 2022)
Full Reference: Degrave, J., Felici, F., Buchli, J., et al. (2022). Magnetic control of tokamak plasmas through deep reinforcement learning. Nature, 602, 414–419. https://doi.org/10.1038/s41586-021-04301-9
Pass 1 — Bird’s Eye
Main Claim
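A deep RL agent learns magnetic control of plasma configurations in the TCV tokamak, including advanced shapes (e.g., negative triangularity, “snowflake”) and sustained “droplets” (two separate plasmas held in the vessel at once) that are demanding to achieve with classical linear controllers.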
| Property | Detail |
|---|---|
| Type | Research paper — Physics / Reinforcement Learning |
| Relevance to PUMA | ⭐⭐ Medium — demonstrates AI discovering novel operational strategies in a domain with complex physics; analogous to PUMA discovering optimal triage strategies |
Pass 2 — Key Content
System
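- Deep RL agent trained entirely in simulation (a free-boundary MHD equilibrium simulator), then deployed directly on the real tokamak
- Learns a control policy that outputs voltage commands to the magnetic control coils to hold the plasma’s position, current and shape
- Runs at a 10 kHz control frequency (one action every 100 µs)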
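To make the timing and interface concrete, here is a minimal sketch of the control loop's shape: a policy queried once per 100 µs tick, mapping a measurement vector to one clipped voltage command per coil. Everything here is a hypothetical placeholder: the sizes (`N_MEASUREMENTS`, `N_COILS`), the `VOLTAGE_LIMIT`, the random linear `policy` standing in for the trained network, and the random "sensor readings". It shows only timing and data flow, not plasma dynamics or the paper's architecture.

```python
import numpy as np

CONTROL_FREQUENCY_HZ = 10_000                   # 10 kHz, as stated in the note
CONTROL_PERIOD_S = 1.0 / CONTROL_FREQUENCY_HZ   # 100 microseconds per decision

# Hypothetical sizes and limits, for illustration only.
N_MEASUREMENTS = 8
N_COILS = 4
VOLTAGE_LIMIT = 1.0

rng = np.random.default_rng(0)
# Placeholder linear policy standing in for a trained neural network.
W = rng.normal(scale=0.1, size=(N_COILS, N_MEASUREMENTS))


def policy(observation: np.ndarray) -> np.ndarray:
    """Map one measurement vector to one voltage command per coil, within limits."""
    raw = W @ observation
    return np.clip(raw, -VOLTAGE_LIMIT, VOLTAGE_LIMIT)


def run_episode(duration_s: float) -> np.ndarray:
    """Query the policy at the fixed control rate and collect every action."""
    n_steps = round(duration_s * CONTROL_FREQUENCY_HZ)
    actions = np.empty((n_steps, N_COILS))
    for t in range(n_steps):
        observation = rng.normal(size=N_MEASUREMENTS)  # random stand-in for sensor readings
        actions[t] = policy(observation)
    return actions


actions = run_episode(0.01)  # 10 ms of control = 100 decisions
print(CONTROL_PERIOD_S, actions.shape)
```

The point of the arithmetic is the budget: at 10 kHz, observation processing and the policy's forward pass must both fit inside 100 µs.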
Novel Contributions
- First demonstration of deep RL magnetic control of a tokamak plasma on real hardware (milestone)
- Agent produced novel plasma configurations not previously achievable with classical controllers
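- Multi-objective control: one neural-network policy drives all control coils jointly to track several targets (position, current, shape), replacing the set of separately designed linear feedback loops; a policy is trained for each target configuration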
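Multi-objective control implies folding several tracking errors into one scalar training signal. A minimal sketch, assuming each error is mapped to a score in (0, 1] with exp(-|error| / scale) and the scores are combined by a weighted mean; the objective names, scales (in arbitrary units) and weights are hypothetical, and the paper's actual reward shaping may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Objective:
    name: str
    scale: float   # error at which this objective's score falls to 1/e
    weight: float  # relative importance in the combined reward


# Hypothetical objectives, for illustration only.
OBJECTIVES = [
    Objective("shape_error", scale=0.02, weight=2.0),
    Objective("plasma_current_error", scale=5e3, weight=1.0),
    Objective("position_error", scale=0.01, weight=1.0),
]


def objective_score(error: float, scale: float) -> float:
    """Map an error to a score in (0, 1]; zero error scores exactly 1."""
    return math.exp(-abs(error) / scale)


def combined_reward(errors: dict[str, float]) -> float:
    """Weighted mean of per-objective scores, so the reward also stays in (0, 1]."""
    total_weight = sum(o.weight for o in OBJECTIVES)
    weighted = sum(o.weight * objective_score(errors[o.name], o.scale) for o in OBJECTIVES)
    return weighted / total_weight


perfect = combined_reward({"shape_error": 0.0, "plasma_current_error": 0.0, "position_error": 0.0})
off_target = combined_reward({"shape_error": 0.02, "plasma_current_error": 0.0, "position_error": 0.0})
print(perfect, off_target)
```

A bounded, smooth combination like this keeps one badly tracked objective from producing an unbounded penalty, while the weights express which targets matter most.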
Knowledge Generated
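- The novel configurations, and the control strategies that sustain them, represent new operational knowledge for fusion physics
- Caveat: the target shapes were specified by the experimenters as control objectives; what RL contributed was control behavior for reaching and holding them that domain experts had not hand-designed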
PUMA Relevance
PUMA Analogy
The pattern is analogous to PUMA:
- RL agent discovers control strategies for plasma configurations that human experts had not hand-designed
- PUMA’s CoT agent might discover novel triage reasoning chains not used by human PMs
The key difference: PUMA’s “discovery” is evaluated by macro-F1 rather than by physical plasma control (stability and shape tracking), but the knowledge-generation dynamic is similar.
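Because the comparison hinges on macro-F1, a minimal from-scratch sketch may help: F1 is computed per class and then averaged with equal weight, so rare triage categories count as much as common ones. The label names below are hypothetical, not PUMA's actual categories; scikit-learn exposes the same metric as `f1_score(..., average="macro")`.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative
    # Per-class F1 = 2TP / (2TP + FP + FN); denominator is non-zero for any seen label.
    f1_scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(f1_scores) / len(f1_scores)


# Hypothetical triage labels, for illustration only.
y_true = ["bug", "bug", "feature", "question"]
y_pred = ["bug", "feature", "feature", "question"]
print(round(macro_f1(y_true, y_pred), 4))  # per-class F1: 2/3, 2/3, 1 -> mean 7/9
```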
APA7 Citation
Degrave, J., Felici, F., Buchli, J., et al. (2022). Magnetic control of tokamak plasmas through deep reinforcement learning. Nature, 602, 414–419. https://doi.org/10.1038/s41586-021-04301-9