LN: Yao et al. (2022) — ReAct: Synergizing Reasoning and Acting

Bibliographic Reference

Citation: Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ReAct: Synergizing reasoning and acting in language models. arXiv:2210.03629. https://arxiv.org/abs/2210.03629
Venue: Presented at ICLR 2023.

Pass 1 — Bird’s Eye View (5 Cs)

C	Assessment
Category	System proposal + empirical evaluation
Context	Builds on chain-of-thought prompting (CoT; Wei et al., 2022) and on earlier work in which LLMs generate actions in task-specific action spaces
Correctness	Rigorous evaluation on HotpotQA, FEVER, ALFWorld, WebShop
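Contributions	(1) ReAct paradigm: interleaves Thought–Action–Observation cycles; (2) with PaLM-540B prompting, 71% success on ALFWorld vs. 45% for Act-only and 40.0% on WebShop vs. 30.1%; on HotpotQA and FEVER, ReAct alone is roughly on par with CoT and the best scores come from combining ReAct with CoT-SC; (3) reduces hallucination via grounding in external observations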
Clarity	Excellent. Clear prompting examples.

Relevance: ⭐⭐⭐⭐⭐

Foundation paper for PUMA Stage 4–5 agent architecture.

Pass 2 — Content

Core Idea

ReAct = Reasoning + Acting. At each step an LLM alternates between:

Thought: internal reasoning about what to do
Action: calling an external tool (search, API, database)
Observation: reading the tool output and continuing

This loop continues until the agent reaches a final answer. The key insight: interleaving reasoning traces with observations reduces hallucination, because each new Thought and Action is conditioned on what the agent actually retrieved, not on what the model “remembers.”
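The Thought, Action, Observation cycle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `react_loop`, `scripted_policy`, and `fake_search` are hypothetical names, and the scripted policy stands in for a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a policy maps the transcript so far to (thought, action, argument).
Step = Tuple[str, str, str]
Policy = Callable[[List[str]], Step]
Tool = Callable[[str], str]


def react_loop(policy: Policy, tools: Dict[str, Tool], question: str,
               max_steps: int = 5) -> Tuple[str, List[str]]:
    """Run Thought -> Action -> Observation cycles until the policy emits 'finish'."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, argument = policy(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{argument}]")
        if action == "finish":
            return argument, transcript
        tool = tools.get(action)
        observation = tool(argument) if tool else f"unknown action '{action}'"
        # The observation is appended so the next Thought is conditioned on it.
        transcript.append(f"Observation: {observation}")
    return "", transcript  # step budget exhausted without an answer


# Toy stand-ins for the LLM and the search tool (illustrative only).
def scripted_policy(transcript: List[str]) -> Step:
    if not any(line.startswith("Observation:") for line in transcript):
        return ("I need to look up the venue.", "search", "ReAct venue")
    return ("The observation names the venue.", "finish", "ICLR 2023")


def fake_search(query: str) -> str:
    return "ReAct was presented at ICLR 2023."


answer, trace = react_loop(scripted_policy, {"search": fake_search},
                           "Where was ReAct presented?")
```

The `max_steps` budget is a design choice of this sketch: it guarantees termination even if the policy never emits `finish`.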

Key Results

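HotpotQA (exact match, PaLM-540B): ReAct 27.4% vs. CoT 29.4% vs. Act-only 25.7%; best result 35.1% from combining ReAct with CoT-SC
FEVER (accuracy, PaLM-540B): ReAct 60.9% vs. CoT 56.3% vs. Act-only 58.9%; best result 64.6% from combining CoT-SC with ReAct
ALFWorld (success rate, best prompt trial): ReAct 71% vs. Act-only 45% vs. BUTLER (imitation learning) 37%
WebShop (success rate): ReAct 40.0% vs. Act-only 30.1% vs. imitation learning 29.1%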

Failure Modes

Repetitive loops when the tool returns unexpected results
Inability to recover from an incorrect reasoning path (→ addressed by Reflexion, Shinn et al. 2023)
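One simple guard against the repetitive-loop failure mode is to count identical action calls and stop the agent once a threshold is crossed. The sketch below is an assumption about how such a guard could look, not something the paper prescribes; `is_stuck` and `max_repeats` are hypothetical names.

```python
from collections import Counter
from typing import List, Tuple


def is_stuck(actions: List[Tuple[str, str]], max_repeats: int = 2) -> bool:
    """Return True if any identical (action, argument) pair occurs more than max_repeats times."""
    counts = Counter(actions)
    return any(n > max_repeats for n in counts.values())


# Example history: the agent issued the same search three times.
history = [("search", "login bug"), ("search", "login bug"), ("search", "login bug")]
stuck = is_stuck(history)
```

A caller could check `is_stuck` after every step and either abort or inject a hint into the prompt. This only detects exact repeats; near-duplicate queries would need a fuzzier comparison.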

Pass 3 — Virtual Reconstruction

ReAct’s contribution is essentially a prompt format: interleaving reasoning and action in the few-shot examples is all that is needed to make standard LLMs exhibit agent-like behaviour. This is practically significant for PUMA: our agents can use ReAct-style prompting without any fine-tuning.
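Because the pattern lives in the prompt, assembling it is mostly string handling. The following sketch shows one way to build a few-shot ReAct-style prompt. The exemplar text, `EXEMPLAR`, and `build_prompt` are invented for illustration; the bracketed `search[...]` and `finish[...]` actions only loosely mirror the format of the paper's prompting examples.

```python
from typing import List

# Hypothetical exemplar in Thought/Action/Observation format (wording invented for illustration).
EXEMPLAR = """Question: Which issue tracker label fits 'app crashes on login'?
Thought 1: I should look for similar past issues.
Action 1: search[app crashes on login]
Observation 1: Two similar issues were labelled 'bug'.
Thought 2: The retrieved issues agree on the label.
Action 2: finish[bug]"""


def build_prompt(exemplars: List[str], question: str) -> str:
    """Concatenate few-shot exemplars and leave the first Thought open for the model."""
    parts = list(exemplars) + [f"Question: {question}\nThought 1:"]
    return "\n\n".join(parts)


prompt = build_prompt([EXEMPLAR], "Which label fits 'export button does nothing'?")
```

Ending the prompt on an open `Thought 1:` nudges the model to continue in the same interleaved format as the exemplars.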

Q1 (How can I use this?): ReAct is the base pattern for the PUMA Stage 4 triage agent. When the agent retrieves similar historical issues (RAG step), the Thought–Action–Observation loop feeds the retrieved issues into the classification decision.

Q2 (Does it do what it claims?): The HotpotQA benchmark involves factual QA with Wikipedia. The transfer to PM triage is non-trivial: the “observations” in PM would be retrieved Jira issues, not Wikipedia paragraphs, so the size of any gain reported on HotpotQA need not carry over.

Q3 (What if?): What if the LLM generates internally inconsistent Thought steps? PUMA could implement a simple consistency check: verify that the final label in the Answer step matches the reasoning in the Thought step.
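The consistency check from Q3 could start as a plain string comparison. The sketch below assumes a fixed label set and treats the label mentioned last in the final Thought as the one the reasoning settles on. Both are simplifying assumptions, and `label_in_thought` and `is_consistent` are hypothetical helpers, not part of PUMA or the paper.

```python
from typing import Optional, Sequence


def label_in_thought(thought: str, labels: Sequence[str]) -> Optional[str]:
    """Return the label mentioned last in the Thought text, or None if none appears."""
    text = thought.lower()
    best_label, best_pos = None, -1
    for label in labels:
        pos = text.rfind(label.lower())  # -1 when the label is absent
        if pos > best_pos:
            best_label, best_pos = label, pos
    return best_label


def is_consistent(final_thought: str, answer_label: str, labels: Sequence[str]) -> bool:
    """True when the Answer label matches the label the final Thought settles on."""
    return label_in_thought(final_thought, labels) == answer_label


LABELS = ["bug", "feature", "question"]
thought = "This is not a feature request; the stack trace points to a bug."
ok = is_consistent(thought, "bug", LABELS)
```

Substring matching is deliberately naive here: it would also match "bug" inside "debug", so a real check would likely need word-boundary matching or a second model call.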

PUMA Integration

Stage 4: ReAct loop structures RAG-enhanced triage (Thought → retrieve similar issues → classify)
Stage 5: SmartPMO agents use ReAct as the base interaction pattern
Related: PN-KeyConcepts-Agents-Reproducibility-RedTeam
Related: PN-RAG-Embeddings-VectorDB

Related Notes

PN-ReAct-AgentPattern — permanent note synthesising this paper
PN-MultiAgent-ArchitecturePatterns — Stage 4/5 architecture
PN-SDD-Framework
EX-Stages-Overview — Stage 4 RAG agent uses ReAct
Smart-PMO-Vision — Stage 5 SmartPMO agents
LN-Masterman-2024-AgentArchSurvey — agent architecture context
SP-Architecture — PUMA system architecture
