What is an Agent?
In 2025, we have come to think of an agent as an AI system that autonomously pursues goals through iterative cycles of reasoning and action. Unlike traditional LLMs that simply respond to prompts, agents can break down complex tasks, use tools to gather information or perform actions, and adapt their approach based on results—all without human intervention at each step.
Technically, most modern agents follow the ReAct pattern (Reasoning + Acting): the system generates thoughts about what to do next, executes an action using available tools, observes the results, and repeats this cycle until the goal is achieved. This closed-loop process enables agents to handle multi-step workflows that would otherwise require manual orchestration.
This newly defined “agentic” AI promises to automate entire processes, leading to wide adoption—at least in LinkedIn posts.
However, a simple ReAct loop isn’t always suitable. This guide explores five core patterns—ReAct, Plan-then-Execute, ReWOO, LLMCompiler, and Reflexion—with practical guidance on when to use each to optimize for cost, speed, or quality.
The Challenge: Moving Beyond Basic Agents
Current ReAct-style agents often struggle with complex, multi-step queries requiring comprehensive analysis and actionable recommendations. Consider two contrasting experiences:
Example 1: Simple query, inadequate response
- Query: “What are the best coffee beans for espresso and where can I buy them locally?”
- Result: Agent refused to search inventory or provide store locations, citing unclear parameters
- Problem: User received no actionable insights despite having relevant data available
Example 2: Heavy prompt engineering gives great results
- Approach: Manually guided agent through structured workflow (search beans → filter by roast → check inventory → find stores → compare prices)
- Result: High-quality, data-driven recommendation with detailed supporting evidence
- Challenge: Required significant user effort to orchestrate
The gap? Example 2 required manual orchestration of what should be the agent’s natural capability. Planning patterns can bridge this gap by building structured workflows directly into the agent’s reasoning process.
Pattern 1: ReAct (Reason + Act) — The Current Standard
What It Is
ReAct augments LLM action spaces by interleaving explicit reasoning traces (thoughts) with environment interactions (actions and observations). The agent generates natural language thoughts explaining its reasoning, executes corresponding actions, receives observations from the environment, and repeats until task completion.
Technical innovation: Extends the agent’s action space from just environmental actions to include a language space for reasoning traces. Each thought decomposes goals into subgoals, tracks progress, injects commonsense knowledge, and handles exceptions—all visible to humans.
Architecture Flow
graph TD
A[User Query] --> B[Think: Reason about next step]
B --> C[Act: Select and execute tool]
C --> D[Observe: Get tool result]
D --> E{Task Complete?}
E -->|Yes| F[Return Answer]
E -->|No| B
style A fill:#fff,stroke:#333,stroke-width:2px
style B fill:#e8e8e8,stroke:#333,stroke-width:2px
style C fill:#d0d0d0,stroke:#333,stroke-width:2px
style D fill:#b8b8b8,stroke:#333,stroke-width:2px
style E fill:#a0a0a0,stroke:#333,stroke-width:2px
style F fill:#888,stroke:#333,stroke-width:2px
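The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production agent: `call_llm` and the `tools` registry are hypothetical stand-ins for a real model client and tool set.

```python
# Minimal ReAct loop sketch. `call_llm` and `tools` are hypothetical
# stand-ins for a real model and tool registry.

def call_llm(history):
    # Stub: a real implementation would prompt an LLM with the trajectory
    # so far and parse a Thought plus either an Action or a final Answer.
    if not history:
        return {"thought": "I should look up espresso beans.",
                "action": ("search", "best espresso beans")}
    return {"thought": "I have enough information.",
            "answer": "Use a medium-dark roast blend."}

tools = {"search": lambda q: f"results for: {q}"}  # toy tool registry

def react(query, max_steps=5):
    history = []
    for _ in range(max_steps):
        step = call_llm(history)                # Think
        if "answer" in step:                    # task complete
            return step["answer"], history
        name, arg = step["action"]              # Act
        observation = tools[name](arg)          # Observe
        history.append((step["thought"], step["action"], observation))
    return None, history                        # step budget exhausted

answer, trace = react("What are the best beans for espresso?")
```

Note that every loop iteration pays for one full LLM call carrying the entire trajectory, which is exactly the token-inefficiency limitation discussed below.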
Strengths
- Maximum adaptability: Handles unknown task complexity and ambiguous queries by discovering requirements during exploration
- Transparent reasoning: Every tool call has an associated thought trace, enabling debugging, compliance auditing, and human-in-the-loop intervention
- Multi-hop reasoning: Naturally chains information across 3+ sources (HotpotQA: 27.4% accuracy, ALFWorld: 71% success vs 45% for action-only)
- Reduced hallucination: Grounds reasoning in external knowledge sources (6% false positives vs 14% for Chain-of-Thought alone)
- Interactive environments: Excels in text-based games, web navigation, and embodied AI tasks
Limitations
- Token inefficiency: Requires an LLM call for each tool invocation (typically 3-7 steps), significantly increasing both latency and API costs
- Myopic planning: Only plans for 1 sub-problem at a time without upfront global reasoning, leading to sub-optimal trajectories
- Tool selection overload: Performance degrades with 7+ tools (calendar scheduling drops to 2% with 7+ domains)
- Higher reasoning errors: 47% of failures stem from reasoning errors vs 16% for pure Chain-of-Thought, a side effect of the interleaved thought-action structure
- Repetitive loops: Can generate same action sequence repeatedly (23% of HotpotQA failures)
- Requires large models: Models smaller than 62B parameters show poor performance
When to Use ReAct
- Exploratory tasks where complexity is unknown upfront
- Dynamic scenarios where next steps depend on previous results
- Interpretability critical for debugging, compliance auditing, or human-in-the-loop oversight
- Interactive environments like text games, web navigation, or embodied AI
When to Avoid ReAct
- Cost-sensitive high-volume systems (use ReWOO or LLMCompiler for 80% cost reduction)
- Well-scoped repeatable workflows (use Plan-then-Execute for deterministic execution)
- Speed-critical applications (use LLMCompiler)
- Large tool sets (>7 tools) where tool selection degrades performance
Pattern 2: Plan-then-Execute
What It Is
Plan-then-Execute is a full agentic framework with three distinct components: Planner, Executor, and Replanner. The planner generates initial multi-step plans as structured lists. The executor (typically a ReAct agent) carries out individual steps using available tools. The replanner examines completed steps and decides whether to continue with remaining steps, generate a revised plan, or respond with final results.
Inspired by Plan-and-Solve Prompting: The architecture draws inspiration from Wang et al.’s Plan-and-Solve (PS/PS+) prompting technique (ACL 2023), which improved zero-shot arithmetic reasoning from 70.4% to 76.7% by having LLMs explicitly plan before solving. The original PS+ prompt reduced calculation errors from 7% to 5% and missing-step errors from 12% to 7%.
Architecture Flow
graph TD
A[User Query] --> B[Planner: Generate multi-step plan]
B --> C[Plan visible upfront]
C --> D[Executor: Execute step 1]
D --> E[Executor: Execute step 2]
E --> F[Executor: Execute step N]
F --> G{Replanner: Evaluate results}
G -->|Success| H[Return final answer]
G -->|Need more info| B
G -->|Adjust approach| B
style A fill:#fff,stroke:#333,stroke-width:2px
style B fill:#e8e8e8,stroke:#333,stroke-width:2px
style C fill:#d8d8d8,stroke:#333,stroke-width:2px
style D fill:#c8c8c8,stroke:#333,stroke-width:2px
style E fill:#c8c8c8,stroke:#333,stroke-width:2px
style F fill:#c8c8c8,stroke:#333,stroke-width:2px
style G fill:#a0a0a0,stroke:#333,stroke-width:2px
style H fill:#888,stroke:#333,stroke-width:2px
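The planner-executor-replanner split can be sketched as three plain functions. This is a hedged sketch: `plan_llm` and `execute_step` are stubs standing in for a large planning model and a smaller execution agent.

```python
# Plan-then-Execute sketch. `plan_llm` and `execute_step` are hypothetical
# stubs for real LLM calls; a real executor is typically a small ReAct agent.

def plan_llm(query, past_results=None):
    # Stub planner: a real one would prompt a large model for a step list.
    # When called with results, it acts as the replanner; an empty plan
    # signals "ready to answer".
    if past_results is None:
        return ["search beans", "check inventory", "compare prices"]
    return []

def execute_step(step):
    # Stub executor for a single plan step.
    return f"done: {step}"

def plan_then_execute(query):
    plan = plan_llm(query)
    results = []
    while plan:
        results.extend(execute_step(s) for s in plan)  # run every step
        plan = plan_llm(query, results)                # replanner decides
    return results

results = plan_then_execute("Best espresso beans nearby?")
```

The large model is consulted only at the `plan_llm` boundaries, which is where the pattern's speed and model-tiering savings come from.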
Strengths
- Speed advantage: Multi-step workflows execute faster since the large planning LLM is only called during planning and replanning phases
- Cost optimization: Can use more sophisticated models for planning and smaller models for execution (30-50% cost reduction)
- Quality improvement: Forces the planner to “think through” all steps upfront, creating more coherent multi-step solutions
- Deterministic and auditable: Plan is visible upfront before any execution, making it easier to test, debug, and validate against business requirements
Limitations
- Sequential execution bottleneck: Tasks execute one after another (ReWOO and LLMCompiler address this)
- Planning rigidity: Brittle if initial plan is wrong; limited adaptability if user query needs different tools mid-flight (requires costly replanning)
- Planning overhead: Not justified for simple single-step queries where ReAct or direct function calling would be faster
- Context window limitations: Performance degrades as domain and tool counts increase
- Replanning ambiguity: Deciding when to replan versus respond lacks clear criteria
When to Use Plan-then-Execute
- Multi-step complex tasks with 5+ decomposable reasoning steps (research, data pipelines, long-horizon analysis)
- Arithmetic reasoning where explicit planning reduces missing-step and calculation errors
- Repeatable workflows with predefined procedures (customer support, batch reports)
- Audit-critical scenarios requiring deterministic, visible-upfront plans for validation
- Cost optimization priority using model tiering (larger model for planning, smaller model for execution)
- Stable environments where plans remain valid during execution and accuracy outweighs latency
When to Avoid Plan-then-Execute
- Simple single-step queries where planning overhead isn’t justified (use direct function calling)
- Highly dynamic environments where plans quickly become obsolete or need constant revision
- Exploratory tasks without clear structure requiring adaptive discovery (use ReAct)
- Speed-critical applications with parallelizable tasks (use LLMCompiler for faster execution)
Pattern 3: ReWOO (Reasoning Without Observation)
What It Is
ReWOO introduces a three-module architecture that completely separates planning from execution. The Planner generates a complete multi-step plan before any tool execution using “foreseeable reasoning”—predicting needed information without observing actual results. Plans use variable placeholders (#E1, #E2, #E3) to reference future evidence, enabling subsequent steps to depend explicitly on prior results without waiting for actual observations.
Critical innovation: Variable substitution—planning occurs using placeholders rather than actual tool outputs, eliminating the need to wait for observations during the reasoning phase. Tasks can reference previous outputs using syntax like #E2 (e.g., Search[Stats for #E2]).
Three-Module Architecture
Planner: Generates complete reasoning graph with variable placeholders (#E1, #E2, #E3) before any tool execution. Plans what information is needed without seeing actual results.
Worker: Executes tools based on the Planner’s blueprint, populating evidence variables with actual results—this phase involves no LLM reasoning, just pure execution.
Solver: Receives the complete plan plus all evidence and synthesizes the final answer, prompted to use evidence “with caution” to handle potential errors. Can partially compensate for Planner or Worker failures.
Architecture Flow
graph LR
A[User Query] --> B[Planner]
B --> C[Worker]
C --> D[Solver]
D --> E[Final Answer]
B1[Plan with placeholders<br/>#E1, #E2, #E3] -.-> B
C1[Execute tools<br/>populate evidence] -.-> C
D1[Synthesize with<br/>complete evidence] -.-> D
style A fill:#fff,stroke:#333,stroke-width:2px
style B fill:#e8e8e8,stroke:#333,stroke-width:2px
style C fill:#c8c8c8,stroke:#333,stroke-width:2px
style D fill:#a8a8a8,stroke:#333,stroke-width:2px
style E fill:#888,stroke:#333,stroke-width:2px
style B1 fill:#f8f8f8,stroke:#666,stroke-width:1px,stroke-dasharray: 5 5
style C1 fill:#f8f8f8,stroke:#666,stroke-width:1px,stroke-dasharray: 5 5
style D1 fill:#f8f8f8,stroke:#666,stroke-width:1px,stroke-dasharray: 5 5
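The variable-substitution mechanism is the heart of ReWOO and is easy to show concretely. In this sketch the plan is hand-written rather than LLM-generated, and the tool and solver are stubs.

```python
import re

# ReWOO sketch: a static plan with #E placeholders, a Worker that
# substitutes evidence, and a Solver stub. Plan and tools are illustrative.

plan = [
    ("E1", "search", "2022 US Open winner"),
    ("E2", "search", "hometown of #E1"),   # depends on E1 via placeholder
]

tools = {"search": lambda q: f"<result for: {q}>"}  # toy tool

def worker(plan):
    evidence = {}
    for var, tool, arg in plan:
        # Replace #E references with already-populated evidence values.
        arg = re.sub(r"#(E\d+)", lambda m: evidence[m.group(1)], arg)
        evidence[var] = tools[tool](arg)   # pure execution, no LLM reasoning
    return evidence

def solver(query, plan, evidence):
    # Stub: a real Solver prompts an LLM once with the plan + all evidence.
    return f"answer based on {len(evidence)} pieces of evidence"

evidence = worker(plan)
answer = solver("Where is the winner from?", plan, evidence)
```

Because the whole plan exists before any tool runs, the LLM sees the query only twice (Planner and Solver) instead of once per step.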
Performance Metrics
Token efficiency: On HotpotQA, ReWOO consumed 1,986 tokens vs ReAct’s 9,795 tokens (5× token efficiency), translating to $3.97 per 1,000 queries vs $19.59 for ReAct (80% cost reduction).
Accuracy improvements: HotpotQA 42.4% vs 40.8% for ReAct, TriviaQA 66.6% vs 59.4%, StrategyQA 66.6% vs 64.6%, SOTUQA 70.2% vs 64.8% (+5.4 points).
Strengths
- Dramatic token efficiency: 5× token reduction (1,986 vs 9,795 tokens), 80% cost reduction ($3.97 vs $19.59 per 1K queries)
- Focused context per task: Each task has only required context (input + variable values) rather than full history
- Improved accuracy: 8% absolute improvement across benchmarks
- Eliminates prompt redundancy: Question and context fed only twice (Planner and Solver) vs every step in ReAct
- Robustness under tool failure: 29.2% accuracy drop vs ReAct’s 40.8% drop when tools fail, saves 110 tokens during failure
- Explicit dependency tracking: Variable flow through #E references makes reasoning traceable and debugging straightforward
Limitations
- Sequential execution bottleneck: Tasks execute one after another (total time = sum of tool times)—LLMCompiler addresses this with parallelization
- Planning rigidity: Once committed to a plan, execution proceeds regardless of observations; cannot dynamically switch tools or revise approaches mid-execution
- Initial plan blind spots: Reasoning happens before any observation, might miss edge cases that would be discovered during exploration
- Tool count sensitivity: Performance degraded from 42% with 2 tools to 37% with 7 tools
- Real-time interactive applications: Not suitable for scenarios requiring adaptive strategy based on intermediate results
- No dynamic replanning: Unlike Plan-then-Execute, ReWOO doesn’t have a replanner component
When to Use ReWOO
- Predictable multi-hop question answering where information dependencies are clear (“Find X, then use X to find Y”)
- Complex multi-theory queries requiring synthesis across multiple data sources
- High-volume production systems where cost is critical (80% cost reduction vs ReAct)
- Curated tool environments with 2-5 well-defined complementary tools
When to Avoid ReWOO
- Exploratory tasks requiring adaptive discovery or trial-and-error (use ReAct)
- Large tool sets with >5 options
- Dynamic environments needing adaptive strategy based on intermediate results
- Real-time interactive applications or scenarios with highly uncertain tool reliability
Pattern 4: LLMCompiler (Parallel Function Calling)
What It Is
LLMCompiler draws inspiration from classical compiler design to optimize agent execution through parallel function calling. The framework decomposes user queries into Directed Acyclic Graphs (DAGs) representing tasks with explicit inter-dependencies, then executes independent tasks concurrently. This extends beyond ReWOO’s sequential execution to achieve true parallelization while maintaining dynamic replanning capabilities.
Critical innovation vs ReWOO: LLMCompiler supports two key capabilities explicitly: (1) parallel function calling reducing latency and cost, and (2) dynamic replanning for problems whose execution flow cannot be determined statically upfront.
Three-Component Architecture
Planner: Generates task sequences with dependencies forming a DAG, identifying necessary tasks, input arguments, and inter-dependencies using placeholder variables ($1, $2, $3). Can stream tasks as they’re generated, hiding planning latency behind tool execution through instruction pipelining.
Task Fetching Unit: Schedules and dispatches tasks as soon as dependencies are satisfied using a greedy policy. Replaces placeholder variables with actual outputs from completed tasks without requiring dedicated LLM calls.
Executor: Receives independent tasks and runs them asynchronously in parallel, with each task having dedicated memory for intermediate outcomes.
Architecture Flow
graph TD
A[User Query] --> B[Planner: Generate DAG with dependencies]
B --> C{Task Fetching Unit}
C --> D[Task 1: $1]
C --> E[Task 2: $2]
C --> F[Task 3: $3]
D --> G{Dependencies Satisfied?}
E --> G
F --> G
G -->|Yes| H[Executor: Run parallel tasks]
H --> I[Task 4: Uses $1, $2]
H --> J[Task 5: Uses $3]
I --> K{Replanner: Continue or Finish?}
J --> K
K -->|Finish| L[Final Answer]
K -->|Continue| C
style A fill:#fff,stroke:#333,stroke-width:2px
style B fill:#e8e8e8,stroke:#333,stroke-width:2px
style C fill:#d0d0d0,stroke:#333,stroke-width:2px
style D fill:#c0c0c0,stroke:#333,stroke-width:2px
style E fill:#c0c0c0,stroke:#333,stroke-width:2px
style F fill:#c0c0c0,stroke:#333,stroke-width:2px
style G fill:#b0b0b0,stroke:#333,stroke-width:2px
style H fill:#a0a0a0,stroke:#333,stroke-width:2px
style I fill:#989898,stroke:#333,stroke-width:2px
style J fill:#989898,stroke:#333,stroke-width:2px
style K fill:#888,stroke:#333,stroke-width:2px
style L fill:#707070,stroke:#333,stroke-width:2px
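The scheduling idea can be sketched with `asyncio`: tasks form a DAG with `$N` placeholders, and each task is dispatched as soon as its dependencies resolve. The DAG, tool, and latency here are illustrative stand-ins, not real planner output.

```python
import asyncio
import re

# LLMCompiler sketch: a hand-written DAG with $N placeholders and a
# scheduler that runs all dependency-free tasks concurrently.

async def fetch(arg):
    await asyncio.sleep(0.01)      # simulate I/O-bound tool latency
    return f"<{arg}>"

# task id -> (argument template, set of dependency ids)
dag = {
    1: ("price of A", set()),
    2: ("price of B", set()),
    3: ("compare $1 and $2", {1, 2}),
}

async def run_dag(dag):
    results, pending = {}, dict(dag)
    while pending:
        # Tasks whose dependencies are all satisfied run in parallel.
        ready = [t for t, (_, deps) in pending.items() if deps <= results.keys()]
        args = [re.sub(r"\$(\d+)", lambda m: results[int(m.group(1))],
                       pending[t][0]) for t in ready]
        outs = await asyncio.gather(*(fetch(a) for a in args))
        for t, out in zip(ready, outs):
            results[t] = out
            del pending[t]
    return results

results = asyncio.run(run_dag(dag))
```

Tasks 1 and 2 run concurrently, so wall-clock time per dependency level is bounded by the slowest task in that level rather than the sum of all tasks.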
Performance Claims
- Up to 3.7× latency speedup (Movie Recommendation: 5.47s vs 20.47s for ReAct)
- 6.73× cost reduction on some benchmarks
- 35% faster execution than OpenAI’s proprietary parallel function calling
- 9% accuracy improvement (ParallelQA: 68.14% vs 59.59% for ReAct)
Strengths
- Dramatic performance efficiency: Up to 3.7× latency speedup and 3-7× cost reduction through parallel execution—total time equals longest single tool per dependency level rather than sum of all tools
- Quality improvements from upfront planning: DAG planning prevents common ReAct failure modes, including premature stopping (85% of cases), repetitive loops (10% of cases), and context pollution from intermediate observations
- Instruction pipelining optimization: Streaming task generation hides planning latency behind tool execution, with Task Fetching Unit dispatching tasks as soon as dependencies are satisfied
- Dynamic replanning capability: Unlike ReWOO’s rigid commit-and-execute, supports replanning for problems whose execution flow cannot be determined statically
- Architecture flexibility: Model-agnostic design demonstrated across multiple model families enabling cost-quality trade-offs through model tiering
Limitations
- Parallelization limitations (Amdahl’s Law): Planner overhead (~1.88s) can’t be parallelized, straggler effects mean slowest task determines completion time (1.13s vs 0.61s average), and speedup is highly workload-dependent (3.7× for 8-way parallel vs 1.8× for 2-way)
- Implementation and debugging complexity: Requires DAG scheduler, task fetching logic, and parallel execution infrastructure making it significantly more complex than ReAct/P-t-E—parallel execution complicates error tracing
- Requires parallelizable workflows: Sequential dependencies or complex causal chains see minimal benefit (Game of 24: 2.89× vs Movie Recommendation: 3.7×)
- Production readiness concerns: Newer pattern (Dec 2023) with less battle-testing than ReAct, unknown tool count sensitivity at 7+ tools
- Replanning overhead trade-off: Dynamic replanning adds latency compared to ReWOO’s commit-and-execute approach
When to Use LLMCompiler
- Speed-critical applications demanding fastest execution (1.8-3.7× faster with parallel workflows)
- Embarrassingly parallel workflows with multiple independent data fetches running concurrently
- Cost-sensitive high-volume systems where 3-6× cost reduction matters at scale
- Clear task dependencies where dependency graphs are predictable
When to Avoid LLMCompiler
- Sequential workflows where tasks have long dependency chains (minimal parallelization benefit)
- Exploratory tasks with unpredictable dependencies making DAG planning difficult (use ReAct)
- Resource-constrained environments unable to support parallel execution infrastructure
- Immature tooling concerns if production validation and battle-testing are critical
Pattern 5: Reflexion (Self-Reflective Iterative Improvement)
What It Is
Reflexion introduces verbal self-reflection and iterative refinement to agent architectures. After generating an initial solution, the agent reflects on failures by producing natural language feedback about what went wrong and how to improve. This reflection is stored in an episodic memory buffer and provided as context for subsequent trials, enabling the agent to learn from mistakes within a task without parameter updates.
Critical innovation: Unlike traditional RL which updates model weights through backpropagation, Reflexion stores verbal reflections in episodic memory and provides them as additional context. This enables learning within a task through language-based feedback rather than requiring retraining.
Three-Component Architecture
Actor: Generates text and actions based on state observations and reflection memory (typically a ReAct agent)
Evaluator: Scores outputs using task-specific heuristics, learned reward models, or binary success/failure signals. Provides feedback on what worked and what didn’t.
Self-Reflection: Generates verbal reinforcement cues from evaluation signals and trajectory history. Creates natural language summaries of failure patterns (e.g., “Search query was too specific, try broader terms” or “Missed validating data sources before analysis”)
Architecture Flow
graph TD
A[User Query] --> B[Actor: Generate initial solution using ReAct/P-t-E/etc]
B --> C[Evaluator: Score output success/failure/quality]
C --> D{Success criteria met?}
D -->|Yes| E[Return Final Answer]
D -->|No| F[Self-Reflection: Analyze failures generate verbal feedback]
F --> G[Store reflection in Episodic Memory Buffer]
G --> H[Actor: Retry with reflection context Trial 2, 3, ... N]
H --> I[Evaluator: Re-score new attempt]
I --> J{Success or max trials reached?}
J -->|Success| E
J -->|Max trials| K[Return best attempt]
J -->|Continue| F
style A fill:#fff,stroke:#333,stroke-width:2px
style B fill:#e8e8e8,stroke:#333,stroke-width:2px
style C fill:#d8d8d8,stroke:#333,stroke-width:2px
style D fill:#c8c8c8,stroke:#333,stroke-width:2px
style E fill:#707070,stroke:#333,stroke-width:2px
style F fill:#b8b8b8,stroke:#333,stroke-width:2px
style G fill:#a8a8a8,stroke:#333,stroke-width:2px
style H fill:#989898,stroke:#333,stroke-width:2px
style I fill:#888,stroke:#333,stroke-width:2px
style J fill:#787878,stroke:#333,stroke-width:2px
style K fill:#707070,stroke:#333,stroke-width:2px
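The trial loop maps directly onto code. In this sketch the Actor, Evaluator, and Self-Reflection modules are stubs, and the episodic memory buffer is a plain list of verbal reflections.

```python
# Reflexion sketch: Actor, Evaluator, and Self-Reflection as stubs.
# The episodic memory buffer is a plain list of natural-language reflections.

def actor(query, reflections):
    # Stub: a real Actor is e.g. a ReAct agent conditioned on reflections.
    return "broad answer" if reflections else "too narrow answer"

def evaluator(output):
    # Stub binary success signal; real ones use heuristics or reward models.
    return "narrow" not in output

def self_reflect(output):
    # Stub: a real reflection is LLM-generated feedback on the failure.
    return f"previous attempt '{output}' was too narrow; broaden the search"

def reflexion(query, max_trials=3):
    memory = []                             # episodic memory buffer
    best = None
    for _ in range(max_trials):
        best = actor(query, memory)         # retry with reflection context
        if evaluator(best):
            return best, memory
        memory.append(self_reflect(best))   # learn without weight updates
    return best, memory                     # max trials: return best attempt

answer, memory = reflexion("Analyze market trends")
```

Each failed trial appends a reflection that the next trial reads as context, which is the "learning without parameter updates" mechanism described above.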
Performance Claims
- 20-25% success rate improvement on complex tasks (ALFWorld: 97% vs 75% baseline, +22%)
- Game of 24 improved from 4% to 74% with 3 reflections (+70 percentage points)
- HumanEval code generation reached 91% pass@1 (vs 80% without reflection, +11%)
- 3-12× cost increase due to multiple trial iterations
Strengths
- Dramatic quality improvements through iterative refinement: 20-70 percentage point success rate gains (Game of 24: 4%→74%, ALFWorld: 75%→97%) by learning from failures across trials—particularly effective for long-horizon tasks requiring 50+ steps
- Verbal self-reflection with episodic memory: Core innovation enabling learning within a task without parameter updates—agent generates human-interpretable failure analyses stored in memory buffer, preventing repeated mistakes
- Complementary architecture wrapper: Unlike other patterns that replace ReAct/P-t-E, Reflexion wraps around any existing Actor pattern as a quality-enhancing meta-layer—can add reflection to ReAct, Plan-then-Execute, or ReWOO without changing core architecture
- Adaptive learning from evaluation signals: Supports flexible evaluation approaches including task-specific heuristics, learned reward models, or LLM-as-evaluator—reflections guide Actor toward more promising action spaces
Limitations
- Multi-trial cost-latency multiplication: 3-12× cost increase and proportional latency impact (3 trials = 3× execution time) makes it incompatible with real-time requirements or cost-constrained high-volume systems
- Evaluation quality dependency: Requires reliable evaluator providing meaningful signals—weak evaluators produce poor reflections leading to no improvement or degradation
- No cross-task generalization: Reflections are task-specific ephemeral learning—agent doesn’t improve at new tasks unlike fine-tuning which generalizes across problem types
- Only valuable for failure-prone tasks: ROI exists only when baseline success rate <80%—high-success tasks see minimal benefit from reflection overhead
- Production deployment challenges: Long reflection histories consume context window requiring pruning strategies, unknown interaction with large tool sets, and memory buffer management complexity
When to Use Reflexion
- Quality-critical applications where accuracy/completeness outweigh cost (executive reports, compliance docs)
- High first-attempt failure rate (<80% success) where reflection enables learning from mistakes
- Complex multi-dimensional analysis where missing aspects is common failure mode
- Latency-tolerant scenarios like batch processing or overnight report generation
When to Avoid Reflexion
- Cost-constrained applications where 3-12× cost increase is unacceptable for high-volume queries
- Real-time requirements where multiple trial latency is incompatible with user-facing interactions
- High baseline success rate (>80%) where marginal benefit doesn’t justify cost
- No quality evaluator available—requires reliable scoring mechanism for meaningful reflections
Other Notable Patterns
Tree-of-Thought / Graph-of-Thought
Explore multiple reasoning branches with backtracking and scoring. Useful for generating multiple hypotheses and selecting the best via evaluation.
graph TD
A[User Query] --> B[Thought Branch 1]
A --> C[Thought Branch 2]
A --> D[Thought Branch 3]
B --> E[Evaluate]
C --> F[Evaluate]
D --> G[Evaluate]
E --> H{Select Best}
F --> H
G --> H
H --> I[Answer]
style A fill:#fff,stroke:#333,stroke-width:2px
style B fill:#d8d8d8,stroke:#333,stroke-width:2px
style C fill:#d8d8d8,stroke:#333,stroke-width:2px
style D fill:#d8d8d8,stroke:#333,stroke-width:2px
style E fill:#b0b0b0,stroke:#333,stroke-width:2px
style F fill:#b0b0b0,stroke:#333,stroke-width:2px
style G fill:#b0b0b0,stroke:#333,stroke-width:2px
style H fill:#888,stroke:#333,stroke-width:2px
style I fill:#707070,stroke:#333,stroke-width:2px
Use case: “Generate 3 different architectural approaches for scaling microservices” → evaluate each → pick best. Game of 24 improved from 4% → 74% through search-based reasoning.
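The branch-evaluate-select cycle can be sketched as follows; `propose` and `score` are hypothetical stubs for LLM-driven branch generation and evaluation.

```python
# Tree-of-Thought sketch: branch, score, select. `propose` and `score`
# are stand-ins for LLM sampling and an LLM/heuristic evaluator.

def propose(query, n=3):
    # Stub: a real implementation samples n candidate reasoning branches.
    return [f"approach {i} to: {query}" for i in range(1, n + 1)]

def score(branch):
    # Stub evaluator: here, simply prefer the highest-numbered approach.
    return int(branch.split()[1])

def tree_of_thought(query):
    branches = propose(query)
    scored = [(score(b), b) for b in branches]   # evaluate each branch
    return max(scored)[1]                        # select the best

best = tree_of_thought("scale microservices")
```

A full implementation also expands promising branches into sub-branches and backtracks from dead ends, which this flat one-level sketch omits.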
Decision Framework: Choosing the Right Pattern
Why Planning Patterns Matter (vs ReAct)
Planning patterns (Plan-then-Execute, ReWOO, LLMCompiler) promise improvements over traditional ReAct-style agents:
- Speed: Execute multi-step workflows faster since the large agent doesn’t need to be consulted after each action
- Cost: Significant cost savings over ReAct through model tiering (30-50% cost reduction without accuracy loss)
- Quality: Can perform better overall by forcing the planner to explicitly “think through” all steps required
Performance Comparison Summary
⏰ Speed: LLMCompiler > ReWOO ≥ P-t-E > ReAct >>> Reflexion
💸 Cost: ReWOO ≥ LLMCompiler > P-t-E > ReAct >>> Reflexion
🏆 Quality: Reflexion > LLMCompiler ≥ ReWOO ≥ P-t-E ≥ ReAct
Pattern Comparison Table
| Pattern | Best For | Speed | Cost | Quality | Complexity | Tool Limit |
|---|---|---|---|---|---|---|
| ReAct (current) | Exploratory, unknown complexity, adaptive workflows | Moderate (3-7 steps) | Moderate | Good | Low | ~7 tools (degradation threshold) |
| Plan-then-Execute | Well-scoped repeatable workflows, compliance/audit | Fast (fewer LLM calls) | Low (40-50% reduction) | Good-Excellent | Medium | Unknown |
| ReWOO | Predictable multi-hop, token efficiency critical | Fast (no reasoning loops) | Very Low (80% reduction) | Good-Excellent | Medium | 7 tools at degradation threshold |
| LLMCompiler | Speed-critical, parallel workflows, clear dependencies | Fastest (1.8-3.7× speedup) | Very Low (3-6× reduction) | Good-Excellent | High | Unknown |
| Reflexion | Quality-critical, failure-prone tasks, batch processing | Slowest (2-10 trials) | Very High (3-12× increase) | Excellent | Medium | Unknown |
| Multi-Agent Supervisor | >10 tools, domain specialization needed | Moderate | Moderate | Excellent | High | Specialist: 3-5 tools each |
Quick Decision Cues
- Ambiguity high, info unknown → ReAct ✅ — adapts during exploration
- Workflow known, repeatable → Plan-then-Execute — predictable + cost-efficient
- Complex reasoning with predictable operations → ReWOO — 80% cost reduction, 5× token efficiency
- Speed critical, clear task dependencies → LLMCompiler — 1.8-3.7× speedup via parallel DAG execution
- Quality critical, time flexible → Reflexion on top of any pattern — 20-70% success rate improvement at 3-12× cost
- Exploration and solution diversity needed → Tree/Graph-of-Thought — hypothesis generation
- >7 tools, domain specialization → Multi-Agent Supervisor — split into specialists with 3-5 tools each
Pattern Selection Examples
- Exploratory research (“Find security vulnerabilities in codebase”) → ReAct — unknown complexity, adaptive discovery
- Multi-source analysis (“Compare pricing across competitors + market trends + customer reviews”) → LLMCompiler — parallel data fetching
- Well-scoped reports (“Q3 sales performance analysis”) → Plan-then-Execute — predictable, auditable workflow
- Open-ended questions (“What’s the best database for my use case?”) → ReAct — needs exploration and context gathering
- Speed-critical lookups (“Real-time stock portfolio dashboard”) → LLMCompiler — fastest parallel execution
- Quality-critical outputs (“Investment recommendation report”) → Reflexion — iterative refinement for accuracy
Emerging Hybrid Approaches
Real-world production systems increasingly combine multiple patterns rather than using them in isolation. These hybrid architectures leverage complementary strengths while mitigating individual weaknesses:
1. ReWOO + ReAct Fallback (Graceful Degradation)
Pattern: Start with ReWOO for efficiency; fallback to ReAct on failure
Trigger: If ReWOO plan execution returns empty results or evaluator scores output as low-quality
Benefit: Get 80% cost reduction on successful cases, full adaptability on edge cases
Use case: Predictable multi-hop queries (95% success with ReWOO) with ReAct handling edge cases requiring adaptive exploration
Implementation: Wrap ReWOO in try-catch; on failure, invoke ReAct with full context
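The try-catch wrapping can be sketched in a few lines; both agents here are hypothetical stubs, with the failure condition simulated.

```python
# Graceful-degradation sketch: try the cheap ReWOO path first, fall back
# to an adaptive ReAct path on failure. Both agents are stubs.

class LowQualityResult(Exception):
    """Raised when ReWOO returns empty results or a low evaluator score."""

def rewoo_agent(query):
    # Stub: pretend ReWOO fails on queries that need exploration.
    if "explore" in query:
        raise LowQualityResult("empty evidence")
    return f"rewoo answer to: {query}"

def react_agent(query):
    # Stub adaptive fallback with full context.
    return f"react answer to: {query}"

def answer(query):
    try:
        return rewoo_agent(query)      # cheap path (~80% cost reduction)
    except LowQualityResult:
        return react_agent(query)      # adaptive fallback on edge cases

fast = answer("lookup capital of France")
fallback = answer("explore unknown codebase")
```

The economics work because the expensive ReAct path only runs on the minority of queries where ReWOO's static plan fails.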
2. LLMCompiler + Reflexion (Speed + Quality)
Pattern: Use LLMCompiler for fast parallel execution; add Reflexion layer for quality-critical outputs
Benefit: 1.8-3.7× speedup with 20-25% quality improvement on complex analyses
Use case: Financial reports or research briefs requiring both speed (user-facing) and quality (accuracy-critical)
Trade-off: 1st trial fast (LLMCompiler), 2nd trial expensive (full reflection) but only on failures
Implementation: LLMCompiler as Actor in Reflexion framework; evaluator triggers re-trial if needed
3. Plan-then-Execute with Multi-Agent Workers (Scale + Structure)
Pattern: Planner generates structured plan; route steps to specialized worker agents; replanner coordinates
Benefit: Handles >10 tools by domain specialization while maintaining deterministic workflows
Use case: Comprehensive market research requiring Web Search agent + Data Analysis agent + Report Synthesis agent
Tool distribution:
- Worker 1 (Research Agent): 3 tools (WebSearch, DocumentRetrieval, PDFExtraction)
- Worker 2 (Analysis Agent): 3 tools (DataAggregation, StatisticalAnalysis, Visualization)
- Worker 3 (Synthesis Agent): 2 tools (ReportGeneration, ChartCreation)
Implementation: Planner identifies which specialist per step; supervisor routes to workers; replanner evaluates
4. Reflexion with Model Diversity (X-MAS Pattern)
Pattern: Each Reflexion trial uses different LLM for ensemble quality
Benefit: 70% vs 23.33% accuracy from heterogeneous models (X-MAS research, 2025)
Use case: Critical analyses where consensus across models increases confidence
Trade-off: 3-5 trials × 3 models = 9-15× cost, but dramatic quality improvement
Implementation: Different frontier model per trial with cross-model reflections for ensemble learning
5. ReWOO + Dynamic Tool Loading (Adaptive Efficiency)
Pattern: ReWOO Planner generates plan; Worker dynamically loads only required tools
Benefit: Mitigates tool selection degradation (42% → 37% with 7 tools) by reducing active tool count per query
Use case: Multi-domain analysis where different queries need different tool subsets
Tool loading: Query about “Weather patterns” loads only [WeatherAPI, HistoricalData, Forecasting]; “Stock analysis” loads [MarketData, NewsAPI, FinancialStatements]
Implementation: Planner identifies required tools; Worker initializes only subset; Solver synthesizes
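The tool-subsetting step is simple to show; the tool names and the planner's selection logic here are illustrative stubs.

```python
# Dynamic tool loading sketch: the Planner names required tools, and only
# that subset is initialized for the Worker. Tool names are illustrative.

ALL_TOOLS = {
    "WeatherAPI": lambda q: f"weather:{q}",
    "MarketData": lambda q: f"market:{q}",
    "NewsAPI": lambda q: f"news:{q}",
}

def planner_required_tools(query):
    # Stub: a real planner emits the required tool list as part of its plan.
    return ["MarketData", "NewsAPI"] if "stock" in query else ["WeatherAPI"]

def load_tools(query):
    names = planner_required_tools(query)
    return {n: ALL_TOOLS[n] for n in names}   # initialize only the subset

active = load_tools("stock analysis for ACME")
```

Keeping the active tool count low per query is what counteracts the 42% → 37% degradation noted above.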
6. Hierarchical Planning (Two-Level P-t-E)
Pattern: Strategic Planner creates high-level phases; Tactical Planner details each phase; Executor runs steps
Benefit: Addresses planning rigidity by allowing phase-level replanning without full plan regeneration
Use case: Long-horizon analyses (quarterly reviews, annual reports, research projects) with evolving requirements
Example flow:
- Strategic Plan: [Phase 1: Data Collection] → [Phase 2: Analysis] → [Phase 3: Report Synthesis]
- Tactical Plan for Phase 1: [Fetch market data] → [Download competitor reports] → [Extract key metrics]
- After Phase 1: Tactical Replanner adjusts Phase 2 based on Phase 1 outcomes
Implementation: Nested P-t-E agents; strategic replanner decides whether to continue/revise next phase
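The nesting can be sketched as two planning functions, where the tactical planner is re-invoked per phase with the accumulated context. Both planners are stubs standing in for LLM calls; the phase and step names are illustrative.

```python
# Sketch of two-level Plan-then-Execute: strategic phases are fixed up front,
# tactical steps are (re)planned per phase using earlier phases' outcomes.
def strategic_plan(goal: str) -> list:
    # Stub: an LLM would produce high-level phases for the goal.
    return ["data collection", "analysis", "report synthesis"]

def tactical_plan(phase: str, context: dict) -> list:
    # Stub: an LLM would detail steps for one phase, conditioned on context.
    return [f"{phase}: step {i}" for i in range(1, 3)]

def run(goal: str) -> dict:
    context = {}
    for phase in strategic_plan(goal):
        steps = tactical_plan(phase, context)          # replanned per phase
        context[phase] = [f"done {s}" for s in steps]  # execute each step
    return context

results = run("quarterly review")
```

The point of the split is cost containment: when Phase 1 surprises you, only the tactical plan for Phase 2 is regenerated, not the whole strategic plan.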
7. ReAct with Cached Plans (Learning Pattern Library)
Pattern: ReAct agent builds episodic memory of successful reasoning traces; retrieves similar patterns for new queries
Benefit: Combines ReAct’s adaptability with P-t-E’s efficiency through learned templates
Use case: Recurring query types (“market analysis for X sector”, “code review for Y framework”) that follow similar trajectories
Memory structure: Vector database storing {query_embedding, successful_tool_sequence, outcome_quality}
Retrieval: New query → find top-3 similar past queries → inject their tool sequences as “suggested approach” → ReAct adapts if needed
Implementation: LangChain Memory + vector store; inject retrieved sequences into system prompt
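The retrieval step can be sketched without any vector database by using word-count cosine similarity as a stand-in for embeddings. The memory entries, tool names, and similarity function are illustrative assumptions, not the production setup.

```python
from collections import Counter
from math import sqrt

# Toy episodic memory: past queries -> their successful tool sequences.
MEMORY = [
    ("market analysis for energy sector",
     ["WebSearch", "DataAggregation", "ReportGeneration"]),
    ("code review for Django framework",
     ["RepoFetch", "StaticAnalysis", "CommentDraft"]),
]

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts (stands in for vector embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def suggest_plan(query: str) -> list:
    """Retrieve the most similar past query and reuse its tool sequence."""
    best = max(MEMORY, key=lambda item: similarity(query, item[0]))
    return best[1]

plan = suggest_plan("market analysis for retail sector")
```

The retrieved sequence is injected as a *suggestion*, not a mandate: the ReAct loop still observes results and deviates when the cached trajectory stops fitting.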
Choosing Hybrid Approaches
- Production maturity critical → ReWOO + ReAct Fallback (battle-tested components)
- Budget available, quality paramount → LLMCompiler + Reflexion (best of both worlds)
- Tool count >10 → P-t-E + Multi-Agent Workers (specialization at scale)
- Mission-critical decisions → Reflexion + Model Diversity (consensus across LLMs)
- Recurring query patterns → ReAct + Cached Plans (learn from experience)
- Long-horizon workflows → Hierarchical Planning (phase-level adaptation)
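The decision list above can be encoded as a simple lookup; the constraint keys are illustrative labels, and the fallback to plain ReAct reflects its role as the default pattern discussed throughout.

```python
# The hybrid-selection guidance expressed as a lookup table (keys are
# illustrative labels, not a formal taxonomy).
HYBRID_CHOICES = {
    "production_maturity": "ReWOO + ReAct Fallback",
    "quality_with_budget": "LLMCompiler + Reflexion",
    "many_tools": "P-t-E + Multi-Agent Workers",
    "mission_critical": "Reflexion + Model Diversity",
    "recurring_queries": "ReAct + Cached Plans",
    "long_horizon": "Hierarchical Planning",
}

def choose_hybrid(constraint: str) -> str:
    # Plain ReAct remains the sensible default for unmatched cases.
    return HYBRID_CHOICES.get(constraint, "ReAct")

choice = choose_hybrid("many_tools")
```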
Conclusion
The choice of agentic design pattern significantly impacts your system’s cost, speed, and quality. While ReAct remains a solid default for exploratory tasks, planning patterns like ReWOO and LLMCompiler offer dramatic efficiency gains (80% cost reduction, 3.7× speedup) for predictable workflows. For quality-critical applications, Reflexion’s iterative improvement delivers 20-70% success rate gains at higher cost.
The future lies in hybrid approaches that combine complementary strengths—using ReWOO for efficiency with ReAct fallback for edge cases, or LLMCompiler for speed enhanced with Reflexion for quality. As these patterns mature and tool ecosystems grow past seven tools, multi-agent architectures with specialized workers become increasingly essential.
Key takeaway: There’s no one-size-fits-all solution. Understand your constraints (cost, latency, quality requirements), evaluate your task characteristics (predictable vs exploratory, sequential vs parallelizable), and choose—or combine—patterns accordingly.
Key Insights for Production Systems
Tool Limits Matter:
- ReAct: At its practical upper limit (research shows only 2% success on calendar scheduling with 7+ domains)
- ReWOO: At degradation threshold (42% → 37% performance with 2 → 7 tools)
- LLMCompiler: Unknown tool count sensitivity; research focused on smaller tool sets
- Multi-Agent Supervisor: Best option if expanding beyond 7 tools—split into specialists
Production Trends (End of 2024):
- LangGraph adoption: 43% of organizations
- ReAct pattern: 39.8% of production implementations
- Top concern: Quality/performance (45.8%), Cost second (22.4%)
- Best practice: 5-10 tools per agent, multi-agent for larger tool sets
References
Core Pattern Papers
ReAct: Yao et al., ICLR 2023 - “ReAct: Synergizing Reasoning and Acting in Language Models”
- Introduces interleaved reasoning and acting with 27.4% HotpotQA, 71% ALFWorld, 6% hallucination rate
- Establishes baseline for modern agentic systems with thought-action-observation cycle
Plan-and-Solve: Wang et al., ACL 2023 - “Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning”
- PS+ prompting technique: 70.4% → 76.7% average accuracy on arithmetic reasoning
- MultiArith: 83.8% → 91.8%, GSM8K: 56.4% → 59.3%
- Inspiration for Plan-then-Execute agentic framework
ReWOO: Xu et al., 2023 - “ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models”
- 5× token efficiency (1,986 vs 9,795 tokens), 80% cost reduction ($3.97 vs $19.59 per 1K queries)
- Variable substitution (#E1, #E2, #E3) enables foreseeable reasoning without observations
- HotpotQA: 42.4% vs 40.8% ReAct with dramatic cost savings
LLMCompiler: Kim et al., UC Berkeley, ICML 2024 - “An LLM Compiler for Parallel Function Calling”
- Up to 3.7× latency speedup, 6.73× cost reduction through DAG-based parallel execution
- Beats OpenAI parallel function calling by 35% through instruction pipelining
- HotpotQA: 1.80× speedup with 3.37× cost reduction
Reflexion: Shinn et al., NeurIPS 2023 - “Reflexion: Language Agents with Verbal Reinforcement Learning”
- Self-reflection with episodic memory: ALFWorld 97% vs 75% baseline (+22%)
- HumanEval: 91% vs 80% GPT-4 baseline through iterated self-reflection
- Verbal feedback without model fine-tuning, 3-12× cost increase
Foundational Techniques
Chain-of-Thought: Wei et al., NeurIPS 2022 - “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”
- Establishes step-by-step reasoning as core prompting technique
- Foundation for ReAct’s reasoning traces and Plan-and-Solve improvements
Tree-of-Thoughts: Yao et al., NeurIPS 2023 - “Tree of Thoughts: Deliberate Problem Solving with Large Language Models”
- Explores multiple reasoning branches with backtracking
- Game of 24: 4% → 74% through search-based reasoning
Multi-Agent & Advanced Architectures
Multi-Agent Collaboration: LangGraph documentation on supervisor patterns
- Hub-and-spoke topology with specialist agents handling 3-5 tools each
- 50% performance improvement when tools properly grouped by domain
- Addresses tool selection overload at 7+ tools
X-MAS (Heterogeneous Multi-Agent Systems): 2025 - “X-MAS: Solving Math Word Problems via Cross-Model Augmented Self-Correction”
- 70% accuracy with heterogeneous models vs 23.33% homogeneous (+46.67 points)
- Model diversity through ensemble of different frontier models
- Validates Reflexion with Model Diversity hybrid approach
Security & Reliability
Plan-then-Execute Security: Del Rosario et al., 2025 - “Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations”
- Control-flow integrity through planning-execution separation
- Defense-in-depth strategies: least privilege, sandboxing, human-in-the-loop
- Resilience to indirect prompt injection attacks
Benchmarking & Evaluation
AI Agents That Matter: Princeton & Allen Institute for AI, July 2024
- Simple baselines often outperform complex architectures when cost-controlled
- Emphasizes importance of rigorous evaluation and fair comparison
- Challenges inflated performance claims in agent research
τ-bench: Sierra AI, 2024
- Industry standard for realistic agent evaluation with retail/airline scenarios
- <50% success rates reveal gap between research benchmarks and production reality
- Emphasizes need for practical, grounded agent assessment
Implementation Resources
LangChain Blog: Planning Agents (Feb 2024)
- Compares Plan-and-Execute, ReWOO, and LLMCompiler implementations
- Production insights on speed/cost/quality tradeoffs
- Model tiering strategies for 40-50% cost reduction
LangGraph Tutorials: Official Documentation
- ReAct Agent from Scratch
- Plan-and-Execute - Full implementation with state tracking
- ReWOO - Variable substitution examples
- LLMCompiler - DAG scheduling implementation
BabyAGI: GitHub Repository (Nakajima, 2023)
- Early autonomous agent with task management and prioritization
- Inspiration for task decomposition patterns
Evaluation Tooling
- τ-bench: Sierra AI, 2024 - Realistic retail/airline agent evaluation
- LangSmith: LangChain’s observability and testing platform for agent traces
- LangFuse: Open-source LLM observability and monitoring
- HumanEval/MBPP: Code generation benchmarks (used for Reflexion evaluation)
- HotpotQA: Multi-hop question answering requiring 2-3 Wikipedia passages
- ALFWorld: Embodied AI tasks in text-based household environments (134 tasks)
- WebShop: E-commerce navigation with 1.18M real products
- GSM8K: Grade-school math word problems (2-8 reasoning steps)
Production Insights
Tool Selection Research (2024):
- Performance degrades with 10+ tools even with capable models
- Calendar scheduling: 2% success with 7+ domains (GPT-4o)
- Best practice: 5-10 tools per agent, multi-agent for larger tool sets
This article is based on research and practical experience implementing agentic systems. For the complete source material with additional details, visit the PDF source.