Most conversational RAG systems work surprisingly well. They answer simple questions perfectly… and then completely fall apart on anything that requires more than one step of reasoning.

The problem isn’t retrieval. It’s query structure.

In a conversational RAG search agent, the simplest and most direct path is straightforward: one query, one search, one response.

This is what we call simple search. While it likely covers the vast majority of use cases, sooner or later you will need to address the remaining, more complex scenarios. But what exactly makes a query complex? And how can we systematically categorize complex queries to gain confidence when testing?

Without a clear understanding of the patterns behind query decomposition, you risk compensating with quantity over quality—writing more and more tests in an attempt to build confidence, rather than designing fewer, well-targeted tests that truly validate the system’s behavior.

Without being exhaustive, here are the most common query patterns I was able to identify, and how a conversational search system should treat them.

Simple Search

A simple search contains a single intent and can be resolved with one retrieval step.

Who produced Star Wars?

There is no ambiguity, no dependency, and no need for orchestration. One query goes in, one search is performed, and one answer comes out.

This is the happy path—and the baseline against which everything else should be measured.

Multi-Query (but Not Composite)

Moving up the complexity ladder, the next pattern to understand is the multi-query.

Who is the Star Wars producer? What is a cinema?

This looks more complex at first glance, but it is not a composite query.

Why?

  • These are two unrelated questions
  • There is no semantic or logical dependency
  • Each question can be answered independently

From a system perspective, this should be treated as multiple simple searches, not a single composite one. The only added responsibility is segmentation, not reasoning.

This distinction is important: confusing multi-query inputs with composite queries often leads to unnecessary orchestration and over-engineering.
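To make the "segmentation, not reasoning" point concrete, here is a minimal sketch of a punctuation-based splitter. The function name `segment_questions` is my own, and the approach is deliberately naive: it assumes each question ends with sentence punctuation and would mis-split abbreviations such as "Dr.".

```python
import re

def segment_questions(user_input: str) -> list[str]:
    """Split raw input into standalone questions (naive, punctuation-based)."""
    # Split on whitespace that follows '?', '.', or '!', keeping the punctuation.
    parts = re.split(r"(?<=[?.!])\s+", user_input.strip())
    return [part for part in parts if part]

# Each segment can then be handled as an independent simple search.
for question in segment_questions("Who is the Star Wars producer? What is a cinema?"):
    print(question)
```

A production system would more likely delegate segmentation to the model itself, but the contract stays the same: a list of independent questions comes out, with no shared state between them.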

Composite Queries

A composite query is a single user input that contains multiple related information needs sharing a common context. The system must decompose and resolve these needs before producing a unified answer.

Who is the Star Wars producer and who played Han Solo?

Here, the system must:

  1. Identify multiple intents
  2. Decompose them into sub-queries
  3. Execute searches
  4. Merge the results into a coherent response

Not all composite queries are equal, though. They come in several distinct patterns, each with different implications for retrieval, reasoning, and testing.

Conjunctive Composite Queries

Pattern: Multiple related questions joined by and, also, or similar connectors.

Who produced Star Wars and who played Han Solo?

Each sub-query is independent, but they are logically grouped and expected to be answered together.

System behavior

  • Decompose into parallel searches
  • Aggregate results
  • Ensure no intent is dropped

This is often the entry point into composite query handling.
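The three behaviors listed above could be sketched roughly as follows. Everything here is illustrative: `TOY_INDEX` and `search` stand in for whatever retriever you actually use, and `answer_conjunctive` assumes the decomposition into sub-queries has already happened.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Toy index standing in for a real retriever (illustrative only).
TOY_INDEX = {
    "Who produced Star Wars?": "Gary Kurtz",
    "Who played Han Solo?": "Harrison Ford",
}

def search(query: str) -> Optional[str]:
    """Hypothetical search call: returns an answer, or None if nothing is found."""
    return TOY_INDEX.get(query)

def answer_conjunctive(sub_queries: list[str]) -> dict[str, str]:
    # The sub-queries are independent, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(search, sub_queries))
    # Fail loudly rather than silently dropping an intent.
    missing = [q for q, r in zip(sub_queries, results) if r is None]
    if missing:
        raise LookupError(f"No result for: {missing}")
    return dict(zip(sub_queries, results))

print(answer_conjunctive(["Who produced Star Wars?", "Who played Han Solo?"]))
```

Raising on a missing result is one possible design choice. The point is that a dropped intent should be visible in tests instead of disappearing into a partial answer.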

Dependent Composite Queries

Pattern: The second question depends on the answer to the first.

Who directed Star Wars and what other movies did he direct?

You cannot answer the second part without first resolving the entity from the first.

System behavior

  • Resolve the first entity
  • Carry context forward
  • Issue a follow-up query using the derived entity

This introduces statefulness, even in otherwise stateless systems.
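A minimal sketch of that carried-forward state, assuming a hypothetical `search` function over a toy index and a follow-up template with an `{entity}` placeholder (both are my own illustration, not a specific library's API):

```python
# Toy index standing in for a real retriever (illustrative only).
TOY_INDEX = {
    "Who directed Star Wars?": "George Lucas",
    "What other movies did George Lucas direct?": "THX 1138, American Graffiti",
}

def search(query: str) -> str:
    """Hypothetical search call: returns an answer, or an empty string."""
    return TOY_INDEX.get(query, "")

def answer_dependent(first_query: str, follow_up_template: str) -> dict[str, str]:
    # Hop 1: resolve the entity the second question depends on.
    entity = search(first_query)
    if not entity:
        return {"error": f"Could not resolve: {first_query}"}
    # Hop 2: substitute the resolved entity for the pronoun, then search again.
    follow_up = follow_up_template.format(entity=entity)
    return {"entity": entity, "follow_up": follow_up, "answer": search(follow_up)}

result = answer_dependent("Who directed Star Wars?", "What other movies did {entity} direct?")
print(result["follow_up"])
print(result["answer"])
```

The only state is the resolved entity, but it has to exist somewhere between the two searches, which is exactly what a one-shot pipeline lacks.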

Multi-Hop Queries

Pattern: Dependencies exist but are implicit: the query never spells out the intermediate steps.

Which actor from Star Wars won an Oscar?

To answer this, the system must:

  1. Identify actors in Star Wars
  2. Check which of them won an Oscar
  3. Return the matching result

This is where search turns into reasoned retrieval, and where RAG systems tend to fail if they rely solely on naive retrieval.
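The three steps can be sketched as a chain of lookups that records every hop. The data and helper names (`CAST`, `OSCAR_WINNERS`, `search_cast`, `has_won_oscar`) are toy stand-ins I made up for two separate retrieval calls:

```python
# Toy data standing in for two separate retrieval calls (illustrative only).
CAST = {"Star Wars": ["Mark Hamill", "Harrison Ford", "Carrie Fisher", "Alec Guinness"]}
OSCAR_WINNERS = {"Alec Guinness"}

def search_cast(movie: str) -> list[str]:
    """Hypothetical hop 1: who acted in the movie?"""
    return CAST.get(movie, [])

def has_won_oscar(actor: str) -> bool:
    """Hypothetical hop 2: did this actor win an Oscar?"""
    return actor in OSCAR_WINNERS

def actors_with_oscar(movie: str) -> tuple[list[str], list[tuple[str, list[str]]]]:
    hops = []  # trace of intermediate results, useful for tests and debugging
    cast = search_cast(movie)
    hops.append(("cast", cast))
    winners = [actor for actor in cast if has_won_oscar(actor)]
    hops.append(("oscar_filter", winners))
    return winners, hops

winners, hops = actors_with_oscar("Star Wars")
print(winners)
```

Returning the hop trace alongside the answer is what lets a test assert that the intermediate step actually happened, not just that the final string looks right.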

Comparative Composite Queries

Pattern: Two or more entities must be compared across a dimension.

Who had a longer acting career, Harrison Ford or Mark Hamill?

This requires:

  • Parallel retrieval
  • Normalization (dates, durations, metrics)
  • Explicit comparison logic

The challenge here is not finding information, but aligning it correctly.
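As a rough illustration of the alignment problem, the sketch below normalizes two differently formatted career descriptions into a number of years before comparing them. The entity names and year ranges are invented placeholders, not real career data, and `career_years` is a hypothetical helper.

```python
import re

def career_years(raw: str) -> int:
    """Normalize a free-text career span to a duration in years."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", raw)]
    if len(years) < 2:
        raise ValueError(f"Cannot extract a start and end year from: {raw!r}")
    return max(years) - min(years)

def longer_career(evidence: dict[str, str]) -> str:
    # Compare only after every entity is expressed in the same unit.
    spans = {name: career_years(raw) for name, raw in evidence.items()}
    return max(spans, key=spans.get)

# Placeholder evidence in two different formats (invented values).
evidence = {"Actor A": "from 1965 to 2015", "Actor B": "active 1970-2010"}
print(longer_career(evidence))
```

Note that `max` silently picks the first entity on a tie. A real comparison step would need an explicit rule for ties and for open-ended spans such as "1970 to present".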

Conditional Composite Queries

Pattern: A query that introduces a condition before executing a follow-up.

If Star Wars was produced by George Lucas, what other films did he produce?

System behavior

  • Evaluate the condition
  • Execute the second query only if it holds
  • Explain the outcome clearly

This adds control flow to search, something most RAG systems are not designed for by default.
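Here is one way the control flow could look, as a sketch under stated assumptions: `search` is a hypothetical lookup over a toy index, and the condition is checked with a naive exact string match. In this toy index the premise of the example query does not hold, so the follow-up is skipped and the reason is reported.

```python
from typing import Optional

# Toy index standing in for a real retriever (illustrative only).
TOY_INDEX = {
    "Who produced Star Wars?": "Gary Kurtz",
    "What other films did Gary Kurtz produce?": "American Graffiti, The Dark Crystal",
}

def search(query: str) -> Optional[str]:
    return TOY_INDEX.get(query)

def answer_conditional(condition_query: str, expected: str, follow_up_query: str) -> str:
    actual = search(condition_query)
    if actual is None:
        return f"Cannot verify the condition: no result for '{condition_query}'."
    if actual != expected:
        # The branch is not taken, and the answer says why.
        return f"Condition not met: found '{actual}', expected '{expected}'. Follow-up skipped."
    follow_up = search(follow_up_query)
    return follow_up or f"Condition holds, but no result for '{follow_up_query}'."

print(answer_conditional(
    "Who produced Star Wars?", "George Lucas", "What other films did George Lucas produce?"
))
```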

Enumerative Composite Queries

Pattern: A request to list or enumerate multiple items.

List the main Star Wars movies and their release years.

Each item may require its own retrieval, but the real challenge is:

  • Completeness
  • Ordering
  • Consistent structure

These queries are deceptively simple but notoriously hard to validate.
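Validation gets easier once the three concerns are checked separately. The sketch below assumes the system returns a list of `{"title", "year"}` dictionaries and that the test author supplies the expected set of titles. Both the shape and the name `validate_enumeration` are my own assumptions.

```python
def validate_enumeration(items: list[dict], expected_titles: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the enumeration passes."""
    problems = []
    # Consistent structure: every item has exactly the same fields.
    for item in items:
        if set(item) != {"title", "year"}:
            problems.append(f"inconsistent structure: {item}")
    # Completeness: no expected title is missing.
    missing = sorted(expected_titles - {item.get("title") for item in items})
    if missing:
        problems.append(f"missing: {missing}")
    # Ordering: items are sorted by release year.
    years = [item["year"] for item in items if "year" in item]
    if years != sorted(years):
        problems.append("not ordered by year")
    return problems

ITEMS = [
    {"title": "Star Wars", "year": 1977},
    {"title": "The Empire Strikes Back", "year": 1980},
    {"title": "Return of the Jedi", "year": 1983},
]
EXPECTED = {"Star Wars", "The Empire Strikes Back", "Return of the Jedi"}
print(validate_enumeration(ITEMS, EXPECTED))
```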

Why This Taxonomy Matters

If all complex queries are treated the same, testing becomes reactive:

  • More tests
  • More edge cases
  • Less confidence

By classifying composite queries into clear patterns, you can instead test:

  • Correct query classification
  • Correct decomposition strategy
  • Correct orchestration
  • Correct result aggregation

This shifts confidence from test quantity to test quality, and gives you a principled way to reason about multi-hop search behavior in conversational RAG systems.
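In test code, this can start as one well-chosen case per pattern. The sketch below is an assumption about how such a suite might be organized: `QueryPattern`, `CASES`, and `classification_failures` are hypothetical names, and `classify` is whatever classifier (LLM-backed or not) your system exposes.

```python
from enum import Enum
from typing import Callable

class QueryPattern(Enum):
    SIMPLE = "simple"
    MULTI_QUERY = "multi_query"
    CONJUNCTIVE = "conjunctive"
    DEPENDENT = "dependent"
    MULTI_HOP = "multi_hop"
    COMPARATIVE = "comparative"
    CONDITIONAL = "conditional"
    ENUMERATIVE = "enumerative"

# One targeted case per pattern, taken from the examples in this article.
CASES = [
    ("Who produced Star Wars?", QueryPattern.SIMPLE),
    ("Who is the Star Wars producer? What is a cinema?", QueryPattern.MULTI_QUERY),
    ("Who produced Star Wars and who played Han Solo?", QueryPattern.CONJUNCTIVE),
    ("Who directed Star Wars and what other movies did he direct?", QueryPattern.DEPENDENT),
    ("Which actor from Star Wars won an Oscar?", QueryPattern.MULTI_HOP),
    ("Who had a longer acting career, Harrison Ford or Mark Hamill?", QueryPattern.COMPARATIVE),
    ("If Star Wars was produced by George Lucas, what other films did he produce?", QueryPattern.CONDITIONAL),
    ("List the main Star Wars movies and their release years.", QueryPattern.ENUMERATIVE),
]

def classification_failures(classify: Callable[[str], QueryPattern], cases=CASES):
    """Return (query, expected, actual) for every misclassified case."""
    failures = []
    for query, expected in cases:
        actual = classify(query)
        if actual is not expected:
            failures.append((query, expected, actual))
    return failures
```

Eight cases will not prove the classifier correct, but each failure now points at a specific pattern instead of a vague "complex queries are flaky".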

Mapping Query Patterns to Agent Architectures

Once you start identifying query patterns (multi-query, conjunctive, dependent, multi-hop, comparative, conditional, enumerative), the next natural question is:

How should my agent be architected to handle them correctly?

The answer is not “one prompt vs many agents”.
In practice, multi-hop search is less about agent count and more about forcing an explicit plan → execute → verify loop, so the system doesn’t stop after the first hop or hallucinate the rest.

In my experience, three strategies work best, ordered here from simplest to most robust. Each maps naturally to a different level of query complexity.

Strategy 1: Single Agent with a Multi-Search Tool Loop

(Best default, covers most cases)

This is the simplest architecture that actually works for composite and multi-hop queries.

You use one agent, but with strong constraints:

  • It is explicitly allowed (and expected) to call search multiple times
  • It must decompose the query into sub-questions
  • It must verify that all sub-questions are answered before responding

Conceptually, the agent is forced into an internal loop:

Decompose → Search → Read → Synthesize → Verify gaps → Search again if needed

A critical detail is the termination rule:

If any required entity, constraint, or fact is unverified, do another search or explicitly state what’s missing.
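Stripped of the LLM, the loop and its termination rule might look like the sketch below. It assumes decomposition has already produced `sub_questions`, that `search` is any callable returning an answer or `None`, and that `max_rounds` caps the retries. All three names are illustrative, and a real agent would reformulate the query between rounds rather than repeat it verbatim.

```python
from typing import Callable, Optional

def run_search_loop(
    sub_questions: list[str],
    search: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> dict:
    evidence: dict[str, Optional[str]] = {q: None for q in sub_questions}
    for _ in range(max_rounds):
        # Verify gaps: which sub-questions still lack evidence?
        open_questions = [q for q, found in evidence.items() if found is None]
        if not open_questions:
            break
        # Search again only for what is still missing.
        for question in open_questions:
            evidence[question] = search(question)
    # Termination rule: anything still unverified must be stated explicitly.
    unverified = [q for q, found in evidence.items() if found is None]
    return {"evidence": evidence, "unverified": unverified}

result = run_search_loop(
    ["Who produced Star Wars?", "Who played Han Solo?"],
    {"Who played Han Solo?": "Harrison Ford"}.get,
)
print(result["unverified"])
```

The caller answers normally only when `unverified` is empty. Otherwise the response has to say what is missing.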

What patterns this handles well

  • Conjunctive composite queries
  • Dependent composite queries
  • Shallow multi-hop queries
  • Most enumerative queries

Why this works so well

  • Minimal orchestration overhead
  • No cross-agent “telephone game”
  • Easy to debug: you can log search calls and checkpoints
  • Easy to test: you validate the sequence, not just the final answer

For most teams building conversational RAG systems, this should be the starting point.

Strategy 2: Planner + Executor Split

(Best reliability / complexity tradeoff)

When single-agent loops start failing—usually by:

  • stopping after the first hop,
  • missing constraints,
  • or answering without evidence—

the next step is not more agents, but separating planning from execution.

This strategy introduces two explicit roles (which can be two agents or one agent in two forced phases):

Planner

  • Analyzes the user query
  • Classifies the query pattern
  • Produces a search plan
  • Defines acceptance criteria (what must be verified)

Importantly:
The Planner is not allowed to answer the question.

Executor

  • Executes the plan step by step
  • Calls search multiple times
  • Collects evidence
  • Reports results back with sources
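A minimal sketch of the split, assuming the plan is a small data object the Planner emits and the Executor consumes. The `Plan` shape, the `{prev}` placeholder, and the hand-written `make_plan` (which an LLM would produce in practice) are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Plan:
    pattern: str      # the query pattern the Planner assigned
    steps: list[str]  # query templates; "{prev}" is the previous step's result

def make_plan() -> Plan:
    """Planner stand-in: produces a plan, never an answer."""
    return Plan(
        pattern="dependent",
        steps=["Who directed Star Wars?", "What other movies did {prev} direct?"],
    )

def execute(plan: Plan, search: Callable[[str], Optional[str]]) -> dict:
    """Executor: runs the steps in order and reports evidence for each."""
    results, prev = [], ""
    for template in plan.steps:
        query = template.format(prev=prev)
        prev = search(query) or ""
        results.append({"query": query, "result": prev})
        if not prev:
            break  # later steps depend on this one, so stop here
    # Acceptance criterion in this sketch: every step produced evidence.
    complete = len(results) == len(plan.steps) and all(r["result"] for r in results)
    return {"results": results, "complete": complete}

# Toy index standing in for a real retriever (illustrative only).
TOY_INDEX = {
    "Who directed Star Wars?": "George Lucas",
    "What other movies did George Lucas direct?": "THX 1138, American Graffiti",
}
print(execute(make_plan(), TOY_INDEX.get))
```

Because the plan is plain data, it can be asserted on directly (pattern, number of steps, placeholders) without running a single search.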

What patterns this excels at

  • Dependent composite queries
  • True multi-hop queries
  • Conditional queries
  • Queries with hidden assumptions

Why this helps

  • The Planner focuses purely on decomposition and hop structure
  • The Executor focuses on accuracy and evidence
  • You can unit-test plans independently from execution
  • Failure modes become observable and actionable

This is usually the point where systems stop feeling brittle.

Strategy 3: Multi-Agent Specialists

(Only when the search space is genuinely large)

This is the most powerful—and most dangerous—approach.

You introduce specialized agents, for example:

  • A query generator for broad recall
  • A retriever/reader for precise extraction
  • A critic or verifier whose job is to find missing hops or contradictions

These agents can run in parallel, feeding into a single final synthesizer.

What patterns justify this

  • Large enumerations
  • Comparative queries across many entities
  • Research-style multi-hop queries
  • High-recall scenarios where missing one item is unacceptable

Why this is rarely the right default

  • High token and latency cost
  • Harder to reason about failures
  • Risk of inconsistent partial answers
  • Requires strong control over who produces the final answer

This architecture should be earned, not assumed.

Pattern → Architecture Mapping

Putting it all together:

Query Pattern                    | Recommended Architecture
---------------------------------|---------------------------------------
Simple search                    | Single agent, single call
Multi-query (unrelated)          | Query segmentation + simple agents
Conjunctive composite            | Single agent, multi-search loop
Dependent composite              | Single agent loop → Planner/Executor
Multi-hop                        | Planner + Executor
Conditional                      | Planner + Executor
Comparative / large enumeration  | Multi-agent specialists
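If you route on the classified pattern, the mapping above can live in code as a small lookup. This is only a sketch: the keys, the strings, and the idea of an `escalation_level` (start with the first option, move to the next when it proves unreliable) are my own way of encoding the "→" in the dependent row.

```python
# Ordered options per pattern: index 0 is the default, later entries are escalations.
ARCHITECTURE_BY_PATTERN = {
    "simple": ["single agent, single call"],
    "multi_query": ["query segmentation + simple agents"],
    "conjunctive": ["single agent, multi-search loop"],
    "dependent": ["single agent, multi-search loop", "planner + executor"],
    "multi_hop": ["planner + executor"],
    "conditional": ["planner + executor"],
    "comparative": ["multi-agent specialists"],
    "large_enumeration": ["multi-agent specialists"],
}

def pick_architecture(pattern: str, escalation_level: int = 0) -> str:
    options = ARCHITECTURE_BY_PATTERN[pattern]
    # Clamp so that escalating past the last option keeps the most robust one.
    return options[min(escalation_level, len(options) - 1)]

print(pick_architecture("dependent"))
print(pick_architecture("dependent", escalation_level=1))
```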

The Real Lever: Verification, Not Agent Count

Across all three strategies, the most important design element is not how many agents you have, but whether the system is forced to verify completeness.

Practical techniques that matter more than architecture diagrams:

  • Explicit hop checklists
  • Clear stop conditions (“all hops have evidence”)
  • Structured evidence objects
  • A final “what’s unproven?” self-check
  • Controlled backtracking when assumptions fail
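Two of these techniques, structured evidence objects and the final "what's unproven?" check, fit in a few lines. The `Evidence` fields and the `unproven_hops` helper are hypothetical, and the source identifier is a made-up placeholder.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    hop: str     # which checklist item this supports
    claim: str   # what was found
    source: str  # where it was found (document ID, URL, ...)

def unproven_hops(checklist: list[str], evidence: list[Evidence]) -> list[str]:
    """Return the hops that have no sourced evidence yet."""
    proven = {e.hop for e in evidence if e.claim and e.source}
    return [hop for hop in checklist if hop not in proven]

checklist = ["cast of Star Wars", "Oscar winners among the cast"]
evidence = [
    Evidence(hop="cast of Star Wars", claim="Alec Guinness is in the cast", source="doc-17"),
]

remaining = unproven_hops(checklist, evidence)
# Stop condition: answer only when nothing remains unproven.
print("ready to answer" if not remaining else f"still unproven: {remaining}")
```

Evidence without a source counts as unproven here, which is one simple way to stop the system from treating its own guess as a verified hop.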

Most systems fail at multi-hop not because they lack agents, but because they answer too early.

Rule of Thumb

  • Most teams should use:
    One agent + multi-search loop + explicit verification
  • If correctness matters a lot:
    Add a Planner/Executor split
  • If recall and coverage matter a lot:
    Add specialists—but keep a single finalizer

Architecture should follow query structure, not the other way around.