Most conversational RAG systems work surprisingly well. They answer simple questions perfectly… and then completely fall apart on anything that requires more than one step of reasoning.
The problem isn’t retrieval. It’s query structure.
In a conversational RAG search agent, the simplest and most direct path is straightforward: one query, one search, one response.
This is what we call simple search. While it likely covers the vast majority of use cases, sooner or later you will need to address the remaining, more complex scenarios. But what exactly makes a query complex? And how can we systematically categorize complex queries to gain confidence when testing?
Without a clear understanding of the patterns behind query decomposition, you risk compensating with quantity over quality—writing more and more tests in an attempt to build confidence, rather than designing fewer, well-targeted tests that truly validate the system’s behavior.
Without being exhaustive, here are the most common query patterns I was able to identify, and how a conversational search system should treat them.

Simple Search
A simple search contains a single intent and can be resolved with one retrieval step.
Who produced Star Wars?
There is no ambiguity, no dependency, and no need for orchestration. One query goes in, one search is performed, and one answer comes out.
This is the happy path—and the baseline against which everything else should be measured.
Multi-Query (but Not Composite)
Moving up the complexity ladder, the next pattern to understand is the multi-query.
Who is the Star Wars producer? What is a cinema?
This looks more complex at first glance, but it is not a composite query.
Why?
- These are two unrelated questions
- There is no semantic or logical dependency
- Each question can be answered independently
From a system perspective, this should be treated as multiple simple searches, not a single composite one. The only added responsibility is segmentation, not reasoning.
This distinction is important: confusing multi-query inputs with composite queries often leads to unnecessary orchestration and over-engineering.
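As a minimal sketch of that segmentation step: the snippet below splits an input on sentence-ending punctuation and answers each piece independently. The names `segment_questions` and `answer_all`, and the `search` callable, are hypothetical placeholders rather than part of any specific framework, and a real system would likely need a more robust splitter than a regular expression.

```python
import re


def segment_questions(user_input: str) -> list[str]:
    """Split an input into standalone questions on sentence-ending punctuation."""
    parts = re.split(r"(?<=[?.!])\s+", user_input.strip())
    return [p for p in parts if p]


def answer_all(user_input: str, search) -> list[tuple[str, str]]:
    """Treat every segment as an independent simple search (no shared state)."""
    return [(q, search(q)) for q in segment_questions(user_input)]


if __name__ == "__main__":
    print(segment_questions("Who is the Star Wars producer? What is a cinema?"))
```

Note that there is no merging or reasoning step here: each segment goes through the same path as a simple search.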
Composite Queries
A composite query is a single user input that contains multiple related information needs sharing a common context. The input must be decomposed, and each part resolved, before the system can produce a unified answer.
Who is the Star Wars producer and who played Han Solo?
Here, the system must:
- Identify multiple intents
- Decompose them into sub-queries
- Execute searches
- Merge the results into a coherent response
Not all composite queries are equal, though. They come in several distinct patterns, each with different implications for retrieval, reasoning, and testing.
Conjunctive Composite Queries
Pattern: Multiple related questions joined by and, also, or similar connectors.
Who produced Star Wars and who played Han Solo?
Each sub-query is independent, but they are logically grouped and expected to be answered together.
System behavior
- Decompose into parallel searches
- Aggregate results
- Ensure no intent is dropped
This is often the entry point into composite query handling.
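The three behaviors above could be sketched roughly as follows. This assumes decomposition has already produced the sub-queries, and it uses a tiny in-memory dictionary (`TOY_KB`, `toy_search`) as a stand-in for a real retriever. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Toy stand-in for a retriever; a real system would call a search backend.
TOY_KB = {
    "Who produced Star Wars?": "Gary Kurtz",
    "Who played Han Solo?": "Harrison Ford",
}


def toy_search(query: str) -> Optional[str]:
    return TOY_KB.get(query)


def answer_conjunctive(
    sub_queries: list[str], search: Callable[[str], Optional[str]]
) -> tuple[dict, list[str]]:
    """Run independent sub-queries in parallel and report any dropped intent."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(search, sub_queries))  # map preserves input order
    answers = dict(zip(sub_queries, results))
    dropped = [q for q, a in answers.items() if a is None]
    return answers, dropped


if __name__ == "__main__":
    print(answer_conjunctive(list(TOY_KB), toy_search))
```

Returning the `dropped` list alongside the answers gives a test something concrete to assert on: a conjunctive query passes only when that list is empty.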
Dependent Composite Queries
Pattern: The second question depends on the answer to the first.
Who directed Star Wars and what other movies did he direct?
You cannot answer the second part without first resolving the entity from the first.
System behavior
- Resolve the first entity
- Carry context forward
- Issue a follow-up query using the derived entity
This introduces statefulness, even in otherwise stateless systems.
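A minimal sketch of that carried-forward state, assuming the follow-up can be expressed as a template with an `{entity}` placeholder. The function name, the template convention, and the toy knowledge base are illustrative assumptions only.

```python
from typing import Callable, Optional

# Toy stand-in for a retriever.
TOY_KB = {
    "Who directed Star Wars?": "George Lucas",
    "What other movies did George Lucas direct?": "THX 1138, American Graffiti",
}


def toy_search(query: str) -> Optional[str]:
    return TOY_KB.get(query)


def answer_dependent(
    first_query: str,
    follow_up_template: str,
    search: Callable[[str], Optional[str]],
) -> dict:
    """Resolve the first hop, then build the follow-up from the derived entity."""
    entity = search(first_query)
    if entity is None:
        # Without the entity, the second question cannot even be formulated.
        return {"entity": None, "follow_up_query": None, "answer": None}
    follow_up_query = follow_up_template.format(entity=entity)
    return {
        "entity": entity,
        "follow_up_query": follow_up_query,
        "answer": search(follow_up_query),
    }


if __name__ == "__main__":
    print(answer_dependent(
        "Who directed Star Wars?",
        "What other movies did {entity} direct?",
        toy_search,
    ))
```

The pronoun "he" from the user input never reaches the retriever: the follow-up query is issued with the resolved name.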
Multi-Hop Queries
Pattern: Dependencies exist, but they are implicit rather than stated in the query.
Which actor from Star Wars won an Oscar?
To answer this, the system must:
- Identify actors in Star Wars
- Check which of them won an Oscar
- Return the matching result
This is where search turns into reasoned retrieval, and where RAG systems tend to fail if they rely solely on naive retrieval.
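The three steps can be made explicit in code. The sketch below assumes two hypothetical lookup functions, one per hop, backed by a deliberately small and incomplete toy data set. It only illustrates the hop structure, not a real retrieval pipeline.

```python
from typing import Callable

# Deliberately tiny, incomplete toy data used only to illustrate the hops.
CAST = {
    "Star Wars": ["Mark Hamill", "Harrison Ford", "Carrie Fisher", "Alec Guinness"],
}
AWARDS = {
    "Alec Guinness": {"Oscar"},
}


def cast_lookup(film: str) -> list[str]:
    return CAST.get(film, [])


def award_lookup(actor: str) -> set[str]:
    return AWARDS.get(actor, set())


def actors_with_award(
    film: str,
    award: str,
    get_cast: Callable[[str], list[str]],
    get_awards: Callable[[str], set[str]],
) -> list[str]:
    """Hop 1: find the cast. Hop 2: check each actor. Then return the matches."""
    cast = get_cast(film)
    return [actor for actor in cast if award in get_awards(actor)]


if __name__ == "__main__":
    print(actors_with_award("Star Wars", "Oscar", cast_lookup, award_lookup))
```

A single retrieval for the full question would have to find one passage that happens to mention both facts. Splitting the hops removes that dependency.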
Comparative Composite Queries
Pattern: Two or more entities must be compared across a dimension.
Who had a longer acting career, Harrison Ford or Mark Hamill?
This requires:
- Parallel retrieval
- Normalization (dates, durations, metrics)
- Explicit comparison logic
The challenge here is not finding information, but aligning it correctly.
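To illustrate the alignment problem, here is a small sketch in which two retrieved records use different formats for the same fields. The records use placeholder names and made-up years ("Actor A", "Actor B") and are not real filmography data. The field names and helpers are assumptions as well.

```python
def career_years(record: dict) -> int:
    """Normalize mixed date formats (int, 'YYYY', 'YYYY-MM-DD') to a year span."""
    start = int(str(record["first_credit"])[:4])
    end = int(str(record["last_credit"])[:4])
    return end - start


def longer_career(a: dict, b: dict) -> str:
    """Explicit comparison logic on the normalized values."""
    years_a, years_b = career_years(a), career_years(b)
    if years_a == years_b:
        return "tie"
    return a["name"] if years_a > years_b else b["name"]


if __name__ == "__main__":
    # Placeholder records with made-up values in deliberately mixed formats.
    actor_a = {"name": "Actor A", "first_credit": "1970-03-01", "last_credit": 2020}
    actor_b = {"name": "Actor B", "first_credit": 1980, "last_credit": "2015"}
    print(longer_career(actor_a, actor_b))
```

Keeping normalization in its own function means it can be tested separately from retrieval, which is where comparative queries often go wrong.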
Conditional Composite Queries
Pattern: A query that introduces a condition before executing a follow-up.
If Star Wars was produced by George Lucas, what other films did he produce?
System behavior
- Evaluate the condition
- Execute the second query only if it holds
- Explain the outcome clearly
This adds control flow to search, something most RAG systems are not designed for by default.
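A rough sketch of that control flow, assuming the condition can be reduced to "does the retrieved answer equal the expected value". The function name and the toy knowledge base are hypothetical.

```python
from typing import Callable, Optional

# Toy stand-in for a retriever.
TOY_KB = {
    "Who produced Star Wars?": "Gary Kurtz",
    "What other films did Gary Kurtz produce?": "American Graffiti",
}


def toy_search(query: str) -> Optional[str]:
    return TOY_KB.get(query)


def answer_conditional(
    condition_query: str,
    expected: str,
    follow_up_query: str,
    search: Callable[[str], Optional[str]],
) -> dict:
    """Evaluate the premise first; run the follow-up only if the premise holds."""
    actual = search(condition_query)
    if actual != expected:
        return {
            "condition_holds": False,
            "explanation": f"Premise not confirmed: expected {expected!r}, found {actual!r}.",
            "answer": None,
        }
    return {
        "condition_holds": True,
        "explanation": f"Premise confirmed: {expected!r}.",
        "answer": search(follow_up_query),
    }


if __name__ == "__main__":
    print(answer_conditional(
        "Who produced Star Wars?",
        "George Lucas",
        "What other films did George Lucas produce?",
        toy_search,
    ))
```

With this toy data the premise fails, so the follow-up search is never issued and the explanation says why. That is the behavior a test for this pattern should pin down.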
Enumerative Composite Queries
Pattern: A request to list or enumerate multiple items.
List the main Star Wars movies and their release years.
Each item may require its own retrieval, but the real challenge is:
- Completeness
- Ordering
- Consistent structure
These queries are deceptively simple but notoriously hard to validate.
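One way to make validation less painful is to check the three properties separately. The sketch below is an assumption about how such a validator might look: it takes the system's structured output and a list of expected titles, and returns a list of problems. The item shape (`title`, `year`) is hypothetical.

```python
def validate_enumeration(
    items: list[dict],
    expected_titles: list[str],
    required_keys: tuple = ("title", "year"),
) -> list[str]:
    """Return a list of problems covering structure, completeness, and ordering."""
    problems = []
    # Consistent structure: every item carries the same required fields.
    for index, item in enumerate(items):
        missing_keys = [k for k in required_keys if k not in item]
        if missing_keys:
            problems.append(f"item {index} is missing keys: {missing_keys}")
    # Completeness: every expected title is present.
    titles = {item.get("title") for item in items}
    for title in expected_titles:
        if title not in titles:
            problems.append(f"missing item: {title}")
    # Ordering: items are sorted by release year.
    years = [item["year"] for item in items if "year" in item]
    if years != sorted(years):
        problems.append("items are not ordered by year")
    return problems


if __name__ == "__main__":
    output = [
        {"title": "A New Hope", "year": 1977},
        {"title": "The Empire Strikes Back", "year": 1980},
        {"title": "Return of the Jedi", "year": 1983},
    ]
    print(validate_enumeration(output, [item["title"] for item in output]))
```

The expected list is the hard part in practice: completeness can only be asserted against a reference set that someone has curated.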
Why This Taxonomy Matters
If all complex queries are treated the same, testing becomes reactive:
- More tests
- More edge cases
- Less confidence
By classifying composite queries into clear patterns, you can instead test:
- Correct query classification
- Correct decomposition strategy
- Correct orchestration
- Correct result aggregation
This shifts confidence from test quantity to test quality, and gives you a principled way to reason about multi-hop search behavior in conversational RAG systems.
Mapping Query Patterns to Agent Architectures
Once you start identifying query patterns (multi-query, conjunctive, dependent, multi-hop, comparative, conditional, enumerative), the next natural question is:
How should my agent be architected to handle them correctly?
The answer is not “one prompt vs many agents”.
In practice, multi-hop search is less about agent count and more about forcing an explicit plan → execute → verify loop, so the system doesn’t stop after the first hop or hallucinate the rest.
From experience, three strategies work best, ranging from simplest to most robust. Each maps naturally to different levels of query complexity.
Strategy 1: Single Agent with a Multi-Search Tool Loop
(Best default, covers most cases)
This is the simplest architecture that actually works for composite and multi-hop queries.
You use one agent, but with strong constraints:
- It is explicitly allowed (and expected) to call search multiple times
- It must decompose the query into sub-questions
- It must verify that all sub-questions are answered before responding
Conceptually, the agent is forced into an internal loop:
Decompose → Search → Read → Synthesize → Verify gaps → Search again if needed
A critical detail is the termination rule:
If any required entity, constraint, or fact is unverified, do another search or explicitly state what’s missing.
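Stripped of the LLM, the loop and its termination rule might look like the sketch below. It assumes the sub-questions are already known and that `search` returns `None` when nothing was found. In a real agent the retry would usually reformulate the query rather than repeat it. All names are hypothetical.

```python
from typing import Callable, Optional


def run_search_loop(
    sub_questions: list[str],
    search: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[dict, list[str]]:
    """Search until every sub-question has evidence or the round budget runs out."""
    evidence: dict = {}
    for _ in range(max_rounds):
        gaps = [q for q in sub_questions if q not in evidence]
        if not gaps:
            break  # termination rule satisfied: nothing is unverified
        for question in gaps:
            result = search(question)
            if result is not None:
                evidence[question] = result
    # Anything still missing must be stated explicitly instead of guessed.
    missing = [q for q in sub_questions if q not in evidence]
    return evidence, missing


if __name__ == "__main__":
    kb = {"Who directed Star Wars?": "George Lucas"}
    print(run_search_loop(["Who directed Star Wars?", "Who composed the score?"], kb.get))
```

The function never returns a bare answer: it returns the evidence together with what is still missing, so the final response can state the gap.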
What patterns this handles well
- Conjunctive composite queries
- Dependent composite queries
- Shallow multi-hop queries
- Most enumerative queries
Why this works so well
- Minimal orchestration overhead
- No cross-agent “telephone game”
- Easy to debug: you can log search calls and checkpoints
- Easy to test: you validate the sequence, not just the final answer
For most teams building conversational RAG systems, this should be the starting point.
Strategy 2: Planner + Executor Split
(Best reliability / complexity tradeoff)
When single-agent loops start failing, it is usually by:
- stopping after the first hop,
- missing constraints,
- or answering without evidence.
At that point, the next step is not more agents, but separating planning from execution.
This strategy introduces two explicit roles (which can be two agents or one agent in two forced phases):
Planner
- Analyzes the user query
- Classifies the query pattern
- Produces a search plan
- Defines acceptance criteria (what must be verified)
Importantly:
The Planner is not allowed to answer the question.
Executor
- Executes the plan step by step
- Calls search multiple times
- Collects evidence
- Reports results back with sources
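A minimal sketch of the split, under the assumption that a plan is a list of query templates in which `{0}`, `{1}`, and so on refer to the results of earlier steps. The `Plan` shape and the hard-coded `plan_dependent` stub are illustrative only. In a real system the Planner would typically be an LLM call that emits this structure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Plan:
    pattern: str            # query pattern chosen by the Planner
    steps: list[str]        # query templates; {0}, {1}, ... reference earlier results
    acceptance: list[str]   # what must be verified before answering


def plan_dependent(first_query: str, follow_up_template: str) -> Plan:
    """Planner stub: produces a plan and never an answer."""
    return Plan(
        pattern="dependent",
        steps=[first_query, follow_up_template],
        acceptance=["every step has a non-empty result"],
    )


def execute(plan: Plan, search: Callable[[str], Optional[str]]) -> list[tuple]:
    """Executor: run the steps in order and collect (query, result) evidence."""
    results: list[str] = []
    evidence: list[tuple] = []
    for step in plan.steps:
        query = step.format(*results)
        result = search(query)
        evidence.append((query, result))
        if result is None:
            break  # later steps may depend on this one, so stop and report
        results.append(result)
    return evidence


if __name__ == "__main__":
    kb = {
        "Who directed Star Wars?": "George Lucas",
        "What other movies did George Lucas direct?": "THX 1138, American Graffiti",
    }
    plan = plan_dependent("Who directed Star Wars?", "What other movies did {0} direct?")
    print(execute(plan, kb.get))
```

Because the plan is plain data, it can be asserted on directly (pattern, number of steps, placeholders) without running a single search.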
What patterns this excels at
- Dependent composite queries
- True multi-hop queries
- Conditional queries
- Queries with hidden assumptions
Why this helps
- The Planner focuses purely on decomposition and hop structure
- The Executor focuses on accuracy and evidence
- You can unit-test plans independently from execution
- Failure modes become observable and actionable
This is usually the point where systems stop feeling brittle.
Strategy 3: Multi-Agent Specialists
(Only when the search space is genuinely large)
This is the most powerful—and most dangerous—approach.
You introduce specialized agents, for example:
- A query generator for broad recall
- A retriever/reader for precise extraction
- A critic or verifier whose job is to find missing hops or contradictions
These agents can run in parallel, feeding into a single final synthesizer.
What patterns justify this
- Large enumerations
- Comparative queries across many entities
- Research-style multi-hop queries
- High-recall scenarios where missing one item is unacceptable
Why this is rarely the right default
- High token and latency cost
- Harder to reason about failures
- Risk of inconsistent partial answers
- Requires strong control over who produces the final answer
This architecture should be earned, not assumed.
Pattern → Architecture Mapping
Putting it all together:
| Query Pattern | Recommended Architecture |
| --- | --- |
| Simple search | Single agent, single call |
| Multi-query (unrelated) | Query segmentation + simple agents |
| Conjunctive composite | Single agent, multi-search loop |
| Dependent composite | Single agent loop → Planner/Executor |
| Multi-hop | Planner + Executor |
| Conditional | Planner + Executor |
| Comparative / large enumeration | Multi-agent specialists |
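If a classifier assigns each query a pattern label, the mapping above can be kept as plain data and used for routing. The labels, architecture identifiers, and the fallback choice below are assumptions made for this sketch.

```python
# Pattern label -> architecture identifier, mirroring the mapping above.
ARCHITECTURE_BY_PATTERN = {
    "simple": "single_agent_single_call",
    "multi_query": "segmentation_plus_simple_agents",
    "conjunctive": "single_agent_multi_search_loop",
    # Dependent queries start on the loop and escalate to planner/executor if it fails.
    "dependent": "single_agent_multi_search_loop",
    "multi_hop": "planner_executor",
    "conditional": "planner_executor",
    "comparative": "multi_agent_specialists",
    "large_enumeration": "multi_agent_specialists",
}

# Assumption: unknown patterns fall back to the single-agent loop.
DEFAULT_ARCHITECTURE = "single_agent_multi_search_loop"


def route(pattern: str) -> str:
    """Pick an architecture for a classified query pattern."""
    return ARCHITECTURE_BY_PATTERN.get(pattern, DEFAULT_ARCHITECTURE)


if __name__ == "__main__":
    print(route("multi_hop"))
```

Keeping the mapping in one place also makes it easy to test the routing separately from classification.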
The Real Lever: Verification, Not Agent Count
Across all three strategies, the most important design element is not how many agents you have, but whether the system is forced to verify completeness.
Practical techniques that matter more than architecture diagrams:
- Explicit hop checklists
- Clear stop conditions (“all hops have evidence”)
- Structured evidence objects
- A final “what’s unproven?” self-check
- Controlled backtracking when assumptions fail
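Several of these techniques fit in a few lines of code. The sketch below shows one possible shape for a structured evidence object, a hop checklist, and the "all hops have evidence" stop condition. The field names and the example source string are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    hop: str                      # what this hop is supposed to establish
    claim: Optional[str] = None   # what was found, if anything
    source: Optional[str] = None  # where it was found


def unproven(checklist: list[Evidence]) -> list[str]:
    """The final self-check: which hops lack a claim or a source?"""
    return [e.hop for e in checklist if not (e.claim and e.source)]


def may_answer(checklist: list[Evidence]) -> bool:
    """Stop condition: answer only when every hop has evidence."""
    return not unproven(checklist)


if __name__ == "__main__":
    checklist = [
        Evidence("identify the director", "George Lucas", "toy-doc-1"),
        Evidence("list the director's other films"),
    ]
    print(may_answer(checklist), unproven(checklist))
```

A claim without a source counts as unproven here, which is the property that keeps the system from answering too early.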
Most systems fail at multi-hop not because they lack agents, but because they answer too early.
Rule of Thumb
- Most teams should use: one agent + multi-search loop + explicit verification
- If correctness matters a lot: add a Planner/Executor split
- If recall and coverage matter a lot: add specialists, but keep a single finalizer
Architecture should follow query structure, not the other way around.

