Most Conversational RAG Systems Work, Until They Don’t

Most conversational RAG systems work surprisingly well. They answer simple questions perfectly… and then completely fall apart on anything that requires more than one step of reasoning.

The problem isn’t retrieval. It’s query structure.

In a conversational RAG search agent, the simplest and most direct path is straightforward: one query, one search, one response.

This is what we call simple search. While it likely covers the vast majority of use cases, sooner or later you will need to address the remaining, more complex scenarios. But what exactly makes a query complex? And how can we systematically categorize complex queries to gain confidence when testing?

Without a clear understanding of the patterns behind query decomposition, you risk compensating with quantity over quality—writing more and more tests in an attempt to build confidence, rather than designing fewer, well-targeted tests that truly validate the system’s behavior.

Without being exhaustive, here are the most common query patterns I was able to identify, and how a conversational search system should treat them.

Simple Search

A simple search contains a single intent and can be resolved with one retrieval step.

Who produced Star Wars?

There is no ambiguity, no dependency, and no need for orchestration. One query goes in, one search is performed, and one answer comes out.

This is the happy path—and the baseline against which everything else should be measured.

Multi-Query (but Not Composite)

Moving up the complexity ladder, the next pattern to understand is the multi-query.

Who is the Star Wars producer? What is a cinema?

This looks more complex at first glance, but it is not a composite query.

Why?

These are two unrelated questions
There is no semantic or logical dependency
Each question can be answered independently

From a system perspective, this should be treated as multiple simple searches, not a single composite one. The only added responsibility is segmentation, not reasoning.

This distinction is important: confusing multi-query inputs with composite queries often leads to unnecessary orchestration and over-engineering.
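The segmentation step can be sketched as follows. This is a minimal illustration that assumes questions are separated by sentence-ending punctuation; `segment_questions` is a hypothetical helper, and a real system might use an LLM or a proper sentence tokenizer instead.

```python
import re


def segment_questions(user_input: str) -> list[str]:
    """Split a user input into standalone questions, one per sentence."""
    # Split on whitespace that follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.?!])\s+", user_input.strip())
    # Drop empty fragments (for example, when the input is empty).
    return [part for part in parts if part]


questions = segment_questions("Who is the Star Wars producer? What is a cinema?")
for question in questions:
    print(question)
```

Each resulting question can then be sent through the simple search path on its own, with no shared state between them.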

Composite Queries

A composite query is a single user input containing multiple related information needs that share a common context. The system must decompose and resolve them before producing a unified answer.

Who is the Star Wars producer and who played Han Solo?

Here, the system must:

Identify multiple intents
Decompose them into sub-queries
Execute searches
Merge the results into a coherent response

Not all composite queries are equal, though. They come in several distinct patterns, each with different implications for retrieval, reasoning, and testing.

Conjunctive Composite Queries

Pattern: Multiple related questions joined by and, also, or similar connectors.

Who produced Star Wars and who played Han Solo?

Each sub-query is independent, but they are logically grouped and expected to be answered together.

System behavior

Decompose into parallel searches
Aggregate results
Ensure no intent is dropped

This is often the entry point into composite query handling.
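A rough sketch of this behavior, assuming the decomposition into sub-queries has already happened (for example, by an LLM). `answer_conjunctive` and the `search` callable are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def answer_conjunctive(
    sub_queries: list[str],
    search: Callable[[str], Optional[str]],
) -> dict[str, str]:
    """Run independent sub-queries in parallel and aggregate the results."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves the order of sub_queries in its results.
        results = list(pool.map(search, sub_queries))

    answers = dict(zip(sub_queries, results))

    # Ensure no intent is dropped: every sub-query needs a non-empty result.
    missing = [query for query, answer in answers.items() if not answer]
    if missing:
        raise ValueError(f"no result for sub-queries: {missing}")
    return answers


def stub_search(query: str) -> str:
    # Placeholder for a real retrieval call.
    return f"result for {query}"


answers = answer_conjunctive(
    ["Who produced Star Wars?", "Who played Han Solo?"], stub_search
)
```

Raising on a missing result is one possible policy; a system could equally return a partial answer that states what is missing.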

Dependent Composite Queries

Pattern: The second question depends on the answer to the first.

Who directed Star Wars and what other movies did he direct?

You cannot answer the second part without first resolving the entity from the first.

System behavior

Resolve the first entity
Carry context forward
Issue a follow-up query using the derived entity

This introduces statefulness, even in otherwise stateless systems.
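The carry-forward step might look like the sketch below. The `{entity}` placeholder convention, the `answer_dependent` function, and the stub index with its `PERSON_X` and `FILM_1` values are all assumptions made for illustration, not real data.

```python
from typing import Callable, Optional


def answer_dependent(
    first_query: str,
    follow_up_template: str,
    search: Callable[[str], Optional[str]],
) -> dict[str, str]:
    """Resolve an entity first, then use it to build the follow-up query."""
    entity = search(first_query)
    if not entity:
        raise LookupError(f"could not resolve: {first_query}")

    # Carry the resolved entity forward into the second query.
    follow_up_query = follow_up_template.format(entity=entity)
    follow_up_answer = search(follow_up_query)
    if not follow_up_answer:
        raise LookupError(f"no result for: {follow_up_query}")

    return {
        "entity": entity,
        "follow_up_query": follow_up_query,
        "follow_up_answer": follow_up_answer,
    }


# Stub index with placeholder values standing in for a real search backend.
STUB_INDEX = {
    "Who directed Star Wars?": "PERSON_X",
    "What other movies did PERSON_X direct?": "FILM_1, FILM_2",
}

result = answer_dependent(
    "Who directed Star Wars?",
    "What other movies did {entity} direct?",
    STUB_INDEX.get,
)
```

The important property is that the second search is issued with the resolved name, not with the pronoun from the user's original wording.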

Multi-Hop Queries

Pattern: Dependencies exist, but they are implicit rather than stated in the query.

Which actor from Star Wars won an Oscar?

To answer this, the system must:

Identify actors in Star Wars
Check which of them won an Oscar
Return the matching result

This is where search turns into reasoned retrieval, and where RAG systems tend to fail if they rely on a single retrieval pass.
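The three steps above can be sketched as two explicit hops. `actors_with_award`, `list_cast`, and `has_award` are hypothetical names, and the stub data uses placeholder actors rather than real filmography or award records.

```python
from typing import Callable


def actors_with_award(
    film: str,
    award: str,
    list_cast: Callable[[str], list[str]],
    has_award: Callable[[str, str], bool],
) -> list[str]:
    """Hop 1: find the cast. Hop 2: filter the cast by the award."""
    cast = list_cast(film)
    if not cast:
        # Without hop 1 there is nothing to check, so fail loudly
        # instead of guessing an answer.
        raise LookupError(f"no cast found for {film!r}")
    return [actor for actor in cast if has_award(actor, award)]


# Placeholder data standing in for two separate retrieval calls.
STUB_CAST = {"Star Wars": ["Actor A", "Actor B", "Actor C"]}
STUB_AWARDS = {("Actor B", "Oscar")}

winners = actors_with_award(
    "Star Wars",
    "Oscar",
    list_cast=lambda film: STUB_CAST.get(film, []),
    has_award=lambda actor, award: (actor, award) in STUB_AWARDS,
)
```

Making each hop a separate call is what lets a test assert that hop 1 actually ran before hop 2.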

Comparative Composite Queries

Pattern: Two or more entities must be compared across a dimension.

Who had a longer acting career, Harrison Ford or Mark Hamill?

This requires:

Parallel retrieval
Normalization (dates, durations, metrics)
Explicit comparison logic

The challenge here is not finding information, but aligning it correctly.
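To make the alignment problem concrete, here is a small sketch of the normalization and comparison steps. The `CareerSpan` type, the helper names, and the years attached to "Actor A" and "Actor B" are illustrative assumptions, not real career data.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CareerSpan:
    name: str
    first_year: int
    last_year: int

    @property
    def years(self) -> int:
        return self.last_year - self.first_year


def parse_year(raw: str) -> int:
    """Normalize a retrieved snippet such as 'since 1985' to a year."""
    match = re.search(r"\b(\d{4})\b", raw)
    if match is None:
        raise ValueError(f"no four-digit year in {raw!r}")
    return int(match.group(1))


def longer_career(a: CareerSpan, b: CareerSpan) -> Optional[str]:
    """Return the name with the longer span, or None on a tie."""
    if a.years == b.years:
        return None
    return a.name if a.years > b.years else b.name


# Hypothetical snippets, as two parallel retrievals might return them.
actor_a = CareerSpan("Actor A", parse_year("debut: 1970"), parse_year("2020"))
actor_b = CareerSpan("Actor B", parse_year("since 1985"), parse_year("last credit 2015"))
winner = longer_career(actor_a, actor_b)
```

Both entities pass through the same `parse_year` step before they are compared, which is the point: the comparison only makes sense once the values share a unit.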

Conditional Composite Queries

Pattern: A query that introduces a condition before executing a follow-up.

If Star Wars was produced by George Lucas, what other films did he produce?

System behavior

Evaluate the condition
Execute the second query only if it holds
Explain the outcome clearly

This adds control flow to search, something most RAG systems are not designed for by default.
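The control flow can be sketched in a few lines. `answer_conditional` and its two callables are hypothetical; in a real system each would wrap one or more search calls.

```python
from typing import Callable


def answer_conditional(
    condition_holds: Callable[[], bool],
    follow_up: Callable[[], str],
) -> str:
    """Run the follow-up search only when the stated premise is confirmed."""
    if not condition_holds():
        # Explain the outcome instead of silently answering the follow-up.
        return "The premise could not be confirmed, so the follow-up was not searched."
    return follow_up()


calls: list[str] = []


def stub_follow_up() -> str:
    calls.append("follow_up")
    return "placeholder list of films"


confirmed = answer_conditional(lambda: True, stub_follow_up)
rejected = answer_conditional(lambda: False, stub_follow_up)
```

Recording calls, as the `calls` list does here, gives a test a way to assert that the second query was skipped when the condition failed.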

Enumerative Composite Queries

Pattern: A request to list or enumerate multiple items.

List the main Star Wars movies and their release years.

Each item may require its own retrieval, but the real challenge is:

Completeness
Ordering
Consistent structure

These queries are deceptively simple but notoriously hard to validate.
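One way to validate such an answer is to check the three properties separately. The sketch below assumes the answer has been parsed into a list of dicts and that the test author supplies a reference set of expected titles; `validate_enumeration` is a hypothetical helper, and the fixture covers only the original trilogy.

```python
def validate_enumeration(
    items: list[dict],
    expected_titles: set[str],
    required_keys: tuple[str, ...] = ("title", "year"),
) -> list[str]:
    """Return a list of problems; an empty list means the answer passed."""
    problems: list[str] = []

    # Consistent structure: every item carries the same required fields.
    for index, item in enumerate(items):
        missing_keys = [key for key in required_keys if key not in item]
        if missing_keys:
            problems.append(f"item {index} is missing keys: {missing_keys}")

    # Completeness: every expected title must appear.
    found = {item.get("title") for item in items}
    missing_titles = expected_titles - found
    if missing_titles:
        problems.append(f"missing items: {sorted(missing_titles)}")

    # Ordering: items should be sorted by release year.
    years = [item["year"] for item in items if "year" in item]
    if years != sorted(years):
        problems.append("items are not ordered by year")

    return problems


expected = {"Star Wars", "The Empire Strikes Back", "Return of the Jedi"}
answer = [
    {"title": "Star Wars", "year": 1977},
    {"title": "The Empire Strikes Back", "year": 1980},
    {"title": "Return of the Jedi", "year": 1983},
]
problems = validate_enumeration(answer, expected)
```

Returning one message per failed property tells the test which of the three went wrong, rather than reporting a single pass or fail.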

Why This Taxonomy Matters

If all complex queries are treated the same, testing becomes reactive:

More tests
More edge cases
Less confidence

By classifying composite queries into clear patterns, you can instead test:

Correct query classification
Correct decomposition strategy
Correct orchestration
Correct result aggregation

This shifts confidence from test quantity to test quality, and gives you a principled way to reason about multi-hop search behavior in conversational RAG systems.
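As a sketch of what testing classification could look like, the example queries from this article can serve as a small labeled set. `QueryPattern`, `LABELED_EXAMPLES`, and `classification_failures` are hypothetical names; the `classify` callable stands in for whatever classifier the system uses.

```python
from enum import Enum
from typing import Callable


class QueryPattern(Enum):
    SIMPLE = "simple"
    MULTI_QUERY = "multi_query"
    CONJUNCTIVE = "conjunctive"
    DEPENDENT = "dependent"
    MULTI_HOP = "multi_hop"
    COMPARATIVE = "comparative"
    CONDITIONAL = "conditional"
    ENUMERATIVE = "enumerative"


# One labeled example per pattern, taken from the sections above.
LABELED_EXAMPLES = [
    ("Who produced Star Wars?", QueryPattern.SIMPLE),
    ("Who is the Star Wars producer? What is a cinema?", QueryPattern.MULTI_QUERY),
    ("Who produced Star Wars and who played Han Solo?", QueryPattern.CONJUNCTIVE),
    ("Who directed Star Wars and what other movies did he direct?", QueryPattern.DEPENDENT),
    ("Which actor from Star Wars won an Oscar?", QueryPattern.MULTI_HOP),
    ("Who had a longer acting career, Harrison Ford or Mark Hamill?", QueryPattern.COMPARATIVE),
    ("If Star Wars was produced by George Lucas, what other films did he produce?", QueryPattern.CONDITIONAL),
    ("List the main Star Wars movies and their release years.", QueryPattern.ENUMERATIVE),
]


def classification_failures(
    classify: Callable[[str], QueryPattern],
) -> list[tuple[str, QueryPattern, QueryPattern]]:
    """Return (query, expected, actual) for every misclassified example."""
    return [
        (query, expected, classify(query))
        for query, expected in LABELED_EXAMPLES
        if classify(query) != expected
    ]
```

A handful of examples per pattern, checked at the classification stage, tends to say more about the system than many end-to-end tests that only inspect the final answer.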

Mapping Query Patterns to Agent Architectures

Once you start identifying query patterns (multi-query, conjunctive, dependent, multi-hop, comparative, conditional, enumerative), the next natural question is:

How should my agent be architected to handle them correctly?

The answer is not “one prompt vs many agents”.
In practice, multi-hop search is less about agent count and more about forcing an explicit plan → execute → verify loop, so the system doesn’t stop after the first hop or hallucinate the rest.

From experience, three strategies work best, ranging from simplest to most robust. Each maps naturally to a different level of query complexity.

Strategy 1: Single Agent with a Multi-Search Tool Loop

(Best default, covers most cases)

This is the simplest architecture that actually works for composite and multi-hop queries.

You use one agent, but with strong constraints:

It is explicitly allowed (and expected) to call search multiple times
It must decompose the query into sub-questions
It must verify that all sub-questions are answered before responding

Conceptually, the agent is forced into an internal loop:

Decompose → Search → Read → Synthesize → Verify gaps → Search again if needed

A critical detail is the termination rule:

If any required entity, constraint, or fact is unverified, do another search or explicitly state what’s missing.
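The loop and its termination rule can be sketched without any LLM in the picture. This assumes the sub-questions are already known and that `search` returns `None` when nothing is found. `run_search_loop` and `max_rounds` are hypothetical names, and a real agent would normally reformulate a failed query rather than repeat it verbatim.

```python
from typing import Callable, Optional


def run_search_loop(
    sub_questions: list[str],
    search: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[dict[str, str], list[str]]:
    """Search until every sub-question has evidence or rounds run out."""
    evidence: dict[str, str] = {}

    for _ in range(max_rounds):
        # Verify gaps: which sub-questions still lack evidence?
        open_questions = [q for q in sub_questions if q not in evidence]
        if not open_questions:
            break  # Stop condition: everything is verified.
        for question in open_questions:
            result = search(question)
            if result:
                evidence[question] = result

    # Anything still open must be stated explicitly, not guessed.
    unverified = [q for q in sub_questions if q not in evidence]
    return evidence, unverified
```

The second return value is what lets the agent "explicitly state what's missing" instead of answering early. The round limit keeps the loop from running forever on a question the index cannot answer.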

What patterns this handles well

Conjunctive composite queries
Dependent composite queries
Shallow multi-hop queries
Most enumerative queries

Why this works so well

Minimal orchestration overhead
No cross-agent “telephone game”
Easy to debug: you can log search calls and checkpoints
Easy to test: you validate the sequence, not just the final answer

For most teams building conversational RAG systems, this should be the starting point.

Strategy 2: Planner + Executor Split

(Best reliability / complexity tradeoff)

When single-agent loops start failing, it is usually in one of three ways:

Stopping after the first hop
Missing constraints
Answering without evidence

At that point, the next step is not more agents, but separating planning from execution.

This strategy introduces two explicit roles (which can be two agents or one agent in two forced phases):

Planner

Analyzes the user query
Classifies the query pattern
Produces a search plan
Defines acceptance criteria (what must be verified)

Importantly, the Planner is not allowed to answer the question.

Executor

Executes the plan step by step
Calls search multiple times
Collects evidence
Reports results back with sources

What patterns this excels at

Dependent composite queries
True multi-hop queries
Conditional queries
Queries with hidden assumptions

Why this helps

The Planner focuses purely on decomposition and hop structure
The Executor focuses on accuracy and evidence
You can unit-test plans independently from execution
Failure modes become observable and actionable

This is usually the point where systems stop feeling brittle.
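A minimal sketch of the split, assuming the plan is a plain data structure. The `Plan` and `PlanStep` types, the `{step_id}` template convention, and the hand-written plan below are illustrative assumptions; in practice the Planner (typically an LLM) would emit the plan.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PlanStep:
    step_id: str
    # May reference earlier step ids, for example "{director}".
    query_template: str


@dataclass
class Plan:
    pattern: str
    steps: list[PlanStep]
    must_verify: list[str]  # Acceptance criteria: step ids needing evidence.


def execute(plan: Plan, search: Callable[[str], Optional[str]]) -> dict[str, str]:
    """Executor: run the plan step by step, feeding results forward."""
    results: dict[str, str] = {}
    for step in plan.steps:
        query = step.query_template.format(**results)
        answer = search(query)
        if not answer:
            break  # Later steps may depend on this one, so stop here.
        results[step.step_id] = answer
    return results


def unmet_criteria(plan: Plan, results: dict[str, str]) -> list[str]:
    """Return the acceptance criteria that have no evidence yet."""
    return [step_id for step_id in plan.must_verify if step_id not in results]


# A hand-written plan standing in for the Planner's output.
plan = Plan(
    pattern="dependent",
    steps=[
        PlanStep("director", "Who directed Star Wars?"),
        PlanStep("filmography", "What other movies did {director} direct?"),
    ],
    must_verify=["director", "filmography"],
)
```

Because the plan is only data, a test can assert its shape (two steps, the second referencing the first) without running a single search.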

Strategy 3: Multi-Agent Specialists

(Only when the search space is genuinely large)

This is the most powerful—and most dangerous—approach.

You introduce specialized agents, for example:

A query generator for broad recall
A retriever/reader for precise extraction
A critic or verifier whose job is to find missing hops or contradictions

These agents can run in parallel, feeding into a single final synthesizer.

What patterns justify this

Large enumerations
Comparative queries across many entities
Research-style multi-hop queries
High-recall scenarios where missing one item is unacceptable

Why this is rarely the right default

High token and latency cost
Harder to reason about failures
Risk of inconsistent partial answers
Requires strong control over who produces the final answer

This architecture should be earned, not assumed.
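If you do go this route, the key structural constraint is a single finalizer. The sketch below treats each specialist as a plain callable; `run_specialists` and all four parameter names are hypothetical, and the thread pool is just one simple way to run the readers in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_specialists(
    query: str,
    generate_queries: Callable[[str], list[str]],
    read: Callable[[str], str],
    critique: Callable[[str, list[str]], list[str]],
    synthesize: Callable[[str, list[str], list[str]], Any],
) -> Any:
    """Fan out to specialists, then let one synthesizer produce the answer."""
    # Query generator: broad recall through several query variants.
    variants = generate_queries(query)

    # Retriever/reader: precise extraction, one variant per worker.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(read, variants))

    # Critic: look for missing hops or contradictions in the findings.
    issues = critique(query, findings)

    # Only the synthesizer produces user-facing output.
    return synthesize(query, findings, issues)
```

None of the specialists returns anything to the user directly, which is one way to reduce the risk of inconsistent partial answers.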

Pattern → Architecture Mapping

Putting it all together:

Query Pattern	Recommended Architecture
Simple search	Single agent, single call
Multi-query (unrelated)	Query segmentation + simple agents
Conjunctive composite	Single agent, multi-search loop
Dependent composite	Single agent loop → Planner/Executor
Multi-hop	Planner + Executor
Conditional	Planner + Executor
Comparative / large enumeration	Multi-agent specialists
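The table translates directly into a routing lookup. In this sketch the pattern keys and architecture labels are hypothetical identifiers. The list for dependent queries encodes the escalation path from the table: start with the single-agent loop, then move to Planner/Executor if it proves unreliable.

```python
# Architectures are listed in escalation order, simplest first.
ARCHITECTURE_BY_PATTERN: dict[str, list[str]] = {
    "simple": ["single_agent_single_call"],
    "multi_query": ["segmentation_plus_simple_agents"],
    "conjunctive": ["single_agent_loop"],
    "dependent": ["single_agent_loop", "planner_executor"],
    "multi_hop": ["planner_executor"],
    "conditional": ["planner_executor"],
    "comparative": ["multi_agent_specialists"],
    "large_enumeration": ["multi_agent_specialists"],
}


def pick_architecture(pattern: str, escalation_level: int = 0) -> str:
    """Pick the architecture for a pattern, escalating when asked to."""
    options = ARCHITECTURE_BY_PATTERN[pattern]
    # Clamp so that escalating past the last option stays on the last one.
    return options[min(escalation_level, len(options) - 1)]
```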

The Real Lever: Verification, Not Agent Count

Across all three strategies, the most important design element is not how many agents you have, but whether the system is forced to verify completeness.

Practical techniques that matter more than architecture diagrams:

Explicit hop checklists
Clear stop conditions (“all hops have evidence”)
Structured evidence objects
A final “what’s unproven?” self-check
Controlled backtracking when assumptions fail

Most systems fail at multi-hop not because they lack agents, but because they answer too early.
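A structured evidence object and the final self-check can be as small as the sketch below. The `Evidence` fields and the `unproven` helper are assumptions for illustration; the claim and source values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    hop: str                      # Which checklist item this supports.
    claim: str                    # What the system believes.
    source: Optional[str] = None  # Where it was found; None if unsupported.


def unproven(checklist: list[str], evidence: list[Evidence]) -> list[str]:
    """Return the hops that have no sourced evidence yet."""
    proven = {item.hop for item in evidence if item.source}
    return [hop for hop in checklist if hop not in proven]


checklist = ["identify cast", "check awards"]
collected = [
    Evidence("identify cast", "placeholder claim", source="doc-1"),
    # A claim without a source does not count as proven.
    Evidence("check awards", "placeholder claim"),
]
still_open = unproven(checklist, collected)
```

The stop condition "all hops have evidence" is then simply `unproven(...) == []`, which is easy to log and to assert in tests.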

Rule of Thumb

Most teams should use: one agent + multi-search loop + explicit verification
If correctness matters a lot: add a Planner/Executor split
If recall and coverage matter a lot: add specialists, but keep a single finalizer

Architecture should follow query structure, not the other way around.
