We all have that relative who weighs in on every conversation — invited or not — because they have an opinion about everything. Mine was my stepfather. Bless him, he just knew something about everything, and he always sounded so convincing. I loved him dearly even though it took me a decade to unlearn everything wrong that he taught me.
Generative AI is kind of like that. It can, and will, answer every question you ask without hesitation, and only a person who understands the topic (or maybe my stepfather?) will know whether the answer is correct.
Large language models sometimes produce confident but factually incorrect outputs — so-called “hallucinations” — because they’re designed to predict the most probable next token based on patterns in their training data, not verify facts against an external source. Without a built-in fact-checking step or explicit grounding, they can assemble perfectly plausible-sounding sentences that have no basis in reality.
These models also lack an explicit mechanism for uncertainty or abstention. Rather than having a way to say “I don’t know,” they simply continue generating the highest-likelihood text, which often means inventing details when the training data offer no clear answer. Mitigating hallucinations typically involves integrating external retrieval, confidence estimation, or verification layers to ensure claims are supported by real evidence.
What Are AI Hallucinations?
Let’s unpack that a bit. LLMs are a type of generative AI that was trained on billions of pages of text to predict the next token (or word). LLMs don’t understand the text they are trained on, but by looking at so many examples, they learn to mimic the style, the context, and the flow of human language, according to the book Creative Prototyping with Generative AI.
In other words, they use complex statistics to analyze huge amounts of written language and derive a probable answer to a question. They have no ability to reason or discern whether the answers they generate are correct. Importantly, they also cannot draw on information that is not already present in the text used to train them.
So, when you ask a generative AI tool a question it does not have the answer to, it literally cannot tell you that it doesn’t know. Instead, it comes up with the most probable answer based on exactly, and only, the information inside its model.
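To make that concrete, here’s a toy sketch of what “pick the most probable answer” means. The words and probabilities are made up for illustration; the point is that there is no branch for “I don’t know,” so a weak best guess gets returned just as confidently as a strong one.

```python
def next_token(probabilities: dict[str, float]) -> str:
    # Always return the most probable continuation, however weak it is.
    return max(probabilities, key=probabilities.get)

# A well-covered question: one continuation clearly dominates.
print(next_token({"Paris": 0.92, "Lyon": 0.05, "Rome": 0.03}))    # -> Paris

# A poorly covered question: no option is strong, but the "model" still
# answers rather than saying it doesn't know.
print(next_token({"glue": 0.21, "cheese": 0.20, "sauce": 0.19}))  # -> glue
```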
There is the now infamous May 2024 example of Google’s AI saying to use glue to keep cheese from sliding off of pizza.

By June of 2024, Google’s AI was incorporating text talking about the error into its search results, illustrating how incorrect information can lead to ever more incorrect answers.

Of course, glue is not a valid food ingredient, but this is a perfect example of an LLM deriving an answer from its available inputs without a basic understanding of what food is. It’s worth noting that doing the same search in May of 2025 yields no AI summary for the search.
AI hallucinations can also be present in agentic AI. Consider this: generative AI typically makes a single call to an LLM, and that call can come back with a hallucinated answer. Agentic AI often makes multiple calls to an LLM, which compounds the possibility of AI hallucinations occurring, not to mention the number of chances for them to slip in.
And since AI agents could have access to enterprise systems and the agency to make changes in those systems, well, agentic AI hallucinations can have much further-reaching consequences.
This brings us to the topic of guardrails — how we prevent AI hallucinations.
Guardrails Can Decrease Hallucinations
Often, when egregious errors get attention, programmers will add a specific exception to gen AI responses. This guardrail is great for preventing further bad PR, but it doesn’t address the underlying problem, which is that LLMs don’t know what makes for a realistic food ingredient.
Take the prompt “fire ants delicious”. Google AI’s summary happily tells us what fire ants taste like and that they are edible.

But ask if it’s safe to eat venomous ants, and we are cautioned not to do so.

The gen AI is deriving a response from its given inputs, but the responses contradict each other. LLM programmers correct these specific off-base responses individually by writing rules. Rules can be very effective at targeting specific situations, but rule-based systems also have a problem accounting for concepts that developers and programmers have not anticipated.
Our fire ants example is a good illustration, since eating fire ants is not common. But not all questions are so obviously odd, and people who receive incorrect answers to their prompts won’t have the benefit of a ginned-up example to show them how hallucinations work.
Guardrails Come From Developers and Users
Developers also design guardrails from the beginning by encoding ethical, safety, and policy guidelines. Jamie A Sandhu wrote, “A benefit of this is that developers can responsibly influence the outputs and responses of these technologies through design choices.” The classic example is that ChatGPT will not provide information on building bombs or committing crimes.
Guardrails also can come from users in the form of prompt engineering. In our fire ants example, I asked Google for “fire ant recipes”, “are fire ants good to eat”, and “fire ant nutrition” before landing on the two prompts that returned what I was looking for.
Prompt engineering is exactly what it sounds like — carefully crafting your question to “provide the model with context, instructions, and examples that help it understand your intent and respond in a meaningful way. Think of it as providing a roadmap for the AI, steering it towards the specific output you have in mind,” Google writes.
To me, the most interesting part of that is that users can give LLMs information that helps improve their responses. Gen AI takes prompts (a phrase, a question, code, mathematical equations, structured data, or images) and uses them as context to make its response better. Users can then make additional queries to refine those results, and the model incorporates all of this into the next query and response.
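As an illustration only (the wording below is my own example, not Google’s template or anyone’s production prompt), here is what an engineered prompt can look like next to a bare one:

```python
# A hedged illustration of prompt engineering: the same question asked bare,
# then wrapped in context, instructions, and an answer format.

bare_prompt = "Are fire ants good to eat?"

engineered_prompt = """You are a food-safety assistant. Answer only from the
context below. If the context does not cover the question, say you don't know.

Context:
- Fire ant venom can trigger allergic reactions in some people.
- Some insects are eaten safely when properly prepared.

Format:
Q: <question>
A: <answer grounded in the context, or "I don't know">

Q: Are fire ants good to eat?
A:"""

# Either string could be sent to an LLM; the engineered version supplies the
# context, instructions, and examples that steer the response.
```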
When the general population was first introduced to gen AI a few years ago, we quickly realized that asking the right question was so important that a new discipline was conceived — prompt engineering — and others wrote books about how to do it well. It reminds me of the days when I needed to get information from SharePoint databases and had to take cookies to the one dude who knew how to query SharePoint for the data I needed.
And, just as with those SharePoint databases, we know it’s not sustainable or scalable to rely on one or two people for prompt engineering. Instead, being good at asking questions of gen AI is now part of everyday work, and asking the right question can rein in the responses you get, narrowing them toward more accurate results.
Guardrails for Organizations Include RAG
There is a lot of conversation and research going into how to make gen AI produce results that people can trust to be true. The biggest hindrance these efforts face, ironically, is the models’ large, generalized inputs: these models were trained primarily on data gathered from the internet.
To make gen AI work in more specific cases, we need context. The way we craft our prompts can help, as we just discussed, but so can providing data that centers on a specific area.

For organizations that means using an approach called Retrieval Augmented Generation, or RAG, to get results that are both relevant and accurate to their business. RAG helps mitigate AI’s biggest challenges by retrieving real, relevant data from inside and outside of the enterprise. “RAG helps large language models (LLMs) deliver more relevant responses at a higher quality”, according to IBM.
In other words, it fills a gap in how LLMs work by using your own data.
How RAG Works
In a nutshell, RAG frameworks traditionally follow three steps (a minimal code sketch follows the list):
- Retrieval: The retrieval model searches data to identify the most relevant documents or text chunks related to the user query.
- Augmentation: The retrieved information is then combined with the original query to create an enriched prompt that provides context for the generative model.
- Generation: The generative model uses the enriched prompt to create a response that incorporates the relevant information from the retrieved data.
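If it helps to see that flow in code, here is a toy sketch of the three steps. The documents, function names, and scoring below are made-up placeholders, and the generation step is a stub rather than a real LLM call, but the retrieve, augment, generate sequence is the same one described above.

```python
# Toy, end-to-end sketch of the three RAG steps. The "retrieval model" is a
# keyword-overlap search over an in-memory list and the "generation" step is a
# stub, so none of this reflects a real product's API; it only shows the flow.

DOCS = [
    "Small pets may travel in the cabin in an approved carrier.",
    "Children under two may travel on an adult's lap.",
    "Each traveler may bring one carry-on bag and one personal item.",
]

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    # Retrieval: rank documents by how many query words they share.
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(words & set(d.lower().split())))[:top_k]

def augment(query: str, chunks: list[str]) -> str:
    # Augmentation: combine the retrieved text with the original query.
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    # Generation: a real system would call an LLM here; this stub just labels
    # the enriched prompt so the example runs on its own.
    return f"[LLM would answer from]\n{prompt}"

query = "Can I fly with my pet?"
print(generate(augment(query, retrieve(query, DOCS))))
```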
But in an enterprise environment, RAG encounters many of the same issues that more traditional information retrieval systems do:
- Data scattered across many places
- Data stored in many formats
- Data serving many functions
- Data that lacks context
Overcoming these obstacles requires an approach that incorporates:
- Unified index to bring together diverse content spread across your systems
- Hybrid ranking models that use a variety of approaches to return the most relevant results to users (a brief sketch of the idea follows this list)
- Ability to ingest data on a constant and consistent basis to ensure that an LLM is using current data
- Advanced security controls so that everyone who is allowed to see data — and only those people — can see data
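As a rough sketch of what “hybrid ranking” can mean in practice (the scoring functions and weighting here are my own illustration, not how any particular platform computes relevance), you can blend a keyword score with a vector-similarity score:

```python
# Illustrative hybrid ranking: blend a lexical (keyword) score with a
# vector-similarity score. The weighting and both scoring functions are
# assumptions for illustration only.

import math

def keyword_score(query: str, doc: str) -> float:
    # Fraction of query terms that appear in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine(a: list[float], b: list[float]) -> float:
    # Semantic similarity between embedding vectors (assumed non-zero).
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def hybrid_score(query: str, doc: str, q_vec: list[float], d_vec: list[float],
                 alpha: float = 0.5) -> float:
    # alpha balances exact keyword matches against semantic similarity.
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)
```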
As part of the gen AI maturation process, we have begun to see software companies offer out-of-the-box RAG solutions that can be customized to enterprise needs. But, as with gen AI itself, being able to trust the results may mean finding reliable partners to implement AI solutions that reduce hallucinations and increase relevance.
Coveo already has a proven track record doing so.
Relevant reading: Putting the ‘R’ in RAG: How Advanced Data Retrieval Turns GenAI Into Enterprise Ready Experiences
Coveo Helps Enterprises Create Successful RAG
GenAI in Financial Services: Vanguard
Coveo’s AI-Relevance™ Platform helped Vanguard, the asset management firm, overcome these challenges and reduce AI hallucinations across 14 million documents spread among systems such as SharePoint, ServiceNow, and various internal databases.
Rather than consolidating content into a single system, the platform acts as a central intelligence layer, enabling:
- Unified index centralizing access across multiple knowledge sources without costly data migration
- Personalized, context-driven knowledge discovery experiences tailored to each employee’s needs, drawing only on the content they’re authorized to see
- AI-powered retrieval of trusted content before generating responses
Since implementation, Vanguard has seen high employee engagement and subsequent gains in efficiency, in part because they can ensure that their AI-generated content meets regulatory standards.
Relevant reading: Generative AI in Financial Services: How Vanguard Trailblazes at Secure and Compliant Innovation
GenAI for Website Experiences: United Airlines
United Airlines uses Coveo to provide relevant search results to end users, as well as protect its brand against both malformed questions and bad actors. The site gets a wide variety of queries, often kind of muddled. For example, a user might ask “can I fly with my pet?”.
There are three levels of responses, much as you would see on a Google search results page. First, gen AI provides a summary response, drawn from United’s own content, shown at the top of the results.
Second, users see the results that the gen AI used to generate its summary response, providing the source of the summary, if you will, as well as the ability to take a deeper dive into the subject.
Third, users see related topics, ranked by relevance. This approach uses United’s own content to enhance gen AI results.
In addition, Coveo helps guard against hallucinations when users ask ambiguous questions that may create confusion, such as “Can I fly with my kids in a checked bag?” This query has several potential meanings, including:
- Can I fly with children?
- Can I check a bag?
- Or the implied questions, such as “Do I have to pay for checked bags?” and “If I fly with children, do I have to check my bags?”
We all know that United allows you to travel with children, and it allows you to check a bag. Kids younger than two can sit in your lap and sometimes travel at no extra cost, while older children must have their own seats.
The answer to both parts of this query then is, “It depends.”
Vaguely worded questions with multiple potential answers would likely return bad information, and Coveo helps protect against that by returning no summary at all. Instead, it suggests content related to both parts of the question so users can read it and find the answers themselves.
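As a rough illustration of that pattern (my own sketch, not Coveo’s actual implementation), the logic amounts to a confidence gate: generate a summary only when retrieval comes back confident, and otherwise return related content for the user to read.

```python
# Illustrative abstention pattern: when retrieved results are weak or
# ambiguous, skip the generated summary and surface related content instead.
# The threshold and result shape are assumptions for this sketch.

def respond(query: str, results: list[dict], min_score: float = 0.75) -> dict:
    # Each result is assumed to look like {"title": str, "score": float}.
    confident = [r for r in results if r["score"] >= min_score]
    if not confident:
        # Abstain: no summary, just related reading the user can judge directly.
        return {"summary": None, "related": [r["title"] for r in results[:5]]}
    # Otherwise a summary would be generated from the confident results only.
    return {
        "summary": f"Generated from {len(confident)} high-confidence sources.",
        "related": [r["title"] for r in confident],
    }
```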

Relevant reading: Under the Hood with United Airlines
AI-Relevance Platform Delivers Real Business Impact
Enterprises that supplement GenAI with advanced retrieval-augmented generation (RAG) don’t just improve response quality — they create a secure, scalable, and precise system of intelligence. But there’s a catch: the success of GenAI hinges on how well you retrieve the right data, in real time, from across your ecosystem.
This is where Coveo stands out.
Coveo transforms your fragmented data landscape into a single, enterprise-grade intelligence layer that delivers only the most relevant, verifiable knowledge — eliminating hallucinations, streamlining discovery, and turning AI into measurable ROI. Our approach ensures your AI is not only fast and fluent but also trustworthy and traceable.
If your GenAI strategy isn’t delivering real business impact yet, the retrieval method you choose may be why.
Curious what the best retrieval method looks like for RAG and LLM applications? Watch our on-demand webinar to get all the details.