You’ve built your custom RAG pipeline. It works in dev. Then you try to:
| Task | Obstacle |
| --- | --- |
| Connect it to SharePoint | Permissions are a nightmare |
| Add Salesforce | Different API, different auth |
| Scale to 10M documents | Retrieval slows to a crawl |
| Handle users who shouldn’t see certain content | Now you’re rebuilding access control |
Building RAG infrastructure from scratch means solving problems that have nothing to do with your actual application.
At Coveo, we’ve taken 15+ years of building enterprise search infrastructure and turned it into a platform we call Retrieval Augmented Generation (RAG)-as-a-Service. As agentic architectures mature, the retrieval layer becomes essential, so we’ve also put our cloud-native offering behind a Coveo-hosted MCP server, designed to bring more precision, security, and scalability to agentic projects.
It’s permission-aware, API-first, production-ready retrieval infrastructure. Built for enterprise engineers who want to extend the capabilities of their copilots, agents, assistants, and AI-powered search.
Not familiar with RAG? Start here instead.
Under the Hood
RAG-as-a-Service connects your content sources to your LLM, enriching each question with the right context before generation:

The Coveo platform handles:
- Incremental indexing with permission sync
- Lexical (keyword) search
- Semantic embedding and vector search
- Machine learning and analytics
- Passage-level retrieval and ranking
- Runtime permission enforcement
Why Engineering Teams Choose RAG-as-a-Service
Your LLM is only as good as its context.
Whether you’re building a workplace assistant for employees or a customer-facing chatbot, hallucinations aren’t acceptable. Coveo unifies content access in one place so your LLMs always work from current, permission-aware information.
Governance isn’t optional.
You’re working with enterprise data. That means respecting access controls, data sensitivity, and compliance requirements. RAG-as-a-Service enforces document-level permissions and works with both structured and unstructured content sources, without requiring data migration.
Reduce time-to-production.
Skip the work of stitching together vector databases, connectors, access control layers, and retrieval logic. RAG-as-a-Service gives you search, passage retrieval, and answer delivery—modular components ready to plug into your app.
The Core APIs
Think of RAG-as-a-Service as infrastructure for building AI experiences. Each API handles a different part of the retrieval and generation workflow:
Passage Retrieval API (PR API)
PR API returns specific text passages from your indexed documents, ranked by relevance. Send a query, get back the most relevant snippets with permission filtering already applied. Built on top of the Search API infrastructure, so your existing query pipelines work here too.

The following example shows the payload to retrieve passages from a Coveo index:
{
  "query": "What are the benefits of using solar energy?",
  "filter": "@source==\"acme\"",
  "additionalFields": [
    "clickableuri"
  ],
  "maxPassages": 5,
  "searchHub": "Main",
  "localization": {
    "locale": "en-CA",
    "timezone": "America/Montreal"
  },
  "context": {
    "userAgeRange": "25-35",
    "userRoles": [
      "PremiumCustomer",
      "ProductReviewer"
    ]
  }
}
The following example shows a response payload from the Passage Retrieval API:
{
  "items": [
    {
      "text": "Solar energy has several benefits including reducing electricity bills, providing a renewable energy source, and lowering carbon footprint.",
      "relevanceScore": 0.95,
      "document": {
        "title": "The Benefits of Solar Energy",
        "primaryid": "GAYEG6LHNB2DQ4LLNBKVEUSHKUXDCMZWGEZS4ZDFMZQXK3DU",
        "clickableuri": "https://example.com/search/document-solar-energy"
      }
    },
    {
      // another item
    }
  ],
  "responseId": "c0857557-5579-4f5e-8958-9befd7d1d4a8"
}
Integrates with Amazon Bedrock, Microsoft Copilot, or your own orchestration layer.
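To see it end to end, here’s a minimal Python sketch that sends the request above. The endpoint path and header are assumptions based on Coveo’s REST conventions, so check your organization’s API reference for the exact values:

```python
import requests

# Placeholders: substitute your organization's URL and a search-scoped API key.
ORG_URL = "https://<YOUR_ORG_ID>.org.coveo.com"
API_KEY = "<SEARCH_API_KEY>"

payload = {
    "query": "What are the benefits of using solar energy?",
    "filter": '@source=="acme"',
    "maxPassages": 5,
    "searchHub": "Main",
}

response = requests.post(
    f"{ORG_URL}/rest/search/v3/passages/retrieve",  # assumed endpoint path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()

# Each item carries the passage text, a relevance score, and metadata
# about its source document, as in the response shown above.
for item in response.json()["items"]:
    print(f'{item["relevanceScore"]:.2f}  {item["text"][:80]}')
```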
Answer API (In Open Beta)
The Answer API generates answers from your data using Relevance Generative Answering. It handles the LLM call and citation generation for you, or you can use it alongside your own LLM workflow.

The following example shows a request payload for the Answer API:
POST https://<YOUR_ORG_REST_URL>/answer/v1/configs/<CONFIG_ID>/generate
{
  "q": "Which gloves are better for autumn?",
  "searchHub": "sports",
  "pipeline": "Sports goods pipeline",
  // ...
}
The Answer API returns streaming server-sent events with the generated answer text and citations. The stream sends header info, then answer text chunks, then citations, then an end-of-stream event.
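Here’s a minimal sketch of consuming that stream in Python, assuming standard `data:` SSE framing; the event fields are illustrative placeholders, so consult the Answer API reference for the exact schema:

```python
import json
import requests

ANSWER_URL = "https://<YOUR_ORG_REST_URL>/answer/v1/configs/<CONFIG_ID>/generate"

with requests.post(
    ANSWER_URL,
    headers={"Authorization": "Bearer <API_KEY>"},
    json={"q": "Which gloves are better for autumn?", "searchHub": "sports"},
    stream=True,  # keep the connection open for server-sent events
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames arrive as "data: {...}" lines separated by blank lines.
        if not line or not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        # The stream delivers header info first, then answer text chunks,
        # then citations, then an end-of-stream event.
        print(event)
```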
Search API
The Search API provides standard search with ML-powered relevance ranking. It returns full document results instead of passages.
The following example shows a search request to the Search API:
payload = {
    "q": query,  # the end user's search string
    "searchHub": "sports",
    "pipeline": "Sports goods pipeline"
}
The following example shows a response payload from the Search API:
{
  ...
  "duration": 35,
  "groupByResults": [
    ...
  ],
  "indexDuration": 11,
  "requestDuration": 33,
  "results": [
    {
      "clickUri": "https://example.com/bookstore/books/authors/arthur-conan-doyle/adventures-of-sherlock-holmes",
      "excerpt": "The Adventures of Sherlock Holmes, a collection of 12 Sherlock Holmes tales ... written by Sir Arthur Conan Doyle and published in 1892 ...",
      "excerptHighlights": [
        {
          "length": 10,
          "offset": 4
        }
      ],
      "percentScore": 75.0698,
      "printableUri": "https://example.com/bookstore/books/authors/arthur-conan-doyle/adventures-of-sherlock-holmes",
      "printableUriHighlights": [
        {
          "length": 10,
          "offset": 63
        }
      ],
      "raw": {
        "date": 1532631456000,
        "author": "Arthur Conan Doyle",
        "documenttype": "Book",
        "filename": "adventures-of-sherlock-holmes.html",
        "filetype": "html",
        "indexeddate": 1532631456000,
        "language": [
          "English"
        ],
        "permanentid": "ecc3fac22085f2712c8cd2144f9d195593710963dc2202b5256f8a4f5f6",
        "size": 50683,
        "source": "Books",
        "sourcetype": "Push",
        "title": "The Adventures of Sherlock Holmes",
        ...
      },
      "score": 4904,
      "title": "The Adventures of Sherlock Holmes",
      "titleHighlights": [
        {
          "length": 10,
          "offset": 4
        }
      ],
      "uniqueId": "42.19751$https://example.com/bookstore/books/authors/arthur-conan-doyle/adventures-of-sherlock-holmes",
      "uri": "https://example.com/bookstore/books/authors/arthur-conan-doyle/adventures-of-sherlock-holmes"
    },
    ...
  ],
  "searchUid": "7beff9c1-98f3-401c-ac16-10b90a8b810f",
  "totalCount": 60,
  "totalCountFiltered": 60,
  ...
}
Good for exploration, fallback scenarios, or when users want to browse full results rather than get direct answers.
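To make the request above concrete, here’s a small Python sketch that posts it to the documented `/rest/search/v2` endpoint (the org URL and API key are placeholders):

```python
import requests

ORG_URL = "https://<YOUR_ORG_ID>.org.coveo.com"
API_KEY = "<SEARCH_API_KEY>"

def search(query: str) -> dict:
    """Send a query to the Search API and return the parsed JSON response."""
    resp = requests.post(
        f"{ORG_URL}/rest/search/v2",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "q": query,
            "searchHub": "sports",
            "pipeline": "Sports goods pipeline",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

results = search("sherlock holmes")
print(results["totalCount"], "results in", results["duration"], "ms")
```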
Fetch API
Full-document retrieval for use cases where you need complete context rather than passages. Currently in beta. Get more information here.
Each API is powered by Coveo’s platform: enterprise AI models, hundreds of connectors to SaaS platforms, advanced indexing, permission enforcement, and flexible deployment options.
Real Implementation Examples
Build Your Own Copilot
Use the Passage Retrieval API to feed your LLM with secure, filtered content from your internal sources. The API respects document-level permissions at query time, so users only see passages from documents they have access to.
How it works:
- User asks a question in your copilot interface
- Your app sends the query to the Passage Retrieval API with the user’s authentication context
- PR API returns relevant passages (max 20) with relevance scores
- You construct a prompt with these passages as context
- Send to your LLM of choice (OpenAI, Anthropic, your own model)
- Return the grounded response to the user
The Passage Retrieval API handles indexing, permission filtering, semantic ranking, and passage extraction. You control the prompt engineering and response generation. Works with Bedrock, Copilot, or custom orchestration.
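Here’s a condensed sketch of that flow in Python. The PR API endpoint path is an assumption, and OpenAI’s client stands in for whatever model provider you use:

```python
import requests
from openai import OpenAI

ORG_URL = "https://<YOUR_ORG_ID>.org.coveo.com"

def retrieve_passages(question: str, user_token: str) -> list[dict]:
    # Forward the user's own token so the PR API enforces that user's
    # document-level permissions at query time.
    resp = requests.post(
        f"{ORG_URL}/rest/search/v3/passages/retrieve",  # assumed endpoint path
        headers={"Authorization": f"Bearer {user_token}"},
        json={"query": question, "maxPassages": 5, "searchHub": "Main"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

def answer_question(question: str, user_token: str) -> str:
    passages = retrieve_passages(question, user_token)

    # Prompt construction stays under your control: the retrieved
    # passages become the model's only context.
    context = "\n\n".join(p["text"] for p in passages)
    prompt = (
        "Answer using only the context below. If the context is "
        f"insufficient, say so.\n\nContext:\n{context}\n\nQuestion: {question}"
    )

    # Any LLM works here; OpenAI's client is just one example.
    completion = OpenAI().chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```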
Power a Chatbot That Actually Answers
Use Answer API for a complete question-answering flow. The Answer API uses an external LLM to generate responses, so you don’t need to manage LLM infrastructure.
How it works:
- User asks a question
- Your app calls the Answer API with the query
- Answer API retrieves relevant passages, generates an answer, and includes citations
- Returns a streaming response with answer text and source documents
- Your UI displays the answer with clickable citations
The Answer API handles indexing, permission filtering, semantic ranking, passage extraction, prompt engineering, and response generation. Works with Bedrock, Copilot, or custom orchestration.
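As a sketch, the streamed events can be folded into a single object your UI renders; the event and field names below are illustrative placeholders, not the exact Answer API schema:

```python
from dataclasses import dataclass, field

@dataclass
class ChatAnswer:
    text: str = ""
    citations: list[dict] = field(default_factory=list)

def fold_stream(events) -> ChatAnswer:
    """Collapse a stream of parsed Answer API events into one renderable
    answer. The payloadType values checked here are placeholders."""
    answer = ChatAnswer()
    for event in events:
        kind = event.get("payloadType", "").lower()
        if "text" in kind:
            answer.text += event.get("textDelta", "")
        elif "citation" in kind:
            answer.citations.extend(event.get("citations", []))
    return answer
```

Your UI then renders the accumulated text as it streams and attaches the citations as clickable source links.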
Add Contextual Search to Any App
Use the Search API for standard search functionality across your content. Unlike the PR API, which returns passages, the Search API returns complete document results with titles, URIs, and excerpts.
How it works:
- User enters a search query
- Your app sends the query to the Search API
- API returns ranked results with highlighted excerpts
- Display results as a list with titles, snippets, and links
Good for support portals, intranets, or product documentation where users want to explore results rather than get a single answer. The Search API also supports faceting and grouping for filtering large result sets.
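For example, here’s a minimal rendering pass over the response shown earlier, using its documented title, excerpt, and clickUri fields (the render_results helper is hypothetical):

```python
def render_results(response: dict) -> str:
    """Format Search API results as a plain-text list; a real UI would
    emit HTML with clickable links instead."""
    lines = []
    for result in response["results"]:
        lines.append(f'{result["title"]} ({result["percentScore"]:.0f}%)')
        lines.append(f'  {result["excerpt"]}')
        lines.append(f'  {result["clickUri"]}')
    return "\n".join(lines)
```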
Future-Proof with MCP Server
Use the Coveo MCP Server to equip your agent with the search APIs it needs. Through the Model Context Protocol, all Coveo retrieval APIs become standardized tools accessible to any MCP-compatible client, such as Claude Desktop, ChatGPT, or custom agents.
How it works:
- Enable and configure the Coveo MCP Server in the Coveo platform
- Connect it to your MCP-compatible client (Claude Desktop, custom agent, etc.) by using API credentials
- Your client gets access to all four APIs as tools: search, passage retrieval, answer generation, and fetch
Why this approach:
- Future-proof: as you add new use cases, you already have access to all enterprise data without rewriting integrations
- Works with any MCP-compatible client, giving you flexibility to switch tools
- You still control the prompts that guide when each tool gets used
The MCP Server acts as a bridge between standardized AI tooling and Coveo’s retrieval infrastructure. It’s available as an open-source project you can run locally or deploy to your infrastructure.
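As an illustration, an MCP-compatible client such as Claude Desktop registers a server through a JSON config along these lines. The command, package name, and environment variable names here are hypothetical, so follow the Coveo MCP Server project’s README for the real ones:

```json
{
  "mcpServers": {
    "coveo": {
      "command": "uvx",
      "args": ["coveo-mcp-server"],
      "env": {
        "COVEO_API_KEY": "<API_KEY>",
        "COVEO_ORGANIZATION_ID": "<ORG_ID>"
      }
    }
  }
}
```

Once registered, the client discovers the search, passage retrieval, answer generation, and fetch tools automatically and decides when to call each one based on your prompts.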

What We Don’t Do
RAG-as-a-Service is infrastructure, not a framework. We don’t lock you in:
- We don’t force you into a specific LLM provider—use Salesforce Agentforce, AWS Bedrock, Microsoft Azure, or your own models
- We don’t require you to migrate your data—connect to your existing sources in place
- We don’t lock you into a monolithic platform—use our APIs where they make sense
Use what works for your stack. Bring your own LLM, use your existing orchestration tools, and integrate at whatever layer makes sense for your application.
| When Building Your Own RAG Makes Sense | When RAG-as-a-Service Makes Sense |
| --- | --- |
| You need extreme customization of retrieval algorithms beyond standard configuration | You need to ship in weeks, not quarters |
| Your data patterns are highly unusual and don’t fit standard indexing approaches | Managing permissions across 50+ data sources isn’t your team’s core focus |
| You have dedicated ML engineers focused on search infrastructure | Your team wants to focus on the application layer, not infrastructure |
| You want full control over every component in the stack | You need enterprise-grade relevance and security working out of the box |
| | You want deployment flexibility (fully managed SaaS or API-only integration) |
We handle the infrastructure complexity while preserving your flexibility to build, letting your team focus on features that solve actual problems for your users.
Built to Scale With You
RAG-as-a-Service isn’t just a product. It’s the retrieval layer behind Coveo’s agentic AI strategy. The same infrastructure powers:
- Coveo’s Relevance Generative Answering (CRGA)
- Agent orchestration frameworks using Bedrock and Microsoft Copilot
- Enterprise search built by engineering teams at Fortune 500 companies
Start small and scale up, or go from development to enterprise-wide deployment without rearchitecting.
Ready to Build?
Get cloud-native retrieval infrastructure that reduces hallucinations and respects permissions. The same system trusted by engineering teams building production AI applications.

