New to Agentic AI? Before we dive into architecture, we recommend checking out my colleague Louis-Guillaume’s previous post. He explores the fundamentals of agentic systems, why they matter, and how Coveo’s APIs (Search, Passage Retrieval, Answer) provide the essential foundation for enterprise-ready agents.

The Journey from Demo to Production
When “Demo-Ready” Isn’t “Production-Ready”
We’ll follow the story of Barca, a fictional B2B SaaS company whose support organization is drowning in tickets, endless Slack threads, and documentation scattered across disparate content sources.
Like many enterprises, Barca’s engineering team has already experimented with Generative AI. They tested a few “agent” proof-of-concepts (POCs) that looked impressive in isolated demos. However, nothing was truly production-ready. As soon as they attempted to roll these agents out to real users, the gaps became obvious:
- No consistent grounding: The agents often hallucinated because they lacked a connection to trusted, up-to-date knowledge.
- No permission enforcement: The POCs couldn’t distinguish between a junior support rep and a VP, risking data leaks.
- Fragmented retrieval: There was no unified layer to search across Jira, Confluence, SharePoint, or other sources simultaneously.
- Statelessness: With no reliable memory or identity, every conversation felt like the first one.
- Black box operations: There was no observability to track errors, latency, or reasoning steps.
- Ops Complexity: Because they built custom modules and orchestration scripts from scratch, the team was stuck managing ad-hoc scaling and infrastructure rather than improving the agent.
The Target
Barca realized they didn’t just need a “chatbot”; they needed a platform. They scrapped the POCs and defined a new set of requirements for an agentic architecture that is:
- Cloud-native, scalable, and secure: Able to handle enterprise loads without managing physical servers.
- Enterprise search ready: Deeply integrated with their existing knowledge layer.
- Unified: Capable of retrieving data from multiple knowledge sources through a single interface.
- Open and extensible: Built on open frameworks rather than bespoke, brittle orchestration scripts.
- Measurable: Fully observable so engineering can track performance, costs, and accuracy.
To achieve this, Barca turned to a specific stack: Amazon Bedrock AgentCore, the Strands Agents SDK, and the Coveo Model Context Protocol (MCP) Server (Beta).
Before We Jump Into The Stack
Before we build a production-grade agent, let’s clarify the key components:
1. Amazon Bedrock

- Amazon Bedrock is a fully managed, low-code service for building agents quickly.
- The Bedrock Model Catalog offers a broad selection of models, including the Anthropic, Meta, and Amazon Nova families, among others. Select the model that performs best for your use case.
- AgentCore is the underlying infrastructure-as-code layer. It gives developers full control over the agent’s runtime, memory, and identity. Think of it as the “serverless hosting” specifically built for custom AI agents, allowing you to bring your own code and frameworks while AWS handles the security and scaling.
2. Strands Agents SDK is an open-source, code-first framework for building agents. Unlike other frameworks that rely on complex graphs or rigid flows, Strands is model-driven: it lets the LLM decide the plan. It is lightweight, Python-based, and designed to run perfectly inside the AgentCore Runtime.

3. The Coveo Platform & MCP Server: AI agents are only as good as the data they can access. Coveo provides the unified index that “grounds” the agent in truth.

- The Coveo MCP Server translates Coveo’s enterprise search capabilities into a standard tool that any agent can plug into.
- Search Tool: Hybrid lexical + semantic retrieval with a highly configurable relevance tuning system combining ML models and manual business rules.
- Passage Tool: Precise passage-level retrieval for grounding LLMs.
- Answer Tool: A headless version of Coveo Relevance Generative Answering that bundles retrieval + synthesis into one call.
- Fetch Tool: Retrieves complete documents from the Coveo index.
- Instead of building a custom retrieval script, the agent simply asks the Coveo MCP server for information, and Coveo handles the complex ranking, security trimming, and relevance.
Not familiar with MCP Server? Learn more.
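Under the hood, MCP is JSON-RPC 2.0 over a transport such as streamable HTTP, so a tool invocation, whichever SDK issues it, reduces to a `tools/call` request. This sketch shows the wire shape; the `query` argument name is illustrative, not Coveo’s actual tool schema:

```python
import json

def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 request an MCP client sends to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# e.g., asking a server's search tool for results (argument name assumed)
payload = build_tool_call("search", {"query": "reset SSO password"})
```

In practice the Strands `mcp_client` tool (shown later) builds and sends these requests for you; the point is that any MCP-capable agent speaks the same protocol to the Coveo MCP server.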
Bedrock AgentCore and Coveo Stack

In this post, we’ll show how Barca builds a production-ready support agent that:
- Runs on Amazon Bedrock AgentCore Runtime
- Uses Strands Agents for agent logic and tools
- Connects to the Coveo MCP server to access Search, Passage Retrieval, and Answer APIs
- Uses AgentCore Memory and a custom hook to persist conversation context
- Uses AgentCore Identity + OAuth2 to retrieve secure, permissioned data
- Uses AgentCore Observability to help trace, debug, and monitor agent performance in production environments
We’ll look at the architecture, the main building blocks, and some concrete code to help you reproduce this pattern.
Architecture: How the Pieces Fit Together

1. Front-End Client
– Authenticates the user through Amazon Cognito (or another IdP) using OAuth2.
– Calls AgentCore Runtime with the user’s access token in the Authorization header.
2. AgentCore Runtime
– Validates the token via AgentCore Identity.
– Executes a Strands Agent inside a secure environment, with AgentCore Memory configured.
3. Strands Agent + Strands Tools
– Uses the mcp_client tool to connect to the Coveo MCP server.
– Uses a custom memory hook to read/write conversation state via AgentCore Memory.
– Authenticates the Coveo-scoped user through AgentCore Identity using OAuth2.
4. Coveo MCP Server
– Receives requests containing a Coveo-scoped OAuth2 token in the Authorization header.
– Uses that token to call Coveo’s APIs with the correct security & permissions.
The Implementation
Step 1: The AgentCore App and System Prompt
At the core of this agent lies a BedrockAgentCoreApp entry point, which defines how runtime invocations are handled within Amazon Bedrock’s AgentCore framework. This setup initializes the main application context.
from bedrock_agentcore import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    # your agent logic here
    return {"result": "..."}

if __name__ == "__main__":
    app.run()
Barca defines a system prompt for a support expert that reduces hallucinations and enforces tool use:
def build_system_prompt() -> str:
    return """You are a support expert at Barca.

Important behavioral rules:
- You have no internal or pretrained knowledge.
- You must only use information provided by tools or the recent conversation.
- Treat your internal knowledge base as empty.
- Do not perform web searches or use external knowledge.

If the tools and recent conversation contain no relevant information:
- Say: "I don’t have that information in the provided context."

Answering rules:
- Use only information retrieved from tools or explicitly stated in the conversation.
- Be concise.
- If unsure, admit uncertainty.

When querying tools:
- Rephrase the user question into a precise query that includes relevant product names, features, or entities mentioned in the conversation.

Available Tools

get_answer
- **Use for**: Direct factual questions, definitions, how-to queries
- **Returns**: Curated answer with citations from Coveo Answer API
- **Best for**: Single, focused questions with clear answers

get_passages
- **Use for**: Detailed explanations, comparisons, multi-step processes
- **Returns**: Relevant passages with full context and metadata
- **Best for**: Complex questions requiring synthesis from multiple sources

search
- **Use for**: Broad exploration, finding multiple resources
- **Returns**: Ranked search results with excerpts
- **Best for**: Open-ended or exploratory queries

### Recent conversation
"""
This aligns with the best practices from our previous blog: treating prompts as reusable, system-driven contracts that enforce grounding and control.
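If the prompt is a contract, it is worth pinning down with a quick check so later edits don’t silently drop a grounding rule. This is a lightweight sketch, not part of Barca’s code; the trimmed prompt copy below keeps only the rules under test:

```python
def build_system_prompt() -> str:
    # Trimmed copy of the full prompt above; only the rules under test are shown
    return """You are a support expert at Barca.
- You have no internal or pretrained knowledge.
- You must only use information provided by tools or the recent conversation.
### Recent conversation
"""

def check_prompt_contract(prompt: str) -> list:
    """Return the grounding rules missing from the prompt, if any."""
    required = [
        "no internal or pretrained knowledge",
        "only use information provided by tools",
        "### Recent conversation",
    ]
    return [rule for rule in required if rule not in prompt]

# An empty list means the contract is intact
missing = check_prompt_contract(build_system_prompt())
```

A check like this can run in CI alongside the agent’s unit tests, so prompt refactors get the same safety net as code refactors.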
Step 2: Memory via AgentCore Memory + Strands Hooks
Barca wants the agent to “remember” recent discussion turns for better context.
They use AgentCore Memory plus a custom Strands hook:
from strands.hooks import HookProvider, HookRegistry, MessageAddedEvent, AgentInitializedEvent, AfterInvocationEvent
from bedrock_agentcore.memory.session import MemorySession
from bedrock_agentcore.memory.constants import ConversationalMessage, MessageRole

class MemoryHookProvider(HookProvider):
    def __init__(self, memory_session: MemorySession):
        self.memory_session = memory_session

    def retrieve_context(self, event: MessageAddedEvent):
        # Load last few turns and inject as context
        recent_turns = self.memory_session.get_last_k_turns(k=3)
        if not recent_turns:
            return
        context_messages = []
        for turn in recent_turns:
            for message in turn:
                role = message.get("role", "unknown")
                content = message.get("content", {}).get("text", "")
                context_messages.append(f"{role}: {content}")
        context_block = "\n".join(context_messages)
        event.add_system_message(f"### Recent conversation\n{context_block}")

    def save_interaction(self, event: AfterInvocationEvent):
        messages = event.messages or []
        if not messages:
            return
        last = messages[-1]
        text = last["content"][0]["text"]
        role = MessageRole.USER if last["role"] == "user" else MessageRole.ASSISTANT
        self.memory_session.add_turns(
            messages=[ConversationalMessage(text, role)]
        )

    def register_hooks(self, registry: HookRegistry):
        registry.add_callback(MessageAddedEvent, self.retrieve_context)
        registry.add_callback(AfterInvocationEvent, self.save_interaction)
        registry.add_callback(AgentInitializedEvent, self.retrieve_context)
This hook:
- Reads from AgentCore Memory on initialization / message added.
- Writes user and assistant messages after invocation.
In the entrypoint, Barca wires this into a MemorySessionManager:
from bedrock_agentcore.memory.session import MemorySessionManager

@app.entrypoint
async def invoke(payload) -> dict:
    user_message = (payload or {}).get("prompt", "")
    user_id = (payload or {}).get("user_id", "")
    session_id = (payload or {}).get("session_id", "")

    # Create or load a conversational memory session for this user/session
    session_manager = MemorySessionManager(memory_id=MEMORY_ID, region_name="us-east-1")
    user_session = session_manager.create_memory_session(
        actor_id=user_id,
        session_id=session_id,
    )

Step 3: Identity & OAuth2 with AgentCore Identity
Next, Barca needs the agent to act on behalf of a real user, so the Coveo MCP server can apply the right permissions and personalization.
AgentCore Identity supports this pattern with decorators like @requires_access_token, which retrieves an OAuth2 token from the Coveo identification service and injects it into your function at runtime.
Here’s how Barca uses it:
from bedrock_agentcore.identity.auth import requires_access_token

async def on_auth_url(url: str) -> None:
    # Sent back to the client so the user can complete OAuth
    await queue.put_event({"auth_url": url})

@requires_access_token(
    provider_name="oauth-provider-name",
    scopes=["full"],
    auth_flow="USER_FEDERATION",
    on_auth_url=on_auth_url,
    force_authentication=False,
)
async def need_token_3LO_async(*, access_token: str) -> str:
    # AgentCore Identity injects `access_token` securely
    return access_token
Key points:
- oauth-provider-name refers to an AgentCore Identity provider configured to talk to your IdP (e.g., Cognito, Okta, Entra).
- The user completes the OAuth flow via the auth_url returned to the client.
- The function simply returns the access_token, so you don’t handle the secrets or flows manually.
This token is then used to authenticate against the Coveo MCP server.
Step 4: Connecting Strands Agent to Coveo MCP Server
Strands Agents Tools provides an mcp_client tool that can connect to an MCP server over HTTP. Here is how Barca wraps it:
import asyncio

from strands import Agent
from strands_tools import mcp_client

async def load_mcp_client(queue: StreamingQueue, agent: Agent, url: str) -> None:
    await queue.put_event({"status": "Loading MCP server..."})

    # Get user-scoped token from AgentCore Identity
    token = await need_token_3LO_async()
    await queue.put_event({"status": "Connecting to MCP and loading tools..."})

    # Connect to Coveo MCP
    await asyncio.to_thread(
        agent.tool.mcp_client,
        action="connect",
        connection_id="coveo_mcp",
        transport="streamable_http",
        server_url=url,
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
    )

    # Load tools exposed by the Coveo MCP server
    await asyncio.to_thread(
        agent.tool.mcp_client,
        action="load_tools",
        connection_id="coveo_mcp",
    )
    await queue.put_event({"status": "MCP tools loaded."})

Then they define an agent_task that:
- Connects to MCP and loads the tools.
- Asks the Strands agent to answer the user’s question.
- Streams back the answer.
async def agent_task(prompt: str, queue: StreamingQueue, agent: Agent, url: str) -> None:
    try:
        # Connect and load MCP tools
        await load_mcp_client(queue=queue, agent=agent, url=url)
        await queue.put_event({"status": "Generating answer..."})

        # Ask the agent to answer
        async for chunk in agent.stream_async(
            f"Answer this question: <query>{prompt}</query>\n"
        ):
            if "data" in chunk:
                logger.debug(f"Chunk data: {chunk}")
                await queue.put_event({"answer": chunk["data"]})
    except Exception as e:
        await queue.put_event({"error": str(e)})
    finally:
        await queue.finish()
        await agent.tool.mcp_client(action="disconnect", connection_id="coveo_mcp")
Finally, the AgentCore entrypoint wires everything together:
@app.entrypoint
async def invoke(payload) -> dict:
    # Load payload
    user_message = (payload or {}).get("prompt", "")
    user_id = (payload or {}).get("user_id", "")
    session_id = (payload or {}).get("session_id", "")

    # Set Memory session for this user/session
    session_manager = MemorySessionManager(memory_id=MEMORY_ID, region_name="us-east-1")
    user_session = session_manager.create_memory_session(
        actor_id=user_id,
        session_id=session_id,
    )

    # Create the Strands Agent
    agent = Agent(
        tools=[mcp_client],
        hooks=[MemoryHookProvider(user_session)],
        state={"actor_id": user_id, "session_id": session_id},
        system_prompt=build_system_prompt(),
    )

    # Fire-and-forget: run in background, stream via queue
    asyncio.create_task(
        agent_task(prompt=user_message, agent=agent, queue=queue, url=mcp_url)
    )
    return queue.stream()
From here, AgentCore Runtime takes care of:
- Spinning up the environment.
- Injecting identity and memory.
- Handling streaming back to the client.
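One piece these snippets use but never define is StreamingQueue: it is Barca’s own helper, not an AgentCore class. A minimal sketch of what it could look like, assuming an asyncio.Queue with a sentinel to close the stream:

```python
import asyncio

class StreamingQueue:
    """Minimal async queue bridging a background agent task to a streamed response."""
    _DONE = object()  # sentinel marking the end of the stream

    def __init__(self):
        self._queue = asyncio.Queue()

    async def put_event(self, event: dict) -> None:
        await self._queue.put(event)

    async def finish(self) -> None:
        # Signal consumers that no more events will arrive
        await self._queue.put(self._DONE)

    async def stream(self):
        # Async generator the runtime iterates to emit events to the client
        while True:
            item = await self._queue.get()
            if item is self._DONE:
                break
            yield item

async def demo():
    queue = StreamingQueue()
    await queue.put_event({"status": "Generating answer..."})
    await queue.put_event({"answer": "Hello"})
    await queue.finish()
    return [event async for event in queue.stream()]

events = asyncio.run(demo())
```

This mirrors the usage in the snippets: agent_task produces events with put_event and closes with finish, while the entrypoint returns queue.stream() for the runtime to consume.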
Step 5: The Client – Using Runtime Agent with OAuth2 and MCP
On the client side, Barca uses a small client app to demonstrate the pattern. The important parts are:
- Authenticating the user with Cognito and getting an access token.
- Calling AgentCore Runtime with that token and the MCP URL.
Here’s a simplified version of the client:
1. Authenticating the user with Cognito:
def generate_auth_header() -> str:
    client = boto3.client("cognito-idp", region_name=default_region)
    resp = client.initiate_auth(
        ClientId=default_client_id,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": default_user_id, "PASSWORD": default_user_pwd},
    )
    access_token = resp["AuthenticationResult"]["AccessToken"]
    return f"Bearer {access_token}"
2. The Bedrock AgentCore Runtime invocation:
def send_query(prompt: str):
    bearer_token = generate_auth_header()
    escaped_arn = urllib.parse.quote(
        f"arn:aws:bedrock-agentcore:{default_region}:{aws_org_id}:runtime/{default_agent_runtime}",
        safe="",
    )
    url = f"https://bedrock-agentcore.{default_region}.amazonaws.com/runtimes/{escaped_arn}/invocations"
    headers = {
        "Authorization": bearer_token,
        "Accept": "text/event-stream",
        "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id": st.session_state.runtime_session_id,
        "X-Amzn-Bedrock-AgentCore-Runtime-User-Id": default_runtime_user_id,
    }
    body = {
        "prompt": prompt,
        "user_id": default_user_id,
        "session_id": st.session_state.runtime_session_id,
    }
    # Streaming response from AgentCore Runtime
    with requests.post(
        url,
        params={"qualifier": default_qualifier},
        headers=headers,
        json=body,
        timeout=None,
        stream=True,
    ) as response:
        response.raise_for_status()
        ctype = (response.headers.get("Content-Type") or "").lower()
        if "text/event-stream" in ctype:
            for raw in response.iter_lines(decode_unicode=True):
                if not raw or raw.startswith(":"):
                    continue
                if raw.startswith("data:"):
                    yield json.loads(raw[len("data:"):])
        else:
            yield {"error": f"Unexpected content type: {ctype}"}
The client UI then just iterates over send_query(prompt) and renders:
- {"auth_url": …} → render a login link.
- {"status": …} → show status updates, like tools used.
- {"answer": …} → stream back the agent’s answer.
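That dispatch boils down to a simple loop over the events send_query yields; in this sketch the handlers just record what would be rendered (the real UI calls are framework-specific):

```python
def render_events(events):
    """Dispatch streamed agent events to the appropriate UI handler."""
    rendered = []
    for event in events:
        if "auth_url" in event:
            rendered.append(("login_link", event["auth_url"]))
        elif "status" in event:
            rendered.append(("status", event["status"]))
        elif "answer" in event:
            rendered.append(("answer_chunk", event["answer"]))
        elif "error" in event:
            rendered.append(("error", event["error"]))
    return rendered

# Sample events in the shape the agent streams back
sample = [{"status": "Loading MCP server..."}, {"answer": "You can..."}]
```

Because every event is a small dict with one discriminating key, the client stays decoupled from the agent’s internals: new event types degrade gracefully (they are simply ignored).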
Crucially:
- User authenticates via Coveo Authentication Service.
- Agent runtime receives that token via AgentCore Identity and passes it into @requires_access_token.
- Strands agent uses that token to call the Coveo MCP server, which applies the correct document-level security and personalization.
So if Leslie and Kyle both ask, “Show me internal roadmap details?” the answer is grounded in the same index and relevance layer, but each sees only what they are allowed to see.

Step 6: AgentCore Observability: Making Barca’s Agent Measurable
Once the agent is live, Barca doesn’t just need answers; it needs visibility. AgentCore Observability turns each interaction into a traceable execution: every model call, tool call (like the Coveo MCP tools), and workflow step is captured as metrics and traces.
For Barca, this brings three big benefits:
- Understand behavior – See how the agent actually answered a question: which tools it called, how long they took, and errors (if any).
- Improve continuously – Use traces to spot content gaps, refine prompts, or adjust how and when the agent calls Search, Passage Retrieval, or Answer API.
- Control performance & cost – Monitor latency, error rates, and token usage across sessions to keep the experience fast, reliable, and within budget.
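AgentCore Observability collects these traces automatically, but the underlying idea is worth seeing concretely. This tiny latency-tracking wrapper is a local sketch, not the AgentCore API; it records the same raw material (per-tool call counts and durations) that the managed traces expose:

```python
import time
from collections import defaultdict

class ToolMetrics:
    """Record per-tool call counts and latency, the raw material of observability."""
    def __init__(self):
        self.calls = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def timed(self, name, fn, *args, **kwargs):
        # Wrap any tool call; the finally block records even failed calls
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            self.calls[name] += 1
            self.total_seconds[name] += time.perf_counter() - start

metrics = ToolMetrics()
# Stand-in for a real tool call such as get_answer
result = metrics.timed("get_answer", lambda q: f"answer for {q}", "reset SSO")
```

In production you would read the equivalent numbers from AgentCore Observability dashboards rather than roll your own, but the sketch shows what "token usage and latency across sessions" reduces to.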
Combined with Coveo’s secure, relevance-aware retrieval, AgentCore Observability means Barca’s agent isn’t a black box. It’s an observable system the team can monitor, debug, and evolve with confidence.
Why Coveo + Bedrock AgentCore Is a Strong Enterprise Pattern

From Barca’s perspective, this architecture checks all the boxes:
- Security & Permissions
- AgentCore Identity + OAuth2 ensures user-specific context, permissions, and responses for every interaction.
- Coveo enforces security trimming on every query, so agents can’t leak content users shouldn’t see.
- Unified Retrieval Layer
- The agent doesn’t talk to random databases; it talks to the Coveo MCP server.
- Search API, Passage Retrieval API, and Answer API are all fronted by a single, enterprise-hardened index.
- Relevance & Personalization
- Coveo provides hybrid lexical + semantic retrieval with a highly configurable relevance tuning system combining ML models and manual business rules, whether the “consumer” is a human in a UI or an AI agent.
- Memory
- AgentCore Memory gives structured conversational history.
- Hooks make it easy to build custom evaluation and logging around these turns.
- Observability
- AgentCore Observability helps you trace, debug, and monitor agent performance in production environments.
- Platform, Not Prototype
- AgentCore Runtime offers session isolation, scaling, and streaming out of the box.
- Strands Agents + Strands Tools make it easy to compose tools and extend with new capabilities over time.
Conclusion
The path from demo to production doesn’t have to be a complete rewrite. By combining Amazon Bedrock AgentCore’s production-grade infrastructure with Strands Agents’ developer-friendly framework and Coveo’s enterprise-ready knowledge layer, Barca built an agent that’s not just demo-ready but production-ready.
The key insight: production agents need production infrastructure. Security, memory, identity, and observability aren’t nice-to-haves; they’re table stakes. With the right architecture, you can build agents that are secure, scalable, and actually useful, without reinventing the wheel.
Recommended Resources:
- Amazon Bedrock AgentCore Documentation
- Strands Agents SDK
- Strands Agents Tools
- Empowering Enterprise AI with Coveo: Guide to Agentic AI Systems and Key APIs
- How to Unlock Agentic AI With the Coveo MCP Server
- How to Build an AI Agent: From Basic Components to Enterprise-Grade Systems
- GitHub Repository of an Amazon Bedrock AgentCore Agent

