New to Agentic AI? Before we dive into architecture, we recommend checking out my colleague Louis-Guillaume’s previous post. He explores the fundamentals of agentic systems, why they matter, and how Coveo’s APIs (Search, Passage Retrieval, Answer) provide the essential foundation for enterprise-ready agents.

The Journey from Demo to Production
When “Demo-Ready” Isn’t “Production-Ready”
We’ll follow the story of Barca, a fictional B2B SaaS company whose support organization is drowning in tickets, endless Slack threads, and documentation scattered across disparate content sources.
Like many enterprises, Barca’s engineering team has already experimented with Generative AI. They tested a few “agent” proof-of-concepts (POCs) that looked impressive in isolated demos. However, nothing was truly production-ready. As soon as they attempted to roll these agents out to real users, the gaps became obvious:
- No consistent grounding: The agents often hallucinated because they lacked a connection to trusted, up-to-date knowledge.
- No permission enforcement: The POCs couldn’t distinguish between a junior support rep and a VP, risking data leaks.
- Fragmented retrieval: There was no unified layer to search across Jira, Confluence, SharePoint, or other sources simultaneously.
- Statelessness: With no reliable memory or identity, every conversation felt like the first one.
- Black box operations: There was no observability to track errors, latency, or reasoning steps.
- Ops Complexity: Because they built custom modules and orchestration scripts from scratch, the team was stuck managing ad-hoc scaling and infrastructure rather than improving the agent.
The Target
Barca realized they didn’t just need a “chatbot”; they needed a platform. They scrapped the POCs and defined a new set of requirements for an agentic architecture that is:
- Cloud-native, scalable, and secure: Able to handle enterprise loads without managing physical servers.
- Enterprise search ready: Deeply integrated with their existing knowledge layer.
- Unified: Capable of retrieving data from multiple knowledge sources through a single interface.
- Open and extensible: Built on open frameworks rather than bespoke, brittle orchestration scripts.
- Measurable: Fully observable so engineering can track performance, costs, and accuracy.
To achieve this, Barca turned to a specific stack: Amazon Bedrock AgentCore, the Strands Agents SDK, and the Coveo Model Context Protocol (MCP) Server (Beta).
Before We Jump Into The Stack
Before we build a production-grade agent, let’s clarify the key components:
1. Amazon Bedrock

- Amazon Bedrock is a fully managed, low-code service for building agents quickly.
- The Bedrock Model Catalog offers a broad selection of models, including the Anthropic, Meta, and Amazon Nova families, among others. Select the model that performs best for your use case.
- AgentCore is the underlying infrastructure-as-code layer. It gives developers full control over the agent’s runtime, memory, and identity. Think of it as the “serverless hosting” specifically built for custom AI agents, allowing you to bring your own code and frameworks while AWS handles the security and scaling.
2. Strands Agents SDK is an open-source, code-first framework for building agents. Unlike other frameworks that rely on complex graphs or rigid flows, Strands is model-driven: it lets the LLM decide the plan. It is lightweight, Python-based, and designed to run perfectly inside the AgentCore Runtime.

3. The Coveo Platform & MCP Server: AI agents are only as good as the data they can access. Coveo provides the unified index that “grounds” the agent in truth.

- The Coveo MCP Server translates Coveo’s enterprise search capabilities into a standard tool that any agent can plug into.
- Search Tool: Hybrid lexical + semantic retrieval with a highly configurable relevance tuning system combining ML models and manual business rules.
- Passage Tool: Precise passage-level retrieval for grounding LLMs.
- Answer Tool: A headless version of Coveo Relevance Generative Answering that bundles retrieval + synthesis into one call.
- Fetch Tool: Retrieves complete documents from the Coveo index.
- Instead of building a custom retrieval script, the agent simply asks the Coveo MCP server for information, and Coveo handles the complex ranking, security trimming, and relevance.
Not familiar with MCP Server? Learn more.
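Under the hood, MCP is JSON-RPC 2.0 over a transport such as streamable HTTP, so a tool invocation, whichever SDK issues it, reduces to a `tools/call` request. This sketch shows the wire shape; the `query` argument name is illustrative, not Coveo’s actual tool schema:

```python
import json

def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 request an MCP client sends to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# e.g., asking a server's search tool for results (argument name assumed)
payload = build_tool_call("search", {"query": "reset SSO password"})
```

In practice the Strands `mcp_client` tool (shown later) builds and sends these requests for you; the point is that any MCP-capable agent speaks the same protocol to the Coveo MCP server.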
Bedrock AgentCore and Coveo Stack

In this post, we’ll show how Barca builds a production-ready support agent that:
- Runs on Amazon Bedrock AgentCore Runtime
- Uses Strands Agents for agent logic and tools
- Connects to the Coveo MCP server to access Search, Passage Retrieval, and Answer APIs
- Uses AgentCore Memory and a custom hook to persist conversation context
- Uses AgentCore Identity + OAuth2 to retrieve secure, permissioned data
- Uses AgentCore Observability to help trace, debug, and monitor agent performance in production environments
We’ll look at the architecture, the main building blocks, and some concrete code to help you reproduce this pattern.
Architecture: How the Pieces Fit Together

1. Front-End Client
– Authenticates the user through Amazon Cognito (or another IdP) using OAuth2.
– Calls AgentCore Runtime with the user’s access token in the Authorization header.
2. AgentCore Runtime
– Validates the token via AgentCore Identity.
– Executes a Strands Agent inside a secure environment, with AgentCore Memory configured.
3. Strands Agent + Strands Tools
– Uses the mcp_client tool to connect to the Coveo MCP server.
– Uses a custom memory hook to read/write conversation state via AgentCore Memory.
– Authenticates the Coveo-scoped user through AgentCore Identity using OAuth2.
4. Coveo MCP Server
– Receives requests containing a Coveo-scoped OAuth2 token in the Authorization header.
– Uses that token to call Coveo’s APIs with the correct security & permissions.
The Implementation
Step 1: The AgentCore App and System Prompt
At the core of this agent lies a BedrockAgentCoreApp entry point, which defines how runtime invocations are handled within Amazon Bedrock’s AgentCore framework. This setup initializes the main application context.
from bedrock_agentcore import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    # your agent logic here
    return {"result": "..."}

if __name__ == "__main__":
    app.run()
Barca defines a system prompt for a support expert that reduces hallucinations and enforces tool use:
def build_system_prompt() -> str:
    return """You are a support expert at Barca.

Important behavioral rules:
- You have no internal or pretrained knowledge.
- You must only use information provided by tools or the recent conversation.
- Treat your internal knowledge base as empty.
- Do not perform web searches or use external knowledge.

If the tools and recent conversation contain no relevant information:
- Say: "I don’t have that information in the provided context."

Answering rules:
- Use only information retrieved from tools or explicitly stated in the conversation.
- Be concise.
- If unsure, admit uncertainty.

When querying tools:
- Rephrase the user question into a precise query that includes relevant product names, features, or entities mentioned in the conversation.

Available Tools

get_answer
- **Use for**: Direct factual questions, definitions, how-to queries
- **Returns**: Curated answer with citations from Coveo Answer API
- **Best for**: Single, focused questions with clear answers

get_passages
- **Use for**: Detailed explanations, comparisons, multi-step processes
- **Returns**: Relevant passages with full context and metadata
- **Best for**: Complex questions requiring synthesis from multiple sources

search
- **Use for**: Broad exploration, finding multiple resources
- **Returns**: Ranked search results with excerpts
- **Best for**: Open-ended or exploratory queries

### Recent conversation
"""
This aligns with the best practices from our previous blog: treating prompts as reusable, system-driven contracts that enforce grounding and control.
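If the prompt is a contract, it is worth pinning down with a quick check so later edits don’t silently drop a grounding rule. This is a lightweight sketch, not part of Barca’s code; the trimmed prompt copy below keeps only the rules under test:

```python
def build_system_prompt() -> str:
    # Trimmed copy of the full prompt above; only the rules under test are shown
    return """You are a support expert at Barca.
- You have no internal or pretrained knowledge.
- You must only use information provided by tools or the recent conversation.
### Recent conversation
"""

def check_prompt_contract(prompt: str) -> list:
    """Return the grounding rules missing from the prompt, if any."""
    required = [
        "no internal or pretrained knowledge",
        "only use information provided by tools",
        "### Recent conversation",
    ]
    return [rule for rule in required if rule not in prompt]

# An empty list means the contract is intact
missing = check_prompt_contract(build_system_prompt())
```

A check like this can run in CI alongside the agent’s unit tests, so prompt refactors get the same safety net as code refactors.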
Step 2: Memory via AgentCore Memory + Strands Hooks
Barca wants the agent to “remember” recent discussion turns for better context.
They use AgentCore Memory plus a custom Strands hook:
from strands.hooks import HookProvider, HookRegistry, MessageAddedEvent, AgentInitializedEvent, AfterInvocationEvent
from bedrock_agentcore.memory.session import MemorySession
from bedrock_agentcore.memory.constants import ConversationalMessage, MessageRole

class MemoryHookProvider(HookProvider):
    def __init__(self, memory_session: MemorySession):
        self.memory_session = memory_session

    def retrieve_context(self, event: MessageAddedEvent):
        # Load last few turns and inject as context
        recent_turns = self.memory_session.get_last_k_turns(k=3)
        if not recent_turns:
            return
        context_messages = []
        for turn in recent_turns:
            for message in turn:
                role = message.get("role", "unknown")
                content = message.get("content", {}).get("text", "")
                context_messages.append(f"{role}: {content}")
        context_block = "\n".join(context_messages)
        event.add_system_message(f"### Recent conversation\n{context_block}")

    def save_interaction(self, event: AfterInvocationEvent):
        messages = event.messages or []
        if not messages:
            return
        last = messages[-1]
        text = last["content"][0]["text"]
        role = MessageRole.USER if last["role"] == "user" else MessageRole.ASSISTANT
        self.memory_session.add_turns(
            messages=[ConversationalMessage(text, role)]
        )

    def register_hooks(self, registry: HookRegistry):
        registry.add_callback(MessageAddedEvent, self.retrieve_context)
        registry.add_callback(AfterInvocationEvent, self.save_interaction)
        registry.add_callback(AgentInitializedEvent, self.retrieve_context)
This hook:
- Reads from AgentCore Memory on initialization / message added.
- Writes user and assistant messages after invocation.
In the entrypoint, Barca wires this into a MemorySessionManager:
from bedrock_agentcore.memory.session import MemorySessionManager

@app.entrypoint
async def invoke(payload) -> dict:
    user_message = (payload or {}).get("prompt", "")
    user_id = (payload or {}).get("user_id", "")
    session_id = (payload or {}).get("session_id", "")

    # Create or load a conversational memory session for this user/session
    session_manager = MemorySessionManager(memory_id=MEMORY_ID, region_name="us-east-1")
    user_session = session_manager.create_memory_session(
        actor_id=user_id,
        session_id=session_id,
    )

Step 3: Identity & OAuth2 with AgentCore Identity
Next, Barca needs the agent to act on behalf of a real user, so the Coveo MCP server can apply the right permissions and personalization.
AgentCore Identity supports this pattern with decorators like @requires_access_token, which retrieves an OAuth2 token from the Coveo identification service and injects it into your function at runtime.
Here’s how Barca uses it:
from bedrock_agentcore.identity.auth import requires_access_token

async def on_auth_url(url: str) -> None:
    # Sent back to the client so the user can complete OAuth
    await queue.put_event({"auth_url": url})

@requires_access_token(
    provider_name="oauth-provider-name",
    scopes=["full"],
    auth_flow="USER_FEDERATION",
    on_auth_url=on_auth_url,
    force_authentication=False,
)
async def need_token_3LO_async(*, access_token: str) -> str:
    # AgentCore Identity injects `access_token` securely
    return access_token
Key points:
- oauth-provider-name refers to an AgentCore Identity provider configured to talk to your IdP (e.g., Cognito, Okta, Entra).
- The user completes the OAuth flow via the auth_url returned to the client.
- The function simply returns the access_token, so you don’t handle the secrets or flows manually.
This token is then used to authenticate against the Coveo MCP server.
Step 4: Connecting Strands Agent to Coveo MCP Server
Strands Agents Tools provides an mcp_client tool that can connect to an MCP server over HTTP. Here is how Barca wraps it:
import asyncio

from strands import Agent
from strands_tools import mcp_client

async def load_mcp_client(queue: StreamingQueue, agent: Agent, url: str) -> None:
    await queue.put_event({"status": "Loading MCP server..."})

    # Get user-scoped token from AgentCore Identity
    token = await need_token_3LO_async()
    await queue.put_event({"status": "Connecting to MCP and loading tools..."})

    # Connect to Coveo MCP
    await asyncio.to_thread(
        agent.tool.mcp_client,
        action="connect",
        connection_id="coveo_mcp",
        transport="streamable_http",
        server_url=url,
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,
    )

    # Load tools exposed by the Coveo MCP server
    await asyncio.to_thread(
        agent.tool.mcp_client,
        action="load_tools",
        connection_id="coveo_mcp",
    )
    await queue.put_event({"status": "MCP tools loaded."})

Then they define an agent_task that:
- Connects to MCP and loads the tools.
- Asks the Strands agent to answer the user’s question.
- Streams back the answer.
async def agent_task(prompt: str, queue: StreamingQueue, agent: Agent, url: str) -> None:
    try:
        # Connect and load MCP tools
        await load_mcp_client(queue=queue, agent=agent, url=url)
        await queue.put_event({"status": "Generating answer..."})

        # Ask the agent to answer
        async for chunk in agent.stream_async(
            f"Answer this question: <query>{prompt}</query>\n"
        ):
            if "data" in chunk:
                logger.debug(f"Chunk data: {chunk}")
                await queue.put_event({"answer": chunk["data"]})
    except Exception as e:
        await queue.put_event({"error": str(e)})
    finally:
        await queue.finish()
        await agent.tool.mcp_client(action="disconnect", connection_id="coveo_mcp")
Finally, the AgentCore entrypoint wires everything together:
@app.entrypoint
async def invoke(payload) -> dict:
    # Load payload
    user_message = (payload or {}).get("prompt", "")
    user_id = (payload or {}).get("user_id", "")
    session_id = (payload or {}).get("session_id", "")

    # Set Memory session for this user/session
    session_manager = MemorySessionManager(memory_id=MEMORY_ID, region_name="us-east-1")
    user_session = session_manager.create_memory_session(
        actor_id=user_id,
        session_id=session_id,
    )

    # Create the Strands Agent
    agent = Agent(
        tools=[mcp_client],
        hooks=[MemoryHookProvider(user_session)],
        state={"actor_id": user_id, "session_id": session_id},
        system_prompt=build_system_prompt(),
    )

    # Fire-and-forget: run in background, stream via queue
    asyncio.create_task(
        agent_task(prompt=user_message, agent=agent, queue=queue, url=mcp_url)
    )
    return queue.stream()
From here, AgentCore Runtime takes care of:
- Spinning up the environment.
- Injecting identity and memory.
- Handling streaming back to the client.
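One piece these snippets use but never define is StreamingQueue: it is Barca’s own helper, not an AgentCore class. A minimal sketch of what it could look like, assuming an asyncio.Queue with a sentinel to close the stream:

```python
import asyncio

class StreamingQueue:
    """Minimal async queue bridging a background agent task to a streamed response."""
    _DONE = object()  # sentinel marking the end of the stream

    def __init__(self):
        self._queue = asyncio.Queue()

    async def put_event(self, event: dict) -> None:
        await self._queue.put(event)

    async def finish(self) -> None:
        # Signal consumers that no more events will arrive
        await self._queue.put(self._DONE)

    async def stream(self):
        # Async generator the runtime iterates to emit events to the client
        while True:
            item = await self._queue.get()
            if item is self._DONE:
                break
            yield item

async def demo():
    queue = StreamingQueue()
    await queue.put_event({"status": "Generating answer..."})
    await queue.put_event({"answer": "Hello"})
    await queue.finish()
    return [event async for event in queue.stream()]

events = asyncio.run(demo())
```

This mirrors the usage in the snippets: agent_task produces events with put_event and closes with finish, while the entrypoint returns queue.stream() for the runtime to consume.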
Step 5: The Client – Using Runtime Agent with OAuth2 and MCP
On the client side, Barca uses a small client app to demonstrate the pattern. The important parts are:
- Authenticating the user with Cognito and getting an access token.
- Calling AgentCore Runtime with that token and the MCP URL.
Here’s a simplified version of the client:
1. Authenticating the user with Cognito:
def generate_auth_header() -> str:
    client = boto3.client("cognito-idp", region_name=default_region)
    resp = client.initiate_auth(
        ClientId=default_client_id,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": default_user_id, "PASSWORD": default_user_pwd},
    )
    access_token = resp["AuthenticationResult"]["AccessToken"]
    return f"Bearer {access_token}"
2. The Bedrock AgentCore Runtime invocation:
def send_query(prompt: str):
    bearer_token = generate_auth_header()
    escaped_arn = urllib.parse.quote(
        f"arn:aws:bedrock-agentcore:{default_region}:{aws_org_id}:runtime/{default_agent_runtime}",
        safe="",
    )
    url = f"https://bedrock-agentcore.{default_region}.amazonaws.com/runtimes/{escaped_arn}/invocations"
    headers = {
        "Authorization": bearer_token,
        "Accept": "text/event-stream",
        "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id": st.session_state.runtime_session_id,
        "X-Amzn-Bedrock-AgentCore-Runtime-User-Id": default_runtime_user_id,
    }
    body = {
        "prompt": prompt,
        "user_id": default_user_id,
        "session_id": st.session_state.runtime_session_id,
    }
    # Streaming response from AgentCore Runtime
    with requests.post(
        url,
        params={"qualifier": default_qualifier},
        headers=headers,
        json=body,
        timeout=None,
        stream=True,
    ) as response:
        response.raise_for_status()
        ctype = (response.headers.get("Content-Type") or "").lower()
        if "text/event-stream" in ctype:
            for raw in response.iter_lines(decode_unicode=True):
                if not raw or raw.startswith(":"):
                    continue
                if raw.startswith("data:"):
                    yield json.loads(raw[len("data:"):])
        else:
            yield {"error": f"Unexpected content type: {ctype}"}
The client UI then just iterates over send_query(prompt) and renders:
- {"auth_url": …} → render a login link.
- {"status": …} → show status updates, like tools used.
- {"answer": …} → stream back the agent’s answer.
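That dispatch boils down to a simple loop over the events send_query yields; in this sketch the handlers just record what would be rendered (the real UI calls are framework-specific):

```python
def render_events(events):
    """Dispatch streamed agent events to the appropriate UI handler."""
    rendered = []
    for event in events:
        if "auth_url" in event:
            rendered.append(("login_link", event["auth_url"]))
        elif "status" in event:
            rendered.append(("status", event["status"]))
        elif "answer" in event:
            rendered.append(("answer_chunk", event["answer"]))
        elif "error" in event:
            rendered.append(("error", event["error"]))
    return rendered

# Sample events in the shape the agent streams back
sample = [{"status": "Loading MCP server..."}, {"answer": "You can..."}]
```

Because every event is a small dict with one discriminating key, the client stays decoupled from the agent’s internals: new event types degrade gracefully (they are simply ignored).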
Crucially:
- User authenticates via Coveo Authentication Service.
- Agent runtime receives that token via AgentCore Identity and passes it into @requires_access_token.
- Strands agent uses that token to call the Coveo MCP server, which applies the correct document-level security and personalization.
So if Leslie and Kyle both ask, “Show me internal roadmap details?” the answer is grounded in the same index and relevance layer, but each sees only what they are allowed to see.

Step 6: AgentCore Observability: Making Barca’s Agent Measurable
Once the agent is live, Barca doesn’t just need answers; it needs visibility. AgentCore Observability turns each interaction into a traceable execution: every model call, tool call (like the Coveo MCP tools), and workflow step is captured as metrics and traces.
For Barca, this brings three big benefits:
- Understand behavior – See how the agent actually answered a question: which tools it called, how long they took, and errors (if any).
- Improve continuously – Use traces to spot content gaps, refine prompts, or adjust how and when the agent calls Search, Passage Retrieval, or Answer API.
- Control performance & cost – Monitor latency, error rates, and token usage across sessions to keep the experience fast, reliable, and within budget.
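AgentCore Observability collects these traces automatically, but the underlying idea is worth seeing concretely. This tiny latency-tracking wrapper is a local sketch, not the AgentCore API; it records the same raw material (per-tool call counts and durations) that the managed traces expose:

```python
import time
from collections import defaultdict

class ToolMetrics:
    """Record per-tool call counts and latency, the raw material of observability."""
    def __init__(self):
        self.calls = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def timed(self, name, fn, *args, **kwargs):
        # Wrap any tool call; the finally block records even failed calls
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            self.calls[name] += 1
            self.total_seconds[name] += time.perf_counter() - start

metrics = ToolMetrics()
# Stand-in for a real tool call such as get_answer
result = metrics.timed("get_answer", lambda q: f"answer for {q}", "reset SSO")
```

In production you would read the equivalent numbers from AgentCore Observability dashboards rather than roll your own, but the sketch shows what "token usage and latency across sessions" reduces to.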
Combined with Coveo’s secure, relevance-aware retrieval, AgentCore Observability means Barca’s agent isn’t a black box. It’s an observable system the team can monitor, debug, and evolve with confidence.
Why Coveo + Bedrock AgentCore Is a Strong Enterprise Pattern

From Barca’s perspective, this architecture checks all the boxes:
- Security & Permissions
- AgentCore Identity + OAuth2 ensures user-specific context, permissions, and responses for every interaction.
- Coveo enforces security trimming on every query, so agents can’t leak content users shouldn’t see.
- Unified Retrieval Layer
- The agent doesn’t talk to random databases; it talks to the Coveo MCP server.
- Search API, Passage Retrieval API, and Answer API are all fronted by a single, enterprise-hardened index.
- Relevance & Personalization
- Coveo provides hybrid lexical + semantic retrieval with a highly configurable relevance tuning system combining ML models and manual business rules, whether the “consumer” is a human in a UI or an AI agent.
- Memory
- AgentCore Memory gives structured conversational history.
- Hooks make it easy to build custom evaluation and logging around these turns.
- Observability
- AgentCore Observability helps you trace, debug, and monitor agent performance in production environments.
- Platform, Not Prototype
- AgentCore Runtime offers session isolation, scaling, and streaming out of the box.
- Strands Agents + Strands Tools make it easy to compose tools and extend with new capabilities over time.
Conclusion
The path from demo to production doesn’t have to be a complete rewrite. By combining Amazon Bedrock AgentCore’s production-grade infrastructure with Strands Agents’ developer-friendly framework and Coveo’s enterprise-ready knowledge layer, Barca built an agent that’s not just demo-ready but production-ready.
The key insight: production agents need production infrastructure. Security, memory, identity, and observability aren’t nice-to-haves; they’re table stakes. With the right architecture, you can build agents that are secure, scalable, and actually useful, without reinventing the wheel.
Recommended Resources:
- Amazon Bedrock AgentCore Documentation
- Strands Agents SDK
- Strands Agents Tools
- Empowering Enterprise AI with Coveo: Guide to Agentic AI Systems and Key APIs
- How to Unlock Agentic AI With the Coveo MCP Server
- How to Build an AI Agent: From Basic Components to Enterprise-Grade Systems
- GitHub Repository of an Amazon Bedrock AgentCore Agent

