Automating Visual Search Indexing with Coveo Index Pipeline Extensions

This article is part two of a four-part series exploring how to extend Coveo’s powerful search platform with vector embeddings from Amazon Bedrock to enable advanced visual search experiences for ecommerce.

This blog series describes customer-managed implementation patterns for teams that need to experiment now. These are not the default or recommended Coveo architectures. Where Coveo offers or is introducing native capabilities, those should be evaluated first. The examples here are intended to explain design tradeoffs, orchestration patterns, and governance considerations, not to imply product commitments or turnkey support.

The Automation Challenge

In our previous article, we built the infrastructure for visual search. However, manually running scripts for each new product is not scalable. Ecommerce catalogs change constantly, with products being added, updated, and removed daily.

We need automatic embedding generation as part of the indexing process.

Coveo Index Pipeline Extensions

Coveo’s Index Pipeline Extensions (IPE) provide exactly this capability. An IPE is a Python script that customizes how sources index content.

IPEs execute in a sandboxed Python environment during document processing, at one of two stages:

Pre-conversion: runs before content extraction, so the script can modify raw content
Post-conversion: runs after content extraction, so the script can enrich the document with metadata

For embedding generation, we use post-conversion: by then the document has been processed, and we have access to all extracted metadata.

Indexing Pipeline Flow

Figure: Coveo Indexing Pipeline

Architecture: IPE + Lambda

To avoid long-running processes within the Index Pipeline Extension (IPE), a more robust pattern is to treat the IPE as a trigger for an external asynchronous pipeline. The IPE’s role is simply to pass minimal data to an external function and ensure Coveo’s indexing continues immediately.

The external AWS Lambda function handles the heavy lifting in an asynchronous flow:

Is triggered asynchronously by the IPE or another external service
Downloads the image from S3
Generates the embedding via Amazon Bedrock
Indexes the embedding to OpenSearch
Pushes status and metadata back to Coveo (e.g., via the Push API or an additional IPE)
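The flow above can be sketched as a few small functions. This is a minimal illustration, not the repository's code: it assumes a boto3 S3 client, a boto3 `bedrock-runtime` client, and an opensearch-py client are passed in, and it assumes the Titan Multimodal Embeddings model ID and payload shape. The index name, the document fields, and the `report_status` hook are hypothetical.

```python
import base64
import json


def download_image(s3_client, bucket, key):
    """Fetch the raw image bytes from S3."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def generate_embedding(bedrock_client, image_bytes,
                       model_id="amazon.titan-embed-image-v1"):
    """Ask Bedrock for an image embedding.

    The model ID and request/response shape assume Titan Multimodal
    Embeddings; other models expect different payloads.
    """
    body = json.dumps(
        {"inputImage": base64.b64encode(image_bytes).decode("ascii")}
    )
    response = bedrock_client.invoke_model(
        modelId=model_id,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]


def index_embedding(opensearch_client, index_name, product_id, embedding):
    """Store the vector in OpenSearch, keyed by product ID."""
    opensearch_client.index(
        index=index_name,
        id=product_id,
        body={"product_id": product_id, "embedding": embedding},
    )


def process_product(product_id, bucket, key, s3_client, bedrock_client,
                    opensearch_client, index_name="product-embeddings",
                    report_status=None):
    """Download, embed, index, then optionally report back to Coveo."""
    image_bytes = download_image(s3_client, bucket, key)
    embedding = generate_embedding(bedrock_client, image_bytes)
    index_embedding(opensearch_client, index_name, product_id, embedding)
    status = {
        "product_id": product_id,
        "embedding_indexed": True,
        "dimensions": len(embedding),
    }
    if report_status is not None:
        # Hypothetical hook, for example a Coveo Push API call.
        report_status(status)
    return status
```

Passing the clients in as arguments keeps each step easy to exercise with stand-ins before any AWS resources exist.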

Why Lambda Function URLs?

Lambda Function URLs provide HTTPS endpoints without API Gateway:

No additional infrastructure
Built-in IAM authentication option
Lower latency than API Gateway
Cost-effective for internal service calls

Implementation

Complete code is available in backend/ipe_extension/ in our GitHub repository.

Lambda Handler (Simplified)

Python
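A minimal sketch of what such a handler might look like, assuming the IPE posts a JSON body containing a `product_id` key (a hypothetical payload shape). `process_image` is a placeholder for the download, embed, and index steps, and error handling is left out on purpose.

```python
import base64
import json


def parse_function_url_event(event):
    """Extract the JSON payload from a Lambda Function URL event."""
    raw = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        # Function URLs may deliver the body base64-encoded.
        raw = base64.b64decode(raw).decode("utf-8")
    return json.loads(raw)


def process_image(payload):
    """Placeholder for the download, embed, and index steps."""
    return {"product_id": payload["product_id"], "embedding_indexed": True}


def lambda_handler(event, context):
    """Entry point: parse the request, do the work, return a JSON response."""
    payload = parse_function_url_event(event)
    result = process_image(payload)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```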

See backend/ipe_extension/handler.py for complete implementation with error handling.

IPE Script (Asynchronous Trigger)

A production-ready approach leverages the IPE only to extract necessary fields and asynchronously trigger the external enrichment pipeline, ensuring the IPE does not wait for the long-running embedding generation. The full implementation of the external pipeline, which uses the Coveo Push API to update the document after enrichment, is detailed in our Part 4 article.

We keep metadata enrichment (Key Design Decision #3) in the IPE script, where it flags documents for visual search. This is a fast operation, and the document stays in the pipeline for immediate text indexing.
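A minimal sketch of such a trigger, under several assumptions: the image location lives in a metadata field named `image_url` (hypothetical), the endpoint acknowledges the request quickly and does the heavy work afterwards, and a short timeout is an acceptable way to keep the IPE from waiting. The `post` argument is expected to behave like `requests.post`.

```python
import json


def build_payload(document):
    """Collect only the fields the external pipeline needs."""
    # get_meta_data_value returns a list of values for the field.
    image_urls = document.get_meta_data_value("image_url")  # assumed field name
    if not image_urls:
        return None
    return {"document_uri": document.uri, "image_url": image_urls[0]}


def trigger_enrichment(document, post, lambda_url, timeout_seconds=2):
    """Send the payload and return quickly, whatever happens."""
    payload = build_payload(document)
    if payload is None:
        return "skipped"
    try:
        post(
            lambda_url,
            data=json.dumps(payload),
            headers={"Content-Type": "application/json"},
            timeout=timeout_seconds,
        )
        return "triggered"
    except Exception:
        # A failed trigger must never block text indexing.
        return "failed"


# Inside a Coveo extension, `document` is supplied by the platform, so the
# call would look roughly like this (LAMBDA_URL is your Function URL):
#
#   import requests
#   trigger_enrichment(document, requests.post, LAMBDA_URL)
```

The return value is only a convenience for logging; the indexing pipeline continues regardless of which branch is taken.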

Key Design Decisions

1. Idempotency

Documents may be reprocessed during re-indexing. We skip embedding generation if the asset already exists:

Python
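A minimal sketch of this check, assuming embeddings are stored in OpenSearch under the asset ID and that the client exposes opensearch-py's `exists(index=..., id=...)` call. The function names and the `generate_and_index` callback are hypothetical.

```python
def embedding_exists(opensearch_client, index_name, asset_id):
    """Return True if a document with this ID is already indexed."""
    return bool(opensearch_client.exists(index=index_name, id=asset_id))


def ensure_embedding(opensearch_client, index_name, asset_id, generate_and_index):
    """Generate the embedding only when the asset has not been seen before."""
    if embedding_exists(opensearch_client, index_name, asset_id):
        return {"asset_id": asset_id, "status": "skipped"}
    generate_and_index(asset_id)
    return {"asset_id": asset_id, "status": "indexed"}
```

If a product image can change while keeping the same ID, an ID-only check would keep the stale vector; including a content hash or version in the ID is one option, which this sketch does not cover.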

2. Graceful Failure

If embedding generation fails, we return HTTP 200 so Coveo continues indexing. The product remains searchable via text, just not via image:

Python
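A minimal sketch of this pattern, assuming the work is wrapped in a `process` callable that returns a dict (a hypothetical shape). Any exception is logged and reported in the response body, while the status code stays 200.

```python
import json
import logging

logger = logging.getLogger(__name__)


def respond(payload):
    """Build an HTTP 200 response with a JSON body."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def safe_handler(event, context, process):
    """Run `process` and report failures in the body, never in the status."""
    try:
        result = process(event)
        return respond(dict(result, embedding_indexed=True))
    except Exception as exc:
        # Log the full traceback for operators, but keep indexing unblocked.
        logger.exception("Embedding generation failed")
        return respond({"embedding_indexed": False, "error": str(exc)})
```

Because failures no longer surface as HTTP errors, the logs and the `embedding_indexed` flag become the only places to notice them, so they are worth monitoring.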

3. Metadata Enrichment

We add metadata back to the Coveo document, enabling queries like “show products with embeddings”:

Python
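A minimal sketch of the enrichment step, assuming the extension's `document` object and its `add_meta_data` method. `embedding_indexed` is the field referenced in this article; `embedding_model` is a hypothetical extra field, and storing the flag as a string is an assumption.

```python
def enrich_document(document, indexed, model_id=None):
    """Attach visual-search metadata to the document being indexed."""
    metadata = {"embedding_indexed": "true" if indexed else "false"}
    if model_id:
        metadata["embedding_model"] = model_id  # hypothetical field
    document.add_meta_data(metadata)
    return metadata


# Inside a Coveo extension, `document` is supplied by the platform:
#
#   enrich_document(document, indexed=True)
```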

Deployment

SAM Template Configuration

YAML

Full template: infrastructure/template.yaml

Setting Up the Coveo Extension

Navigate to Coveo Admin Console > Organization > Extensions
Click Add Extension
Paste the IPE script content
Update LAMBDA_URL with your Function URL. Note: For production environments, consider using Coveo’s Vault parameters to securely manage and store sensitive values like the LAMBDA_URL.
Apply to your source as Post-Conversion
Rebuild the source

Integration with Coveo ML

This automation complements Coveo’s machine learning capabilities:

ART (Automatic Relevance Tuning) continues to optimize text search results based on user behavior
Query Suggestions still provides intelligent autocomplete
Visual search adds a new discovery path without disrupting existing ML models

The embedding metadata (embedding_indexed: true) can be used in query pipeline rules to customize behavior for products with visual search enabled.

Performance Considerations

Operation	Typical Duration
S3 Download	100-500ms
Bedrock Embedding	500-1500ms
OpenSearch Index	50-200ms

Set the Lambda timeout to 60 seconds to accommodate these operations.
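The headroom can be checked with quick arithmetic on the ranges in the table. The figures are the typical durations listed above, not measurements, and the variable names are ours.

```python
# Typical (low, high) durations in milliseconds, taken from the table above.
STEP_RANGES_MS = {
    "s3_download": (100, 500),
    "bedrock_embedding": (500, 1500),
    "opensearch_index": (50, 200),
}
TIMEOUT_MS = 60_000

best_case_ms = sum(low for low, _ in STEP_RANGES_MS.values())
worst_case_ms = sum(high for _, high in STEP_RANGES_MS.values())
headroom = TIMEOUT_MS / worst_case_ms

print(f"typical total: {best_case_ms}-{worst_case_ms} ms")  # typical total: 650-2200 ms
print(f"timeout is {headroom:.1f}x the slow end")            # timeout is 27.3x the slow end
```

Even at the slow end of every range, a single image takes about 2.2 seconds, so the 60-second timeout leaves room for retries or unusually slow calls.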

Next Steps

With automatic embedding generation in place, every new product becomes searchable via text as soon as it is indexed and via image once its embedding has been generated. In the next article, we’ll build a React front end using Coveo Headless.

Even as native product capabilities evolve, the design principles in this article remain relevant: decide where enrichment is orchestrated, validate generated values before they affect ranking or filtering, keep a human override path for high-impact fields, and separate exploratory prototypes from production-grade search architecture.
