AI-Powered Product Metadata Extraction with Amazon Bedrock Nova

This article is the final part of a four-part series exploring how to extend Coveo’s powerful search platform with vector embeddings from Amazon Bedrock to enable advanced visual search experiences for ecommerce.

This blog series describes customer-managed implementation patterns for teams that need to experiment now. These are not the default or recommended Coveo architectures. Where Coveo offers or is introducing native capabilities, those should be evaluated first. The examples here are intended to explain design tradeoffs, orchestration patterns, and governance considerations, not to imply product commitments or turnkey support.

The Metadata Challenge

Product metadata is the foundation of effective search. Rich, accurate metadata enables:

Better relevance: Users find what they’re looking for
Faceted navigation: Filter by color, material, style
Personalization: Recommend based on attributes
SEO optimization: Structured data for search engines

But manual tagging is expensive, slow, and inconsistent. With global ecommerce projected to reach $6.8 trillion by 2025, the scale of product catalogs makes manual enrichment impractical.

Why Amazon Bedrock Nova?

Amazon Bedrock Nova Lite is a multimodal model that understands both images and text. For metadata extraction:

Capability	Benefit
Multimodal understanding	Analyzes image content directly
Structured output	Returns clean JSON
Fast inference	~1-2 seconds per image
Cost-effective	~$0.0001 per image

Architecture

Our AI metadata pipeline is designed for batch processing. It downloads images, extracts metadata using Amazon Bedrock Nova, and then indexes the enriched products into Coveo.

Pipeline Architecture

The pipeline processes images in batch, extracting metadata and preparing data for Coveo indexing.

Implementation

Complete code is in scraper/ in our GitHub repository.

The Extraction Prompt

Prompt engineering is critical for consistent output:

Python

Key principles:

Constrained values: “MUST be one of” ensures consistent facets
Examples: Guide the model toward expected format
Clear instructions: “Return ONLY valid JSON” prevents extra text

Calling Bedrock Nova Lite

Python

Low temperature (0.1) ensures consistent, deterministic output.

Processing Pipeline

Python

Indexing to Coveo

Field Configuration

For the purpose of this example, we use simple custom field names (category, color, material). In a production e-commerce implementation, it is best practice to map these extracted values to the standardized Coveo Commerce Fields (e.g., ec_category, ec_color) either at the indexing stage or using field mapping rules in the Coveo Administration Console. This ensures full compatibility with features like Coveo ML’s Automatic Relevance Tuning (ART).

Before indexing, configure Coveo fields for faceting:

Python

Push API Integration

Python

Integration with Coveo ML

AI-extracted metadata is foundational for Coveo’s machine learning features to perform optimally:

ART (Automatic Relevance Tuning):ART learns from user behavior (clicks, purchases) linked to specific queries and facets. Consistent, high quality metadata ensures that the underlying taxonomy and filtering options are reliable. This consistency is what allows ART to accurately identify user intent and optimize relevance across the catalog.
Query Suggestions: Suggestions are context aware and rely on the availability of robust, standardized facets. The enriched metadata ensures a clean, unified set of values, which directly leads to more accurate and valuable query suggestions.
Faceted Search: Standardized values enable reliable filtering and navigation, which is the primary driver of high-quality search experiences.

The combination of AI-extracted metadata and Coveo ML creates a virtuous cycle—better metadata leads to better user interactions, which improves ML model accuracy.

Running the Pipeline

Bash

Cost Analysis

For 150 images:

Service	Cost
Bedrock Nova Lite	~$0.02
S3 Storage	~$0.001
Total	~$0.02

At scale (10,000 images): ~$1.50

Production Considerations for AI Enrichment

The simplified extraction prompt in this article is for illustrative purposes only. For a production-grade catalog enrichment pipeline, additional steps and governance are essential to ensure data quality and user trust:

* Validation and Confidence Thresholds: Implement a process to check the generated JSON structure and validate that values adhere to your product schema. You may discard or flag metadata where the model’s confidence score is too low.

* Human Override Workflow: For high-impact fields (e.g., product category), an administrative override layer is critical. This ensures that manually curated data takes precedence over any AI-generated value when necessary.

* Schema Constraints: Utilize more advanced prompt engineering techniques to enforce a robust schema. This includes not just constrained values (like ‘MUST be one of…’) but also regular expressions or a more complex Pydantic-style output format.

* Durability and Governance: The design principles in this article—deciding where enrichment is orchestrated, validating generated values before they affect ranking, and keeping a human override path—remain relevant even as native product capabilities evolve.

Series Recap

Over these four articles, we built a complete visual search solution:

Vector Embeddings: OpenSearch k-NN with Bedrock Titan
IPE + Lambda: Automatic embedding generation
Headless UI: React interface with Coveo Headless
AI Metadata: Automated product enrichment

The result: users can search by text, image, or both—with rich faceted navigation powered by AI-extracted metadata and relevance optimized by Coveo ML.

Resources:

Coveo Commerce Fields: Create Additional Fields
Amazon Bedrock Documentation
Coveo Push API
GitHub Repository
Map Commerce Fields to Custom Fields (Catalog Source)
Coveo Commerce Fields Payload Example