Conversational Search & Discovery for Commerce: A New Chapter Still Being Written

Product discovery is on the verge of a new era — driven by conversational commerce experiences that blend search, chat, and recommendations.

Consider Amazon: when it launched in the mid-1990s, shoppers explored products primarily through category taxonomies and a simple search box (by title, author, or subject).

Later, in 1998, Amazon pioneered item-based collaborative filtering, laying the foundation for modern product recommendations.

Remember when the Amazon search bar looked like this? Amazon: The little box that built a retail empire

Since then, innovations have been largely incremental – search has become way smarter with features like query suggestions, faceted filtering, and autocomplete, while recommendations have grown more precise, context-aware, and personalized. Yet, for more than two decades, the core touchpoints and modalities of engagement – browse, search, and recommendations – remained fundamentally unchanged.

That is, until now. In 2024, Amazon introduced Rufus, an AI-powered shopping assistant integrated into the Amazon app and website. By allowing shoppers to ask questions and receive conversational answers directly in a chat window, Rufus signals the arrival of a new experience – a hallmark of conversational commerce that could reshape how people discover and interact with products.

Introducing Rufus, Amazon’s GenAI-powered shopping assistant

Co-evolution, Not Replacement

Engaging with a conversational assistant won’t necessarily appeal to everyone. Some, after all, will keep searching for “jackets” and use filters to narrow down result sets. But for many shoppers, being able to use a conversational assistant will feel natural and powerful, finally bringing the digital experience closer to that of a knowledgeable, in-store sales associate.

And just as some shoppers prefer to browse, others to search, and still others to rely on recommendations, a growing segment will choose conversation as their preferred path. Preferences will vary by context, too: a shopper might enjoy chatting on mobile, where typing is limited and quick answers matter, but revert to search on desktop for more control and filtering.

The takeaway is clear: a meaningful share of your customers will expect conversational engagement, and brands must be ready to deliver a seamless, valuable experience when they do.

Conversations Must Be Valuable

Conversational search is poised to be the next big leap in product discovery, and there’s growing excitement about weaving conversational elements into the shopping journey.

But novelty alone isn’t enough – it must deliver tangible value to shoppers. Simply sprinkling in forced or superficial dialog won’t work.

We’ve already seen companies experiment with conversational features, only to roll them back when they failed to resonate.

There are many ways to add conversation to a shopping experience. For example, Bed Bath & Beyond introduced a conversational discovery feature in 2024 in which prompts like “Which color are you looking for?” popped up, followed by filtering options. In practice, however, the key question is whether such experiences are meaningful conversations that actually help shoppers find and discover what they need more easily.

Conversational experiences should add value — Superficial dialog without substance fails to enhance discovery

Delivering Relevant Conversational Experiences Is Hard

Customers expect systems to handle genuine conversations – and that’s no small feat.

LLMs are powerful but prone to errors and hallucinations, and while generative AI is fascinating, it’s also unpredictable. Research from Salesforce and Microsoft shows that multi-turn conversations in particular can easily veer off track.

But it’s not only long exchanges that are challenging.

Vans.com illustrates this: even brief conversational interactions sometimes produce irrelevant or unhelpful results – showing just how tough it is to get right.

Not all conversational experiences are relevant

Reliability Through Sources

As we’ve seen, the reliability of answers and conversations is absolutely critical. One of the most effective ways to build trust is by providing clickable sources, giving users transparency into where information comes from. Amazon Rufus has recently adopted (or rather, begun to test) this approach, underscoring how important it has become.

Yet, most chatbots still haven’t implemented it. At Coveo, we take this very seriously. Ensuring that our AI-powered conversations are not only intelligent but also trustworthy and verifiable is a core part of our commitment to enterprise-grade relevance.

In one recent Coveo enterprise implementation, GenAI helps professional buyers navigate complex product questions with confidence. By embedding clickable sources directly into the experience, the system builds trust and ensures every interaction is both reliable and transparent.

GenAI guidance that’s relevant, helpful, and trustworthy

Where UI Meets Slow AI

Relevance, accuracy, and helpfulness are only part of the equation. Conversational search is powered by LLMs and generative AI – which are notoriously slow compared to traditional search.

We’ve long known that speed is critical in ecommerce: even a 100-millisecond (0.1 second) delay can cost online retailers 1% in sales, according to Amazon’s findings.

How do you balance conversation with speed? One reason chatbots remain popular is that users tend to tolerate slower responses from a chat-based conversational assistant than from a search box.

This patience helps explain why most implementations surface as chatbots rather than fully integrated search experiences.

Yet this trade-off often comes at a cost to the user experience. Chat-based shopping assistants can appear enticing, but they risk oversimplifying the robust, multifaceted search journeys of top-tier ecommerce sites.

Traditional search interfaces enable rich exploration of complex catalogs — leveraging facets like price, ratings, categories, and filters — to deliver both precision and serendipitous discovery. In contrast, many chatbots narrow the interaction to a single suggestion, reducing user control and active engagement.

All the choices, none of the control — Spin the wheel of handbags and hope for the best

For example, as shown here, Michael Kors’ style assistant provides a few carousel suggestions, which can be useful, but without robust sorting or filtering, it’s not easy to explore the catalog fully.

The challenge is to balance the human-like convenience of conversation with the depth and richness of traditional search.

When done right, conversational discovery doesn’t replace search or rely on chatbots. Rather, it enhances the shopping journey, making exploration more engaging and effective.

Memory vs Persistence

Memory can greatly enhance user experiences – a knowledgeable assistant that remembers who you are and your preferences feels more personal and helpful.

Amazon’s Rufus is an impressive example: as Kiri Masters notes in this LinkedIn post, Rufus can recall preferences and context across interactions.

This is critical, as a smart assistant should be able to remember things about you to recommend the most valuable items.

However, there’s an important difference between remembering useful preferences and simply storing every conversation indefinitely.

Conversational discovery: Memory vs persistence

For instance, on Vans.com, a conversation was still stored days later – long after the interaction ended. It’s not obvious what value this persistence offers. If the memory isn’t actively used to improve the experience, it risks feeling creepy, and it may raise legal and compliance concerns.

If conversation history is retained, users should have control: give them clear options to delete, reset, or manage stored chats. Without that transparency, even helpful features can seem intrusive.
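To make the distinction concrete, here is a minimal sketch of how an assistant might keep useful preferences separate from raw chat transcripts. The class name, method names, and the 24-hour retention window are all hypothetical, chosen only for illustration; they do not describe how Rufus, Vans.com, or any Coveo product works.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class ShopperMemory:
    """Hypothetical store that separates durable preferences from transcripts."""
    preferences: dict[str, str] = field(default_factory=dict)
    transcripts: list[tuple[datetime, str]] = field(default_factory=list)

    def remember_preference(self, key: str, value: str) -> None:
        # Memory: a small, useful fact that can improve future recommendations.
        self.preferences[key] = value

    def log_message(self, text: str, now: datetime) -> None:
        # Persistence: the raw message, kept only for a limited time.
        self.transcripts.append((now, text))

    def purge_expired(self, now: datetime,
                      retention: timedelta = timedelta(hours=24)) -> None:
        # Drop transcript entries older than the (assumed) retention window.
        cutoff = now - retention
        self.transcripts = [(ts, t) for ts, t in self.transcripts if ts >= cutoff]

    def reset(self) -> None:
        # User-facing control: forget everything on request.
        self.preferences.clear()
        self.transcripts.clear()


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
memory = ShopperMemory()
memory.remember_preference("shoe_size", "42")
memory.log_message("Do you have these in black?", now=start)
memory.purge_expired(now=start + timedelta(days=2))
```

After the purge in this example, the shoe size preference is still available while the two-day-old message is gone, and a single reset call gives the shopper a clean slate.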

Understanding Intent: A Prerequisite to Conversational Discovery

Conversational systems must be robust across a wide range of user intents, from generating complex product comparisons (as in the example below) to broad, exploratory questions like “What do I need for a home painting project?”

From complex comparisons to open-ended exploration, conversations must support every stage of discovery.

But robustness also means defending against malicious intent or “jailbreak” attempts – deliberate prompts designed to trick the system into inappropriate or unsafe responses. Detecting and gracefully handling these situations is essential to maintaining trustworthy experiences.

For example, Cybernews recently reported that Expedia’s chatbot was manipulated into providing instructions for making a Molotov cocktail – a stark reminder of the risks when intent detection and safeguards fail.

Evaluating Conversations

Granted, conversational search and discovery should be valuable – but how can you assess performance?

Evaluating conversational search is particularly tricky: what are the right metrics and attribution models? Of course, driving Revenue per Visitor (RPV) and conversion rates remains essential. But headline metrics alone can be misleading – or at least incomplete.

For example, if conversational users show higher conversion rates, that doesn’t necessarily mean the feature itself drives value – it may simply reflect selection bias. Early adopters or brand-loyal customers – who already buy more frequently – may also be more willing to try new interfaces.
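A small worked example shows how this selection bias can play out. All numbers below are made up for illustration, and the two segments ("loyal" and "new") are an assumed simplification of a real customer base.

```python
# (visitors, orders) per segment. Every figure here is hypothetical.
traffic = {
    "chat":    {"loyal": (800, 80),   "new": (200, 4)},
    "no_chat": {"loyal": (2000, 200), "new": (8000, 160)},
}


def overall_rate(group: str) -> float:
    """Naive conversion rate for a group, ignoring who is in it."""
    visitors = sum(v for v, _ in traffic[group].values())
    orders = sum(o for _, o in traffic[group].values())
    return orders / visitors


def segment_rate(group: str, segment: str) -> float:
    """Conversion rate for one customer segment within a group."""
    visitors, orders = traffic[group][segment]
    return orders / visitors


for group in traffic:
    print(group, f"overall: {overall_rate(group):.1%}")
    for segment in traffic[group]:
        print("  ", segment, f"{segment_rate(group, segment):.1%}")
```

In this invented data set, conversational users convert at 8.4% overall versus 3.6% for everyone else, yet loyal customers convert at 10% and new customers at 2% whether or not they chat. The entire headline gap comes from who chooses to use the feature, which is why a controlled comparison such as an A/B test is usually a safer basis for attribution.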

To assess real impact, complementary metrics may turn out to be critical. These might include:

Switch-to-search ratio: the percentage of users abandoning conversation for traditional search.
Conversation abandonment rate: how often users drop out mid-conversation.
Thumbs-up/down feedback: a simple but effective proxy for customer satisfaction.
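
As a rough sketch, the three metrics above could be computed from session logs along these lines. The session schema and field names are assumptions made for this example, not an actual analytics API, and real definitions (for instance, what counts as "abandoned") would need to be agreed on first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationSession:
    """One session in which a shopper started a conversation (hypothetical schema)."""
    switched_to_search: bool        # shopper left the chat for the search box
    abandoned: bool                 # shopper dropped out mid-conversation
    feedback: Optional[int] = None  # +1 thumbs-up, -1 thumbs-down, None if unrated


def conversation_metrics(sessions: list[ConversationSession]) -> dict[str, Optional[float]]:
    """Compute the three complementary metrics over a batch of sessions."""
    if not sessions:
        raise ValueError("need at least one session")
    total = len(sessions)
    ratings = [s.feedback for s in sessions if s.feedback is not None]
    return {
        "switch_to_search_ratio": sum(s.switched_to_search for s in sessions) / total,
        "abandonment_rate": sum(s.abandoned for s in sessions) / total,
        # Share of rated sessions with a thumbs-up; None when nobody rated.
        "thumbs_up_share": (sum(1 for r in ratings if r > 0) / len(ratings))
        if ratings else None,
    }


sessions = [
    ConversationSession(switched_to_search=True, abandoned=False, feedback=1),
    ConversationSession(switched_to_search=False, abandoned=True, feedback=-1),
    ConversationSession(switched_to_search=False, abandoned=True),
    ConversationSession(switched_to_search=False, abandoned=False, feedback=1),
]
metrics = conversation_metrics(sessions)
```

Note that the thumbs-up share is computed only over sessions that were rated, since most shoppers never leave feedback and treating silence as a negative signal would distort the picture.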

Similarly, if users spend a long time chatting, does that signal engagement – or frustration? As we see, a new touchpoint demands new analytics, reporting, and sophisticated attribution models to truly understand its impact.

Expert Best Practices for Conversational Search and Discovery

Forrester recently highlighted several best practices that should shape the next generation of conversational search and discovery experiences:

Enable natural conversation while retaining a satisfying browsing experience.
Always provide an “out” so shoppers can start a new search or manually refine/browse.
Prompt the shopper for refinement selectively while ensuring shoppers can enter their own text at any time.
Respect product data by grounding all interactions in accurate details and filters based only on existing data.
Test continually to ensure experiences are intuitive, avoid repetition, and reduce friction.
Capture customer preferences throughout the journey to apply them in future interactions.
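
The "respect product data" practice can be illustrated with a short sketch: before applying filters that a conversational layer proposes, check each one against the facet values that actually exist in the catalog. The function name and the dictionary shapes are hypothetical, and exact string matching is a deliberate simplification.

```python
def ground_filters(
    proposed: dict[str, str],
    catalog_facets: dict[str, set[str]],
) -> tuple[dict[str, str], dict[str, str]]:
    """Split proposed filters into those backed by catalog data and those that are not."""
    accepted: dict[str, str] = {}
    rejected: dict[str, str] = {}
    for facet, value in proposed.items():
        # Accept only facet/value pairs that exist in the catalog.
        if value in catalog_facets.get(facet, set()):
            accepted[facet] = value
        else:
            rejected[facet] = value
    return accepted, rejected


# Hypothetical catalog facets and a filter set proposed by the assistant.
catalog_facets = {"color": {"red", "blue"}, "size": {"S", "M"}}
proposed = {"color": "red", "material": "leather", "size": "XL"}
accepted, rejected = ground_filters(proposed, catalog_facets)
```

Here only the color filter would be applied; the unknown "material" facet and the nonexistent "XL" size are set aside, so the assistant can ask a clarifying question instead of silently returning an empty or misleading result set.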

At Coveo, we share this perspective: the future isn’t just chat for chat’s sake. It’s GenAI-augmented guided selling that combines traditional search and browse with intelligent, conversational refinement.

The Road Ahead

Conversational search and discovery remain in their infancy – full of promise but still defined by experimentation and unanswered questions. The potential to reshape product discovery is enormous, yet realizing that potential will require careful design, rigorous measurement, and a deep respect for user expectations.

At Coveo, we’re helping organizations explore these possibilities – not with gimmicks, but with reliable approaches to conversational discovery and production-ready capabilities.

If you’re ready to examine how conversational search can enhance (rather than disrupt) your customer journey, we’d love to collaborate with you on shaping what comes next.

Dig deeper

Grab our ebook to learn more about what’s next for product discovery.
