As discussed in our last article, product embeddings are an innovative approach to information retrieval and recommender systems that is uniquely tailored to ecommerce. But why should you care?

Product embeddings enable customers to find and discover highly relevant products based on intent and context, because they naturally capture many of the latent properties of products.

Product embeddings can be used to deliver hyper-relevant experiences and support a number of use cases (e.g., query suggestions, related products recommendation, facet selections, and rankings). 

However, a great deal of interest has actually been focused on product recommendations and in-session personalization. 

Why Personalized Recommendations?

The surge of interest in using a recommender system should come as no surprise. Customers love a personalized recommendation. McKinsey’s conversion research found that recommendations drive 35% of what consumers purchase on Amazon. (And 75% of what they choose on Netflix!) 

Further, it was precisely due to the importance and effectiveness of recommendations that Netflix, back in 2006, launched the Netflix Prize to search for machine learning experts who could improve its previous algorithm. 

Product embeddings and personalized recommendations are a total power couple. By leveraging product embeddings, ecommerce players can: 

  • Gain a better understanding of their catalogs
  • Overcome cold start problems
  • Provide more relevant results and predictions

In a nutshell: By exploiting these benefits, ecommerce players using embeddings provide better customer experiences and drive profitable growth. 

Who Uses Product Embeddings?

First things first. While this may sound like a new concept, product embeddings have actually become a cornerstone of a considerable number of machine learning models built by ecommerce innovators.

They have been used by the companies that are investing the most in AI innovation, such as Amazon, Walmart, Pinterest, Yahoo, Alibaba, Microsoft, Criteo, and Coveo.

For those keeping track, you might think: wasn’t Amazon famous for basing its product recommendations on collaborative filtering?

Absolutely — in 1998. Much has happened over the past 25 years. 

New techniques have been developed to increase the relevance of experiences and to drive uplifts in conversion rates. 

What Is Collaborative Filtering? 

Broadly speaking, collaborative filtering is a type of recommendation system that predicts what might interest a person based on the taste of many other users. It assumes that if person X likes Burberry sweaters, and person Y likes Burberry sweaters and Ralph Lauren trousers, then person X might like Ralph Lauren trousers as well.  

A graphic illustrates how collaborative filtering works using a cyclist, pizza, salad, and soft drink metaphor.
Or, if cyclist one likes pizza and salad and also gets a soft drink, then cyclist two, who likes pizza and salad too, might also like a soft drink. Source.
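To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. The shoppers, the products, and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration; production systems typically use far more data and more refined scoring.

```python
# Toy user-based collaborative filtering.
# All names and "likes" below are made up for illustration.
likes = {
    "person_x": {"burberry_sweater"},
    "person_y": {"burberry_sweater", "ralph_lauren_trousers"},
    "person_z": {"running_shoes"},
}


def jaccard(a, b):
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, likes):
    """Rank items the user has not liked by the similarity of users who did."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        similarity = jaccard(likes[user], items)
        if similarity == 0.0:
            continue  # users with nothing in common contribute nothing
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("person_x", likes))
```

In this toy data, person X shares a liked item only with person Y, so the trousers are the only candidate. Note that a shopper with an empty set of likes overlaps with nobody and gets an empty list back, which is the limitation discussed next.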

While collaborative filtering is still frequently used on most ecommerce websites, it suffers from two main problems: 

Collaborative Filtering Negative 1: Limited Historical Data 

How do you offer relevant, personalized recommendations to someone you’ve never seen before?

Collaborative filtering requires an abundance of customer data. If you don’t have registered or even recurring customers, it may prove to be difficult to collect enough information to create the rich customer profile necessary for collaborative filtering. 

But this is precisely the situation in ecommerce: 70 to 95% of users visit a website less than twice a year, and most aren’t logged in. Our estimates are in line with other published and peer-reviewed work.

Collaborative Filtering Negative 2: Anonymous Shoppers 

At this point you might be thinking: surely the Amazons of this world won’t experience this problem! That’s true: unlike the vast majority of retailers out there, the Amazons of this world don’t have to deal with such a sizable group of anonymous users.

However, as you’ll see below, even Amazon can suffer from a slightly different version of the problem.

Collaborative filtering doesn’t really help to recommend relevant products to genuinely new users – and even when users aren’t genuinely new, previous data and preferences may not be that relevant anymore to detect and decipher shopper intent. 

Consider how data about a backpacking trip you took last year may no longer be relevant to the upcoming family-friendly trip with your children. Or how a student looking for a bed may buy a twin when sharing with a roommate – but a few months later, upon graduating, may opt for a queen to furnish a new apartment instead.

Sometimes shoppers also have very different interests at different yet closely spaced points in time. For instance, depending on their mood or their social context, users may be interested in watching different movies. 

These are all kinds of user cold-start situations where focusing on real-time intent would help provide more relevant experiences to shoppers than relying on historical data.

Embedding to the Rescue

In light of these challenges, alternative approaches have focused on how intent can be determined from a word. When a person says “kids,” do they mean baby goats or children? This has been addressed with word embeddings, such as those produced by Word2Vec, an application of natural language processing.

By representing each word as a numeric embedding vector, Word2Vec can be extremely effective when working with text: it is able to capture synonyms and words with similar meanings.

An image shows a language vector space, and illustrates how different words can be grouped by their relationships.
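As a rough illustration of what “similar vectors” means, the sketch below compares words using cosine similarity, a common way to measure how closely two vectors point in the same direction. The three-dimensional vectors are invented by hand for this example; a trained model would learn much longer vectors from large amounts of text.

```python
import math

# Hand-made vectors for illustration only, not the output of a trained model.
word_vectors = {
    "trousers": [0.9, 0.1, 0.0],
    "slacks": [0.8, 0.2, 0.1],
    "goat": [0.0, 0.9, 0.4],
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to the given word's."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


print(most_similar("trousers", word_vectors))
```

With these toy vectors, “trousers” sits much closer to “slacks” than to “goat”, which is the kind of grouping the image above depicts.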

However, this kind of embedding isn’t the best solution for ecommerce. 

Sure, they can help find a similar product to one you’ve interacted with. (Like how trousers are similar to slacks.) But they won’t address challenges like in-session personalization. For example: If a person has put slacks in her cart, you want to provide recommendations that complement that choice. Not just more pants.

This is where product embeddings enter the picture. 

Coveo’s scientists have been writing extensively about product embeddings (Prod2Vec) over the past years. 

Product embeddings were designed specifically for ecommerce. 

Just as word embeddings aim to capture the similarity between words, product embeddings aim to capture affinities between products. This is done by representing each product as a numeric vector, so products in similar contexts have similar vectors. Products are mapped to vectors in such a way that items that customers perceive or use in similar ways are clustered together.

This is achieved by training a deep learning model based on all available customer engagement history mined from clickstream data.
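The core intuition, that products appearing in similar session contexts end up with similar vectors, can be sketched without any model training at all. The example below is a deliberately simplified stand-in, not prod2vec itself: it represents each product by counts of the other products it shares sessions with. The sessions and product names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical browsing sessions; each list is the products seen in one visit.
sessions = [
    ["slacks", "blazer", "dress_shoes"],
    ["trousers", "blazer", "dress_shoes"],
    ["tent", "sleeping_bag", "headlamp"],
    ["tent", "headlamp"],
]


def build_vectors(sessions):
    """Represent each product by counts of the products it shares sessions with."""
    context = defaultdict(Counter)
    for session in sessions:
        for product in session:
            for other in session:
                if other != product:
                    context[product][other] += 1
    return context


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


vectors = build_vectors(sessions)
print(cosine(vectors["slacks"], vectors["trousers"]))  # high: same contexts
print(cosine(vectors["slacks"], vectors["tent"]))      # zero: no shared contexts
```

Slacks and trousers never appear in the same session here, yet they end up with near-identical vectors because shoppers browse them alongside the same items. A trained embedding model learns dense vectors rather than raw counts, but it exploits the same signal.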

There are significant benefits to using session data about customers’ browsing activity through a website.

As it turns out, shoppers generate a lot of behavioral data. With this data, a merchandiser can focus on developing a user-behavior strategy that, in the long run, improves the efficiency of results and predictions. Ultimately, they can improve their conversion rate. 

By dynamically building this virtual space (using user-product interactions), product embeddings enable brands to capture product affinity fairly well.

Product embeddings can also help provide in-session personalization. Given a list of products viewed in a session, it’s possible to easily understand which products would best complement the products being considered. Product embeddings help retrieve, compute, and combine similarities in real time, and are able to capture the style of a product, color category or a price level. 
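One simple way to turn a session into recommendations is to average the vectors of the products viewed so far and then rank the remaining catalog by closeness to that average. The sketch below assumes this averaging approach and uses invented three-dimensional embeddings; it is one possible design among many, not a description of any particular vendor’s system.

```python
import math

# Hypothetical product embeddings; real ones would be learned from clickstream data.
embeddings = {
    "slacks": [0.9, 0.1, 0.0],
    "blazer": [0.8, 0.3, 0.0],
    "dress_shoes": [0.7, 0.2, 0.1],
    "tent": [0.0, 0.1, 0.9],
    "headlamp": [0.1, 0.0, 0.8],
}


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def session_vector(viewed, embeddings):
    """Average the embeddings of the products viewed in the current session."""
    vectors = [embeddings[p] for p in viewed if p in embeddings]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def recommend(viewed, embeddings, k=2):
    """Return the k unseen products closest to the session's average vector."""
    query = session_vector(viewed, embeddings)
    candidates = [p for p in embeddings if p not in viewed]
    ranked = sorted(
        candidates, key=lambda p: cosine(query, embeddings[p]), reverse=True
    )
    return ranked[:k]


print(recommend(["slacks", "blazer"], embeddings))
```

Because the session vector is recomputed from whatever the shopper has just viewed, this works for anonymous visitors with no purchase history: a session that starts with a tent leads to the headlamp, not the dress shoes.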

Product embeddings make it possible to leverage machine learning to do exactly what an in-store assistant would do: observe a shopper’s behavior to determine interests.

How to Deal With Cold Start Product Problems

So product embeddings are terrific for solving the cold start shopper problems – what about cold start products? Can they work for new or niche products, for which we don’t really have much data available? 

Indeed, popular products typically have good quality embeddings associated with them – you’ll find products that are actually similar. Less popular products have far worse embeddings. And, of course, new products have no embeddings at all, as online users haven’t interacted with them. That means there is no input that the machine can use to learn to identify a product and its semantic surroundings.

This is a crucial concern for websites where thousands and thousands of new products are continuously uploaded each hour. There are no user behaviors for these items. It can thus become highly challenging to process these items or predict the preferences of users for these items.

Data scientists at Coveo have been researching this problem to find solutions.

For instance, we have shown how it is possible to achieve good performance using image vectors and how it is possible to create vectors of a similar quality for rare and new products (aka the “cold start embeddings” challenge) by leveraging and injecting catalog data (images and other metadata).
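To give a feel for how catalog data can stand in for missing behavioral data, here is a minimal sketch. It is not the method from the research mentioned above: it simply estimates a new product’s vector as a weighted average of existing products’ vectors, weighting each by how many catalog attributes it shares with the new product. The attributes, products, and two-dimensional embeddings are all hypothetical.

```python
# Hypothetical catalog attributes and behavior-based embeddings.
catalog = {
    "tent_2p": {"camping", "tent", "outdoor"},
    "tent_4p": {"camping", "tent", "outdoor", "family"},
    "blazer": {"formal", "jacket"},
}
embeddings = {
    "tent_2p": [0.0, 1.0],
    "tent_4p": [0.2, 0.8],
    "blazer": [1.0, 0.0],
}


def jaccard(a, b):
    """Share of attributes two products have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cold_start_embedding(new_attributes, catalog, embeddings):
    """Estimate a vector for a product with no interactions from its metadata."""
    weights = {
        product: jaccard(new_attributes, attributes)
        for product, attributes in catalog.items()
        if product in embeddings
    }
    total = sum(weights.values())
    if total == 0:
        return None  # no overlapping metadata, so nothing to borrow from
    dims = len(next(iter(embeddings.values())))
    return [
        sum(weights[p] * embeddings[p][d] for p in weights) / total
        for d in range(dims)
    ]


new_tent = {"camping", "tent", "outdoor", "ultralight"}
print(cold_start_embedding(new_tent, catalog, embeddings))
```

The new tent has never been clicked, but its estimated vector lands next to the existing tents and far from the blazer, so it can be recommended from its first hour in the catalog.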

Readers interested in the details of Coveo’s solution can check out our research paper “The Embeddings That Came in From the Cold”, presented at the world’s premier conference on recommender systems – RecSys 2020 – together with peers from Etsy, Amazon and Netflix.

Product Embeddings Drive Better Results

As we’ve seen, approaches inspired by collaborative filtering may work well when there are adequate shopper-product interactions, but fall short when interaction data is sparse. Unfortunately, these are the kinds of solutions that most vendors offer and most ecommerce websites rely on. 

But the truth is that, following the Pareto principle, 20% of a catalog will usually get 80% of the traffic, whereas the rest of the catalog does not have enough interaction data for collaborative filtering to make meaningful behavior-driven recommendations. 

So, what tends to happen in these scenarios is that the product recommendation system suggests only the top 30% of inventoried products.

Our own research shows that these top (aka hero) products are only relevant to 16% of visitors, meaning your company’s attempt to help your customer and make further sales probably ends right there.

Luckily, there are alternatives for ecommerce players trying to boost their conversion rates and average order value. 

For instance, research from Coveo shows that prod2vec, which is a technique to create product embeddings, performs remarkably well on rare products. This is in line with previous research showing that product embeddings can significantly outperform techniques such as collaborative filtering when it comes to unpopular items.

On top of that, as we’ve seen, product embeddings can provide an effective solution to the problem of real time personalization for shopper cold start. By using product embeddings, it is finally possible to personalize experiences for cold-start shoppers, who make up 70% of site traffic – that includes genuinely new visitors who have never been on the site as well as those who are categorized as new visitors based on your tracking snippet. 

The benefits are impressive. McKinsey states that personalization can lift sales by 10% or more. While retailers have been employing forms of recommendations based on collaborative filtering for the last two decades, meaningful, real-time personalization for recommendations wasn’t possible until recently. The stark reality is that the right tools simply didn’t exist to deliver truly relevant, effective recommendations to customers across channels.

But of course, while we have focused mostly on product recommendations here, product embeddings can be used to provide great performance across a number of different tasks. 

Looking for an example? 

Here’s one. Consider intent prediction, which is the task of guessing whether a shopping session will eventually end in the user adding items to the cart (signaling purchasing intention). Interestingly, a paper by Coveo’s ML scientists has shown that for ecommerce players at a “reasonable scale,” prod2vec can even outperform neural networks inspired by transformers (i.e. BERT). 

So, it shouldn’t come as a surprise that leading tech and ecommerce players have chosen to apply sophisticated deep learning approaches such as product embeddings. They use them to finally unlock the power of real-time personalization for cold-start shoppers, to contextually recommend rare products that would’ve otherwise remained buried due to a lack of historical events, and to ultimately deliver relevant, meaningful experiences that drive growth.

Dig Deeper

Product embeddings are just one of many tools that Coveo uses to transform your online retail site into a powerful ecommerce experience. Learn more!

About Andrea Polonioli

Andrea is a Product Marketer for our Commerce line of business. Prior to joining Coveo, he was at Tooso, the acquired AI search ecommerce startup. He has a passion for innovation-driven companies and a research background in cognitive science. When he is not working, he is likely to be experimenting with new dishes, travelling, or hiking.
