What is semantic search? Semantic search refers to a family of information retrieval techniques that focus on query context and relationships between words to elevate search beyond mere keyword matching. 

Google does an amazing job of understanding that if you search for “digital commerce,” you might also mean “ecommerce,” “e-tailing,” or whatever the latest term is.

But while semantic search is tremendous when you are looking for topics or categories, it sometimes falls short in ecommerce. Why? In ecommerce, a search query is typically short, broad, ambiguous, and underspecified, meaning that there aren’t enough linguistic elements to return results that are fully relevant to a shopper’s intent.

Shoppers looking for that perfect product don’t just want a search engine to match their keywords; they want the search solution to understand their shopping intent, even if they only enter three words in the search box.

In light of this, digital leaders have been exploring sophisticated ways to leverage a combination of historical and in-session variables to create a semantic web or knowledge graph of words.  But is that the best approach?

To help make sense of all this and figure out what is best for you, I’ve broken this down into three areas.

  1. Ecommerce challenges of keyword matching 
  2. Ecommerce semantic search shortcomings
  3. The next generation of innovation for ecommerce search 

Ecommerce Challenges of Keyword Matching 

To get a better idea of the challenges semantic search addresses, we’ll briefly explain what keyword matching is. 

A Short Primer on Keyword Matching 

Many popular ecommerce search engines rely on general-purpose keyword search algorithms. 

The well-known Apache Lucene library, which powers both Solr and Elasticsearch systems, is a nice example of a leading search engine based on keyword matching. When applied to ecommerce, these search systems treat all product information (such as title, attributes, and description) as parts of one large text document. 

When a shopper enters a search query, it is split into single words, or terms. Each term is searched for within the product’s text using boolean logic. All matches are then ranked according to word statistics, which rely on counting how often each query word is mentioned in the product data and how common that word is across all the products in the catalog.

Ranking formulas can become pretty sophisticated, and take into account a variety of statistical factors and additional data, such as the newness of the product. Still, at the core they’re about counting words.
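The word counting described above can be sketched in a few lines of Python. This is a toy TF-IDF scorer over a made-up three-product catalog, assuming OR-style boolean matching; the catalog, the SKU names, and the exact weighting are illustrative assumptions, not the formula Lucene, Solr, or Elasticsearch actually use.

```python
import math
from collections import Counter

# Hypothetical mini-catalog: each product is treated as one text document.
CATALOG = {
    "sku-1": "black leather wallet for men",
    "sku-2": "brown leather belt for men",
    "sku-3": "black cotton dress",
}

def tokenize(text):
    """Split text into lowercase words."""
    return text.lower().split()

def tf_idf_search(query, catalog):
    """Rank products that contain at least one query term (boolean OR)."""
    docs = {sku: Counter(tokenize(text)) for sku, text in catalog.items()}
    n_docs = len(docs)
    scores = {}
    for term in tokenize(query):
        # Document frequency: how many products mention this term?
        df = sum(1 for counts in docs.values() if term in counts)
        if df == 0:
            continue
        idf = math.log(1 + n_docs / df)  # rarer terms weigh more
        for sku, counts in docs.items():
            if counts[term]:
                # Term frequency times inverse document frequency.
                scores[sku] = scores.get(sku, 0.0) + counts[term] * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

results = tf_idf_search("black wallet", CATALOG)
```

In this sketch the wallet ranks first because it contains both query words, and the dress still appears because it contains “black.” Nothing in the score reflects what the shopper actually wants.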

The Many Relevance Problems of Keyword Matching 

There are a few tricks that keyword matching-based search engines use to provide better result recall. For instance, they can normalize words, reducing them to their stems, so that “clean” will match “cleaner.” 

But because these engines ultimately match strings of text rather than meaning, they still may not produce relevant results.
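The stemming trick mentioned above can be illustrated with a deliberately crude suffix stripper. The suffix list here is an assumption chosen for the example; production engines rely on proper stemming algorithms with far more rules.

```python
def crude_stem(word):
    """Strip a few common English suffixes (illustrative only)."""
    word = word.lower()
    for suffix in ("ers", "er", "ing"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# "clean", "cleaner" and "cleaning" all reduce to the same stem.
stems = {crude_stem(w) for w in ("clean", "cleaner", "cleaning")}
```

Stemming improves recall for word variants, but it only normalizes spelling. It does nothing for words that look different and mean the same thing.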

Relevance Problem #1: Vocabulary Gaps 

Shoppers signal purchase intent in several ways, sometimes using a different vocabulary than that of the product catalog. For example, people looking for “cantaloupe,” “rockmelon,” or “sweet melon” have the same intent. As shown below, Safeway’s search shows an interesting example of the problem. No fruit for the shoppers using anything other than cantaloupe! 

Three screen captures show the results of different terms used to search for fruit

While search engines based on keyword matching can use thesauri with numerous synonyms so that “dress” will match “gown,” this approach is ill-suited to ecommerce search.

Many ecommerce catalogs rely on technical and often idiosyncratic language that generic thesauri are unlikely to handle. Think “solid-wood pergola with mounting kit” in home improvement or “skater dresses” in fashion. Because of this, companies relying on keyword matching-based search engines end up managing synonyms manually, which is tedious, expensive, error-prone, and hard to scale.

For instance, setting synonyms rigidly may conceal the context-dependent nature of words. Should “black” and “dark” be treated as synonyms? Sure, in some contexts: “black night” can mean “dark night.” But does “black dress” always mean “dark dress”? Not if you want to avoid a fashion faux pas.
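A minimal sketch of rigid synonym expansion shows how this goes wrong. The synonym table and product titles are hypothetical, and real engines configure synonyms differently, but the failure mode is the same: once “black” always expands to “dark,” a navy dress matches a search for a black gown.

```python
# Hypothetical hand-maintained synonym table.
SYNONYMS = {
    "gown": {"dress"},
    "dress": {"gown"},
    "black": {"dark"},
}

def expand(query):
    """For each query word, return the set of words allowed to match it."""
    return [{w} | SYNONYMS.get(w, set()) for w in query.lower().split()]

def matches(query, product_text):
    """True if every query word, or one of its synonyms, is in the text."""
    words = set(product_text.lower().split())
    return all(alternatives & words for alternatives in expand(query))

helpful = matches("gown", "evening dress")              # vocabulary gap closed
misleading = matches("black gown", "dark navy dress")   # context ignored
```

The same rule that closes one vocabulary gap opens a precision problem elsewhere, which is why manual synonym lists are hard to maintain.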

Relevance Problem #2: Related Products

If your online store doesn’t carry a specific brand, it’s not enough to simply omit that name. Shoppers who are shown zero results will bounce to find what they’re looking for. Instead, your site search needs to understand the intent of the request and offer intelligent product recommendations.

Say you have a customer looking for Mizuno-brand sneakers, but you don’t carry them. Don’t just throw all the other shoes you have at them! Instead, your ecommerce site search should recognize that your customer is looking for athletic shoes and offer similar products – like Nike or Asics sneakers. 

Matching a shopper’s intent to what is available in your inventory is hardly feasible when search focuses on strings of text rather than on concepts and the relationships among them.

As it turns out, ecommerce websites often struggle to handle such scenarios. For instance, in the example below, I am visiting Ulta Beauty’s website looking for some shampoo by Sachajuan following some great reviews and feedback from friends. Unfortunately, not only am I provided with zero results (suggesting that Ulta does not carry this brand among the hundreds of shampoos available) but I’m also shown a bunch of recommendations that are clearly irrelevant – as they have nothing to do with my search intent. 

A screen capture shows no results for a 'sachajuan shampoos' search

Relevance Problem #3: Ambiguous and Broad Queries

Did you know there’s more than one kind of ambiguity? 

One is structural: the way a search query is ordered can vastly change the intent. For example, consider “dress shirt” and “shirt dress”: one is a shirt for business or fancy occasions, and the other is a dress that is fashioned in the style of a shirt. Same words, completely different meanings, just by changing the order of the words!

People can quickly navigate this complexity, relying on the context of an interaction and their own vast background knowledge about the world. But anyone expecting a keyword matching-based search engine to be this smart will be left disappointed. Keyword matching is poorly equipped to handle such queries. In fact, even a popular website such as ASOS (Alexa rank 235) returns the same results for the two queries.
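The reason word order gets lost is easy to see in code. A search engine that treats a query as a bag of words sees the two queries as identical. The bigram function is one hypothetical way to retain order, shown only for contrast; it is not a claim about how any particular site works.

```python
from collections import Counter

def bag_of_words(query):
    """Represent a query as unordered word counts."""
    return Counter(query.lower().split())

def bigrams(query):
    """Represent a query as ordered pairs of adjacent words."""
    words = query.lower().split()
    return list(zip(words, words[1:]))

# Word order is discarded, so the two queries look the same.
same_bag = bag_of_words("dress shirt") == bag_of_words("shirt dress")

# Adjacent word pairs keep the order, so the queries differ.
same_bigrams = bigrams("dress shirt") == bigrams("shirt dress")
```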

A screen capture shows the results for a search on 'shirt dress'
A screen capture shows the same results surfaced for 'dress shirt' as for 'shirt dress'

Another type of ambiguity is semantic, relating directly to the meaning of the query. If I search for “denim,” I might mean a type of fabric or I could be looking for a new pair of jeans. Keyword matching search systems match strings of text and not concepts, which makes it hard for them to handle either structural or semantic ambiguity.

Relevance Problem #4: Precision of Search Results

It’s not enough to just show your customers something. While showing zero results can lead to a customer bouncing, showing unrelated results can also send shoppers elsewhere. Search based on matching keywords struggles to deliver precise results.

For example, say you search for “men’s black leather wallet.” The product catalog doesn’t have any SKUs that match this query exactly. Unsophisticated search systems will resort to a partial match and may return “men’s brown leather wallets” (which is relevant!) along with “men’s black leather belts” (irrelevant!). 
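A short sketch makes the problem concrete. Assuming a partial-match fallback that simply counts shared words (an illustrative simplification, with made-up product titles), the relevant wallet and the irrelevant belt receive exactly the same score.

```python
def shared_terms(query, title):
    """Count the query words that also appear in the product title."""
    return len(set(query.lower().split()) & set(title.lower().split()))

query = "men's black leather wallet"
titles = ["men's brown leather wallet", "men's black leather belt"]

# Each title shares three of the four query words.
scores = {title: shared_terms(query, title) for title in titles}
```

Word overlap alone cannot tell that “wallet” names the product type, and matters more than the color, so the belt is not filtered out.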

While you can boost the most relevant items to the top, keep in mind that most ecommerce sites allow customers to re-sort products by price, newness, sales, or ratings. Once they do, the relevant product can get lost among the partial matches.

Is Semantic Search the Answer?

Clearly, ecommerce merchants need an alternative – and superior – search technology to tackle the above-mentioned challenges. Semantic search does offer an alternative approach, but there are definitely shortcomings with this method as well. 

While the term semantic search was coined in 2003, it didn’t really gain traction until the deployment of Google’s Hummingbird in 2013. The year prior, Google announced that users of its search engine would be able to search for “things, not strings” (of text), and this captures the core idea of semantic search quite effectively.

Broadly speaking, semantic search aims to match documents to the meaning of a query and the intent of the user, not just to the words of the query, as full-text search does.

Semantic Search Doesn’t Speak with One Voice

There isn’t just one way to do semantic search. There are in fact several approaches available, ranging from knowledge graphs to semantic vector spaces. 

Knowledge Graphs

A common approach to semantic search is the knowledge graph. While the term was popularized by Google’s Knowledge Graph in 2012, the idea is considerably older than that. Simply put, knowledge graphs organize data from multiple sources, capture information about entities of interest in a given domain or task, and forge connections between them.

They represent knowledge by subject-predicate-object triples, where the predicate indicates the relationship between an entity pair (subject and object). Entities are connected to each other by predicates/relations. 

A graphic visualizes the concept of a 'knowledge graph'
A represents the subject, B represents the predicate (relation), C represents the object

The semantic nature of knowledge graphs comes from the fact that the meaning of the data is encoded in an ontology. This describes the types of entities in the graph and their characteristics. The graph, then, is not only a place to organize and store data but also (and crucially) a means to derive information and enable advanced processing of queries. When a customer searches for “purple jacket,” the knowledge graph understands the relationship between the different words and helps return relevant results.
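To make the triple idea concrete, here is a minimal sketch of a knowledge graph stored as subject-predicate-object tuples and queried for “purple jacket.” The entities, predicate names, and SKUs are all hypothetical, and the query parser naively assumes a “color type” word order. Production knowledge graphs use dedicated stores and query languages.

```python
# Hypothetical triples: (subject, predicate, object).
TRIPLES = [
    ("violet", "same_as", "purple"),
    ("sku-42", "has_color", "violet"),
    ("sku-42", "has_type", "jacket"),
    ("sku-77", "has_color", "purple"),
    ("sku-77", "has_type", "scarf"),
]

def subjects(predicate, obj):
    """All subjects linked to obj by the given predicate."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}

def search(query):
    """Toy lookup that assumes the query is '<color> <product type>'."""
    color, product_type = query.lower().split()
    # Follow same_as edges so equivalent color names are included.
    colors = {color} | subjects("same_as", color)
    by_color = set().union(*(subjects("has_color", c) for c in colors))
    return by_color & subjects("has_type", product_type)

found = search("purple jacket")
```

The violet jacket is returned even though the word “purple” never appears in its data, while the purple scarf is excluded because its type does not match.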

Semantic Vector Search 

Another approach that has garnered plenty of attention in recent years is semantic vector search. It is by now so well established that open-source projects such as Facebook’s FAISS are used by digital players and vendors alike.

The basic idea behind semantic vector search is that we need to go deeper into the meaning of both data and queries in a way that is directly accessible and understandable to computers.

Computers deal with numbers, so we need a way to represent our data numerically. A single number cannot represent all the complexity of a query or a product, so we need a whole lot of them. An ordered list of numbers, such as [24, -5.14, 0, -14], is called a vector, and the length of this list is called the vector’s dimension.

Imagine representing all queries and products as two-dimensional vectors, creating a vector space. With semantic vectors, products and queries that are similar in meaning are represented by vectors that lie close to one another. For instance, we can have a clear cluster of queries and products representing the concept of dresses.

A graphic visualizes the concept of a vector space
A vector space visualizes the distance between and relationship of different products.

Distances between points represent levels of similarity between corresponding concepts. We can thus build a semantic vector space for our data. With a semantic vector space, the complex and vague problem of searching for relevant products by text queries can be transformed into a well-stated problem of searching for closest vectors in vector space, which is something computers are very good at.
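The closest-vector search described above can be sketched with nothing but the standard library. The two-dimensional coordinates below are invented for illustration; in practice the vectors come from a trained model and have many more dimensions, and libraries such as FAISS are used to search them efficiently.

```python
import math

# Hypothetical 2-D embeddings for a handful of products.
VECTORS = {
    "maxi dress": (0.9, 0.8),
    "shirt dress": (0.8, 0.9),
    "running shoes": (-0.7, 0.2),
    "leather wallet": (0.1, -0.9),
}

def nearest(query_vector, k=2):
    """Return the k product names whose vectors are closest to the query."""
    ranked = sorted(
        VECTORS,
        key=lambda name: math.dist(query_vector, VECTORS[name]),
    )
    return ranked[:k]

# Assumed embedding of a dress-related query; it lands in the dress cluster.
query_vector = (0.9, 0.7)
closest = nearest(query_vector)
```

This brute-force version compares the query against every product, which is fine for four items. Large catalogs need an index that avoids scanning everything.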

The Benefits and Value of Semantic Search 

By deploying these more sophisticated approaches and moving beyond basic keyword matching, semantic search promises to address some of the critical problems that keyword-based search suffers from. 

For example, by leveraging its semantic search capabilities, Google can handle vocabulary gaps. It understands that both “home renovation loans” and “home improvement loans” mean the same thing, and the user intent behind both searches is pretty much the same. 

A screen capture shows that Google understands the semantic differences and relationships of terms, using home renovation loans and home improvement loans as examples

Similarly, if I search Netflix for the horror movie The Babadook after reading stellar reviews, I’ll find it isn’t available. This may be bad news, but Netflix still manages to come back with a number of results that are quite relevant, because it understood my intent.

A mobile screen capture shows a search for "The Babadook" surfacing different but related results.

Moreover, keyword matching struggles when queries are similar in terms of words and structure, but actually relate to very different products. Semantic search handles this effectively. For example, Google produces different results for the queries “camera with lens” and “lens for camera” because it understands the meaning of “for” versus “with”.

A screen capture shows that Google understands word placement and usage in a phrase, showing lens for camera and camera with lens as examples

So, is semantic search the best option for ecommerce search? Well, maybe not…

A Better Alternative to Semantic Search 

In web search, queries tend to be relatively long. But this is not the case in ecommerce. For example, research from the Nielsen Norman Group shows that the average query length is 20.5 characters for web-wide searches. Meanwhile, about 30-40% of ecommerce customers start a shopping session with broad queries like “mens tops,” “nike,” or “handbags.”

Short head queries like these provide very limited information about what shoppers are really looking to buy. Some of those queries, like “mens tops,” match a significant proportion of the catalog, and nothing in the search query itself can help determine the relevant product.

The linguistic content associated with online customers’ queries typically does not provide enough semantic context to determine a shopper’s preferences and needs (what we call shopper or user intent).

Does a shopper typing in “shoes” want running shoes, and how do you determine that just from a single word? Does a search for “jacket” mean a winter or a summer jacket, which are completely different yet both relevant sets of products?

If purely semantic approaches don’t guarantee the capture of shopping intent for a huge and critical portion of customers’ sessions, then what does?

From Word Vectors to Product Vectors

Leading industry analysts have introduced new categories to mark the need to evolve semantic search into a more mature, complete, and intelligent approach to information retrieval. For example, Forrester Research has introduced the category of Cognitive Search to refer to a more complete, sophisticated approach that leverages multiple types of relevant data and artificial intelligence to deliver the most relevant search experiences.  

In a recent report, Forrester analyst Scott Compton pointed out that “Cognitive Search is able to show results based on a combination of historical and in-session variables, making it increasingly relevant to the consumer, even during their first visit.” 

It is precisely in this spirit that Coveo recently introduced an approach dubbed ‘Personalization As You Go.’ 

Similar to the concept in semantic search of creating vector maps of words, the idea is to create a vector map of the products in a customer’s catalog, placing products that are more akin to one another closer together based on attributes such as brand, size, price point, and color. Think of this like a map of a store with similar items being displayed near one another.

The idea is then to combine this vector map with in-session variables and customer onsite behaviors to capture searcher intent, applying machine learning in real time to tailor query suggestions, autocomplete, dynamic facets, or recommendations for the shopper.

For example, there’s nothing in the query “gloves” that specifies that the user is interested in golf gloves, rather than, say, winter gloves. But onsite customer behavior and the fact that she’s been browsing through golf pants definitely help capture the shopper’s intended meaning. 
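The gloves example can be sketched as a re-ranking step. This is a simplified illustration under stated assumptions, not Coveo’s implementation: the product vectors are invented, the session is summarized as the plain average of the vectors of viewed products, and candidates are re-ordered by distance to that average.

```python
import math

# Hypothetical 2-D product vectors; similar products sit close together.
PRODUCT_VECTORS = {
    "golf pants": (0.9, 0.1),
    "golf polo": (0.8, 0.2),
    "golf gloves": (0.85, 0.15),
    "winter gloves": (0.1, 0.9),
}

def session_vector(viewed):
    """Average the vectors of the products viewed in this session."""
    vectors = [PRODUCT_VECTORS[name] for name in viewed]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

def rerank(candidates, viewed):
    """Order candidates by closeness to what the shopper has browsed."""
    if not viewed:
        return list(candidates)  # no session signal yet: keep original order
    centre = session_vector(viewed)
    return sorted(candidates, key=lambda name: math.dist(PRODUCT_VECTORS[name], centre))

# Both products match the query "gloves" equally well on keywords.
candidates = ["winter gloves", "golf gloves"]
ranked = rerank(candidates, viewed=["golf pants", "golf polo"])
```

With golf products in the session, golf gloves move to the top. With an empty session the order is left unchanged, which is where the query alone has to carry the intent.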

This is a new and exciting way to solve commerce challenges such as cold-start shoppers and shopper-intent detection, one that goes beyond the limitations of what semantic search can offer.

A chart from Forrester Research shows how the 'humanization' of search technology increases its business value

Dig Deeper 

To learn more about the ways in which Coveo leverages in-session behavior to deliver the most relevant results, read Powerful Personalization in Ecommerce.

If you want a technical deep dive, continue reading with Real-Time Search Personalization in Less Than 100 Lines of Code!

To learn more about why Coveo has been named a leader in Cognitive Search for the fourth consecutive year, read The Forrester Wave: Cognitive Search Q3 2021.

To learn more about Coveo’s AI innovation research, visit research.coveo.com.


About Andrea Polonioli

Andrea is a Product Marketer for our Commerce line of business. Prior to joining Coveo, he was at Tooso, the acquired AI search ecommerce startup. He has a passion for innovation-driven companies and a research background in cognitive science. When he is not working, he is likely to be experimenting with new dishes, travelling or hiking.
