Have you ever searched for something like “sneaks” on your favorite online store and been surprised when nothing shows up in the results? This is a common (and disappointing) experience many users have when performing a keyword search on a website or app that isn’t specifically a search engine. 

We tend to think site search should work like it does on websites like Google – automatically connecting concepts and showing us useful, relevant results. But many search tools work by matching the exact words we type to content word-for-word. So, without an exact keyword match, your search for “sneaks” may come up with nothing in the search results. While matching the exact phrase to the right content – in this case a product description – yields good results, it often means you won’t find related or similar content. 

Enter keyword stemming – the process of relating words to their base or “root” so searches connect concepts more intelligently. By implementing stemming, sites can deliver results more akin to Google, where a search for “sneaks” surfaces results that may include sneakers, running shoes, and slippers. 

In this post, we’ll unpack how stemming works and why integrating it into your website can improve search relevance and content discovery for your visitors.

What is Keyword Stemming and Why Does It Matter?

Stemming is the process of reducing words down to their base or “root” form. For example, Coveo’s search engine stems the words “search,” “searching,” and “searched” to the common root word “search.”

A keyword stemming example.

Stemming matters for digital experiences because it allows queries to match documents that may not contain the exact keyword searched, but do contain variations of that root word. 
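
To make the idea concrete, here is a minimal Python sketch of stem-based matching. The suffix list, the function names (simple_stem, matches), and the sample documents are illustrative assumptions, not how Coveo or any production stemmer works; real stemmers use far richer rules.

```python
import re

# Toy suffix list for illustration only (an assumption, not a real rule set).
SUFFIXES = ("ing", "ed", "es", "s")


def simple_stem(word: str) -> str:
    """Lowercase a word and strip one common suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matches(query: str, document: str) -> bool:
    """True if every stemmed query term appears among the document's stemmed terms."""
    doc_stems = {simple_stem(token) for token in tokenize(document)}
    return all(simple_stem(token) in doc_stems for token in tokenize(query))


docs = [
    "Searching the product catalog",
    "We searched every aisle",
    "Contact customer support",
]

# None of these documents contains the exact token "search",
# but the first two contain variations of that root.
exact_hits = [d for d in docs if "search" in tokenize(d)]
stemmed_hits = [d for d in docs if matches("search", d)]
```

In this sketch, exact token matching finds nothing for “search,” while the stemmed comparison returns the first two documents.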

Coveo’s keyword stemming feature currently applies to words with four or more characters. It works by expanding the query to include variations of the root word, as well as terms that are semantically relevant. 
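
The query expansion described here can be sketched roughly as follows. This is a hedged illustration, not Coveo’s actual implementation: toy_stem, build_stem_index, expand_query, and the sample vocabulary are all hypothetical, and only root-word variations are modeled (semantic expansion is out of scope for the sketch).

```python
from collections import defaultdict

MIN_STEM_LENGTH = 4  # the four-character threshold mentioned in the text

SUFFIXES = ("ing", "ed", "es", "s")  # toy rule set, an assumption


def toy_stem(word: str) -> str:
    """Strip one common suffix, keeping at least 3 letters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_index(vocabulary: list[str]) -> dict[str, set[str]]:
    """Group every indexed term under its stem."""
    index: dict[str, set[str]] = defaultdict(set)
    for term in vocabulary:
        index[toy_stem(term)].add(term)
    return index


def expand_query(query: str, stem_index: dict[str, set[str]]) -> list[set[str]]:
    """Return, for each query term, the set of terms it may match."""
    expanded = []
    for term in query.lower().split():
        if len(term) < MIN_STEM_LENGTH:
            # Short words are matched exactly and never expanded.
            expanded.append({term})
        else:
            variants = stem_index.get(toy_stem(term), set())
            expanded.append(variants | {term})
    return expanded


vocabulary = ["search", "searching", "searched", "bus", "buses"]
index = build_stem_index(vocabulary)
expanded = expand_query("bus searching", index)
```

Here “searching” expands to all three indexed variations of “search,” while the three-letter “bus” is left alone, so it does not pull in “buses.”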

This query expansion connects concepts in a relevant yet broader way. Without stemming enabled, search is rigid: the system looks only for exact matches of the words or phrases typed. 

Turning stemming on helps search surface more relevant content and recommendations, not just pages that contain the exact term or query.   

How Does Keyword Stemming Impact the Digital Experience?

Stemming transforms basic keyword searches in a way that connects concepts – similar to how web search engines like Google work. This makes site search more user-friendly and relevant. 

Here are a few ways stemming improves things:

  • Adds relevancy: Surfaces content that matches what people are looking for, even if they don’t phrase the query exactly right. It also reduces “zero results” searches (when a search produces no results for the user).
  • Aids discovery: Stemming connects a search query to relevant materials that users might not have discovered (or even known to look for). Since stemming uses the root word or words of a keyword to find supplementary sources, it opens new doors to relevant content.
  • Boosts flexibility: Stemming streamlines the digital experience. Users don’t need to get their query 100% right or distinguish between singular and plural versions of a term. Stemming matches content despite small variations. More flexibility = less hassle.
  • Reduces frustration: Since stemming surfaces more complete results instead of dead ends, it reduces searcher frustration. Your audience doesn’t have to keep rewriting queries, so they get the answers they need faster and with less effort.

What Are the Different Types of Keyword Stemming Algorithms?

There are various keyword stemming algorithms, and they take different approaches to normalizing keywords, such as suffix stripping, statistical analysis, and rule sets. Factors like speed, accuracy, and multilingual support affect how well each one suits a text processing pipeline. 

Here’s a high-level overview of the most common stemmer algorithms: 

  • Porter’s stemmer: One of the most popular and effective algorithms, Porter’s stemmer works by eliminating common suffixes to get to the root word form. 
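  • Snowball stemmer: Developed by Porter as a successor to his original algorithm (its English stemmer is often called Porter2), it refines the original rules and supports multiple languages.  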
  • Lovins stemmer: Takes an aggressive approach by removing the longest suffix first.
  • Dawson stemmer: Builds on Lovins stemmer with an indexing system to remove suffixes. 
  • Krovetz stemmer: Converts different word forms to their singular present tense forms.
  • Xerox stemmer: Relies heavily on lexicons to reduce words to their roots.  
  • N-gram stemmer: Breaks words into sequences of consecutive letters (n-grams, such as letter pairs) and uses statistics to relate similar words.
  • Lancaster stemmer: Uses external rule files and aggressive iterative truncating.  
  • Regexp stemmer: Allows custom regular expression rules to be defined for suffix removal.
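
As a rough illustration of two of these approaches, the Python sketch below shows a regular-expression suffix stripper and an n-gram comparison based on letter pairs. The function names, the suffix pattern, and the use of the Dice coefficient as the similarity statistic are assumptions chosen for the example, not a reference implementation of any named stemmer.

```python
import re


def regexp_stem(word: str, pattern: str = r"(ing|ed|es|s)$", min_length: int = 4) -> str:
    """Regexp-style stemming: strip whatever suffix the custom pattern matches.

    Words shorter than min_length are returned unchanged.
    """
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


def bigrams(word: str) -> set[str]:
    """Return the set of consecutive letter pairs in a word."""
    return {word[i : i + 2] for i in range(len(word) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """N-gram style comparison: share of letter pairs two words have in common."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


stems = [regexp_stem(w) for w in ("searching", "searched", "cats", "bus")]
close = dice_similarity("sneaks", "sneakers")  # shares sn, ne, ea, ak
far = dice_similarity("sneaks", "sandals")     # shares no letter pairs
```

The regexp approach is only as good as the pattern you give it, while the n-gram approach needs no suffix rules at all. An n-gram stemmer would typically cluster words whose similarity exceeds a chosen threshold.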

What Are The Common Misconceptions About Keyword Stemming?

Stemming is a powerful technique for improving search in digital experiences. But there are some common misconceptions about how to implement it that can affect the relevance of your search results.

Here are the top misconceptions about keyword stemming:

  • It will lead to irrelevant or unrelated search results: It’s true that stemming expands a search leading to more matches across your content repository, but this doesn’t mean you’re facing a barrage of irrelevant results. When properly implemented, stemming matches queries to relevant supplemental materials, not tangential content.
  • Stemming creates query expansion anarchy: Effective stemming algorithms normalize words to root forms in a controlled way. They achieve this by using predefined rules and logic that prevent “over-stemming” queries, which could otherwise lead to broad or irrelevant results.   
  • Stemming is unwieldy for ecommerce: On the contrary, stemming helps manage large indexes and product catalogs, where creating exhaustive thesaurus rules would be impossible. Well-formatted data combined with stemming mitigates this issue, allowing you to surface relevant results without manually creating exhaustive rules.
  • Stemming causes “result collision”: Result collisions occur when multiple results from a search query are too similar or identical. This causes redundant information in the search results presented to the user. Stemming typically starts normalizing words only at five or more characters, which helps reduce result collisions. And while stemming does not work out-of-the-box for exact phrase searches, it can be enabled to connect relevant phrasal concepts, which further reduces result collisions.  
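
One way to picture the last two points is a query parser that leaves exact phrases untouched and stems only the longer free terms. The sketch below rests on several assumptions: quoted text is treated as an exact phrase, the five-character threshold comes from the bullet above, and naive_stem and parse_query are hypothetical helpers rather than any product’s API.

```python
import re

MIN_STEM_LENGTH = 5  # the "five or more characters" figure from the text

SUFFIXES = ("ing", "ed", "es", "s")  # toy rule set, an assumption


def naive_stem(word: str) -> str:
    """Strip the first matching suffix with no safeguards at all."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into exact phrases (quoted) and normalized free terms."""
    # Quoted phrases are kept verbatim and never stemmed.
    phrases = re.findall(r'"([^"]+)"', query)
    remainder = re.sub(r'"[^"]*"', " ", query)
    terms = [
        naive_stem(term) if len(term) >= MIN_STEM_LENGTH else term
        for term in remainder.lower().split()
    ]
    return phrases, terms


phrases, terms = parse_query('"running shoes" sneakers red')
```

With the length threshold in place, “sneakers” is reduced to “sneaker” while “red” is left as typed. Without it, the unguarded naive_stem would cut “red” down to “r”, which is the kind of over-stemming the rules are meant to prevent.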

Stemming is an incredibly useful tool, but it’s not a magic bullet. When done right, it effectively connects the dots between search queries and content in helpful ways. 

Stemming Is One Of Many Steps to Search Relevance

On its own, stemming doesn’t guarantee relevance since relevance is contextual. What satisfies one searcher may not satisfy the next. There is no one-size-fits-all approach. Still, digital experiences should emulate the kind of relevancy that customers are used to getting from search engines like Google – aiming to deliver useful, personalized results, limit redundancies, and eliminate blank results pages. 

Achieving a truly relevant search result requires looking beyond basic keyword matching to connect word variations more intelligently to content. AI-powered platforms like Coveo can help you achieve this. Coveo uses machine learning to match a root keyword with related keywords, surfacing the most appropriate content for a given search. 

It’s also important to continuously monitor search relevance. Understanding what search relevance metrics to measure allows you to fine-tune your stemming approach, ensuring that stemming remains impactful in helping your website visitors find more of what they need. Ultimately, this translates to happier customers, higher conversions, and an alignment with your business objectives.
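
As one example of a metric you might track, the Python sketch below compares the “zero results” rate of a query log with and without stemming. The catalog, the query log, and every function name here are made up for illustration. In practice, you would pull these numbers from your own search analytics.

```python
from typing import Callable

CATALOG = ["red sneakers", "leather boot", "hiking boots"]     # hypothetical data
QUERY_LOG = ["sneaker", "sneakers", "boots", "sandal"]         # hypothetical data

SUFFIXES = ("ing", "ed", "es", "s")  # toy rule set, an assumption


def toy_stem(word: str) -> str:
    """Strip one common suffix, keeping at least 3 letters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def exact_search(query: str) -> list[str]:
    """Return catalog entries containing the query as an exact word."""
    return [item for item in CATALOG if query in item.split()]


def stemmed_search(query: str) -> list[str]:
    """Return catalog entries containing a word with the same stem as the query."""
    target = toy_stem(query)
    return [
        item for item in CATALOG
        if target in {toy_stem(word) for word in item.split()}
    ]


def zero_result_rate(queries: list[str], search: Callable[[str], list[str]]) -> float:
    """Fraction of queries for which the search function returns nothing."""
    if not queries:
        return 0.0
    empty = sum(1 for query in queries if not search(query))
    return empty / len(queries)


rate_exact = zero_result_rate(QUERY_LOG, exact_search)
rate_stemmed = zero_result_rate(QUERY_LOG, stemmed_search)
```

On this toy data, exact matching leaves half of the queries with no results, and stemming cuts that to a quarter. The remaining miss (“sandal”) is a catalog gap that stemming cannot fix.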

Relevant Reading
Blog | Panning for Gold: What’s the Right Search Relevance Metric for Your Organization?