The Ecommerce Machine Learning Algorithm – Up Close and Personal
Throughout the years, the range of tools that allow you to leverage machine learning (ML) for ecommerce has grown exponentially. At Coveo-Qubit, we leverage the power of artificial intelligence (AI) and ML to drive meaningful personalization, at scale.
In this article, we’re going to take a close-up look at what’s required to leverage new technologies such as machine learning (spoiler: without data, it’s not going to work!) and how Coveo-Qubit is continuously investing in opening up the power of machine learning to improve customer experience.
First of all, let’s look at the differences between artificial intelligence, machine learning, and deep learning.
- Artificial intelligence is the overarching concept that involves machines that can imitate human behavior.
- Machine learning is a way of achieving AI. It describes the ability to learn without being explicitly programmed. It is a way of training an algorithm so that it can learn how to make decisions.
- Deep learning refers to a specific class of machine learning algorithms, where algorithms are stacked together in “deep” layers. Most commonly, this stacking is done using artificial neural networks.
When looking at how machine learning algorithms can support your efforts in providing a more personalized experience to your customers, it’s important to clarify the different types of objectives that can be achieved.
What Machine Learning Can Do For Ecommerce
There are four major areas where machine learning technology can provide value for ecommerce businesses:
- Scaling out customer understanding, by automatically slicing and dicing customers into valuable segments.
- Predicting customer preferences. This is usually the objective covered by product recommendations, but has the potential to be used in many different ways.
- Predicting customer intent.
- Predicting customer value.
At Coveo-Qubit, we believe there are specific use cases for ecommerce ML and ways to achieve each of these objectives by creating personalized and targeted experiences for consumers.
A great example of predicting customer preferences is our real-time category prediction model, which is based on real-time visitor behavior. It provides us with a list of categories we predict a visitor will most likely prefer to interact with next.
The same model could also be used by ecommerce businesses to generate product recommendations or tailor email campaigns.
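To make the idea of "predicting the next category" concrete, here is a minimal sketch in Python. It is not Coveo-Qubit's model: it simply counts which category tends to follow which in past sessions, and the session data, `fit_transitions` and `predict_next` are all hypothetical names invented for this illustration.

```python
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Count how often each category is followed by each other category."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, current_category, top_n=3):
    """Return the categories most often seen right after the current one."""
    return [c for c, _ in transitions[current_category].most_common(top_n)]

# Hypothetical browsing sessions: the ordered categories each visitor viewed.
sessions = [
    ["shoes", "socks", "bags"],
    ["shoes", "socks"],
    ["shoes", "bags"],
    ["dresses", "shoes", "socks"],
]

model = fit_transitions(sessions)
ranked = predict_next(model, "shoes")
print(ranked)  # ['socks', 'bags']
```

A production model would use far richer signals than the last category viewed, but the output has the same shape: a ranked list of categories for the current visitor.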
Why is Investing in ML Important?
Personalization is about understanding your visitors and using that information to make informed decisions that create a personalized customer experience. The volume of data necessary to make the decisions on how to proceed is vast.
Filtering, analyzing and using this customer data becomes harder and harder, and is increasingly beyond the capacity of humans to process in a timely way. But the data is important, so automation and ML are the next steps in personalization, overseen and directed by marketers and merchandisers.
This opens up possibilities for scale and insight that online retailers have never seen before.
How Do You Train an ML Model?
Training a machine learning model starts by taking a chunk of data, splitting it up, and “feeding” a subset of it to a model. You then compare the model’s output against the real results in the held-back portion of the data, adjust the model, and repeat until its accuracy reaches an acceptable level.
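The split, train and compare steps can be sketched in a few lines of Python. This is a toy illustration rather than real training code: the data is randomly generated, and the "model" is just a single cut-off on pages viewed. Both are assumptions made for the example.

```python
import random

def split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and split them into a training and a held-out set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(threshold, rows):
    """Share of rows where 'pages >= threshold' matches the real outcome."""
    hits = sum((pages >= threshold) == bought for pages, bought in rows)
    return hits / len(rows)

def train(rows):
    """Try every observed page count as the cut-off and keep the best one."""
    candidates = sorted({pages for pages, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))

# Hypothetical data: (pages viewed in a session, did the visitor buy?)
rng = random.Random(42)
data = []
for _ in range(500):
    pages = rng.randint(1, 20)
    bought = pages + rng.gauss(0, 2) > 10  # noisy rule, unknown to the model
    data.append((pages, bought))

train_rows, test_rows = split(data)
threshold = train(train_rows)                   # learn from the training subset
test_accuracy = accuracy(threshold, test_rows)  # compare against held-out data
print(f"threshold={threshold}, held-out accuracy={test_accuracy:.2f}")
```

The key point is that accuracy is measured on rows the model never saw during training, which is what tells you whether it will hold up on new visitors.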
To get the most accurate model, you need three things:
The first requirement is a dataset to train the ML model, one that comprehensively captures the variation in the phenomenon you are trying to model. The more variation, the larger the dataset needs to be to train the model sufficiently. One of the most famous examples is Google’s Inception model for image recognition; it was trained on 10 million images.
For personalization, the dataset is usually user behavior. Because customer behavior is so variable, more data is required.
The second requirement is a set of ML algorithms whose performance has been shown to scale with data. Coveo-Qubit’s data platform enables us to collect all of the user behavior data in a structured way, which allows us to continuously improve our models for our customers.
The third requirement is a regime where the distribution of the data in the training set at least approximately matches the distribution of data in the real world. That is, the data you have on a phenomenon needs to be a fair representation (albeit at a smaller scale) of the patterns in that phenomenon in real life. For example, the data you have about products sold online needs to be an accurate representation of all products actually sold online.
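One simple way to check the third requirement is to compare how often each category appears in the training data and in live traffic. The sketch below uses total variation distance (0 means identical mixes, 1 means no overlap). The category lists and the idea of alerting on a large value are assumptions for illustration only.

```python
from collections import Counter

def shares(items):
    """Turn a list of labels into a {label: fraction of total} mapping."""
    counts = Counter(items)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def total_variation(train_items, live_items):
    """Half the summed absolute difference between the two share tables."""
    p, q = shares(train_items), shares(live_items)
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in labels)

# Hypothetical category mix in the training set versus what visitors view today.
training_categories = ["shoes"] * 50 + ["dresses"] * 30 + ["bags"] * 20
live_categories = ["shoes"] * 20 + ["dresses"] * 30 + ["bags"] * 50

drift = total_variation(training_categories, live_categories)
print(f"distribution drift: {drift:.2f}")  # 0.30
```

A value this far from zero would suggest the training data no longer looks like the real world, and that the model may need retraining on fresher data.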
All together, this means you need enough of the right data, and a model that works with that data on a large scale.
How is Coveo-Qubit Investing in ML?
From our very beginnings, we made the decision to be serious about data. Serious about the infrastructure which enables our customers to collect all the different types of information that were valuable to their businesses. This information has now become the foundation and the fuel for machine learning, in type, structure and volume.
Some of the different types of data we use to feed our machine learning models are:
- Generic user characteristics: device type, operating system name, time of activity
- User behavioral information: time on site, average price of products viewed
- Product catalog characteristics: most popular products, increasingly popular products
- Product information: product name, product price, product description
- User and product interaction: user-product co-occurrence, time on product
The more of this data is available, the more accurate and valuable the output of machine learning can be.
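As a rough sketch of how raw events become model inputs, the Python below collapses one visitor's product views into a handful of features drawn from the data types listed above. The `ProductView` record, its field names and the feature names are hypothetical; they are not Coveo-Qubit's schema.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class ProductView:
    user_id: str
    product_id: str
    price: float
    seconds_on_page: float
    device_type: str

def user_features(views, popular_products):
    """Collapse one user's product views into a flat feature dictionary."""
    return {
        # Generic user characteristic
        "device_type": views[0].device_type,
        # User behavioral information
        "time_on_site": sum(v.seconds_on_page for v in views),
        "avg_price_viewed": mean(v.price for v in views),
        # Product catalog characteristic: share of views on popular products
        "popular_share": sum(v.product_id in popular_products for v in views) / len(views),
        # User and product interaction
        "distinct_products": len({v.product_id for v in views}),
    }

views = [
    ProductView("u1", "p1", 40.0, 30.0, "mobile"),
    ProductView("u1", "p2", 60.0, 90.0, "mobile"),
    ProductView("u1", "p1", 40.0, 20.0, "mobile"),
]
features = user_features(views, popular_products={"p1"})
print(features)
```

Each visitor ends up as one row of numbers and labels, which is the form most ML algorithms expect.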
To make sure we have the right quantity and quality of data, we have collected over eight years’ worth of ecommerce data and taken the burden out of “preprocessing” (the various steps you go through to standardize and clean up the source data) by standardizing our data model across our entire customer base, tweaking the schema for each vertical. This means that any data point we collect, regardless of channel, is of a known type and structure.
Structured, clean, and consistent data is like gold dust for machine learning. It means we can develop a model once, and retrain it across all of our customers without having to start again from scratch, making us faster to build, test and scale our models.
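To show what "a known type and structure" can mean in practice, here is a small Python sketch that checks incoming events against one shared schema. The schema, its field names and the `validate` helper are invented for this example and do not describe Coveo-Qubit's actual data model.

```python
# Hypothetical shared schema: every event must carry these fields and types.
EVENT_SCHEMA = {
    "user_id": str,
    "event_type": str,
    "product_id": str,
    "price": float,
    "timestamp": int,
}

def validate(event, schema=EVENT_SCHEMA):
    """Return a list of problems; an empty list means the event conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            actual = type(event[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems

good = {"user_id": "u1", "event_type": "view", "product_id": "p1",
        "price": 19.99, "timestamp": 1700000000}
bad = {"user_id": "u1", "event_type": "view", "product_id": "p1",
       "price": "19.99"}

print(validate(good))  # []
print(validate(bad))   # ['price: expected float, got str', 'missing field: timestamp']
```

When every customer's data passes the same check, a model built for one retailer can be retrained on another's data without a new round of clean-up.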
Putting ML Models to Work
Data science is no easy feat! To build and train a machine learning model, you need the right volume of data, the right data structure and the right algorithms. Using it effectively means working at scale, because ML is meant for the problems too big or too fast for humans to deal with, such as reaching all of your customers with personalization in real time. And to do that, you need a pipeline that can collect and provide that data at scale and the right infrastructure to harness the power of these models.
Are you an online retailer looking to harness the power of personalization? Take your ecommerce platform to the next level and refine your product recommendations by reading our blog Powerful Personalization in Ecommerce – No Big Data Required.