Multi-Armed Bandit

In the context of machine learning, a “multi-armed bandit” is an algorithm-driven traffic allocation strategy that aims to maximize conversions. Picture each variation of a user experience as an “arm.” The algorithm sends traffic to the arm it believes will produce the best result for each user.

The exploration-exploitation tradeoff

The name “multi-armed bandit” (MAB) has its roots in probability theory. Picture a gambler standing in front of a row of slot machines, also called “one-armed bandits.” The gambler’s objective is to maximize his winnings by deciding which machines to play, how often to play them, and in what order. To do that, he has to balance exploration – trying each machine to build a theory about which one pays out best – with exploitation – repeatedly playing the machine he currently believes is best in order to collect that payout. That’s the exploration-exploitation tradeoff.

To stick with the gambler analogy, think of your web traffic as the coins and your site or app variations as the slot machines. A good machine learning algorithm observes user behavior, then sends each user to the variation it believes has the best chance of converting them.
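
To make the tradeoff concrete, here is a minimal sketch of one classic way to balance exploration and exploitation, the epsilon-greedy strategy. The variation names, the conversion rates, and the epsilon value are made-up assumptions for illustration, not data from any real test.

```python
import random

# Hypothetical "true" conversion rates for three page variations.
# In a real test these are unknown; they are hard-coded here only so the
# sketch can simulate visitor behavior.
TRUE_RATES = {"homepage_1": 0.04, "homepage_2": 0.06, "homepage_3": 0.09}

def epsilon_greedy(visits=10_000, epsilon=0.1):
    """Allocate simulated traffic with an epsilon-greedy bandit."""
    shown = {arm: 0 for arm in TRUE_RATES}        # times each variation was served
    conversions = {arm: 0 for arm in TRUE_RATES}  # conversions per variation

    for _ in range(visits):
        if random.random() < epsilon:
            # Explore: serve a random variation to keep learning about every arm.
            arm = random.choice(list(TRUE_RATES))
        else:
            # Exploit: serve the variation with the best observed conversion rate so far.
            arm = max(TRUE_RATES,
                      key=lambda a: conversions[a] / shown[a] if shown[a] else 0.0)

        shown[arm] += 1
        if random.random() < TRUE_RATES[arm]:  # simulate whether this visitor converts
            conversions[arm] += 1

    return shown, conversions

if __name__ == "__main__":
    shown, conversions = epsilon_greedy()
    for arm in TRUE_RATES:
        rate = conversions[arm] / shown[arm] if shown[arm] else 0.0
        print(f"{arm}: served {shown[arm]} visitors, observed rate {rate:.3f}")
```

Run repeatedly, the best-performing variation ends up receiving most of the traffic, while the small epsilon keeps a trickle of exploratory traffic flowing to the others.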

How is the multi-armed bandit different from A/B testing?

A/B testing typically splits traffic evenly among variations. The goal is to determine which variation performs best after a specified number of visitors or a pre-determined time frame. When there’s a clear winner, that variation becomes the new control.

The drawback of this approach is that you lose potential conversions from all the traffic sent to the losing variations while the test runs.

With the multi-armed bandit, users are sent to the variations the MAB algorithm believes are most effective for each user. Rather than a fixed 50/50 split, traffic allocation shifts toward whatever is converting best. In short, the MAB approach leans toward exploitation, while the A/B testing approach leans toward exploration.
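
The difference can be seen in a small, hypothetical simulation: a classic A/B test splits traffic evenly for the whole test, while a Thompson-sampling bandit (one common MAB algorithm) shifts traffic toward the variation that appears to be winning as evidence accumulates. The two conversion rates below are invented for the sketch.

```python
import random

TRUE_RATES = {"control": 0.05, "variation": 0.08}  # made-up conversion rates

def ab_test_assign(_history):
    """Classic A/B test: ignore past results and split traffic 50/50."""
    return random.choice(list(TRUE_RATES))

def thompson_assign(history):
    """Thompson sampling: draw a plausible conversion rate for each arm from a
    Beta posterior and serve the arm with the highest draw."""
    def sample(arm):
        wins, losses = history[arm]
        return random.betavariate(wins + 1, losses + 1)
    return max(TRUE_RATES, key=sample)

def run(assign, visits=5_000):
    history = {arm: [0, 0] for arm in TRUE_RATES}  # [conversions, non-conversions]
    total_conversions = 0
    for _ in range(visits):
        arm = assign(history)
        converted = random.random() < TRUE_RATES[arm]
        history[arm][0 if converted else 1] += 1
        total_conversions += converted
    return total_conversions

if __name__ == "__main__":
    print("A/B test conversions:        ", run(ab_test_assign))
    print("Thompson bandit conversions: ", run(thompson_assign))
```

Over the same number of visitors, the bandit typically collects more conversions because it stops wasting traffic on the weaker arm once the evidence is clear.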

How does the multi-armed bandit work for personalization?

The multi-armed bandit approach allows you to deliver hyper-personalized experiences to your users. By observing the actions of previous visitors, the MAB algorithm builds a database of “user stereotypes.” Those stereotypes could be based on user actions, attributes, or environmental factors. The algorithm then cues your website to serve the content that has historically worked best for people who fit that stereotype. It might look something like the examples below (a short code sketch of the idea follows the list):

  • Your visitor’s IP address tells the algorithm that they’re from Oklahoma. Historically, people from that region click through to your best-sellers page when they see variation 3 of your homepage. So your site serves them Homepage 3.

  • Your visitor added a pair of running shoes and a sports bra to their cart. Other customers who’ve bought those two items also tend to buy shorts. At checkout, your site shows a variation of an upsell page that recommends adding shorts to the purchase.

  • Your visitor immediately opted in to your email list for a 15% discount on their first purchase. Typically, people who opt in use their discount to buy a higher-priced product. When they navigate to a collections page, they see a variation with expensive products placed at the top.
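
Here is a minimal sketch of that personalization idea: a separate bandit is kept for each user segment (“stereotype”), so the variation served depends on who the visitor appears to be. The segment key, variation names, and epsilon value are illustrative assumptions, not the behavior of any particular product.

```python
import random
from collections import defaultdict

VARIATIONS = ["homepage_1", "homepage_2", "homepage_3"]

class SegmentedBandit:
    """Keeps an independent epsilon-greedy bandit per user segment, so each
    stereotype (e.g. region, cart contents, opt-in status) learns its own winner."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        # stats[segment][variation] = [times shown, conversions]
        self.stats = defaultdict(lambda: {v: [0, 0] for v in VARIATIONS})

    def choose(self, segment):
        arms = self.stats[segment]
        if random.random() < self.epsilon:
            return random.choice(VARIATIONS)  # explore within this segment
        return max(VARIATIONS,
                   key=lambda v: arms[v][1] / arms[v][0] if arms[v][0] else 0.0)

    def record(self, segment, variation, converted):
        self.stats[segment][variation][0] += 1
        if converted:
            self.stats[segment][variation][1] += 1

# Example usage with a made-up segment derived from visitor attributes.
bandit = SegmentedBandit()
segment = "region:oklahoma|cart:running-gear"  # hypothetical stereotype key
variation = bandit.choose(segment)
bandit.record(segment, variation, converted=True)
print(f"Segment {segment!r} was served {variation}")
```

Each segment accumulates its own history, so the variation that wins for one stereotype doesn’t have to win for another.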

Want more info about how you can implement personalization in your ecommerce store? Click here.
