General guidance on conducting A/B experiments on cloud retail solution

This page describes how you can use A/B experiments to understand how Retail is impacting your business.


An A/B experiment is a randomized experiment with two groups: an experimental group and a control group. The experimental group receives some different treatment (in this case, predictions or search results from the Retail API); the control group does not.

When you run an A/B experiment with the Retail API, you include the information about which group a user was in when you record user events. The Retail API uses that information to refine the model and provide metrics.

Both versions of your application must be the same, except that users in the experimental group see results generated by the Retail API and users in the control group do not. Log user events for both groups.

For more on traffic splitting, see Splitting Traffic in the App Engine documentation.

Experiment platforms

Set up the experiment using a third-party experiment platform such as Google Optimize or Optimizely. The control and experimental groups each get a unique experiment ID from the platform. When you record a user event, specify which group the user is in by including the experiment ID in the experimentIds field. Providing the experiment ID enables the Retail API to compare the metrics for the versions of your application seen by the control and experimental groups.
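As a rough illustration of the paragraph above, the following Python sketch builds a user event payload that carries the experiment ID. It only assembles a JSON-style dictionary and does not call the Retail API. The helper name build_user_event, the experiment ID strings, and the visitor and product IDs are hypothetical placeholders. The field names (eventType, visitorId, eventTime, experimentIds, productDetails) are assumed to match the user event schema, so check them against the user event reference before relying on them.

```python
import json
from datetime import datetime, timezone


def build_user_event(event_type, visitor_id, experiment_id, product_id=None):
    """Assemble a user event dictionary that records the user's experiment group.

    This is an illustrative helper, not part of any client library.
    """
    event = {
        "eventType": event_type,
        "visitorId": visitor_id,
        # RFC 3339 timestamp in UTC.
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # The experiment ID issued by your experiment platform for the
        # group (control or experimental) that this user belongs to.
        "experimentIds": [experiment_id],
    }
    if product_id is not None:
        event["productDetails"] = [{"product": {"id": product_id}}]
    return event


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    event = build_user_event(
        "detail-page-view",
        "visitor-123",
        "exp-retail-treatment",
        product_id="sku-42",
    )
    print(json.dumps(event, indent=2))
```

Events for the control group would be built the same way, with the control group's experiment ID in place of the experimental group's.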

Best practices for A/B experiments

The goal of an A/B experiment is to accurately determine the impact of updating your site (in this case, employing Retail). To get an accurate measure of the impact, you must design and implement the experiment correctly, so that no other differences between the groups skew the results.

To design a meaningful A/B experiment, use the following tips:

  • Before setting up your A/B experiment, use prediction or search preview to ensure that your model is behaving as you expect.

  • Make sure that the behavior of your site is identical for the experimental group and the control group.

    Site behavior includes latency, display format, text format, page layout, image quality, and image size. There should be no discernible differences in any of these attributes between the experiences of the control and experimental groups.

  • Accept and display results as they are returned from Retail, and display them in the same order as they are returned.

    Filtering out items that are out of stock is acceptable. However, you should avoid filtering or ordering results based on your business rules.

  • If you include an attribution token with your user events, make sure it is set up correctly. See the documentation for Attribution tokens.

  • Make sure that the serving config you specify when you request recommendations or search results matches both your intent for those results and the location where you display them.

    When you use Recommendations, the serving config affects how models are trained and therefore what products are recommended.

  • If you are comparing an existing solution with the Retail API, keep the experience of the control group strictly segregated from the experience of the experimental group.

    If the control solution does not provide a recommendation or search result, do not provide one from the Retail API on the control pages. Doing so will skew your test results.

    Make sure your users don't switch between the control group and the experimental group. Keeping each user in one group is especially important within a session, and is also recommended across sessions. Consistent assignment reduces noise in the results and helps you get statistically significant A/B test results sooner.
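Two of the tips above can be sketched in a few lines of Python. The first function shows one possible way to keep a user in the same group across sessions: it hashes a stable visitor ID into a bucket, so the same ID always maps to the same group. The second function filters out-of-stock items while preserving the order in which results were returned. The function names, the experiment name, the 50/50 split, and the is_in_stock callback are illustrative assumptions. A third-party experiment platform normally handles group assignment for you.

```python
import hashlib


def assign_group(visitor_id: str, experiment_name: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a visitor to 'experimental' or 'control'.

    The same visitor_id and experiment_name always produce the same group,
    so a user does not switch groups within or across sessions.
    """
    key = f"{experiment_name}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Reduce the first 8 bytes of the hash to a bucket in [0, 9999].
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "experimental" if bucket < treatment_share * 10_000 else "control"


def in_stock_only(results, is_in_stock):
    """Drop out-of-stock items without reordering the remaining results."""
    return [item for item in results if is_in_stock(item)]


if __name__ == "__main__":
    # Hypothetical visitor ID and experiment name.
    group = assign_group("visitor-123", "retail-recs-homepage")
    print(group)

    # Hypothetical result list and stock lookup.
    returned = ["sku-1", "sku-2", "sku-3", "sku-4"]
    stock = {"sku-1", "sku-3", "sku-4"}
    print(in_stock_only(returned, lambda sku: sku in stock))
```

Including the experiment name in the hash key means a user's group in one experiment does not determine their group in another.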