Predicting Customer Lifetime Value with AI Platform: introduction

This article is the first part of a four-part series that discusses how you can predict customer lifetime value (CLV) by using AI Platform on Google Cloud.

Many advertisers try to tailor their advertisements to individuals or groups of similar users, but they don't always market to their most valuable customers. The Pareto principle is often cited in reference to sales, predicting that 20% of your customers represent 80% of your sales. What if you could identify which of your customers make up that 20%, not just historically, but in the future as well? Predicting customer lifetime value (CLV) is a way to identify those customers.

The goals of this series are as follows:

  • Explain the concepts of CLV modeling.
  • Compare two approaches to CLV modeling.
  • Show how to implement CLV models on Google Cloud.

This solution compares two different approaches to CLV modeling: probabilistic models and machine learning (ML) models. It provides an implementation of each approach and presents the results when each model is applied to a public dataset. The articles in the series focus on implementing the modeling system on Google Cloud.

When to use this approach

You can use CLV models to answer these types of questions about customers:

  • Number of purchases: How many purchases will the customer make in a given future time range?
  • Lifetime: How much time will pass before the customer becomes permanently inactive?
  • Monetary value: How much monetary value will the customer generate in a given future time range?

When you're predicting future lifetime value, there are two distinct problems that require different data and modeling strategies:

  • Predict the future value for existing customers who have a known transaction history.
  • Predict the future value for new customers who just made their first purchase.

This series is focused on the first problem.

Many companies predict CLVs only by looking at the total monetary amount of sales, without understanding context. For example, a customer who makes one big order might be less valuable than another customer who buys multiple times, but in smaller amounts. CLV modeling can help you better understand the buying profile of your customers and help you value your business more accurately.

By using the approaches described in this series to predict your customers' value, you can prioritize your next actions, such as the following:

  • Decide how much to invest in advertising.
  • Decide which customers to target with advertising.
  • Plan how to move customers from one segment to another.

The models that are used in this series are not suitable for businesses in which you can observe and measure customer churn directly. For example, you shouldn't use these models for businesses that are based on subscriptions, customer accounts, or contracts that can be cancelled. The models in this series instead assume that users engage with the business at will, such as in e-commerce stores in which users might make purchases at any time. In addition, the models that are described in this series are best applied to predict the future value for existing customers who have at least a moderate amount of transaction history.

CLV concepts: RFM

Three important inputs into CLV models are recency, frequency, and monetary value:

  • Recency: When was the customer's last order?
  • Frequency: How often do they buy?
  • Monetary: What amount do they spend?

The following diagram shows a succession of past sales for a set of four customers.

Sales history for 4 customers

The diagram illustrates the RFM values for the customers, showing for each customer:

  • Recency: The time between the last purchase and today, represented by the distance between the rightmost circle and the vertical dotted line that's labeled Now.
  • Frequency: The time between purchases, represented by the distance between the circles on a single line.
  • Monetary: The amount of money spent on each purchase, represented by the size of the circle. This amount could be the average order value or the quantity of products that the customer ordered.

In the models that are used in this series, only historical sales data is used to calculate CLV. The RFM value inputs are calculated from the transaction history for each customer.
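
The following minimal sketch shows one way to derive the three RFM values, as they are defined in the preceding list, from a transaction history. The transaction records, the compute_rfm function name, and the choice of average order value as the monetary measure are assumptions made for illustration; they are not taken from the sample code for this series.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, order_date, order_value).
TRANSACTIONS = [
    ("A", date(2023, 1, 5), 40.0),
    ("A", date(2023, 2, 4), 60.0),
    ("A", date(2023, 3, 6), 50.0),
    ("B", date(2023, 1, 20), 300.0),
]


def compute_rfm(transactions, today):
    """Return {customer_id: (recency_days, days_between_orders, avg_order_value)}."""
    orders = defaultdict(list)
    for customer_id, order_date, value in transactions:
        orders[customer_id].append((order_date, value))

    rfm = {}
    for customer_id, rows in orders.items():
        rows.sort()  # Oldest order first.
        dates = [order_date for order_date, _ in rows]
        values = [value for _, value in rows]

        # Recency: days between the last order and today.
        recency = (today - dates[-1]).days
        # Frequency: average days between consecutive orders.
        # It is undefined (None) for a customer with a single order.
        if len(dates) > 1:
            frequency = (dates[-1] - dates[0]).days / (len(dates) - 1)
        else:
            frequency = None
        # Monetary: average order value (an assumed choice of measure).
        monetary = sum(values) / len(values)

        rfm[customer_id] = (recency, frequency, monetary)
    return rfm


rfm = compute_rfm(TRANSACTIONS, today=date(2023, 4, 1))
print(rfm["A"])  # (26, 30.0, 50.0)
print(rfm["B"])  # (71, None, 300.0)
```

In this made-up data, customer B spent more in total than customer A, but has a single order that is further in the past, which is the kind of context that a total-sales view hides.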

Two modeling methods for CLV

As noted earlier, this series of articles compares two approaches to calculate CLV:

  • Probabilistic models. These models work by fitting a probability distribution to the observed RFM values for customers. These models are based on buying behavior that's defined by the transaction history for each customer. That data is enough to extract RFM values.
  • ML models. These models are an extensive, widely used class of statistical models in which the parameters are fitted to the data by training with stochastic gradient descent. ML models can make use of more features than the probabilistic models. In this series, we use deep neural network (DNN) models, a popular class of ML models. We also show how to use AutoML Tables to automatically create an ML model.

For probabilistic models, this solution uses the existing Lifetimes library, which relies on pandas DataFrames. You could potentially build an equivalent library in TensorFlow to perform the same tasks as the Lifetimes library.

Probabilistic models

Probabilistic models use RFM values that are computed from a list of order transactions. Each transaction consists of a customer ID, an order date, and an order value. Different models are appropriate for modeling different customer relationships. Part 2 of the article series shows how to employ the following models that are available in the Lifetimes library:

  • Pareto/negative binomial distribution (Pareto/NBD)
  • Beta-geometric/negative binomial distribution (BG/NBD)


The Pareto/NBD model was originally developed by Schmittlein et al. This model is used for non-contractual situations in which customers can make purchases at any time. Using four parameters, it describes the rate at which customers make purchases and the rate at which they stop being a customer of the business. You use the model by optimizing the parameters to provide a best fit to a set of RFM data.


The Pareto/NBD model has been widely used but is challenging to implement, because the parameter estimation is computationally intense. To address these concerns, Fader and Hardie developed the BG/NBD model. The BG/NBD model, like the Pareto/NBD model, is used for non-contractual situations in which customers can make purchases at any time. The BG/NBD model also uses four parameters to describe the rate at which customers make purchases and the rate at which they drop out. However, the BG/NBD model is easier to implement than Pareto/NBD and runs faster. The two models tend to yield similar results.
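
To make the four parameters more concrete, the following sketch simulates purchase histories under the assumptions that are commonly stated for the BG/NBD model: while a customer is active, purchases arrive at a customer-specific rate, and after each purchase the customer drops out with a customer-specific probability. The parameter values, the simulate_customer function name, and the weekly time unit are hypothetical and are not taken from this series.

```python
import random


def simulate_customer(r, alpha, a, b, horizon, rng):
    """Simulate repeat-purchase times for one customer, starting at time 0."""
    # Purchase rates differ across customers: gamma with shape r and rate alpha.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    # Dropout probabilities differ across customers: beta(a, b).
    dropout_prob = rng.betavariate(a, b)
    if purchase_rate <= 0.0:
        return []

    purchase_times = []
    t = 0.0
    while True:
        # While the customer is active, the wait until the next
        # purchase is exponentially distributed.
        t += rng.expovariate(purchase_rate)
        if t > horizon:
            break
        purchase_times.append(t)
        # After each purchase, the customer might become inactive for good.
        if rng.random() < dropout_prob:
            break
    return purchase_times


# Hypothetical parameter values; time is measured in weeks.
rng = random.Random(42)
histories = [
    simulate_customer(r=0.25, alpha=4.0, a=0.8, b=2.5, horizon=52.0, rng=rng)
    for _ in range(1000)
]
repeat_counts = [len(history) for history in histories]
print("Customers with no repeat purchase:", repeat_counts.count(0))
```

Fitting the model runs this logic in reverse: given the observed histories, you look for the values of the four parameters that make those histories most likely.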

Calculating CLV with probabilistic models

Using the probabilistic models is a multi-step process. The code in the model performs the following tasks:

  1. Preprocesses the transaction data to calculate RFM values.
  2. Uses the Lifetimes module to optimize the parameters of the appropriate model, either Pareto/NBD or BG/NBD, to fit the RFM data.
  3. Calculates predicted monetary value for each customer.
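
Step 2 in the preceding list amounts to maximizing a likelihood function. As a rough illustration of what is being optimized, the following sketch evaluates the BG/NBD log-likelihood for a single customer, as it is usually written in the literature on that model, and uses it to compare two candidate parameter sets. It follows the convention of that literature, which differs from the RFM definitions earlier in this article: x is the number of repeat purchases, t_x is the time of the last purchase measured from the first purchase, and T is the length of the observation period. The data, the candidate parameters, and the function names are hypothetical, and a real fit uses a numerical optimizer rather than a comparison of two candidates.

```python
import math


def log_beta(x, y):
    """Logarithm of the beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def bgnbd_log_likelihood(params, x, t_x, T):
    """BG/NBD log-likelihood for one customer summary (x, t_x, T)."""
    r, alpha, a, b = params
    common = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha)

    # Term for a customer who is still active at time T.
    still_active = (log_beta(a, b + x) - log_beta(a, b)
                    + common - (r + x) * math.log(alpha + T))
    if x == 0:
        # In this model, a customer can only drop out after a repeat purchase.
        return still_active

    # Term for a customer who dropped out right after the purchase at t_x.
    dropped_out = (log_beta(a + 1, b + x - 1) - log_beta(a, b)
                   + common - (r + x) * math.log(alpha + t_x))

    # Numerically stable log(exp(still_active) + exp(dropped_out)).
    largest = max(still_active, dropped_out)
    return largest + math.log(math.exp(still_active - largest)
                              + math.exp(dropped_out - largest))


def total_log_likelihood(params, customers):
    return sum(bgnbd_log_likelihood(params, x, t_x, T) for x, t_x, T in customers)


# Hypothetical customer summaries (x, t_x, T), in weeks.
CUSTOMERS = [(0, 0.0, 38.0), (2, 30.0, 39.0), (5, 12.0, 37.0), (1, 2.0, 38.5)]
# Hypothetical candidate parameter sets (r, alpha, a, b).
CANDIDATES = [(0.25, 4.0, 0.8, 2.5), (1.0, 1.0, 1.0, 1.0)]

best = max(CANDIDATES, key=lambda params: total_log_likelihood(params, CUSTOMERS))
print("Better-fitting candidate:", best)
```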

Creating a model for the monetary value is complex, because many parameters, such as product prices that change over time, are not represented by transaction data alone. The probabilistic method assumes that monetary value follows a gamma-gamma distribution. The code from the Lifetimes library includes a gamma-gamma distribution method that you can use to compute CLV given a fitted probabilistic model. You will learn more about using the Lifetimes library to generate CLV predictions in Part 2 of the series.
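
The following sketch illustrates how a gamma-gamma model is typically used to estimate a customer's expected average order value: the estimate is a weighted average of the population mean and the customer's own observed average, and the weight on the customer's own data grows with the number of observed orders. The parameter names p, q, and v follow common usage for this model, and the values are hypothetical rather than fitted to any dataset. The formula requires q to be greater than 1.

```python
def expected_average_order_value(p, q, v, frequency, avg_value):
    """Expected average order value for one customer under a gamma-gamma model.

    frequency is the number of observed orders that avg_value is based on.
    """
    if q <= 1:
        raise ValueError("The population mean is only defined for q > 1.")
    # Average order value across the whole customer population.
    population_mean = v * p / (q - 1)
    # More observed orders put more weight on the customer's own average.
    individual_weight = p * frequency / (p * frequency + q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * avg_value


# Hypothetical parameters, for illustration only.
P, Q, V = 6.0, 4.0, 15.0

# With no observed orders, the estimate falls back to the population mean.
print(expected_average_order_value(P, Q, V, frequency=0, avg_value=0.0))   # 30.0
# With two orders that average 50, the estimate moves most of the way to 50.
print(expected_average_order_value(P, Q, V, frequency=2, avg_value=50.0))  # 46.0
```

Multiplying an estimate like this by the number of purchases that the fitted purchase model predicts gives a simple, undiscounted view of a customer's future value.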

Machine learning models

ML models represent a good alternative to probabilistic models. These articles discuss using DNNs, which you implement in TensorFlow by using the Estimator interface.

This solution uses two techniques to improve the performance of the DNNs:

  • Batch normalization, which has these benefits:

    • It normalizes numerical values in features by standardizing each value over the mini-batch. Because batch normalization performs this step, you don't have to do it yourself.
    • It minimizes the impact of changing weights over time, which affects the importance of each feature.
  • Learning rate decay, in which the learning rate is decreased exponentially over time, helping to prevent oscillations in the loss when training gets close to the minimum.
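
As a rough illustration of the normalization step, the following sketch applies the training-time batch normalization computation to one numerical feature across a mini-batch. It is written in plain Python rather than TensorFlow, the feature values are made up, and gamma and beta stand in for the scale and shift parameters that a real layer learns during training.

```python
import math


def batch_normalize(batch, gamma=1.0, beta=0.0, epsilon=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift it."""
    mean = sum(batch) / len(batch)
    variance = sum((value - mean) ** 2 for value in batch) / len(batch)
    # epsilon guards against division by zero when the variance is tiny.
    return [gamma * (value - mean) / math.sqrt(variance + epsilon) + beta
            for value in batch]


# Hypothetical order values on very different scales.
order_values = [10.0, 250.0, 40.0, 1200.0]
normalized = batch_normalize(order_values)

normalized_mean = sum(normalized) / len(normalized)
normalized_variance = sum((v - normalized_mean) ** 2 for v in normalized) / len(normalized)
print(normalized_mean, normalized_variance)  # Approximately 0 and 1.
```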

This set of articles shows two implementations of DNN in TensorFlow:

  1. Canned Estimator DNN. TensorFlow includes pre-implemented DNN models that conform to the Estimator interface. The sample code for this series includes a model based on the pre-implemented DNNRegressor.
  2. Custom Estimator DNN. You get increased flexibility to include advanced techniques in the model by using a custom Estimator. The custom Estimator implementation in the sample code illustrates this flexibility by incorporating learning rate decay and batch normalization.
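
The learning rate decay that the custom Estimator incorporates can be sketched as a simple schedule. The following function uses the usual form of exponential decay, in which the rate is multiplied by a fixed factor every decay_steps training steps. The function name and the values are assumptions for illustration and are not taken from the sample code.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate):
    """Learning rate after `step` training steps under exponential decay."""
    return initial_rate * decay_rate ** (step / decay_steps)


# Hypothetical schedule: halve the learning rate every 1,000 steps.
for step in (0, 1000, 2000, 5000):
    rate = exponential_decay(initial_rate=0.1, step=step,
                             decay_steps=1000, decay_rate=0.5)
    print(step, rate)
```

Early steps use a relatively large learning rate to make fast progress, and later steps use a smaller one so that updates near the minimum of the loss are less likely to overshoot.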

One of the advantages of DNNs is that they can incorporate many features. As a result, their quality often depends on feature engineering. The data preparation section of Part 2 of this series explains how to create features from the available inputs for potentially large datasets.

What's next

Read the next parts of this series, which explain how you can implement CLV prediction.