Recommendations in TensorFlow: Train and Tune on AI Platform

This article is the second part of a multi-part tutorial series that shows you how to implement a machine learning (ML) recommendation system with TensorFlow and AI Platform. In this part, you learn how to train the recommendation system and tune hyperparameters using AI Platform in Google Cloud Platform (GCP).

The series consists of four parts: Part 1 covers building the WALS recommendation model in TensorFlow, this part (Part 2) covers training the model and tuning hyperparameters on AI Platform, Part 3 covers applying the model to data from Google Analytics, and Part 4 covers deploying the recommendation system on GCP.

This tutorial assumes that you have completed the preceding tutorial in the series.

Objectives

  • Learn how to run a training job on AI Platform to train the WALS recommendation model on the MovieLens dataset.
  • Use AI Platform hyperparameter tuning to optimize the TensorFlow WALS recommendation model for the MovieLens dataset.

Costs

This tutorial uses Cloud Storage and AI Platform, which are billable services. You can use the pricing calculator to estimate the costs for your projected usage. The projected cost for this tutorial is $0.20. If you are a new GCP user, you might be eligible for a free trial.

Before you begin

Follow the instructions in Part 1 to set up your GCP project.

Training the model

The first tutorial in this series reviewed the implementation of the WALS algorithm in TensorFlow. This tutorial shows you how to train the model with AI Platform. In this context, "training the model" means factoring a sparse matrix of ratings R into a user factor matrix X and an item factor matrix Y. The resulting factor matrices serve as the base model for a recommendation system.
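
To make the relationship between the factors and the predicted ratings concrete, here is a minimal NumPy sketch that is not part of the tutorial code: the random matrices stand in for the factors that WALS learns, the dimensions match the MovieLens 100k dataset, and K = 5 is the default number of latent factors.

import numpy as np

# Illustrative only: random matrices stand in for the factors learned by WALS.
num_users, num_items, k = 943, 1682, 5    # MovieLens 100k dimensions, K = 5
X = np.random.rand(num_users, k)          # user (row) factors
Y = np.random.rand(num_items, k)          # item (column) factors

# The predicted rating for user u and item i is the dot product of their
# factor vectors; the full matrix of predictions is X times Y transposed.
predicted_ratings = X.dot(Y.T)            # shape (num_users, num_items)

u, i = 0, 10
print(np.isclose(predicted_ratings[u, i], X[u].dot(Y[i])))  # prints True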

You deploy the recommendation system on GCP in Part 4.

Run training jobs on AI Platform

Training a model with AI Platform requires specifying a job directory, which is a folder in a Cloud Storage bucket. To execute a training job, follow these steps:

  1. Create a new Cloud Storage bucket in your project, or use an existing bucket.

    To create a new bucket, in the Cloud Console, select Cloud Storage > Browser, and then click Create Bucket.

    Remember the name you give this bucket. It's a good idea to put the bucket in the same region as your Compute Engine instance.

  2. In your shell, set the environment variable BUCKET to the Cloud Storage bucket URL of the bucket you are using.

    BUCKET=gs://[YOUR_BUCKET_NAME]
  3. Copy the MovieLens datasets to the bucket using the gsutil tool:

    gsutil cp -r data/u.data $BUCKET/data/u.data
    gsutil cp -r data/ratings.dat $BUCKET/data/ratings.dat
    gsutil cp -r data/ratings.csv $BUCKET/data/ratings.csv
  4. Run the job by executing the training script in the wals_ml_engine directory, setting the train option, and specifying the Cloud Storage bucket URL and the path to the data file inside the bucket.

    The dataset you use depends on the one you chose in Part 1. Set additional options appropriate to the data file, such as the delimiter or header:

    cd wals_ml_engine
    • For the MovieLens 100k dataset, specify the path to the 100k data file:

      ./mltrain.sh train ${BUCKET} data/u.data
    • For the 1m dataset, include the --delimiter option and specify the path to the 1m data file:

      ./mltrain.sh train ${BUCKET} data/ratings.dat --delimiter ::
    • For the 20m dataset, use the --delimiter and --headers options:

      ./mltrain.sh train ${BUCKET} data/ratings.csv --delimiter , --headers

    You can monitor the status and output of the job on the Jobs page of the AI Platform section of the Cloud Console. Click Logs to view the job output. Results are logged as a Root Mean Squared Error (RMSE) on the test set. The RMSE represents the average error of the model's predicted user ratings across the entire test set.

Saving the model

After factorization, the factor matrices are saved in five separate files in NumPy format so that they can be used to perform recommendations. Part 3 of the series explains the model files and shows you how to generate recommendations using them. Part 4 shows you how to deploy a production system to perform recommendations. When the model is trained locally, the files are saved in the jobs folder inside the code package. When the model is trained on AI Platform, the files are saved in the Cloud Storage location specified by the job-dir argument of the AI Platform job described in the previous section.
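
As a rough illustration of how the saved factors can be used, the following sketch loads two NumPy files and scores items for a single user. The file names and paths are assumptions made for this example only; Part 3 documents the actual files that the training job writes.

import numpy as np

# Hypothetical file names; see Part 3 for the actual model files.
user_factors = np.load('model/row.npy')   # user (row) factors
item_factors = np.load('model/col.npy')   # item (column) factors

# Predicted ratings for one user are the dot products of that user's factor
# vector with every item factor vector; the highest scores are the best
# candidate recommendations.
user_idx = 0
scores = item_factors.dot(user_factors[user_idx])
top_items = np.argsort(scores)[::-1][:5]  # indices of the five highest-scoring items
print(top_items)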

MovieLens dataset results

The quality of the matrix factorization is evaluated on the predicted ratings for the test set, which was extracted from the ratings matrix during preprocessing. To compute the difference between the predicted ratings and the actual user-supplied test set ratings, use the loss formula outlined in Part 1:

$$ L = \sum_{u,i}(r_{ui} - x^{T}_{u} \cdot y_{i})^{2} $$

Here, \(r_{ui}\) are the test set ratings, and \(x_{u}\) and \(y_{i}\) are the row and column factors computed by applying the WALS factorization to the training set.
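
The RMSE reported by the training job follows directly from this loss: it is the square root of the mean of the squared prediction errors over the N ratings in the test set.

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{u,i}\left(r_{ui} - x^{T}_{u} \cdot y_{i}\right)^{2}} $$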

The performance of the matrix factorization depends heavily on several hyperparameters, which are discussed in more detail in the next section of this document. Using the 1m MovieLens dataset and the default set of hyperparameters listed in table 1, an RMSE of 1.06 was achieved on the test set. The RMSE corresponds to the average error in the predicted ratings compared to the test set: on average, each rating produced by the algorithm is within ±1.06 of the actual user rating in the 1m test set. For example, a user-supplied rating of 3 is likely to generate a predicted rating between 2 and 4, but is unlikely to produce a prediction of 1 or 5. This isn't a bad result, but published results for this dataset achieve an RMSE of less than 1.0.

To improve the result, you must tune the hyperparameters listed in table 1.

Hyperparameter name   Description                                       Default value   Scale
latent_factors        Number of latent factors K                        5               UNIT_REVERSE_LOG_SCALE
regularization        L2 regularization constant                        0.07            UNIT_REVERSE_LOG_SCALE
unobs_weight          Weight on unobserved ratings matrix entries       0.01            UNIT_REVERSE_LOG_SCALE
feature_wt_factor     Weight on observed entries                        130             UNIT_LINEAR_SCALE
feature_wt_exp        Feature weight exponent                           1               UNIT_LOG_SCALE
num_iters             Number of alternating least squares iterations    20              UNIT_LINEAR_SCALE

Table 1. Hyperparameter names and default values used in the model

Tuning hyperparameters

Finding the optimal set of hyperparameters is critical to the performance of machine learning models. Unfortunately, theory provides only scant guidance. Data scientists are forced to optimize by experimenting with values over reasonable ranges, testing the resulting performance of the model, and picking the combination of parameters that performs best. This can be a time-consuming and costly process in terms of both person-hours and computational resources. The space of possible hyperparameter combinations grows exponentially with the number of parameters in the model; for example, a grid search over the six parameters in table 1 with only ten candidate values each would already require a million trials. Searching the entire space isn't feasible, which forces you to make assumptions about which factors affect the model, based on heuristics, prior experience, and knowledge of the mathematical properties of each parameter.

AI Platform includes a hyperparameter tuning feature that automatically searches for an optimal set of hyperparameters. To use it, you provide the list of hyperparameters you want to tune, along with the expected range or set of values for each parameter. AI Platform runs a search over the hyperparameter space, executing as many trials as you specify, and returns a ranked list of the best-performing hyperparameters found across all trials. You can provide the presumed scale of each parameter, log or linear, as an additional hint for the search process.

For more information on hyperparameter tuning, refer to the AI Platform documentation. For more information on the underlying algorithm used for hyperparameter tuning, see the blog post Hyperparameter tuning in Cloud Machine Learning Engine using Bayesian Optimization.

The hyperparameter configuration file

The hyperparameter list for AI Platform can be provided in a JSON or YAML configuration file. In this tutorial's sample code, the hyperparameter tuning configuration is defined in config/config_tune.json. Each hyperparameter tuned in this tutorial is listed in this file, along with its minimum value, maximum value, and scale. For valid parameter scale values, see Overview of Hyperparameter Tuning.

The scaleTier parameter is set to CUSTOM, and the masterType parameter specifies the standard_gpu machine type, so tuning takes place on a GPU-provisioned machine. The configuration file looks like this:

{
  "trainingInput":{
    "scaleTier":"CUSTOM",
    "masterType":"standard_gpu",
    "hyperparameters":{
      "goal":"MINIMIZE",
      "params":[
        {
          "parameterName":"regularization",
          "type":"DOUBLE",
          "minValue":"0.001",
          "maxValue":"10.0",
          "scaleType":"UNIT_REVERSE_LOG_SCALE"
        },
        {
          "parameterName":"latent_factors",
          "type":"INTEGER",
          "minValue":"5",
          "maxValue":"50",
          "scaleType":"UNIT_REVERSE_LOG_SCALE"
        },
        {
          "parameterName":"unobs_weight",
          "type":"DOUBLE",
          "minValue":"0.001",
          "maxValue":"5.0",
          "scaleType":"UNIT_REVERSE_LOG_SCALE"
        },
        {
          "parameterName":"feature_wt_factor",
          "type":"DOUBLE",
          "minValue":"1",
          "maxValue":"200",
          "scaleType":"UNIT_LOG_SCALE"
        }
      ],
      "maxTrials":500
    }
  }
}

Hyperparameter tuning code

The model code includes the following features to allow for tuning hyperparameters:

  • Each hyperparameter is passed as a command-line argument to the hyperparameter tuning job on AI Platform. The task.py file, which serves as the entry point to the job, processes the hyperparameter arguments. You must make sure that the name of each argument matches the corresponding hyperparameter name listed in the hyperparameter configuration file (see the sketch after this list).
  • The model writes a TensorFlow summary with a special tag, training/hptuning/metric, that's set to the metric that evaluates the quality of the model. In this case, RMSE is the test set metric. This summary metric enables the search process of the AI Platform hyperparameter tuning service to rank the trials. The summary metric value is written out in a utility function in util.py:

    # Imports needed by this excerpt; `metric` and `args` are defined by the
    # surrounding util.py code.
    import os

    import tensorflow as tf
    from tensorflow.core.framework.summary_pb2 import Summary

    summary = Summary(value=[Summary.Value(tag='training/hptuning/metric',
                                           simple_value=metric)])

    eval_path = os.path.join(args['output_dir'], 'eval')
    summary_writer = tf.summary.FileWriter(eval_path)

    # Note: adding the summary to the writer is enough for hyperparam tuning.
    # The ml engine system is looking for any summary added with the
    # hyperparam metric tag.
    summary_writer.add_summary(summary)
  • It's important to have a separate output directory for each trial. The output directory is used for writing the TensorFlow summary and the saved model. If you don't create a different output directory for each trial, the results of each trial overwrite the results from the previous trial. Creating a unique directory for each trial is handled in task.py by this code in the parse_arguments method:

    if args.hypertune:
      # if tuning, join the trial number to the output path
      trial = json.loads(os.environ.get('TF_CONFIG', '{}')).get('task', {}).get('trial', '')
      output_dir = os.path.join(job_dir, trial)
    else:
      output_dir = os.path.join(job_dir, args.job_name)

    Here, the parse_arguments function distinguishes between hyperparameter tuning runs and standard training runs, and alters output_dir accordingly.
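
For illustration, the following sketch shows an argparse setup consistent with the configuration file above. It is an assumption about the shape of the argument parsing, not the tutorial's actual task.py code: the argument names mirror the parameterName values in config/config_tune.json, and the defaults mirror table 1, so that AI Platform can inject the value it chooses for each trial.

import argparse

# Illustrative sketch: the argument names must match the parameterName values
# in config/config_tune.json so that AI Platform can pass each trial's values.
parser = argparse.ArgumentParser()
parser.add_argument('--job-dir', default='jobs')
parser.add_argument('--latent_factors', type=int, default=5)
parser.add_argument('--regularization', type=float, default=0.07)
parser.add_argument('--unobs_weight', type=float, default=0.01)
parser.add_argument('--feature_wt_factor', type=float, default=130.0)
parser.add_argument('--hypertune', action='store_true')
args = parser.parse_args()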

Run the hyperparameter tuning job

The following command runs the tuning job on the 100k dataset:

./mltrain.sh tune $BUCKET data/u.data

Make sure that the BUCKET variable is set to the bucket you created earlier. The small size of the 100k dataset allows a large number of trials to be run: in this case, 500.

Results of tuning

The results of hyperparameter tuning are stored in the AI Platform job data, which you can access on the Jobs page of the AI Platform section of the Cloud Console. As figure 1 shows, the job results include the best value of the summary metric (the RMSE) across all trials. For the 500-trial hyperparameter tuning job on the 100k MovieLens dataset, the best result occurred in trial 384.

[Figure: Hyperparameter tuning job results, highlighting the results from trial 384]
Figure 1. Hyperparameter tuning job results

Hyperparameter tuning can make a big difference in the final results. In this case, hyperparameter tuning on the 100k test set achieved an RMSE of 0.98. Applying these parameters to the 1m and 20m datasets resulted in RMSE values of 0.90 and 0.88, respectively. The optimal parameters are listed in table 2, and the RMSE values before and after tuning are summarized in table 3.

Hyperparameter name   Description                  Value from tuning
latent_factors        Latent factors K             34
regularization        L2 regularization constant   9.83
unobs_weight          Unobserved weight            0.001
feature_wt_factor     Observed weight              189.8
feature_wt_exp        Feature weight exponent      N/A
num_iters             Number of iterations         N/A

Table 2. Values discovered by AI Platform hyperparameter tuning

The feature weight exponent was not part of the tuning parameters because the linear observed weight is used for the MovieLens dataset. The default value was used for num_iters, the number of iterations parameter.

Dataset   RMSE with default hyperparameters   RMSE after hyperparameter tuning
100k      1.06                                0.98
1m        1.11                                0.90
20m       1.30                                0.88

Table 3. Summary of RMSE values on the test set for the different MovieLens datasets, before and after hyperparameter tuning

What's next

Part 3 of the series, Recommendations in TensorFlow: Apply to Data from Google Analytics, shows you how to apply the recommendation model to live data from Google Analytics.