Overview of hyperparameter tuning

This page describes the concepts involved in hyperparameter tuning, which is the automated model enhancer provided by AI Platform Training. Hyperparameter tuning takes advantage of the processing infrastructure of Google Cloud to test different hyperparameter configurations when training your model. It can give you optimized values for hyperparameters, which helps maximize your model's predictive accuracy.

What's a hyperparameter?

Hyperparameters contain the data that govern the training process itself.

Your training application handles three categories of data as it trains your model:

  • Your input data (also called training data) is a collection of individual records (instances) containing the features important to your machine learning problem. This data is used during training to configure your model to accurately make predictions about new instances of similar data. However, the values in your input data never directly become part of your model.

  • Your model's parameters are the variables that your chosen machine learning technique uses to adjust to your data. For example, a deep neural network (DNN) is composed of processing nodes (neurons), each with an operation performed on data as it travels through the network. When your DNN is trained, each node has a weight value that tells your model how much impact it has on the final prediction. Those weights are an example of your model's parameters. In many ways, your model's parameters are the model—they are what distinguishes your particular model from other models of the same type working on similar data.

  • Your hyperparameters are the variables that govern the training process itself. For example, part of setting up a deep neural network is deciding how many hidden layers of nodes to use between the input layer and the output layer, and how many nodes each layer should use. These variables are not directly related to the training data. They are configuration variables. Note that parameters change during a training job, while hyperparameters are usually constant during a job.

Your model parameters are optimized (you could say "tuned") by the training process: you run data through the operations of the model, compare the resulting prediction with the actual value for each data instance, evaluate the accuracy, and adjust until you find the best values. Hyperparameters are tuned by running your whole training job, looking at the aggregate accuracy, and adjusting. In both cases you are modifying the composition of your model in an effort to find the best combination to handle your problem.
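
The distinction between parameters and hyperparameters can be sketched with a toy example. The code below is only an illustration, not part of AI Platform Training: it fits a one-weight model y = w * x with gradient descent. The weight `w` is a parameter that training adjusts, while `learning_rate` and `num_steps` are hyperparameters that stay fixed for the whole run. All names and data are hypothetical.

```python
# Toy illustration: `w` is a model parameter (learned during training);
# `learning_rate` and `num_steps` are hyperparameters (fixed before training).

def train(xs, ys, learning_rate, num_steps):
    w = 0.0  # parameter: updated on every step
    n = len(xs)
    for _ in range(num_steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


def mean_squared_error(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data generated by y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = train(xs, ys, learning_rate=0.01, num_steps=500)
print("learned parameter w:", w)
print("error:", mean_squared_error(w, xs, ys))
```

Changing `learning_rate` or `num_steps` means running `train` again from scratch, which is the kind of adjustment that hyperparameter tuning automates.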

Without an automated technology like AI Platform Training hyperparameter tuning, you need to make manual adjustments to the hyperparameters over the course of many training runs to arrive at the optimal values. Hyperparameter tuning makes the process of determining the best hyperparameter settings easier and less tedious.

How hyperparameter tuning works

Hyperparameter tuning works by running multiple trials in a single training job. Each trial is a complete execution of your training application with values for your chosen hyperparameters, set within limits you specify. The AI Platform Training training service keeps track of the results of each trial and makes adjustments for subsequent trials. When the job is finished, you can get a summary of all the trials along with the most effective configuration of values according to the criteria you specify.

Hyperparameter tuning requires explicit communication between the AI Platform Training training service and your training application. Your training application defines all the information that your model needs. You must define the hyperparameters (variables) that you want to adjust, the range of values to try for each one, and the target metric used to evaluate each trial.
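
The trial loop described above can be sketched as follows. This is a simplified stand-in, not the service's implementation: it picks values at random within the limits instead of using Bayesian optimization, and `run_trial` is a hypothetical placeholder for one complete execution of a training application.

```python
import random


def run_trial(learning_rate):
    """Hypothetical stand-in for one full training run; returns the metric."""
    # Made-up objective that peaks at learning_rate = 0.1.
    return 1.0 - (learning_rate - 0.1) ** 2


def tune(max_trials, min_value, max_value, seed=0):
    """Run several trials within the given limits and track the results."""
    rng = random.Random(seed)
    trials = []
    for trial_id in range(1, max_trials + 1):
        # Assumption: values are drawn uniformly at random. The real service
        # uses the results of earlier trials to choose later values.
        learning_rate = rng.uniform(min_value, max_value)
        metric = run_trial(learning_rate)
        trials.append(
            {"trialId": trial_id, "learning_rate": learning_rate, "metric": metric}
        )
    # Summary: the most effective configuration according to the metric.
    best = max(trials, key=lambda t: t["metric"])
    return trials, best


trials, best = tune(max_trials=10, min_value=0.001, max_value=0.5)
print("best trial:", best)
```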

To learn how AI Platform Training uses Bayesian optimization for hyperparameter tuning, read the blog post named Hyperparameter Tuning in Cloud Machine Learning Engine using Bayesian Optimization.

In addition to Bayesian optimization, AI Platform Training optimizes across hyperparameter tuning jobs. If you are doing hyperparameter tuning against similar models, changing only the objective function or adding a new input column, AI Platform Training is able to improve over time and make the hyperparameter tuning more efficient.

What hyperparameter tuning optimizes

Hyperparameter tuning optimizes a single target variable, also called the hyperparameter metric, that you specify. The accuracy of the model, as calculated from an evaluation pass, is a common metric. The metric must be a numeric value, and you can specify whether you want to tune your model to maximize or minimize your metric.

When you start a job with hyperparameter tuning, you establish the name of your hyperparameter metric. This is the name you assign to the scalar summary that you add to your training application.

The default name of the metric is training/hptuning/metric. We recommend that you assign a custom name. The only functional difference is that if you use a custom name, you must set the hyperparameterMetricTag value in the HyperparameterSpec object in your job request to match your chosen name.
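
As an illustration, the hyperparameter section of a job request with a custom metric name might look like the following sketch, written here as a Python dictionary. The metric name `eval_accuracy`, the parameter, and all values are hypothetical. Fields other than `hyperparameterMetricTag` (such as `goal`, `maxTrials`, and `maxParallelTrials`) reflect our understanding of the HyperparameterSpec object and should be checked against the job reference documentation.

```python
import json

# Hypothetical custom metric name; it must match the name of the scalar
# summary that the training application reports.
hyperparameter_metric_tag = "eval_accuracy"

training_input = {
    "hyperparameters": {
        "goal": "MAXIMIZE",  # or "MINIMIZE"
        "hyperparameterMetricTag": hyperparameter_metric_tag,
        "maxTrials": 20,
        "maxParallelTrials": 2,
        "params": [
            {
                "parameterName": "learning_rate",
                "type": "DOUBLE",
                "minValue": 0.0001,
                "maxValue": 0.1,
                "scaleType": "UNIT_LOG_SCALE",
            }
        ],
    }
}

print(json.dumps(training_input, indent=2))
```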

How AI Platform Training gets your metric

For TensorFlow models, the AI Platform Training service monitors TensorFlow summary events generated by your training application and retrieves the metric. If your model was built with another framework or uses a custom container, you need to use the cloudml-hypertune Python package to report your training metric to AI Platform Training.
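
For the non-TensorFlow case, a minimal sketch of reporting a metric with the cloudml-hypertune package is shown below. The metric name, value, and step are hypothetical. The import is guarded so that the sketch also runs where the package is not installed; in a real training application you would normally import `hypertune` directly.

```python
try:
    import hypertune  # provided by the cloudml-hypertune package
except ImportError:
    hypertune = None  # fallback so the sketch still runs without the package


def report_metric(tag, value, step):
    """Report one metric value; returns True if it was sent through hypertune."""
    if hypertune is None:
        print("hypertune not installed; would report %s=%s at step %s" % (tag, value, step))
        return False
    hpt = hypertune.HyperTune()
    hpt.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag=tag,
        metric_value=value,
        global_step=step,
    )
    return True


# Hypothetical values from an evaluation pass.
reported = report_metric("eval_accuracy", 0.93, 1000)
```

The tag passed here plays the same role as the hyperparameter metric name: it has to match the name configured for the job.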

The flow of hyperparameter values

Without hyperparameter tuning, you can set your hyperparameters by whatever means you like in your training application. For example, you can configure the hyperparameters by passing command-line arguments to your main application module, or feed them to your application in a configuration file.

When you use hyperparameter tuning, you must use the following procedure to set the values of the hyperparameters that you're using for tuning:

  • Define a command-line argument in your main training module for each tuned hyperparameter.

  • Use the value passed in those arguments to set the corresponding hyperparameter in your application's TensorFlow code.

When you configure a training job with hyperparameter tuning, you define each hyperparameter to tune, its type, and the range of values to try. You identify each hyperparameter using the same name as the corresponding argument you defined in your main module. The training service includes command-line arguments using these names when it runs your application.
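
The procedure above can be sketched with Python's standard `argparse` module. The argument names `--learning_rate` and `--hidden_units` are hypothetical; they stand for whatever hyperparameter names you configure in the job. Using `parse_known_args` to ignore any other arguments passed to the application is a design choice of this sketch, not a documented requirement.

```python
import argparse


def parse_args(argv=None):
    """Define one command-line argument per tuned hyperparameter."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--hidden_units", type=int, default=64)
    # Ignore arguments this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments that a single trial might receive.
args = parse_args(["--learning_rate", "0.05", "--hidden_units", "128"])

# The parsed values would then be used to configure the model and training loop.
print("learning_rate:", args.learning_rate)
print("hidden_units:", args.hidden_units)
```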

Selecting hyperparameters to tune

There is very little universal advice to give about how to choose which hyperparameters you should tune. If you have experience with the machine learning technique that you're using, you may have insight into how its hyperparameters behave. You may also be able to find advice from machine learning communities.

However you choose them, it's important to understand the implications. Every hyperparameter that you choose to tune has the potential to increase the number of trials required for a successful tuning job. When you train on AI Platform Training you are charged for the duration of the job; a careful choice of hyperparameters to tune can reduce the time and cost of training your model.

Hyperparameter types

The supported hyperparameter types are listed in the job reference documentation. In the ParameterSpec object, you specify the type for each hyperparameter and the related value ranges as described in the following table:

Type          Value ranges           Value data
DOUBLE        minValue & maxValue    Floating-point values
INTEGER       minValue & maxValue    Integer values
CATEGORICAL   categoricalValues      List of category strings
DISCRETE      discreteValues         List of values in ascending order
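
As an illustration of the table, the sketch below defines one hypothetical parameter of each type as a Python dictionary and checks it against the fields listed above. The `parameterName` key and the parameter names themselves are assumptions based on our understanding of the ParameterSpec object. The `check_param` helper is purely illustrative and is not part of any AI Platform Training library.

```python
# Fields that each type needs, following the table above.
REQUIRED_FIELDS = {
    "DOUBLE": ("minValue", "maxValue"),
    "INTEGER": ("minValue", "maxValue"),
    "CATEGORICAL": ("categoricalValues",),
    "DISCRETE": ("discreteValues",),
}

# Hypothetical parameters, one per type.
params = [
    {"parameterName": "learning_rate", "type": "DOUBLE", "minValue": 0.0001, "maxValue": 0.1},
    {"parameterName": "hidden_layers", "type": "INTEGER", "minValue": 1, "maxValue": 5},
    {"parameterName": "optimizer", "type": "CATEGORICAL", "categoricalValues": ["sgd", "adam"]},
    {"parameterName": "batch_size", "type": "DISCRETE", "discreteValues": [32, 64, 128]},
]


def check_param(spec):
    """Return a list of problems found in one parameter definition."""
    required = REQUIRED_FIELDS.get(spec.get("type"))
    if required is None:
        return ["unknown type: %r" % spec.get("type")]
    problems = ["missing " + field for field in required if field not in spec]
    if problems:
        return problems
    if spec["type"] in ("DOUBLE", "INTEGER") and spec["minValue"] > spec["maxValue"]:
        problems.append("minValue exceeds maxValue")
    if spec["type"] == "DISCRETE" and spec["discreteValues"] != sorted(spec["discreteValues"]):
        problems.append("discreteValues must be in ascending order")
    return problems


for spec in params:
    print(spec["parameterName"], check_param(spec) or "ok")
```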

Hyperparameter scaling

You can specify a type of scaling to be performed on a hyperparameter. Scaling is recommended for DOUBLE and INTEGER types. The available scaling types are:

  • UNIT_LINEAR_SCALE
  • UNIT_LOG_SCALE
  • UNIT_REVERSE_LOG_SCALE
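
This page only names the scaling types. The sketch below shows one plausible way such scalings map a value from its range onto the unit interval, as an aid to intuition. The formulas are assumptions based on how linear, logarithmic, and reverse logarithmic scalings are commonly defined, not a statement of what the service computes. The logarithmic variants assume that the whole range is strictly positive.

```python
import math


def unit_linear_scale(x, lo, hi):
    """Map [lo, hi] onto [0, 1] linearly."""
    return (x - lo) / (hi - lo)


def unit_log_scale(x, lo, hi):
    """Map [lo, hi] onto [0, 1] logarithmically (assumes lo > 0)."""
    return (math.log(x) - math.log(lo)) / (math.log(hi) - math.log(lo))


def unit_reverse_log_scale(x, lo, hi):
    """Like the log scale, but spreads out values near the top of the range."""
    return 1.0 - (math.log(hi + lo - x) - math.log(lo)) / (math.log(hi) - math.log(lo))


# Hypothetical learning-rate range.
lo, hi = 0.0001, 0.1
for x in (0.0001, 0.001, 0.01, 0.1):
    print(
        x,
        round(unit_linear_scale(x, lo, hi), 4),
        round(unit_log_scale(x, lo, hi), 4),
        round(unit_reverse_log_scale(x, lo, hi), 4),
    )
```

Under these assumed formulas, a log scale gives each order of magnitude an equal share of the unit interval, which is why it is often chosen for values such as learning rates that span several orders of magnitude.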

Search algorithms

You can specify a search algorithm in the HyperparameterSpec object. If you do not specify an algorithm, your job uses the default AI Platform Training algorithm, which uses the results of earlier trials to guide the parameter search toward an optimal solution and so searches the parameter space more effectively.

Available values are:

  • ALGORITHM_UNSPECIFIED: Results in the same behavior as when you don't specify a search algorithm. AI Platform Training uses a default algorithm, which applies Bayesian optimization to search the space of possible hyperparameter values and find the most effective values for your set of hyperparameters.

  • GRID_SEARCH: A simple grid search within the feasible space. This option is particularly useful if you want to specify a number of trials that is greater than the number of points in the feasible space. In such cases, if you do not specify a grid search, the AI Platform Training default algorithm may generate duplicate suggestions. To use grid search, all parameters must be of type INTEGER, CATEGORICAL, or DISCRETE.

  • RANDOM_SEARCH: A simple random search within the feasible space.
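
The difference between grid search and random search can be sketched in plain Python. This is an illustration of the two techniques in general, not of the service's implementation, and the feasible space below is hypothetical. It also shows why a search that does not enumerate the grid must repeat points once the number of trials exceeds the number of points in the feasible space.

```python
import itertools
import random

# Hypothetical feasible space with only enumerable types.
space = {
    "hidden_layers": [1, 2, 3],    # INTEGER range 1 to 3, enumerated
    "optimizer": ["sgd", "adam"],  # CATEGORICAL
    "batch_size": [32, 64],        # DISCRETE
}


def grid_search(space):
    """Enumerate every point in the feasible space exactly once."""
    names = list(space)
    combos = itertools.product(*(space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def random_search(space, num_trials, seed=0):
    """Draw each trial independently at random from the feasible space."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(num_trials)
    ]


grid = grid_search(space)
print("grid points:", len(grid))  # 3 * 2 * 2 = 12

samples = random_search(space, num_trials=20)
unique = {tuple(sorted(s.items())) for s in samples}
print("random trials:", len(samples), "unique points:", len(unique))
```

With 20 trials over 12 points, the random sample necessarily contains repeated points, whereas the grid visits each point once.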

What's next