Overview of Hyperparameter Tuning

This page describes hyperparameter tuning, the automated model enhancer provided by Cloud Machine Learning Engine. Hyperparameter tuning takes advantage of the processing infrastructure of Google Cloud Platform to test different hyperparameter configurations when training your model. It can give you optimized values for hyperparameters, which helps maximize your model's predictive accuracy.

What's a hyperparameter?

If you're new to machine learning, you may never have encountered the term hyperparameters before. Your trainer handles three categories of data as it trains your model:

  • Your input data (also called training data) is a collection of individual records (instances) containing the features important to your machine learning problem. This data is used during training to configure your model to accurately make predictions about new instances of similar data. However, the actual values in your input data never directly become part of your model.

  • Your model's parameters are the variables that your chosen machine learning technique uses to adjust to your data. For example, a deep neural network (DNN) is composed of processing nodes (neurons), each with an operation performed on data as it travels through the network. When your DNN is trained, each node has a weight value that tells your model how much impact it has on the final prediction. Those weights are an example of your model's parameters. In many ways, your model's parameters are the model: they are what distinguishes your particular model from other models of the same type working on similar data.

  • If model parameters are variables that get adjusted by training with existing data, your hyperparameters are the variables that govern the training process itself. For example, part of setting up a deep neural network is deciding how many "hidden" layers of nodes to use between the input layer and the output layer, as well as how many nodes each layer should use. These variables are not directly related to the training data at all. They are configuration variables. Another difference is that parameters change during a training job, while hyperparameters usually stay constant during a job.
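To make the distinction concrete, here is a minimal sketch in plain Python (no ML library assumed) that fits a one-feature linear model y = w * x + b by gradient descent. The weights w and b are parameters that training adjusts; learning_rate and num_epochs are hyperparameters that you fix before the run. The function name and the toy data are hypothetical illustrations, not part of Cloud ML Engine.

```python
def train_linear_model(xs, ys, learning_rate, num_epochs):
    """Fit y = w * x + b with batch gradient descent on mean squared error.

    learning_rate and num_epochs are hyperparameters: they configure the
    training process and stay constant for the whole run.
    w and b are parameters: training adjusts them to fit the data.
    """
    w, b = 0.0, 0.0  # parameters start at arbitrary values
    n = len(xs)
    for _ in range(num_epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = train_linear_model(xs, ys, learning_rate=0.05, num_epochs=2000)
print(f"learned parameters: w={w:.3f}, b={b:.3f}")  # w=2.000, b=1.000
```

On this data, the same loop diverges with learning_rate=0.2, even though the data and the model are unchanged. That sensitivity to a value that training never adjusts is why hyperparameters are worth tuning.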

Your model parameters are optimized (you could say "tuned") by the training process: you run data through the operations of the model, compare the resulting prediction with the actual value for each data instance, evaluate the accuracy, and adjust until you find the best values. Hyperparameters are similarly tuned by running your whole training job, looking at the aggregate accuracy, and adjusting. In both cases you are modifying the composition of your model in an effort to find the best combination to handle your problem.

Without an automated technology like Cloud ML Engine hyperparameter tuning, you need to make manual adjustments to the hyperparameters over the course of many training runs to arrive at the optimal values. Hyperparameter tuning makes the process of determining the best hyperparameter settings easier and less tedious.

How it works

Hyperparameter tuning works by running multiple trials in a single training job. Each trial is a complete execution of your training application with values for your chosen hyperparameters set within limits you specify. The Cloud ML Engine training service keeps track of the results of each trial and makes adjustments for subsequent trials. When the job is finished, you can get a summary of all the trials along with the most effective configuration of values according to the criteria you specify.
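The trial loop can be pictured with a small simulation. This is not how the service is implemented: the suggestion step below is plain random sampling, a stand-in that, unlike the service, does not use earlier results, and run_trial is a hypothetical toy objective standing in for a complete execution of your training application.

```python
import random


def run_trial(learning_rate):
    """Hypothetical stand-in for one complete training run.

    Returns a made-up 'accuracy' that peaks at learning_rate = 0.01.
    """
    return 1.0 - abs(learning_rate - 0.01) * 10


def tune(max_trials, min_value, max_value, seed=0):
    """Run trials with values inside [min_value, max_value]; keep the best."""
    rng = random.Random(seed)
    trials = []  # summary of every trial, like the job's list of trials
    for trial_id in range(1, max_trials + 1):
        # Suggest a value within the limits you specified.
        learning_rate = rng.uniform(min_value, max_value)
        metric = run_trial(learning_rate)
        trials.append({"trialId": trial_id,
                       "learning_rate": learning_rate,
                       "metric": metric})
    # The "most effective configuration" under a maximize criterion.
    best = max(trials, key=lambda t: t["metric"])
    return trials, best


trials, best = tune(max_trials=20, min_value=0.001, max_value=0.1)
print(len(trials), "trials; best:", best)
```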

Hyperparameter tuning requires explicit communication between the Cloud ML Engine training service and your training application. You define all the information that your model needs in your training application. The best way to think about this interaction is that you define the hyperparameters (variables) that you want to adjust and you define a target value.

To learn more about how Bayesian optimization is used for hyperparameter tuning in Cloud ML Engine, read the August 2017 Google Cloud Big Data and Machine Learning Blog post named Hyperparameter Tuning in Cloud Machine Learning Engine using Bayesian Optimization.

In addition to Bayesian optimization, Cloud ML Engine optimizes across hyperparameter tuning jobs. If you are doing hyperparameter tuning against similar models, changing only the objective function or adding a new input column, Cloud ML Engine is able to improve over time and make the hyperparameter tuning more efficient.

What it optimizes

Hyperparameter tuning optimizes a single target variable (also called the hyperparameter metric) that you specify. The accuracy of the model, as calculated from an evaluation pass, is a common metric. The metric must be a numeric value, and you can specify whether you want to tune your model to maximize or minimize your metric.

When you start a job with hyperparameter tuning, you establish the name of your hyperparameter metric. This is the name you assign to the scalar summary that you add to your trainer. You can use a custom name if you want, or you can use the default name of training/hptuning/metric. The only functional difference is that if you use a custom name, you must set the hyperparameterMetricTag value in the HyperparameterSpec object you use in your job request to match your chosen name.
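As an illustration, the two choices described above (the optimization goal and the metric name) might look like this as a fragment of a HyperparameterSpec written as a Python dict. The custom tag "my_accuracy" is hypothetical, and the two helper functions only mimic the selection logic; they are not service code.

```python
# Fragment of a HyperparameterSpec. The custom tag must match the tag of
# the scalar summary your trainer writes ("my_accuracy" is hypothetical).
hyperparameter_spec = {
    "goal": "MAXIMIZE",                        # or "MINIMIZE"
    "hyperparameterMetricTag": "my_accuracy",
}

DEFAULT_METRIC_TAG = "training/hptuning/metric"


def metric_tag(spec):
    """Return the metric name in effect: the custom tag, else the default."""
    return spec.get("hyperparameterMetricTag", DEFAULT_METRIC_TAG)


def best_trial(trials, goal):
    """Pick the trial whose metric is best under the given goal."""
    if goal == "MAXIMIZE":
        return max(trials, key=lambda t: t["metric"])
    if goal == "MINIMIZE":
        return min(trials, key=lambda t: t["metric"])
    raise ValueError(f"unknown goal: {goal}")


# Made-up trial results for illustration.
trials = [{"trialId": 1, "metric": 0.81},
          {"trialId": 2, "metric": 0.87},
          {"trialId": 3, "metric": 0.84}]
print(metric_tag(hyperparameter_spec),
      best_trial(trials, hyperparameter_spec["goal"])["trialId"])  # my_accuracy 2
```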

How Cloud ML Engine gets your metric

You may notice that there are no instructions in this documentation for passing your hyperparameter metric to the Cloud ML Engine training service. That's because the service automatically monitors TensorFlow summary events generated by your trainer and retrieves the metric.

The flow of hyperparameter values

Without hyperparameter tuning, you can set your hyperparameters by whatever means you like in your trainer. You might configure them according to command-line arguments to your main application module, or feed them to your application in a configuration file, for example. When you use hyperparameter tuning, you must set the values of the hyperparameters that you're using for tuning with a specific procedure:

  • Define a command-line argument for your main trainer module for each tuned hyperparameter.

  • Use the value passed for those arguments to set the corresponding hyperparameter in your trainer's TensorFlow code.

When you configure a training job with hyperparameter tuning, you define each hyperparameter to tune, its type, and the range of values to try. You identify each hyperparameter using exactly the same name as the corresponding argument you defined in your main module. The training service includes command-line arguments using these names when it runs your trainer.
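A minimal sketch of the two steps above using Python's standard argparse module. The argument names are hypothetical and must match the parameter names in your job configuration; the argument list passed to parse_args simulates what one trial's command line might contain, and the final dict is a placeholder for your TensorFlow model setup.

```python
import argparse


def parse_args(argv=None):
    """Define one command-line argument per tuned hyperparameter.

    The names (learning_rate, hidden_layers, optimizer) are hypothetical;
    each must exactly match a parameter name in your job configuration.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--hidden_layers", type=int, default=2)
    parser.add_argument("--optimizer", type=str, default="adam")
    # parse_known_args ignores any other arguments passed to the module
    # instead of exiting with an error.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulated arguments for one trial.
args = parse_args(["--learning_rate=0.003", "--hidden_layers=4",
                   "--optimizer=sgd"])

# Use the parsed values to configure the model (placeholder for your
# TensorFlow code).
model_config = {
    "learning_rate": args.learning_rate,
    "hidden_units": [64] * args.hidden_layers,
    "optimizer": args.optimizer,
}
print(model_config)
```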

Selecting hyperparameters to tune

There is very little universal advice to give about how to choose which hyperparameters you should tune. If you have experience with the machine learning technique that you're using, you may have insight into how its hyperparameters behave. You may also be able to find advice from machine learning communities.

However you choose them, it's important to understand the implications. Every hyperparameter that you choose to tune has the potential to increase the number of trials required for a successful tuning job. When you train on Cloud ML Engine you are charged for the duration of the job; a careful choice of hyperparameters to tune can reduce the time and cost of training your model.

Hyperparameter types

The supported hyperparameter types are listed in the job data reference page. The type you specify in your ParameterSpec object determines which value members you should use. These relationships are summarized in this table:

Type          Value members           Value data
DOUBLE        minValue & maxValue     Floating-point values
INTEGER       minValue & maxValue     Integer values
CATEGORICAL   categoricalValues       List of category strings
DISCRETE      discreteValues          List of values in ascending order
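The table can be read as a set of rules. The sketch below writes one hypothetical ParameterSpec per type as a Python dict (parameter names and ranges are made up) and checks each spec against the value members the table assigns to its type. The checker is an illustrative helper, not the service's validation logic.

```python
# One hypothetical ParameterSpec per type, mirroring the fields in the table.
param_specs = [
    {"parameterName": "learning_rate", "type": "DOUBLE",
     "minValue": 0.0001, "maxValue": 0.1},
    {"parameterName": "hidden_layers", "type": "INTEGER",
     "minValue": 1, "maxValue": 5},
    {"parameterName": "optimizer", "type": "CATEGORICAL",
     "categoricalValues": ["adam", "sgd", "adagrad"]},
    {"parameterName": "batch_size", "type": "DISCRETE",
     "discreteValues": [32, 64, 128, 256]},
]

# Which value members each type uses, per the table.
REQUIRED_MEMBERS = {
    "DOUBLE": {"minValue", "maxValue"},
    "INTEGER": {"minValue", "maxValue"},
    "CATEGORICAL": {"categoricalValues"},
    "DISCRETE": {"discreteValues"},
}
ALL_MEMBERS = set().union(*REQUIRED_MEMBERS.values())


def check_spec(spec):
    """Return a list of problems with one ParameterSpec (empty if none)."""
    ptype = spec.get("type")
    if ptype not in REQUIRED_MEMBERS:
        return [f"unsupported type: {ptype}"]
    problems = []
    present = ALL_MEMBERS & set(spec)
    required = REQUIRED_MEMBERS[ptype]
    if present != required:
        problems.append(
            f"{ptype} expects {sorted(required)}, got {sorted(present)}")
    if ptype == "DISCRETE":
        values = spec.get("discreteValues", [])
        if values != sorted(values):
            problems.append("discreteValues must be in ascending order")
    return problems


for spec in param_specs:
    print(spec["parameterName"], check_spec(spec) or "ok")
```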

Search algorithms

You can specify a search algorithm in the HyperparameterSpec object. If you do not specify an algorithm, your job uses the default Cloud ML Engine algorithm, which uses the results of earlier trials to guide the search toward the optimal solution, making its search over the parameter space more effective than a simple grid or random search.

Available values are:

  • GRID_SEARCH: A simple grid search within the feasible space. This option is particularly useful if you want to specify a number of trials that is more than the number of points in the feasible space. In such cases, if you do not specify a grid search, the Cloud ML Engine default algorithm may generate duplicate suggestions. To use grid search, all parameters must be of type INTEGER, CATEGORICAL, or DISCRETE.

  • RANDOM_SEARCH: A simple random search within the feasible space.
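A sketch of what the two simple strategies do, assuming a point in the feasible space is a dict of parameter values and reusing the ParameterSpec field names from the table above. It also shows why grid search is limited to INTEGER, CATEGORICAL, and DISCRETE: a DOUBLE range has no finite list of points to enumerate. The functions are illustrative, not the service's implementation.

```python
import itertools
import random


def grid_values(spec):
    """List every feasible value of one ParameterSpec for a grid search."""
    ptype = spec["type"]
    if ptype == "INTEGER":
        return list(range(spec["minValue"], spec["maxValue"] + 1))
    if ptype == "CATEGORICAL":
        return list(spec["categoricalValues"])
    if ptype == "DISCRETE":
        return list(spec["discreteValues"])
    # A DOUBLE range has no finite set of points to enumerate.
    raise ValueError(f"grid search does not support type {ptype}")


def grid_search_points(specs):
    """Cartesian product of every parameter's feasible values."""
    names = [s["parameterName"] for s in specs]
    value_lists = [grid_values(s) for s in specs]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def random_search_point(specs, rng):
    """Draw one point uniformly at random from the feasible space."""
    point = {}
    for s in specs:
        name = s["parameterName"]
        if s["type"] == "DOUBLE":
            point[name] = rng.uniform(s["minValue"], s["maxValue"])
        elif s["type"] == "INTEGER":
            point[name] = rng.randint(s["minValue"], s["maxValue"])
        else:
            point[name] = rng.choice(grid_values(s))
    return point


specs = [
    {"parameterName": "hidden_layers", "type": "INTEGER",
     "minValue": 1, "maxValue": 3},
    {"parameterName": "optimizer", "type": "CATEGORICAL",
     "categoricalValues": ["adam", "sgd"]},
]
points = grid_search_points(specs)
print(len(points))  # 6
print(points[0])    # {'hidden_layers': 1, 'optimizer': 'adam'}
print(random_search_point(specs, random.Random(0)))
```

This feasible space has only six points, so any trial beyond the sixth cannot test a new combination.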

Hyperparameter scaling

You can specify a type of scaling to be performed on a hyperparameter. Scaling is recommended for DOUBLE and INTEGER types. The available scaling types are:


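To see why scaling matters for a range such as a learning rate spanning two orders of magnitude, the sketch below maps a normalized point u in [0, 1] to a parameter value under linear and logarithmic scaling. UNIT_LINEAR_SCALE and UNIT_LOG_SCALE are values of the scaleType field of a ParameterSpec; the formulas are the standard definitions of linear and log scaling, used here as an assumption rather than a description of the service's internals. Reverse log scaling is left out of this sketch.

```python
import math


def unit_to_value(u, min_value, max_value, scale_type):
    """Map a point u in [0, 1] back to the parameter's feasible range.

    UNIT_LINEAR_SCALE spaces values evenly; UNIT_LOG_SCALE spaces them
    evenly in log space, so it needs a strictly positive range.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must be in [0, 1]")
    if scale_type == "UNIT_LINEAR_SCALE":
        return min_value + u * (max_value - min_value)
    if scale_type == "UNIT_LOG_SCALE":
        if min_value <= 0:
            raise ValueError("log scaling needs a strictly positive range")
        return math.exp(math.log(min_value)
                        + u * (math.log(max_value) - math.log(min_value)))
    raise ValueError(f"unsupported scale type in this sketch: {scale_type}")


# Midpoint of a learning-rate range under each scaling.
print(unit_to_value(0.5, 1e-4, 1e-2, "UNIT_LINEAR_SCALE"))  # about 0.00505
print(unit_to_value(0.5, 1e-4, 1e-2, "UNIT_LOG_SCALE"))     # about 0.001
```

With linear scaling, only about 1% of the range 0.0001 to 0.01 lies below 0.0002, so small learning rates are rarely tried. Log scaling gives each decade of the range equal weight.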
Setting a limit to the number of trials

You should decide how many trials you want to allow the service to run and set the maxTrials value of the HyperparameterSpec object in your job request. There are two competing interests to consider when deciding how many trials to allow: time (and consequently cost) and accuracy. Increasing the number of trials generally yields better results, but it is not always so. In most cases there is a point of diminishing returns after which additional trials have little or no effect on the accuracy. It may be best to start with a small number of trials to gauge the effect your chosen hyperparameters have on your model's accuracy before starting a job with a large number of trials.

To get the most out of hyperparameter tuning, you shouldn't set maxTrials lower than ten times the number of hyperparameters you tune.
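The rule of thumb above is simple arithmetic; a small hypothetical helper can flag a job configuration that falls short of it before you submit the job. The function names and the params list are illustrative assumptions.

```python
def suggested_min_trials(num_hyperparameters):
    """Rule of thumb from this page: at least 10 trials per tuned hyperparameter."""
    return 10 * num_hyperparameters


def check_max_trials(max_trials, params):
    """Return a warning string if maxTrials is below the rule-of-thumb floor."""
    floor = suggested_min_trials(len(params))
    if max_trials < floor:
        return (f"maxTrials={max_trials} is below the suggested {floor} "
                f"for {len(params)} hyperparameters")
    return None


params = ["learning_rate", "hidden_layers", "optimizer"]  # hypothetical names
print(check_max_trials(20, params))  # warns: the suggested floor is 30
print(check_max_trials(40, params))  # None
```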

Understanding parallel trials

You can specify a number of trials to run in parallel (maxParallelTrials) as part of the HyperparameterSpec object in your job request. Running parallel trials has the benefit of reducing the time the training job takes (elapsed real time; the total processing time required is not typically changed). However, running in parallel can reduce the effectiveness of the tuning job overall. That is because hyperparameter tuning uses the results of previous trials to inform the values to assign to the hyperparameters of subsequent trials. When running in parallel, some trials begin without the benefit of the results of trials that are still running.
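The trade-off can be quantified under a strong simplifying assumption: every trial takes the same time, so trials start in waves of maxParallelTrials. The sketch below counts, for each trial, how many finished results it could draw on, and computes the elapsed time. Both functions are hypothetical models, not measurements of the service.

```python
import math


def completed_results_available(max_trials, max_parallel_trials):
    """For each trial, count finished trials whose results it can use.

    Assumes equal-duration trials started in waves of max_parallel_trials;
    a trial only sees results from earlier waves.
    """
    return [(i // max_parallel_trials) * max_parallel_trials
            for i in range(max_trials)]


def wall_clock_time(max_trials, max_parallel_trials, trial_duration):
    """Elapsed time under the same equal-duration assumption."""
    waves = math.ceil(max_trials / max_parallel_trials)
    return waves * trial_duration


print(completed_results_available(8, 1))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(completed_results_available(8, 4))  # [0, 0, 0, 0, 4, 4, 4, 4]
print(wall_clock_time(8, 1, 2.0), wall_clock_time(8, 4, 2.0))  # 16.0 4.0
```

In both cases the total processing time is 8 trials of 2.0 hours each, or 16.0 hours, but with four parallel trials half of the trials start with no earlier results to learn from.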

If you use parallel trials, the training service provisions multiple training processing clusters (or multiple individual machines in the case of a single-process trainer). The scale tier that you set for your job is used for each individual training cluster.

Stopping trials early

You can specify that Cloud Machine Learning Engine automatically stop a trial that has become clearly unpromising. This saves you the cost of continuing a trial that is unlikely to be useful.

To permit stopping a trial early, set the enableTrialEarlyStopping value of the HyperparameterSpec to TRUE in your job request.
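Putting the settings on this page together, a job request body with tuning enabled might look like the following, built as a Python dict and serialized to JSON. The jobId, Cloud Storage path, module name, region, and parameter names are placeholders; the HyperparameterSpec and ParameterSpec field names are the ones discussed above. Treat it as a sketch to adapt, not a complete or verified request.

```python
import json

# Hypothetical job request body; every value in trainingInput outside the
# "hyperparameters" section is a placeholder.
job_request = {
    "jobId": "my_tuning_job_1",
    "trainingInput": {
        "scaleTier": "BASIC",
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
        "pythonModule": "trainer.task",
        "region": "us-central1",
        "hyperparameters": {
            "goal": "MAXIMIZE",
            "hyperparameterMetricTag": "training/hptuning/metric",
            "maxTrials": 30,  # at least 10 x the 2 tuned hyperparameters
            "maxParallelTrials": 2,
            "enableTrialEarlyStopping": True,
            # No "algorithm" field, so the default algorithm is used.
            "params": [
                {"parameterName": "learning_rate", "type": "DOUBLE",
                 "minValue": 0.0001, "maxValue": 0.1,
                 "scaleType": "UNIT_LOG_SCALE"},
                {"parameterName": "hidden_layers", "type": "INTEGER",
                 "minValue": 1, "maxValue": 3,
                 "scaleType": "UNIT_LINEAR_SCALE"},
            ],
        },
    },
}

# json.dumps writes Python True as the JSON literal true.
print(json.dumps(job_request, indent=2))
```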

Resuming completed jobs

You can resume a completed hyperparameter tuning job. This makes it possible to reuse the knowledge gained in the previous hyperparameter tuning job and start from a state that is partially optimized.

To resume a hyperparameter tuning job, submit a new hyperparameter tuning job, set the resumePreviousJobId value of the HyperparameterSpec object to the job ID of the previous job, and specify maxTrials and maxParallelTrials values.

Cloud Machine Learning Engine then uses the previous job ID to find and reuse the same goal, params, and hyperparameterMetricTag values to continue the hyperparameter tuning job.
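Because the service reuses goal, params, and hyperparameterMetricTag from the previous job, the HyperparameterSpec for a resumed job can be short. A minimal sketch, assuming the hypothetical earlier job ID "my_tuning_job_1"; the helper function is illustrative only.

```python
def resume_spec(previous_job_id, max_trials, max_parallel_trials):
    """Build the HyperparameterSpec for a job that resumes a completed one.

    goal, params, and hyperparameterMetricTag are reused from the previous
    job, so only the job ID and the trial limits are set here.
    """
    return {
        "resumePreviousJobId": previous_job_id,
        "maxTrials": max_trials,
        "maxParallelTrials": max_parallel_trials,
    }


# "my_tuning_job_1" is a hypothetical ID of an earlier, completed job.
spec = resume_spec("my_tuning_job_1", max_trials=10, max_parallel_trials=2)
print(spec)
```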

Using a consistent hyperparameterMetricTag name and consistent params for similar jobs, even when the jobs differ in some of their parameters, makes it possible for Cloud Machine Learning Engine to improve optimization over time.
