Using Hyperparameter Tuning

This page shows you how to use Cloud Machine Learning Engine hyperparameter tuning when training your model. The process involves making some changes to your TensorFlow application code and adding some configuration information when you submit your training job. You can learn more about this feature in the hyperparameter tuning overview in this documentation.

The steps involved in hyperparameter tuning

To use hyperparameter tuning with your trainer, you must perform the following steps:

  1. Decide which hyperparameters you want to tune, and define a name for each.

  2. Change your trainer code to:

    1. Add command-line arguments for each hyperparameter you defined in the previous step.

    2. Use the values passed for those arguments to set the value of the hyperparameters for your training trial.

    3. Add a scalar summary event to your graph's summary writer to report your target value.

    4. Append the hyperparameter tuning trial number to your output path so that your training artifacts don't get overridden by the next trial. (Note: This is only necessary if you are not using the --job-dir ML Engine argument to specify the path to your model.)

  3. Specify the hyperparameters to tune by including a HyperparameterSpec with your training job configuration data.

  4. (optional) Monitor the progress of hyperparameter tuning along with other status checks you do while the job is running.

  5. Check the final job status to see the results of hyperparameter tuning.

Select your tuning hyperparameters

Before you make any changes to your trainer code, consider which hyperparameters have the greatest effect on your target value. Remember that each additional hyperparameter you tune significantly increases the time a tuning job takes, so select the hyperparameters to tune carefully.

Making changes to your TensorFlow training application

You must make three changes to your trainer to use hyperparameter tuning:

  • Set up command-line arguments that the training service uses to set your hyperparameter values.

  • Add your target variable to the summary for your graph.

  • Change your trainer output to create a new subdirectory for each tuning trial. (Note: This is only necessary if you are not using the --job-dir ML Engine argument to specify the path to your model.)

Add command-line arguments for your tuning parameters

Cloud ML Engine sets command-line arguments when it calls your trainer module. Define a name for each hyperparameter argument and parse it in your trainer using whatever argument parser you prefer (typically argparse).

You must use the same argument names when you configure your training job.

Set your hyperparameters to the values received

Once you have command-line arguments that Cloud ML Engine will use to tune your model, you need to assign those values to the hyperparameters in your graph.
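As a minimal sketch of both steps, assuming two hypothetical hyperparameters named hidden1 and learning_rate (your actual names and types will differ), you might parse the arguments and assign the values like this:

```python
import argparse

# Define one command-line argument per tuning hyperparameter.
# The names here (--hidden1, --learning-rate) are examples only;
# they must match the names in your HyperparameterSpec.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--hidden1',
    type=int,
    default=128,
    help='Number of units in the first hidden layer.')
parser.add_argument(
    '--learning-rate',
    type=float,
    default=0.01,
    help='Initial learning rate for the optimizer.')
# Cloud ML Engine also passes --job-dir; accept it even if unused here.
parser.add_argument('--job-dir', default='')
args, _ = parser.parse_known_args()

# Use the parsed values to set the hyperparameters for this trial.
hidden1_units = args.hidden1
learning_rate = args.learning_rate
```

Using parse_known_args (rather than parse_args) lets the trainer ignore any extra arguments the training service passes.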

Add your target variable to the summary for your graph

Cloud ML Engine looks for the value of your target variable when the graph's summary writer is called. You have two options for this name: you can use the default, or you can define your own.

To use the default target variable name

  1. In your trainer, add your variable to the summary writer with the name set to 'training/hptuning/metric'.

  2. When you start the job, do not include the hyperparameterMetricTag member of the HyperparameterSpec object you use for your job's TrainingInput.
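For example, in a TensorFlow 1.x-style trainer, reporting the metric under the default name might look like the following sketch (the constant `loss` here is only a stand-in for your real target tensor):

```python
import tensorflow as tf

tf1 = tf.compat.v1  # this guide targets the TensorFlow 1.x graph API

graph = tf1.Graph()
with graph.as_default():
    loss = tf1.constant(0.5)  # stand-in for your real target tensor
    # Report the tuning metric under the default name that
    # Cloud ML Engine looks for.
    tf1.summary.scalar('training/hptuning/metric', loss)
    summary_op = tf1.summary.merge_all()
```

The resulting summary_op is what you would pass to your summary writer during training.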

To use your own target variable name

  1. In your trainer, add your variable to the summary writer using your desired name.
  2. Set the hyperparameterMetricTag member of the HyperparameterSpec object you use for your job's TrainingInput. The name must exactly match the one you added to your summary writer.

Managing output file locations

Note: If you use the --job-dir argument to tell the training job where to store its model, you can skip this section. The hyperparameter tuning trial number is automatically appended to --job-dir as a subdirectory when it is passed to the training run for that trial.

You should write your trainer to output to a different subdirectory for each hyperparameter tuning trial. If you don't, each trial overwrites the previous one and you lose your data.

We recommend using a base output location with the hyperparameter tuning trial number appended to it. The trial number running on a given replica is stored in the TF_CONFIG environment variable as the trial member of the task object. The following example shows how you might construct an output path in individual replicas of your trainer.

import json
import os

def makeTrialOutputPath(output_path):
    """For a given static output path, returns a path with
    the hyperparameter tuning trial number appended.
    """
    # Get the configuration data from the environment variable.
    env = json.loads(os.environ.get('TF_CONFIG', '{}'))

    # Get the task information.
    taskInfo = env.get('task')

    if taskInfo:
        trial = taskInfo.get('trial', '')
        if trial:
            return os.path.join(output_path, trial)

    return output_path
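You can check this behavior locally by simulating the TF_CONFIG value that the service sets for a trial. The snippet below is a self-contained variant of the helper (renamed make_trial_output_path); the bucket path and trial number '7' are hypothetical:

```python
import json
import os

def make_trial_output_path(output_path):
    """Returns output_path with the tuning trial number appended,
    read from the TF_CONFIG environment variable (if present)."""
    env = json.loads(os.environ.get('TF_CONFIG', '{}'))
    task_info = env.get('task')
    if task_info:
        trial = task_info.get('trial', '')
        if trial:
            return os.path.join(output_path, trial)
    return output_path

# Simulate the environment of trial 7 of a tuning job.
os.environ['TF_CONFIG'] = json.dumps({'task': {'trial': '7'}})
print(make_trial_output_path('gs://my-bucket/output'))  # gs://my-bucket/output/7
```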

Specifying hyperparameter tuning configuration for a training job

With your trainer coded to handle hyperparameter tuning, you must still include the specific configuration to use when you start a training job. Configure your hyperparameter tuning information in a HyperparameterSpec object and add it to your TrainingInput object as the hyperparameters object.

You can get more details about hyperparameter types and values in the concepts page.


Add your hyperparameter configuration information to your configuration YAML file. The following example adds hyperparameter tuning configuration to the example YAML file shown in the training configuration instructions.

  trainingInput:
    scaleTier: CUSTOM
    masterType: complex_model_m
    workerType: complex_model_m
    parameterServerType: large_model
    workerCount: 9
    parameterServerCount: 3
    hyperparameters:
      goal: MAXIMIZE
      maxTrials: 30
      maxParallelTrials: 1
      params:
        - parameterName: hidden1
          type: INTEGER
          minValue: 40
          maxValue: 400
          scaleType: UNIT_LINEAR_SCALE
        - parameterName: numRnnCells
          type: DISCRETE
          discreteValues:
            - 1
            - 2
            - 3
            - 4
        - parameterName: rnnCellType
          type: CATEGORICAL
          categoricalValues:
            - BasicLSTMCell
            - BasicRNNCell
            - GRUCell
            - LSTMCell
            - LayerNormBasicLSTMCell


When configuring your training job in Python code, you make a dictionary representing your HyperparameterSpec and add it to your training input.

The following example assumes that you have already created a TrainingInput dictionary (in this case named training_inputs) as shown in the training job configuration how-to.

# Add hyperparameter tuning to the job config.
hyperparams = {
    'goal': 'MAXIMIZE',
    'maxTrials': 30,
    'maxParallelTrials': 1,
    'params': []}

hyperparams['params'].append({
    'parameterName': 'hidden1',
    'type': 'INTEGER',
    'minValue': 40,
    'maxValue': 400,
    'scaleType': 'UNIT_LINEAR_SCALE'})

hyperparams['params'].append({
    'parameterName': 'numRnnCells',
    'type': 'DISCRETE',
    'discreteValues': [1, 2, 3, 4]})

hyperparams['params'].append({
    'parameterName': 'rnnCellType',
    'type': 'CATEGORICAL',
    'categoricalValues': [
        'BasicLSTMCell',
        'BasicRNNCell',
        'GRUCell',
        'LSTMCell',
        'LayerNormBasicLSTMCell']})
# Add the hyperparameter specification to the training inputs dictionary.
training_inputs['hyperparameters'] = hyperparams

# Build the job spec.
job_spec = {'jobId': my_job_name, 'trainingInput': training_inputs}
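With the job spec built, you submit it through the Cloud ML Engine REST API. This sketch assumes the google-api-python-client library and a hypothetical project ID; credentials and error handling are omitted:

```python
from googleapiclient import discovery

# Hypothetical identifier; substitute your own project.
my_project_id = 'my-project'

# Build a client for the Cloud ML Engine v1 API and create the job.
ml = discovery.build('ml', 'v1')
request = ml.projects().jobs().create(
    body=job_spec,
    parent='projects/{}'.format(my_project_id))
response = request.execute()
```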

Monitoring hyperparameter tuning in progress

You can monitor hyperparameter tuning by getting the detailed status of your running training job.

The TrainingOutput object in the response's Job resource has the following values set during a training job with hyperparameter tuning:

  • isHyperparameterTuningJob set to True.

  • trials is present and contains a list of HyperparameterOutput objects, one per trial.

Getting hyperparameter tuning results

When the training runs are complete, you can get the detailed job status to see the results. The TrainingOutput object in the job resource contains the metrics for all trials, with the metrics for the best-tuned trial identified.

Use the same detailed status request that you do to monitor the job during processing to get this information.

You'll get the results from each trial in the job description. Find the trial that yielded the most desirable value for your target variable. If it meets your standard for success of the model, you can use the hyperparameter values shown for that trial for subsequent runs of your model.
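As a sketch of that selection, given the trials list from a hypothetical TrainingOutput (HyperparameterOutput objects with trialId, hyperparameters, and finalMetric fields; the values below are invented for illustration), you might pick the best trial like this:

```python
# Hypothetical trials list, shaped like TrainingOutput.trials
# in the job description.
trials = [
    {'trialId': '1',
     'hyperparameters': {'hidden1': '80'},
     'finalMetric': {'trainingStep': '1000', 'objectiveValue': 0.892}},
    {'trialId': '2',
     'hyperparameters': {'hidden1': '320'},
     'finalMetric': {'trainingStep': '1000', 'objectiveValue': 0.941}},
    {'trialId': '3',
     'hyperparameters': {'hidden1': '160'},
     'finalMetric': {'trainingStep': '1000', 'objectiveValue': 0.915}},
]

# For a MAXIMIZE goal, the best trial has the highest objective value.
best = max(trials, key=lambda t: t['finalMetric']['objectiveValue'])
print(best['trialId'], best['hyperparameters'])  # 2 {'hidden1': '320'}
```

For a MINIMIZE goal you would use min instead of max.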

Sometimes you will find multiple trials that give identical results for your tuning metric. In such a case, you should determine which of the hyperparameter values are most advantageous by other measures. For example, if you are tuning the number of nodes in a hidden layer and you get identical results when the value is set to 8 that you do when it's set to 20, you should use 8, because more nodes means more processing and cost for no improvement in your model.
