Train a machine learning model with TensorFlow 2 on AI Platform Training by using runtime version 2.1 or later. TensorFlow 2 simplifies many APIs from TensorFlow 1. The TensorFlow documentation provides a guide to migrating TensorFlow 1 code to TensorFlow 2.
Running a training job with TensorFlow 2 on AI Platform Training follows the same process as running other custom code training jobs. However, some AI Platform Training features work differently with TensorFlow 2 compared to how they work with TensorFlow 1. This document provides a summary of these differences.
Python version support
Runtime versions 2.1 and later only support training with Python 3.7. Therefore, you must use Python 3.7 to train with TensorFlow 2.
The Python Software Foundation ended support for Python 2.7 on January 1, 2020. No AI Platform runtime versions released after January 1, 2020 support Python 2.7.
Distributed training
TensorFlow 2 provides an updated API for distributed training. Additionally, AI Platform Training sets the TF_CONFIG environment variable differently in runtime versions 2.1 and later. This section describes both changes.
To perform distributed training with multiple virtual machine (VM) instances in TensorFlow 2, use the tf.distribute.Strategy API. In particular, we recommend that you use the Keras API together with the MultiWorkerMirroredStrategy or, if you specify parameter servers for your job, the ParameterServerStrategy. However, note that TensorFlow currently only provides experimental support for these strategies.
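As a minimal sketch of this pattern (the model architecture and layer sizes here are illustrative, not prescribed by AI Platform Training): create the strategy at the start of your program, then build and compile the Keras model inside the strategy's scope so that variables are replicated across workers. In runtime version 2.1 the strategy lives under tf.distribute.experimental; later TensorFlow versions also expose it as tf.distribute.MultiWorkerMirroredStrategy.

```python
import tensorflow as tf

# Create the strategy before other TensorFlow operations. On AI Platform
# Training it reads the TF_CONFIG environment variable set on each VM;
# run locally without TF_CONFIG, it falls back to a single worker.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

# Variable creation must happen inside the strategy's scope so that the
# model's weights are mirrored across all workers.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# model.fit(train_dataset, epochs=5)  # train_dataset: your tf.data.Dataset
```

Training then proceeds with the usual model.fit call; the strategy handles gradient aggregation across workers.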
TensorFlow expects a TF_CONFIG environment variable to be set on each VM used for training. AI Platform Training automatically sets this environment variable on each VM used in your training job. This lets each VM behave differently depending on its type, and it helps the VMs communicate with each other.
In runtime versions 2.1 and later, AI Platform Training no longer uses the master task type in any TF_CONFIG environment variables. Instead, your training job's master worker is labeled with the chief task type in the TF_CONFIG environment variable. Learn more about how AI Platform Training sets the TF_CONFIG environment variable.
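To illustrate, here is a small sketch of reading the task type and index from TF_CONFIG; the cluster addresses in the sample value are made up for the example, not copied from a real job:

```python
import json
import os

def get_task_info():
    """Return (task_type, task_index) parsed from TF_CONFIG.

    AI Platform Training sets TF_CONFIG on every VM; in runtime version
    2.1 and later the master worker appears with the "chief" task type.
    """
    tf_config = json.loads(os.environ.get("TF_CONFIG", "{}"))
    task = tf_config.get("task", {})
    return task.get("type", "chief"), task.get("index", 0)

# Sample TF_CONFIG with the shape AI Platform Training uses
# (hostnames here are illustrative):
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "chief": ["chief-host-0:2222"],
        "worker": ["worker-host-0:2222", "worker-host-1:2222"],
    },
    "task": {"type": "chief", "index": 0},
})

task_type, task_index = get_task_info()
print(task_type, task_index)  # chief 0
```

Code like this lets the master worker perform chief-only work, such as exporting the final model, while other workers skip it.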
Accelerators for training
If you want to train on a single VM with multiple GPUs, the best practice is to use TensorFlow's MirroredStrategy.
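For example, a minimal MirroredStrategy sketch (the tiny model is illustrative): the strategy replicates the model onto every GPU on the VM and aggregates gradients across replicas, and it falls back to CPU when no GPUs are available.

```python
import tensorflow as tf

# MirroredStrategy uses all GPUs visible on this VM by default.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Build and compile the model inside the strategy's scope so its
# variables are mirrored across the GPUs.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")
```

Apart from creating the strategy and entering its scope, the rest of your Keras training code is unchanged.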
If you want to train using multiple VMs with GPUs, the best practice is to use MultiWorkerMirroredStrategy.
To learn how to use TPUs for training, read the guide to training with TPUs.
Hyperparameter tuning
If you are running a hyperparameter tuning job with TensorFlow 2, you might need to adjust how your training code reports your hyperparameter tuning metric to the AI Platform Training service.
If you are training with an Estimator, you can write your metric to a summary in the same way that you do in TensorFlow 1. If you are training with Keras, we recommend that you use tf.summary to write a summary.
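As a sketch of the Keras case, assuming the metric tag "val_accuracy" and the output directory are the ones you configured for your tuning job (both names here are illustrative): write the metric as a scalar summary that the hyperparameter tuning service can read.

```python
import tensorflow as tf

# Hypothetical output directory; in a real job, point this at a location
# the tuning service is configured to read, and make the tag below match
# your job's hyperparameter tuning metric tag.
log_dir = "hptuning_metric_demo"
writer = tf.summary.create_file_writer(log_dir)

def report_tuning_metric(tag, value, step):
    # Write the scalar summary for the given trial step and flush it so
    # the value is visible without waiting for the writer to close.
    with writer.as_default():
        tf.summary.scalar(tag, value, step=step)
        writer.flush()

report_tuning_metric("val_accuracy", 0.93, step=1)
```

In practice you would call a function like this from a Keras callback at the end of each epoch, passing the epoch's validation metric.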
What's next
- Read about configuring the runtime version and Python version for a training job.
- Read more about configuring distributed training.
- Read more about hyperparameter tuning.