AI Platform Training release notes

This page documents production updates to AI Platform Training. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

Older AI Platform Training release notes are located in the archived Cloud ML Engine release notes.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/ai-platform-training-release-notes.xml

July 31, 2023

This legacy version of AI Platform Training is deprecated and will no longer be available on Google Cloud after January 31, 2025. Migrate your resources to Vertex AI custom training to get new machine learning features that are unavailable in AI Platform.

January 24, 2023

Runtime version 2.11 is available. You can use runtime version 2.11 to train with TensorFlow 2.11, scikit-learn 1.0.2, or XGBoost 1.6.1. Runtime version 2.11 supports training with CPUs, GPUs, or TPUs.

See the full list of updated dependencies in runtime version 2.11.
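
If it's helpful, the snippet below sketches how a runtime version is pinned when you submit a job through the Google API Python client. The project, bucket, and trainer package names are placeholders, not values from this release.

from googleapiclient import discovery

# Build a client for the AI Platform Training ("ml") v1 API.
ml = discovery.build("ml", "v1")

job = {
    "jobId": "my_training_job",  # placeholder
    "trainingInput": {
        "runtimeVersion": "2.11",  # pin the runtime version
        "pythonVersion": "3.7",
        "scaleTier": "BASIC",
        "region": "us-central1",
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],  # placeholder
        "pythonModule": "trainer.task",                        # placeholder
    },
}

# Submit the job.
ml.projects().jobs().create(parent="projects/my-project", body=job).execute()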

April 19, 2022

Pre-built PyTorch containers for PyTorch 1.11 are available for training. You can use these containers to train with CPUs, GPUs, or TPUs.
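
As a rough sketch, a job that uses a pre-built container sets the container image on the replica configuration. The image URI below is illustrative only; consult the pre-built PyTorch container list for the exact URI for PyTorch 1.11 and your accelerator type.

# A minimal trainingInput for a pre-built PyTorch container job (sketch).
training_input = {
    "scaleTier": "CUSTOM",
    "masterType": "n1-standard-8",
    "masterConfig": {
        # Illustrative image URI; check the documentation for the real one.
        "imageUri": "gcr.io/cloud-ml-public/training/pytorch-gpu.1-11",
        "acceleratorConfig": {"type": "NVIDIA_TESLA_T4", "count": 1},
    },
    "region": "us-central1",
    "args": ["--epochs", "10"],  # arguments passed to the container
}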

February 15, 2022

Runtime version 2.8 is available. You can use runtime version 2.8 to train with TensorFlow 2.8, scikit-learn 1.0.2, or XGBoost 1.5.2. Runtime version 2.8 supports training with CPUs, GPUs, or TPUs.

See the full list of updated dependencies in runtime version 2.8.

December 08, 2021

Runtime version 2.7 is available. You can use runtime version 2.7 to train with TensorFlow 2.7, scikit-learn 1.0.1, or XGBoost 1.5.0. Runtime version 2.7 supports training with CPUs, GPUs, or TPUs.

See the full list of updated dependencies in runtime version 2.7.

November 02, 2021

Using interactive shells to inspect training jobs is generally available (GA).

You can use these interactive shells with VPC Service Controls.

October 06, 2021

Runtime version 2.6 is available. You can use runtime version 2.6 to train with TensorFlow 2.6, scikit-learn 0.24.2, or XGBoost 1.4.2. Runtime version 2.6 supports training with CPUs, GPUs, or TPUs.

See the full list of updated dependencies in runtime version 2.6.

September 23, 2021

Pre-built PyTorch containers for PyTorch 1.9 are available for training. You can use these containers to train with CPUs, GPUs, or TPUs.

August 09, 2021

You can use TPU Pods for training. This feature is available in preview.

July 19, 2021

You can now use an interactive shell to inspect your training container while it runs. The interactive shell can be helpful for monitoring and debugging training jobs.

This feature is available in preview.

March 05, 2021

AI Platform Training now provides pre-built PyTorch containers for PyTorch 1.7.

In addition to training with CPUs or GPUs, you can use one of the PyTorch 1.7 containers to perform PyTorch training with a TPU.

February 16, 2021

The default boot disk type for virtual machine instances used for training jobs has changed from pd-standard to pd-ssd. Learn more about disk types for custom training and read about pricing for different disk types.

Note that for training jobs where you don't specify a DiskConfig, pricing does not change. This is because the first 100 GB of disk for each VM do not incur any charge, regardless of disk type.
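
For reference, the sketch below shows one way to override the boot disk for the master VM; the field names come from the DiskConfig message, and the sizes are examples only.

# Keep the old default disk type explicitly, with a larger disk (sketch).
training_input = {
    "scaleTier": "CUSTOM",
    "masterType": "n1-standard-8",
    "masterConfig": {
        "diskConfig": {
            "bootDiskType": "pd-standard",  # override the new pd-ssd default
            "bootDiskSizeGb": 200,          # only the portion above 100 GB is billed
        },
    },
}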

February 10, 2021

Runtime version 2.4 is now available. You can use runtime version 2.4 to train with TensorFlow 2.4.1, scikit-learn 0.24.0, or XGBoost 1.3.1. Runtime version 2.4 supports training with CPUs, GPUs, or TPUs.

See the full list of updated dependencies in runtime version 2.4.

February 01, 2021

You can now use E2, N2, and C2 machine types for training. Learn about the specific machine types available for training, and learn about their pricing.
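
As an illustration, Compute Engine machine types are requested per task with the CUSTOM scale tier; the machine type names below are examples, not a recommendation.

# A custom cluster mixing N2 and C2 machine types (sketch).
training_input = {
    "scaleTier": "CUSTOM",
    "masterType": "n2-standard-8",
    "workerType": "c2-standard-16",
    "workerCount": 4,
}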

January 28, 2021

You can now use NVIDIA A100 GPUs and several accelerator-optimized (A2) machine types for training. You must use A100 GPUs and A2 machine types together.

A100 GPUs and A2 machine types are available in preview. Learn about their pricing.
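
For example, a single-GPU configuration might pair the a2-highgpu-1g machine type with one A100. This is a sketch; the accelerator count must match what the chosen A2 machine type provides.

# One NVIDIA A100 on an A2 machine type (sketch).
training_input = {
    "scaleTier": "CUSTOM",
    "masterType": "a2-highgpu-1g",
    "masterConfig": {
        "acceleratorConfig": {"type": "NVIDIA_TESLA_A100", "count": 1},
    },
}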

January 20, 2021

Training with a custom service account is now generally available.

Support for VPC Network Peering is now generally available.

January 15, 2021

AI Platform Training now provides pre-built PyTorch containers for PyTorch 1.6.

In addition to training with CPUs or GPUs, you can use one of the PyTorch 1.6 containers to perform PyTorch training with a TPU.

December 09, 2020

Runtime version 2.3 is now available. You can use runtime version 2.3 to train with TensorFlow 2.3.1, scikit-learn 0.23.2, or XGBoost 1.2.1. Runtime version 2.3 supports training with CPUs, GPUs, or TPUs.

See the full list of updated dependencies in runtime version 2.3.

September 22, 2020

AI Platform Training runtime version 2.2 now supports training with TPUs using TensorFlow 2.2.

August 28, 2020

Runtime version 2.2 is now available. You can use runtime version 2.2 to train with TensorFlow 2.2.0, scikit-learn 0.23.1, or XGBoost 1.1.1. See the full list of updated dependencies in runtime version 2.2.

August 17, 2020

You can now set a maximum time that you are willing to wait between the moment when you create a training job and the moment when AI Platform Training starts running the job. If your training job has not started running after this duration, AI Platform Training cancels the job. Set the maximum wait time by specifying the scheduling.maxWaitTime field.
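
As a sketch, the field takes a duration string; the one-hour value below is only an example.

# Cancel the job if it has not started running within one hour of creation.
training_input = {
    "scheduling": {"maxWaitTime": "3600s"},
    # ...the rest of the job configuration...
}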

August 14, 2020

The TabNet built-in algorithm is now available in beta. You can train models on tabular data for classification and regression problems, and also get feature attributions to help explain the model's behavior.

Try the TabNet built-in algorithm introductory tutorial.

August 04, 2020

Read a new guide to distributed PyTorch training. You can use this guide with pre-built PyTorch containers, which are in beta.

July 20, 2020

You can now train a PyTorch model on AI Platform Training by using a pre-built PyTorch container. Pre-built PyTorch containers are available in beta.

July 13, 2020

You can now configure a training job to run using a custom service account. Using a custom service account can help you customize which Google Cloud resources your training code can access.

This feature is available in beta.
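
A minimal sketch follows; the service account email is a placeholder.

# Run the training job as a custom service account (sketch).
training_input = {
    "serviceAccount": "trainer-sa@my-project.iam.gserviceaccount.com",  # placeholder
    # ...the rest of the job configuration...
}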

June 22, 2020

You can now use Cloud TPUs for training jobs in the europe-west4 region. TPU v2 accelerators are generally available, and TPU v3 accelerators are available in beta.

Learn how to configure your training job to use TPUs, and read about TPU pricing on AI Platform Training.

June 15, 2020

AI Platform Training now supports private services access in beta. You can use VPC Network Peering to create a private connection so that training jobs can connect to your network over private IP.

Learn how to set up VPC Network Peering with AI Platform Training.

May 21, 2020

You can now use TPUs with TensorFlow 2.1 when you create a training job with runtime version 2.1. You can also use TPUs with TensorFlow 2.1 when you train in a custom container.

Read the guide to using TPUs with AI Platform Training, which has been updated to show how to use TPUs with TensorFlow 2 APIs.
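
As a reference point, a TPU is typically requested as a cloud_tpu worker; the sketch below assumes a TPU v2 device with eight cores.

# Train with TensorFlow 2.1 on a TPU v2 device (sketch).
training_input = {
    "runtimeVersion": "2.1",
    "scaleTier": "CUSTOM",
    "masterType": "n1-standard-8",
    "workerType": "cloud_tpu",
    "workerCount": 1,
    "workerConfig": {
        "acceleratorConfig": {"type": "TPU_V2", "count": 8},
    },
}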

May 13, 2020

AI Platform Training now supports the following regions, in addition to those that were already supported:

  • northamerica-northeast1 (Montréal)
  • southamerica-east1 (São Paulo)
  • australia-southeast1 (Sydney)

GPUs are available for training in each of the new regions:

  • NVIDIA Tesla P4 GPUs are available in northamerica-northeast1.
  • NVIDIA Tesla T4 GPUs are available in southamerica-east1.
  • NVIDIA Tesla P4 GPUs and NVIDIA Tesla P100 GPUs are available in australia-southeast1.

See the full list of available regions and the guide to training with GPUs.

northamerica-northeast1 and southamerica-east1 have the same pricing as other Americas regions, and australia-southeast1 has the same pricing as other Asia Pacific regions. Learn about pricing for each region.

April 09, 2020

You can now specify virtual machine instances with the evaluator task type as part of your training cluster for distributed training jobs. Read more about evaluators in TensorFlow distributed training, see how to configure machine types for evaluators, and learn about using evaluators with custom containers.
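
For illustration, evaluators are configured alongside the other task types; the machine types and counts below are examples only.

# A distributed TensorFlow cluster with a dedicated evaluator (sketch).
training_input = {
    "scaleTier": "CUSTOM",
    "masterType": "n1-standard-8",
    "workerType": "n1-standard-8",
    "workerCount": 2,
    "parameterServerType": "n1-standard-4",
    "parameterServerCount": 1,
    "evaluatorType": "n1-standard-4",
    "evaluatorCount": 1,
}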

The maximum running time for training jobs now defaults to seven days. If a training job is still running after this duration, AI Platform Training cancels the job.

Learn how to adjust the maximum running time for a job.

April 06, 2020

Runtime version 2.1 now includes scikit-learn 0.22.1 instead of scikit-learn 0.22.

April 03, 2020

You can now use customer-managed encryption keys (CMEK) to protect data in your AI Platform Training jobs. This feature is available in beta.

To learn about the benefits and limitations of using CMEK, and to walk through configuring CMEK for a training job, read the guide to using CMEK with AI Platform Training.

March 27, 2020

AI Platform Training now supports the following regions, in addition to those that were already supported:

  • us-west3 (Salt Lake City)
  • europe-west2 (London)
  • europe-west3 (Frankfurt)
  • europe-west6 (Zurich)
  • asia-south1 (Mumbai)
  • asia-east2 (Hong Kong)
  • asia-northeast1 (Tokyo)
  • asia-northeast2 (Osaka)
  • asia-northeast3 (Seoul)

Out of these regions, the following support training with NVIDIA Tesla T4 GPUs:

  • europe-west2
  • asia-south1
  • asia-northeast1
  • asia-northeast3

See the full list of available regions and read about pricing for each region.

March 17, 2020

Runtime versions 1.2 through 1.9 are no longer available for training. We recommend that you use runtime version 1.14 or later for your training jobs.

Read more about the AI Platform Training policy for supporting older runtime versions. This policy is being retroactively implemented in several stages for runtime versions 1.13 and earlier.

March 09, 2020

Runtime version 2.1 for AI Platform Training is now available.

Runtime version 2.1 is the first runtime version to support TensorFlow 2. Specifically, this runtime version includes TensorFlow 2.1.0. Read the new Training with TensorFlow 2 guide to learn about important differences to consider when using AI Platform Training with TensorFlow 2, compared to TensorFlow 1.

Runtime version 2.1 is also the first runtime version not to support Python 2.7. The Python Software Foundation ended support for Python 2.7 on January 1, 2020. No AI Platform runtime versions released after January 1, 2020 support Python 2.7.

Runtime version 2.1 also updates many other dependencies; see the runtime version list for more details.

Runtime version 2.1 includes scikit-learn 0.22 rather than 0.22.1. This is a known issue, and the release notes will be updated when runtime version 2.1 includes scikit-learn 0.22.1.

When you train with runtime version 2.1 or later, AI Platform Training uses the chief task name to represent the master VM in the TF_CONFIG environment variable. This environment variable is important for distributed training with TensorFlow. For runtime version 1.15 and earlier, AI Platform Training uses the master task name instead, but this task name is not supported in TensorFlow 2.

However, by default, AI Platform Training continues to use the master task name in custom container training jobs. If you are performing multi-worker distributed training with TensorFlow 2 in a custom container, set the new trainingInput.useChiefInTfConfig field to true when you create a custom container training job in order to use the chief task name.

Learn more about this change.
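
To illustrate the difference, the sketches below show the cluster portion of TF_CONFIG under each naming scheme, plus the custom-container opt-in; the VM addresses are placeholders.

# TF_CONFIG with runtime version 2.1 and later (TensorFlow 2):
tf_config_chief = {
    "cluster": {
        "chief": ["cmle-training-master-0:2222"],   # placeholder address
        "worker": ["cmle-training-worker-0:2222"],  # placeholder address
    },
    "task": {"type": "chief", "index": 0},
}

# TF_CONFIG with runtime version 1.15 and earlier (TensorFlow 1):
tf_config_master = {
    "cluster": {
        "master": ["cmle-training-master-0:2222"],
        "worker": ["cmle-training-worker-0:2222"],
    },
    "task": {"type": "master", "index": 0},
}

# Opt in to the chief task name for a TensorFlow 2 custom container job:
training_input = {
    "useChiefInTfConfig": True,
    # ...masterConfig.imageUri and the rest of the configuration...
}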

March 06, 2020

The built-in linear learner algorithm and the built-in wide and deep algorithm now use TensorFlow 1.14 for training. They previously used TensorFlow 1.12.

The single-replica version of the built-in XGBoost algorithm now uses XGBoost 0.81 for training. It previously used XGBoost 0.80.

February 11, 2020

You can now set a maximum running time when you create a training job. If your training job is still running after this duration, AI Platform Training cancels the job. Set the maximum running time by specifying the scheduling.maxRunningTime field.
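
A brief sketch, with an illustrative 24-hour limit:

# Cancel the job if it is still running after 24 hours.
training_input = {
    "scheduling": {"maxRunningTime": "86400s"},
    # ...the rest of the job configuration...
}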

February 05, 2020

The GPU compatibility issue that was described in the January 7, 2020 release note has been resolved. You can now use GPUs to accelerate training on runtime version 1.15.

January 29, 2020

AI Platform Training documentation has been reorganized. The new information architecture only includes documents that are relevant to AI Platform Training.

Previously, documentation for AI Platform Training and AI Platform Prediction was grouped together. You can now view AI Platform Prediction documentation separately. Some overviews and tutorials that are relevant to both products are available in the overall AI Platform documentation.

January 28, 2020

AI Platform Training runtime version 1.15 now supports training with TPUs using TensorFlow 1.15.

January 14, 2020

The price of using NVIDIA Tesla T4 GPUs for training has changed. The following table describes the pricing change for various geographic areas:

Geographic area    Old price per hour    New price per hour
Americas           $0.9500               $0.3500
Europe             $1.0300               $0.3800
Asia Pacific       $1.0300               $0.3900

Read more about using GPUs for training.

January 07, 2020

Training jobs that use both runtime version 1.15 and GPUs fail due to a compatibility issue with the cuDNN library, which TensorFlow depends on.

As a workaround, do one of the following:

  • Use runtime version 1.14 or earlier for jobs that train with GPUs.
  • Train with CPUs or TPUs instead of GPUs when using runtime version 1.15.

December 20, 2019

VPC Service Controls now supports AI Platform Training. Learn how to use a service perimeter to protect your training jobs. This functionality is in beta.

December 19, 2019

AI Platform Training now offers two built-in algorithms to train a machine learning model on image data without writing your own training code:

  • The built-in image classification algorithm
  • The built-in image object detection algorithm

Both image algorithms are available in beta.

AI Platform runtime version 1.15 is now available for training. This version supports TensorFlow 1.15.0 and includes other packages as listed in the runtime version list.

Runtime version 1.15 is the first runtime version to support training using Python 3.7, instead of Python 3.5. Runtime version 1.15 also still supports Python 2.7. Learn about specifying the Python version for training.

Training with TPUs is not supported in runtime version 1.15 at this time.

December 10, 2019

Starting January 1, 2020, the Python Software Foundation will no longer support Python 2.7. Accordingly, any runtime versions released after January 1, 2020 will not support Python 2.7.

Starting on January 13, 2020, AI Platform Training and AI Platform Prediction will support each runtime version for one year after its release date. You can find the release date of each runtime version in the runtime version list.

Support for each runtime version changes according to the following schedule:

  1. Starting on the release date: You can create training jobs, batch prediction jobs, and model versions that use the runtime version.

  2. Starting 12 months after the release date: You can no longer create training jobs, batch prediction jobs, or model versions that use the runtime version.

    Existing model versions that have been deployed to AI Platform Prediction continue to function.

  3. 24 months after the release date: AI Platform Prediction automatically deletes all model versions that use the runtime version.

This policy will be applied retroactively on January 13, 2020. For example, since runtime version 1.0 was released over 24 months ago, AI Platform Training and AI Platform Prediction no longer support it. There will be a three-month grace period (until April 13, 2020) before AI Platform Prediction automatically deletes model versions that use the oldest runtime versions.

The following table describes the first two important dates that mark the end of support for runtime versions:

Date              Runtime versions affected                                       Change in functionality
January 13, 2020  1.0, 1.1, 1.2, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 1.10, 1.11, 1.12  You can no longer create training jobs, batch prediction jobs, or model versions using these runtime versions.
April 13, 2020    1.0, 1.1, 1.2, 1.4, 1.5, 1.6                                    AI Platform Prediction automatically deletes any model versions using these runtime versions.

To learn about when availability ends for every runtime version, see the runtime version list.

Starting on January 13, 2020, AI Platform Training will automatically delete the history of each training job 120 days after it is completed. A training job is considered completed when the job enters the SUCCEEDED, FAILED, or CANCELLED state.

This policy will be applied retroactively on January 13, 2020: all jobs that completed on September 15, 2019 or earlier will be deleted.

Starting on January 13, 2020, runtimeVersion and pythonVersion will become required fields when you create Job or Version resources. Previously, runtimeVersion defaulted to 1.0 and pythonVersion defaulted to 2.7.

November 27, 2019

AI Platform Training no longer supports TPUs in runtime version 1.12. You can still train using TPUs in runtime versions 1.13 and 1.14.

November 20, 2019

AI Platform Training now offers a built-in distributed XGBoost algorithm to train a machine learning model without writing your own training code. This algorithm is available in beta.

The built-in distributed XGBoost algorithm provides functionality similar to the existing single-replica version of the built-in XGBoost algorithm, but it lets you speed up training on large datasets by using multiple virtual machines in parallel. The algorithm also lets you use GPUs for training.

The built-in distributed XGBoost algorithm does not support automatic preprocessing of data.

October 28, 2019

We now recommend that you use Compute Engine machine types when you create new AI Platform Training jobs. These machine types offer the greatest flexibility for customizing the virtual CPU (vCPU), RAM, GPU, and TPU resources that your jobs use.

The older machine types available for training, which were previously referred to as "AI Platform Training machine types," are now called "legacy machine types" in the AI Platform Training documentation.

October 04, 2019

The us-west2 (Los Angeles), us-east4 (N. Virginia), and europe-north1 (Finland) regions are now available for training. You can use NVIDIA Tesla P4 GPUs for training in us-west2 and us-east4.

Read about pricing for training in these regions, including pricing for accelerators.

September 09, 2019

Runtime version 1.14 now supports training with TPUs using TensorFlow 1.14.

August 28, 2019

Training with custom containers is now generally available.

NVIDIA Tesla P4 and NVIDIA Tesla T4 GPUs are now generally available for training. Read about using GPUs for training and learn about GPU pricing.

The documentation for AI Platform Notebooks has moved to a new location.

August 26, 2019

AI Platform Training now supports using Cloud TPU devices with TPU v3 configurations to accelerate your training jobs. TPU v3 accelerators for AI Platform Training are available in beta.

Learn more about how to configure your training job to use TPU v3 accelerators and read about TPU v3 pricing.

August 16, 2019

AI Platform runtime versions 1.13 and 1.14 now include numpy 1.16.4 instead of numpy 1.16.0. View the runtime version list for the full list of packages included in runtime versions 1.13 and 1.14.

August 01, 2019

The AI Platform Training and Prediction documentation has been reorganized. Previously, documentation for using AI Platform Training with specific machine learning frameworks was separated into sections. You can now navigate to all Training and Prediction documentation from the AI Platform documentation home page.

July 19, 2019

AI Platform runtime version 1.14 is now available for training. This version supports TensorFlow 1.14.0 and includes other packages as listed in the runtime version list.

Training with TPUs is not supported in runtime version 1.14 at this time.

AI Platform runtime version 1.12 now supports TensorFlow 1.12.3. View the runtime version list for the full list of packages included in runtime version 1.12.

July 17, 2019

The prediction input format for the following built-in algorithms has changed:

  • Linear learner
  • Wide and deep
  • XGBoost (single-replica)

Instead of a raw string, format each instance as a JSON object with a "csv_row" key and a "key" key. The "key" value is useful for matching batch prediction output to the corresponding input instances. For online predictions, you can pass a dummy value for the "key" key in your input JSON request. For example:

{"csv_row": "1, 2, 3, 4, 0, abc", "key" : "dummy-key"}

See the Census Income tutorial for an example.
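
For reference, an online prediction request in this format might look like the following sketch using the Google API Python client; the project and model names are placeholders.

from googleapiclient import discovery

ml = discovery.build("ml", "v1")

# Each instance carries the CSV row plus a key for matching outputs to inputs.
body = {"instances": [{"csv_row": "1, 2, 3, 4, 0, abc", "key": "dummy-key"}]}

response = ml.projects().predict(
    name="projects/my-project/models/my_model",  # placeholders
    body=body,
).execute()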

June 19, 2019

The asia-southeast1 (Singapore) region is now available for training. You can use NVIDIA Tesla P4 or NVIDIA Tesla T4 GPUs for training in this region. Read about pricing for training in asia-southeast1, including pricing for accelerators.

June 18, 2019

Runtime version 1.13 now supports training with TPUs using TensorFlow 1.13.

Support for training with TPUs in runtime version 1.11 ended on June 6, 2019.

June 12, 2019

You can now view monitoring data for training jobs directly within the AI Platform Training Job Details page in the Cloud Console. The following charts are available:

  • CPU, GPU, and memory utilization, broken down by master, worker, and parameter servers.
  • Network usage: the rate per second of bytes sent and received.

Learn more about how to monitor resource utilization for your training jobs.

There are new options for filtering jobs within the AI Platform Training Jobs page in the Cloud Console. You can filter jobs by Type and by whether or not the job used HyperTune.

Learn more about how to filter your training jobs.

You can now view and sort hyperparameter tuning trials within the AI Platform Training Job Details page in the Cloud Console. If your training job uses hyperparameter tuning, your Job Details page includes a HyperTune trials table, where you can view metrics such as RMSE, learning rate, and training steps. You can also access logs for each trial. This table makes it easier to compare individual trials.

Learn more about how to view your hyperparameter tuning trials.

June 03, 2019

You can now create AI Platform Notebooks instances with R and core R packages installed. Learn how to install R dependencies, and read guides for using R with BigQuery in AI Platform Notebooks and using R and Python in the same notebook.

May 03, 2019

NVIDIA Tesla T4 GPUs are now in beta for AI Platform Training. For more information, see the guides to using GPUs, their regional availability, and their pricing.

AI Platform runtime version 1.12 now supports TensorFlow 1.12.2. View the runtime version list for the full list of packages included in runtime version 1.12.