AI Platform Prediction release notes

This page documents production updates to AI Platform Prediction. You can periodically check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.

Older AI Platform Prediction release notes are located in the archived Cloud ML Engine release notes.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.

To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly: https://cloud.google.com/feeds/ai-platform-prediction-release-notes.xml

July 31, 2023

This legacy version of AI Platform Prediction is deprecated and will no longer be available on Google Cloud after January 31, 2025. Migrate your resources to Vertex AI to get new machine learning features that are unavailable in AI Platform.

November 17, 2021

Runtime version 2.7 is now available. You can use runtime version 2.7 to serve online predictions with TensorFlow 2.7.0, scikit-learn 1.0, or XGBoost 1.4.2. Runtime version 2.7 does not support batch prediction.

See the full list of updated dependencies in runtime version 2.7.
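As a rough sketch of how a model version on this runtime can be created through the REST API (the project, model, and Cloud Storage paths below are placeholders), a versions.create call sets runtimeVersion to 2.7:

from googleapiclient import discovery

# Build a client for the AI Platform Training and Prediction API.
ml = discovery.build("ml", "v1")

# Placeholder project, model, and model-artifact location.
parent = "projects/my-project/models/my_model"
body = {
    "name": "v2_7",
    "deploymentUri": "gs://my-bucket/model-dir/",   # exported model artifacts
    "runtimeVersion": "2.7",
    "framework": "TENSORFLOW",       # or SCIKIT_LEARN / XGBOOST
    "pythonVersion": "3.7",
    "machineType": "n1-standard-4",
}

request = ml.projects().models().versions().create(parent=parent, body=body)
print(request.execute())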

September 01, 2021

Runtime version 2.6 is now available. You can use runtime version 2.6 to serve online predictions with TensorFlow 2.6.0, scikit-learn 0.24.2, or XGBoost 1.4.2. Runtime version 2.6 does not support batch prediction.

See the full list of updated dependencies in runtime version 2.6.

June 08, 2021

Runtime version 2.5 is now available. You can use runtime version 2.5 to serve online predictions with TensorFlow 2.5.1, scikit-learn 0.24.1, or XGBoost 1.4.0. Runtime version 2.5 does not support batch prediction.

See the full list of updated dependencies in runtime version 2.5.

April 16, 2021

Runtime version 2.4 is now available. You can use runtime version 2.4 to serve online predictions with TensorFlow 2.4.1, scikit-learn 0.24.0, or XGBoost 1.3.1. Runtime version 2.4 does not support batch prediction.

See the full list of updated dependencies in runtime version 2.4.

January 20, 2021

December 16, 2020

You can now configure AI Platform Prediction to automatically scale prediction nodes for model versions that use GPUs for online prediction.

Previously, you could only configure manual scaling for model versions that use GPUs. Now, you can choose between automatic and manual scaling.

Using automatic scaling with GPUs is available in preview.
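As an illustrative sketch (all values are placeholders), the request body for a version that autoscales GPU nodes combines the acceleratorConfig and autoScaling fields of the Version resource:

# Hypothetical versions.create body: one NVIDIA T4 GPU per node,
# autoscaling between 1 and 5 prediction nodes.
version_body = {
    "name": "v_gpu_autoscale",
    "deploymentUri": "gs://my-bucket/model-dir/",  # placeholder
    "runtimeVersion": "2.3",
    "framework": "TENSORFLOW",
    "pythonVersion": "3.7",
    "machineType": "n1-standard-4",
    "acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_T4"},
    "autoScaling": {"minNodes": 1, "maxNodes": 5},
}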

December 09, 2020

Runtime version 2.3 is now available. You can use runtime version 2.3 to serve online predictions with TensorFlow 2.3.1, scikit-learn 0.23.2, or XGBoost 1.2.1. Runtime version 2.3 does not support batch prediction.

See the full list of updated dependencies in runtime version 2.3.

November 11, 2020

The following regional endpoints are now generally available for online prediction, in addition to the regional endpoints that were already available:

  • us-east1-ml.googleapis.com
  • us-east4-ml.googleapis.com
  • us-west1-ml.googleapis.com
  • northamerica-northeast1-ml.googleapis.com
  • europe-west1-ml.googleapis.com
  • europe-west2-ml.googleapis.com
  • europe-west3-ml.googleapis.com
  • asia-northeast1-ml.googleapis.com
  • asia-southeast1-ml.googleapis.com
  • australia-southeast1-ml.googleapis.com

On some of these regional endpoints, you can use GPUs to accelerate prediction. Learn which types of GPUs are available on which regional endpoints.

Pricing for online prediction varies between regional endpoints. Read about the pricing for each regional endpoint.
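One way to target a regional endpoint from Python is to point the API client at the regional hostname; a minimal sketch, assuming placeholder project and model names:

from googleapiclient import discovery
from google.api_core.client_options import ClientOptions

# Point the client at a regional endpoint instead of ml.googleapis.com.
options = ClientOptions(api_endpoint="https://us-east1-ml.googleapis.com")
ml = discovery.build("ml", "v1", client_options=options)

# Resources on the regional endpoint are addressed as usual.
name = "projects/my-project/models/my_model"  # placeholder
print(ml.projects().models().get(name=name).execute())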

October 27, 2020

You can now use a custom container to customize how you serve predictions. To try using a custom container, read the new tutorial on serving predictions from a PyTorch model.

This feature is in preview.
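As a sketch of the shape of such a deployment (the image path, port, and routes are placeholders), a Version resource that uses a custom container sets the container and routes fields:

# Hypothetical body for a version served from a custom container.
version_body = {
    "name": "v_pytorch",
    "machineType": "n1-standard-4",
    "container": {
        "image": "gcr.io/my-project/my-pytorch-server:latest",  # placeholder
        "ports": [{"containerPort": 8080}],
    },
    "routes": {
        "predict": "/predict",  # path your server handles predictions on
        "health": "/health",    # path used for health checks
    },
}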

Console logging (formerly also referred to as "stream logging") is now available in preview for Compute Engine (N1) machine types and in GA for legacy (MLS1) machine types.

Read a new document about using custom service accounts with custom containers or custom prediction routines.

This feature is in beta.

August 28, 2020

Runtime version 2.2 is now available. You can use runtime version 2.2 to serve online predictions with TensorFlow 2.2.0, scikit-learn 0.23.1, or XGBoost 1.1.1. Runtime version 2.2 does not currently support batch prediction.

See the full list of updated dependencies in runtime version 2.2.

August 19, 2020

You can now use runtime version 2.1 to serve online predictions using scikit-learn 0.22.1 and XGBoost 0.90.

August 18, 2020

GPUs for online prediction are now generally available. You can use GPUs to serve predictions when you create a TensorFlow model version that uses a Compute Engine (N1) machine type.

Learn which types of GPU are available on each regional endpoint.

The following regional endpoints for online prediction are now generally available:

  • us-central1-ml.googleapis.com
  • europe-west4-ml.googleapis.com
  • asia-east1-ml.googleapis.com

Using Compute Engine (N1) machine types on the global API endpoint (ml.googleapis.com) is deprecated. This functionality was previously available in beta in the us-central1 region.

To continue to use Compute Engine (N1) machine types in the us-central1 region, create a model on the us-central1-ml.googleapis.com regional endpoint, and then create model versions using that model.
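A sketch of that flow (placeholder names throughout), creating the model through the regional endpoint first:

from googleapiclient import discovery
from google.api_core.client_options import ClientOptions

# Create the model on the regional endpoint rather than the global one.
options = ClientOptions(api_endpoint="https://us-central1-ml.googleapis.com")
ml = discovery.build("ml", "v1", client_options=options)

model_body = {"name": "my_n1_model"}  # placeholder
ml.projects().models().create(parent="projects/my-project",
                              body=model_body).execute()

# Versions that use Compute Engine (N1) machine types are then created
# under this model, through the same regional endpoint.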

July 14, 2020

VPC Service Controls now supports AI Platform Prediction. Learn how to use a service perimeter to protect online prediction. This functionality is in beta.

June 08, 2020

The Total latency chart on the Version details page of the Google Cloud Console was reporting incorrect information. This chart has now been fixed.

In some cases, this adjustment might cause latencies to appear higher than they were previously. However, the latency of models has not changed.

This affects both Compute Engine (N1) machine types and legacy (MLS1) machine types.

May 13, 2020

AI Platform Prediction now supports the following regions for batch prediction, in addition to those that were already supported:

  • northamerica-northeast1 (Montréal)
  • southamerica-east1 (São Paulo)
  • australia-southeast1 (Sydney)

See the full list of available regions.

northamerica-northeast1 and southamerica-east1 have the same pricing as other Americas regions, and australia-southeast1 has the same pricing as other Asia Pacific regions. Learn about pricing for each region.

April 29, 2020

AI Platform Prediction now supports several regional endpoints for online prediction. Regional endpoints provide additional protection against outages in other regions by isolating your model and version resources from other regions. The following regional endpoints are available in beta:

  • us-central1-ml.googleapis.com
  • europe-west4-ml.googleapis.com
  • asia-east1-ml.googleapis.com

You can use these endpoints instead of the global endpoint, ml.googleapis.com, when you use AI Platform Prediction for online prediction. Learn how to use regional endpoints for online prediction, and read about their benefits and limitations.

You can now deploy scikit-learn and XGBoost models for online prediction using Compute Engine (N1) machine types. Previously, you could only deploy TensorFlow models when you used these machine types. Learn more about ML framework support for Compute Engine (N1) machine types.

You cannot use GPUs with scikit-learn or XGBoost models, and you can only use scikit-learn and XGBoost models with Compute Engine (N1) machine types when you deploy your models and versions to a regional endpoint.
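For example (a sketch; the names are placeholders), deploying a scikit-learn model on an N1 machine type to the us-central1 regional endpoint looks like this:

from googleapiclient import discovery
from google.api_core.client_options import ClientOptions

# scikit-learn on N1 machine types requires a regional endpoint.
options = ClientOptions(api_endpoint="https://us-central1-ml.googleapis.com")
ml = discovery.build("ml", "v1", client_options=options)

body = {
    "name": "v_sklearn_n1",
    "deploymentUri": "gs://my-bucket/sklearn-model/",  # placeholder
    "runtimeVersion": "1.15",
    "framework": "SCIKIT_LEARN",
    "pythonVersion": "3.7",
    "machineType": "n1-standard-4",  # no acceleratorConfig: GPUs unsupported
}
request = ml.projects().models().versions().create(
    parent="projects/my-project/models/my_sklearn_model", body=body)
print(request.execute())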

Compute Engine (N1) machine types for online prediction remain available in the beta launch stage.

The europe-west4 (Netherlands) and asia-east1 (Taiwan) regions are now available for online prediction. These regions are only available for online prediction on their respective regional endpoints, and you can only use Compute Engine (N1) machine types for online prediction in these regions.

When you deploy model versions in the europe-west4 region, you can optionally use NVIDIA Tesla P4, NVIDIA Tesla T4, or NVIDIA Tesla V100 GPUs to accelerate prediction.

When you deploy model versions in the asia-east1 region, you can optionally use NVIDIA Tesla K80 or NVIDIA Tesla P100 GPUs to accelerate prediction.

Learn more about using GPUs for online prediction, and see which GPUs are available in which regions.

Learn about the pricing for the newly available regions and GPU resources.

We recommend against using Compute Engine (N1) machine types on the AI Platform Prediction global endpoint. Instead, only use Compute Engine (N1) machine types when you deploy models and versions to a regional endpoint.

Model versions that use Compute Engine (N1) machine types and were previously deployed to the us-central1 region on the global endpoint will continue to function.

April 24, 2020

Visualization settings for AI Explanations are now available. You can customize how feature attributions are displayed for image data.

Learn more about visualizing explanations.
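The settings live in the explanation metadata uploaded with the model. As a sketch (the keys shown are assumptions based on the visualization documentation; tensor names are placeholders), an image input's visualization block can look like:

# Hypothetical fragment of explanation_metadata.json, as a Python dict.
explanation_metadata = {
    "inputs": {
        "image": {
            "input_tensor_name": "input_1:0",   # placeholder tensor name
            "modality": "image",
            "visualization": {
                "type": "Outlines",             # or "Pixels"
                "polarity": "positive",
                "clip_below_percentile": 70,
                "clip_above_percentile": 99.9,
                "color_map": "red_green",
                "overlay_type": "grayscale",
            },
        }
    },
    "outputs": {"probability": {"output_tensor_name": "dense_1:0"}},
}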

April 13, 2020

The pricing of Compute Engine (N1) machine types for online prediction in the us-central1 region has changed. vCPU resources now cost $0.031613 per vCPU hour and RAM now costs $0.004242 per GB hour.

Read more details about pricing.
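As a worked example at these rates (the n1-standard-4 shape of 4 vCPUs and 15 GB of RAM is used purely for illustration):

# Hourly cost of one n1-standard-4 prediction node at the new rates.
vcpu_price = 0.031613  # $ per vCPU hour
ram_price = 0.004242   # $ per GB hour

vcpus, ram_gb = 4, 15  # n1-standard-4 shape
hourly_cost = vcpus * vcpu_price + ram_gb * ram_price
print(f"${hourly_cost:.6f} per node hour")  # $0.190082 per node hour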

April 09, 2020

If you deploy a model version for online prediction that uses runtime version 2.1 with a GPU, AI Platform Prediction now correctly uses TensorFlow 2.1.0 to serve predictions. Previously, AI Platform Prediction used TensorFlow 2.0.0 to serve predictions in this situation.

March 27, 2020

AI Explanations now supports XRAI, a new feature attribution method for image data.

The image tutorial has been updated to include XRAI. In the tutorial, you can deploy an image classification model using both integrated gradients and XRAI, and compare the results.

AI Explanations now provides an approximation error with your explanation results.

Learn more about the approximation error and how to improve your explanation results.

AI Platform Prediction now supports the following regions for batch prediction, in addition to those that were already supported:

  • us-west3 (Salt Lake City)
  • europe-west2 (London)
  • europe-west3 (Frankfurt)
  • europe-west6 (Zurich)
  • asia-south1 (Mumbai)
  • asia-east2 (Hong Kong)
  • asia-northeast1 (Tokyo)
  • asia-northeast2 (Osaka)
  • asia-northeast3 (Seoul)

Note that asia-northeast1 was already available for online prediction.

See the full list of available regions and read about pricing for each region.

March 09, 2020

Runtime version 2.1 for AI Platform Prediction is now available.

Runtime version 2.1 is the first runtime version to support TensorFlow 2 for online and batch prediction. Specifically, this runtime version includes TensorFlow 2.1.0. Review the updated guide to exporting TensorFlow SavedModels for use with AI Platform Prediction for details about exporting compatible models by using TensorFlow 2 APIs, like Keras.
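For instance, a minimal Keras model can be exported in the SavedModel format that AI Platform Prediction expects (a sketch; the tiny architecture stands in for a real trained model):

import tensorflow as tf  # TensorFlow 2.1

# A trivial Keras model standing in for a trained one.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Export as a SavedModel; upload this directory to Cloud Storage and use
# it as the deploymentUri of a runtime version 2.1 model version.
model.save("export/my_model", save_format="tf")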

When you use runtime version 2.1 for online prediction, you can currently only deploy TensorFlow model versions. You cannot deploy scikit-learn, XGBoost, or custom prediction routine artifacts for online prediction with runtime version 2.1. For the time being, continue to use runtime version 1.15 to serve predictions from these types of models.

Runtime version 2.1 is also the first runtime version not to support Python 2.7. The Python Software Foundation ended support for Python 2.7 on January 1, 2020. No AI Platform runtime versions released after January 1, 2020 support Python 2.7.

If you deploy a model version for online prediction that uses runtime version 2.1 with a GPU, AI Platform Prediction uses TensorFlow 2.0.0 (instead of TensorFlow 2.1.0) to serve predictions. This is a known issue, and the release notes will be updated when online prediction with GPUs supports TensorFlow 2.1.0.

February 10, 2020

The known issue with using custom prediction routines together with runtime version 1.15 and Python 3.7 has been fixed. The issue was described in a January 23, 2020 release note.

You can now use custom prediction routines with runtime version 1.15 and Python 3.7.

February 05, 2020

The GPU compatibility issue that was described in the January 7, 2020 release note has been resolved. You can now use GPUs to accelerate prediction on runtime version 1.15.

January 29, 2020

AI Platform Prediction documentation has been reorganized. The new information architecture only includes documents that are relevant to AI Platform Prediction.

Previously, documentation for AI Platform Prediction and AI Platform Training was grouped together. You can now view AI Platform Training documentation separately. Some overviews and tutorials that are relevant to both products are available in the overall AI Platform documentation.

January 23, 2020

Creating an AI Platform Prediction custom prediction routine that uses runtime version 1.15 and Python 3.7 might fail due to a problem with a dependency.

As a workaround, use runtime version 1.15 with Python 2.7 or use a different runtime version when you create your model version.

January 22, 2020

AI Explanations no longer supports AI Platform Prediction runtime version 1.13. AI Explanations now supports runtime versions 1.14 and 1.15. Learn more about AI Platform Prediction runtime versions supported by AI Explanations.

January 15, 2020

The price of using NVIDIA Tesla T4 GPUs for online prediction has changed from $0.9500 per hour to $0.3500 per hour.

GPUs for online prediction are currently only available when you deploy your model in the us-central1 region and use a Compute Engine (N1) machine type.

January 07, 2020

Model versions that use both runtime version 1.15 and GPUs fail due to a compatibility issue with the CuDNN library, which TensorFlow depends on.

As a workaround, do one of the following:

  • Use a runtime version other than 1.15 for model versions that use GPUs.
  • Deploy your model version on machine types that do not use GPUs.

December 19, 2019

AI Platform runtime version 1.15 is now available for prediction. This version supports TensorFlow 1.15.0 and includes other packages as listed in the runtime version list.

Runtime version 1.15 is the first runtime version to support serving predictions using Python 3.7, instead of Python 3.5. Runtime version 1.15 also still supports Python 2.7. Learn about specifying the Python version for prediction.

December 10, 2019

Starting January 1, 2020, the Python Software Foundation will no longer support Python 2.7. Accordingly, any runtime versions released after January 1, 2020 will not support Python 2.7.

Starting on January 13, 2020, AI Platform Training and AI Platform Prediction will support each runtime version for one year after its release date. You can find the release date of each runtime version in the runtime version list.

Support for each runtime version changes according to the following schedule:

  1. Starting on the release date: You can create training jobs, batch prediction jobs, and model versions that use the runtime version.

  2. Starting 12 months after the release date: You can no longer create training jobs, batch prediction jobs, or model versions that use the runtime version.

    Existing model versions that have been deployed to AI Platform Prediction continue to function.

  3. Starting 24 months after the release date: AI Platform Prediction automatically deletes all model versions that use the runtime version.

This policy will be applied retroactively on January 13, 2020. For example, since runtime version 1.0 was released over 24 months ago, AI Platform Training and AI Platform Prediction no longer support it. There will be a three-month grace period (until April 13, 2020) before AI Platform Prediction automatically deletes model versions that use the oldest runtime versions.

The following table describes the first two important dates that mark the end of support for runtime versions:

Date | Runtime versions affected | Change in functionality
January 13, 2020 | 1.0, 1.1, 1.2, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 1.10, 1.11, 1.12 | You can no longer create training jobs, batch prediction jobs, or model versions using these runtime versions.
April 13, 2020 | 1.0, 1.1, 1.2, 1.4, 1.5, 1.6 | AI Platform Prediction automatically deletes any model versions using these runtime versions.

To learn about when availability ends for every runtime version, see the runtime version list.

Starting on January 13, 2020, runtimeVersion and pythonVersion will become required fields when you create Job or Version resources. Previously, runtimeVersion defaulted to 1.0 and pythonVersion defaulted to 2.7.
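In request-body terms, that means every new Version (or Job) names both fields explicitly; a sketch with placeholder values:

# After January 13, 2020, both fields below must be set explicitly;
# there is no longer a default of runtimeVersion 1.0 / pythonVersion 2.7.
version_body = {
    "name": "v1",
    "deploymentUri": "gs://my-bucket/model-dir/",  # placeholder
    "runtimeVersion": "1.15",  # now required
    "pythonVersion": "3.7",    # now required
}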

December 03, 2019

You cannot enable request-response logging for AI Platform Prediction when you create a model version. Instead, you must first create a model version without request-response logging enabled, then enable request-response logging by sending a projects.models.versions.patch request to the REST API.
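A sketch of that patch call in Python (the project, model, version, and BigQuery table names are placeholders):

from googleapiclient import discovery

ml = discovery.build("ml", "v1")

name = "projects/my-project/models/my_model/versions/v1"  # placeholder
body = {
    "requestLoggingConfig": {
        "samplingPercentage": 0.1,  # log 10% of requests
        "bigqueryTableName": "my-project.my_dataset.prediction_logs",
    }
}

# updateMask limits the patch to the logging configuration.
request = ml.projects().models().versions().patch(
    name=name, body=body, updateMask="requestLoggingConfig")
print(request.execute())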

November 20, 2019

AI Explanations now offers feature attributions through AI Platform Prediction. This feature is available in beta. To gain more insight into your model's predictions, you can use feature attributions based on the sampled Shapley and integrated gradients methods. Try the example notebooks to get started, and refer to the AI Explainability Whitepaper to learn more.

October 24, 2019

Many Compute Engine (N1) machine types are now available for online prediction in beta, in addition to the existing legacy (MLS1) machine types. When you create a model version with a Compute Engine machine type, you can allocate virtual machines with more vCPU and memory resources for your online prediction nodes, improving throughput of predictions or reducing latency. Additionally, you can use GPUs with these new machine types and deploy TensorFlow models up to 2 GB in size. The machine types are currently only available in the us-central1 region.

Learn more about the features, limitations, and usage of Compute Engine (N1) machine types. Model versions that use Compute Engine (N1) machine types, including with GPUs, are available at no charge until November 14, 2019. Read about the pricing for these machine types that goes into effect on November 14, 2019.

Model versions that use one of the new Compute Engine (N1) machine types and scale to use more than 40 prediction nodes may exhibit high latency when handling online prediction requests. In this case, AI Platform Prediction may also drop requests.

For the best performance until this issue is resolved, do not scale your model version to use more than 40 nodes.

The default maximum model size for model versions that use a legacy (MLS1) machine type has increased from 250 MB to 500 MB.

October 04, 2019

The us-west2 (Los Angeles), us-east4 (N. Virginia), and europe-north1 (Finland) regions are now available for batch prediction. Note that us-east4 was already available for online prediction.

Additionally, the us-west1 (Oregon) and europe-west4 (Netherlands) regions, which were already available for training, are now available for batch prediction.

Read about pricing for batch prediction in these regions.

September 16, 2019

The What-If Tool can be used to inspect models deployed on AI Platform Prediction, and to compare two models. Learn how to use the What-If Tool with AI Platform Prediction.
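From a notebook, pointing the tool at a deployed model takes a few lines; a sketch assuming the witwidget package and placeholder resource names:

from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

examples = [...]  # tf.train.Example protos to probe the model with

# Attach the What-If Tool to a model deployed on AI Platform Prediction.
config = (WitConfigBuilder(examples)
          .set_ai_platform_model("my-project", "my_model", "v1"))
WitWidget(config, height=800)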

September 06, 2019

When you deploy a model version to AI Platform Prediction, you can now configure AI Platform Prediction to log a sample of online prediction requests received by the model together with the responses it sends to these requests. AI Platform Prediction saves these request-response pairs to BigQuery. This feature is in beta.

Learn how to enable request-response logging and read about the configuration options for this type of logging.

August 28, 2019

The documentation for AI Platform Notebooks has moved to a new location.

August 22, 2019

Continuous evaluation for AI Platform Prediction is now available in beta. When you create a continuous evaluation job, AI Platform Data Labeling Service assigns human reviewers to provide ground truth labels for a portion of your model version's online predictions; alternatively, you can provide your own ground truth labels. Then Data Labeling Service compares these labels to your model version's predictions to calculate daily evaluation metrics.

Learn more about continuous evaluation.

August 16, 2019

AI Platform runtime versions 1.13 and 1.14 now include numpy 1.16.4 instead of numpy 1.16.0. View the runtime version list for the full list of packages included in runtime versions 1.13 and 1.14.

August 01, 2019

The AI Platform training and prediction documentation has been reorganized. Previously, documentation for using AI Platform Prediction with specific machine learning frameworks was separated into sections. You can now navigate to all training and prediction documentation from the AI Platform documentation home page.

July 19, 2019

AI Platform runtime version 1.14 is now available for prediction. This version supports TensorFlow 1.14.0 and includes other packages as listed in the runtime version list.

AI Platform runtime version 1.12 now supports TensorFlow 1.12.3. View the runtime version list for the full list of packages included in runtime version 1.12.

July 17, 2019

The prediction input format for the built-in algorithms has changed.

Instead of a raw string, format each instance as a JSON object with a "csv_row" key and a "key" key. The "key" value is useful for batch prediction with AI Platform Prediction. For online prediction, you can pass a dummy value for "key" in your input JSON request. For example:

{"csv_row": "1, 2, 3, 4, 0, abc", "key": "dummy-key"}

See the Census Income tutorial for an example.
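A sketch of an online prediction call using this format (project and model names are placeholders):

from googleapiclient import discovery

ml = discovery.build("ml", "v1")

name = "projects/my-project/models/census_model"  # placeholder
body = {
    "instances": [
        # Each instance is an object, not a raw CSV string.
        {"csv_row": "1, 2, 3, 4, 0, abc", "key": "dummy-key"},
    ]
}
print(ml.projects().predict(name=name, body=body).execute())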

June 19, 2019

The asia-southeast1 (Singapore) region is now available for batch prediction.

June 05, 2019

You can now specify a service account for your model version to use when you deploy a custom prediction routine to AI Platform Prediction. This feature is in beta.
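In Version-resource terms this is a single field on the version body; a sketch with placeholder values:

# Hypothetical body for a custom prediction routine that runs as a
# specific service account (beta field).
version_body = {
    "name": "v_cpr",
    "deploymentUri": "gs://my-bucket/cpr-artifacts/",     # placeholder
    "runtimeVersion": "1.13",
    "pythonVersion": "3.5",
    "predictionClass": "predictor.MyPredictor",           # module.ClassName
    "packageUris": ["gs://my-bucket/my_cpr-0.1.tar.gz"],  # placeholder
    "serviceAccount": "my-svc-acct@my-project.iam.gserviceaccount.com",
}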

May 03, 2019

AI Platform runtime version 1.12 now supports TensorFlow 1.12.2. View the runtime version list for the full list of packages included in runtime version 1.12.

April 25, 2019

AI Platform Prediction now supports custom prediction routines in beta. Custom prediction routines let you provide AI Platform Prediction with custom code to use when it serves online predictions from your deployed model. This can be useful for preprocessing prediction input, postprocessing your model's predictions, and more.

Work through a tutorial on deploying a custom prediction routine with Keras or one on deploying a custom prediction routine with scikit-learn.
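At the heart of a custom prediction routine is a predictor class implementing the documented interface; a minimal sketch (the preprocessing is invented for illustration):

import os
import pickle

class MyPredictor(object):
    """Wraps a model with custom pre- and post-processing."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # Illustrative preprocessing; replace with your own logic.
        inputs = [[float(x) for x in instance] for instance in instances]
        outputs = self._model.predict(inputs)
        return outputs.tolist()  # must return JSON-serializable values

    @classmethod
    def from_path(cls, model_dir):
        # Called by AI Platform Prediction with the deploymentUri contents.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            model = pickle.load(f)
        return cls(model)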

AI Platform Prediction now supports custom transformers for scikit-learn pipelines in beta. This lets you provide AI Platform Prediction with custom code to use during online prediction. Your deployed scikit-learn pipeline uses this code when it serves predictions.

Work through a tutorial on training and deploying a custom scikit-learn pipeline.
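A custom transformer is an ordinary scikit-learn estimator class shipped alongside the pipeline; a minimal sketch (the transformation itself is invented for illustration):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class LogScaler(BaseEstimator, TransformerMixin):
    """Illustrative custom transformer: log-scales non-negative features."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))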

AI Platform Prediction now supports logging of your prediction nodes' stderr and stdout streams to Stackdriver Logging during online prediction. Stream logging is in beta. You can enable this type of logging in addition to, or in place of, the access logging that was already available. It can be useful for understanding how your deployment handles prediction requests.