Deploying a model using the Cloud Console

This page describes how to use the Google Cloud Console to deploy a model to an endpoint.

Introduction

You must deploy a model to an endpoint before that model can serve online predictions. Deploying a model associates physical resources with it so that it can serve online predictions with low latency. An undeployed model can still serve batch predictions, which do not have the same low-latency requirements.

You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about options and use cases for deploying models, see About deploying models.

You cannot deploy a video model to an endpoint. Video models do not serve online predictions.

For help with deploying a model using the AI Platform (Unified) API, see Deploying a model using the AI Platform API.

Deploying a model

  1. In the Cloud Console, in the AI Platform section, go to the Models page.

  2. Click the name of the model you want to deploy to open its details page.

  3. Select the Deploy & Test tab.

    If your model is already deployed to any endpoints, they are listed in the Deploy your model section.

  4. Click Deploy to endpoint.

  5. To deploy your model to a new endpoint, select Create new endpoint and provide a name for the new endpoint. To deploy your model to an existing endpoint, select Add to existing endpoint and select the endpoint from the dropdown list.

    You can add more than one model to an endpoint, and you can add a model to more than one endpoint.

  6. If you deploy your model to an existing endpoint that already has one or more models deployed to it, you must update the Traffic split percentage for the model you are deploying and the already deployed models so that all of the percentages add up to 100%.

  7. Select your model type below to finish configuring your model settings:

    AutoML Image

    1. If you're deploying your model to a new endpoint, accept 100 for the Traffic split. Otherwise, adjust the traffic split values for all models on the endpoint so they add up to 100.

    2. Enter the Number of compute nodes you want to provide for your model.

      This is the number of nodes available to this model at all times. You are charged for the nodes, even without prediction traffic. See the pricing page.

    3. Learn how to change the default settings for prediction logging.

    4. Click Done for your model, and when all the Traffic split percentages are correct, click Continue.

      The region where your model will be deployed is displayed. This must be the region where your model was created.

    5. Click Deploy to deploy your model to the endpoint.

    AutoML Tabular

    1. If you're deploying your model to a new endpoint, accept 100 for the Traffic split. Otherwise, adjust the traffic split values for all models on the endpoint so they add up to 100.

    2. Enter the Minimum number of compute nodes you want to provide for your model.

      This is the number of nodes available to this model at all times. You are charged for the nodes used, whether to handle prediction load or for standby (minimum) nodes, even without prediction traffic. See the pricing page.

    3. Select your Machine type.

      Larger machine types improve prediction performance but also increase costs.

    4. Learn how to change the default settings for prediction logging.

    5. Click Done for your model, and when all the Traffic split percentages are correct, click Continue.

      The region where your model will be deployed is displayed. This must be the region where your model was created.

    6. Click Deploy to deploy your model to the endpoint.

    AutoML Text

    1. If you're deploying your model to a new endpoint, accept 100 for the Traffic split. Otherwise, adjust the traffic split values for all models on the endpoint so they add up to 100.

    2. Click Done for your model, and when all the Traffic split percentages are correct, click Continue.

      The region where your model will be deployed is displayed. This must be the region where your model was created.

    3. Click Deploy to deploy your model to the endpoint.

    AutoML Video

    You cannot deploy an AutoML video model to an endpoint.

    Custom trained

    1. If you're deploying your model to a new endpoint, accept 100 for the Traffic split. Otherwise, adjust the traffic split values for all models on the endpoint so they add up to 100.

    2. Enter the Minimum number of compute nodes you want to provide for your model.

      This is the number of nodes available to this model at all times.

      You are charged for all nodes in use, both those handling prediction load and the standby (minimum) nodes, even when there is no prediction traffic. See the pricing page.

    3. To use autoscaling, enter the Maximum number of compute nodes you want AI Platform to scale up to.

      The number of compute nodes can increase if needed to handle prediction traffic, but it will never go above the maximum number of nodes.

    4. Select your Machine type.

      Larger machine types improve prediction performance but also increase costs. Compare the available machine types.

    5. Select an Accelerator type and an Accelerator count.

      This option is displayed only if you enabled accelerator use when you imported or created the model.

      For the accelerator count, refer to the GPU table to check for valid numbers of GPUs that you can use with each CPU machine type.

    6. If you want to use a custom service account for the deployment, select a service account in the Service account drop-down box.

    7. Learn how to change the default settings for prediction logging.

    8. Click Done for your model, and when all the Traffic split percentages are correct, click Continue.

      The region where your model will be deployed is displayed. This must be the region where your model was created.

    9. Click Deploy to deploy your model to the endpoint.
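
When you add a model to an endpoint that already serves other models, the Traffic split percentages must add up to 100. The following Python sketch illustrates one way to compute a valid split. It gives the new model a chosen share and scales the existing shares proportionally into the remainder. It then uses largest-remainder rounding so the whole-number percentages still total exactly 100. The function rebalance_traffic and the model IDs are hypothetical. The sketch only shows the arithmetic: you still enter the resulting percentages in the Traffic split fields yourself.

```python
from __future__ import annotations


def rebalance_traffic(existing: dict[str, int], new_id: str, new_share: int) -> dict[str, int]:
    """Hypothetical helper: give `new_id` `new_share` percent of traffic and
    scale the existing models' shares so that all shares sum to exactly 100."""
    if not 0 <= new_share <= 100:
        raise ValueError("new_share must be between 0 and 100")
    if new_id in existing:
        raise ValueError(f"{new_id!r} is already in the split")
    if not existing:
        # On a new endpoint the only model receives all traffic (accept 100).
        return {new_id: 100}
    total = sum(existing.values())
    if total <= 0:
        raise ValueError("existing shares must sum to a positive number")
    remaining = 100 - new_share
    # Exact proportional shares, then round each one down.
    exact = {k: v * remaining / total for k, v in existing.items()}
    split = {k: int(x) for k, x in exact.items()}
    # Largest-remainder rounding: hand the points lost to rounding down
    # to the models with the largest fractional parts.
    leftover = remaining - sum(split.values())
    by_fraction = sorted(exact, key=lambda k: exact[k] - split[k], reverse=True)
    for k in by_fraction[:leftover]:
        split[k] += 1
    split[new_id] = new_share
    return split
```

For example, suppose you add a model with a 20% share to an endpoint split 50/50 between two models. Calling `rebalance_traffic({"a": 50, "b": 50}, "new", 20)` returns a 40/40/20 split.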
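
Each procedure above notes that the deployment region must be the region where the model was created. AI Platform (Unified) resource names carry the region in their `locations/LOCATION` segment, for example `projects/PROJECT/locations/LOCATION/models/MODEL_ID`. A quick pre-check can therefore compare that segment of a model name with the same segment of an endpoint name. The following is a minimal Python sketch. The helpers location_of and same_region and the example names are hypothetical.

```python
def location_of(resource_name: str) -> str:
    """Return the LOCATION segment of a resource name shaped like
    projects/PROJECT/locations/LOCATION/models/MODEL_ID."""
    parts = resource_name.strip("/").split("/")
    try:
        return parts[parts.index("locations") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no locations/ segment in {resource_name!r}") from None


def same_region(model_name: str, endpoint_name: str) -> bool:
    """Hypothetical pre-check: True if the model and the endpoint share a region."""
    return location_of(model_name) == location_of(endpoint_name)
```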
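
For custom-trained models, the minimum and maximum numbers of compute nodes bound autoscaling. The minimum nodes are always running and billed, and the node count never goes above the maximum. The following Python sketch models only that bounding rule, as a clamp. The function nodes_for_load and the per-node capacity parameter are hypothetical. This page does not describe the signals AI Platform actually uses to scale, so treat the "needed" calculation as an assumption.

```python
import math


def nodes_for_load(requests_per_sec: float, per_node_capacity: float,
                   min_nodes: int, max_nodes: int) -> int:
    """Hypothetical model of autoscaling bounds: the node count the load
    would need, clamped to the range [min_nodes, max_nodes]."""
    # This sketch assumes at least one node is always deployed.
    if min_nodes < 1 or max_nodes < min_nodes:
        raise ValueError("require 1 <= min_nodes <= max_nodes")
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    needed = math.ceil(requests_per_sec / per_node_capacity)
    # Standby (minimum) nodes run, and are billed, even with zero traffic;
    # the count never goes above the configured maximum.
    return max(min_nodes, min(needed, max_nodes))
```

With a minimum of 2 and a maximum of 5 nodes, zero traffic still yields 2 billed nodes, and very high traffic is capped at 5.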
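
The custom-trained settings in the Cloud Console correspond to fields of the REST API's endpoints deployModel request. The following hedged Python sketch builds such a request body without sending it. The field names are given as we understand the API reference; verify them against Deploying a model using the AI Platform API before relying on them. In the request-level `trafficSplit`, the key `"0"` refers to the model being deployed, and the other keys are the IDs of models already deployed to the endpoint. The function deploy_model_body, the project, model IDs, and display name are all hypothetical.

```python
from __future__ import annotations

import json


def deploy_model_body(
    model_name: str,
    display_name: str,
    machine_type: str,
    min_nodes: int,
    max_nodes: int,
    traffic_split: dict[str, int],
    accelerator_type: str | None = None,
    accelerator_count: int = 0,
    service_account: str | None = None,
) -> dict:
    """Hypothetical helper: build (but do not send) a deployModel JSON body."""
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic split percentages must add up to 100")
    if not 1 <= min_nodes <= max_nodes:
        raise ValueError("require 1 <= min_nodes <= max_nodes")
    machine_spec: dict = {"machineType": machine_type}
    if accelerator_type:
        # Only applicable if accelerator use was enabled for the model.
        machine_spec["acceleratorType"] = accelerator_type
        machine_spec["acceleratorCount"] = accelerator_count
    deployed_model: dict = {
        "model": model_name,
        "displayName": display_name,
        "dedicatedResources": {
            "machineSpec": machine_spec,
            "minReplicaCount": min_nodes,
            "maxReplicaCount": max_nodes,
        },
    }
    if service_account:
        deployed_model["serviceAccount"] = service_account
    # Key "0" stands for the model being deployed by this request.
    return {"deployedModel": deployed_model, "trafficSplit": traffic_split}


if __name__ == "__main__":
    # Hypothetical values: deploy a new model with 20% of traffic next to an
    # already-deployed model whose (hypothetical) ID is 5678.
    body = deploy_model_body(
        model_name="projects/my-project/locations/us-central1/models/1234",
        display_name="my-model-v2",
        machine_type="n1-standard-4",
        min_nodes=1,
        max_nodes=3,
        traffic_split={"0": 20, "5678": 80},
    )
    print(json.dumps(body, indent=2))
```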

What's next