Deploying models

This page explains how to deploy your model to AI Platform Prediction to get predictions.

To deploy your trained model to AI Platform Prediction, you must:

  • Upload your saved model to a Cloud Storage bucket.
  • Create an AI Platform Prediction model resource.
  • Create an AI Platform Prediction version resource, specifying the Cloud Storage path to your saved model.

Before you begin

Train your machine learning model and follow the guide to exporting models for prediction to create model artifacts that can be deployed to AI Platform Prediction.

Store your model in Cloud Storage

Generally, it is easiest to use a dedicated Cloud Storage bucket in the same project you're using for AI Platform Prediction.

If you're using a bucket in a different project, you must ensure that your AI Platform Prediction service account can access your model in Cloud Storage. Without the appropriate permissions, your request to create an AI Platform Prediction model version fails. See more about granting permissions for storage.
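For example, if your bucket lives in another project, you might grant the service account read access with a command like the following. This is a sketch only: the service account address shown is the typical AI Platform Prediction format, but you should look up the exact address for your project.

# Sketch: grant the AI Platform Prediction service account read access to the
# bucket. PROJECT_NUMBER is the number of the project running AI Platform
# Prediction; verify the service account address for your own project.
gcloud storage buckets add-iam-policy-binding gs://YOUR_BUCKET \
  --member="serviceAccount:service-PROJECT_NUMBER@cloud-ml.google.com.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"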

Set up your Cloud Storage bucket

This section shows you how to create a new bucket. You can use an existing bucket, but it must be in the same region where you plan to run AI Platform Prediction jobs. Additionally, if it is not part of the project you are using to run AI Platform Prediction, you must explicitly grant access to the AI Platform Prediction service accounts.

  1. Specify a name for your new bucket. The name must be unique across all buckets in Cloud Storage.

    BUCKET_NAME="YOUR_BUCKET_NAME"

    For example, use your project name with -aiplatform appended:

    PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    BUCKET_NAME=${PROJECT_ID}-aiplatform
  2. Check the bucket name that you chose:

    echo $BUCKET_NAME
  3. Select a region for your bucket and set a REGION environment variable.

    Use the same region where you plan to run AI Platform Prediction jobs. See the available regions for AI Platform Prediction services.

    For example, the following code sets REGION to us-central1:

    REGION=us-central1
  4. Create the new bucket:

    gcloud storage buckets create gs://$BUCKET_NAME --location=$REGION
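  5. Optionally, confirm that the new bucket exists and was created in the region you intended:

    # Print the bucket's location.
    gcloud storage buckets describe gs://$BUCKET_NAME --format="value(location)"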

Upload the exported model to Cloud Storage

The following examples show how to upload different types of model artifacts to a model directory in Cloud Storage:

TensorFlow SavedModel

# Find the most recent timestamped SavedModel subdirectory under the export base.
SAVED_MODEL_DIR=./YOUR_EXPORT_DIR_BASE/$(ls ./YOUR_EXPORT_DIR_BASE | tail -1)
gcloud storage cp $SAVED_MODEL_DIR gs://YOUR_BUCKET --recursive

When you export a SavedModel from tf.keras or from a TensorFlow estimator, it gets saved as a timestamped subdirectory of a base export directory that you choose, like YOUR_EXPORT_DIR_BASE/1487877383942. This example shows how to upload the directory with the most recent timestamp. If you created your SavedModel in a different way, it may be in a different location on your local filesystem.

scikit-learn or XGBoost model file

Depending on how you exported your trained model, upload your model.joblib, model.pkl, or model.bst file.

The following example shows how to upload a file exported by sklearn.externals.joblib:

gcloud storage cp ./model.joblib gs://YOUR_BUCKET/model.joblib

The following example shows how to upload a file exported by Python's pickle module:

gcloud storage cp ./model.pkl gs://YOUR_BUCKET/model.pkl

The following example shows how to upload a file exported by xgboost.Booster's save_model method:

gcloud storage cp ./model.bst gs://YOUR_BUCKET/model.bst

If you are deploying a custom prediction routine (beta), upload any additional model artifacts to your model directory as well.

The total file size of your model directory must be 500 MB or less if you use a legacy (MLS1) machine type or 10 GB or less if you use a Compute Engine (N1) machine type. Learn more about machine types for online prediction.

When you create subsequent versions of your model, organize them by placing each one into its own separate directory within your Cloud Storage bucket.
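For example, a layout like the following keeps each version's artifacts separate. The v1/ and v2/ directory names are only an illustration:

# Upload the first version's artifacts to their own directory.
gcloud storage cp ./model.joblib gs://YOUR_BUCKET/v1/model.joblib

# Later, upload a retrained model to a new directory.
gcloud storage cp ./model.joblib gs://YOUR_BUCKET/v2/model.joblib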

Upload custom code

If you are deploying a scikit-learn pipeline with custom code or a custom prediction routine, you must also upload the source distribution package containing your custom code. For example:

gcloud storage cp dist/my_custom_code-0.1.tar.gz gs://YOUR_BUCKET/my_custom_code-0.1.tar.gz

You may upload this tarball to the same directory in Cloud Storage as your model file, but you don't have to. In fact, keeping them separate may provide better organization, especially if you deploy many versions of your model and code.

Test your model with local predictions

You can use the gcloud ai-platform local predict command to test how your model serves predictions before you deploy it to AI Platform Prediction. The command uses dependencies in your local environment to perform prediction and returns results in the same format that gcloud ai-platform predict uses when it performs online predictions. Testing predictions locally can help you discover errors before you incur costs for online prediction requests.

For the --model-dir argument, specify a directory containing your exported machine learning model, either on your local machine or in Cloud Storage. For the --framework argument, specify tensorflow, scikit-learn, or xgboost. You cannot use the gcloud ai-platform local predict command with a custom prediction routine.

The following example shows how to perform local prediction:

gcloud ai-platform local predict --model-dir LOCAL_OR_CLOUD_STORAGE_PATH_TO_MODEL_DIRECTORY/ \
  --json-instances LOCAL_PATH_TO_PREDICTION_INPUT.JSON \
  --framework NAME_OF_FRAMEWORK
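For example, the following sketch tests a scikit-learn model. The ./model/ directory, input file name, and feature values are illustrative assumptions; the --json-instances file contains one JSON instance per line:

# Create a newline-delimited JSON file with one instance per line.
cat > input.json <<EOF
[6.8, 2.8, 4.8, 1.4]
[6.0, 3.4, 4.5, 1.6]
EOF

# Run local prediction against a local scikit-learn model directory.
gcloud ai-platform local predict --model-dir ./model/ \
  --json-instances input.json \
  --framework scikit-learn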

Deploy models and versions

AI Platform Prediction organizes your trained models using model and version resources. An AI Platform Prediction model is a container for the versions of your machine learning model.

To deploy a model, you create a model resource in AI Platform Prediction, create a version of that model, then link the model version to the model file stored in Cloud Storage.
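As a condensed sketch (the model name, bucket path, and region are placeholders; the sections below explain each option in detail):

# 1. Create the model resource, the container for versions.
gcloud ai-platform models create MODEL_NAME --region=us-central1

# 2. Create a version that points at the model directory in Cloud Storage.
gcloud ai-platform versions create v1 \
  --model=MODEL_NAME \
  --origin=gs://YOUR_BUCKET/model-dir/ \
  --region=us-central1 \
  --runtime-version=2.11 \
  --python-version=3.7 \
  --framework=scikit-learn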

Create a model resource

AI Platform Prediction uses model resources to organize different versions of your model.

You must decide at this time whether you want model versions belonging to this model to use a regional endpoint or the global endpoint. In most cases, choose a regional endpoint. If you need functionality that is only available on legacy (MLS1) machine types, then use the global endpoint.

You must also decide at this time if you want model versions belonging to this model to export any logs when they serve predictions. The following examples do not enable logging. Learn how to enable logging.
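If you do want logging, the gcloud CLI accepts logging flags when you create the model. The following is a sketch only; see the logging guide for details:

# Enable access logging and console (stderr/stdout) logging for all versions
# created under this model.
gcloud ai-platform models create MODEL_NAME \
  --region=REGION \
  --enable-logging \
  --enable-console-logging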

console

  1. Open the AI Platform Prediction Models page in the Google Cloud console:

    Go to the Models page

  2. Click the New Model button at the top of the Models page. This brings you to the Create model page.

  3. Enter a unique name for your model in the Model name field.

  4. When the Use regional endpoint checkbox is selected, AI Platform Prediction uses a regional endpoint. To use the global endpoint instead, clear the Use regional endpoint checkbox.

  5. From the Region drop-down list, select a location for your prediction nodes. The available regions differ depending on whether you use a regional endpoint or the global endpoint.

  6. Click Create.

  7. Verify that you have returned to the Models page, and that your new model appears in the list.

gcloud

Regional endpoint

Run the following command:

gcloud ai-platform models create MODEL_NAME \
  --region=REGION

Replace the following:

  • MODEL_NAME: a name for your model.
  • REGION: the region of the regional endpoint where you want to create your model.

If you don't specify the --region flag, then the gcloud CLI prompts you to select a regional endpoint (or to use us-central1 on the global endpoint).

Alternatively, you can set the ai_platform/region property to a specific region in order to make sure the gcloud CLI always uses the corresponding regional endpoint for AI Platform Prediction, even when you don't specify the --region flag. (This configuration doesn't apply to commands in the gcloud ai-platform operations command group.)
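For example:

# Make the gcloud CLI default to the us-central1 regional endpoint.
gcloud config set ai_platform/region us-central1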

Global endpoint

Run the following command:

gcloud ai-platform models create MODEL_NAME \
  --regions=REGION

Replace the following:

  • MODEL_NAME: a name for your model.
  • REGION: a region that supports legacy (MLS1) machine types, where AI Platform Prediction runs your prediction nodes.

If you don't specify the --regions flag, then the gcloud CLI prompts you to select a regional endpoint (or to use us-central1 on the global endpoint).

REST API

Regional endpoint

  1. Format your request by placing the model object in the request body. At minimum, specify a name for your model by replacing MODEL_NAME in the following sample:

    {
      "name": "MODEL_NAME"
    }
    
  2. Make a REST API call to the following URL, replacing PROJECT_ID with your Google Cloud project ID:

    POST https://REGION-ml.googleapis.com/v1/projects/PROJECT_ID/models/
    

    Replace REGION with the region of the regional endpoint where you want to create your model.

    For example, you can make the following request using the curl command. This command authorizes the request using the credentials associated with your Google Cloud CLI installation.

    curl -X POST -H "Content-Type: application/json" \
      -d '{"name": "MODEL_NAME"}' \
      -H "Authorization: Bearer `gcloud auth print-access-token`" \
      "https://REGION-ml.googleapis.com/v1/projects/PROJECT_ID/models"
    

    The API returns a response similar to the following:

    {
      "name": "projects/PROJECT_ID/models/MODEL_NAME",
      "regions": [
        "REGION"
      ]
    }
    

Global endpoint

  1. Format your request by placing the model object in the request body. At minimum, specify a name for your model by replacing MODEL_NAME in the following sample, and specify a region by replacing REGION with a region that supports legacy (MLS1) machine types:

    {
      "name": "MODEL_NAME",
      "regions": ["REGION"]
    }
    
  2. Make a REST API call to the following URL, replacing PROJECT_ID with your Google Cloud project ID:

    POST https://ml.googleapis.com/v1/projects/PROJECT_ID/models/
    

    For example, you can make the following request using the curl command. This command authorizes the request using the credentials associated with your Google Cloud CLI installation.

    curl -X POST -H "Content-Type: application/json" \
      -d '{"name": "MODEL_NAME", "regions": ["REGION"]}' \
      -H "Authorization: Bearer `gcloud auth print-access-token`" \
      "https://ml.googleapis.com/v1/projects/PROJECT_ID/models"
    

    The API returns a response similar to the following:

    {
      "name": "projects/PROJECT_ID/models/MODEL_NAME",
      "regions": [
        "REGION"
      ]
    }
    

See the AI Platform Prediction model API for more details.

Create a model version

Now you are ready to create a model version with the trained model you previously uploaded to Cloud Storage. When you create a version, you can specify a number of parameters. The following list describes common parameters, some of which are required:

  • name: must be unique within the AI Platform Prediction model.
  • deploymentUri: the path to your model directory in Cloud Storage.

    • If you're deploying a TensorFlow model, this is a SavedModel directory.
    • If you're deploying a scikit-learn or XGBoost model, this is the directory containing your model.joblib, model.pkl, or model.bst file.
    • If you're deploying a custom prediction routine, this is the directory containing all your model artifacts. The total size of this directory must be 500 MB or less.
  • framework: TENSORFLOW, SCIKIT_LEARN, or XGBOOST. Omit this parameter if you're deploying a custom prediction routine.

  • runtimeVersion: a runtime version based on the dependencies your model needs. If you're deploying a scikit-learn model, an XGBoost model, or a custom prediction routine, this must be at least 1.4. If you plan to use the model version for batch prediction, then you must use runtime version 2.1 or earlier.

  • packageUris (optional): a list of paths to your custom code distribution packages (.tar.gz files) in Cloud Storage. Only provide this parameter if you are deploying a scikit-learn pipeline with custom code (beta) or a custom prediction routine (beta).

  • predictionClass (optional): the name of your Predictor class in module_name.class_name format. Only provide this parameter if you are deploying a custom prediction routine (beta).

  • serviceAccount (optional): You may specify a service account for your model version to use if it accesses Google Cloud resources while serving predictions. Learn more about specifying a service account. Only provide this parameter if you are using a custom container or a custom prediction routine.

  • pythonVersion: must be set to "3.5" (for runtime versions 1.4 through 1.14) or "3.7" (for runtime versions 1.15 and later) to be compatible with model files exported using Python 3. Can also be set to "2.7" if used with runtime version 1.15 or earlier.

  • machineType (optional): the type of virtual machine that AI Platform Prediction uses for the nodes that serve predictions. Learn more about machine types. If not set, this defaults to n1-standard-2 on regional endpoints and mls1-c1-m2 on the global endpoint.

See more information about each of these parameters, as well as additional less common parameters, in the API reference for the version resource.

Additionally, if you created your model on a regional endpoint, make sure to also create the version on the same regional endpoint.

console

  1. Open the AI Platform Prediction Models page in the Google Cloud console:

    Go to the Models page

  2. On the Models page, select the name of the model resource you would like to use to create your version. This brings you to the Model Details page.

  3. Click the New Version button at the top of the Model Details page. This brings you to the Create version page.

  4. Enter your version name in the Name field. Optionally, enter a description for your version in the Description field.

  5. Enter the following information about how you trained your model in the corresponding dropdown boxes: the Python version you used to train the model, the Framework and Framework version, and the ML runtime version to use.

  6. Select a Machine type to run online prediction.

  7. In the Model URI field, enter the Cloud Storage bucket location where you uploaded your model file. You may use the Browse button to find the correct path.

    Make sure to specify the path to the directory containing the file, not the path to the model file itself. For example, use gs://your_bucket_name/model-dir/ instead of gs://your_bucket_name/model-dir/saved_model.pb or gs://your_bucket_name/model-dir/model.pkl.

  8. If you are deploying a scikit-learn pipeline with custom code (beta) or a custom prediction routine (beta), provide the Cloud Storage path to any custom code packages (.tar.gz) under Custom code and dependencies. If you are deploying a custom prediction routine, enter the name of your Predictor class in the Prediction class field.

  9. Select a Scaling option for online prediction deployment:

    • If you select "Auto scaling", the optional Minimum number of nodes field displays. You can enter the minimum number of nodes to keep running at all times, when the service has scaled down.

    • If you select "Manual scaling", you must enter the Number of nodes you want to keep running at all times.

    Learn how scaling options differ depending on machine type.

    Learn more about pricing for prediction costs.

  10. To finish creating your model version, click Save.

gcloud

  1. Set environment variables to store the path to the Cloud Storage directory where your model binary is located, your model name, your version name, and your framework choice.

    When you create a version with the gcloud CLI, you may provide the framework name in capital letters with underscores (for example, SCIKIT_LEARN) or in lowercase letters with hyphens (for example, scikit-learn). Both options lead to identical behavior.

    Replace [VALUES_IN_BRACKETS] with the appropriate values:

    MODEL_DIR="gs://your_bucket_name/"
    VERSION_NAME="[YOUR-VERSION-NAME]"
    MODEL_NAME="[YOUR-MODEL-NAME]"
    FRAMEWORK="[YOUR-FRAMEWORK_NAME]"
    

    For a scikit-learn pipeline with custom code (beta), set an additional variable with the path to your custom code tarball:

    MODEL_DIR="gs://your_bucket_name/"
    VERSION_NAME="[YOUR-VERSION-NAME]"
    MODEL_NAME="[YOUR-MODEL-NAME]"
    FRAMEWORK="scikit-learn"
    CUSTOM_CODE_PATH="gs://your_bucket_name/my_custom_code-0.1.tar.gz"
    

    For a custom prediction routine (beta), omit the FRAMEWORK variable and set additional variables with the path to your custom code tarball and the name of your predictor class:

    MODEL_DIR="gs://your_bucket_name/"
    VERSION_NAME="[YOUR-VERSION-NAME]"
    MODEL_NAME="[YOUR-MODEL-NAME]"
    CUSTOM_CODE_PATH="gs://your_bucket_name/my_custom_code-0.1.tar.gz"
    PREDICTOR_CLASS="[MODULE_NAME].[CLASS_NAME]"
    
  2. Create the version:

    gcloud ai-platform versions create $VERSION_NAME \
      --model=$MODEL_NAME \
      --origin=$MODEL_DIR \
      --runtime-version=2.11 \
      --framework=$FRAMEWORK \
      --python-version=3.7 \
      --region=REGION \
      --machine-type=MACHINE_TYPE
    

    Replace the following:

    • REGION: The region of the regional endpoint on which you created the model. If you created the model on the global endpoint, omit the --region flag.

    • MACHINE_TYPE: A machine type, determining the computing resources available to your prediction nodes.

    For a scikit-learn pipeline with custom code (beta), use the gcloud beta component and make sure to set the --package-uris flag. To deploy custom code, your model must use the global endpoint.

    gcloud components install beta
    
    gcloud beta ai-platform versions create $VERSION_NAME \
      --model=$MODEL_NAME \
      --origin=$MODEL_DIR \
      --runtime-version=2.11 \
      --framework=$FRAMEWORK \
      --python-version=3.7 \
      --machine-type=mls1-c1-m2 \
      --package-uris=$CUSTOM_CODE_PATH
    

    For a custom prediction routine (beta), use the gcloud beta component, omit the --framework flag, and set the --package-uris and --prediction-class flags. To deploy custom code, your model must use the global endpoint.

    gcloud components install beta
    
    gcloud beta ai-platform versions create $VERSION_NAME \
      --model=$MODEL_NAME \
      --origin=$MODEL_DIR \
      --runtime-version=2.11 \
      --python-version=3.7 \
      --machine-type=mls1-c1-m2 \
      --package-uris=$CUSTOM_CODE_PATH \
      --prediction-class=$PREDICTOR_CLASS
    

    Creating the version takes a few minutes. When it is ready, you should see the following output:

    Creating version (this might take a few minutes)......done.
  3. Get information about your new version:

    gcloud ai-platform versions describe $VERSION_NAME \
      --model=$MODEL_NAME
    

    You should see output similar to this:

    createTime: '2018-02-28T16:30:45Z'
    deploymentUri: gs://your_bucket_name
    framework: [YOUR-FRAMEWORK-NAME]
    machineType: mls1-c1-m2
    name: projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]/versions/[YOUR-VERSION-NAME]
    pythonVersion: '3.7'
    runtimeVersion: '2.11'
    state: READY
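  4. Optionally, list all versions of your model to confirm that the new version appears (add the --region flag if your model is on a regional endpoint):

    gcloud ai-platform versions list --model=$MODEL_NAME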

REST API

  1. Format your request body to contain the version object. This example specifies the version name, deploymentUri, runtimeVersion, framework and machineType. Replace [VALUES_IN_BRACKETS] with the appropriate values:

    {
      "name": "[YOUR-VERSION-NAME]",
      "deploymentUri": "gs://your_bucket_name/",
      "runtimeVersion": "2.11",
      "framework": "[YOUR_FRAMEWORK_NAME]",
      "pythonVersion": "3.7",
      "machineType": "[YOUR_MACHINE_TYPE]"
    }
    
  2. Make your REST API call to the following path, replacing [VALUES_IN_BRACKETS] with the appropriate values:

    POST https://REGION-ml.googleapis.com/v1/projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]/versions
    

    Replace REGION with the region of the regional endpoint where you created your model. If you created your model on the global endpoint, use ml.googleapis.com.

    For example, you can make the following request using the curl command:

    curl -X POST -H "Content-Type: application/json" \
      -d '{"name": "[YOUR-VERSION-NAME]", "deploymentUri": "gs://your_bucket_name/", "runtimeVersion": "2.11", "framework": "[YOUR_FRAMEWORK_NAME]", "pythonVersion": "3.7", "machineType": "[YOUR_MACHINE_TYPE]"}' \
      -H "Authorization: Bearer `gcloud auth print-access-token`" \
      "https://REGION-ml.googleapis.com/v1/projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]/versions"
    

    Creating the version takes a few minutes. When it is ready, you should see output similar to this:

    {
      "name": "projects/[YOUR-PROJECT-ID]/operations/create_[YOUR-MODEL-NAME]_[YOUR-VERSION-NAME]-[TIMESTAMP]",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.ml.v1.OperationMetadata",
        "createTime": "2018-07-07T02:51:50Z",
        "operationType": "CREATE_VERSION",
        "modelName": "projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]",
        "version": {
          "name": "projects/[YOUR-PROJECT-ID]/models/[YOUR-MODEL-NAME]/versions/[YOUR-VERSION-NAME]",
          "deploymentUri": "gs://your_bucket_name",
          "createTime": "2018-07-07T02:51:49Z",
          "runtimeVersion": "2.11",
          "framework": "[YOUR_FRAMEWORK_NAME]",
          "machineType": "[YOUR_MACHINE_TYPE]",
          "pythonVersion": "3.7"
        }
      }
    }
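    The response names a long-running operation. As a sketch, you can poll that operation with a GET request built from the name field in the response; the version is ready when the operation response includes "done": true. Use ml.googleapis.com instead of REGION-ml.googleapis.com if your model is on the global endpoint.

    curl -H "Authorization: Bearer `gcloud auth print-access-token`" \
      "https://REGION-ml.googleapis.com/v1/projects/[YOUR-PROJECT-ID]/operations/create_[YOUR-MODEL-NAME]_[YOUR-VERSION-NAME]-[TIMESTAMP]"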