Importing models to Vertex AI

This guide demonstrates how to import existing models that you've trained outside of Vertex AI, or that you've trained using Vertex AI and exported. After you import your model, it's available in Vertex AI, where you can deploy it to an endpoint and request predictions.

Pre-built or custom containers

When you import a model, you associate it with a container for Vertex AI to run prediction requests. You can use pre-built containers provided by Vertex AI, or use your own custom containers that you build and push to Container Registry or Artifact Registry.

You can use a pre-built container if your model was trained with a framework and framework version for which Vertex AI provides a pre-built prediction container (for example, TensorFlow, scikit-learn, or XGBoost) and your model artifacts are exported in the format that the container expects.

If you are importing a tabular AutoML model that you previously exported, you must use a specific custom container provided by Vertex AI, as described in the AutoML tabular container section below.

Otherwise, create a new custom container, or use an existing custom container that you have in Container Registry or Artifact Registry.

Upload model artifacts to Cloud Storage

You must store your model artifacts in a Cloud Storage bucket, where the bucket's region matches the regional endpoint you're using. The total file size of your model artifacts must be 10 GB or less.

If your Cloud Storage bucket is in a different Google Cloud project, you need to grant Vertex AI access to read your model artifacts.
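
For example, you can grant the Vertex AI service agent of the project where you use Vertex AI read access to the bucket. The following is a minimal sketch using the google-cloud-storage client; the bucket name and project number are hypothetical, and the service agent address format is an assumption you should verify for your project:

from google.cloud import storage

# Hypothetical values; replace with your own.
BUCKET_NAME = "your-artifact-bucket"   # bucket in the other project
PROJECT_NUMBER = "123456789012"        # number of the project where you use Vertex AI

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Grant the Vertex AI service agent read access to objects in the bucket.
# The address below follows the usual service agent pattern (an assumption).
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",
        "members": {
            f"serviceAccount:service-{PROJECT_NUMBER}@gcp-sa-aiplatform.iam.gserviceaccount.com"
        },
    }
)
bucket.set_iam_policy(policy)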

If you're using a pre-built container, ensure that your model artifacts have filenames that exactly match the following examples:

  • TensorFlow SavedModel: saved_model.pb
  • scikit-learn: model.joblib or model.pkl
  • XGBoost: model.bst

Learn more about exporting model artifacts for prediction.
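
For example, a scikit-learn model must be saved as model.joblib (or model.pkl) at the top level of the artifact directory. The following is a minimal sketch with a toy model and a hypothetical bucket name:

import joblib
from google.cloud import storage
from sklearn.linear_model import LogisticRegression

# Train a toy model (illustrative only).
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

# The pre-built scikit-learn container expects this exact filename.
joblib.dump(model, "model.joblib")

# Upload to a hypothetical bucket and directory; the import step then
# references gs://your-artifact-bucket/models/ as the artifact directory.
bucket = storage.Client().bucket("your-artifact-bucket")
bucket.blob("models/model.joblib").upload_from_filename("model.joblib")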

Import a model using the Cloud Console

To import a model using the Cloud Console:

  1. In the Cloud Console, go to the Vertex AI Models page.

  2. Click Import.

  3. Name and region: Enter a name for your model. Select the region that matches both your bucket's region, and the Vertex AI regional endpoint you're using. Click Continue.

Depending on the type of container you are using, follow the steps in the corresponding section below.

Pre-built container

  1. Select Import model artifacts into a new pre-built container.

  2. Select the Model framework and Model framework version you used to train your model.

  3. If you want to use GPUs for serving predictions, set the Accelerator type to GPUs.

    You select the type of GPU later on, when you deploy the model to an endpoint.

  4. Specify the Cloud Storage path to the directory that contains your model artifacts.

    For example, gs://BUCKET_NAME/models/.

  5. Leave the Predict schema blank.

  6. Click Import.

    After the import has completed, your model appears on the Models page.

Custom container

  1. Select Import an existing custom container.

  2. Set the container image URI.

  3. If you want to provide model artifacts in addition to a container image, specify the Cloud Storage path to the directory that contains your model artifacts.

    For example, gs://BUCKET_NAME/models/.

  4. Specify values for any of the other fields.

    Learn more about these optional fields.

  5. Click Import.

    After the import has completed, your model appears on the Models page.

AutoML tabular container

  1. Select Import an existing custom container.

  2. In the Container image field, enter MULTI_REGION-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server-v1:latest.

    Replace MULTI_REGION with us, europe, or asia to select which Docker repository you want to pull the Docker image from. Each repository provides the same Docker image, but choosing the Artifact Registry multi-region closest to the machine where you are running Docker might reduce latency.

  3. In the Package location field, specify the Cloud Storage path to the directory that contains your model artifacts.

    The path looks similar to the following example:

    gs://BUCKET_NAME/models-MODEL_ID/tf-saved-model/TIMESTAMP/

  4. Leave all other fields blank.

  5. Click Import.

    After the import has completed, your model appears on the Models page. You can use this model just like other AutoML tabular models, except imported AutoML tabular models don't support Explainable AI.
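
You can also perform this import programmatically (see the next section). As an illustration, here is a minimal sketch with the Vertex AI SDK for Python, assuming the us multi-region image and the placeholder artifact path shown above:

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

model = aiplatform.Model.upload(
    display_name="imported-automl-tabular-model",
    # The AutoML tabular prediction server image, pulled from the us multi-region.
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server-v1:latest",
    # Replace with the path produced by your AutoML tabular export.
    artifact_uri="gs://BUCKET_NAME/models-MODEL_ID/tf-saved-model/TIMESTAMP/",
)
model.wait()
print(model.resource_name)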

Import a model programmatically

The following examples show how to import a model using various tools:

gcloud

The following example uses the gcloud beta ai models upload command:

gcloud beta ai models upload \
  --region=LOCATION \
  --display-name=MODEL_NAME \
  --container-image-uri=IMAGE_URI \
  --artifact-uri=PATH_TO_MODEL_ARTIFACT_DIRECTORY

Replace the following:

  • LOCATION: The region where you are using Vertex AI.
  • MODEL_NAME: A display name for the Model.
  • IMAGE_URI: The URI of the container image to use for serving predictions. For example, us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-1:latest. Use a pre-built container or a custom container.
  • PATH_TO_MODEL_ARTIFACT_DIRECTORY: The Cloud Storage URI (beginning with gs://) of a directory in Cloud Storage that contains your model artifacts.

The preceding example demonstrates all the flags necessary to import most models. If you are not using a pre-built container for prediction, then you likely need to specify some additional optional flags so that Vertex AI can use your container image. These flags, which begin with --container-, correspond to fields of your Model's containerSpec.

REST & CMD LINE

Use the following code sample to upload a model using the upload method of the Model resource.

Before using any of the request data below, make the following replacements:

  • LOCATION: The region where you are using Vertex AI.
  • PROJECT_ID: Your project ID or project number.
  • MODEL_NAME: A display name for the Model.
  • MODEL_DESCRIPTION: Optional. A description for the model.
  • IMAGE_URI: The URI of the container image to use for serving predictions. For example, us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-1:latest. Use a pre-built container or a custom container.
  • PATH_TO_MODEL_ARTIFACT_DIRECTORY: The Cloud Storage URI (beginning with gs://) of a directory in Cloud Storage that contains your model artifacts. This variable and the artifactUri field are optional if you're using a custom container.
  • labels: Optional. Any set of key-value pairs to organize your models. For example:
    • "env": "prod"
    • "tier": "backend"
  • Specify the LABEL_NAME and LABEL_VALUE for any labels that you want to apply to this model.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/models:upload

Request JSON body:

{
  "model": {
    "displayName": "MODEL_NAME",
    "predictSchemata": {},
    "containerSpec": {
      "imageUri": "IMAGE_URI"
    },
    "artifactUri": "PATH_TO_MODEL_ARTIFACT_DIRECTORY",
    "labels": {
      "LABEL_NAME_1": "LABEL_VALUE_1",
      "LABEL_NAME_2": "LABEL_VALUE_2"
    }
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file called request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/models:upload

PowerShell

Save the request body in a file called request.json, and execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/models:upload" | Select-Object -Expand Content

Java


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.Model;
import com.google.cloud.aiplatform.v1.ModelContainerSpec;
import com.google.cloud.aiplatform.v1.ModelServiceClient;
import com.google.cloud.aiplatform.v1.ModelServiceSettings;
import com.google.cloud.aiplatform.v1.UploadModelOperationMetadata;
import com.google.cloud.aiplatform.v1.UploadModelResponse;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class UploadModelSample {
  public static void main(String[] args)
      throws InterruptedException, ExecutionException, TimeoutException, IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String modelDisplayName = "YOUR_MODEL_DISPLAY_NAME";
    String metadataSchemaUri =
        "gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml";
    String imageUri = "YOUR_IMAGE_URI";
    String artifactUri = "gs://your-gcs-bucket/artifact_path";
    uploadModel(project, modelDisplayName, metadataSchemaUri, imageUri, artifactUri);
  }

  static void uploadModel(
      String project,
      String modelDisplayName,
      String metadataSchemaUri,
      String imageUri,
      String artifactUri)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    ModelServiceSettings modelServiceSettings =
        ModelServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (ModelServiceClient modelServiceClient = ModelServiceClient.create(modelServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);

      ModelContainerSpec modelContainerSpec =
          ModelContainerSpec.newBuilder().setImageUri(imageUri).build();

      Model model =
          Model.newBuilder()
              .setDisplayName(modelDisplayName)
              .setMetadataSchemaUri(metadataSchemaUri)
              .setArtifactUri(artifactUri)
              .setContainerSpec(modelContainerSpec)
              .build();

      OperationFuture<UploadModelResponse, UploadModelOperationMetadata> uploadModelResponseFuture =
          modelServiceClient.uploadModelAsync(locationName, model);
      System.out.format(
          "Operation name: %s\n", uploadModelResponseFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      UploadModelResponse uploadModelResponse = uploadModelResponseFuture.get(5, TimeUnit.MINUTES);

      System.out.println("Upload Model Response");
      System.out.format("Model: %s\n", uploadModelResponse.getModel());
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment these variables before running the sample.
 */

// const modelDisplayName = 'YOUR_MODEL_DISPLAY_NAME';
// const metadataSchemaUri = 'YOUR_METADATA_SCHEMA_URI';
// const imageUri = 'YOUR_IMAGE_URI';
// const artifactUri = 'YOUR_ARTIFACT_URI';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Model Service Client library
const {ModelServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const modelServiceClient = new ModelServiceClient(clientOptions);

async function uploadModel() {
  // Configure the parent resources
  const parent = `projects/${project}/locations/${location}`;
  // Configure the model resources
  const model = {
    displayName: modelDisplayName,
    metadataSchemaUri: '',
    artifactUri: artifactUri,
    containerSpec: {
      imageUri: imageUri,
      command: [],
      args: [],
      env: [],
      ports: [],
      predictRoute: '',
      healthRoute: '',
    },
  };
  const request = {
    parent,
    model,
  };

  console.log('PARENT AND MODEL');
  console.log(parent, model);
  // Upload Model request
  const [response] = await modelServiceClient.uploadModel(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Upload model response ');
  console.log(`\tModel : ${result.model}`);
}
uploadModel();

Python

This example uses the Vertex AI SDK for Python. Before you run the following code sample, you must set up authentication.
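
As a quick check that Application Default Credentials are available before running the sample (this sketch assumes you authenticate through ADC; other credential types work too):

import google.auth

# Raises DefaultCredentialsError if no Application Default Credentials are found.
credentials, project_id = google.auth.default()
print(f"Authenticated against project: {project_id}")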

from typing import Dict, Optional, Sequence

from google.cloud import aiplatform
from google.cloud.aiplatform import explain


def upload_model_sample(
    project: str,
    location: str,
    display_name: str,
    serving_container_image_uri: str,
    artifact_uri: Optional[str] = None,
    serving_container_predict_route: Optional[str] = None,
    serving_container_health_route: Optional[str] = None,
    description: Optional[str] = None,
    serving_container_command: Optional[Sequence[str]] = None,
    serving_container_args: Optional[Sequence[str]] = None,
    serving_container_environment_variables: Optional[Dict[str, str]] = None,
    serving_container_ports: Optional[Sequence[int]] = None,
    instance_schema_uri: Optional[str] = None,
    parameters_schema_uri: Optional[str] = None,
    prediction_schema_uri: Optional[str] = None,
    explanation_metadata: Optional[explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[explain.ExplanationParameters] = None,
    sync: bool = True,
):

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_uri,
        serving_container_image_uri=serving_container_image_uri,
        serving_container_predict_route=serving_container_predict_route,
        serving_container_health_route=serving_container_health_route,
        instance_schema_uri=instance_schema_uri,
        parameters_schema_uri=parameters_schema_uri,
        prediction_schema_uri=prediction_schema_uri,
        description=description,
        serving_container_command=serving_container_command,
        serving_container_args=serving_container_args,
        serving_container_environment_variables=serving_container_environment_variables,
        serving_container_ports=serving_container_ports,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model
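
A hypothetical invocation of this sample, reusing the pre-built TensorFlow image mentioned earlier:

model = upload_model_sample(
    project="your-project-id",
    location="us-central1",
    display_name="my-imported-model",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-1:latest",
    artifact_uri="gs://your-artifact-bucket/models/",
)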

Get operation status

Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.
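
For example, the upload_model method of the v1 GAPIC client in Python returns a long-running operation future that exposes the operation's name and status. The following is a minimal sketch with hypothetical project values:

from google.cloud import aiplatform_v1

PROJECT_ID = "your-project-id"
LOCATION = "us-central1"

client = aiplatform_v1.ModelServiceClient(
    client_options={"api_endpoint": f"{LOCATION}-aiplatform.googleapis.com"}
)

lro = client.upload_model(
    parent=f"projects/{PROJECT_ID}/locations/{LOCATION}",
    model=aiplatform_v1.Model(
        display_name="my-imported-model",
        artifact_uri="gs://your-artifact-bucket/models/",
        container_spec=aiplatform_v1.ModelContainerSpec(
            image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-1:latest"
        ),
    ),
)

# The operation name can be used to query or cancel the operation later.
print("Operation name:", lro.operation.name)
print("Done yet?", lro.done())

# Block until the operation finishes (or time out after five minutes).
result = lro.result(timeout=300)
print("Model resource:", result.model)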

What's next