Serve Spark ML models using Vertex AI

Last reviewed 2023-07-11 UTC

Data scientists and machine learning (ML) engineers often need a serving architecture that is fast enough to generate online (real-time) predictions from their ML models. Vertex AI can meet this need.

Using Vertex AI, you can serve models from a variety of ML frameworks. For frameworks like TensorFlow, PyTorch, XGBoost, and scikit-learn, Vertex AI provides prebuilt containers in which to run those ML models. If you aren't already using any of those ML frameworks, you'll have to create your own custom container for Vertex AI to use.

This document is for users who need to create a custom container to serve their Spark ML models. It includes both a description of the serving architecture needed for custom containers and a reference implementation that demonstrates this architecture for a Spark MLlib model.

To get the most out of the reference implementation portion of this document, you should be familiar with exporting Spark MLlib models to MLeap format, understand how to use Vertex AI for serving predictions, and have experience using container images.


While prebuilt containers are available for some ML frameworks, users of other ML frameworks, such as Spark, need to build custom containers in which Vertex AI can run predictions. The following diagram illustrates the serving architecture that you need to serve Spark MLlib models and other models that require a custom container:

[Figure: The serving architecture for the model that is used in this document.]

This architecture includes the following components:

  • Cloud Storage: provides storage for the model artifacts needed to run your model. For the Spark ML model used in the accompanying reference implementation, the model artifacts consist of an MLeap Bundle and a model schema.
  • Cloud Build: uses the builder image to build a custom container image called the serving container image. The build process compiles and packages the model serving code, builds the serving container image, and then pushes the serving container image to Artifact Registry.
  • Artifact Registry: contains the following objects:
    • The scala-sbt builder container image that Cloud Build uses to build the serving container image.
    • The serving container image that is built by Cloud Build.
  • Vertex AI: contains the ML model that has been uploaded from Cloud Storage. The uploaded model is configured with the location of the model artifacts within Cloud Storage and the location of the serving container image within Artifact Registry. Vertex AI also includes an endpoint to which the model has been deployed. When the model has been deployed to the endpoint, Vertex AI associates physical resources with the model so that the model can serve online predictions.

As part of implementing this serving architecture, you would need to export your ML model for use by other applications and define your own serving container image. The reference implementation provided in this document provides the code used to define and build the serving container image. This code also includes the model artifacts for a previously exported Spark ML model. With some configuration changes, you could use this reference implementation to serve your own Spark ML models.

However, you can implement this serving architecture on your own and not use the reference implementation. If you decide to implement your own architecture, you would need to do the following:

  • Export your model so that it can be used by other applications. This process depends on the ML frameworks and tools that you are using. For example, you might choose to export your Spark MLlib models by creating an MLeap bundle as described in the reference implementation. You can see other examples of how to export models in Export model artifacts for prediction.
  • Design your serving container image to meet the custom container requirements that make that image compatible with Vertex AI. The code can be in the programming language of your choice.
  • Package up the code in a package file format compatible with the programming language that you used. For instance, you can use a JAR file for Java code or a Python wheel for Python code.
  • Create a custom container image that is capable of serving your custom model code.
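
As a rough illustration of the custom container requirements, the following Python sketch shows the general shape of a server that answers health checks and prediction requests. It is not the reference implementation (which is written in Scala). It assumes the `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE`, and `AIP_PREDICT_ROUTE` environment variables that Vertex AI sets in custom containers; the fallback values and the `predict` stand-in are hypothetical and exist only so that the sketch runs locally.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Vertex AI sets these AIP_* variables in the container. The fallback
# values are assumptions that make the sketch runnable outside Vertex AI.
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
PORT = int(os.environ.get("AIP_HTTP_PORT", "8080"))


def predict(instances):
    """Hypothetical stand-in for model inference: returns a uniform
    three-class probability vector for each instance."""
    return [[1 / 3, 1 / 3, 1 / 3] for _ in instances]


def route(method, path, body=b""):
    """Pure routing logic. Returns a (status_code, json_payload) tuple."""
    if method == "GET" and path == HEALTH_ROUTE:
        return 200, {"status": "ok"}
    if method == "POST" and path == PREDICT_ROUTE:
        request = json.loads(body or b"{}")
        return 200, {"predictions": predict(request.get("instances", []))}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper that delegates every request to route()."""

    def _reply(self, status, payload):
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._reply(*route("GET", self.path))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", "0"))
        self._reply(*route("POST", self.path, self.rfile.read(length)))


def make_server(port=PORT):
    """Builds the HTTP server without starting it."""
    return HTTPServer(("0.0.0.0", port), Handler)
```

To run the sketch locally, call `make_server().serve_forever()`. Keeping the routing in a plain function makes the request handling easy to check without opening a socket.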

Reference implementation

The following reference implementation serves a Spark MLlib model that predicts the species of iris based upon the length and width of the flower's sepals and petals.

You can find the model that is used in this implementation in the example_model directory in the vertex-ai-spark-ml-serving.git repository. The directory contains the model artifacts that are used by the serving container to run predictions, and includes the following files:

  • The example_model/ file is a logistic regression model that is built using Spark MLlib, has been trained using the Iris dataset, and has been converted to an MLeap Bundle. The model predicts the species of an iris flower by using the lengths and widths of the flower's sepals and petals.
  • The example_model/schema.json file is a JSON file that describes the model schema. The model schema describes the expected input fields for prediction instances and output fields for prediction results that are required for the MLeap schema.

Use your own MLlib model

To use your own model with this reference implementation, first make sure that your Spark MLlib model has been exported to an MLeap Bundle. Then, to serve your Spark MLlib model, you must provide the appropriate model artifacts: the MLeap Bundle and the model schema.

MLeap Bundle

The serving container determines the location of the MLeap Bundle by using the AIP_STORAGE_URI environment variable that is passed from Vertex AI to the container on startup. The value of the AIP_STORAGE_URI variable is specified when you upload the model to Vertex AI.
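
As a small illustration, a container process could split the `AIP_STORAGE_URI` value into a bucket name and an object prefix before it fetches the bundle. The following Python sketch assumes that the value is a `gs://` URI; the `split_gcs_uri` and `bundle_location` helpers are hypothetical names and are not part of the reference implementation.

```python
import os
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Splits a gs:// URI into (bucket, object_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def bundle_location(environ=os.environ):
    """Reads AIP_STORAGE_URI from the environment and splits it."""
    uri = environ.get("AIP_STORAGE_URI")
    if not uri:
        raise RuntimeError("AIP_STORAGE_URI is not set")
    return split_gcs_uri(uri)


# split_gcs_uri("gs://my-bucket/example_model/")
# returns ("my-bucket", "example_model/")
```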

Model schema

The model schema describes a model's input features and prediction output. The model schema is represented using JSON data. The following is the schema used in this reference implementation to predict the species of iris based upon the length and width of the flower's sepals and petals:

{
  "input": [
    {"name": "sepal_length", "type": "FLOAT"},
    {"name": "sepal_width", "type": "FLOAT"},
    {"name": "petal_length", "type": "FLOAT"},
    {"name": "petal_width", "type": "FLOAT"}
  ],
  "output": [
    {"name": "probability", "type": "DOUBLE", "struct": "VECTOR"}
  ]
}

In the example schema, the input array contains the input fields (columns) to the model while the output array contains the output fields (columns) to be returned from the model. In both arrays, each object of the array contains the following properties:

  • name: The field (column) name.
  • type: The field (column) type. Valid types include BOOLEAN, BYTE, DOUBLE, FLOAT, INTEGER, LONG, SHORT, and STRING.
  • (optional) struct: The field structure, such as a scalar or array. Valid structures include BASIC (scalar type), ARRAY (Spark Array), and VECTOR (Spark DenseVector). BASIC is used if the struct field is not present.
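
The rules above can be sketched as a small validator. The following Python example is an illustration only, not code from the reference implementation: the function names are hypothetical, and the valid values are taken from the property descriptions in the preceding list. It fills in BASIC when the struct property is absent.

```python
import json

# Valid values, as listed in the property descriptions above.
VALID_TYPES = {"BOOLEAN", "BYTE", "DOUBLE", "FLOAT",
               "INTEGER", "LONG", "SHORT", "STRING"}
VALID_STRUCTS = {"BASIC", "ARRAY", "VECTOR"}


def normalize_field(field):
    """Validates one schema field and applies the BASIC default."""
    name = field.get("name")
    field_type = field.get("type")
    struct = field.get("struct", "BASIC")  # BASIC when struct is absent
    if not isinstance(name, str) or not name:
        raise ValueError(f"field is missing a name: {field!r}")
    if field_type not in VALID_TYPES:
        raise ValueError(f"invalid type for {name!r}: {field_type!r}")
    if struct not in VALID_STRUCTS:
        raise ValueError(f"invalid struct for {name!r}: {struct!r}")
    return {"name": name, "type": field_type, "struct": struct}


def normalize_schema(schema_json):
    """Parses schema JSON and validates the input and output arrays."""
    schema = json.loads(schema_json)
    return {key: [normalize_field(f) for f in schema[key]]
            for key in ("input", "output")}
```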

To pass your model schema to the serving container, you can use one of the following methods:

  • Specify the JSON data that defines the schema in the MLEAP_SCHEMA environment variable. The MLEAP_SCHEMA environment variable should contain the JSON data itself, and not a path to a file that contains the JSON schema.
  • Store the JSON data in a file called schema.json, and make this file available to the container at ${AIP_STORAGE_URI}/schema.json. This is the method that is used for the example MLlib model provided with this documentation.

If you use both methods to pass the model schema to the serving container, the JSON data that is stored in the MLEAP_SCHEMA environment variable takes precedence.
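
The precedence rule can be sketched as follows. This Python example is an illustration under stated assumptions rather than the serving container's actual code: `resolve_schema` is a hypothetical name, and `read_text` is a hypothetical callable that returns the contents of the object at a given URI.

```python
import json


def resolve_schema(environ, read_text):
    """Returns the parsed model schema.

    MLEAP_SCHEMA (inline JSON) takes precedence. Otherwise the schema
    is read from ${AIP_STORAGE_URI}/schema.json by using read_text,
    a hypothetical callable that maps a URI to the file's text.
    """
    inline = environ.get("MLEAP_SCHEMA")
    if inline:
        return json.loads(inline)
    storage_uri = environ.get("AIP_STORAGE_URI")
    if not storage_uri:
        raise RuntimeError("neither MLEAP_SCHEMA nor AIP_STORAGE_URI is set")
    return json.loads(read_text(storage_uri.rstrip("/") + "/schema.json"))
```

Passing the environment and the reader as arguments keeps the function independent of any particular Cloud Storage client.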


This reference implementation uses the following billable components of Google Cloud:

  • Vertex AI
  • Cloud Build
  • Cloud Storage
  • Artifact Registry

To generate a cost estimate based on your projected usage, use the pricing calculator.

When you finish this reference implementation, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Enable the Vertex AI, Cloud Build, Cloud Storage, and Artifact Registry APIs.

    Enable the APIs

  4. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

  5. Find your project ID and set it in Cloud Shell.
    export PROJECT_ID=YOUR_PROJECT_ID
    gcloud config set project ${PROJECT_ID}

    Replace YOUR_PROJECT_ID with your project ID.

Create the scala-sbt builder image

You use Cloud Build with the scala-sbt community builder to build the serving container image. This build process depends on having the scala-sbt builder image in your project's Container Registry.

  1. In Cloud Shell, clone the cloud-builders-community repository:

    git clone
  2. Go to the project directory:

    cd cloud-builders-community/scala-sbt
  3. Build the scala-sbt builder image and push it to Container Registry:

    gcloud builds submit .

Build the serving container image

Vertex AI uses the serving container to run prediction requests for the example model. Your first step in building the serving container image is to create a Docker repository in Artifact Registry in which to store the image. You then need to grant Vertex AI permission to pull the serving container image from the repository. After you create the repository and grant permissions, you can build the serving container image and push the image to Artifact Registry.

  1. In Cloud Shell, create a Docker repository in Artifact Registry:

    gcloud artifacts repositories create $REPOSITORY \
        --repository-format=docker \
        --location=$REGION
  2. Grant the Artifact Registry Reader role to the Vertex AI Service Agent:

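    PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID \
        --format="value(projectNumber)")
    SERVICE_ACCOUNT="service-$PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com"
    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member="serviceAccount:$SERVICE_ACCOUNT" \
        --role="roles/artifactregistry.reader"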
  3. Clone the spark-ml-serving repository:

    git clone
  4. Go to the project directory:

    cd vertex-ai-spark-ml-serving
  5. Build the serving container image in your project:

    gcloud builds submit --config=cloudbuild.yaml \

    The cloudbuild.yaml file specifies two builders: the scala-sbt builder and the docker image builder. Cloud Build uses the scala-sbt builder to compile the model serving code from Cloud Storage, and then to package the compiled code into an executable JAR file. Cloud Build uses the docker builder to build the serving container image that contains the JAR file. After the serving container image is built, the image is pushed to Artifact Registry.

Import the model into Vertex AI

The serving container reads model artifacts from Cloud Storage. You need to create a storage location for these artifacts before you import the model into Vertex AI. When you then import the model, you need both the model artifact storage location and the serving container image in Artifact Registry.

  1. In Cloud Shell, create a bucket for the model artifacts:

    BUCKET=YOUR_BUCKET_NAME
    gsutil mb -l $REGION gs://$BUCKET

    Replace YOUR_BUCKET_NAME with the name of your bucket.

  2. Copy the model artifacts to the bucket:

    gsutil cp example_model/* gs://$BUCKET/example_model/
  3. Import the model into Vertex AI:

    DISPLAY_NAME="iris-$(date +'%Y%m%d%H%M%S')"
    gcloud ai models upload \
        --region=$REGION \
        --display-name=$DISPLAY_NAME \
        --container-image-uri=$IMAGE_URI \
        --artifact-uri=$ARTIFACT_URI \
        --container-health-route="/health" \

    In the gcloud ai models upload command, the value of the --artifact-uri parameter specifies the value of the AIP_STORAGE_URI variable. This variable provides the location of the MLeap Bundle that is being imported to Vertex AI.

Deploy the model to a new endpoint

For Vertex AI to run predictions, the imported model needs to be deployed to an endpoint. You need both the endpoint's ID and the model's ID when you deploy the model.

  1. In Cloud Shell, create the model endpoint:

    gcloud ai endpoints create \
        --region=$REGION \
        --display-name=$DISPLAY_NAME

    The gcloud command-line tool might take a few seconds to create the endpoint.

  2. Get the endpoint ID of the newly created endpoint:

    ENDPOINT_ID=$(gcloud ai endpoints list \
        --region=$REGION \
        --filter=display_name=$DISPLAY_NAME \
        --format='value(name)')
    # Print ENDPOINT_ID to the console
    echo "Your endpoint ID is: $ENDPOINT_ID"
  3. Get the model ID of the model that you imported in the Import the model into Vertex AI section:

    MODEL_ID=$(gcloud ai models list \
        --region=$REGION \
        --filter=display_name=$DISPLAY_NAME \
        --format='value(name)')
    # Print MODEL_ID to the console
    echo "Your model ID is: $MODEL_ID"
  4. Deploy the model to the endpoint:

    gcloud ai endpoints deploy-model $ENDPOINT_ID \
        --region=$REGION \
        --model=$MODEL_ID \
        --display-name=$DISPLAY_NAME \

    The gcloud command deploys the model to the endpoint. Default values are used for the machine resource type, the minimum and maximum number of nodes, and other configuration options. For more information on deployment options for models, see the Vertex AI documentation.

Test the endpoint

After you deploy the model to the endpoint, you can test your implementation. To test the endpoint, you can use the example client that is included with the reference implementation code. The example client generates prediction instances and sends prediction requests to the endpoint. Each prediction instance contains randomized values for sepal_length, sepal_width, petal_length, and petal_width. By default, the example client combines multiple prediction instances into a single request. The response from the endpoint includes a prediction for each instance that is sent in the request. The prediction contains the probabilities for each class in the Iris dataset (setosa, versicolor, and virginica).
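
To make the request shape concrete, the following Python sketch builds a batched request body with randomized instances and the REST URL for the endpoint's predict method. It is not the example client itself. The instance layout (a list of four values in schema input order) and the value range are assumptions made for illustration, and the function names are hypothetical.

```python
import random

# Feature order follows the input array of the example schema.
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def random_instance(rng):
    """One instance as a list of floats in FEATURES order.

    The list layout and the 0.1 to 8.0 range are assumptions."""
    return [rng.uniform(0.1, 8.0) for _ in FEATURES]


def build_request(num_instances, rng=None):
    """Combines several instances into one prediction request body."""
    rng = rng or random.Random()
    return {"instances": [random_instance(rng) for _ in range(num_instances)]}


def predict_url(project, location, endpoint_id):
    """REST URL of the predict method for a Vertex AI endpoint."""
    return (f"https://{location}-aiplatform.googleapis.com/v1/"
            f"projects/{project}/locations/{location}/"
            f"endpoints/{endpoint_id}:predict")
```

Sending the request also requires an OAuth 2.0 access token in the Authorization header, which this sketch leaves out.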

  • In Cloud Shell, run the example prediction client:

    cd example_client
    ./ --project $PROJECT_ID \
        --location $REGION \
        --endpoint $ENDPOINT_ID

    When you run the script for the first time, the script creates a Python virtual environment and installs dependencies. After installing the dependencies, the script runs the example client. For each request, the client prints the prediction instances and corresponding class probabilities to the terminal. The following shows an excerpt of the output:

    Sending 10 asynchronous prediction requests with 3 instances per request ...
    ==> Response from request #10:
    Instance 1:     sepal_length:   5.925825137450266
                    sepal_width:    4.5047557888651
                    petal_length:   1.0432434310300223
                    petal_width:    0.5050397721287457
    Prediction 1:   setosa:         0.2036041134824573
                    versicolor:     0.6062980065549213
                    virginica:      0.1900978799626214
    Instance 2:     sepal_length:   6.121228622484405
                    sepal_width:    3.406317728235072
                    petal_length:   3.178583759980504
                    petal_width:    2.815141143581328
    Prediction 2:   setosa:         0.471811302254083
                    versicolor:     0.2063720436033448
                    virginica:      0.3218166541425723
    Instance 3:     sepal_length:   7.005781590327274
                    sepal_width:    2.532116893508745
                    petal_length:   2.6351337947193474
                    petal_width:    2.270855223519198
    Prediction 3:   setosa:         0.453579051699638
                    versicolor:     0.2132869980698818
                    virginica:      0.3331339502304803
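
As an illustration of how such output might be read, the following Python sketch maps a probability vector to class names and picks the most likely species. It assumes that the vector follows the class order shown in the output above (setosa, versicolor, virginica); `label_prediction` is a hypothetical helper, not part of the example client.

```python
# Class order as printed by the example client output (an assumption
# about the order of the probability vector).
CLASSES = ("setosa", "versicolor", "virginica")


def label_prediction(probabilities):
    """Returns (most_likely_class, {class_name: probability})."""
    if len(probabilities) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} probabilities")
    scores = dict(zip(CLASSES, probabilities))
    best = max(scores, key=scores.get)
    return best, scores


# Prediction 1 from the excerpt above:
best, scores = label_prediction(
    [0.2036041134824573, 0.6062980065549213, 0.1900978799626214])
print(best)  # prints "versicolor"
```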

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this reference implementation, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Delete individual resources

  1. In Cloud Shell, undeploy the model from the endpoint:

    DEPLOYED_MODEL_ID=$(gcloud ai endpoints describe $ENDPOINT_ID \
        --region=$REGION \
        --format='value(deployedModels.id)')
    gcloud ai endpoints undeploy-model $ENDPOINT_ID \
        --region=$REGION \
        --deployed-model-id=$DEPLOYED_MODEL_ID
  2. Delete the endpoint:

    gcloud ai endpoints delete $ENDPOINT_ID \
        --region=$REGION \
  3. Delete the model:

    gcloud ai models delete $MODEL_ID \
  4. Delete the serving container image:

    gcloud artifacts docker images delete \
        --delete-tags \
  5. Delete the scala-sbt builder container:

    gcloud container images delete gcr.io/$PROJECT_ID/scala-sbt \
        --force-delete-tags \
  6. Delete any Cloud Storage buckets that are no longer needed:

    gsutil rm -r gs://YOUR_BUCKET_NAME

    Deleting a bucket also deletes all objects stored in that bucket. Deleted buckets and objects cannot be recovered.

What's next