Data scientists and machine learning (ML) engineers often need a serving architecture that is fast enough to generate online (or real-time) predictions from their ML models. Vertex AI is capable of meeting this need.
Using Vertex AI, you can serve models from a variety of ML frameworks. For frameworks like TensorFlow, PyTorch, XGBoost, and scikit-learn, Vertex AI provides prebuilt containers in which to run those ML models. If you aren't already using any of those ML frameworks, you'll have to create your own custom container for Vertex AI to use.
This document is for users who need to create a custom container to serve their Spark ML models. It describes the serving architecture needed for custom containers and provides a reference implementation that demonstrates this architecture for a Spark MLlib model.
To get the most out of the reference implementation portion of this document, you should be familiar with exporting Spark MLlib models to MLeap format, understand how to use Vertex AI for serving predictions, and have experience using container images.
Architecture
While prebuilt containers are available for some ML frameworks, users of other ML frameworks, such as Spark, need to build custom containers in which Vertex AI can run predictions. The following diagram illustrates the serving architecture that you need to serve Spark MLlib models and other models that require a custom container:
This architecture includes the following components:
- Cloud Storage: provides storage for the model artifacts needed to run your model. For the Spark ML model used in the accompanying reference implementation, the model artifacts consist of an MLeap Bundle and a model schema.
- Cloud Build: uses the builder image to build a custom container image called the serving container image. The build process compiles and packages the model serving code, builds the serving container image, and then pushes the serving container image to Artifact Registry.
- Artifact Registry: contains the following objects:
  - The scala-sbt builder container image that Cloud Build uses to build the serving container image.
  - The serving container image that is built by Cloud Build.
- Vertex AI: contains the ML model that has been uploaded from Cloud Storage. The uploaded model is configured with the location of the model artifacts within Cloud Storage and the location of the serving container image within Artifact Registry. Vertex AI also includes an endpoint to which the model has been deployed. When the model has been deployed to the endpoint, Vertex AI associates physical resources with the model so that the model can serve online predictions.
To implement this serving architecture, you need to export your ML model for use by other applications and define your own serving container image. The reference implementation in this document includes the code used to define and build the serving container image, as well as the model artifacts for a previously exported Spark ML model. With some configuration changes, you could use this reference implementation to serve your own Spark ML models.
However, you can also implement this serving architecture yourself without using the reference implementation. If you decide to implement your own architecture, you need to do the following:
- Export your model so that it can be used by other applications. This process depends on the ML frameworks and tools that you are using. For example, you might choose to export your Spark MLlib models by creating an MLeap bundle as described in the reference implementation. You can see other examples of how to export models in Export model artifacts for prediction.
- Design your serving container image to meet the custom container requirements that make that image compatible with Vertex AI. The code can be in the programming language of your choice (see the sketch after this list for the request contract that the container must implement).
- Package up the code in a package file format compatible with the programming language that you used. For instance, you can use a JAR file for Java code or a Python wheel for Python code.
- Create a custom container image that is capable of serving your custom model code.
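For illustration, the following minimal sketch shows the kind of HTTP contract that a custom serving container needs to satisfy: listen on the port that Vertex AI provides in the AIP_HTTP_PORT environment variable, return HTTP 200 on the health route, and answer prediction requests of the form {"instances": [...]} with a response of the form {"predictions": [...]}. This sketch uses Python and Flask purely as an example (the reference implementation in this document is written in Scala), and the prediction logic is a placeholder.

```python
# Minimal sketch of the Vertex AI custom container request contract (Python/Flask).
# The reference implementation uses Scala; this placeholder only illustrates the routes.
import os

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/health", methods=["GET"])
def health():
    # Vertex AI polls this route to determine whether the container is ready.
    return "ok", 200


@app.route("/predict", methods=["POST"])
def predict():
    # Vertex AI sends a JSON body of the form {"instances": [...]}.
    instances = request.get_json()["instances"]
    # Placeholder logic: echo each instance back; a real container would call the model here.
    predictions = [{"echo": instance} for instance in instances]
    return jsonify({"predictions": predictions})


if __name__ == "__main__":
    # Vertex AI sets AIP_HTTP_PORT; default to 8080 for local testing.
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", "8080")))
```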
Reference implementation
The following reference implementation serves a Spark MLlib model that predicts the species of an iris based on the length and width of the flower's sepals and petals.
You can find the model that is used in this implementation in the example_model directory of the vertex-ai-spark-ml-serving.git repository. The directory contains the model artifacts that the serving container uses to run predictions, and includes the following files:
- The example_model/model.zip file is a logistic regression model that was built using Spark MLlib, trained on the Iris dataset, and converted to an MLeap Bundle. The model predicts the species of an iris flower by using the lengths and widths of the flower's sepals and petals.
- The example_model/schema.json file is a JSON file that describes the model schema. The model schema describes the expected input fields for prediction instances and the output fields for prediction results that are required for the MLeap schema.
Use your own MLlib model
To use your own model with this reference implementation, first make sure that your Spark MLlib model has been exported to an MLeap Bundle. Then, to serve your Spark MLlib model, you must provide the appropriate model artifacts: the MLeap Bundle and the model schema.
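The exact export steps depend on your training code, but a minimal sketch using PySpark and the mleap package might look like the following. The pipeline stages, column names, and file paths are placeholders, and the serialization API can differ slightly between MLeap releases.

```python
# Sketch: export a fitted Spark MLlib pipeline as an MLeap Bundle (model.zip).
# Assumes training_df already contains the four feature columns and a numeric "label" column.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

import mleap.pyspark  # noqa: F401  # registers serializeToBundle() on Spark ML models
from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401

spark = SparkSession.builder.appName("iris-mleap-export").getOrCreate()
training_df = spark.read.parquet("iris_training.parquet")  # placeholder training data

assembler = VectorAssembler(
    inputCols=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline_model = Pipeline(stages=[assembler, lr]).fit(training_df)

# Serialize the fitted pipeline to a local MLeap Bundle; a transformed sample is
# passed so that MLeap can capture the pipeline's schema.
pipeline_model.serializeToBundle(
    "jar:file:/tmp/model.zip",
    pipeline_model.transform(training_df),
)
```

After exporting, copy the resulting model.zip (together with the matching schema file described below) to the Cloud Storage location that you pass as the model's artifact URI.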
MLeap Bundle
The serving container determines the location of the MLeap Bundle by using the
AIP_STORAGE_URI
environment variable
that is passed from Vertex AI to the container on startup. The
value of the AIP_STORAGE_URI
variable is specified when you upload the model
to Vertex AI.
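The reference serving container is written in Scala, but the idea generalizes: on startup, read AIP_STORAGE_URI and fetch the artifacts that it points to. The following Python sketch, which assumes the google-cloud-storage client library, shows one way to do this; the names and paths are illustrative.

```python
# Sketch: download the model artifacts referenced by AIP_STORAGE_URI at container startup.
import os

from google.cloud import storage


def download_model_artifacts(local_dir: str = "/tmp/model") -> str:
    # AIP_STORAGE_URI is set by Vertex AI, for example gs://YOUR_BUCKET_NAME/example_model
    artifact_uri = os.environ["AIP_STORAGE_URI"]
    bucket_name, _, prefix = artifact_uri.removeprefix("gs://").partition("/")

    os.makedirs(local_dir, exist_ok=True)
    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        filename = os.path.basename(blob.name)
        if filename:  # skip "directory" placeholder objects
            blob.download_to_filename(os.path.join(local_dir, filename))
    return local_dir
```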
Model schema
The model schema describes a model's input features and prediction output. The model schema is represented as JSON data. The following is the schema used in this reference implementation to predict the species of an iris based on the length and width of the flower's sepals and petals:
{ "input": [ { "name": "sepal_length", "type": "FLOAT" }, { "name": "sepal_width", "type": "FLOAT" }, { "name": "petal_length", "type": "FLOAT" }, { "name": "petal_width", "type": "FLOAT" } ], "output": [ { "name": "probability", "type": "DOUBLE", "struct": "VECTOR" } ] }
In the example schema, the input
array contains the input fields (columns) to
the model while the output
array contains the output fields (columns) to be
returned from the model. In both arrays, each object of the array contains the
following properties:
- name: The field (column) name.
- type: The field (column) type. Valid types include BOOLEAN, BYTE, DOUBLE, FLOAT, INTEGER, LONG, SHORT, and STRING.
- struct (optional): The field structure, such as a scalar or array. Valid structures include BASIC (scalar type), ARRAY (Spark Array), and VECTOR (Spark DenseVector). BASIC is used if the struct field is not present.
To pass your model schema to the serving container, you can use one of the following methods:
- Specify the JSON data that defines the schema in the MLEAP_SCHEMA environment variable. The MLEAP_SCHEMA environment variable should contain the JSON data itself, not a path to a file that contains the JSON schema.
- Store the JSON data in a file called schema.json, and make this file available to the container at ${AIP_STORAGE_URI}/schema.json. This is the method used for the example MLlib model provided with this document.
If you use both methods to pass the model schema to the serving container, the
JSON data that is stored in the MLEAP_SCHEMA
environment variable takes
precedence.
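If you are adapting the implementation to your own model, you can generate the schema programmatically. The following sketch writes a schema.json file that matches the example schema above; replace the field names and types with the ones your model expects.

```python
# Sketch: write a schema.json file that describes the model's inputs and outputs.
import json

schema = {
    "input": [
        {"name": "sepal_length", "type": "FLOAT"},
        {"name": "sepal_width", "type": "FLOAT"},
        {"name": "petal_length", "type": "FLOAT"},
        {"name": "petal_width", "type": "FLOAT"},
    ],
    "output": [
        {"name": "probability", "type": "DOUBLE", "struct": "VECTOR"},
    ],
}

with open("schema.json", "w") as f:
    json.dump(schema, f, indent=2)
```

Copy the resulting file to ${AIP_STORAGE_URI}/schema.json (next to the MLeap Bundle), or pass its contents through the MLEAP_SCHEMA environment variable.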
Costs
This reference implementation uses the following billable components of Google Cloud: Vertex AI, Cloud Build, Cloud Storage, and Artifact Registry.
To generate a cost estimate based on your projected usage, use the pricing calculator.
When you finish this reference implementation, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.
Before you begin
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Vertex AI, Cloud Build, Cloud Storage, and Artifact Registry APIs.
- In the Google Cloud console, activate Cloud Shell.
- Find your project ID and set it in Cloud Shell:

  export PROJECT_ID=YOUR_PROJECT_ID
  gcloud config set project ${PROJECT_ID}

  Replace YOUR_PROJECT_ID with your project ID.
Create the scala-sbt builder image
You use Cloud Build with the scala-sbt community builder to build the serving container image. This build process depends on having the scala-sbt builder image in your project's Container Registry.
In Cloud Shell, clone the cloud-builders-community repository:

git clone https://github.com/GoogleCloudPlatform/cloud-builders-community.git

Go to the project directory:

cd cloud-builders-community/scala-sbt

Build the scala-sbt builder image and push it to Container Registry:

gcloud builds submit .
Build the serving container image
Vertex AI uses the serving container to run prediction requests for the example model. Your first step in building the serving container image is to create a Docker repository in Artifact Registry in which to store the image. You then need to grant Vertex AI permission to pull the serving container image from the repository. After you create the repository and grant permissions, you can build the serving container image and push the image to Artifact Registry.
In Cloud Shell, create a Docker repository in Artifact Registry:
REPOSITORY="vertex-ai-prediction" LOCATION="us-central1" gcloud artifacts repositories create $REPOSITORY \ --repository-format=docker \ --location=$LOCATION
Grant the Artifact Registry Reader role to the Vertex AI Service Agent:
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID \
    --format="value(projectNumber)")
SERVICE_ACCOUNT="service-$PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$SERVICE_ACCOUNT" \
    --role="roles/artifactregistry.reader"
Clone the vertex-ai-spark-ml-serving repository:

git clone https://github.com/GoogleCloudPlatform/vertex-ai-spark-ml-serving.git
Go to the project directory:
cd vertex-ai-spark-ml-serving
Build the serving container image in your project:
IMAGE=spark-ml-serving

gcloud builds submit --config=cloudbuild.yaml \
    --substitutions="_LOCATION=$LOCATION,_REPOSITORY=$REPOSITORY,_IMAGE=$IMAGE" .
The cloudbuild.yaml file specifies two builders: the scala-sbt builder and the docker image builder. Cloud Build uses the scala-sbt builder to compile the model serving code from Cloud Storage, and then to package the compiled code into an executable JAR file. Cloud Build uses the docker builder to build the serving container image that contains the JAR file. After the serving container image is built, the image is pushed to Artifact Registry.
Import the model into Vertex AI
The serving container reads model artifacts from Cloud Storage. You need to create a storage location for these artifacts before you import the model into Vertex AI. When you then import the model, you need both the model artifact storage location and the serving container image in Artifact Registry.
In Cloud Shell, create a bucket for the model artifacts:
REGION="us-central1" BUCKET="YOUR_BUCKET_NAME" gcloud storage buckets create gs://$BUCKET --location=$REGION
Replace YOUR_BUCKET_NAME with the name of your bucket.

Copy the model artifacts to the bucket:
gcloud storage cp example_model/* gs://$BUCKET/example_model/
Import the model into Vertex AI:
DISPLAY_NAME="iris-$(date +'%Y%m%d%H%M%S')" IMAGE_URI="${LOCATION}-docker.pkg.dev/$PROJECT_ID/${REPOSITORY}/${IMAGE}" ARTIFACT_URI="gs://$BUCKET/example_model/" gcloud ai models upload \ --region=$REGION \ --display-name=$DISPLAY_NAME \ --container-image-uri=$IMAGE_URI \ --artifact-uri=$ARTIFACT_URI \ --container-health-route="/health" \ --container-predict-route="/predict"
In the gcloud ai models upload command, the value of the --artifact-uri parameter specifies the value of the AIP_STORAGE_URI variable. This variable provides the location of the MLeap Bundle that is being imported to Vertex AI.
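If you prefer the Vertex AI SDK for Python over the gcloud CLI, the equivalent import step might look like the following sketch. The project, bucket, and image values are placeholders that mirror the shell variables used above.

```python
# Sketch: import the model with the Vertex AI SDK for Python instead of gcloud.
from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT_ID", location="us-central1")

model = aiplatform.Model.upload(
    display_name="iris-spark-ml",
    # Becomes the AIP_STORAGE_URI value inside the serving container.
    artifact_uri="gs://YOUR_BUCKET_NAME/example_model/",
    serving_container_image_uri=(
        "us-central1-docker.pkg.dev/YOUR_PROJECT_ID/vertex-ai-prediction/spark-ml-serving"
    ),
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
)
print(model.resource_name)
```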
Deploy the model to a new endpoint
For Vertex AI to run predictions, the imported model needs to be deployed to an endpoint. You need both the endpoint's ID and the model's ID when you deploy the model.
In Cloud Shell, create the model endpoint:
gcloud ai endpoints create \
    --region=$REGION \
    --display-name=$DISPLAY_NAME
The gcloud command-line tool might take a few seconds to create the endpoint.

Get the endpoint ID of the newly created endpoint:
ENDPOINT_ID=$(gcloud ai endpoints list \
    --region=$REGION \
    --filter=display_name=$DISPLAY_NAME \
    --format='value(name)')

# Print ENDPOINT_ID to the console
echo "Your endpoint ID is: $ENDPOINT_ID"
Get the model ID of the model that you imported in the Import the model into Vertex AI section:
MODEL_ID=$(gcloud ai models list \
    --region=$REGION \
    --filter=display_name=$DISPLAY_NAME \
    --format='value(name)')

# Print MODEL_ID to the console
echo "Your model ID is: $MODEL_ID"
Deploy the model to the endpoint:
gcloud ai endpoints deploy-model $ENDPOINT_ID \
    --region=$REGION \
    --model=$MODEL_ID \
    --display-name=$DISPLAY_NAME \
    --traffic-split="0=100"
The
gcloud
command deploys the model to the endpoint. Default values are used for the machine resource type, the minimum and maximum number of nodes, and other configuration options. For more information on deployment options for models, see the Vertex AI documentation.
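As a sketch of the same step with the Vertex AI SDK for Python, endpoint creation and deployment might look like the following. The machine type and replica counts shown are illustrative, not required values, and the model ID is the one printed in the previous section.

```python
# Sketch: create an endpoint and deploy the imported model with the Vertex AI SDK for Python.
from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT_ID", location="us-central1")

model = aiplatform.Model("MODEL_ID")  # the model ID printed in the previous step
endpoint = aiplatform.Endpoint.create(display_name="iris-spark-ml")

model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-2",  # illustrative machine type
    min_replica_count=1,
    max_replica_count=1,
    traffic_percentage=100,
)
print(endpoint.resource_name)
```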
Test the endpoint
After you deploy the model to the endpoint, you can test your implementation. To test the endpoint, you can use the example client that is included with the reference implementation code. The example client generates prediction instances and sends prediction requests to the endpoint. Each prediction instance contains randomized values for sepal_length, sepal_width, petal_length, and petal_width. By default, the example client combines multiple prediction instances into a single request. The response from the endpoint includes a prediction for each instance that is sent in the request. The prediction contains the probabilities for each class in the Iris dataset (setosa, versicolor, and virginica).
In Cloud Shell, run the example prediction client:
cd example_client

./run_client.sh --project $PROJECT_ID \
    --location $LOCATION \
    --endpoint $ENDPOINT_ID
When you run the script for the first time, the script creates a Python virtual environment and installs dependencies. After installing the dependencies, the script runs the example client. For each request, the client prints the prediction instances and corresponding class probabilities to the terminal. The following shows an excerpt of the output:
Sending 10 asynchronous prediction requests with 3 instances per request ...

==> Response from request #10:

Instance 1:
  sepal_length: 5.925825137450266
  sepal_width: 4.5047557888651
  petal_length: 1.0432434310300223
  petal_width: 0.5050397721287457
Prediction 1:
  setosa: 0.2036041134824573
  versicolor: 0.6062980065549213
  virginica: 0.1900978799626214

Instance 2:
  sepal_length: 6.121228622484405
  sepal_width: 3.406317728235072
  petal_length: 3.178583759980504
  petal_width: 2.815141143581328
Prediction 2:
  setosa: 0.471811302254083
  versicolor: 0.2063720436033448
  virginica: 0.3218166541425723

Instance 3:
  sepal_length: 7.005781590327274
  sepal_width: 2.532116893508745
  petal_length: 2.6351337947193474
  petal_width: 2.270855223519198
Prediction 3:
  setosa: 0.453579051699638
  versicolor: 0.2132869980698818
  virginica: 0.3331339502304803
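If you want to send a request without the example client, a minimal sketch using the Vertex AI SDK for Python might look like the following. The instance format shown here assumes that the serving container accepts the named fields from the model schema, as the example client does.

```python
# Sketch: send a single prediction request to the deployed endpoint.
from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT_ID", location="us-central1")
endpoint = aiplatform.Endpoint("ENDPOINT_ID")  # the endpoint ID printed earlier

response = endpoint.predict(
    instances=[
        {
            "sepal_length": 5.1,
            "sepal_width": 3.5,
            "petal_length": 1.4,
            "petal_width": 0.2,
        }
    ]
)
print(response.predictions)
```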
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this reference implementation, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete individual resources
In Cloud Shell, undeploy the model from the endpoint:
DEPLOYED_MODEL_ID=$(gcloud ai endpoints describe $ENDPOINT_ID \
    --region=$REGION \
    --format='value(deployedModels.id)')

gcloud ai endpoints undeploy-model $ENDPOINT_ID \
    --region=$REGION \
    --deployed-model-id=$DEPLOYED_MODEL_ID
Delete the endpoint:
gcloud ai endpoints delete $ENDPOINT_ID \
    --region=$REGION \
    --quiet
Delete the model:
gcloud ai models delete $MODEL_ID \
    --region=$REGION
Delete the serving container image:
gcloud artifacts docker images delete \
    $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE \
    --delete-tags \
    --quiet
Delete the scala-sbt builder container:

gcloud container images delete gcr.io/$PROJECT_ID/scala-sbt \
    --force-delete-tags \
    --quiet
Delete any Cloud Storage buckets that are no longer needed:
gcloud storage rm gs://YOUR_BUCKET_NAME --recursive
Deleting a bucket also deletes all objects stored in that bucket. Deleted buckets and objects cannot be recovered.
What's next
- Learn more about running predictions using Vertex AI.
- Learn more about Spark on Google Cloud.
- For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.