Custom prediction routines

Custom prediction routines (CPR) let you easily build custom containers with pre/post-processing code, without dealing with the details of setting up an HTTP server or building a container from scratch. You can use preprocessing to normalize or transform the inputs, or to call external services for additional data, and use postprocessing to format the model prediction or run business logic.

The following diagram depicts the user workflow both with and without custom prediction routines.

The main differences are:

  • You don't need to write a model server or a Dockerfile. The model server, which is the HTTP server that hosts the model, is provided for you.

  • You can deploy and debug the model locally, speeding up the iteration cycle during development.

Build and deploy a custom container

This section describes how to use CPR to build a custom container with pre/post-processing logic and deploy it to both a local endpoint and an online endpoint.

Setup

You must have the Vertex AI SDK and Docker installed in your environment.

Write custom Predictor

Implement the Predictor interface.

from abc import ABC, abstractmethod
from typing import Any


class Predictor(ABC):
    """Interface of the Predictor class for Custom Prediction Routines.
    The Predictor is responsible for the ML logic for processing a prediction request.
    Specifically, the Predictor must define:
    (1) How to load all model artifacts used during prediction into memory.
    (2) The logic that should be executed at predict time.
    When using the default PredictionHandler, the Predictor will be invoked as follows:
      predictor.postprocess(predictor.predict(predictor.preprocess(prediction_input)))
    """

    @abstractmethod
    def load(self, artifacts_uri: str) -> None:
        """Loads the model artifact.
        Args:
            artifacts_uri (str):
                Required. The value of the environment variable AIP_STORAGE_URI.
        """
        pass

    def preprocess(self, prediction_input: Any) -> Any:
        """Preprocesses the prediction input before doing the prediction.
        Args:
            prediction_input (Any):
                Required. The prediction input that needs to be preprocessed.
        Returns:
            The preprocessed prediction input.
        """
        return prediction_input

    @abstractmethod
    def predict(self, instances: Any) -> Any:
        """Performs prediction.
        Args:
            instances (Any):
                Required. The instance(s) used for performing prediction.
        Returns:
            Prediction results.
        """
        pass

    def postprocess(self, prediction_results: Any) -> Any:
        """Postprocesses the prediction results.
        Args:
            prediction_results (Any):
                Required. The prediction results.
        Returns:
            The postprocessed prediction results.
        """
        return prediction_results

For example, see Sklearn's Predictor implementation.
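For illustration, here is a minimal sketch of a scikit-learn Predictor, assuming the model was saved as model.joblib and that the SDK's prediction_utils.download_model_artifacts helper is available for fetching artifacts; the class and file names are hypothetical:

import joblib
import numpy as np

from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils


class SklearnPredictor(Predictor):
    """Hypothetical Predictor for a scikit-learn model saved as model.joblib."""

    def load(self, artifacts_uri: str) -> None:
        # Download the artifacts referenced by AIP_STORAGE_URI into the working
        # directory, then load the serialized model.
        prediction_utils.download_model_artifacts(artifacts_uri)
        self._model = joblib.load("model.joblib")

    def preprocess(self, prediction_input: dict) -> np.ndarray:
        # The default handler passes the deserialized request body.
        return np.asarray(prediction_input["instances"])

    def predict(self, instances: np.ndarray) -> np.ndarray:
        return self._model.predict(instances)

    def postprocess(self, prediction_results: np.ndarray) -> dict:
        # Return a JSON-serializable response body.
        return {"predictions": prediction_results.tolist()}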

Write custom Handler (optional)

Custom handlers have access to the raw request object, and are thus useful in rare cases where you need to customize web server logic, such as supporting additional request/response headers or deserializing prediction requests that are not JSON-formatted.

Here is a sample notebook that implements both Predictor and Handler.

Although it is not required, we recommend that you implement the web server logic in the Handler and the ML logic in the Predictor, as shown in the default handler, for better code organization and reusability.
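As a rough sketch only (the handler base class and its attributes may differ between SDK versions), a custom handler could subclass the default PredictionHandler and override handle, for example to add a response header. The class name and header below are hypothetical:

import json

from fastapi import Request, Response

from google.cloud.aiplatform.prediction.handler import PredictionHandler


class CustomHeaderHandler(PredictionHandler):
    """Hypothetical handler that adds a custom response header."""

    async def handle(self, request: Request) -> Response:
        # Deserialize the raw request, run the Predictor's ML logic, and
        # serialize the results back to JSON.
        request_body = json.loads(await request.body())
        prediction_results = self._predictor.postprocess(
            self._predictor.predict(self._predictor.preprocess(request_body))
        )
        return Response(
            content=json.dumps(prediction_results),
            headers={"X-Custom-Header": "my-value"},
            media_type="application/json",
        )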

Build custom container

Put your custom code in a directory, along with a requirements.txt file if you need to install any packages in your image.
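For example, the source directory might look like the following (hypothetical file names):

src_dir/
    predictor.py        # your Predictor implementation
    handler.py          # optional custom Handler
    requirements.txt    # packages to install in the image, e.g. scikit-learn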

Use the Vertex AI SDK to build the custom container as shown below:

from google.cloud.aiplatform.prediction import LocalModel

# {import your predictor and handler}

local_model = LocalModel.build_cpr_model(
    {PATH_TO_THE_SOURCE_DIR},
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}",
    predictor={PREDICTOR_CLASS},
    handler={HANDLER_CLASS},
    requirements_path={PATH_TO_REQUIREMENTS_TXT},
)

You can inspect the container's spec to get useful information such as image URI and environment variables.

local_model.get_serving_container_spec()

Run the container locally (optional)

This step is required only if you want to run and test the container locally, which is useful for faster iteration. In the following example, we deploy to a local endpoint and send a prediction request (see the format for the request body).

with local_model.deploy_to_local_endpoint(
    artifact_uri={GCS_PATH_TO_MODEL_ARTIFACTS},
    credential_path={PATH_TO_CREDENTIALS},
) as local_endpoint:
    health_check_response = local_endpoint.run_health_check()
    predict_response = local_endpoint.predict(
        request_file={PATH_TO_INPUT_FILE},
        headers={ANY_NEEDED_HEADERS},
    )
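The request file is a JSON document in the standard Vertex AI prediction request format; a minimal example with hypothetical feature values:

{
    "instances": [
        [1.2, 3.4, 5.6, 7.8]
    ]
}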

Print out the health check and prediction response.

print(health_check_response, health_check_response.content)
print(predict_response, predict_response.content)

Print out all the container logs.

local_endpoint.print_container_logs(show_all=True)

Upload to Vertex AI Model Registry

Your model will need to access your model artifacts (the files from training), so make sure you've uploaded them to Google Cloud Storage.
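For example, you can copy them with gsutil (the local directory and bucket names below are hypothetical):

gsutil cp -r ./model_artifacts gs://your-bucket/model_artifacts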

Push the image to Artifact Registry.

local_model.push_image()

Then, upload the model to the Vertex AI Model Registry.

from google.cloud import aiplatform

model = aiplatform.Model.upload(
    local_model=local_model,
    display_name={MODEL_DISPLAY_NAME},
    artifact_uri={GCS_PATH_TO_MODEL_ARTIFACTS},
)

Once your model is uploaded to the Vertex AI Model Registry, you can use it to get batch predictions or deploy it to a Vertex AI endpoint to get online predictions.
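For example, a batch prediction job can be started from the uploaded model; the display name and bucket paths below are hypothetical:

batch_prediction_job = model.batch_predict(
    job_display_name="cpr-batch-prediction",
    gcs_source="gs://your-bucket/batch_inputs/*.jsonl",
    gcs_destination_prefix="gs://your-bucket/batch_outputs",
    machine_type="n1-standard-4",
)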

Deploy to Vertex AI endpoint

endpoint = model.deploy(machine_type="n1-standard-4")

Once the model is deployed, you can get online predictions.
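For example, with hypothetical feature values:

prediction = endpoint.predict(instances=[[1.2, 3.4, 5.6, 7.8]])
print(prediction.predictions)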

Notebook samples

The samples showcase the different ways you can deploy a model with custom pre/post-processing on Vertex AI Prediction.