Try Online Predictions

This quickstart guides the Application Operator (AO) through the process of using the Vertex AI Online Predictions API on Google Distributed Cloud (GDC) air-gapped.

Before you begin

Before trying online predictions, perform the following steps:

  1. Create and train a prediction model targeting one of the supported containers.
  2. If you don't have a project, work with your Platform Administrator (PA) to create one.
  3. Work with your Infrastructure Operator (IO) to ensure the Prediction user cluster exists and your user project allows incoming external traffic.
  4. Deploy your model to an endpoint.

Format your input for online prediction

Create a JSON file with the request in the format required by the target container. For more information about the JSON representation with examples for TensorFlow, see Request body details.
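
For example, a TensorFlow model that accepts numeric feature vectors might use a request body like the following. The "instances" key is the standard TensorFlow Serving request format; the feature values here are purely illustrative:

```json
{
  "instances": [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0]
  ]
}
```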

Send an online prediction request

Send an online prediction request to the model's endpoint URL using HTTP or gRPC.

HTTP

The following example uses HTTP to send an online prediction request.

Use the curl tool to call the HTTP endpoint. For example:

curl -X POST -H "Content-Type: application/json; charset=utf-8" \
    https://ENDPOINT_URL_PATH.GDC_URL:443/v1/model:predict -d @JSON_FILE_NAME.json

Replace the following:

  • ENDPOINT_URL_PATH: the endpoint URL path for the online prediction request.
  • GDC_URL: the URL of your organization in Distributed Cloud, for example, org-1.zone1.gdch.test.
  • JSON_FILE_NAME: the name of the JSON file with the request body details for your online prediction.

The command prints the API response in JSON format, for example:

{
    "predictions": [[-357.10849], [-171.621658]]
}
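
If you capture the response, you can extract the prediction values with a few lines of Python. The following is a minimal sketch; the response text mirrors the example output shown above, and the variable names are illustrative:

```python
import json

# Example response body as returned by the predict endpoint.
response_text = '{"predictions": [[-357.10849], [-171.621658]]}'

# Parse the JSON and pull out the prediction values.
response = json.loads(response_text)
predictions = response["predictions"]
print(predictions)  # → [[-357.10849], [-171.621658]]
```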

gRPC

The following example uses gRPC to send an online prediction request:

  1. Install the google-cloud-aiplatform Python client library by following the instructions from Install Vertex AI client libraries.

    When downloading the client library, choose one of the following library files, depending on your operating system:

    • CentOS: centos-google-cloud-aiplatform-1.34.0.tar.gz
    • Ubuntu: ubuntu-google-cloud-aiplatform-1.34.0.tar.gz

    Use the following URL to download the client library:

    https://GDC_URL/.well-known/static/client-libraries/LIBRARY_FILE
    

    Replace the following:

    • GDC_URL: the URL of your organization in Distributed Cloud.
    • LIBRARY_FILE: the name of the library file depending on the operating system, for example, ubuntu-google-cloud-aiplatform-1.34.0.tar.gz.
  2. Save the following code to a Python script:

    import json
    import os
    from typing import Sequence
    
    import grpc
    from absl import app
    from absl import flags
    
    from google.protobuf import json_format
    from google.protobuf.struct_pb2 import Value
    from google.cloud.aiplatform_v1.services import prediction_service
    
    _INPUT = flags.DEFINE_string("input", None, "input", required=True)
    _HOST = flags.DEFINE_string("host", None, "Prediction endpoint", required=True)
    _ENDPOINT_ID = flags.DEFINE_string("endpoint_id", None, "endpoint id", required=True)
    
    # Path to the CA certificate file that signed the endpoint's TLS certificate.
    os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "path-to-ca-cert-file.cert"
    
    # ENDPOINT_RESOURCE_NAME is a placeholder value that doesn't affect prediction behavior.
    ENDPOINT_RESOURCE_NAME="projects/000000000000/locations/us-central1/endpoints/00000000000000"
    
    # predict_client_secure builds a client that requires TLS
    def predict_client_secure(host):
        with open(os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"], 'rb') as f:
            creds = grpc.ssl_channel_credentials(f.read())
    
        channel_opts = ()
        channel_opts += (('grpc.ssl_target_name_override', host),)
        client = prediction_service.PredictionServiceClient(
            transport=prediction_service.transports.grpc.PredictionServiceGrpcTransport(
                channel=grpc.secure_channel(target=host+":443", credentials=creds, options=channel_opts)))
        return client
    
    def predict_func(client, instances):
        resp = client.predict(
            endpoint=ENDPOINT_RESOURCE_NAME,
            instances=instances,
            metadata=[
                ("x-vertex-ai-endpoint-id", _ENDPOINT_ID.value)])
        print(resp)
    
    def main(argv: Sequence[str]):
        del argv  # Unused.
        with open(_INPUT.value) as json_file:
            data = json.load(json_file)
            instances = [json_format.ParseDict(s, Value()) for s in data["instances"]]
    
        client = predict_client_secure(_HOST.value)
    
        predict_func(client=client, instances=instances)
    
    if __name__ == "__main__":
        app.run(main)
    
  3. Make the gRPC call to the prediction server:

    python PYTHON_FILE_NAME.py --input JSON_FILE_NAME.json \
        --host ENDPOINT_URL_PATH.GDC_URL \
        --endpoint_id ENDPOINT_ID
    

    Replace the following:

    • PYTHON_FILE_NAME: the name of the Python file where you saved the script.
    • JSON_FILE_NAME: the name of the JSON file with the request body details for your online prediction.
    • ENDPOINT_URL_PATH: the endpoint URL path for the online prediction request.
    • GDC_URL: the URL of your organization in Distributed Cloud, for example, org-1.zone1.gdch.test.
    • ENDPOINT_ID: the value of the endpoint ID.

If successful, you receive a JSON response similar to one of the responses in Response body examples.
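
Note that the gRPC client returns the predictions as protobuf Value messages rather than plain JSON. The following sketch converts them to native Python types using the same google.protobuf modules the script above already imports; the example values mirror the HTTP response shown earlier:

```python
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

def values_to_python(predictions):
    # Each prediction is a google.protobuf.Value; MessageToDict unwraps it
    # into the corresponding native Python type (list, float, dict, ...).
    return [json_format.MessageToDict(p) for p in predictions]

# Simulate the repeated predictions field of a PredictResponse.
preds = [json_format.ParseDict([-357.10849], Value()),
         json_format.ParseDict([-171.621658], Value())]
print(values_to_python(preds))  # → [[-357.10849], [-171.621658]]
```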