Use Private Service Connect endpoints for online prediction

Private Service Connect lets you access Vertex AI online predictions securely from multiple consumer projects and VPC networks without the need for public IP addresses, public internet access, or an explicitly peered internal IP address range.

We recommend Private Service Connect for online prediction use cases that:

Require private and secure connections
Require low latency
Don't need to be publicly accessible

Private Service Connect uses a forwarding rule in your VPC network to send traffic unidirectionally to the Vertex AI online prediction service. The forwarding rule connects to a service attachment that exposes the Vertex AI service to your VPC network. For more information, see About accessing Vertex AI services through Private Service Connect. To learn more about setting up Private Service Connect, see the Private Service Connect overview in the Virtual Private Cloud (VPC) documentation.

Create the online prediction endpoint

Use one of the following methods to create an online prediction endpoint with Private Service Connect enabled:

Console

In the Google Cloud console, in Vertex AI, go to the Online prediction page.

Click Create.
Provide a display name for the endpoint.
Select Private.
Select Private Service Connect.
Click Select project IDs.
Select projects to add to the allowlist for the endpoint.
Click Continue.
Choose your model specifications. For more information, see Deploy a model to an endpoint.
Click Create to create your endpoint and deploy your model to it.
Make a note of the endpoint ID.

API

REST

Before using any of the request data, make the following replacements:

VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online prediction endpoint.
REGION: the region where you're using Vertex AI.
VERTEX_AI_ENDPOINT_NAME: the display name for the online prediction endpoint.
ALLOWED_PROJECTS: the Google Cloud project IDs that can send prediction requests to the endpoint. Replace the placeholder so that projectAllowlist is a JSON array of quoted project IDs, for example, ["PROJECTID1", "PROJECTID2"]. If a project isn't contained in this list, you can't send prediction requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.

HTTP method and URL:

POST https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints

Request JSON body:

```
{
  "displayName": "VERTEX_AI_ENDPOINT_NAME",
  "privateServiceConnectConfig": {
    "enablePrivateServiceConnect": true,
    "projectAllowlist": ["ALLOWED_PROJECTS"]
  }
}
```
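If you generate request.json from a script rather than by hand, a minimal sketch such as the following can build the body and add VERTEX_AI_PROJECT_ID to the allowlist when it's missing. The function name build_psc_endpoint_body and the project IDs are hypothetical; the field names are the ones in the request body above.

```python
import json


def build_psc_endpoint_body(display_name, own_project_id, allowed_projects):
    """Build the request body for creating a PSC-enabled endpoint.

    own_project_id is appended to the allowlist if it's missing, so that the
    endpoint can also be called from the project that hosts it.
    """
    allowlist = list(dict.fromkeys(allowed_projects))  # drop duplicates, keep order
    if own_project_id not in allowlist:
        allowlist.append(own_project_id)
    return {
        "displayName": display_name,
        "privateServiceConnectConfig": {
            "enablePrivateServiceConnect": True,
            "projectAllowlist": allowlist,
        },
    }


# Hypothetical names; replace them with your own values
body = build_psc_endpoint_body(
    "my-psc-endpoint", "my-vertex-project", ["consumer-project-1"]
)
with open("request.json", "w") as f:
    json.dump(body, f, indent=2)
```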

To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

```
curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints"
```

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://REGION-aiplatform.googleapis.com/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints" | Select-Object -Expand Content
```

You should receive a JSON response similar to the following:

```
{
  "name": "projects/VERTEX_AI_PROJECT_NUMBER/locations/REGION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}
```

Make a note of the ENDPOINT_ID.
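The ENDPOINT_ID is the path segment that follows endpoints/ in the name field. If you script this step, a small helper like the following sketch can extract it. endpoint_id_from_name is a hypothetical name, and the sample values are placeholders. The same helper also works on an endpoint resource name without the /operations/ suffix, such as the one the Python client library logs.

```python
import re


def endpoint_id_from_name(resource_name):
    """Return the segment that follows 'endpoints/' in a resource name."""
    match = re.search(r"/endpoints/([^/]+)", resource_name)
    if match is None:
        raise ValueError(f"No endpoint ID found in {resource_name!r}")
    return match.group(1)


# Hypothetical operation name in the format shown in the response above
op_name = "projects/123456789/locations/us-central1/endpoints/987654321/operations/1111"
print(endpoint_id_from_name(op_name))  # prints 987654321
```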

Python

Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

PROJECT_ID = "VERTEX_AI_PROJECT_ID"
REGION = "REGION"
VERTEX_AI_ENDPOINT_NAME = "VERTEX_AI_ENDPOINT_NAME"

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

# Create the forwarding rule in the consumer project
psc_endpoint = aiplatform.PrivateEndpoint.create(
display_name=VERTEX_AI_ENDPOINT_NAME,
project=PROJECT_ID,
location=REGION,
private_service_connect_config=aiplatform.PrivateEndpoint.PrivateServiceConnectConfig(
    project_allowlist=["ALLOWED_PROJECTS"],
    ),
)

Replace the following:

VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you're creating the online prediction endpoint
REGION: the region where you're using Vertex AI
VERTEX_AI_ENDPOINT_NAME: the display name for the online prediction endpoint
ALLOWED_PROJECTS: the Google Cloud project IDs that can send prediction requests to the endpoint. Replace the placeholder so that project_allowlist is a Python list of quoted project IDs, for example, ["PROJECTID1", "PROJECTID2"]. If a project isn't contained in this list, you can't send prediction requests to the Vertex AI endpoint from it. Make sure to include VERTEX_AI_PROJECT_ID in this list so that you can call the endpoint from the same project it's in.

Make a note of the ENDPOINT_ID at the end of the returned endpoint URI:

```
INFO:google.cloud.aiplatform.models:To use this PrivateEndpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.PrivateEndpoint('projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID')
```

Deploy the model

After you create your online prediction endpoint with Private Service Connect enabled, deploy your model to it, following the steps outlined in Deploy a model to an endpoint.

Get the service attachment URI

When you deploy your model, a service attachment is created for the online prediction endpoint. This service attachment represents the Vertex AI online prediction service that's being exposed to your VPC network. Run the gcloud ai endpoints describe command to get the service attachment URI.

List only the serviceAttachment value from the endpoint details:
```
gcloud ai endpoints describe ENDPOINT_ID \
--project=VERTEX_AI_PROJECT_ID \
--region=REGION \
| grep -i serviceAttachment
```
Replace the following:
- ENDPOINT_ID: the ID of your online prediction endpoint
- VERTEX_AI_PROJECT_ID: the ID of the Google Cloud project where you created your online prediction endpoint
- REGION: the region for this request
The output is similar to the following:
```
serviceAttachment: projects/ac74a9f84c2e5f2a1-tp/regions/us-central1/serviceAttachments/gkedpm-c6e6a854a634dc99472bb802f503c1
```
Make a note of the entire string in the serviceAttachment field. This is the service attachment URI.
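If you capture the grep output line as a string, a minimal sketch like the following pulls out the full URI, which you pass as SERVICE_ATTACHMENT_URI later, together with its project, region, and name segments. parse_service_attachment is a hypothetical helper, not part of any Google Cloud library; the sample line is the output shown above.

```python
import re

# Regional service attachment path: projects/*/regions/*/serviceAttachments/*
SERVICE_ATTACHMENT_RE = re.compile(
    r"projects/(?P<project>[^/\s]+)/regions/(?P<region>[^/\s]+)"
    r"/serviceAttachments/(?P<name>[^/\s]+)"
)


def parse_service_attachment(describe_line):
    """Extract the service attachment URI and its parts from an output line."""
    match = SERVICE_ATTACHMENT_RE.search(describe_line)
    if match is None:
        raise ValueError("No service attachment URI found")
    return match.group(0), match.groupdict()


uri, parts = parse_service_attachment(
    "serviceAttachment: projects/ac74a9f84c2e5f2a1-tp/regions/us-central1/"
    "serviceAttachments/gkedpm-c6e6a854a634dc99472bb802f503c1"
)
print(uri)
print(parts["region"])  # prints us-central1
```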

Create a forwarding rule

You can reserve an internal IP address and create a forwarding rule with that address. To create the forwarding rule, you need the service attachment URI from the previous step.

To reserve an internal IP address for the forwarding rule, use the gcloud compute addresses create command:
```
gcloud compute addresses create ADDRESS_NAME \
--project=VPC_PROJECT_ID \
--region=REGION \
--subnet=SUBNETWORK \
--addresses=INTERNAL_IP_ADDRESS
```
Replace the following:
- ADDRESS_NAME: a name for the internal IP address
- VPC_PROJECT_ID: the ID of the Google Cloud project that hosts your VPC network. If your online prediction endpoint and your Private Service Connect forwarding rule are hosted in the same project, use VERTEX_AI_PROJECT_ID for this parameter.
- REGION: the Google Cloud region where the Private Service Connect forwarding rule is to be created
- SUBNETWORK: the name of the VPC subnet that contains the IP address
- INTERNAL_IP_ADDRESS: the internal IP address to reserve. This parameter is optional.
  - If this parameter is specified, the IP address must be within the subnet's primary IP address range. The IP address can be an RFC 1918 address or an address from a subnet that uses non-RFC 1918 ranges.
  - If this parameter is omitted, an internal IP address is allocated automatically.
  - For more information, see Reserve a new static internal IPv4 or IPv6 address.
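Before you run the command, you can sanity-check a chosen INTERNAL_IP_ADDRESS locally with Python's standard ipaddress module. This sketch assumes that you already know the subnet's primary IPv4 range in CIDR notation (for example, from the ipCidrRange field in the output of gcloud compute networks subnets describe). check_internal_ip is a hypothetical helper and the addresses are examples; the check doesn't tell you whether an address is already in use.

```python
import ipaddress

# The three private address blocks defined in RFC 1918
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def check_internal_ip(address, subnet_primary_range):
    """Return (in_subnet, is_rfc1918) for a candidate INTERNAL_IP_ADDRESS."""
    ip = ipaddress.ip_address(address)
    in_subnet = ip in ipaddress.ip_network(subnet_primary_range)
    is_rfc1918 = any(ip in net for net in RFC1918_NETWORKS)
    return in_subnet, is_rfc1918


print(check_internal_ip("10.128.0.50", "10.128.0.0/20"))  # (True, True)
print(check_internal_ip("100.64.0.5", "100.64.0.0/24"))   # (True, False)
```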
To verify that the IP address is reserved, use the gcloud compute addresses list command:
```
gcloud compute addresses list --filter="name=(ADDRESS_NAME)" \
--project=VPC_PROJECT_ID
```
In the response, verify that a RESERVED status appears for the IP address.
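To automate this check, you can add --format=json to the gcloud compute addresses list command and filter the result. The following is a minimal sketch; reserved_addresses is a hypothetical helper, and the sample JSON is hypothetical and trimmed to the name, address, and status fields. After the forwarding rule starts using the address, its status changes from RESERVED to IN_USE.

```python
import json


def reserved_addresses(addresses_json):
    """Map each address name to its IP for entries whose status is RESERVED."""
    return {
        entry["name"]: entry["address"]
        for entry in json.loads(addresses_json)
        if entry.get("status") == "RESERVED"
    }


# Hypothetical output of: gcloud compute addresses list ... --format=json
sample = '[{"name": "psc-address", "address": "10.128.0.50", "status": "RESERVED"}]'
print(reserved_addresses(sample))  # {'psc-address': '10.128.0.50'}
```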
To create the forwarding rule and point it to the online prediction service attachment, use the gcloud compute forwarding-rules create command:
```
gcloud compute forwarding-rules create PSC_FORWARDING_RULE_NAME \
    --address=ADDRESS_NAME \
    --project=VPC_PROJECT_ID \
    --region=REGION \
    --network=VPC_NETWORK_NAME \
    --target-service-attachment=SERVICE_ATTACHMENT_URI
```
Replace the following:
- PSC_FORWARDING_RULE_NAME: a name for the forwarding rule
- VPC_NETWORK_NAME: the name of the VPC network where the forwarding rule is to be created
- SERVICE_ATTACHMENT_URI: the service attachment that you made a note of earlier
To verify that the service attachment accepts the forwarding rule, use the gcloud compute forwarding-rules describe command:
```
gcloud compute forwarding-rules describe PSC_FORWARDING_RULE_NAME \
--project=VPC_PROJECT_ID \
--region=REGION
```
In the response, verify that an ACCEPTED status appears in the pscConnectionStatus field.
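A similar sketch for the forwarding rule, assuming that you add --format=json to the gcloud compute forwarding-rules describe command. It returns the IPAddress field only when pscConnectionStatus is ACCEPTED. check_psc_forwarding_rule and the sample JSON are hypothetical.

```python
import json


def check_psc_forwarding_rule(describe_json):
    """Return the forwarding rule's IP address if its PSC connection is ACCEPTED."""
    rule = json.loads(describe_json)
    status = rule.get("pscConnectionStatus")
    if status != "ACCEPTED":
        # For example, CLOSED can mean the service attachment was deleted
        raise RuntimeError(f"pscConnectionStatus is {status!r}, expected 'ACCEPTED'")
    return rule["IPAddress"]


# Hypothetical, trimmed output of: gcloud compute forwarding-rules describe ... --format=json
sample = '{"name": "psc-rule", "IPAddress": "10.128.0.50", "pscConnectionStatus": "ACCEPTED"}'
print(check_psc_forwarding_rule(sample))  # prints 10.128.0.50
```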

Optional: Get the internal IP address

If you didn't specify a value for INTERNAL_IP_ADDRESS when you reserved the IP address, you can get the address that was allocated automatically by using the gcloud compute forwarding-rules describe command:

```
gcloud compute forwarding-rules describe PSC_FORWARDING_RULE_NAME \
--project=VPC_PROJECT_ID \
--region=REGION \
| grep -i IPAddress
```

Replace the following:

VPC_PROJECT_ID: the ID of the Google Cloud project that hosts your VPC network and the forwarding rule
REGION: the region name for this request

Get online predictions

Getting online predictions from an endpoint with Private Service Connect is similar to getting online predictions from public endpoints, except for the following considerations:

The request must be sent from a project that was specified in the projectAllowlist when the online prediction endpoint was created.
If global access isn't enabled, the request must be sent from the same region.
To get predictions using REST, you must connect to the forwarding rule's internal IP address, unless you create a DNS record for that address. For example, send your predict requests to the following URL:
```
https://INTERNAL_IP_ADDRESS/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict
```
Replace INTERNAL_IP_ADDRESS with the internal IP address that you reserved earlier.
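The URL has the same path whether the host is the internal IP address or a DNS name that resolves to it. A minimal sketch of building it from its parts; psc_predict_url and the example values are hypothetical.

```python
def psc_predict_url(host, project_id, region, endpoint_id):
    """Build the :predict URL for a PSC endpoint reached through host.

    host is the forwarding rule's internal IP address, or a DNS name that
    resolves to it.
    """
    return (
        f"https://{host}/v1/projects/{project_id}/locations/{region}"
        f"/endpoints/{endpoint_id}:predict"
    )


print(psc_predict_url("10.128.0.50", "my-vertex-project", "us-central1", "987654321"))
# https://10.128.0.50/v1/projects/my-vertex-project/locations/us-central1/endpoints/987654321:predict
```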

The following sections provide examples of how you can send the predict request using Python.

First example

psc_endpoint = aiplatform.PrivateEndpoint("projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID")
REQUEST_FILE = "PATH_TO_INPUT_FILE"
import json

import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)
    response = psc_endpoint.predict(
        instances=data["instances"], endpoint_override=INTERNAL_IP_ADDRESS
    )
print(response)

Replace PATH_TO_INPUT_FILE with the path to a JSON file whose instances field contains the request input.
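The examples read the instances field from the input file; what each instance contains depends on your model. A minimal sketch that writes a hypothetical input file and checks its shape before you send it; load_instances, input.json, and the sample instance are illustrative only.

```python
import json


def load_instances(path):
    """Load the 'instances' list that the examples send to :predict."""
    with open(path) as f:
        data = json.load(f)
    instances = data.get("instances") if isinstance(data, dict) else None
    if not isinstance(instances, list) or not instances:
        raise ValueError(f"{path} must contain a non-empty 'instances' list")
    return instances


# Hypothetical input: the shape of each instance depends on your model
with open("input.json", "w") as f:
    json.dump({"instances": [[1.0, 2.0, 3.0]]}, f)

print(load_instances("input.json"))  # [[1.0, 2.0, 3.0]]
```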

Second example

```
import json

import google.auth
import google.auth.transport.requests
import requests
import urllib3

# Suppress the warnings that verify=False triggers below
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

REQUEST_FILE = "PATH_TO_INPUT_FILE"

# Programmatically get credentials and generate an access token
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
access_token = creds.token
# Note: the access token lives for 1 hour by default.
# After expiration, it must be refreshed.
# See https://cloud.google.com/docs/authentication/token-types#at-lifetime

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)

url = "https://INTERNAL_IP_ADDRESS/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {access_token}",  # Add access token to headers
}
payload = {"instances": data["instances"]}

# verify=False skips TLS certificate verification for the internal address
response = requests.post(url, headers=headers, json=payload, verify=False)
print(response.json())
```

Optional: Create a DNS record for the internal IP address

We recommend that you create a DNS record so that you can get online predictions from your endpoint without needing to specify the internal IP address.

For more information, see Other ways to configure DNS.

Create a private DNS zone by using the gcloud dns managed-zones create command. This zone is associated with the VPC network that the forwarding rule was created in.

DNS_NAME_SUFFIX="prediction.p.vertexai.goog."  # DNS names have "." at the end.
gcloud dns managed-zones create ZONE_NAME \
--project=VPC_PROJECT_ID \
--dns-name=$DNS_NAME_SUFFIX \
--networks=VPC_NETWORK_NAME \
--visibility=private \
--description="A DNS zone for Vertex AI endpoints using Private Service Connect."

Replace the following:

ZONE_NAME: the name of the DNS zone

To create a DNS record in the zone, use the gcloud dns record-sets create command:
```
DNS_NAME=ENDPOINT_ID-REGION-VERTEX_AI_PROJECT_NUMBER.$DNS_NAME_SUFFIX
gcloud dns record-sets create $DNS_NAME \
--rrdatas=INTERNAL_IP_ADDRESS \
--zone=ZONE_NAME \
--type=A \
--ttl=60 \
--project=VPC_PROJECT_ID
```
Replace the following:
- VERTEX_AI_PROJECT_NUMBER: the project number for your VERTEX_AI_PROJECT_ID project. You can locate this project number in the Google Cloud console. For more information, see Identifying projects.
- INTERNAL_IP_ADDRESS: the internal IP address of your online prediction endpoint
Now you can send your predict requests to:
```
https://ENDPOINT_ID-REGION-VERTEX_AI_PROJECT_NUMBER.prediction.p.vertexai.goog/v1/projects/VERTEX_AI_PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID:predict
```
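The record name ends with a period, while the host in the request URL doesn't, so it can help to derive both from the same values. A minimal sketch; psc_dns_name and the example endpoint ID, region, project number, and project ID are hypothetical, and the suffix matches DNS_NAME_SUFFIX from the managed-zones command.

```python
DNS_NAME_SUFFIX = "prediction.p.vertexai.goog."  # same suffix as the private DNS zone


def psc_dns_name(endpoint_id, region, project_number, suffix=DNS_NAME_SUFFIX):
    """Return (record_name, host) for a PSC endpoint's DNS record.

    record_name keeps the trailing period used in the record-sets command;
    host drops it so that it can be used in an HTTPS URL.
    """
    record_name = f"{endpoint_id}-{region}-{project_number}.{suffix}"
    return record_name, record_name.rstrip(".")


record_name, host = psc_dns_name("987654321", "us-central1", "123456789012")
print(record_name)  # 987654321-us-central1-123456789012.prediction.p.vertexai.goog.
print(
    f"https://{host}/v1/projects/my-vertex-project/locations/us-central1"
    "/endpoints/987654321:predict"
)
```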

The following is an example of how you can send the predict request to the DNS zone using Python:

REQUEST_FILE = "PATH_TO_INPUT_FILE"
import json

import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

with open(REQUEST_FILE) as json_file:
    data = json.load(json_file)
    response = psc_endpoint.predict(
        instances=data["instances"], endpoint_override=DNS_NAME
    )
print(response)

Replace DNS_NAME with the DNS name of the record that you created with the gcloud dns record-sets create command, for example, ENDPOINT_ID-REGION-VERTEX_AI_PROJECT_NUMBER.prediction.p.vertexai.goog.

Limitations

Vertex AI endpoints with Private Service Connect are subject to the following limitations:

Private egress from within the endpoint isn't supported. Because Private Service Connect forwarding rules are unidirectional, other private Google Cloud workloads aren't accessible inside your container.
An endpoint's projectAllowlist value can't be changed.
Vertex Explainable AI isn't supported.
If all models are undeployed for more than 10 minutes, the service attachment might be deleted. Check the Private Service Connect connection status; if it's CLOSED, recreate the forwarding rule.
A project can have up to 10 different projectAllowlist values in its Private Service Connect configurations.
