Use Private Service Connect interface for Vertex AI Training

Private Service Connect interface is recommended for private connectivity since it reduces the chance of IP exhaustion and allows for transitive peering.

Overview

Private Service Connect interface is supported on Vertex AI Training custom jobs and persistent resources. To use Private Service Connect interface, you need to set up a VPC network, subnetwork, and network attachment in your user project. See Set up a Private Service Connect interface. The network attachment name must be included in the request to create a custom job or persistent resource to enable Private Service Connect interface.

Vertex AI Private Service Connect egress connectivity to other networks

Vertex AI supports the egress network connectivity options that Private Service Connect supports (see Connecting to workloads in other networks), with the following exceptions:

Egress to a customer's Private Google Access isn't supported. Instead, Private Service Connect egress resolves locally for Private Google Access.
Egress to Cloud NAT is supported only when VPC Service Controls is enabled.

Limitations

Private Service Connect interfaces don't support external IP addresses.
Network attachments can't be deleted unless the producer (Vertex AI) deletes the allocated resources. To initiate the delete process, you must contact Vertex AI support.

Pricing

Pricing for Private Service Connect interfaces is described in the "Using a Private Service Connect interface for access to a producer or consumer VPC network" section on the All networking pricing page.

Before you begin

Set up your resources for Private Service Connect interface in your user project.

Create a custom training job with a Private Service Connect interface

You can create a custom training job with a Private Service Connect interface by using the Vertex AI SDK for Python or the REST API.

Vertex AI SDK for Python

To create a custom training job with PSC-I using the Vertex AI SDK for Python, configure the job using the aiplatform_v1beta1/services/job_service definition.

from google.cloud import aiplatform
from google.cloud import aiplatform_v1beta1


def create_custom_job_psci_sample(
    project: str,
    location: str,
    bucket: str,
    display_name: str,
    machine_type: str,
    replica_count: int,
    image_uri: str,
    network_attachment: str,
):
    """Custom training job sample with PSC-I through aiplatform_v1beta1."""
    aiplatform.init(project=project, location=location, staging_bucket=bucket)

    client_options = {"api_endpoint": f"{location}-aiplatform.googleapis.com"}

    client = aiplatform_v1beta1.JobServiceClient(client_options=client_options)

    request = aiplatform_v1beta1.CreateCustomJobRequest(
        parent=f"projects/{project}/locations/{location}",
        custom_job=aiplatform_v1beta1.CustomJob(
            display_name=display_name,
            job_spec=aiplatform_v1beta1.CustomJobSpec(
                worker_pool_specs=[
                    aiplatform_v1beta1.WorkerPoolSpec(
                        machine_spec=aiplatform_v1beta1.MachineSpec(
                            machine_type=machine_type,
                        ),
                        replica_count=replica_count,
                        container_spec=aiplatform_v1beta1.ContainerSpec(
                            image_uri=image_uri,
                        ),
                    )
                ],
                psc_interface_config=aiplatform_v1beta1.PscInterfaceConfig(
                    network_attachment=network_attachment,
                ),
            ),
        ),
    )

    response = client.create_custom_job(request=request)

    return response

project: Your project ID. You can find this ID on the Google Cloud console welcome page.
location: See list of available locations.
bucket: The name of a Cloud Storage bucket that you have access to. The sample uses it as the staging bucket.
display_name: The display name of the custom job.
machine_type: Specify the compute resources.
replica_count: The number of worker replicas to use.
image_uri: The URI of a Docker container image with your training code. Learn how to create a custom container image.
network_attachment: The name or full path of the network attachment you created when setting up your resources for Private Service Connect interface.
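Because network_attachment accepts either a bare name or a full path, a local sanity check before submitting the job can catch malformed values early. The following is a minimal sketch, not part of the Vertex AI SDK: the helper name describe_network_attachment is hypothetical, and the full-path pattern assumes the usual Compute Engine form projects/PROJECT/regions/REGION/networkAttachments/NAME.

```python
import re

# Assumption: a full path follows the Compute Engine resource naming form
# projects/PROJECT/regions/REGION/networkAttachments/NAME.
_FULL_PATH = re.compile(
    r"^projects/(?P<project>[^/]+)/regions/(?P<region>[^/]+)"
    r"/networkAttachments/(?P<name>[^/]+)$"
)


def describe_network_attachment(value: str) -> dict:
    """Hypothetical helper: classify a network_attachment argument.

    Returns a dict describing whether the value is a bare name or a full
    path, and raises ValueError for anything else.
    """
    match = _FULL_PATH.match(value)
    if match:
        return {"kind": "full_path", **match.groupdict()}
    # A bare name is non-empty and contains no path separators.
    if not value or "/" in value:
        raise ValueError(f"Unrecognized network attachment: {value!r}")
    return {"kind": "name", "name": value}
```

If the value is a full path, the parsed region can be compared with the location passed to the job, on the assumption that the network attachment and the job are in the same region.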

REST

To create a custom training job, send a POST request by using the customJobs.create method.

Before using any of the request data, make the following replacements:

LOCATION: The region where the container or Python package will be run.
PROJECT_ID: Your project ID.
JOB_NAME: A display name for the CustomJob.
REPLICA_COUNT: The number of worker replicas to use. In most cases, set this to 1 for your first worker pool.
If your training application runs in a custom container, specify the following:
- IMAGE_URI: The URI of a Docker container image with your training code. Learn how to create a custom container image.
NETWORK_ATTACHMENT: The name or full path of the network attachment you created when you set up the Private Service Connect interface.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/customJobs

Request JSON body:

{
  "display_name": "JOB_NAME",
  "job_spec": {
    "worker_pool_specs": [
      {
        "machine_spec": {
          "machine_type": "n1-standard-4"
        },
        "replica_count": REPLICA_COUNT,
        "container_spec": {
          "image_uri": "IMAGE_URI"
        }
      }
    ],
    "psc_interface_config": {
      "network_attachment": "NETWORK_ATTACHMENT"
    },
    "enable_web_access": true
  }
}
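Generating the request body from code avoids hand-editing mistakes such as trailing commas or unquoted strings. The following is a rough sketch that mirrors the fields of the request body above. The helper name build_request_body and the example values are hypothetical, and the machine type defaults to the n1-standard-4 value used in this example.

```python
import json


def build_request_body(
    job_name: str,
    replica_count: int,
    image_uri: str,
    network_attachment: str,
    machine_type: str = "n1-standard-4",
) -> dict:
    """Hypothetical helper: build the customJobs.create request body."""
    return {
        "display_name": job_name,
        "job_spec": {
            "worker_pool_specs": [
                {
                    "machine_spec": {"machine_type": machine_type},
                    "replica_count": replica_count,
                    "container_spec": {"image_uri": image_uri},
                }
            ],
            # The network attachment enables Private Service Connect interface.
            "psc_interface_config": {"network_attachment": network_attachment},
            "enable_web_access": True,
        },
    }


if __name__ == "__main__":
    # Example values are placeholders, not real resources.
    body = build_request_body(
        job_name="JOB_NAME",
        replica_count=1,
        image_uri="IMAGE_URI",
        network_attachment="NETWORK_ATTACHMENT",
    )
    # json.dumps emits valid JSON (for example, True becomes true).
    print(json.dumps(body, indent=2))
```

The printed output can be saved as request.json for use with the commands that follow.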

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/customJobs"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/customJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/locations/LOCATION/customJobs/JOB_ID",
  "displayName": "JOB_NAME",
  "jobSpec": {
    "workerPoolSpecs": [
      {
        "machineSpec": {
          "machineType": "MACHINE_TYPE"
        },
        "replicaCount": "REPLICA_COUNT",
        "diskSpec": DISK_SPEC,
        "containerSpec": {
          "imageUri": "IMAGE_URI"
        }
      }
    ],
    "enableWebAccess": true,
    "pscInterfaceConfig": {
      "networkAttachment": "NETWORK_ATTACHMENT"
    }
  },
  "state": "JOB_STATE_PENDING",
  "createTime": "CREATE_TIME",
  "updateTime": "UPDATE_TIME"
}
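To confirm that the job was created with the intended network attachment, the response can be inspected programmatically. The following is a minimal sketch, assuming the response body is valid JSON with the field names shown above. The helper name summarize_custom_job is hypothetical.

```python
import json


def summarize_custom_job(response_text: str) -> dict:
    """Hypothetical helper: pull key fields out of a CustomJob response."""
    job = json.loads(response_text)
    # The job ID is the last segment of the resource name.
    job_id = job["name"].rsplit("/", 1)[-1]
    psc_config = job.get("jobSpec", {}).get("pscInterfaceConfig", {})
    return {
        "job_id": job_id,
        "state": job.get("state"),
        "network_attachment": psc_config.get("networkAttachment"),
    }
```

A missing network_attachment value in the summary suggests that the job was created without a Private Service Connect interface configuration.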