Vertex AI Pipelines lets you run machine learning (ML) pipelines that were built using the Kubeflow Pipelines SDK or TensorFlow Extended in a serverless manner. This document describes how to run an ML pipeline and how to schedule a recurring pipeline run.
If you have not yet built an ML pipeline, refer to Build a pipeline.
Before you begin
Before you run a pipeline with Vertex AI Pipelines, use the following instructions to set up your Google Cloud project and development environment.
To get your Cloud project ready to run ML pipelines, follow the instructions in the guide to configuring your Cloud project.
To author a pipeline using Python, you must use either the Kubeflow Pipelines SDK or TensorFlow Extended.
To run a pipeline using the Vertex AI SDK for Python, install the Vertex AI SDK.
Create a pipeline run
Use the following instructions to run an ML pipeline using the Google Cloud console or Python.
Use the following instructions to run an ML pipeline using the Cloud console.
In the Cloud console, in the Vertex AI section, go to the Pipelines page.
In the Region drop-down list, select the region that you want to create a pipeline run in.
Click Create run to open the Create pipeline run pane.
Specify the following Run details.
In the File field, click Choose to open the file selector. Navigate to the compiled pipeline JSON file that you want to run, select the pipeline, and click Open.
The Pipeline name defaults to the name that you specified in the pipeline definition. Optionally, specify a different Pipeline name.
Specify a Run name to uniquely identify this pipeline run.
To specify that this pipeline run uses a custom service account, a customer-managed encryption key, or a peered VPC network, click Advanced options.
Use the following instructions to configure advanced options such as a custom service account.
To specify a service account, select a service account from the Service account drop-down list.
If you do not specify a service account, Vertex AI Pipelines runs your pipeline using the default Compute Engine service account.
Learn more about configuring a service account for use with Vertex AI Pipelines.
To use a customer-managed encryption key (CMEK), select Use a customer-managed encryption key. In the Select a customer-managed key drop-down list that appears, select the key that you want to use.
To use a peered VPC network in this pipeline run, enter the VPC network name in the Peered VPC network box.
The pipeline run parameters pane appears.
If your pipeline has parameters, specify your pipeline run parameters.
Click Submit to create your pipeline run.
Vertex AI SDK for Python
Use the following instructions to run an ML pipeline using the Vertex AI SDK for Python. Before you run the following code sample, you must set up authentication.
How to set up authentication
To set up authentication, you need to create a service account key and set an environment variable for the file path to the service account key.
Create a service account:
In the Cloud console, go to the Create service account page.
- In the Service account name field, enter a name.
- Optional: In the Service account description field, enter a description.
- Click Create.
- Click the Select a role field. Under All roles, select Vertex AI > Vertex AI User.
Click Done to create the service account.
Do not close your browser window. You will use it in the next step.
Create a service account key for authentication:
- In the Cloud console, click the email address for the service account that you created.
- Click Keys.
- Click Add key, then Create new key.
- Click Create. A JSON key file is downloaded to your computer.
- Click Close.
- Click the back arrow to return to the list of service accounts.
Click the name of the service account that you use to run pipelines. The Service account details page appears.
If you followed the instructions in the guide to configuring your project for Vertex AI Pipelines, this is the same service account that you created in the Configure a service account with granular permissions section. Otherwise, Vertex AI uses the Compute Engine default service account to run pipelines. The Compute Engine default service account is named like the following:
- Click the Permissions tab.
- Click Grant access. The Add principals panel appears.
- In the New principals box, enter the email address for the service account you created in a previous step.
- In the Role drop-down list, select Service accounts > Service account user.
- Click Save.
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the file path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.
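If the pipeline is launched from a Python script, the same variable can also be set from inside the process before any Google Cloud client is created. The following is a minimal sketch, not part of the official instructions: the helper name use_service_account_key and the key path in the usage comment are hypothetical. Like the shell approach, the setting lasts only for the current process and the child processes it starts.

```python
import os
from pathlib import Path


def use_service_account_key(key_path: str) -> str:
    """Point Application Default Credentials at a service account key file.

    Hypothetical helper: checks that the key file exists, then sets
    GOOGLE_APPLICATION_CREDENTIALS for the current process only.
    Returns the absolute path that was stored in the variable.
    """
    resolved = Path(key_path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"Service account key not found: {resolved}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(resolved)
    return str(resolved)


# Example call (the path is an assumption; use the location of your own key):
# use_service_account_key("~/Downloads/my-service-account-key.json")
```

Failing early on a missing file gives a clearer error than the authentication failure that would otherwise surface later, when the first API call is made.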
Example: Linux or macOS
Replace [PATH] with the file path of the JSON file that contains your service account key.
Replace [PATH] with the file path of the JSON file that contains your service account key, and [FILE_NAME] with the filename.
With command prompt:
Running a Vertex AI PipelineJob requires you to create a PipelineJob object, and then invoke the submit method.
```python
from google.cloud import aiplatform

# Describe the pipeline run. Uppercase names are placeholders; see below.
job = aiplatform.PipelineJob(
    display_name=DISPLAY_NAME,
    template_path=COMPILED_PIPELINE_PATH,
    job_id=JOB_ID,
    pipeline_root=PIPELINE_ROOT_PATH,
    parameter_values=PIPELINE_PARAMETERS,
    enable_caching=ENABLE_CACHING,
    encryption_spec_key_name=CMEK,
    labels=LABELS,
    credentials=CREDENTIALS,
    project=PROJECT_ID,
    location=LOCATION,
)

# Submit the run to Vertex AI Pipelines.
job.submit(service_account=SERVICE_ACCOUNT, network=NETWORK)
```
Replace the following:
- DISPLAY_NAME: The name of the pipeline. This name appears in the Google Cloud console.
- COMPILED_PIPELINE_PATH: The path to your compiled pipeline JSON file. It can be a local path or a Google Cloud Storage URI.
- JOB_ID: (optional) A unique identifier for this pipeline run. If the job ID is not specified, Vertex AI Pipelines creates a job ID for you using the pipeline name and the timestamp of when the pipeline run was started.
- PIPELINE_ROOT_PATH: (optional) To override the pipeline root path specified in the pipeline definition, specify a path that your pipeline job can access, such as a Cloud Storage bucket URI.
- PIPELINE_PARAMETERS: (optional) The pipeline parameters to pass to this run. For example, create a dict() with the parameter names as the dictionary keys and the parameter values as the dictionary values.
- ENABLE_CACHING: (optional) Specifies if this pipeline run uses execution caching. Execution caching reduces costs by skipping pipeline steps where the output is known for the current set of inputs. If the enable caching argument is not specified, execution caching is used in this pipeline run. Learn more about execution caching.
- CMEK: (optional) The name of the customer-managed encryption key that you want to use for this pipeline run.
- LABELS: (optional) The user-defined labels to organize this PipelineJob.
- CREDENTIALS: (optional) Custom credentials to use to create this PipelineJob. Overrides credentials set in aiplatform.init.
- PROJECT_ID: The project that you want to run the pipeline in.
- LOCATION: The region that you want to run the pipeline in. For more information about the regions that Vertex AI Pipelines is available in, see the Vertex AI locations guide. If this variable is not set, the default location set in aiplatform.init is used.
- SERVICE_ACCOUNT: (optional) The name of the service account to use for this pipeline run. If you do not specify a service account, Vertex AI Pipelines runs your pipeline using the default Compute Engine service account.
- NETWORK: (optional) The name of the VPC peered network to use for this pipeline run.
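Because most of these arguments are optional, it can help to assemble them separately and pass only the ones you actually set. The sketch below is one possible way to do that, under stated assumptions: the helper names (make_job_id, pipeline_job_kwargs, run_pipeline), the job ID format, and every concrete value (pipeline name, bucket, parameters, labels) are hypothetical. In particular, make_job_id only imitates the idea of combining a pipeline name with a timestamp; it is not the format that Vertex AI Pipelines generates.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def make_job_id(pipeline_name: str, now: Optional[datetime] = None) -> str:
    """Build a job ID from a pipeline name and a UTC timestamp (hypothetical format)."""
    now = now or datetime.now(timezone.utc)
    return f"{pipeline_name}-{now.strftime('%Y%m%d%H%M%S')}"


def pipeline_job_kwargs(display_name: str, template_path: str,
                        **optional: Any) -> Dict[str, Any]:
    """Collect PipelineJob arguments, dropping optional ones left as None.

    Values such as enable_caching=False are kept, because only None is filtered.
    """
    kwargs: Dict[str, Any] = {"display_name": display_name,
                              "template_path": template_path}
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


def run_pipeline(job_kwargs: Dict[str, Any],
                 service_account: Optional[str] = None,
                 network: Optional[str] = None):
    """Create a PipelineJob from the collected arguments and submit it."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google.cloud import aiplatform

    job = aiplatform.PipelineJob(**job_kwargs)
    job.submit(service_account=service_account, network=network)
    return job


# Hypothetical values for illustration only.
example_kwargs = pipeline_job_kwargs(
    display_name="my-training-pipeline",
    template_path="gs://my-bucket/pipelines/training_pipeline.json",
    job_id=make_job_id("my-training-pipeline"),
    parameter_values={"learning_rate": 0.01, "epochs": 10},
    enable_caching=False,           # explicitly turn off execution caching
    labels={"team": "ml-platform"},
    encryption_spec_key_name=None,  # left unset, so it is dropped
)

# To submit the run (requires the Vertex AI SDK and valid credentials):
# run_pipeline(example_kwargs)
```

Filtering on None rather than on truthiness matters here: enable_caching=False is a meaningful setting and must reach PipelineJob, whereas an argument left as None falls back to the default described in the list above.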