Run a pipeline

Vertex AI Pipelines lets you run, in a serverless manner, machine learning (ML) pipelines that were built using the Kubeflow Pipelines SDK or TensorFlow Extended. This document describes how to run an ML pipeline.

You can also create pipeline runs using prebuilt templates in the Template Gallery. For more information about the Template Gallery, see Use a prebuilt template from the Template Gallery.

Before you begin

Before you run a pipeline with Vertex AI Pipelines, use the following instructions to set up your Google Cloud project and development environment:

Build a pipeline.
To run a pipeline using the Vertex AI SDK for Python, install the Vertex AI SDK.

Create a pipeline run

Use the following instructions to run an ML pipeline using Google Cloud console or Python.

Console

Use the following instructions to run an ML pipeline using Google Cloud console.

In the Google Cloud console, in the Vertex AI section, go to the Pipelines page.

Go to Pipelines
In the Region drop-down list, select the region in which to create the pipeline run.
Click Create run to open the Create pipeline run pane.
In the Run details section, do the following:
1. Click a Run source. The following options are available:
  - Select from existing pipelines: To create a pipeline run based on an existing pipeline template, click Select from existing pipelines and enter the following details:
    1. Select the Repository containing the pipeline or component definition file.
    2. Select the Pipeline or component and Version.
    3. Specify a Run name to uniquely identify the pipeline run.
  - Select a Template Gallery pipeline: To create a pipeline run based on a Google-authored pipeline template from the Template Gallery, click Select a Template Gallery pipeline and enter the following details:
    1. In the Template Gallery pipeline list, select the pipeline template.
    2. Optional: Modify the default Run name that uniquely identifies the pipeline run.
    Note: These instructions describe how to create a pipeline run using the default interface of the Create pipeline run page, which includes the Run details and the Runtime configuration sections. For some templates from the Template gallery, this page has additional sections. For example, the AutoML for Tabular Classification / Regression template also includes the Training Method, Training options, and Compute and pricing sections.
  - Upload file: To upload a compiled pipeline definition, click Upload file and enter the following details:
    1. Click Browse to open the file selector. Navigate to the compiled pipeline YAML file that you want to run, select the file, and click Open.
    2. The Pipeline or component name shows the name specified in the pipeline definition, by default. Optionally, specify a different Pipeline name.
    3. Specify a Run name to uniquely identify the pipeline run.
  - Import from Cloud Storage: To import a pipeline definition file from Cloud Storage, click Import from Cloud Storage and enter the following details:
    1. Click Browse to navigate to the Cloud Storage bucket containing the pipeline definition object, select the file, and then click Select.
    2. Specify the Pipeline or component name.
    3. Specify a Run name to uniquely identify the pipeline run.
2. Optional: To schedule recurring pipeline runs, specify the Run schedule, as follows:
  1. Select Recurring.
  2. Under Start time, specify when the schedule becomes active.
    - To schedule the first run to occur immediately after schedule creation, select Immediately.
    - To schedule the first run to occur at a specific time and date, select On.
  3. In the Frequency field, specify the frequency to schedule and execute the pipeline runs, using a cron schedule expression based on unix-cron.
  4. Under Ends, specify when the schedule ends.
    - To indicate that the schedule creates pipeline runs indefinitely, select Never.
    - To indicate that the schedule ends on a specific date and time, select On, and specify the end date and time for the schedule.
  5. Optional: To specify that the pipeline run uses a custom service account, a customer-managed encryption key (CMEK), or a peered VPC network, click Advanced options, and then follow these instructions:
    - To specify a service account, select a service account from the Service account drop-down list.
      
      If you don't specify a service account, Vertex AI Pipelines runs your pipeline using the default Compute Engine service account.
      
      Learn more about configuring a service account for use with Vertex AI Pipelines.
    - To use a CMEK, select Use a customer-managed encryption key. The Select a customer-managed key drop-down list appears. In the Select a customer-managed key drop-down list, select the key that you want to use.
    - To use a peered VPC network in this pipeline run, enter the VPC network name in the Peered VPC network box.
3. Click Continue.
In the Runtime configuration section, configure the pipeline run, as follows:
1. Under Cloud storage location, click Browse to select the Cloud Storage bucket for storing the pipeline output artifacts, and then click Select.
2. Optional: To configure the failure policy and the cache for the pipeline run, click Advanced options, and then use the following instructions:
  - Under Failure policy, specify the failure policy for the entire pipeline. Learn more about pipeline failure policies.
    - To configure the pipeline to continue scheduling tasks after one task fails, select Run all steps to completion. This option is selected, by default.
    - To configure the pipeline to fail after one task fails, select Fail this run as soon as one step fails.
  - Under Caching configuration, specify the cache configuration for the entire pipeline.
    - To use the task-level cache configuration for each task in the pipeline, select Do not override task-level cache configuration.
    - To turn on caching for all the tasks in the pipeline and override any task-level cache configuration, select Enable read from cache for all steps (fastest).
    - To turn off caching for all the tasks in the pipeline and override any task-level cache configuration, select Disable read from cache for all steps.
3. Optional: If your pipeline has parameters, under Pipeline parameters, specify your pipeline run parameters.
To create your pipeline run, click Submit.
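
The Frequency field in the scheduling steps above expects a unix-cron expression with five space-separated fields. The following is a minimal sketch, not part of the console workflow, that splits such an expression into named fields so that you can check it before you paste it into the console. The helper name `describe_cron` and the two example schedules are illustrative assumptions.

```python
# Field order used by unix-cron expressions.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict[str, str]:
    """Split a five-field unix-cron expression into named fields.

    This only checks the number of fields; it does not validate the
    value ranges inside each field.
    """
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# Hypothetical schedules for illustration.
weekly = describe_cron("0 9 * * 1")  # 09:00 every Monday
hourly = describe_cron("0 * * * *")  # minute 0 of every hour
```

In unix-cron, the day-of-week field counts from Sunday as 0, so `1` in the first example means Monday.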

Vertex AI SDK for Python

Use the following instructions to run an ML pipeline using the Vertex AI SDK for Python. Before you run the following code sample, you must set up authentication.

Set up authentication

To set up authentication, you must create a service account key, and set an environment variable for the path to the service account key.

Create a service account:
1. In the Google Cloud console, go to the Create service account page.
  
  Go to Create service account
2. In the Service account name field, enter a name.
3. Optional: In the Service account description field, enter a description.
4. Click Create.
5. Click the Select a role field. Under All roles, select Vertex AI > Vertex AI User.
6. Click Done to create the service account.
  
  Do not close your browser window. You will use it in the next step.
Create a service account key for authentication:
1. In the Google Cloud console, click the email address for the service account that you created.
2. Click Keys.
3. Click Add key, then Create new key.
4. Click Create. A JSON key file is downloaded to your computer.
5. Click Close.
Grant your new service account access to the service account that you use to run pipelines.
1. Click the back arrow to return to the list of service accounts.
2. Click the name of the service account that you use to run pipelines. The Service account details page appears.
  
  If you followed the instructions in the guide to configuring your project for Vertex AI Pipelines, this is the same service account that you created in the Configure a service account with granular permissions section. Otherwise, Vertex AI uses the Compute Engine default service account to run pipelines. The Compute Engine default service account is named like the following: PROJECT_NUMBER-compute@developer.gserviceaccount.com
3. Click the Permissions tab.
4. Click Grant access. The Add principals panel appears.
5. In the New principals box, enter the email address for the service account you created in a previous step.
6. In the Role drop-down list, select Service accounts > Service account user.
7. Click Save.
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.

Example: Linux or macOS

Replace [PATH] with the path of the JSON file that contains your service account key.
```
export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
```
For example:
```
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"
```
Example: Windows

Replace [PATH] with the path of the JSON file that contains your service account key, and [FILE_NAME] with the filename.

With PowerShell:
```
$env:GOOGLE_APPLICATION_CREDENTIALS="[PATH]"
```
For example:
```
$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\[FILE_NAME].json"
```
With command prompt:
```
set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
```
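
Because the variable applies only to the current shell session, it can help to confirm that it is set before you run any SDK code. The following is a minimal sketch that uses only the Python standard library. The helper name `service_account_email_from_env` is an assumption for illustration, and the sketch assumes the downloaded key file is the standard JSON key that includes a `client_email` field.

```python
import json
import os
from typing import Mapping


def service_account_email_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Return the service account email named in the configured key file.

    Raises an error if GOOGLE_APPLICATION_CREDENTIALS is unset or does not
    point to an existing file.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set in this session"
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no key file found at {path!r}")
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)
    # Assumption: the key file is a service account JSON key, which
    # stores the account's email address under "client_email".
    return key.get("client_email", "")


# Example usage, run from the same shell session in which you set the variable:
#   print(service_account_email_from_env())
```

If the function raises an error in a new terminal, set the variable again as described above.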

Run a pipeline

Running a Vertex AI PipelineJob requires you to create a PipelineJob object, and then invoke the submit method.

Special input types supported by KFP

While creating a pipeline run, you can also pass the following placeholders supported by the KFP SDK as inputs:

{{$.pipeline_job_name_placeholder}}
{{$.pipeline_job_resource_name_placeholder}}
{{$.pipeline_job_id_placeholder}}
{{$.pipeline_task_name_placeholder}}
{{$.pipeline_task_id_placeholder}}
{{$.pipeline_job_create_time_utc_placeholder}}
{{$.pipeline_root_placeholder}}

For more information, see Special input types in the Kubeflow Pipelines v2 documentation.
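
As a minimal sketch, the placeholders listed above can be written as ordinary string values in the dictionary that you pass as the pipeline parameters. The parameter names `run_label`, `artifact_root`, and `learning_rate` are hypothetical; use the parameters that your own pipeline definition declares. Whether a given placeholder is resolved for a particular parameter can depend on your KFP SDK version, so treat this as an assumption to verify against the KFP documentation.

```python
# Write placeholders as plain string literals, not f-strings, so that
# Python leaves the double braces untouched.
PIPELINE_PARAMETERS = {
    # Hypothetical string parameter that receives the pipeline job name.
    "run_label": "{{$.pipeline_job_name_placeholder}}",
    # Hypothetical string parameter that receives the pipeline root.
    "artifact_root": "{{$.pipeline_root_placeholder}}",
    # An ordinary, non-placeholder value for comparison.
    "learning_rate": 0.01,
}

# List which parameters carry a placeholder rather than a literal value.
placeholder_params = sorted(
    name
    for name, value in PIPELINE_PARAMETERS.items()
    if isinstance(value, str) and value.startswith("{{$.") and value.endswith("}}")
)
```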

```
from google.cloud import aiplatform

job = aiplatform.PipelineJob(display_name = DISPLAY_NAME,
                             template_path = COMPILED_PIPELINE_PATH,
                             job_id = JOB_ID,
                             pipeline_root = PIPELINE_ROOT_PATH,
                             parameter_values = PIPELINE_PARAMETERS,
                             enable_caching = ENABLE_CACHING,
                             encryption_spec_key_name = CMEK,
                             labels = LABELS,
                             credentials = CREDENTIALS,
                             project = PROJECT_ID,
                             location = LOCATION,
                             failure_policy = FAILURE_POLICY)

job.submit(service_account=SERVICE_ACCOUNT,
           network=NETWORK)
```

Replace the following:

DISPLAY_NAME: The name of the pipeline. This name appears in the Google Cloud console.
COMPILED_PIPELINE_PATH: The path to your compiled pipeline YAML file. It can be a local path or a Cloud Storage URI.

Optional: To specify a particular version of a compiled pipeline, include the version tag in any one of the following formats:
- COMPILED_PIPELINE_PATH:TAG, where TAG is the version tag.
- COMPILED_PIPELINE_PATH@SHA256_TAG, where SHA256_TAG is the sha256 hash value of the pipeline version.
JOB_ID: (optional) A unique identifier for this pipeline run. If the job ID is not specified, Vertex AI Pipelines creates a job ID for you using the pipeline name and the timestamp of when the pipeline run was started.
PIPELINE_ROOT_PATH: (optional) To override the pipeline root path specified in the pipeline definition, specify a path that your pipeline job can access, such as a Cloud Storage bucket URI.
PIPELINE_PARAMETERS: (optional) The pipeline parameters to pass to this run. For example, create a dict() with the parameter names as the dictionary keys and the parameter values as the dictionary values.
ENABLE_CACHING: (optional) Specifies if this pipeline run uses execution caching. Execution caching reduces costs by skipping pipeline tasks where the output is known for the current set of inputs. If the enable caching argument is not specified, execution caching is used in this pipeline run. Learn more about execution caching.
CMEK: (optional) The name of the customer-managed encryption key that you want to use for this pipeline run.
LABELS: (optional) The user defined labels to organize this PipelineJob. For more information about resource labels, see Creating and managing labels in the Resource Manager documentation.

Vertex AI Pipelines automatically attaches the following label to your pipeline run:

vertex-ai-pipelines-run-billing-id: pipeline_run_id

where pipeline_run_id is the unique ID of the pipeline run.

This label lets you trace the usage of Google Cloud resources generated by the pipeline run in billing reports.
CREDENTIALS: (optional) Custom credentials to use to create this PipelineJob. Overrides credentials set in aiplatform.init.
PROJECT_ID: (optional) The Google Cloud project that you want to run the pipeline in. If you don't set this parameter, the project set in aiplatform.init is used.
LOCATION: (optional) The region that you want to run the pipeline in. For more information about the regions that Vertex AI Pipelines is available in, see the Vertex AI locations guide. If you don't set this parameter, the default location set in aiplatform.init is used.
FAILURE_POLICY: (optional) Specify the failure policy for the entire pipeline. The following configurations are available:
- To configure the pipeline to fail after one task fails, enter fast.
- To configure the pipeline to continue scheduling tasks after one task fails, enter slow.
If you don't set this parameter, the failure policy configuration is set to slow, by default. Learn more about pipeline failure policies.
SERVICE_ACCOUNT: (optional) The name of the service account to use for this pipeline run. If you don't specify a service account, Vertex AI Pipelines runs your pipeline using the default Compute Engine service account.
NETWORK: (optional) The name of the peered VPC network to use for this pipeline run.
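
The following is a minimal sketch of how the values described above might be assembled before you create the PipelineJob. It is one possible arrangement, not the required one. The helper names (`versioned_template_path`, `build_pipeline_job_kwargs`, `submit_pipeline`), the bucket `example-bucket`, and all example values are hypothetical. Only `submit_pipeline` needs the Vertex AI SDK and valid credentials; the other helpers use the standard library.

```python
from typing import Any, Optional


def versioned_template_path(path: str, tag: Optional[str] = None,
                            sha256: Optional[str] = None) -> str:
    """Return PATH, PATH:TAG, or PATH@SHA256_TAG, as described above."""
    if tag and sha256:
        raise ValueError("specify either tag or sha256, not both")
    if tag:
        return f"{path}:{tag}"
    if sha256:
        return f"{path}@{sha256}"
    return path


def build_pipeline_job_kwargs(display_name: str, template_path: str,
                              **optional: Any) -> dict[str, Any]:
    """Collect PipelineJob arguments, dropping optional ones left as None."""
    policy = optional.get("failure_policy")
    if policy not in (None, "fast", "slow"):
        raise ValueError(f"failure_policy must be 'fast' or 'slow', got {policy!r}")
    kwargs: dict[str, Any] = {
        "display_name": display_name,
        "template_path": template_path,
    }
    # Keep False and empty values; drop only arguments that were not set.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


def submit_pipeline(kwargs: dict[str, Any],
                    service_account: Optional[str] = None,
                    network: Optional[str] = None):
    """Create and submit the run. Requires the Vertex AI SDK and credentials."""
    from google.cloud import aiplatform  # imported lazily on purpose

    job = aiplatform.PipelineJob(**kwargs)
    job.submit(service_account=service_account, network=network)
    return job


# Hypothetical values for illustration only.
example_kwargs = build_pipeline_job_kwargs(
    display_name="demo-training-run",
    template_path="gs://example-bucket/pipelines/train_pipeline.yaml",
    pipeline_root="gs://example-bucket/pipeline-root",
    parameter_values={"learning_rate": 0.01},
    enable_caching=False,
    labels=None,  # left unset, so it is omitted from the result
    failure_policy="fast",
)
```

Dropping unset arguments lets the SDK apply the defaults described above, such as the project and location set in aiplatform.init.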