Run a pipeline


Vertex AI Pipelines lets you run machine learning (ML) pipelines that were built using the Kubeflow Pipelines SDK or TensorFlow Extended in a serverless manner. This document describes how to run an ML pipeline and how to schedule a recurring pipeline run.

If you have not yet built an ML pipeline, refer to Build a pipeline.

Before you begin

Before you run a pipeline with Vertex AI Pipelines, use the following instructions to set up your Google Cloud project and development environment.

  1. To get your Cloud project ready to run ML pipelines, follow the instructions in the guide to configuring your Cloud project.

  2. To author a pipeline using Python, you must use one of the following SDKs: the Kubeflow Pipelines SDK or TensorFlow Extended.

  3. To run a pipeline using the Vertex AI SDK for Python, install the Vertex SDK.
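
As a quick sanity check for step 3, the following sketch reports whether the Vertex AI SDK for Python can be found in the current environment. It assumes the SDK is installed from the google-cloud-aiplatform package and imported as google.cloud.aiplatform; is_installed is a hypothetical helper name used only for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if an import spec can be found for module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (for example, "google") is missing.
        return False


if __name__ == "__main__":
    if is_installed("google.cloud.aiplatform"):
        print("Vertex AI SDK for Python is available.")
    else:
        print("Not found. Try: pip install google-cloud-aiplatform")
```

If the check reports that the module is not found, install the package in the same Python environment that you plan to run your pipeline code from.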

Create a pipeline run

Use the following instructions to run an ML pipeline using Google Cloud console or Python.

Console

Use the following instructions to run an ML pipeline using Google Cloud console.

  1. In the Google Cloud console, in the Vertex AI section, go to the Pipelines page.

    Go to Pipelines

  2. In the Region drop-down list, select the region that you want to create a pipeline run in.

  3. Click Create run to open the Create pipeline run pane.

  4. Specify the following Run details.

    • In the File field, click Choose to open the file selector. Navigate to the compiled pipeline JSON file that you want to run, select the pipeline, and click Open.

    • The Pipeline name defaults to the name that you specified in the pipeline definition. Optionally, specify a different Pipeline name.

    • Specify a Run name to uniquely identify this pipeline run.

  5. To specify that this pipeline run uses a custom service account, a customer-managed encryption key, or a peered VPC network, click Advanced options.

    Use the following instructions to configure advanced options such as a custom service account.

    • To specify a service account, select a service account from the Service account drop-down list.

      If you do not specify a service account, Vertex AI Pipelines runs your pipeline using the default Compute Engine service account.

      Learn more about configuring a service account for use with Vertex AI Pipelines.

    • To use a customer-managed encryption key (CMEK), select Use a customer-managed encryption key. The Select a customer-managed key drop-down list appears. In the Select a customer-managed key drop-down list, select the key that you want to use.

    • To use a peered VPC network in this pipeline run, enter the VPC network name in the Peered VPC network box.

  6. Click Continue.

    The pipeline run parameters pane appears.

  7. If your pipeline has parameters, specify your pipeline run parameters.

  8. Click Submit to create your pipeline run.

Vertex AI SDK for Python

Use the following instructions to run an ML pipeline using the Vertex AI SDK for Python. Before you run the following code sample, you must set up authentication.

How to set up authentication

To set up authentication, you need to create a service account key and set an environment variable for the file path to the service account key.

  1. Create a service account:

    1. In the Google Cloud console, go to the Create service account page.

      Go to Create service account

    2. In the Service account name field, enter a name.
    3. Optional: In the Service account description field, enter a description.
    4. Click Create.
    5. Click the Select a role field. Under All roles, select Vertex AI > Vertex AI User.
    6. Click Done to create the service account.

      Do not close your browser window. You will use it in the next step.

  2. Create a service account key for authentication:

    1. In the Google Cloud console, click the email address for the service account that you created.
    2. Click Keys.
    3. Click Add key, then Create new key.
    4. Click Create. A JSON key file is downloaded to your computer.
    5. Click Close.
  3. Grant your new service account access to the service account that you use to run pipelines.
    1. Return to the list of service accounts.
    2. Click the name of the service account that you use to run pipelines. The Service account details page appears.

      If you followed the instructions in the guide to configuring your project for Vertex AI Pipelines, this is the same service account that you created in the Configure a service account with granular permissions section. Otherwise, Vertex AI uses the Compute Engine default service account to run pipelines. The Compute Engine default service account is named like the following: PROJECT_NUMBER-compute@developer.gserviceaccount.com

    3. Click the Permissions tab.
    4. Click Grant access. The Add principals panel appears.
    5. In the New principals box, enter the email address for the service account you created in a previous step.
    6. In the Role drop-down list, select Service accounts > Service account user.
    7. Click Save.
  4. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the file path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.

    Example: Linux or macOS

    Replace [PATH] with the file path of the JSON file that contains your service account key.

    export GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

    For example:

    export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

    Example: Windows

    Replace [PATH] with the file path of the JSON file that contains your service account key, and [FILE_NAME] with the filename.

    With PowerShell:

    $env:GOOGLE_APPLICATION_CREDENTIALS="[PATH]"

    For example:

    $env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\[FILE_NAME].json"

    With command prompt:

    set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
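
Before calling the SDK, it can help to confirm that the variable is visible to Python and points at a readable JSON file. The following is a minimal sketch that uses only the standard library; check_credentials_env is a hypothetical helper, not part of the Vertex AI SDK, and it only verifies that the file exists and parses as JSON, not that the key is valid.

```python
import json
import os


def check_credentials_env(var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return the key file path from the environment, or raise a clear error."""
    path = os.environ.get(var)
    if not path:
        raise RuntimeError(f"{var} is not set in this shell session.")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as f:
        json.load(f)  # Raises json.JSONDecodeError if the file is not valid JSON.
    return path


if __name__ == "__main__":
    print("Using service account key at:", check_credentials_env())
```

Because the variable applies only to the current shell session, run this check from the same session that you use to submit the pipeline.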

To run a pipeline with the Vertex AI SDK for Python, create a PipelineJob object and then call its submit method.

from google.cloud import aiplatform

# Describe the pipeline run. Only display_name and template_path are required.
job = aiplatform.PipelineJob(
    display_name=DISPLAY_NAME,
    template_path=COMPILED_PIPELINE_PATH,
    job_id=JOB_ID,
    pipeline_root=PIPELINE_ROOT_PATH,
    parameter_values=PIPELINE_PARAMETERS,
    enable_caching=ENABLE_CACHING,
    encryption_spec_key_name=CMEK,
    labels=LABELS,
    credentials=CREDENTIALS,
    project=PROJECT_ID,
    location=LOCATION,
    failure_policy=FAILURE_POLICY,
)

# Submit the run without waiting for it to finish.
job.submit(service_account=SERVICE_ACCOUNT,
           network=NETWORK)

Replace the following:

  • DISPLAY_NAME: The name of the pipeline. This name appears in the Google Cloud console.
  • COMPILED_PIPELINE_PATH: The path to your compiled pipeline JSON file. It can be a local path or a Google Cloud Storage URI.
  • JOB_ID: (optional) A unique identifier for this pipeline run. If the job ID is not specified, Vertex AI Pipelines creates a job ID for you using the pipeline name and the timestamp of when the pipeline run was started.
  • PIPELINE_ROOT_PATH: (optional) To override the pipeline root path specified in the pipeline definition, specify a path that your pipeline job can access, such as a Cloud Storage bucket URI.
  • PIPELINE_PARAMETERS: (optional) The pipeline parameters to pass to this run. For example, create a dict() with the parameter names as the dictionary keys and the parameter values as the dictionary values.
  • ENABLE_CACHING: (optional) Specifies whether this pipeline run uses execution caching. Execution caching reduces costs by skipping pipeline tasks where the output is known for the current set of inputs. If the enable_caching argument is not specified, execution caching is used in this pipeline run. Learn more about execution caching.
  • CMEK: (optional) The name of the customer-managed encryption key that you want to use for this pipeline run.
  • LABELS: (optional) The user-defined labels to organize this PipelineJob.
  • CREDENTIALS: (optional) Custom credentials to use to create this PipelineJob. Overrides credentials set in aiplatform.init.
  • PROJECT_ID: (optional) The Google Cloud project that you want to run the pipeline in. If you don't set this parameter, the project set in aiplatform.init is used.
  • LOCATION: (optional) The region that you want to run the pipeline in. For more information about the regions that Vertex AI Pipelines is available in, see the Vertex AI locations guide. If you don't set this parameter, the default location set in aiplatform.init is used.
  • FAILURE_POLICY: (optional) Specify the failure policy for the entire pipeline. The following configurations are available:

    • To configure the pipeline to fail after one task fails, enter fast.

    • To configure the pipeline to continue scheduling tasks after one task fails, enter slow.

    If you don't set this parameter, the failure policy defaults to slow. Learn more about pipeline failure policies.

  • SERVICE_ACCOUNT: (optional) The name of the service account to use for this pipeline run. If you do not specify a service account, Vertex AI Pipelines runs your pipeline using the default Compute Engine service account.

  • NETWORK: (optional) The name of the VPC peered network to use for this pipeline run.
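
To make the placeholders above more concrete, the following sketch assembles the arguments for a run and leaves out any optional value that was not supplied, so the documented defaults apply. The helper names build_job_kwargs and submit_pipeline, the bucket path, and the parameter names are all hypothetical and used only for illustration. The SDK import is deferred so that the argument helper can be exercised without the SDK installed.

```python
VALID_FAILURE_POLICIES = ("fast", "slow")


def build_job_kwargs(display_name, template_path, parameter_values=None,
                     enable_caching=None, failure_policy=None):
    """Assemble keyword arguments for aiplatform.PipelineJob."""
    if failure_policy is not None and failure_policy not in VALID_FAILURE_POLICIES:
        raise ValueError(
            f"failure_policy must be one of {VALID_FAILURE_POLICIES}, "
            f"got {failure_policy!r}"
        )
    kwargs = {"display_name": display_name, "template_path": template_path}
    optional = {
        "parameter_values": parameter_values,
        "enable_caching": enable_caching,
        "failure_policy": failure_policy,
    }
    # Keep only the optional values that the caller actually set.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


def submit_pipeline(kwargs, service_account=None, network=None):
    """Create a PipelineJob from kwargs and submit it."""
    # Imported here so build_job_kwargs works without the SDK installed.
    from google.cloud import aiplatform

    job = aiplatform.PipelineJob(**kwargs)
    job.submit(service_account=service_account, network=network)
    return job


if __name__ == "__main__":
    example_kwargs = build_job_kwargs(
        display_name="example-training-pipeline",  # hypothetical name
        template_path="gs://example-bucket/pipelines/pipeline.json",  # hypothetical URI
        parameter_values={"learning_rate": 0.01, "epochs": 5},  # hypothetical parameters
        failure_policy="fast",
    )
    print(example_kwargs)
    # submit_pipeline(example_kwargs)  # Uncomment after authentication is set up.
```

Validating the failure policy locally is a design choice of this sketch: it surfaces a mistyped value before any request is sent, instead of relying on the service to reject it.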