Configure your Google Cloud project for Vertex AI Pipelines

Before you use Vertex AI Pipelines to orchestrate your machine learning (ML) pipelines, you must set up your Google Cloud project. Some resources, such as the metadata store used by Vertex ML Metadata, are created in your Google Cloud project the first time that you run a pipeline.

Use the following instructions to configure your project for Vertex AI Pipelines.

  1. Create your Google Cloud project and configure it for use with Vertex AI Pipelines.

  2. If you do not specify a service account, Vertex AI Pipelines uses the Compute Engine default service account to run your pipelines. For more information about the Compute Engine default service account, see Using the Compute Engine Default Service Account.

    We recommend that you create a service account to run your pipelines and then grant this account granular permissions to the Google Cloud resources that are needed to run your pipeline.

  3. Vertex AI Pipelines uses Cloud Storage to store the artifacts of your pipeline runs. Create a Cloud Storage bucket and grant your service account access to this bucket.

  4. Vertex AI Pipelines uses Vertex ML Metadata to store the metadata created by your pipeline runs. When you run a pipeline for the first time, if the metadata store of your project doesn't exist, Vertex AI creates your project's metadata store.

    If you want your data encrypted using a customer-managed encryption key (CMEK), you can manually create your metadata store using a CMEK key before you run a pipeline. Otherwise, if there's no existing default metadata store in your project, Vertex AI creates your project's metadata store using the CMEK key used when you run the pipeline for the first time. After the metadata store is created, it continues to use the CMEK key it was created with, which can differ from the CMEK key used in a later pipeline run.

Set up your Google Cloud project

Use the following instructions to create a Google Cloud project and configure it for use with Vertex AI Pipelines.

  1. Sign in to your Google Cloud account.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Vertex AI and Cloud Storage APIs.

  5. Install the Google Cloud CLI.
  6. To initialize the gcloud CLI, run the following command:

    gcloud init
  7. Update and install gcloud components:
    gcloud components update
    gcloud components install beta

Configure a service account with granular permissions

When you run a pipeline, you can specify a service account. Your pipeline run acts with the permissions of this service account.

If you do not specify a service account, your pipeline run uses the Compute Engine default service account. For more information about the Compute Engine default service account, see Using the Compute Engine Default Service Account.
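The fallback rule above can be sketched as a small function. This is a minimal illustration, not part of any Google Cloud library: the name `pipeline_run_identity` is hypothetical, and the default account format is the `PROJECT_NUMBER-compute@developer.gserviceaccount.com` pattern described in this guide.

```python
from typing import Optional


def pipeline_run_identity(project_number: str,
                          service_account: Optional[str] = None) -> str:
    """Return the email of the account a pipeline run acts as.

    Hypothetical helper: if a service account is specified, the run uses it;
    otherwise it falls back to the Compute Engine default service account.
    """
    if service_account:
        return service_account
    return f"{project_number}-compute@developer.gserviceaccount.com"
```

Calling `pipeline_run_identity("123456789012")` returns the default account for that project number, while passing an explicit email returns that email unchanged.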

  • Use the following instructions to create a service account and grant it granular permissions to Google Cloud resources.

    1. Run the following command to create a service account.

      gcloud iam service-accounts create SERVICE_ACCOUNT_ID \
          --description="DESCRIPTION" \
          --display-name="DISPLAY_NAME" \
          --project=PROJECT_ID
      

      Replace the following values:

      • SERVICE_ACCOUNT_ID: The ID for the service account.
      • DESCRIPTION: (Optional.) A description of the service account.
      • DISPLAY_NAME: The display name for this service account.
      • PROJECT_ID: The project to create your service account in.

      Learn more about creating a service account.

    2. Grant your service account access to Vertex AI. Note that it might take some time for the access change to propagate. For more information, see Access change propagation.

      gcloud projects add-iam-policy-binding PROJECT_ID \
          --member="serviceAccount:SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com" \
          --role="roles/aiplatform.user"
      

      Replace the following values:

      • PROJECT_ID: The project that your service account was created in.
      • SERVICE_ACCOUNT_ID: The ID for the service account.
    3. You can use Artifact Registry to host container images and Kubeflow Pipelines templates.

      For more information about Artifact Registry, see the Artifact Registry documentation.

    4. Grant your service account access to any Google Cloud resources that you use in your pipelines.

      gcloud projects add-iam-policy-binding PROJECT_ID \
          --member="serviceAccount:SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com" \
          --role="ROLE_NAME"
      

      Replace the following values:

      • PROJECT_ID: The project that your service account was created in.
      • SERVICE_ACCOUNT_ID: The ID for the service account.
      • ROLE_NAME: The Identity and Access Management role to grant to this service account.
    5. To use Vertex AI Pipelines to run pipelines with this service account, run the following command to grant your user account the roles/iam.serviceAccountUser role for your service account.

      gcloud iam service-accounts add-iam-policy-binding \
          SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com \
          --member="user:USER_EMAIL" \
          --role="roles/iam.serviceAccountUser"
      

      Replace the following values:

      • SERVICE_ACCOUNT_ID: The ID for the service account.
      • PROJECT_ID: The project that your service account was created in.
      • USER_EMAIL: The email address of the user that runs pipelines as this service account.
  • If you prefer to use the Compute Engine default service account to run your pipelines, enable the Compute Engine API and grant your default service account access to Vertex AI. Note that it might take some time for the access change to propagate. For more information, see Access change propagation.

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
        --role="roles/aiplatform.user"
    

    Replace the following values:

    • PROJECT_ID: The project that your default service account was created in.
    • PROJECT_NUMBER: The project number that your default service account was created in.
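If you script this setup, it can help to generate the commands for a dedicated service account from one set of values so that the service account email is spelled the same way everywhere. The following is a minimal sketch that only builds and prints the commands shown above; the helper names and the sample values (`my-project`, `pipelines-sa`, `user@example.com`) are assumptions for illustration.

```python
import shlex


def service_account_email(service_account_id: str, project_id: str) -> str:
    """Email address derived from a service account ID and its project."""
    return f"{service_account_id}@{project_id}.iam.gserviceaccount.com"


def setup_commands(project_id, service_account_id, display_name, user_email):
    """Return the gcloud invocations from the steps above as argv lists."""
    email = service_account_email(service_account_id, project_id)
    return [
        # Create the service account.
        ["gcloud", "iam", "service-accounts", "create", service_account_id,
         f"--display-name={display_name}", f"--project={project_id}"],
        # Grant the service account access to Vertex AI.
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{email}", "--role=roles/aiplatform.user"],
        # Let the user run pipelines as this service account.
        ["gcloud", "iam", "service-accounts", "add-iam-policy-binding", email,
         f"--member=user:{user_email}", "--role=roles/iam.serviceAccountUser"],
    ]


if __name__ == "__main__":
    # Sample values only; replace them with your own before running anything.
    for argv in setup_commands("my-project", "pipelines-sa",
                               "Pipelines runner", "user@example.com"):
        print(shlex.join(argv))
```

The script prints the commands rather than running them, so you can review each one before pasting it into a shell.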

Configure a Cloud Storage bucket for pipeline artifacts

Vertex AI Pipelines stores the artifacts of your pipeline runs using Cloud Storage. Use the following instructions to create a Cloud Storage bucket and grant your service account (or the Compute Engine default service account) access to read and write objects in that bucket.

  1. Run the following command to create a Cloud Storage bucket in the region that you want to run your pipelines in.

    gsutil mb -p PROJECT_ID -l BUCKET_LOCATION gs://BUCKET_NAME
    

    Replace the following values:

    • PROJECT_ID: Specify the project that your bucket is associated with.
    • BUCKET_LOCATION: Specify the location of your bucket, for example, US-CENTRAL1.
    • BUCKET_NAME: The name you want to give your bucket, subject to naming requirements. For example, my-bucket.

    Learn more about creating Cloud Storage buckets.

  2. Run the following command to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step.

    gsutil iam ch \
    serviceAccount:SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com:roles/storage.objectCreator,objectViewer \
    gs://BUCKET_NAME
    

    Replace the following values:

    • SERVICE_ACCOUNT_ID: The ID for the service account.
    • PROJECT_ID: The project that your service account was created in.
    • BUCKET_NAME: The name of the bucket you are granting your service account access to.

    Alternatively, if you prefer to use the Compute Engine default service account to run your pipelines, run the gcloud iam service-accounts list command to locate the project number for that account.

    gcloud iam service-accounts list
    

    The Compute Engine default service account has a name in the following format: PROJECT_NUMBER-compute@developer.gserviceaccount.com.

    Run the following command to grant the Compute Engine default service account access to read and write pipeline artifacts in the bucket that you created in the previous step.

    gsutil iam ch \
    serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com:roles/storage.objectCreator,objectViewer \
    gs://BUCKET_NAME
    

    Replace the following values:

    • PROJECT_NUMBER: The project number for the Compute Engine default service account.
    • BUCKET_NAME: The name of the bucket you are granting your service account access to.

    Learn more about controlling access to Cloud Storage buckets.
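Before you create the bucket, you may want to check the name locally. The following sketch covers only a subset of the Cloud Storage naming requirements (length, allowed characters, and the reserved "goog" prefix and "google" substring); it assumes a name without the longer dotted form, and the function names are hypothetical. It also builds the binding string passed to gsutil iam ch in the commands above.

```python
import re

# Lowercase letters, digits, dashes, underscores, and dots;
# the name must start and end with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9._-]*[a-z0-9]$")


def check_bucket_name(name: str) -> list:
    """Return a list of problems found; an empty list means none were found.

    Partial check only: the full naming requirements include more rules
    (for example, names that contain dots can be longer than 63 characters).
    """
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _BUCKET_NAME_RE.match(name):
        problems.append("use lowercase letters, digits, '-', '_' and '.', "
                        "starting and ending with a letter or digit")
    if name.startswith("goog") or "google" in name:
        problems.append("must not start with 'goog' or contain 'google'")
    return problems


def artifact_access_binding(member_email: str) -> str:
    """Binding string in the form used by the gsutil iam ch commands above."""
    return f"serviceAccount:{member_email}:roles/storage.objectCreator,objectViewer"
```

For example, `check_bucket_name("my-bucket")` returns an empty list, while a name with uppercase letters returns at least one problem description.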

Create a metadata store that uses a CMEK (optional)

Use the following instructions to create a CMEK and set up a Vertex ML Metadata metadata store that uses this CMEK.

  1. Use Cloud Key Management Service to configure a customer-managed encryption key.

  2. Use the following REST call to create your project's default metadata store using your CMEK.

    Before using any of the request data, make the following replacements:

    • LOCATION_ID: Your region.
    • PROJECT_ID: Your project ID.
    • KEY_RING: The name of the Cloud Key Management Service key ring that your encryption key is on.
    • KEY_NAME: The name of the encryption key that you want to use for this metadata store.

    HTTP method and URL:

    POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/metadataStores?metadata_store_id=default

    Request JSON body:

    {
      "encryption_spec": {
        "kms_key_name": "projects/PROJECT_ID/locations/LOCATION_ID/keyRings/KEY_RING/cryptoKeys/KEY_NAME"
      }
    }
    

    You should receive a JSON response similar to the following:

    {
      "name": "projects/PROJECT_ID/locations/LOCATION_ID/operations/OPERATIONS_ID",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateMetadataStoreOperationMetadata",
        "genericMetadata": {
          "createTime": "2021-05-18T18:47:14.494997Z",
          "updateTime": "2021-05-18T18:47:14.494997Z"
        }
      }
    }
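The REST call above can also be assembled in code. The following is a minimal sketch using only the Python standard library: it builds the request object with the URL and JSON body from this section but does not send it. The function name is hypothetical, and the bearer-token header assumes that you obtain an access token separately, for example with gcloud auth print-access-token.

```python
import json
import urllib.request


def build_create_metadata_store_request(project_id, location_id,
                                        key_ring, key_name, access_token):
    """Build (but do not send) the POST request that creates the default
    metadata store with a CMEK. Hypothetical helper for illustration."""
    url = (
        f"https://{location_id}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location_id}/"
        "metadataStores?metadata_store_id=default"
    )
    body = {
        "encryption_spec": {
            "kms_key_name": (
                f"projects/{project_id}/locations/{location_id}/"
                f"keyRings/{key_ring}/cryptoKeys/{key_name}"
            )
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the token comes from your own credentials.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the response body as JSON; the response describes a long-running operation like the one shown above.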