Configuring a Google Kubernetes Engine cluster for AI Platform Pipelines

AI Platform Pipelines makes it easier to get started with Kubeflow Pipelines with TensorFlow Extended on Google Kubernetes Engine (GKE) by handling the following tasks for you:

  • Creating a GKE cluster
  • Deploying Kubeflow Pipelines to your GKE cluster
  • Creating a Cloud Storage bucket to use to store pipeline artifacts

If you prefer, you can use AI Platform Pipelines to deploy Kubeflow Pipelines on an existing cluster that does not already have Kubeflow Pipelines installed. Use this guide to ensure that your cluster is configured correctly to deploy and run Kubeflow Pipelines.

Ensure that your GKE cluster has enough resources for AI Platform Pipelines

To use Google Cloud Marketplace to deploy Kubeflow Pipelines on a GKE cluster, the following must be true:

  • Your cluster must have at least 3 nodes. Each node must have at least 2 CPUs and 4 GB of memory available.
  • The cluster's access scope must grant full access to all Cloud APIs, or your cluster must use a custom service account.
  • The cluster must not already have Kubeflow Pipelines installed.
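
The resource minimums above come down to three numeric comparisons. The following is a minimal sketch, not part of AI Platform Pipelines: the function name meets_kfp_requirements and its arguments are hypothetical, and you supply the node count, per-node CPU count, and per-node memory (in whole GB) yourself, for example from the cluster's details page. It does not distinguish between total and available resources.

```shell
# Hypothetical helper: succeeds (exit status 0) if a cluster shape satisfies
# the minimums stated above: at least 3 nodes, each with at least 2 CPUs
# and 4 GB of memory.
# Arguments: NODE_COUNT CPUS_PER_NODE MEMORY_GB_PER_NODE (whole numbers)
meets_kfp_requirements() {
  nodes="$1"
  cpus="$2"
  memory_gb="$3"
  [ "$nodes" -ge 3 ] && [ "$cpus" -ge 2 ] && [ "$memory_gb" -ge 4 ]
}

# Example: a 3-node pool with 2 CPUs and 7 GB per node.
# meets_kfp_requirements 3 2 7 && echo "meets the minimums"
```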

Use the following instructions to check if your cluster has sufficient resources to install AI Platform Pipelines.

  1. Open AI Platform Pipelines in the Google Cloud console.

    Go to AI Platform Pipelines

  2. In the AI Platform Pipelines toolbar, click New instance. Kubeflow Pipelines opens in Google Cloud Marketplace.

  3. Click Configure. The Deploy Kubeflow Pipelines form opens.

  4. Click Cluster to expand the list. GKE clusters that do not have enough resources or permissions are listed as Ineligible clusters. Each ineligible cluster includes a description of why Kubeflow Pipelines cannot be installed.

Allocate more resources to your GKE cluster

To install Kubeflow Pipelines from Google Cloud Marketplace to an existing GKE cluster, your cluster must have at least 3 nodes, each with at least 2 CPUs and 4 GB of memory available.

Use the following instructions to replace the node pool in your cluster with one that has enough CPU and memory resources for AI Platform Pipelines.

  1. Open Google Kubernetes Engine clusters in the Google Cloud console.

    Go to GKE clusters

  2. Click your cluster name. The cluster's details appear.

  3. In the GKE toolbar, click Add node pool. The Add a new node pool form opens.

  4. Supply the following information to the Add a new node pool form.

    • Number of nodes: Specify the number of nodes in your node pool. Your cluster must have 3 or more nodes to install Kubeflow Pipelines using Google Cloud Marketplace.
    • Machine type: Specify the Compute Engine machine type to use for instances in the node pool. Select a machine type with at least 2 CPUs and 4 GB of memory, such as n1-standard-2.

    • Access scopes: Click Allow full access to all Cloud APIs in Access scopes.

    Otherwise, configure your node pool as desired. Learn more about adding node pools to a cluster.

  5. Click Create node pool. Creating the node pool takes several minutes to complete.

  6. For each node pool in the Node pools section, except for the node pool you created in the previous step, click delete. The Delete a node pool dialog appears to confirm that you want to delete this node pool.

  7. Click Delete. Deleting the node pool takes several minutes.

  8. Once you have deleted the old node pools, check that your cluster has sufficient resources and access to install Kubeflow Pipelines from Google Cloud Marketplace.
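
The console steps above can also be scripted from Cloud Shell. The sketch below is an illustration under stated assumptions rather than an official procedure: the function names, the DRY_RUN variable, and the example pool, cluster, and zone names are hypothetical, while gcloud container node-pools create and gcloud container node-pools delete are existing gcloud commands. With DRY_RUN=1 the functions only print the commands so that you can review them before running anything.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a node pool with 3 nodes of a machine type that has at least
# 2 CPUs and 4 GB of memory, and full access to all Cloud APIs.
# Arguments: POOL_NAME CLUSTER_NAME ZONE
create_pipelines_node_pool() {
  run gcloud container node-pools create "$1" \
    --cluster="$2" \
    --zone="$3" \
    --num-nodes=3 \
    --machine-type=n1-standard-2 \
    --scopes=https://www.googleapis.com/auth/cloud-platform
}

# Delete an old node pool once the new one is ready.
# Arguments: POOL_NAME CLUSTER_NAME ZONE
delete_node_pool() {
  run gcloud container node-pools delete "$1" \
    --cluster="$2" \
    --zone="$3"
}

# Example (names are placeholders):
# DRY_RUN=1
# create_pipelines_node_pool pipelines-pool my-cluster us-central1-a
# delete_node_pool default-pool my-cluster us-central1-a
```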

Grant your GKE cluster access to Google Cloud resources and APIs

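There are three ways to grant your ML pipelines access to Google Cloud resources and APIs:

  • Configure your GKE cluster with full access to Google Cloud APIs.
  • Configure your GKE cluster with granular access to Google Cloud APIs using a service account.
  • Use a Kubernetes secret to grant your cluster access to Google Cloud resources and APIs.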

When deploying AI Platform Pipelines, you must grant your GKE cluster full access to Google Cloud resources and APIs or grant your cluster access to Google Cloud using a service account.

Configuring your GKE cluster with full access to Google Cloud APIs

To make it easier for your ML pipelines and other GKE cluster workloads to access your project's Google Cloud resources, configure your cluster to use the https://www.googleapis.com/auth/cloud-platform access scope. This access scope provides full access to the Google Cloud resources and APIs that you have enabled in your project. If this access scope grants more access to Google Cloud than you want, configure granular access using a service account instead.

Use the following instructions to replace your cluster's node pool with one that allows all workloads on this cluster to access all Google Cloud APIs that are enabled in your project. Before you change your GKE cluster, discuss these changes with your GKE administrator.

  1. Open Google Kubernetes Engine clusters in the Google Cloud console.

    Go to GKE clusters

  2. Click your cluster name. The cluster's details appear.

  3. In the GKE toolbar, click Add node pool. The Add a new node pool form opens.

  4. Supply the following information to the Add a new node pool form.

    • Number of nodes: Specify the number of nodes in your node pool. Your cluster must have 3 or more nodes to install Kubeflow Pipelines using Google Cloud Marketplace.
    • Machine type: Specify the Compute Engine machine type to use for instances in the node pool. Select a machine type with at least 2 CPUs and 4 GB of memory, such as n1-standard-2.

    • Access scopes: Click Allow full access to all Cloud APIs in Access scopes.

    Otherwise, configure your node pool as desired. Learn more about adding node pools to a cluster.

  5. Click Create node pool. Creating the node pool takes several minutes to complete.

  6. For each node pool in the Node pools section, except for the node pool you created in the previous step, click delete. The Delete a node pool dialog appears to confirm that you want to delete this node pool.

  7. Click Delete. Deleting the node pool takes several minutes.

  8. Once you have deleted the old node pools, check that your cluster has sufficient resources and access to install Kubeflow Pipelines from Google Cloud Marketplace.
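
After replacing the node pool, you may want to confirm its access scopes from Cloud Shell. This is a minimal sketch under stated assumptions: has_full_api_access is a hypothetical helper that only inspects a list of scope URLs, and the gcloud container node-pools describe call shown in the comment assumes that the node pool reports its scopes in the config.oauthScopes field.

```shell
# Hypothetical helper: succeeds if the given scope list contains the
# cloud-platform access scope. Scopes may be separated by spaces,
# semicolons, commas, or newlines.
has_full_api_access() {
  printf '%s\n' "$1" \
    | tr ' ;,' '\n\n\n' \
    | grep -qxF 'https://www.googleapis.com/auth/cloud-platform'
}

# Example (POOL_NAME, CLUSTER_NAME, and ZONE are placeholders):
# scopes="$(gcloud container node-pools describe POOL_NAME \
#   --cluster=CLUSTER_NAME --zone=ZONE \
#   --format='value(config.oauthScopes)')"
# has_full_api_access "$scopes" && echo "full access to Cloud APIs"
```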

Configuring your GKE cluster with granular access to Google Cloud APIs

Use the following instructions to configure a service account for your GKE cluster and replace your cluster's node pool with one that uses your service account. By creating a service account, you can granularly manage which Google Cloud resources the workloads on your cluster have access to. Before you change your GKE cluster, discuss these changes with your GKE administrator.

  1. Open a Cloud Shell session.

    Open Cloud Shell

    Cloud Shell opens in a frame at the bottom of the Google Cloud console.

  2. Run the following commands in Cloud Shell to create your service account and grant it sufficient access to run AI Platform Pipelines. Learn more about the roles required to run AI Platform Pipelines with a user-managed service account.

    export PROJECT=PROJECT_ID
    export SERVICE_ACCOUNT=SERVICE_ACCOUNT_NAME
    gcloud iam service-accounts create $SERVICE_ACCOUNT \
      --display-name=$SERVICE_ACCOUNT \
      --project=$PROJECT
    gcloud projects add-iam-policy-binding $PROJECT \
      --member="serviceAccount:$SERVICE_ACCOUNT@$PROJECT.iam.gserviceaccount.com" \
      --role=roles/logging.logWriter
    gcloud projects add-iam-policy-binding $PROJECT \
      --member="serviceAccount:$SERVICE_ACCOUNT@$PROJECT.iam.gserviceaccount.com" \
      --role=roles/monitoring.metricWriter
    gcloud projects add-iam-policy-binding $PROJECT \
      --member="serviceAccount:$SERVICE_ACCOUNT@$PROJECT.iam.gserviceaccount.com" \
      --role=roles/monitoring.viewer
    gcloud projects add-iam-policy-binding $PROJECT \
      --member="serviceAccount:$SERVICE_ACCOUNT@$PROJECT.iam.gserviceaccount.com" \
      --role=roles/storage.objectViewer

    Replace the following:

    • SERVICE_ACCOUNT_NAME: The name of the service account to create.
    • PROJECT_ID: The Google Cloud project that the service account is created in.

  3. Grant your service account access to any Google Cloud resources or APIs that your ML pipelines require. Learn more about Identity and Access Management roles and managing service accounts.

  4. Grant your user account the Service Account User (iam.serviceAccountUser) role on your service account.

    gcloud iam service-accounts add-iam-policy-binding \
      "SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com" \
      --member=user:USERNAME \
      --role=roles/iam.serviceAccountUser
    

    Replace the following:

    • SERVICE_ACCOUNT_NAME: The name of your service account.
    • PROJECT_ID: Your Google Cloud project.
    • USERNAME: Your username on Google Cloud.

  5. Open Google Kubernetes Engine clusters in the Google Cloud console.

    Go to GKE clusters

  6. Click your cluster name. The cluster's details appear.

  7. In the GKE toolbar, click Add node pool. The Add a new node pool form opens.

  8. Supply the following information to the Add a new node pool form.

    • Number of nodes: Specify the number of nodes in your node pool. Your cluster must have 3 or more nodes to install Kubeflow Pipelines using Google Cloud Marketplace.
    • Machine type: Specify the Compute Engine machine type to use for instances in the node pool. Select a machine type with at least 2 CPUs and 4 GB of memory, such as n1-standard-2.

    • Service account: Select the service account that you created in an earlier step.

    Otherwise, configure your node pool as desired. Learn more about adding node pools to a cluster.

  9. Click Create node pool. Creating the node pool takes several minutes to complete.

  10. For each node pool in the Node pools section, except for the node pool you created in the previous step, click delete. The Delete a node pool dialog appears to confirm that you want to delete this node pool.

  11. Click Delete. Deleting the node pool takes several minutes.

  12. Once you have deleted the old node pools, check that your cluster has sufficient resources and access to install Kubeflow Pipelines from Google Cloud Marketplace.
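
The four role bindings in step 2 differ only in the role name, so they can be expressed as a loop, which also makes it easier to add the extra roles from step 3. The following sketch assumes the same PROJECT and SERVICE_ACCOUNT variables as step 2; the function names and the DRY_RUN switch are hypothetical and exist only for illustration.

```shell
# Build a service account's email address from its name and project ID.
service_account_email() {
  echo "$1@$2.iam.gserviceaccount.com"
}

# Bind each listed role to the service account at the project level.
# With DRY_RUN=1 the commands are printed instead of executed.
# Arguments: SERVICE_ACCOUNT_NAME PROJECT_ID ROLE...
grant_roles() {
  sa_name="$1"
  project="$2"
  shift 2
  member="serviceAccount:$(service_account_email "$sa_name" "$project")"
  for role in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "gcloud projects add-iam-policy-binding $project --member=$member --role=$role"
    else
      gcloud projects add-iam-policy-binding "$project" \
        --member="$member" --role="$role"
    fi
  done
}

# Example: the same four roles that step 2 grants.
# grant_roles "$SERVICE_ACCOUNT" "$PROJECT" \
#   roles/logging.logWriter roles/monitoring.metricWriter \
#   roles/monitoring.viewer roles/storage.objectViewer
```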

Use a Kubernetes secret to grant your cluster access to Google Cloud resources and APIs

Pipelines that are developed using the use_gcp_secret operator in the Kubeflow Pipelines SDK authenticate to Google Cloud resources using a Kubernetes secret.

Use these instructions to create a service account, grant the account access to the resources used by your pipelines, and then add the service account to your cluster as a Kubernetes secret.

  1. Open Google Kubernetes Engine clusters in the Google Cloud console.

    Go to GKE clusters

  2. In the row for your cluster, find the cluster name and zone.

  3. Open a Cloud Shell session.

    Open Cloud Shell

    Cloud Shell opens in a frame at the bottom of the Google Cloud console. Use Cloud Shell to complete the rest of this process.

  4. Set the following environment variables.

    export PROJECT_ID=PROJECT_ID
    export ZONE=ZONE
    export CLUSTER=CLUSTER_NAME
    export NAMESPACE=NAMESPACE
    export SA_NAME=SERVICE_ACCOUNT_NAME
    

    Replace the following:

    • PROJECT_ID: The Google Cloud project that your GKE cluster was created in.
    • ZONE: The Google Cloud zone that your GKE cluster was created in.
    • CLUSTER_NAME: The name of your GKE cluster.
    • NAMESPACE: The namespace in your GKE cluster where Kubeflow Pipelines is installed.

      Namespaces are used to manage resources in large Kubernetes clusters. If your cluster does not use namespaces, enter default as the NAMESPACE.

    • SERVICE_ACCOUNT_NAME: The name of the service account to create for your Kubeflow Pipelines cluster to access Google Cloud resources and APIs.

  5. Create a service account for your cluster.

    gcloud iam service-accounts create $SA_NAME \
      --display-name $SA_NAME --project "$PROJECT_ID"
    
  6. To grant your service account access to Google Cloud resources, bind Identity and Access Management (IAM) roles to the service account. Run the following command once for each role that you want to grant to your service account.

    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com" \
      --role=IAM_ROLE

    Replace the following:

    • IAM_ROLE: The IAM role to grant to your service account. For example, roles/storage.admin grants full control of Cloud Storage buckets and objects in your project.

      To learn more about IAM roles, read the guide to understanding IAM roles.

  7. Create a private key for your service account in the current directory.

    gcloud iam service-accounts keys create ./service-account-key.json \
      --iam-account "$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com"
    
  8. Configure kubectl to connect to your cluster, then create the user-gcp-sa Kubernetes secret.

    gcloud container clusters get-credentials "$CLUSTER" --zone "$ZONE" \
      --project "$PROJECT_ID"
    
    kubectl create secret generic user-gcp-sa \
      --from-file=user-gcp-sa.json=./service-account-key.json \
      -n "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -
    
  9. Clean up the service account's private key.

    rm ./service-account-key.json
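
Because the key file is a credential, it can help to make sure the cleanup in step 9 happens even when an earlier command fails. The sketch below shows one possible way to do that; with_temp_key_path is a hypothetical helper, not part of gcloud or kubectl, and it does not handle interruption by a signal such as Ctrl+C.

```shell
# Hypothetical helper: create a private temporary directory, pass a key file
# path inside it as the last argument to the given command, then remove the
# directory whether or not the command succeeded.
# Returns the command's exit status.
with_temp_key_path() {
  key_dir="$(mktemp -d)" || return 1
  result=0
  "$@" "$key_dir/service-account-key.json" || result=$?
  rm -rf "$key_dir"
  return "$result"
}

# Example callback (commented out because it needs gcloud and a real project):
# create_key() {
#   gcloud iam service-accounts keys create "$1" \
#     --iam-account "$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com"
#   # ...then run the kubectl commands from step 8 with "$1" as the key file.
# }
# with_temp_key_path create_key
```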