Configuring a Google Kubernetes Engine cluster for AI Platform Pipelines

AI Platform Pipelines makes it easier to get started with Kubeflow Pipelines and TensorFlow Extended on Google Kubernetes Engine by saving you the difficulty of:

  • Creating a GKE cluster
  • Deploying Kubeflow Pipelines to your GKE cluster
  • Creating a Cloud Storage bucket to use to store pipeline artifacts

If you prefer, you can use AI Platform Pipelines to deploy Kubeflow Pipelines on an existing cluster that does not already have Kubeflow Pipelines installed. Use this guide to ensure that your cluster is configured correctly to deploy and run Kubeflow Pipelines.

Ensure that your GKE cluster has enough resources for AI Platform Pipelines

To use Google Cloud Marketplace to deploy Kubeflow Pipelines on a GKE cluster, the following must be true:

  • Your cluster must have at least 3 nodes, each with at least 2 CPUs and 4 GB of memory available.
  • The cluster's access scope must grant full access to all Cloud APIs.
  • The cluster must not already have Kubeflow Pipelines installed.

Use the following instructions to check whether your cluster has sufficient resources and access to install AI Platform Pipelines. A command-line alternative follows these steps.

  1. Open AI Platform Pipelines in Google Cloud Console.

    Go to AI Platform Pipelines

  2. In the AI Platform Pipelines toolbar, click New instance. Kubeflow Pipelines opens in Google Cloud Marketplace.

  3. Click Configure. The Deploy Kubeflow Pipelines form opens.

  4. Click Cluster to expand the list. GKE clusters that do not have enough resources or permissions are listed as Ineligible clusters. Each ineligible cluster includes a description of why Kubeflow Pipelines cannot be installed on it, such as insufficient CPU or memory, an access scope that does not grant full access to all Cloud APIs, or an existing Kubeflow Pipelines installation.
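
You can also check these requirements from the command line. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated; CLUSTER_NAME and ZONE are placeholders for your cluster's name and zone:

    # Total number of nodes in the cluster (must be 3 or more).
    gcloud container clusters describe CLUSTER_NAME --zone ZONE \
      --format="value(currentNodeCount)"

    # Machine type of each node pool (each node needs at least 2 CPUs and
    # 4 GB of memory, for example n1-standard-2).
    gcloud container node-pools list --cluster CLUSTER_NAME --zone ZONE \
      --format="table(name, config.machineType)"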

Allocate more resources to your GKE cluster

To install Kubeflow Pipelines from Google Cloud Marketplace to an existing GKE cluster, your cluster must have at least 3 nodes, each with at least 2 CPUs and 4 GB of memory available.

Use the following instructions to replace the node pool in your cluster with one that has enough CPU and memory resources for AI Platform Pipelines. A gcloud equivalent is sketched after these steps.

  1. Open Google Kubernetes Engine clusters in Google Cloud Console.

    Go to GKE clusters

  2. Click your cluster name. The cluster's details appear.

  3. In the GKE toolbar, click Add node pool. The Add a new node pool form opens.

  4. Supply the following information to the Add a new node pool form.

    • Number of nodes: Specify the number of nodes in your node pool. Your cluster must have 3 or more nodes to install Kubeflow Pipelines using Google Cloud Marketplace.
    • Machine type: Specify the Compute Engine machine type to use for instances in the node pool. Select a machine type with at least 2 CPUs and 4 GB of memory, such as n1-standard-2.
    • Access scopes: Click Allow full access to all Cloud APIs in Access scopes.

    Otherwise, configure your node pool as desired. Learn more about adding node pools to a cluster.

  5. Click Create node pool. Creating the node pool takes several minutes to complete.

  6. For each node pool in the Node pools section, except for the node pool you created in the previous step, click delete. The Delete a node pool dialog appears to confirm that you want to delete this node pool.

  7. Click Delete.

  8. Deleting the node pool takes several minutes. Once the node pool has been deleted, check that your cluster has sufficient resources and access to install Kubeflow Pipelines from Google Cloud Marketplace.
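
If you prefer to script the replacement, the console steps above can be approximated with gcloud. The following is a minimal sketch, not an exact equivalent; CLUSTER_NAME, ZONE, and the pool names are placeholders:

    # Create a new node pool with 3 nodes, a machine type that has at least
    # 2 CPUs and 4 GB of memory, and full access to all Cloud APIs.
    gcloud container node-pools create pipelines-pool \
      --cluster CLUSTER_NAME --zone ZONE \
      --num-nodes 3 \
      --machine-type n1-standard-2 \
      --scopes cloud-platform

    # Delete the old node pool once the new pool is running. GKE drains the
    # old nodes and reschedules their workloads onto the new pool.
    gcloud container node-pools delete old-pool \
      --cluster CLUSTER_NAME --zone ZONE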

Grant your GKE cluster access to Google Cloud resources and APIs

There are several ways to grant your ML pipelines access to Google Cloud resources and APIs. This guide covers two of them: granting your cluster full access to all Cloud APIs, and storing a service account key in your cluster as a Kubernetes secret.

Grant your cluster full access to Google Cloud resources and APIs

AI Platform Pipelines requires you to grant your cluster access to all Google Cloud APIs when deploying Kubeflow Pipelines.

Use the following instructions to replace your cluster's node pool with one that is compatible with AI Platform Pipelines. Before you change your GKE cluster, discuss these changes with your GKE administrator. These changes allow all workloads on this cluster to access all Google Cloud APIs that are enabled in your project.

  1. Open Google Kubernetes Engine clusters in Google Cloud Console.

    Go to GKE clusters

  2. Click your cluster name. The cluster's details appear.

  3. In the GKE toolbar, click Add node pool. The Add a new node pool form opens.

  4. Supply the following information to the Add a new node pool form.

    • Number of nodes: Specify the number of nodes in your node pool. Your cluster must have 3 or more nodes to install Kubeflow Pipelines using Google Cloud Marketplace.
    • Machine type: Specify the Compute Engine machine type to use for instances in the node pool. Select a machine type with at least 2 CPUs and 4 GB of memory, such as n1-standard-2.
    • Access scopes: Click Allow full access to all Cloud APIs in Access scopes.

    Otherwise, configure your node pool as desired. Learn more about adding node pools to a cluster.

  5. Click Create node pool. Creating the node pool takes several minutes to complete.

  6. For each node pool in the Node pools section, except for the node pool you created in the previous step, click delete. The Delete a node pool dialog appears to confirm that you want to delete this node pool.

  7. Click Delete.

  8. Deleting the node pool takes several minutes. Once the node pool has been deleted, check that your cluster has sufficient resources and access to install Kubeflow Pipelines from Google Cloud Marketplace.
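
To verify the access scopes from the command line, you can inspect a node pool's configuration. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated; POOL_NAME, CLUSTER_NAME, and ZONE are placeholders:

    # Full access to all Cloud APIs appears as the scope
    # https://www.googleapis.com/auth/cloud-platform.
    gcloud container node-pools describe POOL_NAME \
      --cluster CLUSTER_NAME --zone ZONE \
      --format="value(config.oauthScopes)"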

Use a Kubernetes secret to grant your cluster access to Google Cloud resources and APIs

Pipelines that are developed using the use_gcp_secret operator in the Kubeflow Pipelines SDK authenticate to Google Cloud resources using a Kubernetes secret.

Use these instructions to create a service account, grant the account access to the resources used by your pipelines, and then add the service account to your cluster as a Kubernetes secret.

  1. Open Google Kubernetes Engine clusters in Google Cloud Console.

    Go to GKE clusters

  2. In the row for your cluster, find the cluster name and zone.

  3. Open a Cloud Shell session.

    Open Cloud Shell

    Cloud Shell opens in a frame at the bottom of Google Cloud Console. Use Cloud Shell to complete the rest of this process.

  4. Set the following environment variables.

    export PROJECT_ID=project-id
    export ZONE=zone
    export CLUSTER=cluster-name
    export NAMESPACE=namespace
    export SA_NAME=service-account-name
    

    Replace the following:

    • project-id: The Google Cloud project that your GKE cluster was created in.
    • zone: The Google Cloud zone that your GKE cluster was created in.
    • cluster-name: The name of your GKE cluster.
    • namespace: The namespace in your GKE cluster where Kubeflow Pipelines is installed.

      Namespaces are used to manage resources in large Kubernetes clusters. If your cluster does not use namespaces, enter default as the namespace. If you are not sure which namespace Kubeflow Pipelines is installed in, see the note after this list.

    • service-account-name: The name of the service account to create for your Kubeflow Pipelines cluster to access Google Cloud resources and APIs.
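
    If you are not sure which namespace Kubeflow Pipelines is installed in, you can list your cluster's namespaces after you configure kubectl in step 8:

    kubectl get namespaces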

  5. Create a service account for your cluster.

    gcloud iam service-accounts create $SA_NAME \
      --display-name $SA_NAME --project "$PROJECT_ID"
    
  6. To grant your service account access to Google Cloud resources, bind Cloud Identity and Access Management (Cloud IAM) roles to the service account. Run the following command once for each role that you want to grant.

    gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member=serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com \
      --role=iam-role
    
    • iam-role: The Cloud IAM role to grant to your service account. For example, roles/storage.admin grants full control of Cloud Storage buckets and objects in your project.

      To learn more about Cloud IAM roles, read the guide to understanding Cloud IAM roles.
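
    For example, the following call grants the service account full control of Cloud Storage buckets and objects in your project:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member=serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com \
      --role=roles/storage.admin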

  7. Create a private key for your service account in the current directory.

    gcloud iam service-accounts keys create ./service-account-key.json \
      --iam-account $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com
    
  8. Configure kubectl to connect to your cluster, then create the user-gcp-sa Kubernetes secret.

    gcloud container clusters get-credentials "$CLUSTER" --zone "$ZONE" \
      --project "$PROJECT_ID"
    
    kubectl create secret generic user-gcp-sa \
      --from-file=user-gcp-sa.json=./service-account-key.json \
      -n $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -
    
  9. Clean up the service account's private key.

    rm ./service-account-key.json
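
To confirm that the secret was created, you can list it in the namespace where Kubeflow Pipelines is installed (kubectl is still configured from step 8):

    kubectl get secret user-gcp-sa -n $NAMESPACE

Pipeline steps that apply the use_gcp_secret operator can then authenticate to Google Cloud as this service account.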