Troubleshooting AI Platform Pipelines

Use the following tips to fix problems in your AI Platform Pipelines cluster.

Understanding why a pipeline run failed

Each step in a pipeline run has a log that describes that step's activity. Read and analyze these logs to better understand why the run failed. Use the following instructions to check the logs for a pipeline run.

  1. Open AI Platform Pipelines in the Google Cloud Console.

    Go to AI Platform Pipelines

  2. Click Open pipelines dashboard for your Kubeflow Pipelines cluster. The Kubeflow Pipelines user interface opens in a new tab.

  3. In the left navigation panel, click Experiments. A list of pipeline experiments appears.

  4. Click All runs. A list of pipeline runs appears.

  5. Click the name of the pipeline run that you want to troubleshoot. A graph displaying the steps in the pipeline opens.

  6. Pipeline steps with a green check mark completed successfully. Steps with a red exclamation point failed.

    Click the pipeline step that you want to troubleshoot. A section with the step's artifacts, inputs, outputs, volumes, manifest, and logs appears.

  7. Review each tab to understand the inputs and outputs, artifacts created, and activity recorded in the log. You may need to research several steps to find the source of the error.
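If the Logs tab in the dashboard is empty (for example, because the step's pod has already been garbage collected), you may be able to read the log directly from the cluster instead. The following is a sketch, assuming kubectl is configured for the cluster, that the Kubeflow Pipelines workloads run in the kubeflow namespace, and that POD_NAME is the step's pod name shown on the step's manifest tab:

```shell
# List the pods created for pipeline steps (assumes the "kubeflow" namespace).
kubectl get pods -n kubeflow

# Print the log of one step. Kubeflow Pipelines runs each step as an Argo
# workflow pod; the user code runs in the "main" container.
kubectl logs -n kubeflow POD_NAME -c main
```

Replace POD_NAME with the name of the pod for the step that you want to troubleshoot.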

Access forbidden to Kubeflow Pipelines dashboard

If you get a "forbidden" message while accessing the Kubeflow Pipelines dashboard for an AI Platform Pipelines cluster, your account doesn't have sufficient permissions to access the cluster. This issue can occur when someone else creates a Google Kubernetes Engine cluster and deploys AI Platform Pipelines for you.

To resolve this issue, ask your Google Cloud administrator to use the following instructions to grant your user account access to the AI Platform Pipelines cluster.

  1. Open AI Platform Pipelines in the Google Cloud Console.

    Go to AI Platform Pipelines

  2. Find your AI Platform Pipelines cluster. Take note of the Cluster and Zone for use in subsequent steps.

  3. Open a Cloud Shell session.

    Open Cloud Shell

    Cloud Shell opens in a frame at the bottom of the Google Cloud Console. Use Cloud Shell to complete the rest of this process.

  4. Run the following command to set the default Cloud project for this Cloud Shell session.

    gcloud config set project PROJECT_ID
    

    Replace PROJECT_ID with your Cloud project ID.

  5. Run the following command to find the service account that your GKE cluster uses.

    gcloud container clusters describe CLUSTER_NAME --zone ZONE \
      --format="flattened(nodePools[].config.serviceAccount)"
    

    Replace the following:

    • CLUSTER_NAME: The name of your GKE cluster.
    • ZONE: The zone that your cluster was created in.

    The response might indicate that your cluster uses a service account named default. This value refers to the default service account for Compute Engine. Run the following command to find the full name of this service account.

    gcloud iam service-accounts list \
      --filter "compute@developer.gserviceaccount.com"
    

    Learn more about the Compute Engine default service account.

  6. Grant your user account the Service Account User role on your GKE cluster's service account.

    gcloud iam service-accounts add-iam-policy-binding \
      SERVICE_ACCOUNT_NAME \
      --member=user:USERNAME \
      --role=roles/iam.serviceAccountUser
    

    Replace the following:

    • SERVICE_ACCOUNT_NAME: The name of your GKE cluster's service account, which you found in the previous step. Service account names are formatted like *@*.gserviceaccount.com.
    • USERNAME: Your username on Google Cloud.
  7. Grant your user account the Kubernetes Engine Cluster Viewer role on the project.

    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member=user:USERNAME \
      --role=roles/container.clusterViewer
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • USERNAME: Your username on Google Cloud.
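To confirm that the bindings took effect, you can inspect the project's IAM policy. The following is a sketch, assuming the same PROJECT_ID and USERNAME placeholders as in the preceding steps:

```shell
# List the roles granted to the user account on the project.
# --flatten expands the policy's binding list so --filter can match
# individual members.
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:USERNAME" \
  --format="table(bindings.role)"
```

The output should include roles/container.clusterViewer after the previous step completes.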

Insufficient permissions while running a pipeline

While running a pipeline that accesses Google Cloud resources, you may get an insufficient permissions error. For example:

Error executing an HTTP request: HTTP response code 403 with body '{
  "error": {
    "errors": [
      {
       "domain": "global",
       "reason": "insufficientPermissions",
       "message": "Insufficient Permission"
      }
    ],
    "code": 403,
    "message": "Insufficient Permission"
  }
}'

For a pipeline step to access Google Cloud resources or APIs, the Google Kubernetes Engine cluster and pipeline must meet both of the following requirements:

    • The cluster must have been created with access scopes that allow access to the API, such as the https://www.googleapis.com/auth/cloud-platform scope.
    • The service account that the cluster uses must have been granted permission to access the resource, for example through an appropriate IAM role.
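As one illustration, if a pipeline step reads data from Cloud Storage, the cluster's service account needs a role such as Storage Object Viewer on the project. The following is a sketch, assuming the SERVICE_ACCOUNT_NAME found in the previous section; the role shown is an example, not the only option:

```shell
# Grant the cluster's service account read access to Cloud Storage objects.
# roles/storage.objectViewer is an example; grant the narrowest role that
# your pipeline actually needs.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SERVICE_ACCOUNT_NAME" \
  --role="roles/storage.objectViewer"
```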

"Server was only able to partially fulfill your request" warning message

You may see the following message when a cluster is upgrading, or when AI Platform Pipelines is being deployed.

Sorry, the server was only able to partially fulfill
your request. Some data might not be rendered.

If you see this message, wait for five minutes and then refresh the page.