Permission denied or unauthorized errors in AI Platform training jobs

Problem

When you create an AI Platform training job, the job fails to access another Google Cloud Platform resource and reports permission denied or unauthorized errors.

Environment

  • AI Platform training job that accesses other Google Cloud Platform resources (for example, a training data source)

Solution

Follow the documentation for the Google Cloud Platform resource being accessed and grant the required permissions to the following Service Accounts (SAs) in the training project:

  1. Compute Engine default Service Account
  2. Cloud ML Service Agent Service Account (if present)

Their default forms are, respectively:

  1. <project_number>-compute@developer.gserviceaccount.com
  2. service-<project_number>@cloud-ml.google.com.iam.gserviceaccount.com
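As a minimal sketch, the two default addresses can be built from the project number as shown below. The function name and dictionary keys are hypothetical, chosen only for illustration; the address patterns are the two listed above. The result is only the default form and does not confirm that either account actually exists in the project.

```python
from typing import Dict


def default_training_service_accounts(project_number: str) -> Dict[str, str]:
    """Build the default-form SA emails for a project number.

    Hypothetical helper: it only formats the two patterns listed above
    and does not check that the accounts exist.
    """
    number = str(project_number).strip()
    if not number.isdigit():
        # The patterns use the numeric project number, not the project ID.
        raise ValueError("project_number must contain only digits")
    return {
        "compute_default": f"{number}-compute@developer.gserviceaccount.com",
        "cloud_ml_service_agent": (
            f"service-{number}@cloud-ml.google.com.iam.gserviceaccount.com"
        ),
    }


if __name__ == "__main__":
    # 123456789012 is a made-up project number used only as an example.
    for name, email in default_training_service_accounts("123456789012").items():
        print(name, email)
```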

The second SA may not be present in every project, and either account could have a different ID format (for example, if the customer has replaced it). If the IDs differ from the default form, find the accounts in the IAM Service Accounts panel by looking for the accounts that have these roles, respectively:

  1. Compute Engine default service account
  2. Cloud ML Service Agent
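If you prefer to search an exported IAM policy instead of the panel, the lookup could be sketched as follows. This assumes the policy was saved as JSON (for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`) and loaded into a dictionary. The function name and the sample policy are hypothetical, and the role ID `roles/ml.serviceAgent` is an assumption about how the Cloud ML Service Agent role appears in the policy; verify the role ID shown in your own project before relying on it.

```python
from typing import List

# Made-up policy in the shape of an exported IAM policy (illustration only).
SAMPLE_POLICY = {
    "bindings": [
        {
            "role": "roles/ml.serviceAgent",  # assumed role ID, see note above
            "members": [
                "serviceAccount:service-123456789012@cloud-ml.google.com.iam.gserviceaccount.com"
            ],
        },
        {
            "role": "roles/viewer",
            "members": ["user:someone@example.com"],
        },
    ]
}


def members_with_role(policy: dict, role: str) -> List[str]:
    """Return the sorted, de-duplicated members bound to `role` in `policy`."""
    found = set()
    for binding in policy.get("bindings", []):
        if binding.get("role") == role:
            found.update(binding.get("members", []))
    return sorted(found)


if __name__ == "__main__":
    for member in members_with_role(SAMPLE_POLICY, "roles/ml.serviceAgent"):
        print(member)
```

An empty result for a role means no account in the exported policy holds it, which is consistent with the second SA not being present in every project.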

Cause

AI Platform may execute training code under the Service Accounts listed above, so those SAs need access to every resource that the training job uses.