Working with Cloud Storage

AI Platform Training reads data from Cloud Storage locations where you have granted access to your AI Platform Training project. This page gives a quick guide to using Cloud Storage with AI Platform Training.

Overview

Using Cloud Storage is required or recommended for the following aspects of AI Platform Training services:

  • Staging your training application and custom dependencies.
  • Storing your training input data, such as tabular or image data.
  • Storing your training output data.

Region considerations

When you create a Cloud Storage bucket to use with AI Platform Training, you should:

  • Assign it to a specific compute region, not to a multi-region value.
  • Use the same region where you run your training jobs.

For details, see the regions available for AI Platform Training.
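
As a rough illustration of the first guideline, the following sketch rejects multi-region values before a bucket is created. The helper name is_multi_region is hypothetical, and the sketch assumes that the multi-region names are US, EU, and ASIA. It does not detect dual-region values and does not confirm that the region is supported by AI Platform Training.

```shell
# Hypothetical helper: succeeds if the location is a multi-region value.
# Assumption: the multi-region names are US, EU, and ASIA.
is_multi_region() {
  # Normalize case so that "us" and "US" are treated alike.
  location=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$location" in
    US|EU|ASIA) return 0 ;;
    *) return 1 ;;
  esac
}

REGION=us-central1
if is_multi_region "$REGION"; then
  echo "$REGION is a multi-region value; choose a single region instead."
else
  echo "$REGION is not a multi-region value."
fi
```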

Setting up your Cloud Storage buckets

This section shows you how to create a new bucket. You can use an existing bucket, but it must be in the same region where you plan to run AI Platform Training jobs. Additionally, if the bucket is not part of the project you are using to run AI Platform Training, you must explicitly grant access to the AI Platform Training service accounts.

  1. Specify a name for your new bucket. The name must be unique across all buckets in Cloud Storage.

    BUCKET_NAME="YOUR_BUCKET_NAME"

    For example, use your project name with -aiplatform appended:

    PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    BUCKET_NAME=${PROJECT_ID}-aiplatform
  2. Check the bucket name that you set.

    echo "$BUCKET_NAME"
  3. Select a region for your bucket and set a REGION environment variable.

    Use the same region where you plan on running AI Platform Training jobs. See the available regions for AI Platform Training services.

    For example, the following code creates REGION and sets it to us-central1:

    REGION=us-central1
  4. Create the new bucket:

    gsutil mb -l $REGION gs://$BUCKET_NAME
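
Before running the create command, it can help to check the chosen name locally. The sketch below is a simplified check based on the general Cloud Storage naming rules as understood here: 3 to 63 characters, only lowercase letters, digits, dashes, underscores, and dots, starting and ending with a letter or digit, and not beginning with "goog". The helper name is_valid_bucket_name and the example name are hypothetical, and the check cannot tell whether the name is already taken.

```shell
# Hypothetical helper: simplified local check of a bucket name.
# It does not cover every naming rule and cannot verify global uniqueness.
is_valid_bucket_name() {
  name=$1
  # Assumption: names that begin with "goog" are reserved.
  case "$name" in
    goog*) return 1 ;;
  esac
  # 3-63 characters; lowercase letters, digits, dots, underscores, dashes;
  # the first and last characters must be a letter or digit.
  printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'
}

EXAMPLE_NAME="my-project-aiplatform"
if is_valid_bucket_name "$EXAMPLE_NAME"; then
  echo "Name passes the basic checks: $EXAMPLE_NAME"
else
  echo "Name fails the basic checks: $EXAMPLE_NAME"
fi
```

A name that passes this check can still be rejected by Cloud Storage, for example because another bucket already uses it.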

Model organization in buckets

Organize the folder structure in your bucket to accommodate many iterations of your model.

  • Place each saved model into its own separate directory within your bucket.
  • Consider using timestamps to name the directories in your bucket.

For example, you can place your first model in a structure similar to gs://your-bucket/your-model-DATE1/your-saved-model-file. To name the directories for each subsequent iteration of your model, use an updated timestamp (gs://your-bucket/your-model-DATE2/your-saved-model-file and so on).
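
One possible way to generate such directory names is sketched below. The helper name model_dir and the timestamp format are illustrative assumptions, not a required convention.

```shell
# Hypothetical helper: builds gs://BUCKET/MODEL-TIMESTAMP/ for one model iteration.
# Arguments: bucket name, model name, optional timestamp.
# If no timestamp is given, the current UTC time is used.
model_dir() {
  bucket=$1
  model=$2
  stamp=${3:-$(date -u +%Y%m%d_%H%M%S)}
  printf 'gs://%s/%s-%s/\n' "$bucket" "$model" "$stamp"
}

model_dir your-bucket your-model 20240101_120000
# prints gs://your-bucket/your-model-20240101_120000/
```

Because the timestamp sorts lexicographically in this format, listing the bucket shows the iterations in chronological order.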

Accessing Cloud Storage during training

In your training code, use a Python library that can read from Cloud Storage, such as the Python Client for Google Cloud Storage, TensorFlow's tf.io.gfile.GFile class, or pandas 0.24.0 or later. AI Platform Training takes care of authentication.

Using a Cloud Storage bucket from a different project

This section describes how to configure Cloud Storage buckets from outside of your project so that AI Platform Training can access them.

If you set up your Cloud Storage bucket in the same project where you are using AI Platform Training, your AI Platform Training service accounts already have the necessary permissions to access your Cloud Storage bucket.

Use these instructions in cases such as the following:

  • You cannot use a bucket from your own project, for example because a large dataset is shared across multiple projects.
  • You use multiple buckets with AI Platform Training. In this case, you must grant access to the AI Platform Training service accounts separately for each bucket.

Step 1: Get required information from your cloud project

Console

  1. Open the IAM page in the Google Cloud console.

  2. The IAM page displays a list of all principals that have access to your project, along with their associated role(s). Your AI Platform Training project has multiple service accounts. Locate the service account in the list that has the role Cloud ML Service Agent and copy that service account ID, which looks similar to this:

    service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com

    You need to paste this service account ID into a different page in the Google Cloud console during the next steps.

Command Line

The steps in this section retrieve information about your Google Cloud project that you then use to change access control for your project's AI Platform Training service account. Store the values in environment variables for later use.

  1. Get your project identifier by using the Google Cloud CLI with your project selected:

    PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    
  2. Get the access token for your project by using gcloud:

    AUTH_TOKEN=$(gcloud auth print-access-token)
    
  3. Get the service account information by requesting project configuration from the REST service:

    SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" \
        -H "Authorization: Bearer $AUTH_TOKEN" \
        https://ml.googleapis.com/v1/projects/${PROJECT_ID}:getConfig \
        | python3 -c "import json; import sys; response = json.load(sys.stdin); \
        print(response['serviceAccount'])")
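
If the request fails, for example because the access token has expired, the variable can end up empty and the later commands fail in confusing ways. The following sketch checks that a value has the format of the example ID shown in the Console instructions. The helper name looks_like_service_agent is hypothetical, and the pattern is an assumption derived from that single example, so a valid account in a different format would be rejected.

```shell
# Hypothetical helper: succeeds if the value matches the format of the example
# service account ID (service-<digits>@cloud-ml.google.com.iam.gserviceaccount.com).
looks_like_service_agent() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq \
    '^service-[0-9]+@cloud-ml\.google\.com\.iam\.gserviceaccount\.com$'
}

# Example value copied from the Console instructions, not a real account.
SVC_ACCOUNT_EXAMPLE="service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com"
if looks_like_service_agent "$SVC_ACCOUNT_EXAMPLE"; then
  echo "Service account value has the expected format."
else
  echo "Unexpected value; inspect the getConfig response." >&2
fi
```

In practice you would pass "$SVC_ACCOUNT" to the helper before continuing to Step 2.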
    

Step 2: Configure access to your Cloud Storage bucket

Console

  1. Open the Storage page in the Google Cloud console.

  2. Select the Cloud Storage bucket that you want AI Platform Training to access by checking the box to the left of the bucket name.

  3. Click the Show Info Panel button in the upper right corner to display the Permissions tab.

  4. Paste the service account ID into the Add Principals field. To the right of that field, select your desired role(s), such as Storage Legacy Bucket Reader.

    If you are not sure which role to select, you may select multiple roles to see them displayed below the Add Principals field, each with a brief description of its permissions.

  5. To assign your desired role(s) to the service account, click the Add button to the right of the Add Principals field.

Command Line

Now that you have your project and service account information, you need to update the access permissions for your Cloud Storage bucket. These steps use the same variable names used in the previous section.

  1. Set the name of your bucket in an environment variable named BUCKET_NAME:

    BUCKET_NAME="your_bucket_name"
    
  2. Grant the service account read access to objects that are added to the bucket from now on, by updating the bucket's default object ACL:

    gsutil -m defacl ch -u $SVC_ACCOUNT:R gs://$BUCKET_NAME
    
  3. If your bucket already contains objects that you need to access, you must grant read access to them explicitly:

    gsutil -m acl ch -u $SVC_ACCOUNT:R -r gs://$BUCKET_NAME
    
  4. Grant the service account write access to the bucket:

    gsutil -m acl ch -u $SVC_ACCOUNT:W gs://$BUCKET_NAME
    

To choose a role to grant to your AI Platform Training service account, see the Cloud Storage IAM roles. For more general information about updating IAM roles in Cloud Storage, see how to grant access to a service account for a resource.
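
The ACL commands above are not expected to work on buckets that have uniform bucket-level access enabled, because ACLs are disabled on such buckets. As a hedged alternative, the sketch below prepares a gsutil iam ch command that grants an IAM role instead. The helper name iam_binding, the example values, and the choice of the objectViewer role are assumptions for illustration; pick the role that matches what your training jobs need.

```shell
# Hypothetical helper: formats a member and role binding for `gsutil iam ch`.
# Arguments: service account email, role shorthand (for example objectViewer).
iam_binding() {
  printf 'serviceAccount:%s:%s\n' "$1" "$2"
}

# Example values in the formats used earlier on this page, not real resources.
SVC_ACCOUNT_EXAMPLE="service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com"
BUCKET_EXAMPLE="your_bucket_name"

# Print the command instead of running it, so that it can be reviewed first.
echo gsutil iam ch "$(iam_binding "$SVC_ACCOUNT_EXAMPLE" objectViewer)" "gs://$BUCKET_EXAMPLE"
```

Remove the leading echo, and substitute "$SVC_ACCOUNT" and "$BUCKET_NAME", to apply the binding.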

What's next