Working with Cloud Storage

Cloud ML Engine reads data from Cloud Storage locations where you have granted access to your Cloud ML Engine project. This page gives a quick guide to using Cloud Storage with Cloud ML Engine.

Overview

Using Cloud Storage is required or recommended for the following aspects of Cloud ML Engine services:

Training

  • Staging your training application and custom dependencies.
  • Storing your training input data.
  • Storing your training output data.

Prediction

  • Storing your saved model so that you can deploy it as a model version.

Batch prediction

  • Storing your batch prediction input files.
  • Storing your batch prediction output.

Region considerations

When you create a Cloud Storage bucket to use with Cloud ML Engine, you should:

  • Assign it to a specific compute region, not to a multi-region value.
  • Use the same region where you run your training jobs.

For details, see the regions available for Cloud ML Engine.
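
As a quick guard against accidentally choosing a multi-region location, a shell check along the following lines can help. This is a minimal sketch, not part of the Cloud ML Engine tooling: the function name `is_single_region` is hypothetical, and the pattern is only a heuristic that assumes region names look like `us-central1` (they contain a hyphen and end in a digit) while multi-region values such as `us` do not.

```shell
# Heuristic: treat a location as a single region if it contains a hyphen
# and ends in a digit (for example us-central1). Multi-region values
# such as "us" do not match. The function name is illustrative only.
is_single_region() {
  case "$1" in
    *-*[0-9]) return 0 ;;
    *)        return 1 ;;
  esac
}

REGION=us-central1
if is_single_region "$REGION"; then
  echo "$REGION looks like a single region"
else
  echo "$REGION looks like a multi-region location; choose a specific region" >&2
fi
```

The check does not confirm that Cloud ML Engine is actually available in the region, so it complements the list of available regions rather than replacing it.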

Setting up your Cloud Storage buckets

This section shows you how to create a new bucket. You can use an existing bucket, but if it is not part of the project you are using to run Cloud ML Engine, you must explicitly grant access to the Cloud ML Engine service accounts.

  1. Specify a name for your new bucket. The name must be unique across all buckets in Cloud Storage.

    BUCKET_NAME="your_bucket_name"

    For example, use your project name with -mlengine appended:

    PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    BUCKET_NAME=${PROJECT_ID}-mlengine
  2. Verify the bucket name that you set.

    echo $BUCKET_NAME
  3. Select a region for your bucket and set a `REGION` environment variable.

    Warning: You must specify a region (like us-central1) for your bucket, not a multi-region location (like us). See the available regions for Cloud ML Engine services. For example, the following code creates REGION and sets it to us-central1.

    REGION=us-central1
  4. Create the new bucket:

    gsutil mb -l $REGION gs://$BUCKET_NAME

    Note: Use the same region where you plan on running Cloud ML Engine jobs. The example uses us-central1 because that is the region used in the getting-started instructions.
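
The steps above can be combined into a short script. The following is a sketch under stated assumptions: `PROJECT_ID` is set to the placeholder value `my-project` instead of being read from gcloud, the helper `is_valid_bucket_name` is hypothetical and checks only a subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit), and the `gsutil mb` command is printed rather than executed so that you can review it first.

```shell
# Hypothetical helper: checks a subset of the Cloud Storage bucket naming
# rules. It cannot check that the name is unique across Cloud Storage.
# The body runs in a subshell, so the locale change stays local.
is_valid_bucket_name() (
  LC_ALL=C                            # keep a-z limited to ASCII lowercase
  name=$1
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || exit 1
  case "$name" in
    *[!a-z0-9._-]*)        exit 1 ;;  # contains a disallowed character
    [!a-z0-9]*|*[!a-z0-9]) exit 1 ;;  # bad first or last character
  esac
  exit 0
)

PROJECT_ID="my-project"               # placeholder; normally read from gcloud
REGION="us-central1"
BUCKET_NAME="${PROJECT_ID}-mlengine"

if is_valid_bucket_name "$BUCKET_NAME"; then
  # Print the command for review instead of running it.
  echo "gsutil mb -l ${REGION} gs://${BUCKET_NAME}"
else
  echo "Invalid bucket name: ${BUCKET_NAME}" >&2
fi
```

Printing the command first is a deliberate choice here: a bucket's location is fixed when the bucket is created, so it is worth confirming the region before running `gsutil mb`.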

Model organization in buckets

Organize the folder structure in your bucket to accommodate many iterations of your model.

  • Place each saved model into its own separate directory within your bucket.
  • Consider using timestamps to name the directories in your bucket.

For example, you can place your first model in a structure similar to gs://your-bucket/your-model-DATE1/your-saved-model-file. To name the directories for each subsequent iteration of your model, use an updated timestamp (gs://your-bucket/your-model-DATE2/your-saved-model-file and so on).
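
One way to generate such timestamped directory names is with the `date` command. The following is a sketch under assumptions: the bucket name `your-bucket`, the prefix `your-model`, and the timestamp format are illustrative choices rather than anything Cloud ML Engine requires.

```shell
BUCKET_NAME="your-bucket"       # placeholder bucket name
MODEL_PREFIX="your-model"       # placeholder model name

# UTC timestamp in the form YYYYMMDD_HHMMSS; names in this form sort
# chronologically when listed as text.
TIMESTAMP=$(date -u +%Y%m%d_%H%M%S)
MODEL_DIR="gs://${BUCKET_NAME}/${MODEL_PREFIX}-${TIMESTAMP}"

# Print the target directory for this iteration of the model.
echo "Store this model iteration under: ${MODEL_DIR}/"
```

Using UTC avoids ambiguous names when models are exported from machines in different time zones.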

Using a Cloud Storage bucket from a different project

This section describes how to configure Cloud Storage buckets from outside of your project so that Cloud ML Engine can access them.

If you set up your Cloud Storage bucket in the same project where you are using Cloud ML Engine, your Cloud ML Engine service accounts already have the necessary permissions to access your Cloud Storage bucket.

These instructions apply in the following cases:

  • You cannot use a bucket from your own project, for example because a large dataset is shared across multiple projects.
  • You use multiple buckets with Cloud ML Engine. In that case, you must grant access to the Cloud ML Engine service accounts separately for each bucket.

Step 1: Get required information from your cloud project

Console

  1. Open the IAM page in the Google Cloud Platform Console.

  2. The IAM page displays a list of all members in your project with their associated role(s). Your Cloud ML Engine project has multiple service accounts. Locate the service account in the list that has the role Cloud ML Service Agent and copy that service account ID, which looks similar to this:

    "service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com".

    You need to paste this service account ID into a different page in the GCP Console during the next steps.

Command Line

The steps in this section retrieve information about your Google Cloud Platform project that you then use to change access control for your project's Cloud ML Engine service account. Store the values in environment variables for later use.

  1. Get your project identifier by using the gcloud command-line tool with your project selected:

    PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    
  2. Get the access token for your project by using gcloud:

    AUTH_TOKEN=$(gcloud auth print-access-token)
    
  3. Get the service account information by requesting project configuration from the REST service:

    # Request the project configuration and extract its serviceAccount field.
    SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" \
        -H "Authorization: Bearer $AUTH_TOKEN" \
        https://ml.googleapis.com/v1/projects/${PROJECT_ID}:getConfig \
        | python -c "import json; import sys; response = json.load(sys.stdin); \
        print(response['serviceAccount'])")
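
If the request fails (for example, because the access token has expired), `SVC_ACCOUNT` can end up empty, and later commands would then run with a malformed argument. A minimal sanity check is sketched below. The helper name `looks_like_service_account` is hypothetical, the pattern only checks the general shape of the ID, and the sample ID from the Console instructions stands in for a real value.

```shell
# Hypothetical helper: succeeds only if the value is non-empty and has the
# general shape of a service account ID (name@....gserviceaccount.com).
looks_like_service_account() {
  case "$1" in
    ?*@?*.gserviceaccount.com) return 0 ;;
    *)                         return 1 ;;
  esac
}

# Sample value for illustration; in practice, pass "$SVC_ACCOUNT" as set
# by the getConfig request.
CANDIDATE="service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com"

if looks_like_service_account "$CANDIDATE"; then
  echo "Service account: $CANDIDATE"
else
  echo "Could not determine the service account; check the request output" >&2
fi
```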
    

Step 2: Configure access to your Cloud Storage bucket

Console

  1. Open the Storage page in the GCP Console.

  2. Select the Cloud Storage bucket you use to deploy models by checking the box to the left of the bucket name.

  3. Click the Show Info Panel button in the upper right corner to display the Permissions tab.

  4. Paste the service account ID into the Add Members field. To the right of that field, select your desired role(s), such as Storage Legacy Bucket Reader.

    If you are not sure which role to select, you may select multiple roles to see them displayed below the Add Members field, each with a brief description of its permissions.

  5. To assign your desired role(s) to the service account, click the Add button to the right of the Add Members field.

Command Line

Now that you have your project and service account information, you need to update the access permissions for your Cloud Storage bucket. These steps use the same variable names used in the previous section.

  1. Set the name of your bucket in an environment variable named BUCKET_NAME:

    BUCKET_NAME="your_bucket_name"
    
  2. Grant the service account read access to your Cloud Storage bucket:

    gsutil -m defacl ch -u $SVC_ACCOUNT:R gs://$BUCKET_NAME
    
  3. If your bucket already contains objects that you need to access, you must grant read access to them explicitly:

    gsutil -m acl ch -u $SVC_ACCOUNT:R -r gs://$BUCKET_NAME
    
  4. Grant write access:

    gsutil -m acl ch -u $SVC_ACCOUNT:W gs://$BUCKET_NAME
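
Because access must be granted separately for each bucket, the commands above can be wrapped in a loop when several buckets are involved. The sketch below only prints the commands so that you can review them before running anything. The helper name `print_grant_commands` and the bucket names are hypothetical, and the service account is the sample ID from the Console instructions.

```shell
# Hypothetical helper: prints the three grant commands from the steps above
# for one bucket. Nothing is executed.
print_grant_commands() {
  account=$1
  bucket=$2
  echo "gsutil -m defacl ch -u ${account}:R gs://${bucket}"
  echo "gsutil -m acl ch -u ${account}:R -r gs://${bucket}"
  echo "gsutil -m acl ch -u ${account}:W gs://${bucket}"
}

# Sample value for illustration; in practice, use "$SVC_ACCOUNT".
SAMPLE_ACCOUNT="service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com"

# Placeholder bucket names; replace them with your own.
for bucket_name in shared-dataset-bucket model-output-bucket; do
  print_grant_commands "$SAMPLE_ACCOUNT" "$bucket_name"
done
```

If a bucket only needs to be read, you can drop the write-access line from the helper before running the printed commands.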
    

To choose a role to grant to your Cloud ML Engine service account, see the Cloud Storage IAM roles. For more general information about updating IAM roles in Cloud Storage, see how to grant access to a service account for a resource.
