Google Cloud Storage

Python Example

This tutorial shows you how to write a simple Python program that performs basic Google Cloud Storage operations using the XML API. This document assumes you are familiar with Python and the Google Cloud Storage concepts and operations presented in the Hello Google Cloud Storage! guide.

Setting up your environment

Before starting this tutorial, you must do the following:

  1. Install gsutil on your computer.
  2. Install the boto library and gcs-oauth2-boto-plugin.

    boto is an open source Python library that is used as an interface to Google Cloud Storage. gcs-oauth2-boto-plugin is an authentication plugin for the boto auth plugin framework. It provides OAuth 2.0 credentials that can be used with Google Cloud Storage.

    Setup for the boto library and the OAuth 2.0 plugin depends on the system you are using. Use the setup examples below as guidance. These commands install pip and then use pip to install the other packages. The final commands start a Python interpreter and import the two modules to verify the installation.

    Debian and Ubuntu

    $ wget https://bootstrap.pypa.io/get-pip.py
    $ sudo python get-pip.py
    $ sudo apt-get update
    $ sudo apt-get upgrade
    $ sudo apt-get install gcc python-dev python-setuptools libffi-dev
    $ sudo pip install virtualenv
    $ virtualenv venv
    $ source ./venv/bin/activate
    (venv)$ pip install gcs-oauth2-boto-plugin==1.8
    (venv)$ python
    >>> import boto
    >>> import gcs_oauth2_boto_plugin
    

    CentOS, RHEL, and Fedora

    $ wget https://bootstrap.pypa.io/get-pip.py
    $ sudo python get-pip.py
    $ sudo yum install gcc openssl-devel python-devel python-setuptools libffi-devel
    $ sudo pip install virtualenv
    $ virtualenv venv
    $ source ./venv/bin/activate
    (venv)$ pip install gcs-oauth2-boto-plugin==1.8
    (venv)$ python
    >>> import boto
    >>> import gcs_oauth2_boto_plugin
    
  3. Set up your boto configuration file to use OAuth 2.0.

    You can configure your boto configuration file to use service account or user account credentials. Service account credentials are the preferred type of credential to use when authenticating on behalf of a service or application. User account credentials are the preferred type of credentials for authenticating requests on behalf of a specific user (i.e., a human). For more information about these two credential types, see Supported Credential Types.

    Using service account credentials

    1. Use an existing service account or create a new one, and download the associated private key.

    2. Configure the .boto file with the service account. You can do this with gsutil:
      $ gsutil config -e

      The command will prompt you for the service account email address and the location of the service account private key (.p12). Be sure to have the private key on the computer where you are running the gsutil command.

    Using user account credentials

    1. If you don't already have a .boto file, create one. You can do this with gsutil:

      $ gsutil config
    2. Use an existing client ID for an application or create a new one.

    3. Edit the .boto file. In the [OAuth2] section, specify the client_id and client_secret values with the ones you generated.

    4. Run the gsutil config command again to generate a refresh token based on the client ID and secret you entered.

      If you get an error message indicating that the .boto file cannot be backed up, remove or rename the backup configuration file .boto.bak.

    5. Configure refresh token fallback logic.

      The gcs-oauth2-boto-plugin requires fallback logic for generating auth tokens when you are using application credentials. Fallback logic is not needed when you use a service account.

      You have the following options for enabling fallback:

      1. Set the client_id and the client_secret in the .boto config file. This is the recommended option, and it is required for using gsutil with your new .boto config file.
      2. Set environment variables OAUTH2_CLIENT_ID and OAUTH2_CLIENT_SECRET.
      3. Use the SetFallbackClientIdAndSecret function as shown in the examples below.
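
Option 2 can be checked from Python before any request is made. The sketch below is illustrative only: fallback_from_env is a hypothetical helper, not part of boto or gcs-oauth2-boto-plugin. It reads the two environment variables named above and returns them only if both are set.

```python
import os


def fallback_from_env(environ=None):
    """Return (client_id, client_secret) from the environment, or None.

    Hypothetical helper for illustration; not part of boto or
    gcs-oauth2-boto-plugin.
    """
    env = os.environ if environ is None else environ
    client_id = env.get('OAUTH2_CLIENT_ID')
    client_secret = env.get('OAUTH2_CLIENT_SECRET')
    # Treat a missing or empty value as "not configured".
    if client_id and client_secret:
        return client_id, client_secret
    return None
```

If this returns None and the .boto file has no client ID and secret either, the script needs to call SetFallbackClientIdAndSecret itself.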

Setting up your Python source file

To start this tutorial, use your favorite text editor to create a new Python file. Then, add the directives, import statements, configuration, and constant assignments shown below.

Note that the code here uses the SetFallbackClientIdAndSecret function as a fallback for generating refresh tokens. See Using user account credentials for other ways to specify a fallback. If you are using a service account to authenticate, you do not need to include the fallback logic.

#!/usr/bin/python

import boto
import gcs_oauth2_boto_plugin
import os
import shutil
import StringIO
import tempfile
import time

# URI scheme for Google Cloud Storage.
GOOGLE_STORAGE = 'gs'
# URI scheme for accessing local files.
LOCAL_FILE = 'file'

# Fallback logic. In https://console.developers.google.com
# under Credentials, create a new client ID for an installed application.
# Required only if you have not configured client ID/secret in
# the .boto file or as environment variables.
CLIENT_ID = 'your client id'
CLIENT_SECRET = 'your client secret'
gcs_oauth2_boto_plugin.SetFallbackClientIdAndSecret(CLIENT_ID, CLIENT_SECRET)

Creating buckets

This code creates two buckets. Because bucket names must be globally unique (see the naming guidelines), a timestamp is appended to each bucket name to help guarantee uniqueness.

If these bucket names are already in use, you'll need to modify the code to generate unique bucket names.

now = time.time()
CATS_BUCKET = 'cats-%d' % now
DOGS_BUCKET = 'dogs-%d' % now

# Your project ID can be found at https://console.developers.google.com/
# If there is no domain for your project, then project_id = 'YOUR_PROJECT'
project_id = 'YOUR_DOMAIN:YOUR_PROJECT'

for name in (CATS_BUCKET, DOGS_BUCKET):
  # Instantiate a BucketStorageUri object.
  uri = boto.storage_uri(name, GOOGLE_STORAGE)
  # Try to create the bucket.
  try:
    # If the default project is defined,
    # you do not need the headers.
    # Just call: uri.create_bucket()
    header_values = {"x-goog-project-id": project_id}
    uri.create_bucket(headers=header_values)

    print 'Successfully created bucket "%s"' % name
  except boto.exception.StorageCreateError as e:
    print 'Failed to create bucket:', e
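
If timestamp-based names still collide, a random suffix is one alternative. The following is a sketch, not part of the tutorial's code: unique_bucket_name and is_plausible_bucket_name are hypothetical helpers, and the pattern is a deliberately conservative check (3 to 63 characters; lowercase letters, digits, and dashes; beginning and ending with a letter or digit) that does not cover the full naming guidelines.

```python
import re
import uuid

# Conservative subset of the bucket naming rules (an assumption for this
# sketch): 3-63 characters, lowercase letters, digits, and dashes only,
# beginning and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r'^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$')


def is_plausible_bucket_name(name):
    """Return True if name fits the conservative pattern above."""
    return _BUCKET_NAME_RE.match(name) is not None


def unique_bucket_name(prefix):
    """Append 12 random hex characters to prefix, separated by a dash."""
    name = '%s-%s' % (prefix, uuid.uuid4().hex[:12])
    if not is_plausible_bucket_name(name):
        raise ValueError('Not a usable bucket name: %r' % name)
    return name
```

With these helpers, CATS_BUCKET could be assigned from unique_bucket_name('cats') instead of the timestamp expression.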

Listing buckets

To retrieve a list of all buckets, call storage_uri() to instantiate a BucketStorageUri object, specifying the empty string as the URI. Then, call the get_all_buckets() instance method.

uri = boto.storage_uri('', GOOGLE_STORAGE)
# If the default project is defined, call get_all_buckets() without arguments.
for bucket in uri.get_all_buckets(headers=header_values):
  print bucket.name

Uploading objects

To upload objects, create a file object (opened for read) that points to your local file and a storage URI object that points to the destination object on Google Cloud Storage. Call new_key() on the URI to get a key object, then call the key's set_contents_from_file() method, specifying the file handle as the argument.

# Make some temporary files.
temp_dir = tempfile.mkdtemp(prefix='googlestorage')
tempfiles = {
    'labrador.txt': 'Who wants to play fetch? Me!',
    'collie.txt': 'Timmy fell down the well!'}
for filename, contents in tempfiles.iteritems():
  with open(os.path.join(temp_dir, filename), 'w') as fh:
    fh.write(contents)

# Upload these files to DOGS_BUCKET.
for filename in tempfiles:
  with open(os.path.join(temp_dir, filename), 'r') as localfile:

    dst_uri = boto.storage_uri(
        DOGS_BUCKET + '/' + filename, GOOGLE_STORAGE)
    # The key-related functions are a consequence of boto's
    # interoperability with Amazon S3 (which employs the
    # concept of a key mapping to localfile).
    dst_uri.new_key().set_contents_from_file(localfile)
  print 'Successfully created "%s/%s"' % (
      dst_uri.bucket_name, dst_uri.object_name)

shutil.rmtree(temp_dir)  # Don't forget to clean up!
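
In the code above, the temporary directory is removed only if every upload succeeds. One way to make cleanup unconditional is a small context manager. This is a sketch using only the standard library; temp_directory is a hypothetical name, not a boto function.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temp_directory(prefix='googlestorage'):
    """Create a temporary directory and remove it on exit, even on error."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        shutil.rmtree(path)


# Example: write a file, then let the context manager clean up.
with temp_directory() as temp_dir:
    file_path = os.path.join(temp_dir, 'labrador.txt')
    with open(file_path, 'w') as fh:
        fh.write('Who wants to play fetch? Me!')
    existed_inside = os.path.exists(file_path)

removed_after = not os.path.exists(temp_dir)
```

The upload loop would go inside the with block, so the directory is deleted whether or not an upload raises an exception.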

Listing objects

To list all objects in a bucket, call storage_uri() and specify the bucket's URI and the Google Cloud Storage URI scheme as the arguments. Then, retrieve a list of objects using the get_bucket() instance method.

uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for obj in uri.get_bucket():
  print '%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name)
  print '  "%s"' % obj.get_contents_as_string()

Downloading and copying objects

This code reads objects in DOGS_BUCKET and copies them to both your home directory and CATS_BUCKET. It also demonstrates that you can use the boto library to operate against both local files and Google Cloud Storage objects using the same interface.

dest_dir = os.getenv('HOME')
for filename in ('collie.txt', 'labrador.txt'):
  src_uri = boto.storage_uri(
      DOGS_BUCKET + '/' + filename, GOOGLE_STORAGE)

  # Create a file-like object for holding the object contents.
  object_contents = StringIO.StringIO()

  # The unintuitively-named get_file() doesn't return the object
  # contents; instead, it actually writes the contents to
  # object_contents.
  src_uri.get_key().get_file(object_contents)

  local_dst_uri = boto.storage_uri(
      os.path.join(dest_dir, filename), LOCAL_FILE)

  bucket_dst_uri = boto.storage_uri(
      CATS_BUCKET + '/' + filename, GOOGLE_STORAGE)

  for dst_uri in (local_dst_uri, bucket_dst_uri):
    object_contents.seek(0)
    dst_uri.new_key().set_contents_from_file(object_contents)

  object_contents.close()
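
The seek(0) call matters because writing to a file-like object leaves its position at the end, so a later read from that position returns nothing. The sketch below shows this with io.BytesIO, used here as a stand-in for StringIO.StringIO so that the snippet also runs under Python 3; no boto calls are involved.

```python
import io

buf = io.BytesIO()
buf.write(b'Timmy fell down the well!')

# The position is now at the end, so reading yields no data.
at_end = buf.read()

# Rewind before each consumer reads the buffer.
buf.seek(0)
first_copy = buf.read()
buf.seek(0)
second_copy = buf.read()
buf.close()
```

Without the rewind inside the loop, the second destination would receive an empty object.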

Changing object ACLs

This code grants the specified Google account FULL_CONTROL permissions for labrador.txt. Remember to replace valid-email-address with a valid Google account email address.

uri = boto.storage_uri(DOGS_BUCKET + '/labrador.txt', GOOGLE_STORAGE)
print str(uri.get_acl())
uri.add_email_grant('FULL_CONTROL', 'valid-email-address')
print str(uri.get_acl())

Reading bucket and object metadata

This code retrieves and prints the metadata associated with a bucket and an object.

# Print ACL entries for DOGS_BUCKET.
bucket_uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
for entry in bucket_uri.get_bucket().get_acl().entries.entry_list:
  entry_id = entry.scope.id
  if not entry_id:
    entry_id = entry.scope.email_address
  print 'SCOPE: %s' % entry_id
  print 'PERMISSION: %s\n' % entry.permission

# Print object metadata and ACL entries.
object_uri = boto.storage_uri(DOGS_BUCKET + '/labrador.txt', GOOGLE_STORAGE)
key = object_uri.get_key()
print ' Object size:\t%s' % key.size
print ' Last mod:\t%s' % key.last_modified
print ' MIME type:\t%s' % key.content_type
print ' MD5:\t%s' % key.etag.strip('"\'') # Remove surrounding quotes
for entry in key.get_acl().entries.entry_list:
  entry_id = entry.scope.id
  if not entry_id:
    entry_id = entry.scope.email_address
  print 'SCOPE: %s' % entry_id
  print 'PERMISSION: %s\n' % entry.permission
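
The MD5 line above treats the ETag as a hex MD5 digest. Under that assumption (it may not hold for every object, for example composite objects), a downloaded copy can be compared against it. This is a sketch with hypothetical helper names; the etag argument stands for the value of key.etag.

```python
import hashlib


def md5_hex(data):
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def matches_etag(data, etag):
    """Compare data's MD5 to an ETag, ignoring surrounding quotes.

    Assumes the ETag holds a hex MD5 digest, as the tutorial code does.
    """
    return md5_hex(data) == etag.strip('"\'').lower()
```

For example, matches_etag(contents, key.etag) would return True when the bytes in contents match what is stored, provided the assumption above holds.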

Deleting objects and buckets

To conclude this tutorial, this code deletes the objects and buckets that you have created. A bucket must be empty before it can be deleted, so the code deletes each bucket's objects first.

for bucket in (CATS_BUCKET, DOGS_BUCKET):
  uri = boto.storage_uri(bucket, GOOGLE_STORAGE)
  for obj in uri.get_bucket():
    print 'Deleting object: %s...' % obj.name
    obj.delete()
  print 'Deleting bucket: %s...' % uri.bucket_name
  uri.delete_bucket()
