Python 2 is no longer supported by the community. We recommend that you migrate Python 2 apps to Python 3.

Reading and Writing to Cloud Storage

This document describes how to store and retrieve data from Cloud Storage in an App Engine app, using the App Engine client library for Cloud Storage. It assumes that you completed the tasks described in Setting Up for Cloud Storage to activate a Cloud Storage bucket and download the client libraries. It also assumes that you know how to build an App Engine application, as described in the Quickstart for Python 2 App Engine standard environment.

Required imports

The main.py file contains the typical imports used for accessing Cloud Storage using the client library:

import logging
import os
import cloudstorage as gcs
import webapp2

from google.appengine.api import app_identity

You need the os module and the app_identity API to get the default bucket name at runtime. Note that if you don't use the default bucket, you'll need some other way to supply the bucket name.

Specifying the Cloud Storage bucket

Before doing any operations in Cloud Storage, you need to supply the bucket name. The easiest way to do this is to use the default bucket for your project, which can be obtained as follows:

def get(self):
  bucket_name = os.environ.get('BUCKET_NAME',
                               app_identity.get_default_gcs_bucket_name())

  self.response.headers['Content-Type'] = 'text/plain'
  self.response.write('Demo GCS Application running from Version: '
                      + os.environ['CURRENT_VERSION_ID'] + '\n')
  self.response.write('Using bucket name: ' + bucket_name + '\n\n')

The call to get_default_gcs_bucket_name succeeds only if you have created the default bucket for your project.
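Once you have a bucket name, you combine it with an object path to form the filename string that the client library expects (a leading slash, then the bucket name, then the object path). The following is a minimal sketch in plain Python, with no App Engine services required; the helper name make_gcs_filename is hypothetical, not part of the library:

```python
def make_gcs_filename(bucket_name, object_path):
    """Build the /BUCKET_NAME/PATH_IN_GCS string that cloudstorage.open() expects."""
    # Strip any leading slash from the object path so we never emit '//'.
    return '/%s/%s' % (bucket_name, object_path.lstrip('/'))

filename = make_gcs_filename('my-default-bucket', 'demo-testfile')
print(filename)  # /my-default-bucket/demo-testfile
```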

Writing to Cloud Storage

The following sample shows how to write to the bucket:

def create_file(self, filename):
  """Create a file.

  The retry_params specified in the open call will override the default
  retry params for this particular file handle.

  Args:
    filename: filename.
  """
  self.response.write('Creating file %s\n' % filename)

  write_retry_params = gcs.RetryParams(backoff_factor=1.1)
  gcs_file = gcs.open(filename,
                      'w',
                      content_type='text/plain',
                      options={'x-goog-meta-foo': 'foo',
                               'x-goog-meta-bar': 'bar'},
                      retry_params=write_retry_params)
  gcs_file.write('abcde\n')
  gcs_file.write('f'*1024*4 + '\n')
  gcs_file.close()
  self.tmp_filenames_to_clean_up.append(filename)

Notice that in the call that opens the file for writing, the sample specifies certain Cloud Storage headers that write custom metadata for the file; you can retrieve this metadata later using cloudstorage.stat(). You can find the list of supported headers in the cloudstorage.open() reference.
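The sample also passes write_retry_params: a backoff_factor greater than 1 makes each retry wait longer than the previous one. The sketch below illustrates that growth in plain Python; the 0.1-second initial delay and the schedule itself are assumptions for illustration, not the library's internals:

```python
def backoff_delays(initial_delay, backoff_factor, attempts):
    """Return the successive retry delays for an exponential backoff schedule."""
    delay = initial_delay
    delays = []
    for _ in range(attempts):
        delays.append(round(delay, 4))  # round only for display
        delay *= backoff_factor
    return delays

# With backoff_factor=1.1 the delays grow slowly between attempts.
print(backoff_delays(0.1, 1.1, 4))  # [0.1, 0.11, 0.121, 0.1331]
```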

Notice also that the x-goog-acl header is not set. When this header is omitted, the bucket's default object ACL is applied to the object when it is written to the bucket.

Finally, notice the call to close() after you finish the write. If you don't close the file, it is not written to Cloud Storage. Be aware that after you call close(), you cannot append to the file. If you need to modify a file, you must open it again in write mode, which overwrites the file rather than appending to it.
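This overwrite-on-reopen behavior mirrors ordinary Python file semantics, which you can verify with a local file: reopening in 'w' mode truncates the existing contents rather than appending.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:
    f.write('first version\n')

# Reopening in 'w' mode truncates: the first write is gone.
with open(path, 'w') as f:
    f.write('second version\n')

with open(path) as f:
    contents = f.read()
print(contents)  # second version
```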

Reading from Cloud Storage

The following sample shows how to read a full file from the bucket:

def read_file(self, filename):
  self.response.write('Reading the full file contents:\n')

  gcs_file = gcs.open(filename)
  contents = gcs_file.read()
  gcs_file.close()
  self.response.write(contents)

To read part of the file, use seek() together with readline() and read():

def read_partial_file(self, filename):
  self.response.write('Abbreviated file content (first line and last 1K):\n')

  gcs_file = gcs.open(filename)
  self.response.write(gcs_file.readline())
  gcs_file.seek(-1024, os.SEEK_END)
  self.response.write(gcs_file.read())
  gcs_file.close()

In both examples, the filename argument that you pass to cloudstorage.open() is the path to your file in /YOUR_BUCKET_NAME/PATH_IN_GCS format, including the leading slash. Note that the default mode for cloudstorage.open() is read-only, so you do not need to specify a mode when opening a file to read it.
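The seek() pattern in read_partial_file works on any Python file-like object, so you can try it locally with an in-memory buffer: read the first line, then seek relative to the end to grab the tail.

```python
import io
import os

# 12-byte first line, 50 bytes of padding, then a 6-byte tail.
buf = io.BytesIO(b'header line\n' + b'x' * 50 + b'\ntail\n')

first_line = buf.readline()   # reads up to and including the first newline
buf.seek(-6, os.SEEK_END)     # position six bytes before the end of the buffer
tail = buf.read()

print(first_line)  # b'header line\n'
print(tail)        # b'\ntail\n'
```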

Listing bucket contents

The sample code shows how to page through a bucket that contains a large number of files, using the marker and max_keys parameters:

def list_bucket(self, bucket):
  """Create several files and paginate through them.

  Production apps should set page_size to a practical value.

  Args:
    bucket: bucket.
  """
  self.response.write('Listbucket result:\n')

  page_size = 1
  stats = gcs.listbucket(bucket + '/foo', max_keys=page_size)
  while True:
    count = 0
    for stat in stats:
      count += 1
      self.response.write(repr(stat))
      self.response.write('\n')

    if count != page_size or count == 0:
      break
    stats = gcs.listbucket(bucket + '/foo', max_keys=page_size,
                           marker=stat.filename)

Note that the complete file name is displayed as one string without directory delimiters. If you want to display the file with its more recognizable directory hierarchy, set the delimiter parameter to the directory delimiter you want to use.
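The marker-based paging loop above can be sketched against a plain sorted list of object names. Here list_keys is a hypothetical stand-in for gcs.listbucket: given a marker, it returns up to max_keys names that sort strictly after it, which is the contract the pagination loop relies on.

```python
OBJECTS = sorted('/demo/foo%d' % i for i in range(5))

def list_keys(marker=None, max_keys=2):
    """Stand-in for gcs.listbucket: up to max_keys names after the marker."""
    start = 0
    if marker is not None:
        # Skip everything up to and including the marker.
        start = next((i for i, name in enumerate(OBJECTS) if name > marker),
                     len(OBJECTS))
    return OBJECTS[start:start + max_keys]

page_size = 2
seen = []
names = list_keys(max_keys=page_size)
while names:
    seen.extend(names)
    # The last name on the page becomes the marker for the next request.
    names = list_keys(marker=names[-1], max_keys=page_size)

print(seen == OBJECTS)  # True
```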

Deleting files in Cloud Storage

The code below demonstrates how to delete a file from Cloud Storage using the delete() method of the cloudstorage client library (imported as gcs).

def delete_files(self):
  self.response.write('Deleting files...\n')
  for filename in self.tmp_filenames_to_clean_up:
    self.response.write('Deleting file %s\n' % filename)
    try:
      gcs.delete(filename)
    except gcs.NotFoundError:
      pass

This example cleans up the files that were written to the bucket in the Writing to Cloud Storage section.
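Swallowing NotFoundError makes the cleanup idempotent: deleting a file that is already gone is treated as success, so the handler can run safely more than once. The same pattern can be sketched with a dict standing in for the bucket and KeyError playing the role of gcs.NotFoundError:

```python
# A dict stands in for the bucket; KeyError stands in for gcs.NotFoundError.
bucket = {'/demo/a': 'data', '/demo/b': 'data'}

def delete(filename):
    try:
        del bucket[filename]
    except KeyError:
        pass  # already gone: treat as success, as the sample does

delete('/demo/a')
delete('/demo/a')  # the second delete is a harmless no-op
print(sorted(bucket))  # ['/demo/b']
```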

What's next