Scheduling an Export

This page describes how to schedule an automatic export of the entities that your application stores in Google Cloud Datastore.

The instructions on this page assume that your Google Cloud Platform project is already set up to use Cloud Datastore.

Recommendation

The recommended approach for scheduling Cloud Datastore exports is to use Google App Engine: deploy an App Engine service that handles requests from a cron job by issuing the export request. The deployed service runs using the identity of the App Engine default service account.

Before you begin

  1. Ensure that billing is enabled for your Cloud Platform project. Only Cloud Platform projects with billing enabled can use the export and import functionality. For information about billing, see Billing and Payments Support.

  2. If you haven't already, create a Cloud Storage bucket for your project. All exports and imports rely on Cloud Storage.

  3. Assign the Cloud Datastore Import Export Admin role to the App Engine default service account. This account is of the form YOUR_PROJECT_ID@appspot.gserviceaccount.com. You can use the gcloud command-line tool to assign the role:

    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member serviceAccount:YOUR_PROJECT_ID@appspot.gserviceaccount.com \
    --role roles/datastore.importExportAdmin
    

    Replace YOUR_PROJECT_ID with the ID of your Cloud Platform project.

    For more information and other options for assigning a role to a service account, see Granting roles to a service account for specific resources.

  4. Assign the Cloud Storage bucket write permission to the App Engine default service account. You can use the gsutil command-line tool to assign the permission:

    gsutil iam ch serviceAccount:YOUR_PROJECT_ID@appspot.gserviceaccount.com:objectCreator \
    gs://BUCKET_NAME
    

    Replace YOUR_PROJECT_ID with the ID of your Cloud Platform project, and replace BUCKET_NAME with the name of your Cloud Storage bucket.

    For more information and other options for assigning Cloud Storage bucket permissions, see Using IAM with buckets.

Application files

In a new folder on your development machine, create the following files that provide the code for a cron job:

  • app.yaml
  • cloud_datastore_admin.py
  • cron.yaml

Use the following code for the files.

app.yaml

runtime: python27
api_version: 1
threadsafe: true
service: cloud-datastore-admin

libraries:
- name: webapp2
  version: "latest"

handlers:
- url: /cloud-datastore-export
  script: cloud_datastore_admin.app
  login: admin

cloud_datastore_admin.py

import datetime
import httplib
import json
import logging
import webapp2

from google.appengine.api import app_identity
from google.appengine.api import urlfetch


class Export(webapp2.RequestHandler):

  def get(self):
    access_token, _ = app_identity.get_access_token(
        'https://www.googleapis.com/auth/datastore')
    app_id = app_identity.get_application_id()
    timestamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')

    output_url_prefix = self.request.get('output_url_prefix')
    assert output_url_prefix and output_url_prefix.startswith('gs://')
    if '/' not in output_url_prefix[5:]:
      # Only a bucket name has been provided - no prefix or trailing slash
      output_url_prefix += '/' + timestamp
    else:
      output_url_prefix += timestamp

    entity_filter = {
        'kinds': self.request.get_all('kind'),
        'namespace_ids': self.request.get_all('namespace_id')
    }
    request = {
        'project_id': app_id,
        'output_url_prefix': output_url_prefix,
        'entity_filter': entity_filter
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + access_token
    }
    url = 'https://datastore.googleapis.com/v1beta1/projects/%s:export' % app_id
    try:
      result = urlfetch.fetch(
          url=url,
          payload=json.dumps(request),
          method=urlfetch.POST,
          deadline=60,
          headers=headers)
      if result.status_code == httplib.OK:
        logging.info(result.content)
      elif result.status_code >= 500:
        logging.error(result.content)
      else:
        logging.warning(result.content)
      self.response.status_int = result.status_code
    except urlfetch.Error:
      logging.exception('Failed to initiate export.')
      self.response.status_int = httplib.INTERNAL_SERVER_ERROR


app = webapp2.WSGIApplication(
    [
        ('/cloud-datastore-export', Export),
    ], debug=True)
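The handler above appends a timestamp to output_url_prefix so that each export lands under its own Cloud Storage prefix. That logic can be tried in isolation; the following is a minimal sketch (Python 3 here for convenience, while the deployed service itself runs on Python 2.7; the function name is illustrative):

```python
from datetime import datetime


def build_output_url_prefix(output_url_prefix, now=None):
    """Mirror the handler's prefix handling: append a timestamp so each
    export gets its own Cloud Storage prefix. If only a bucket name was
    given (no '/' after 'gs://'), insert a slash before the timestamp;
    otherwise the caller is expected to supply a trailing slash."""
    assert output_url_prefix.startswith('gs://')
    now = now or datetime.now()
    timestamp = now.strftime('%Y%m%d-%H%M%S')
    if '/' not in output_url_prefix[5:]:
        # Only a bucket name was provided - no path prefix or trailing slash.
        return output_url_prefix + '/' + timestamp
    return output_url_prefix + timestamp


# Example: a bare bucket versus a bucket with a namespace path.
when = datetime(2018, 1, 2, 3, 4, 5)
print(build_output_url_prefix('gs://my-bucket', when))
# gs://my-bucket/20180102-030405
print(build_output_url_prefix('gs://my-bucket/exports/', when))
# gs://my-bucket/exports/20180102-030405
```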

cron.yaml

If you have a default App Engine service already deployed for your Cloud Platform project, you can use the following file.

cron:
- description: "Daily Cloud Datastore Export"
  url: /cloud-datastore-export?namespace_id=&output_url_prefix=gs://BUCKET_NAME[/NAMESPACE_PATH]
  target: cloud-datastore-admin
  schedule: every 24 hours

If you do not have a default App Engine service already deployed for your Cloud Platform project, use the same file but remove the target: cloud-datastore-admin line. The cron app will then be deployed as your default App Engine service.

In cron.yaml, replace BUCKET_NAME with the name of your Cloud Storage bucket. If you use the optional NAMESPACE_PATH, replace it with a Cloud Storage namespace path (this is not a Cloud Datastore namespace); otherwise omit it. For more information about Cloud Storage namespace paths, see Object name considerations.

The example cron.yaml sets the export to occur every 24 hours. For different schedule options, see Schedule format.
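For instance, to run the export once a day at a fixed time instead, the schedule line could read as follows (a sketch; the time shown is arbitrary, and Schedule format documents the full syntax):

```yaml
schedule: every day 03:00
```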

If you want to export entities of only specific kinds, modify the url value in cron.yaml to use a kind parameter. Similarly, if you want to export entities from only specific namespaces, modify the url value in cron.yaml to use a namespace_id parameter.

This example exports entities of kind Song:

url: /cloud-datastore-export?output_url_prefix=gs://BUCKET_NAME&kind=Song

This example exports entities of kind Song and kind Album:

url: /cloud-datastore-export?output_url_prefix=gs://BUCKET_NAME&kind=Song&kind=Album

This example exports entities of kind Song and kind Album only if they are in either the Classical namespace or the Pop namespace:

url: /cloud-datastore-export?output_url_prefix=gs://BUCKET_NAME&namespace_id=Classical&namespace_id=Pop&kind=Song&kind=Album
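The repeated kind and namespace_id parameters in these URLs arrive as multi-valued query arguments, which the handler reads with request.get_all to build the export request's entity_filter. A rough sketch of that mapping, using only the standard library rather than webapp2 (Python 3; the function name is illustrative):

```python
from urllib.parse import parse_qs, urlsplit


def entity_filter_from_url(url):
    """Build the export request's entity_filter from a cron URL's query
    string. Repeated parameters become lists; absent ones become empty
    lists, which means "all kinds" / "all namespaces" to the export."""
    query = parse_qs(urlsplit(url).query)
    return {
        'kinds': query.get('kind', []),
        'namespace_ids': query.get('namespace_id', []),
    }


url = ('/cloud-datastore-export?output_url_prefix=gs://BUCKET_NAME'
       '&namespace_id=Classical&namespace_id=Pop&kind=Song&kind=Album')
print(entity_filter_from_url(url))
# {'kinds': ['Song', 'Album'], 'namespace_ids': ['Classical', 'Pop']}
```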

Deploy the cron app

Run the following command in the same directory where you created the files:

gcloud app deploy app.yaml cron.yaml

(If needed, use the --project flag to set the active Cloud Platform project.)

For more information about deploying an app, see Deploying a Python App.

Test your cron app

You can test your deployed cron job by starting it in the Cron Jobs page of the Google Cloud Platform Console.

  1. Open the Cron Jobs page in the Cloud Platform Console.

  2. For the cron job with a description of Daily Cloud Datastore Export, click Run now.

  3. After the job completes, you can see the status message under Status. Click View to see the job log. The status message and job log will provide information on whether the job succeeded or if errors were encountered.

View your exports

After a cron job successfully completes, you can view the exports in your Cloud Storage bucket.

  1. Open the Cloud Storage browser in the Cloud Platform Console.

  2. In the list of buckets, click on the bucket that you created for your exports.

  3. Verify exports are listed in the bucket.
