Scheduling an Export

This page describes how to schedule automatic exports of your Cloud Firestore in Datastore mode entities.

To run exports on a schedule, we recommend following the steps on this page to deploy an App Engine service that uses the Datastore mode managed export feature to start export operations. Once deployed, you can run this service on a schedule with the App Engine Cron Service.

Before you begin

Before you can schedule data exports with App Engine and the managed export feature, you must complete the following tasks:

  1. Enable billing for your GCP project. Only GCP projects with billing enabled can use the export and import feature.

  2. Create a Cloud Storage bucket for your project. All exports and imports rely on Cloud Storage, and your bucket must be in the same location as your Datastore mode database. To find your database location, see Viewing the location of your project. Example commands for these steps appear after this list.

  3. Install the Google Cloud SDK to deploy the application.
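
If you prefer the command line, you can check billing and create the bucket with the Cloud SDK. This is a minimal sketch: YOUR_PROJECT_ID, EXPORT_LOCATION, and BUCKET_NAME are placeholders, and the billing check requires the gcloud beta component:

    # Confirm that billing is enabled for the project.
    gcloud beta billing projects describe YOUR_PROJECT_ID

    # Create the export bucket in the same location as your database.
    gsutil mb -l EXPORT_LOCATION gs://BUCKET_NAME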

Setting up scheduled exports

After completing the requirements above, set up scheduled exports by completing the procedures in this section.

Configuring access permissions

This app uses the App Engine default service account to authenticate and authorize its Datastore mode export requests. When you create a project, App Engine creates a default service account for you with the following format:

YOUR_PROJECT_ID@appspot.gserviceaccount.com
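
To confirm that the default service account exists, you can list your project's service accounts:

    gcloud iam service-accounts list --project YOUR_PROJECT_ID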

The service account requires permission to start Datastore mode export operations and to write to your Cloud Storage bucket. To grant these permissions, assign the following IAM roles to the default service account:

  • Datastore mode Import Export Admin
  • Storage Admin on your Cloud Storage bucket

You can use the gcloud and gsutil command-line tools from the Cloud SDK to assign these roles:

  1. Use the gcloud command-line tool to assign the Datastore mode Import Export Admin role:

    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
        --member serviceAccount:YOUR_PROJECT_ID@appspot.gserviceaccount.com \
        --role roles/datastore.importExportAdmin
    

    Alternatively, you can assign this role using the GCP Console.

  2. Use the gsutil command-line tool to assign the Storage Admin role on your bucket:

    gsutil iam ch serviceAccount:YOUR_PROJECT_ID@appspot.gserviceaccount.com:roles/storage.admin \
        gs://BUCKET_NAME
    

    Alternatively, you can assign this role using the GCP Console.
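
To verify that both bindings took effect, you can read back the IAM policies. The gcloud flags below follow a common filtering pattern and are shown as a sketch:

    # List project-level roles granted to the default service account.
    gcloud projects get-iam-policy YOUR_PROJECT_ID \
        --flatten="bindings[].members" \
        --format="table(bindings.role)" \
        --filter="bindings.members:YOUR_PROJECT_ID@appspot.gserviceaccount.com"

    # Show the bucket policy, which should include roles/storage.admin.
    gsutil iam get gs://BUCKET_NAME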

Deploying the app

Deploy the following sample app in either Python or Java:

Python

Create the app files

In a new folder on your development machine, create the following files that provide the code for an App Engine app:

  • app.yaml
  • cloud_datastore_admin.py

Use the following code for the files.

app.yaml

runtime: python27
api_version: 1
threadsafe: true
service: cloud-datastore-admin

libraries:
- name: webapp2
  version: "latest"

handlers:
- url: /cloud-datastore-export
  script: cloud_datastore_admin.app
  login: admin

cloud_datastore_admin.py

import datetime
import httplib
import json
import logging
import webapp2

from google.appengine.api import app_identity
from google.appengine.api import urlfetch


class Export(webapp2.RequestHandler):

  def get(self):
    access_token, _ = app_identity.get_access_token(
        'https://www.googleapis.com/auth/datastore')
    app_id = app_identity.get_application_id()
    timestamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')

    output_url_prefix = self.request.get('output_url_prefix')
    assert output_url_prefix and output_url_prefix.startswith('gs://')
    if '/' not in output_url_prefix[5:]:
      # Only a bucket name has been provided - no prefix or trailing slash
      output_url_prefix += '/' + timestamp
    else:
      output_url_prefix += timestamp

    entity_filter = {
        'kinds': self.request.get_all('kind'),
        'namespace_ids': self.request.get_all('namespace_id')
    }
    request = {
        'project_id': app_id,
        'output_url_prefix': output_url_prefix,
        'entity_filter': entity_filter
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer ' + access_token
    }
    url = 'https://datastore.googleapis.com/v1/projects/%s:export' % app_id
    try:
      result = urlfetch.fetch(
          url=url,
          payload=json.dumps(request),
          method=urlfetch.POST,
          deadline=60,
          headers=headers)
      if result.status_code == httplib.OK:
        logging.info(result.content)
      elif result.status_code >= 500:
        logging.error(result.content)
      else:
        logging.warning(result.content)
      self.response.status_int = result.status_code
    except urlfetch.Error:
      logging.exception('Failed to initiate export.')
      self.response.status_int = httplib.INTERNAL_SERVER_ERROR


app = webapp2.WSGIApplication(
    [
        ('/cloud-datastore-export', Export),
    ], debug=True)
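
Optionally, you can smoke-test the handler locally with the development server that ships with the Cloud SDK (this assumes the app-engine-python component is installed; by default the local server issues placeholder access tokens, so it won't start a real export):

    dev_appserver.py app.yaml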

Deploy the app

  1. Make sure gcloud is configured for the correct project:

    gcloud config set project YOUR_PROJECT_ID
    
  2. From the same directory as your app.yaml file, deploy the app to your project:

    gcloud app deploy
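
After the deploy completes, you can confirm that the service is serving:

    # The list should include the cloud-datastore-admin service.
    gcloud app services list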
    

Java

The following sample app assumes you have set up Maven with the App Engine Maven plugin.

Download the app

Download the java-docs-samples repository and navigate to the datastore-schedule-export app directory:

  1. Clone the sample app repository to your local machine:

    git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git
    

    Alternatively, download the sample as a zip file and extract it.

  2. Navigate to the directory that contains the sample code:

    cd java-docs-samples/appengine-java8/datastore-schedule-export/
    

The app sets up a servlet in the DatastoreExportServlet.java file.

Deploy the app

  1. Make sure gcloud is configured for the correct project:

    gcloud config set project YOUR_PROJECT_ID
    
  2. Deploy the app to your project:

    mvn appengine:deploy
    

The service receives export requests at YOUR_PROJECT_ID.appspot.com/cloud-datastore-export and sends an authenticated request to the Datastore mode Admin API to begin the export.

The service uses the following URL parameters to configure the export request:

  • output_url_prefix (required): specifies where to save your Datastore mode export. The app appends a timestamp to this value; pass either a bare bucket name or a prefix ending in /, so that the timestamp becomes its own path segment (see the expansion examples after this list).
  • kind (optional, multiple): restricts export to only these kinds.
  • namespace_id (optional, multiple): restricts export to only these namespaces.
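
For example, with an illustrative timestamp of 20190821-143000, the app expands output_url_prefix as follows before calling the Admin API:

    gs://BUCKET_NAME           ->  gs://BUCKET_NAME/20190821-143000
    gs://BUCKET_NAME/exports/  ->  gs://BUCKET_NAME/exports/20190821-143000
    gs://BUCKET_NAME/exports   ->  gs://BUCKET_NAME/exports20190821-143000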

Deploying the cron job

To set up a cron job that calls the app's /cloud-datastore-export handler, create and deploy a cron.yaml file.

  1. Create a cron.yaml file with the following content:

      cron:
      - description: "Daily Cloud Datastore Export"
        url: /cloud-datastore-export?output_url_prefix=gs://BUCKET_NAME
        target: cloud-datastore-admin
        schedule: every 24 hours

    Replace BUCKET_NAME with the name of your Cloud Storage bucket. The target value must match the service name in your app.yaml; the Python sample above deploys as cloud-datastore-admin, so change it if your service name differs.

    The example cron.yaml starts an export of every entity once every 24 hours. For more scheduling options, see Schedule format; a few alternative schedule values are shown after these steps.

    To export entities of only specific kinds, add kind parameters to the url value. Similarly, add namespace_id parameters to export entities from specific namespaces. For example:

    Export entities of kind Song:

      url: /cloud-datastore-export?output_url_prefix=gs://BUCKET_NAME&kind=Song
    

    Export entities of kind Song and kind Album:

      url: /cloud-datastore-export?output_url_prefix=gs://BUCKET_NAME&kind=Song&kind=Album
    

    Export entities of kind Song and kind Album if they are in either the Classical namespace or the Pop namespace:

      url: /cloud-datastore-export?output_url_prefix=gs://BUCKET_NAME&namespace_id=Classical&namespace_id=Pop&kind=Song&kind=Album
    
  2. Deploy the cron job by running the following command in the same directory as your cron.yaml file:

      gcloud app deploy cron.yaml
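
The schedule field accepts the full App Engine cron schedule syntax. A few alternative values, shown here as examples:

      schedule: every 12 hours
      schedule: every day 03:00
      schedule: every monday 09:00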
    

Testing your cron app

You can test your deployed cron job by triggering it manually from the Cron Jobs page of the Google Cloud Platform Console:

  1. Open the Cron Jobs page in the GCP Console.

  2. For the cron job with a description of Daily Cloud Datastore Export, click Run now.

  3. After the job completes, you can see the status message under Status. Click View to see the job log. The status message and job log provide information on whether the job succeeded or if it encountered errors.
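
You can also follow the service's request logs from the command line while the job runs (this assumes the Python sample's service name; adjust it if yours differs):

    gcloud app logs tail --service=cloud-datastore-admin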

Viewing your exports

After a cron job successfully completes, you can view the exports in your Cloud Storage bucket:

  1. Open the Cloud Storage browser in the GCP Console.

  2. In the list of buckets, click on the bucket that you created for your exports.

  3. Verify exports are listed in the bucket.
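
You can also list the export output from the command line. Each export lands under its timestamped prefix and includes a file ending in .overall_export_metadata; the timestamp below is illustrative:

    gsutil ls gs://BUCKET_NAME
    gsutil ls gs://BUCKET_NAME/20190821-143000/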
