Exporting and Importing Entities

This page describes how to export and import the entities that your application stores in Google Cloud Datastore.

Before you begin

  1. Ensure that you are using a billable account for your Google Cloud Platform project. Only Cloud Platform projects with billable accounts can use the export and import functionality. For information about billing, see Billing and Payments Support.

  2. If you haven't already, create a Google Cloud Storage bucket for your project in the same location as your Cloud Datastore data. All exports and imports rely on Cloud Storage.

  3. Assign the datastore.databases.export or datastore.databases.import permission, as appropriate, to the account that you will use for the export or import. Alternatively, assign that account the Cloud Datastore Import Export Admin role, which includes the datastore.databases.export and datastore.databases.import permissions, as well as several others.

  4. Assign the appropriate Cloud Storage bucket read or write permission to the account that you will use for the export or import. A sketch of steps 3 and 4 follows this list.
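
As an illustration, the following commands cover steps 3 and 4 above. This is a minimal sketch, not the only option: the member address is a placeholder, and the legacy Bucket Writer role is one bucket-level role that covers the export permissions listed later on this page.

# Grant the Cloud Datastore Import Export Admin role on the project.
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="user:someone@example.com" \
    --role="roles/datastore.importExportAdmin"

# Grant bucket-level read/write access for exports.
gsutil iam ch user:someone@example.com:roles/storage.legacyBucketWriter gs://YOUR_BUCKET_NAME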

Set up your environment

Set up environment variables to simplify running the commands, and authenticate using the gcloud command-line tool.

  1. Set an environment variable for your Cloud Platform project ID.

    PROJECT_ID="YOUR_PROJECT_ID"
    
  2. Use this variable to set your project as the active gcloud configuration.

    gcloud config set project ${PROJECT_ID}
    
  3. Authenticate using the gcloud command-line tool.

    gcloud auth login
    
  4. Set an environment variable for your Cloud Storage bucket ID.

    BUCKET="YOUR_BUCKET_NAME[/NAMESPACE_PATH]"
    

    where YOUR_BUCKET_NAME is the name of the Cloud Storage bucket and NAMESPACE_PATH is an optional Cloud Storage namespace path (this is not a Cloud Datastore namespace). For more information about Cloud Storage namespace paths, see Object name considerations. A quick verification sketch follows.
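
Before running an export, you can sanity-check this setup. A minimal sketch; the bucket check assumes BUCKET contains only a bucket name, with no namespace path appended.

# Confirm the active project.
gcloud config list project

# Confirm the bucket exists and is reachable.
gsutil ls -b gs://${BUCKET}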

Exporting entities

To export all kinds in the default namespace, run the following:

gcloud

gcloud beta datastore export --namespaces="(default)" gs://${BUCKET}

Protocol

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://datastore.googleapis.com/v1beta1/projects/${PROJECT_ID}:export \
-d '{
  "outputUrlPrefix": "gs://'${BUCKET}'",
  "entityFilter": {
    "namespaceIds": [""]
  }
}'

To export a specific subset of kinds and/or namespaces, provide an entity filter with values for kinds and namespace IDs.

gcloud

gcloud beta datastore export --kinds="KIND1,KIND2" --namespaces="NAMESPACE1,NAMESPACE2" gs://${BUCKET}

Protocol

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://datastore.googleapis.com/v1beta1/projects/${PROJECT_ID}:export \
-d '{
  "outputUrlPrefix": "gs://'${BUCKET}'",
  "entityFilter": {
    "kinds": ["KIND1", "KIND2", …],
    "namespaceIds": ["NAMESPACE1", "NAMESPACE2", …]
  }
}'

A recommended practice before you export entities is to disable Cloud Datastore writes. After the export is complete, re-enable Cloud Datastore writes for your application.

Importing entities

To import entities:

gcloud

gcloud beta datastore import gs://${BUCKET}/[PATH]/[FILE].overall_export_metadata

Protocol

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://datastore.googleapis.com/v1beta1/projects/${PROJECT_ID}:import \
-d '{
  "inputUrl": "gs://'${BUCKET}'/[PATH]/[FILE].overall_export_metadata"
}'

You can determine the value to use for the import location by viewing the bucket in the Cloud Storage UI in the Google Cloud Platform Console, or by examining the output of gcloud beta datastore export or the ExportEntitiesResponse after your export is complete. Here's an example value of an import location:

gcloud

gs://${BUCKET}/2017-05-25T23:54:39_76544/2017-05-25T23:54:39_76544.overall_export_metadata

Protocol

"outputUrl": "gs://'${BUCKET}'/2017-05-25T23:54:39_76544/2017-05-25T23:54:39_76544.overall_export_metadata",

A recommended practice before you import entities is to disable Cloud Datastore writes. After the import is complete, re-enable Cloud Datastore writes for your application.

Managing long-running operations

Long-running operations are method calls that may take a substantial amount of time to complete. Cloud Datastore creates long-running operations when you export or import data.

For example, when you start an export, the Cloud Datastore service creates a long-running operation to track the export status. Here's the output from the start of an export:

{
  "name": "projects/[YOUR_PROJECT_ID]/operations/ASAyMDAwOTEzBxp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKKhI",
  "metadata": {
    "@type": "type.googleapis.com/google.datastore.admin.v1beta1.ExportEntitiesMetadata",
    "common": {
      "startTime": "2017-05-25T23:54:39.583780Z",
      "operationType": "EXPORT_ENTITIES"
    },
    "progressEntities": {},
    "progressBytes": {},
    "entityFilter": {
      "namespaceIds": [
        ""
      ]
    },
    "outputUrlPrefix": "gs://[YOUR_BUCKET]"
  }
}

The value of the name field is the ID of a long-running operation.

Cloud Datastore provides operations APIs that allow you to check the status of long-running operations, as well as cancel, delete, or list long-running operations:

Method                        Description
projects.operations.cancel    Cancel a long-running operation.
projects.operations.delete    Delete a long-running operation.
                              Note: Deleting an operation does not cancel it.
projects.operations.get       Get the status of a long-running operation.
projects.operations.list      List long-running operations.
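
For example, to get or cancel a specific operation over REST (a sketch; [OPERATION_NAME] stands for the full name value from the operation output, such as projects/[YOUR_PROJECT_ID]/operations/...):

# Get the status of a long-running operation.
curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datastore.googleapis.com/v1/[OPERATION_NAME]

# Cancel a long-running operation.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datastore.googleapis.com/v1/[OPERATION_NAME]:cancel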

To list long-running operations, run the following:

gcloud

gcloud beta datastore operations list

Protocol

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datastore.googleapis.com/v1/projects/${PROJECT_ID}/operations

This example output shows a recently completed export operation (note that completed operations remain available for listing only for a few days):

{
  "operations": [
    {
      "name": "projects/[YOUR_PROJECT_ID]/operations/ASAyMDAwOTEzBxp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKKhI",
      "metadata": {
        "@type": "type.googleapis.com/google.datastore.admin.v1beta1.ExportEntitiesMetadata",
        "common": {
          "startTime": "2017-05-25T23:54:39.583780Z",
          "endTime": "2017-05-25T23:54:58.474750Z",
          "operationType": "EXPORT_ENTITIES"
        },
        "progressEntities": {
          "workCompleted": "3",
          "workEstimated": "1"
        },
        "progressBytes": {
          "workCompleted": "257",
          "workEstimated": "85"
        },
        "entityFilter": {
          "namespaceIds": [
            ""
          ]
        },
        "outputUrlPrefix": "gs://[YOUR_BUCKET]"
      },
      "done": true,
      "response": {
        "@type": "type.googleapis.com/google.datastore.admin.v1beta1.ExportEntitiesResponse",
        "outputUrl": "gs://[YOUR_BUCKET]/2017-05-25T23:54:39_76544/2017-05-25T23:54:39_76544.overall_export_metadata"
      }
    }
  ]
}

Use the outputUrl value as the inputUrl when you import the entities.

Asynchronous exports or imports

Exports and imports can take a long time. When you perform an export or import, you can provide the --async flag to prevent gcloud from waiting for the operation to complete.
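
For example, the following starts an export without waiting for it to finish (assuming the BUCKET variable set earlier on this page):

gcloud beta datastore export --async gs://${BUCKET}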

Later, you can check the status of the operation by using the name returned from the asynchronous call. For example:

gcloud beta datastore operations describe ASAyMDAwOTEzBxp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKKhI

If you forget the --async flag, you can also press Ctrl+C to stop waiting on an operation. Pressing Ctrl+C does not cancel the operation.

Export and import considerations

The export and import features help you recover from accidental deletion of data and enable you to export data for offline processing. You can export all entities or just specific kinds of entities. Likewise, you can import all data from an export or only specific kinds.

The export service uses eventually consistent reads. You cannot assume an export happens at a single point in time. The export might include entities written after the export begins and exclude entities written before the export begins.

An export does not contain any indexes. When you import, the required indexes are automatically rebuilt using the index definitions currently defined for your application. Per-entity property value index settings are exported and honored during import.

Imports do not assign new IDs to entities. Imports use the IDs that existed at the time of the export and overwrite any existing entity with the same ID. During an import, the IDs are reserved as the entities are being imported, preventing ID collisions with new entities if writes are enabled while an import is running. Entities not affected by an import are retained.

Data exported from one application can be imported into another application.

Additional information

How much will I be charged?

As noted in Before you begin, a billable account is a requirement for export and import operations. Export and import operations are charged for entity reads and writes at the rates listed in Cloud Datastore pricing.

How do I fix permission errors?

As described in Before you begin, the account making the request must have the datastore.databases.export or datastore.databases.import permission, as appropriate for the request. These permissions can be granted by assigning the Cloud Datastore Import Export Admin role to the account performing the operation. If you issue REST requests with curl from the command line, the account issuing the requests must be granted this role.

If you use the example cron app, requests use the default service account of the Cloud Platform project. You must grant the service account the Cloud Datastore Import Export Admin role or another role that grants the datastore.databases.export permission.

Additionally, for all export requests, both the account making the request and the default service account for the Cloud Platform project must have the following permissions granted on the bucket you export into:

Permission name          Description
storage.buckets.get      Read bucket metadata, excluding IAM policies.
storage.objects.create   Add new objects to a bucket.
storage.objects.list     List objects in a bucket. Also read object metadata, excluding ACLs, when listing.

For import requests, the account making the request and the default service account for the Cloud Platform project require these permissions on the bucket being read from:

Permission name        Description
storage.objects.get    Read object data and metadata, excluding ACLs.
storage.objects.list   List objects in a bucket. Also read object metadata, excluding ACLs, when listing.

The Storage Object Viewer role grants all the required permissions for import.
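
For example, the following grants that role on the bucket to the project's default App Engine service account (a sketch; the service account address follows the usual [YOUR_PROJECT_ID]@appspot.gserviceaccount.com pattern and may differ in your project):

gsutil iam ch serviceAccount:[YOUR_PROJECT_ID]@appspot.gserviceaccount.com:roles/storage.objectViewer gs://[YOUR_BUCKET]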

How do I import my data into BigQuery?

The exported kind metadata files have a name with the pattern:

gs://[YOUR_BUCKET]/[PATH]/[NAMESPACE]/[KIND]/[NAMESPACE]_[KIND].export_metadata

These files can be imported into BigQuery using the same steps documented for the .backup_info files.
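
For example, a single kind could be loaded with the bq command-line tool. A sketch, in which the dataset and table names are hypothetical; BigQuery's DATASTORE_BACKUP source format is used for these files as well:

bq load --source_format=DATASTORE_BACKUP [DATASET].[TABLE] \
gs://[YOUR_BUCKET]/[PATH]/[NAMESPACE]/[KIND]/[NAMESPACE]_[KIND].export_metadata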

Is there a UI for the managed export and import service?

Not at this time.

Are backups created by an export displayed in the Datastore Admin console?

No. This is a new and completely managed service that does not share data with the generally available console-based backup and restore functionality that is part of the Google App Engine built-in service.

Does the managed service support the same metadata as the Datastore Admin backup?

No. The new service does not store any state in your application's data store. Current export/import operation state can be accessed with the GetOperation and ListOperations methods.

Is there a way to check the logs of managed export/import operations?

As this is a managed service, the export/import operations no longer run within your application, so the logs are not directly accessible.

What is workEstimated, and why is it different from workCompleted?

workEstimated (either bytes or entities) is the estimated number of bytes or entities that will be exported or imported, based on the Cloud Datastore Statistics collection run in the past 24-48 hours.

When you query an export or import operation while it is still in progress, workCompleted is the number of bytes or entities read or written up to that point in time. Once the operation has completed, workCompleted reflects the total number of bytes or entities read or written. Dividing workCompleted by workEstimated gives a rough progress estimate, with the caveat that the estimate may be inaccurate due to the delay between the most recent statistics collection and the dataset being operated on.
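
For example, in the completed export shown earlier, progressBytes reports a workCompleted of 257 against a workEstimated of 85, a ratio well above 1.0, which illustrates how stale statistics can skew the estimate.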
