Exporting and Importing Entities

This page describes how to export and import Google Cloud Datastore entities using the managed export and import service. The managed export and import service is available through the gcloud command-line tool and the Cloud Datastore Admin API (REST, RPC).

With the managed export and import service, you can recover from accidental deletion of data and export data for offline processing. You can export all entities or just specific kinds of entities. Likewise, you can import all data from an export or only specific kinds. As you use the managed export and import service, consider the following:

  • The export service uses eventually consistent reads. You cannot assume an export happens at a single point in time. The export might include entities written after the export begins and exclude entities written before the export begins.

  • An export does not contain any indexes. When you import data, the required indexes are automatically rebuilt using your database's current index definitions. Per-entity property value index settings are exported and honored during import.

  • Imports do not assign new IDs to entities. Imports use the IDs that existed at the time of the export and overwrite any existing entity with the same ID. During an import, the IDs are reserved while the entities are being imported, which prevents ID collisions with new entities if writes are enabled while an import is running.

  • If an entity in your database is not affected by an import, it will remain in your database after the import.

  • Data exported from one Cloud Datastore database can be imported into another Cloud Datastore database.

  • The managed export and import service limits the number of concurrent exports and imports to 50 and allows a maximum of 20 export and import requests per minute for a project.

Before you begin

Before you can use the managed export and import service, you must complete the following tasks.

  1. Ensure that billing is enabled for your Google Cloud Platform project. Only GCP projects with billing enabled can use the export and import functionality. For more on billing, see Billing and pricing for imports and exports.

  2. Create a Cloud Storage bucket for your project in the same location as your Cloud Datastore database. All exports and imports rely on Cloud Storage, and your Cloud Storage bucket must use the same location as Cloud Datastore.

  3. Assign an IAM role to your user account that grants the datastore.databases.export permission, if you are exporting data, or the datastore.databases.import permission, if you are importing data. The Cloud Datastore Import Export Admin role, for example, grants both permissions.

  4. Assign a Cloud Storage IAM role to your user account that grants read or write permissions for your Cloud Storage bucket (write permissions for exports, read permissions for imports). The sketch after this list shows example bucket and role setup commands.
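
For example, here is a minimal sketch of steps 2 through 4 using the gcloud and gsutil tools. The email address, bucket name, and location are placeholders you must replace; the roles shown (roles/datastore.importExportAdmin and roles/storage.admin) are broad roles that cover the required permissions, and narrower roles also work:

# Create a bucket in the same location as your Cloud Datastore database.
gsutil mb -l [YOUR_LOCATION] gs://[YOUR_BUCKET_NAME]

# Grant your user account permission to start exports and imports.
gcloud projects add-iam-policy-binding [YOUR_PROJECT_ID] \
  --member=user:[YOUR_EMAIL] \
  --role=roles/datastore.importExportAdmin

# Grant your user account read and write access to the bucket.
gsutil iam ch user:[YOUR_EMAIL]:roles/storage.admin gs://[YOUR_BUCKET_NAME]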

Set up your environment

Before you export or import data, you must set up environment variables for the gcloud tool and authenticate using your user account.

  1. Set an environment variable for your GCP project ID.

    PROJECT_ID="YOUR_PROJECT_ID"
    
  2. Use this variable to set your project as the active configuration for the gcloud tool.

    gcloud config set project ${PROJECT_ID}
    
  3. Authenticate using the gcloud tool.

    gcloud auth login
    
  4. Set an environment variable for your Cloud Storage bucket ID.

    BUCKET="YOUR_BUCKET_NAME[/NAMESPACE_PATH]"
    

    where YOUR_BUCKET_NAME is the name of the Cloud Storage bucket and NAMESPACE_PATH is an optional Cloud Storage namespace path (this is not a Cloud Datastore namespace). For more information about Cloud Storage namespace paths, see Object name considerations.

Starting managed export and import operations

This section describes how to start a managed export or import operation and how to check on its progress.

Before you export or import entities, we recommend you disable Cloud Datastore writes. After the export or import completes, re-enable Cloud Datastore writes for your application.

Exporting entities

Use the command below to export all kinds in the default namespace. You can add the --async flag to prevent the gcloud tool from waiting for the operation to complete.

gcloud

gcloud datastore export --namespaces="(default)" gs://${BUCKET}

Protocol

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://datastore.googleapis.com/v1/projects/${PROJECT_ID}:export \
-d '{
  "outputUrlPrefix": "gs://'${BUCKET}'",
  "entityFilter": {
    "namespaceIds": [""]
  }
}'
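
For example, to start the export and return immediately instead of waiting for it to finish, add the --async flag to the gcloud command:

gcloud datastore export --namespaces="(default)" gs://${BUCKET} --async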

To export only specific kinds, specific namespaces, or both, provide an entity filter with values for kinds and namespace IDs.

gcloud

gcloud datastore export --kinds="KIND1,KIND2" --namespaces="NAMESPACE1,NAMESPACE2" gs://${BUCKET}

Protocol

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://datastore.googleapis.com/v1/projects/${PROJECT_ID}:export \
-d '{
  "outputUrlPrefix": "gs://'${BUCKET}'",
  "entityFilter": {
    "kinds": ["KIND1", "KIND2", …],
    "namespaceIds": ["NAMESPACE1", "NAMESPACE2", …]
  }
}'

Importing entities

Use the command below to import entities previously exported with the managed export and import service. You can add the --async flag to prevent the gcloud tool from waiting for the operation to complete.

gcloud

gcloud datastore import gs://${BUCKET}/[PATH]/[FILE].overall_export_metadata

Protocol

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://datastore.googleapis.com/v1/projects/${PROJECT_ID}:import \
-d '{
  "inputUrl": "gs://'${BUCKET}'/[PATH]/[FILE].overall_export_metadata"
}'
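
For example, to start the import and return immediately instead of waiting for it to finish, add the --async flag:

gcloud datastore import gs://${BUCKET}/[PATH]/[FILE].overall_export_metadata --async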

You can determine the value to use for the import location by using the Cloud Storage UI in the Google Cloud Platform Console to view the bucket, or by examining the gcloud datastore export output or ExportEntitiesResponse after your export is complete. Here's an example value of an import location:

gcloud

gs://${BUCKET}/2017-05-25T23:54:39_76544/2017-05-25T23:54:39_76544.overall_export_metadata

Protocol

"outputUrl": "gs://'${BUCKET}'/2017-05-25T23:54:39_76544/2017-05-25T23:54:39_76544.overall_export_metadata",

Asynchronous exports or imports

Exports and imports can take a long time. When you perform an export or import, you can provide the --async flag to prevent the gcloud tool from waiting for the operation to complete.

After you start an export or import operation, you can use the identifier returned by the gcloud tool to check the status of the operation. For example:

gcloud datastore operations describe ASAyMDAwOTEzBxp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKKhI

If you forget the --async flag, you can type Ctrl+c to stop waiting for the operation. Typing Ctrl+c does not cancel the operation.
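
As a sketch of a complete asynchronous workflow, you can capture the operation identifier when you start the export and use it later. This assumes the export command prints an operation resource with a name field, which the standard gcloud --format flag can extract:

# Start the export asynchronously and capture the operation name.
OPERATION=$(gcloud datastore export --namespaces="(default)" gs://${BUCKET} \
  --async --format="value(name)")

# Later, check on the operation's status.
gcloud datastore operations describe ${OPERATION}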

Managing long-running operations

Long-running operations are method calls that may take a substantial amount of time to complete. Cloud Datastore creates long-running operations when you export or import data.

For example, when you start an export, the Cloud Datastore service creates a long-running operation to track the export status. Here's the output from the start of an export:

{
  "name": "projects/[YOUR_PROJECT_ID]/operations/ASAyMDAwOTEzBxp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKKhI",
  "metadata": {
    "@type": "type.googleapis.com/google.datastore.admin.v1.ExportEntitiesMetadata",
    "common": {
      "startTime": "2017-05-25T23:54:39.583780Z",
      "operationType": "EXPORT_ENTITIES"
    },
    "progressEntities": {},
    "progressBytes": {},
    "entityFilter": {
      "namespaceIds": [
        ""
      ]
    },
    "outputUrlPrefix": "gs://[YOUR_BUCKET_NAME]"
  }
}

The value of the name field is the ID of a long-running operation.

Cloud Datastore provides an operations Admin API to check on the status of long-running operations, as well as cancel, delete, or list long-running operations:

  • projects.operations.cancel: Cancel a long-running operation.

  • projects.operations.delete: Delete a long-running operation. Note that deleting an operation does not cancel it.

  • projects.operations.get: Get the status of a long-running operation.

  • projects.operations.list: List long-running operations.
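
For example, here is a sketch of calling projects.operations.get and projects.operations.cancel with curl, where [OPERATION_ID] is a placeholder for the identifier portion of the operation name:

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datastore.googleapis.com/v1/projects/${PROJECT_ID}/operations/[OPERATION_ID]

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datastore.googleapis.com/v1/projects/${PROJECT_ID}/operations/[OPERATION_ID]:cancel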

Listing long-running operations

To list long-running operations, run the following:

gcloud

gcloud datastore operations list

Protocol

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datastore.googleapis.com/v1/projects/${PROJECT_ID}/operations
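
The list method follows the standard long-running operations API, so standard paging parameters should also apply. For example, a sketch that limits the response to ten operations per page:

curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://datastore.googleapis.com/v1/projects/${PROJECT_ID}/operations?pageSize=10"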

This example output shows a recently completed export operation. Operations are accessible for a few days after completion:

{
  "operations": [
    {
      "name": "projects/[YOUR_PROJECT_ID]/operations/ASAyMDAwOTEzBxp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKKhI",
      "metadata": {
        "@type": "type.googleapis.com/google.datastore.admin.v1.ExportEntitiesMetadata",
        "common": {
          "startTime": "2017-12-05T23:01:39.583780Z",
          "endTime": "2017-12-05T23:54:58.474750Z",
          "operationType": "EXPORT_ENTITIES"
        },
        "progressEntities": {
          "workCompleted": "21933027",
          "workEstimated": "21898182"
        },
        "progressBytes": {
          "workCompleted": "12421451292",
          "workEstimated": "9759724245"
        },
        "entityFilter": {
          "namespaceIds": [
            ""
          ]
        },
        "outputUrlPrefix": "gs://[YOUR_BUCKET_NAME]"
      },
      "done": true,
      "response": {
        "@type": "type.googleapis.com/google.datastore.admin.v1.ExportEntitiesResponse",
        "outputUrl": "gs://[YOUR_BUCKET_NAME]/2017-05-25T23:54:39_76544/2017-05-25T23:54:39_76544.overall_export_metadata"
      }
    }
  ]
}

Use the outputUrl value as the input URL when you import the entities.
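
For example, the outputUrl value above translates directly into an import command:

gcloud datastore import gs://[YOUR_BUCKET_NAME]/2017-05-25T23:54:39_76544/2017-05-25T23:54:39_76544.overall_export_metadata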

Estimating the completion time

A request for the status of a long-running operation returns the metrics workEstimated and workCompleted. Each of these metrics is returned in both number of bytes and number of entities. workEstimated shows the estimated total number of bytes and entities an operation will process, based on Cloud Datastore Statistics. workCompleted shows the number of bytes and entities processed so far. After the operation completes, workCompleted reflects the total number of bytes and entities that were actually processed, which might be larger than the value of workEstimated.

Divide workCompleted by workEstimated for a rough progress estimate. The estimate might be inaccurate because it depends on delayed statistics collection.
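
As a sketch, assuming the jq tool is installed and [OPERATION_ID] is a placeholder, you can compute a rough completion percentage from the byte counts in the operation status. For the example status below, this prints roughly 33:

curl -s \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datastore.googleapis.com/v1/projects/${PROJECT_ID}/operations/[OPERATION_ID] \
| jq '.metadata.progressBytes | 100 * (.workCompleted | tonumber) / (.workEstimated | tonumber)'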

For example, here is the progress status of an export operation:

{
  "operations": [
    {
      "name": "projects/[YOUR_PROJECT_ID]/operations/ASAyMDAwOTEzBxp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKKhI",
      "metadata": {
        "@type": "type.googleapis.com/google.datastore.admin.v1.ExportEntitiesMetadata",
        ...
        "progressEntities": {
          "workCompleted": "1",
          "workEstimated": "3"
        },
        "progressBytes": {
          "workCompleted": "85",
          "workEstimated": "257"
        },
        ...

Billing and pricing for managed exports and imports

You must enable billing for your Google Cloud Platform project before you use the managed export and import service. Export and import operations are charged for entity reads and writes at the rates listed in Cloud Datastore pricing.

The costs of export and import operations do not count towards the App Engine spending limit. Also, if you have set a Google Cloud Platform budget, the export or import operation will not trigger alerts until the operation is complete. Similarly, reads and writes performed during an export or import operation are applied to your daily quota after the operation is complete.

For information about billing, see Billing and Payments Support.

Permissions

To start export and import operations, your user account's IAM roles must grant the datastore.databases.export and datastore.databases.import permissions. The Cloud Datastore Import Export Admin role, for example, grants both permissions. This also applies if you issue REST requests from the command line using curl: you must assign an IAM role granting these permissions to your user account. For more on Cloud Datastore permissions, see Identity and Access Management (IAM).

If you use the example cron app, its requests use the default service account of the GCP project. You must grant the service account the Cloud Datastore Import Export Admin role or another role that grants the datastore.databases.export permission.

Additionally, for all export requests, both the account making the request and the default service account for the GCP project must have an IAM role that grants the following permissions for your Cloud Storage bucket:

  • storage.buckets.get: Read bucket metadata, excluding IAM policies.

  • storage.objects.create: Add new objects to a bucket.

  • storage.objects.list: List objects in a bucket, and read object metadata, excluding ACLs, when listing.

For a list of Cloud Storage roles, see Cloud Storage IAM Roles. The Storage Admin role, for example, includes all the necessary Cloud Storage permissions for an export and can be applied to an entire project or a specific bucket.

For import requests, both the account making the request and the default service account for the GCP project must have an IAM role that grants the following permissions for your Cloud Storage bucket:

  • storage.objects.get: Read object data and metadata, excluding ACLs.

  • storage.objects.list: List objects in a bucket, and read object metadata, excluding ACLs, when listing.

The Storage Object Viewer role grants all the required permissions for import.
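
For example, here is a sketch of granting these bucket-level roles with gsutil. The service account address assumes the default App Engine service account naming convention, [YOUR_PROJECT_ID]@appspot.gserviceaccount.com; replace the placeholders with your own values:

# For exports: grant the requesting user full access to the bucket.
gsutil iam ch user:[YOUR_EMAIL]:roles/storage.admin gs://[YOUR_BUCKET_NAME]

# For imports: grant the default service account read access to the bucket.
gsutil iam ch serviceAccount:[YOUR_PROJECT_ID]@appspot.gserviceaccount.com:roles/storage.objectViewer gs://[YOUR_BUCKET_NAME]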

Differences from Cloud Datastore Admin backups

If you previously used the Cloud Datastore Admin console for backups, you should note the following differences:

  • There is no GUI for the managed export and import service.

  • Exports created by a managed export do not appear in the Cloud Datastore Admin console. Managed exports and imports are a new service that does not share data with App Engine's backup and restore functionality, which is administered through the GCP Console.

  • The managed export and import service does not support the same metadata as the Cloud Datastore Admin backup and does not store progress status in your database. For information on checking the progress of export and import operations, see Managing long-running operations.

  • You cannot view service logs of managed export and import operations.

  • The managed import service is backwards compatible with Cloud Datastore Admin backup files. You can import a Cloud Datastore Admin backup file using the managed import service, but you cannot import the output of a managed export using the Cloud Datastore Admin console.

Importing into BigQuery

To import data from a managed export into BigQuery, see Loading Data From Cloud Datastore Backups.

Limitations

  • Data exported without specifying an entity filter cannot be loaded into BigQuery. If you want to import data into BigQuery, your export request must include one or more kind names in the entity filter, as shown in the example below.
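
For example, this export includes kind names in the entity filter, so its output can later be loaded into BigQuery:

gcloud datastore export --kinds="KIND1,KIND2" gs://${BUCKET}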
