Loading Data From Cloud Datastore

BigQuery supports loading data from Google Cloud Datastore backups. In Cloud Datastore, you can back up each entity type, also known as a kind, into a set of backup files. You can then load the information into BigQuery as a table. You can control which properties BigQuery should load by setting the projectionFields property.

If you prefer to skip the loading process, you can query the backup directly by setting it up as a federated data source. For more information, see Federated Data Sources.

Access control

Loading data into BigQuery requires the following access levels.

Product

Access

BigQuery

WRITE access for the dataset that contains the destination table. For more information, see access control.

Google Cloud Datastore

READ access to the Cloud Datastore backup objects in Google Cloud Storage. For more information, see Access Control - Google Cloud Storage.

Data type conversion

BigQuery converts data from each entity in Cloud Datastore backup files to BigQuery's data types. The following table describes the conversion between data types.

Cloud Datastore data type BigQuery data type
Blob BigQuery discards these values when loading the data.
Blobstore key STRING
Boolean BOOLEAN
Category STRING
Datastore key RECORD
Date and time TIMESTAMP
Email STRING
Embedded entity RECORD
Floating-point number DOUBLE
Geographical point

RECORD

[{"lat","DOUBLE"},
 {"long","DOUBLE"}]
        
IM handle STRING
Integer INTEGER
Link STRING
Phone number STRING
Postal address STRING
Rating INTEGER
Short blob BigQuery discards these values when loading the data.
String STRING (truncated to 64 KB)
User

RECORD

[{"email","STRING"}
 {"userid","STRING"}]
        

Datastore key properties

Each entity in Cloud Datastore has a unique key that contains information such as the namespace and the path. BigQuery creates a RECORD data type for the key, with nested fields for each piece of information, as described in the following table.

Key property Description BigQuery data type
__key__.app The Cloud Datastore app name. STRING
__key__.id The entity's ID, or null if __key__.name is set. INTEGER
__key__.kind The entity's kind. STRING
__key__.name The entity's name, or null if __key__.id is set. STRING
__key__.namespace If the Cloud Datastore app uses a custom namespace, the entity's namespace. Else, the default namespace is represented by an empty string. STRING
__key__.path The flattened ancestral path of the entity, consisting of the sequence of kind-identifier pairs from the root entity to the entity itself. For example: "Country", "USA", "PostalCode", 10011, "Route", 1234. STRING

Creating a Cloud Datastore backup

  1. Create a Cloud Storage bucket:
    1. In the Cloud Platform Console, go to the Cloud Storage browser.

      Go to the Cloud Storage browser

    2. Click Create bucket.
    3. In the Create bucket dialog, specify the following attributes:
    4. Click Create.
  2. Export Datastore entities to the Cloud Storage bucket:
    1. Go to the Datastore Admin page in the Cloud Platform Console

      Go to the Datastore Admin page

    2. Click Enable Datastore Admin if it is not currently enabled.
    3. Click Open Datastore Admin.
    4. Select one or more of the entity kinds that you want to export, then click Backup Entities.
    5. Select Google Cloud Storage for the backup storage destination. Enter the required bucket name in the format /gs/my_bucket.
    6. Click Backup Entities.

Cloud Datastore creates multiple objects in Google Cloud Storage for a backup of a single kind. The object you'll need for the next steps ends with <kind_name>.backup_info.

Loading data using the BigQuery web UI

  1. Create a Cloud Datastore backup.
  2. Open the BigQuery web UI.

  3. If needed, create a new dataset.
  4. In the navigation, hover on the dataset ID that you wish to use. Click the down arrow icon next to the ID and click Create new table.

  5. Select Google Cloud Storage from the Location choices and enter your bucket name in the format: YOUR_BUCKET_NAME/xxxxxxxxx.kind_name.info
  6. Select Cloud Datastore Backup from the File format choices.
  7. Provide a table name. Click the question mark icon to see naming limitations.
  8. Click Create Table.

BigQuery will now create a table and load the exported entity data into it. While BigQuery loads the data, a (loading) string displays after your table name in the navigation. The string disappears after the data has been fully loaded.

Loading data using the BigQuery API or command-line tool

Set the following properties to load data from the API or the command-line tool.

API

  • Set sourceFormat to DATASTORE_BACKUP
  • Set sourceUris to the full path of the Cloud Datastore backup file that ends with <kind_name>.backup_info. The full bucket name format is gs://bucket_name/<Cloud Datastore backup file>

Command-line tool

  • Set source_format to DATASTORE_BACKUP.
  • Set uri to the full path of the Cloud Datastore backup file that ends with <kind_name>.backup_info. The full bucket name format is gs://bucket_name/<Cloud Datastore backup file>

Send feedback about...

BigQuery Documentation