Hide
BigQuery

Loading Data From Cloud Datastore

BigQuery supports loading data from Google Cloud Datastore backups. In Cloud Datastore, you can back up each entity type, also known as a kind, into a set of backup files. You can then load the information into BigQuery as a table. You can control which properties BigQuery should load by setting the projectionFields property.

Contents

Access control

Loading data into BigQuery requires the following access levels.

Product

Access

BigQuery

WRITE access for the dataset that contains the destination table. For more information, see access control.

Google Cloud Datastore

READ access to the Cloud Datastore backup objects in Google Cloud Storage. For more information, see Access Control - Google Cloud Storage.

Back to top

Data type conversion

BigQuery converts data from each entity in Cloud Datastore backup files to BigQuery's data types. The following table describes the conversion between data types.

Cloud Datastore data type BigQuery data type
Blob BigQuery discards these values when loading the data.
Blobstore key STRING
Boolean BOOLEAN
Category STRING
Datastore key RECORD
Date and time TIMESTAMP
Email STRING
Embedded entity RECORD
Floating-point number DOUBLE
Geographical point

RECORD

[{"lat","DOUBLE"},
 {"long","DOUBLE"}]
        
IM handle STRING
Integer INTEGER
Link STRING
Phone number STRING
Postal address STRING
Rating INTEGER
Short blob BigQuery discards these values when loading the data.
String STRING (truncated to 64 KB)
User

RECORD

[{"email","STRING"}
 {"userid","STRING"}]
        

Datastore key properties

Each entity in Cloud Datastore has a unique key that contains information such as the namespace and the path. BigQuery creates a RECORD data type for the key, with nested fields for each piece of information, as described in the following table.

Key property Description BigQuery data type
__key__.app The Cloud Datastore app name. STRING
__key__.id The entity's ID, or null if __key__.name is set. INTEGER
__key__.kind The entity's kind. STRING
__key__.name The entity's name, or null if __key__.id is set. STRING
__key__.namespace If the Cloud Datastore app uses a custom namespace, the entity's namespace. Else, the default namespace is represented by an empty string. STRING
__key__.path The flattened ancestral path of the entity, consisting of the sequence of kind-identifier pairs from the root entity to the entity itself. For example: "Country", "USA", "PostalCode", 10011, "Route", 1234. STRING

Back to top

Creating a Cloud Datastore backup

  1. Navigate to your app in Admin Console.
  2. Click Datastore Admin in the left navigation.
  3. Check the checkbox of the entity kind that you want to export, then click Backup Entities.
  4. Select Google Cloud Storage for the backup storage destination. Enter the required bucket name in the format /gs/my_bucket.
  5. Click Backup Entities.

Cloud Datastore creates multiple objects in Google Cloud Storage for a backup of a single kind. The object you'll need for the next steps ends with <kind_name>.backup_info.

Back to top

Loading data using the BigQuery web UI

  1. Create a Cloud Datastore backup.
  2. Open the BigQuery web UI.

  3. If needed, create a new dataset.
  4. In the navigation, hover on the dataset ID that you wish to use. Click the down arrow icon next to the ID and click Create new table.

  5. Input a table ID. Click the question mark icon to see ID limitations. After inputting the ID, click the Next button.

  6. Select Cloud Datastore Backup for Source Format.

  7. Input the Cloud Storage bucket name. To find the bucket name:

    1. In a separate tab or window, navigate to Google Developers Console.
    2. Access the project that owns the Google Cloud Storage bucket where your backup resides.
    3. Click Storage > Storage browser, then click the bucket name that you specified in creating a Cloud Datastore backup.
    4. Each Cloud Datastore backup includes three objects. Copy the name of the object that ends with <kind_name>.backup_info.

    The full bucket name format is gs://bucket_name/<copied object name from step d>.

  8. Click the Submit button.

Once you complete the above steps, BigQuery creates a table and loads the data into it. While BigQuery loads the data, a (loading) string displays after your table name in the navigation. The string disappears once the data has been fully loaded.

Back to top

Loading data using the BigQuery API or command-line tool

Set the following properties to load data from the API or the command-line tool.

API

  • Set sourceFormat to DATASTORE_BACKUP
  • Set sourceUris to the full path of the Cloud Datastore backup file that ends with <kind_name>.backup_info. The full bucket name format is gs://bucket_name/<Cloud Datastore backup file>

Command-line tool

  • Set source_format to DATASTORE_BACKUP.
  • Set uri to the full path of the Cloud Datastore backup file that ends with <kind_name>.backup_info. The full bucket name format is gs://bucket_name/<Cloud Datastore backup file>

Back to top