BigQuery supports loading data from Datastore exports created using the Datastore managed import and export service. You can use the managed import and export service to export Datastore entities into a Cloud Storage bucket. You can then load the export into BigQuery as a table.
To learn how to create a Datastore export file, see Exporting and importing entities in the Datastore documentation. For information on scheduling exports, see Scheduling an export.
You can control which properties BigQuery should load by setting the `projectionFields` property in the API or by using the `--projection_fields` flag in the bq command-line tool.
If you prefer to skip the loading process, you can query the export directly by setting it up as an external data source. For more information, see External data sources.
When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi-regional location as the Cloud Storage bucket.
Limitations
When you load data into BigQuery from a Datastore export, note the following restrictions:
- You cannot use a wildcard in the Cloud Storage URI when you specify a Datastore export file.
- You can specify only one Cloud Storage URI when loading data from Datastore exports.
- You cannot append Datastore export data to an existing table with a defined schema.
- For a Datastore export to load correctly, entities in the export data must share a consistent schema with fewer than 10,000 unique property names.
- Data exported without specifying an entity filter cannot be loaded into BigQuery. The export request must include one or more kind names in the entity filter.
- The maximum field size for Datastore exports is 64 KB. When you load a Datastore export, any field larger than 64 KB is truncated.
Required permissions
When you load data into BigQuery, you need permissions to run a load job and permissions that let you load data into new or existing BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need permissions to access the bucket that contains your data.
BigQuery permissions
At a minimum, the following permissions are required to load data into BigQuery. These permissions are required if you are loading data into a new table or partition, or if you are appending or overwriting a table or partition.
- `bigquery.tables.create`
- `bigquery.tables.updateData`
- `bigquery.jobs.create`

The following predefined IAM roles include both `bigquery.tables.create` and `bigquery.tables.updateData` permissions:

- `bigquery.dataEditor`
- `bigquery.dataOwner`
- `bigquery.admin`

The following predefined IAM roles include `bigquery.jobs.create` permissions:

- `bigquery.user`
- `bigquery.jobUser`
- `bigquery.admin`

In addition, if a user has `bigquery.datasets.create` permissions, when that user creates a dataset, they are granted `bigquery.dataOwner` access to it. `bigquery.dataOwner` access lets the user create and update tables in the dataset by using a load job.
For more information on IAM roles and permissions in BigQuery, see Access control.
Cloud Storage permissions
To load data from a Cloud Storage bucket, you must be granted `storage.objects.get` permissions. If you are using a URI wildcard, you must also have `storage.objects.list` permissions.

The predefined IAM role `storage.objectViewer` can be granted to provide both `storage.objects.get` and `storage.objects.list` permissions.
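As an illustrative sketch (one of several ways to grant the role), the following uses the google-cloud-storage Python client to add `roles/storage.objectViewer` to a bucket's IAM policy. The bucket name and member are placeholders, not values from this page:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("mybucket")  # placeholder bucket name

# Fetch the current IAM policy and append a binding for the
# predefined storage.objectViewer role.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",
        "members": {"user:jane@example.com"},  # placeholder member
    }
)
bucket.set_iam_policy(policy)
```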
Loading Datastore export service data
To load data from a Datastore export metadata file:
Console
In the Cloud Console, open the BigQuery page.
In the Explorer panel, expand your project and select a dataset.
In the details panel, click Create table.
On the Create table page, in the Source section:
- For Create table from, select Cloud Storage.
- In the source field, enter the Cloud Storage URI. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're creating. The URI for your Datastore export file should end with `[KIND_NAME].export_metadata` or `export[NUM].export_metadata`. For example: `default_namespace_kind_Book.export_metadata`. In this example, `Book` is the kind name, and `default_namespace_kind_Book` is the filename generated by Datastore.
- For File format, select Datastore Backup.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset.
In the Table name field, enter the name of the table you're creating in BigQuery.
Verify that Table type is set to Native table.
In the Schema section, no action is necessary. The schema is inferred from a Datastore export.
Select applicable items in the Advanced options section and then click Create table. For information on the available options, see Datastore options.
bq
Use the `bq load` command with `source_format` set to `DATASTORE_BACKUP`. Supply the `--location` flag and set the value to your location.

```
bq --location=LOCATION load \
    --source_format=FORMAT \
    DATASET.TABLE \
    PATH_TO_SOURCE
```
Replace the following:
- `LOCATION`: your location. The `--location` flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to `asia-northeast1`. You can set a default value for the location by using the .bigqueryrc file.
- `FORMAT`: `DATASTORE_BACKUP`.
- `DATASET`: the dataset that contains the table into which you're loading data.
- `TABLE`: the table into which you're loading data. If the table does not exist, it is created.
- `PATH_TO_SOURCE`: the Cloud Storage URI.
For example, the following command loads the `gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata` Datastore export file into a table named `book_data`. `mybucket` and `mydataset` were created in the `US` multi-region location.

```
bq --location=US load \
    --source_format=DATASTORE_BACKUP \
    mydataset.book_data \
    gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata
```
API
Set the following properties to load Datastore export data using the API.
- Create a load job that points to the source data in Cloud Storage.
- Specify your location in the `location` property in the `jobReference` section of the job resource.
- The source URIs must be fully qualified, in the format `gs://[BUCKET]/[OBJECT]`. The file (object) name must end in `[KIND_NAME].export_metadata`. Only one URI is allowed for Datastore exports, and you cannot use a wildcard.
- Specify the data format by setting the `configuration.load.sourceFormat` property to `DATASTORE_BACKUP`.
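The following is a minimal sketch of these steps using the google-cloud-bigquery Python client library, which sets the same job properties on your behalf. The project name is a placeholder; the dataset, table, and URI reuse the example values from this page:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# DATASTORE_BACKUP is the source format for Datastore exports.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
)

# The URI must be fully qualified, end in [KIND_NAME].export_metadata,
# and be the only source URI (no wildcards).
uri = (
    "gs://mybucket/20180228T1256/default_namespace/kind_Book/"
    "default_namespace_kind_Book.export_metadata"
)

load_job = client.load_table_from_uri(
    uri,
    "mydataset.book_data",  # created if it does not already exist
    location="US",          # must match the dataset's location
    job_config=job_config,
)
load_job.result()  # block until the load job completes
```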
Datastore Admin backups
If you use the Datastore Admin backup feature to export your Datastore data, note that the file extension will be `.backup_info` instead of `.export_metadata`. When you import your data into BigQuery, you can use either a `.backup_info` file or a `.export_metadata` file until the Datastore Admin backups service becomes unavailable.
Appending to or overwriting a table with Datastore data
When you load Datastore export data into BigQuery, you can create a new table to store the data, or you can overwrite an existing table. You cannot append Datastore export data to an existing table.
If you attempt to append Datastore export data to an existing table, the following error results: `Cannot append a datastore backup to a table that already has a schema. Try using the WRITE_TRUNCATE write disposition to replace the existing table.`
To overwrite an existing table with Datastore export data:
Console
In the Cloud Console, open the BigQuery page.
In the Explorer panel, expand your project and select a dataset.
In the details panel, click Create table.
On the Create table page, in the Source section:
For Create table from, select Cloud Storage.

In the source field, enter the Cloud Storage URI. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're creating. The URI for your Datastore export file should end with `[KIND_NAME].export_metadata`. For example: `default_namespace_kind_Book.export_metadata`. In this example, `Book` is the kind name, and `default_namespace_kind_Book` is the filename generated by Datastore.

Verify that `[KIND_NAME]` is specified in your Cloud Storage URI. If you specify the URI without `[KIND_NAME]`, you receive the following error: `does not contain valid backup metadata. (error code: invalid)`.

For File format, select Datastore Backup.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset.
In the Table name field, enter the name of the table you're creating in BigQuery.
Verify that Table type is set to Native table.
In the Schema section, no action is necessary. The schema is inferred from a Datastore export.
In the Advanced options section, for Write preference, select Overwrite table.
Click Create table.
bq
Use the `bq load` command with the `--replace` flag and with `source_format` set to `DATASTORE_BACKUP`. Supply the `--location` flag and set the value to your location.

```
bq --location=LOCATION load \
    --source_format=FORMAT \
    --replace \
    DATASET.TABLE \
    PATH_TO_SOURCE
```
Replace the following:
- `LOCATION`: your location. The `--location` flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to `asia-northeast1`. You can set a default value for the location by using the .bigqueryrc file.
- `FORMAT`: `DATASTORE_BACKUP`.
- `DATASET`: the dataset containing the table into which you're loading data.
- `TABLE`: the table you're overwriting.
- `PATH_TO_SOURCE`: the Cloud Storage URI.
For example, the following command loads the `gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata` Datastore export file and overwrites a table named `book_data`:

```
bq load --source_format=DATASTORE_BACKUP \
    --replace \
    mydataset.book_data \
    gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata
```
API
Set the following properties to load data using the API.

- Create a load job that points to the source data in Cloud Storage.
- Specify your location in the `location` property in the `jobReference` section of the job resource.
- The source URIs must be fully qualified, in the format `gs://[BUCKET]/[OBJECT]`. The file (object) name must end in `[KIND_NAME].export_metadata`. Only one URI is allowed for Datastore exports, and you cannot use a wildcard.
- Specify the data format by setting the `configuration.load.sourceFormat` property to `DATASTORE_BACKUP`.
- Specify the write disposition by setting the `configuration.load.writeDisposition` property to `WRITE_TRUNCATE`.
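A minimal sketch of the overwrite using the google-cloud-bigquery Python client; it differs from the initial load only in the write disposition (the project name is a placeholder, and the other values reuse this page's examples):

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# WRITE_TRUNCATE replaces the existing table; appending (WRITE_APPEND)
# fails for Datastore exports, as described above.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

uri = (
    "gs://mybucket/20180228T1256/default_namespace/kind_Book/"
    "default_namespace_kind_Book.export_metadata"
)

client.load_table_from_uri(
    uri,
    "mydataset.book_data",  # the table being overwritten
    location="US",
    job_config=job_config,
).result()
```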
Datastore options
To change how BigQuery parses Datastore export data, specify the following option:
| Console option | bq tool flag | BigQuery API property | Description |
| --- | --- | --- | --- |
| Not available | `--projection_fields` | `projectionFields` | A comma-separated list that indicates which entity properties to load into BigQuery from a Datastore export. Property names are case-sensitive and must be top-level properties. If no properties are specified, BigQuery loads all properties. If any named property isn't found in the Datastore export, an invalid error is returned in the job result. The default value is `''`. |
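For example, the Python client exposes this option as `projection_fields` on `LoadJobConfig`. In the following sketch, `title` and `author` are hypothetical top-level properties of the `Book` kind, not names taken from this page:

```python
from google.cloud import bigquery

client = bigquery.Client()

# projection_fields maps to the projectionFields API property.
# "title" and "author" are hypothetical top-level entity properties;
# names are case-sensitive, and unknown names cause the job to fail.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
    projection_fields=["title", "author"],
)

client.load_table_from_uri(
    "gs://mybucket/20180228T1256/default_namespace/kind_Book/"
    "default_namespace_kind_Book.export_metadata",
    "mydataset.book_data",
    job_config=job_config,
).result()
```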
Data type conversion
BigQuery converts data from each entity in Datastore export files to BigQuery data types. The following table describes the conversion between data types.
| Datastore data type | BigQuery data type |
| --- | --- |
| Blob | BYTES |
| Blobstore key | STRING |
| Boolean | BOOLEAN |
| Category | STRING |
| Datastore key | RECORD |
| Date and time | TIMESTAMP |
| Email | STRING |
| Embedded entity | RECORD |
| Floating-point number | FLOAT |
| Geographical point | RECORD [{"lat","DOUBLE"}, {"long","DOUBLE"}] |
| IM handle | STRING |
| Integer | INTEGER |
| Link | STRING |
| Phone number | STRING |
| Postal address | STRING |
| Rating | INTEGER |
| Short blob | BYTES |
| String | STRING (truncated to 64 KB) |
| User | RECORD [{"email","STRING"}, {"userid","STRING"}] |
Datastore key properties
Each entity in Datastore has a unique key that contains information such as the namespace and the path. BigQuery creates a `RECORD` data type for the key, with nested fields for each piece of information, as described in the following table.
| Key property | Description | BigQuery data type |
| --- | --- | --- |
| `__key__.app` | The Datastore app name. | STRING |
| `__key__.id` | The entity's ID, or `null` if `__key__.name` is set. | INTEGER |
| `__key__.kind` | The entity's kind. | STRING |
| `__key__.name` | The entity's name, or `null` if `__key__.id` is set. | STRING |
| `__key__.namespace` | If the Datastore app uses a custom namespace, the entity's namespace. Otherwise, the default namespace is represented by an empty string. | STRING |
| `__key__.path` | The flattened ancestral path of the entity, consisting of the sequence of kind-identifier pairs from the root entity to the entity itself. For example: `"Country", "USA", "PostalCode", 10011, "Route", 1234`. | STRING |
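For illustration, these nested key fields can be addressed with dot notation in a query. The following sketch assumes the hypothetical `mydataset.book_data` table from the earlier examples:

```python
from google.cloud import bigquery

client = bigquery.Client()

# __key__ is loaded as a RECORD, so its parts are addressed with
# dot notation in standard SQL.
query = """
    SELECT
        __key__.kind AS kind,
        __key__.name AS name,
        __key__.path AS path
    FROM `mydataset.book_data`
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.kind, row.name, row.path)
```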