Loading data from Firestore exports
BigQuery supports loading data from Firestore exports created using the Firestore managed import and export service. The managed import and export service exports Firestore documents into a Cloud Storage bucket. You can then load the exported data into a BigQuery table.
Limitations
When you load data into BigQuery from a Firestore export, note the following restrictions:
- Your dataset must be in the same regional or multi-regional location as the Cloud Storage bucket containing your export files.
- You can specify only one Cloud Storage URI, and you cannot use a URI wildcard.
- For a Firestore export to load correctly, documents in the export data must share a consistent schema with fewer than 10,000 unique field names.
- You can create a new table to store the data, or you can overwrite an existing table. You cannot append Firestore export data to an existing table.
- Your export command
must specify a
collection-ids
filter. Data exported without specifying a collection ID filter cannot be loaded into BigQuery.
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
Required permissions
To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.
Permissions to load data into BigQuery
To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:
bigquery.tables.create
bigquery.tables.updateData
bigquery.tables.update
bigquery.jobs.create
Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:
roles/bigquery.dataEditor
roles/bigquery.dataOwner
roles/bigquery.admin
(includes thebigquery.jobs.create
permission)bigquery.user
(includes thebigquery.jobs.create
permission)bigquery.jobUser
(includes thebigquery.jobs.create
permission)
Additionally, if you have the bigquery.datasets.create
permission, you can create and
update tables using a load job in the datasets that you create.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Permissions to load data from Cloud Storage
To load data from a Cloud Storage bucket, you need the following IAM permissions:
storage.objects.get
storage.objects.list
(required if you are using a URI wildcard)
The predefined IAM role roles/storage.objectViewer
includes all the permissions you need in order to load data from a Cloud Storage bucket.
Loading Firestore export service data
You can load data from a Firestore export metadata file by using
the Cloud console, bq
command-line tool, or
API.
Sometimes Datastore terminology is used in the Cloud console
and the bq
command-line tool, but the following procedures are compatible with
Firestore export files. Firestore and Datastore share
an export format.
Console
In the Cloud console, open the BigQuery page.
In the Explorer panel, expand your project and select a dataset.
Expand the
Actions option and click Open.In the details panel, click Create table
.On the Create table page, in the Source section:
For Create table from, select Cloud Storage
In the source field, enter the Cloud Storage URI. The Cloud Storage bucket must be in the same location as your dataset. The URI for your Firestore export file should end with
KIND_COLLECTION_ID.export_metadata
. For example:default_namespace_kind_Book.export_metadata
. In this example,Book
is the collection ID, anddefault_namespace_kind_Book
is the file name generated by Firestore.Verify
KIND_COLLECTION_ID
is specified in your Cloud Storage URI. If you specify the URI withoutKIND_COLLECTION_ID
, you receive the following error:does not contain valid backup metadata. (error code: invalid)
.For File format, select Datastore Backup. Datastore Backup is the correct option for Firestore. Firestore and Datastore share an export format.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset.
In the Table name field, enter the name of the table you're creating in BigQuery.
Verify that Table type is set to Native table.
In the Schema section, no action is necessary. The schema is inferred for a Firestore export.
Select applicable items in the Advanced options section. If you are overwriting an existing table, set Write preference to Overwrite table.
Click Create table.
bq
Use the bq load
command with source_format
set to DATASTORE_BACKUP
.
Supply the --location
flag and set the value to your
location. If you are overwiting
an existing table, add the --replace
flag.
To load only specific fields, use the --projection_fields flag.
bq --location=LOCATION load \
--source_format=FORMAT \
DATASET.TABLE \
PATH_TO_SOURCE
Replace the following:
LOCATION
: your location. The--location
flag is optional.FORMAT
:DATASTORE_BACKUP
. Datastore Backup is the correct option for Firestore. Firestore and Datastore share an export format.DATASET
: the dataset that contains the table into which you're loading data.TABLE
: the table into which you're loading data. If the table doesn't exist, it is created.PATH_TO_SOURCE
: the Cloud Storage URI.
For example, the following command loads the
gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata
Firestore export file into a table named book_data
.
mybucket
and mydataset
were created in the US
multi-region location.
bq --location=US load \
--source_format=DATASTORE_BACKUP \
mydataset.book_data \
gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata
API
Set the following properties to load Firestore export data using the API.
Create a
load
job configuration that points to the source data in Cloud Storage.Specify your location in the
location
property in thejobReference
section of the job resource.The
sourceUris
must be fully qualified, in the formatgs://BUCKET/OBJECT
in the load job configuration. The file (object) name must end inKIND_NAME.export_metadata
. Only one URI is allowed for Firestore exports, and you cannot use a wildcard.Specify the data format by setting the
sourceFormat
property toDATASTORE_BACKUP
in the load job configuration. Datastore Backup is the correct option for Firestore. Firestore and Datastore share an export format.To load only specific fields, set the
projectionFields
property.If you are overwriting an existing table, specify the write disposition by setting the
writeDisposition
property toWRITE_TRUNCATE
.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
Firestore options
To change how BigQuery parses Firestore export data, specify the following option:
Cloud console option | `bq` flag | BigQuery API property | Description |
---|---|---|---|
Not available | --projection_fields |
projectionFields
(Java,
Python) |
(Optional) A comma-separated list that indicates which document fields
to load from a Firestore export. By default,
BigQuery loads all fields. Field names are
case-sensitive and must be present in the export. You cannot specify field
paths within a map field such as map.foo .
|
Data type conversion
BigQuery converts data from each document in Firestore export files to BigQuery data types. The following table describes the conversion between supported data types.
Firestore data type | BigQuery data type |
---|---|
Array | RECORD |
Boolean | BOOLEAN |
Reference | RECORD |
Date and time | TIMESTAMP |
Map | RECORD |
Floating-point number | FLOAT |
Geographical point |
RECORD [{"lat","FLOAT"}, {"long","FLOAT"}] |
Integer | INTEGER |
String | STRING (truncated to 64 KB) |
Firestore key properties
Each document in Firestore has a unique key that contains
information such as the document ID and the document path.
BigQuery creates a RECORD
data type (also known as a
STRUCT
)
for the key, with nested fields for each piece of information, as described in
the following table.
Key property | Description | BigQuery data type |
---|---|---|
__key__.app |
The Firestore app name. | STRING |
__key__.id |
The document's ID, or null if __key__.name
is set. |
INTEGER |
__key__.kind |
The document's collection ID. | STRING |
__key__.name |
The document's name, or null if __key__.id
is set. |
STRING |
__key__.namespace |
Firestore does not support custom namespaces. The default namespace is represented by an empty string. | STRING |
__key__.path |
The path of the document: the sequence of the document and the
collection pairs from the root collection. For example: "Country",
"USA", "PostalCode", 10011, "Route", 1234 . |
STRING |