Google BigQuery supports loading data from Cloud Firestore exports created using the Cloud Firestore managed import and export service. The managed import and export service exports Cloud Firestore documents into a Cloud Storage bucket. You can then load the exported data into a BigQuery table.
Required permissions
When you load data into BigQuery, you need project or dataset-level permissions that allow you to load data into new or existing BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need access to the bucket that contains your data.
BigQuery permissions
When you are loading data into BigQuery from Cloud Storage, you must be granted the bigquery.dataOwner or bigquery.dataEditor role at the project level or at the dataset level. Both roles grant users and groups permission to load data into a new table or to append to or overwrite an existing table.
Granting the roles at the project level gives the user or group permission to load data into tables in every dataset in the project. Granting the roles at the dataset level gives the user or group the ability to load data only into tables in that dataset.
For more information on configuring dataset access, see Controlling access to datasets. For more information on IAM roles in BigQuery, see Access Control.
Cloud Storage permissions
To load data from a Cloud Storage bucket, you must be granted storage.objects.get permissions at the project level or on that individual bucket. If you are using a URI wildcard, you must also have storage.objects.list permissions.
The predefined IAM role storage.objectViewer can be granted to provide storage.objects.get and storage.objects.list permissions.
Limitations
When you load data into BigQuery from a Cloud Firestore export, note the following restrictions:
- Your dataset must be in the same regional or multi-regional location as the Cloud Storage bucket containing your export files.
- You can specify only one Cloud Storage URI, and you cannot use a URI wildcard.
- For a Cloud Firestore export to load correctly, documents in the export data must share a consistent schema with fewer than 10,000 unique field names.
- You can create a new table to store the data, or you can overwrite an existing table. You cannot append Cloud Firestore export data to an existing table.
- If you plan to load a Cloud Firestore export into BigQuery, you must specify a collection-ids filter in your export command. Data exported without specifying a collection ID filter cannot be loaded into BigQuery.
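The URI-related restrictions above can be checked before a load job is submitted. The following is a minimal sketch, not part of BigQuery: check_export_uri is a hypothetical helper name, and the .export_metadata suffix check is taken from the loading procedures in this document rather than from the list above.

```python
def check_export_uri(uris):
    """Return a list of problems found in the source URIs (empty if none).

    Hypothetical helper: it only inspects the strings and never
    contacts Cloud Storage or BigQuery.
    """
    problems = []
    # Only one Cloud Storage URI may be specified for a Cloud Firestore export.
    if len(uris) != 1:
        problems.append("exactly one Cloud Storage URI is required")
    for uri in uris:
        if not uri.startswith("gs://"):
            problems.append(f"not a Cloud Storage URI: {uri}")
        # URI wildcards are not allowed for Cloud Firestore exports.
        if "*" in uri:
            problems.append(f"wildcards are not allowed: {uri}")
        # The export file name is expected to end in .export_metadata.
        if not uri.endswith(".export_metadata"):
            problems.append(f"does not end in .export_metadata: {uri}")
    return problems
```

An empty list only means that the string passes these checks. It does not confirm that the bucket and the dataset are in the same location or that the export was created with a collection-ids filter.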
Loading Cloud Firestore export service data
You can load data from a Cloud Firestore export metadata file by using the BigQuery web UI, the bq command-line tool, or the API.
Cloud Datastore terminology sometimes appears in the UI or in the commands, but the following procedures are compatible with Cloud Firestore export files because Cloud Firestore and Cloud Datastore share an export format.
Classic UI
- Go to the classic BigQuery web UI.
- In the navigation panel, hover on a dataset, click the down arrow icon, and click Create new table.
- On the Create Table page, in the Source Data section:
  - Leave Create from source selected.
  - For Location, select Cloud Storage and in the source field, enter the Cloud Storage URI. The Cloud Storage bucket must be in the same location as your dataset. The URI for your Cloud Firestore export file should end with [KIND_COLLECTION_ID].export_metadata. For example: default_namespace_kind_Book.export_metadata. In this example, Book is the collection ID, and default_namespace_kind_Book is the file name generated by Cloud Firestore.
  - Verify that [KIND_COLLECTION_ID] is specified in your Cloud Storage URI. If you specify the URI without [KIND_COLLECTION_ID], you receive the following error: does not contain valid backup metadata. (error code: invalid).
  - For File format, select Cloud Datastore Backup. Cloud Datastore Backup is the correct option for Cloud Firestore. Cloud Firestore and Cloud Datastore share an export format.
- On the Create Table page, in the Destination Table section:
  - For Table name, choose the appropriate dataset, and in the table name field, enter the name of the table you're creating in BigQuery.
  - Verify that Table type is set to Native table.
- In the Schema section, no action is necessary. The schema is inferred for a Cloud Firestore export.
- Select applicable items in the Options section. If you are overwriting an existing table, set Write preference to Overwrite table.
- Click Create Table.
Command-line
Use the bq load command with source_format set to DATASTORE_BACKUP. Supply the --location flag and set the value to your location. If you are overwriting an existing table, add the --replace flag.
To load only specific fields, use the --projection_fields flag.
bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE]
where:
- [LOCATION] is your location. The --location flag is optional.
- [FORMAT] is DATASTORE_BACKUP. Cloud Datastore Backup is the correct option for Cloud Firestore. Cloud Firestore and Cloud Datastore share an export format.
- [DATASET] is the dataset that contains the table into which you're loading data.
- [TABLE] is the table into which you're loading data. If the table does not exist, it is created.
- [PATH_TO_SOURCE] is the Cloud Storage URI.
For example, the following command loads the gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata Cloud Firestore export file into a table named book_data. mybucket and mydataset were created in the US multi-region location.
bq --location=US load --source_format=DATASTORE_BACKUP mydataset.book_data gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata
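If the load is scripted, the same command line can be assembled programmatically. The sketch below is illustrative only: build_bq_load_command is a hypothetical helper, it uses only the flags described in this section, and it builds the argument list without running bq.

```python
import shlex


def build_bq_load_command(dataset, table, source_uri, location=None,
                          replace=False, projection_fields=None):
    """Assemble the argument list for a bq load of a Cloud Firestore export.

    Hypothetical helper: it returns the arguments and does not invoke bq.
    """
    args = ["bq"]
    # --location is a global flag, so it goes before the load command.
    if location:
        args.append(f"--location={location}")
    # Cloud Firestore exports use the DATASTORE_BACKUP source format.
    args += ["load", "--source_format=DATASTORE_BACKUP"]
    if replace:
        # Overwrite an existing table (appending is not supported).
        args.append("--replace")
    if projection_fields:
        # Comma-separated list of document fields to load.
        args.append("--projection_fields=" + ",".join(projection_fields))
    args += [f"{dataset}.{table}", source_uri]
    return args


command = build_bq_load_command(
    "mydataset",
    "book_data",
    "gs://mybucket/20180228T1256/default_namespace/kind_Book/"
    "default_namespace_kind_Book.export_metadata",
    location="US",
)
# Print a shell-quoted form of the command for review.
print(shlex.join(command))
```

Returning a list instead of a single string means that the arguments can be passed to subprocess.run without shell quoting concerns.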
API
Set the following properties to load Cloud Firestore export data using the API.
- Create a load job that points to the source data in Cloud Storage.
- Specify your location in the location property in the jobReference section of the job resource.
- The source URIs must be fully qualified, in the format gs://[BUCKET]/[OBJECT]. The file (object) name must end in [KIND_NAME].export_metadata. Only one URI is allowed for Cloud Firestore exports, and you cannot use a wildcard.
- Specify the data format by setting the configuration.load.sourceFormat property to DATASTORE_BACKUP. Cloud Datastore Backup is the correct option for Cloud Firestore. Cloud Firestore and Cloud Datastore share an export format.
- To load only specific fields, set the projectionFields property.
- If you are overwriting an existing table, specify the write disposition by setting the configuration.load.writeDisposition property to WRITE_TRUNCATE.
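The properties above fit together in a single job resource. The following sketch builds such a request body as a plain dictionary. build_load_job_body and its parameter names are hypothetical, the destinationTable and jobReference.projectId fields are not described in this section and are included on the assumption that the job needs a destination and a project, and sending the body to the API is left out.

```python
import json


def build_load_job_body(project_id, dataset_id, table_id, source_uri,
                        location, overwrite=False, projection_fields=None):
    """Build a load job resource for a Cloud Firestore export.

    Hypothetical helper: it only assembles the body and does not call the API.
    """
    load = {
        # Exactly one fully qualified URI, with no wildcard.
        "sourceUris": [source_uri],
        # Cloud Firestore and Cloud Datastore share this export format.
        "sourceFormat": "DATASTORE_BACKUP",
        # Assumption: the destination table is identified by these three IDs.
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
    }
    if overwrite:
        # Overwrite an existing table instead of creating a new one.
        load["writeDisposition"] = "WRITE_TRUNCATE"
    if projection_fields:
        # Load only the listed document fields.
        load["projectionFields"] = list(projection_fields)
    return {
        "jobReference": {"projectId": project_id, "location": location},
        "configuration": {"load": load},
    }


body = build_load_job_body(
    "myproject",  # placeholder project ID
    "mydataset",
    "book_data",
    "gs://mybucket/20180228T1256/default_namespace/kind_Book/"
    "default_namespace_kind_Book.export_metadata",
    location="US",
    overwrite=True,
)
print(json.dumps(body, indent=2))
```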
Cloud Firestore options
To change how BigQuery parses Cloud Firestore export data, specify the following options:
Cloud Firestore option | Classic UI option | CLI flag | BigQuery API property | Description |
---|---|---|---|---|
Projection fields | None | --projection_fields | projectionFields | (Optional) A comma-separated list that indicates which document fields to load from a Cloud Firestore export. By default, BigQuery loads all fields. Field names are case sensitive and must be present in the export. You cannot specify field paths within a map field, such as map.foo. |
Number of bad records allowed | Number of errors allowed | --max_bad_records | maxBadRecords | (Optional) The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid. |
Data type conversion
BigQuery converts data from each document in Cloud Firestore export files to BigQuery's data types. The following table describes the conversion between data types.
Cloud Firestore data type | BigQuery data type |
---|---|
Array | RECORD |
Boolean | BOOLEAN |
Reference | RECORD |
Date and time | TIMESTAMP |
Map | RECORD |
Floating-point number | FLOAT |
Geographical point | RECORD [{"lat","FLOAT"}, {"long","FLOAT"}] |
Integer | INTEGER |
String | STRING (truncated to 64 KB) |
Firestore key properties
Each document in Cloud Firestore has a unique key that contains information such as the document ID and the document path. BigQuery creates a RECORD data type (also known as a STRUCT) for the key, with nested fields for each piece of information, as described in the following table.
Key property | Description | BigQuery data type |
---|---|---|
__key__.app | The Cloud Firestore app name. | STRING |
__key__.id | The document's ID, or null if __key__.name is set. | INTEGER |
__key__.kind | The document's collection ID. | STRING |
__key__.name | The document's name, or null if __key__.id is set. | STRING |
__key__.namespace | Cloud Firestore does not support custom namespaces. The default namespace is represented by an empty string. | STRING |
__key__.path | The path of the document: the sequence of the document and the collection pairs from the root collection. For example: "Country", "USA", "PostalCode", 10011, "Route", 1234. | STRING |