Loading data from Firestore exports

BigQuery supports loading data from Firestore exports created using the Firestore managed import and export service. The managed import and export service exports Firestore documents into a Cloud Storage bucket. You can then load the exported data into a BigQuery table.

Limitations

When you load data into BigQuery from a Firestore export, note the following restrictions:

  • Your dataset must be in the same regional or multi-regional location as the Cloud Storage bucket containing your export files.
  • You can specify only one Cloud Storage URI, and you cannot use a URI wildcard.
  • For a Firestore export to load correctly, documents in the export data must share a consistent schema with fewer than 10,000 unique field names.
  • You can create a new table to store the data, or you can overwrite an existing table. You cannot append Firestore export data to an existing table.
  • Your export command must specify a collection-ids filter. Data exported without specifying a collection ID filter cannot be loaded into BigQuery.
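Several of these restrictions can be checked from the load request alone, before a job is submitted. The following Python sketch is a hypothetical pre-flight helper (the function name and messages are illustrative, not part of any Google library). It assumes write preferences are expressed with the API's writeDisposition values, and it covers only the URI-level and append restrictions. It cannot verify dataset location, schema consistency, or whether the export used a collection-ids filter.

```python
# Hypothetical pre-flight check for the URI-level restrictions listed above.
# It cannot verify dataset location, schema consistency, or whether the
# export was created with a collection-ids filter.


def check_firestore_load(source_uris: list[str], write_disposition: str) -> list[str]:
    """Return a list of problems; an empty list means no listed restriction was violated."""
    problems = []
    if len(source_uris) != 1:
        problems.append("exactly one Cloud Storage URI is allowed")
    for uri in source_uris:
        if not uri.startswith("gs://"):
            problems.append(f"not a Cloud Storage URI: {uri}")
        if "*" in uri:
            problems.append(f"URI wildcards are not allowed: {uri}")
        if not uri.endswith(".export_metadata"):
            problems.append(f"URI must name an .export_metadata file: {uri}")
    # Only new-table or overwrite loads are allowed; appending is rejected.
    if write_disposition == "WRITE_APPEND":
        problems.append("appending to an existing table is not supported")
    return problems


uri = ("gs://mybucket/20180228T1256/default_namespace/kind_Book/"
       "default_namespace_kind_Book.export_metadata")
print(check_firestore_load([uri], "WRITE_TRUNCATE"))  # []
```

An empty list only means the request does not break the restrictions this helper knows about. BigQuery still performs its own validation when the job runs.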

Required permissions

When you load data into BigQuery, you need permissions to run a load job and permissions that let you load data into new or existing BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need permissions to access the bucket that contains your data.

BigQuery permissions

At a minimum, the following permissions are required to load data into BigQuery. These permissions are required if you are loading data into a new table or partition, or if you are appending or overwriting a table or partition.

  • bigquery.tables.create
  • bigquery.tables.updateData
  • bigquery.jobs.create

The following predefined IAM roles include both bigquery.tables.create and bigquery.tables.updateData permissions:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

The following predefined IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

In addition, a user with bigquery.datasets.create permissions is granted bigquery.dataOwner access to each dataset they create. bigquery.dataOwner access lets the user create and update tables in the dataset by using a load job.

For more information on IAM roles and permissions in BigQuery, see Access control.
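One way to read the role lists above is as a coverage check: a load needs all three permissions, and no single role except bigquery.admin supplies them all. The Python sketch below encodes only the role-to-permission pairs stated on this page. Real roles grant many more permissions, and full IAM role IDs carry a roles/ prefix (for example, roles/bigquery.dataEditor). The helper name is hypothetical.

```python
# Role -> permission pairs taken only from the lists on this page.
# Real roles grant many more permissions than shown here.
ROLE_PERMISSIONS = {
    "bigquery.dataEditor": {"bigquery.tables.create", "bigquery.tables.updateData"},
    "bigquery.dataOwner": {"bigquery.tables.create", "bigquery.tables.updateData"},
    "bigquery.admin": {
        "bigquery.tables.create",
        "bigquery.tables.updateData",
        "bigquery.jobs.create",
    },
    "bigquery.user": {"bigquery.jobs.create"},
    "bigquery.jobUser": {"bigquery.jobs.create"},
}

REQUIRED_FOR_LOAD = frozenset(
    {"bigquery.tables.create", "bigquery.tables.updateData", "bigquery.jobs.create"}
)


def missing_load_permissions(roles: set[str]) -> set[str]:
    """Return the load permissions that none of the given roles provides."""
    granted: set[str] = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return set(REQUIRED_FOR_LOAD) - granted


print(sorted(missing_load_permissions({"bigquery.dataEditor"})))  # ['bigquery.jobs.create']
```

For example, bigquery.dataEditor alone leaves bigquery.jobs.create uncovered. Pairing it with bigquery.jobUser covers all three permissions.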

Cloud Storage permissions

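To load data from a Cloud Storage bucket, you must be granted storage.objects.get permissions. If you are using a URI wildcard, you must also have storage.objects.list permissions. Firestore export loads do not accept URI wildcards, so storage.objects.get is sufficient on the Cloud Storage side for these loads.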

The predefined IAM role storage.objectViewer can be granted to provide both storage.objects.get and storage.objects.list permissions.
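The Cloud Storage rule reduces to one branch on whether the source URI contains a wildcard. The following minimal Python sketch (the helper name is hypothetical) models only the two permissions this page mentions. It shows why a Firestore export load, which uses a single non-wildcard URI, needs just storage.objects.get, which storage.objectViewer provides.

```python
# Only the permissions this page mentions for storage.objectViewer.
OBJECT_VIEWER = frozenset({"storage.objects.get", "storage.objects.list"})


def required_storage_permissions(source_uri: str) -> set[str]:
    """Cloud Storage permissions needed to read a load source (hypothetical helper)."""
    perms = {"storage.objects.get"}  # always needed to read the object
    if "*" in source_uri:  # wildcard URIs also require listing the bucket
        perms.add("storage.objects.list")
    return perms


uri = ("gs://mybucket/20180228T1256/default_namespace/kind_Book/"
       "default_namespace_kind_Book.export_metadata")
print(required_storage_permissions(uri))  # {'storage.objects.get'}
```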

Loading Firestore export service data

You can load data from a Firestore export metadata file by using the Cloud Console, bq command-line tool, or API.

Sometimes Datastore terminology is used in the UI or in the commands, but the following procedures are compatible with Firestore export files. Firestore and Datastore share an export format.

Console

  1. In the Cloud Console, open the BigQuery page.

  2. In the navigation panel, in the Resources section, expand your Google Cloud project and select a dataset. Click Create table. The process for loading data is the same as the process for creating an empty table.

  3. On the Create table page, in the Source section:

    • For Create table from, select Cloud Storage.

    • In the source field, enter the Cloud Storage URI. The Cloud Storage bucket must be in the same location as your dataset. The URI for your Firestore export file should end with KIND_COLLECTION_ID.export_metadata. For example: default_namespace_kind_Book.export_metadata. In this example, Book is the collection ID, and default_namespace_kind_Book is the file name generated by Firestore.

      Verify that KIND_COLLECTION_ID is specified in your Cloud Storage URI. If you specify the URI without KIND_COLLECTION_ID, you receive the following error: does not contain valid backup metadata. (error code: invalid).

    • For File format, select Datastore Backup. Datastore Backup is the correct option for Firestore. Firestore and Datastore share an export format.

  4. On the Create table page, in the Destination section:

    • For Dataset name, choose the appropriate dataset.

    • In the Table name field, enter the name of the table you're creating in BigQuery.

    • Verify that Table type is set to Native table.

  5. In the Schema section, no action is necessary. The schema is inferred for a Firestore export.

  6. Select applicable items in the Advanced options section. If you are overwriting an existing table, set Write preference to Overwrite table.

  7. Click Create table.
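A URI that lacks the KIND_COLLECTION_ID part fails only after submission, with the "does not contain valid backup metadata" error. The following Python sketch catches that case before you paste the URI into the source field. It assumes, based on the default_namespace_kind_Book.export_metadata example, that the file name always ends in kind_COLLECTION_ID.export_metadata. The regular expression and helper name are illustrative, not part of any Google tool.

```python
import re

# Assumption: the metadata file name ends with "kind_<COLLECTION_ID>.export_metadata",
# as in the default_namespace_kind_Book.export_metadata example.
_KIND_FILE = re.compile(r"kind_([^/]+)\.export_metadata$")


def collection_id_from_uri(uri: str) -> str:
    """Return the collection ID encoded in a Firestore export metadata URI."""
    match = _KIND_FILE.search(uri)
    if match is None:
        raise ValueError(f"{uri!r} does not end with KIND_COLLECTION_ID.export_metadata")
    return match.group(1)


uri = ("gs://mybucket/20180228T1256/default_namespace/kind_Book/"
       "default_namespace_kind_Book.export_metadata")
print(collection_id_from_uri(uri))  # Book
```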

Classic UI

  1. Go to the classic BigQuery web UI.
  2. In the navigation panel, hover over a dataset, click the down arrow icon, and click Create new table.
  3. On the Create Table page, in the Source Data section:

    • Leave Create from source selected.
    • For Location, select Cloud Storage, and in the source field, enter the Cloud Storage URI. The Cloud Storage bucket must be in the same location as your dataset. The URI for your Firestore export file should end with KIND_COLLECTION_ID.export_metadata. For example: default_namespace_kind_Book.export_metadata. In this example, Book is the collection ID, and default_namespace_kind_Book is the file name generated by Firestore.

      Verify that KIND_COLLECTION_ID is specified in your Cloud Storage URI. If you specify the URI without KIND_COLLECTION_ID, you receive the following error: does not contain valid backup metadata. (error code: invalid).

    • For File format, select Datastore Backup. Datastore Backup is the correct option for Firestore. Firestore and Datastore share an export format.

  4. On the Create Table page, in the Destination Table section:

    • For Table name, choose the appropriate dataset, and in the table name field, enter the name of the table you're creating in BigQuery.
    • Verify that Table type is set to Native table.
  5. In the Schema section, no action is necessary. The schema is inferred for a Firestore export.

  6. Select applicable items in the Options section. If you are overwriting an existing table, set Write preference to Overwrite table.

  7. Click Create Table.

bq

Use the bq load command with the --source_format flag set to DATASTORE_BACKUP. Optionally, supply the --location flag and set the value to your location. If you are overwriting an existing table, add the --replace flag.

To load only specific fields, use the --projection_fields flag.

bq --location=LOCATION load \
--source_format=FORMAT \
DATASET.TABLE \
PATH_TO_SOURCE

Replace the following:

  • LOCATION: your location. The --location flag is optional.
  • FORMAT: DATASTORE_BACKUP. Datastore Backup is the correct option for Firestore. Firestore and Datastore share an export format.
  • DATASET: the dataset that contains the table into which you're loading data.
  • TABLE: the table into which you're loading data. If the table doesn't exist, it is created.
  • PATH_TO_SOURCE: the Cloud Storage URI.

For example, the following command loads the gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata Firestore export file into a table named book_data. mybucket and mydataset were created in the US multi-region location.

bq --location=US load \
--source_format=DATASTORE_BACKUP \
mydataset.book_data \
gs://mybucket/20180228T1256/default_namespace/kind_Book/default_namespace_kind_Book.export_metadata
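If you script loads, you can assemble the same command as an argument list, with the optional --replace and --projection_fields flags added only when needed. The following Python sketch is a hypothetical helper. It places --location before load, as in the example above, and builds only the flags this page describes.

```python
import shlex


def bq_load_args(dataset, table, source_uri, location=None, replace=False,
                 projection_fields=None):
    """Build the argv for a bq load of a Firestore export (hypothetical helper)."""
    args = ["bq"]
    if location:
        # --location precedes the load command, as in the example above.
        args.append(f"--location={location}")
    args += ["load", "--source_format=DATASTORE_BACKUP"]
    if replace:
        args.append("--replace")  # overwrite an existing table
    if projection_fields:
        # Comma-separated, case-sensitive top-level field names.
        args.append("--projection_fields=" + ",".join(projection_fields))
    args += [f"{dataset}.{table}", source_uri]
    return args


uri = ("gs://mybucket/20180228T1256/default_namespace/kind_Book/"
       "default_namespace_kind_Book.export_metadata")
# Prints the example command above on a single line.
print(shlex.join(bq_load_args("mydataset", "book_data", uri, location="US")))
```

To run the command, pass the list to subprocess.run(args, check=True). This requires an installed and authenticated bq tool. Passing a list rather than a shell string avoids quoting problems in field names and URIs.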

API

Set the following properties to load Firestore export data using the API.

  1. Create a load job configuration that points to the source data in Cloud Storage.

  2. Specify your location in the location property in the jobReference section of the job resource.

  3. In the load job configuration, set sourceUris to a fully qualified URI in the format gs://BUCKET/OBJECT. The file (object) name must end in KIND_COLLECTION_ID.export_metadata. Only one URI is allowed for Firestore exports, and you cannot use a wildcard.

  4. Specify the data format by setting the sourceFormat property to DATASTORE_BACKUP in the load job configuration. Datastore Backup is the correct option for Firestore. Firestore and Datastore share an export format.

  5. To load only specific fields, set the projectionFields property.

  6. If you are overwriting an existing table, specify the write disposition by setting the writeDisposition property to WRITE_TRUNCATE.
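The following Python sketch collects the properties from steps 1 through 6 into a request body for the jobs.insert method, which is sent as a POST to https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/jobs. It only builds the JSON. Authentication and the HTTP call are out of scope, and the project, dataset, and table names are hypothetical.

```python
import json


def firestore_load_job_body(project_id, location, source_uri, dataset_id, table_id,
                            projection_fields=None, overwrite=False):
    """Return a jobs.insert request body for loading one Firestore export."""
    load = {
        "sourceUris": [source_uri],  # exactly one fully qualified gs:// URI, no wildcard
        "sourceFormat": "DATASTORE_BACKUP",  # shared Datastore/Firestore export format
        "destinationTable": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
    }
    if projection_fields:
        load["projectionFields"] = list(projection_fields)
    if overwrite:
        load["writeDisposition"] = "WRITE_TRUNCATE"  # replace the existing table
    return {
        "jobReference": {"projectId": project_id, "location": location},
        "configuration": {"load": load},
    }


body = firestore_load_job_body(
    "my-project", "US",
    "gs://mybucket/20180228T1256/default_namespace/kind_Book/"
    "default_namespace_kind_Book.export_metadata",
    "mydataset", "book_data", overwrite=True,
)
print(json.dumps(body, indent=2))
```

The body omits writeDisposition unless you are overwriting. This keeps the API's default write behavior for new tables and never produces the append disposition that Firestore export loads reject.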

Firestore options

To change how BigQuery parses Firestore export data, specify the following option:

  • BigQuery web UI option: Not available
  • Classic UI option: Not available
  • bq flag: --projection_fields
  • BigQuery API property: projectionFields
  • Description: (Optional) A comma-separated list that indicates which document fields to load from a Firestore export. By default, BigQuery loads all fields. Field names are case sensitive and must be present in the export. You cannot specify field paths within a map field such as map.foo.
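The projection rules (case-sensitive names, present in the export, no paths into map fields) can be checked before building the flag value. The following Python sketch assumes you already know the exported field names, for example from your own collection design. It also treats any dot in a name as a map field path, which is a simplification. The helper name is hypothetical.

```python
def projection_fields_value(fields, exported_fields):
    """Validate field names and join them for --projection_fields (hypothetical helper)."""
    for name in fields:
        if "." in name:
            # Simplification: treat any dot as a path into a map field, which is not supported.
            raise ValueError(f"{name!r} looks like a map field path such as map.foo")
        if name not in exported_fields:  # exact, case-sensitive comparison
            raise ValueError(f"{name!r} is not a field in the export")
    return ",".join(fields)


print(projection_fields_value(["title", "author"], {"title", "author", "year"}))  # title,author
```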

Data type conversion

BigQuery converts data from each document in Firestore export files to BigQuery data types. The following table describes the conversion between supported data types.

Firestore data type → BigQuery data type

  • Array → RECORD
  • Boolean → BOOLEAN
  • Reference → RECORD
  • Date and time → TIMESTAMP
  • Map → RECORD
  • Floating-point number → FLOAT
  • Geographical point → RECORD with two FLOAT fields, lat and long
  • Integer → INTEGER
  • String → STRING (truncated to 64 KB)
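Because the conversion is a fixed lookup, you can predict the BigQuery column types of a load from the Firestore types of your top-level fields. The following Python sketch encodes the table above as a dictionary. The Firestore type names are the labels used in the table, not identifiers from any SDK, and the example field names are hypothetical.

```python
# The conversion table above, keyed by the table's Firestore type labels.
FIRESTORE_TO_BIGQUERY = {
    "Array": "RECORD",
    "Boolean": "BOOLEAN",
    "Reference": "RECORD",
    "Date and time": "TIMESTAMP",
    "Map": "RECORD",
    "Floating-point number": "FLOAT",
    "Geographical point": "RECORD",
    "Integer": "INTEGER",
    "String": "STRING",
}

# Sub-fields of the RECORD created for a geographical point.
GEOPOINT_SUBFIELDS = {"lat": "FLOAT", "long": "FLOAT"}


def expected_bigquery_types(field_types: dict[str, str]) -> dict[str, str]:
    """Map each top-level field's Firestore type to its BigQuery column type."""
    result = {}
    for field, firestore_type in field_types.items():
        if firestore_type not in FIRESTORE_TO_BIGQUERY:
            raise ValueError(f"no documented conversion for {firestore_type!r}")
        result[field] = FIRESTORE_TO_BIGQUERY[firestore_type]
    return result


print(expected_bigquery_types(
    {"title": "String", "published": "Date and time", "location": "Geographical point"}
))
# {'title': 'STRING', 'published': 'TIMESTAMP', 'location': 'RECORD'}
```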

Firestore key properties

Each document in Firestore has a unique key that contains information such as the document ID and the document path. BigQuery creates a RECORD data type (also known as a STRUCT) for the key, with nested fields for each piece of information, as described in the following table.

Key property (BigQuery data type): Description

  • __key__.app (STRING): The Firestore app name.
  • __key__.id (INTEGER): The document's ID, or null if __key__.name is set.
  • __key__.kind (STRING): The document's collection ID.
  • __key__.name (STRING): The document's name, or null if __key__.id is set.
  • __key__.namespace (STRING): Firestore does not support custom namespaces. The default namespace is represented by an empty string.
  • __key__.path (STRING): The path of the document: the sequence of collection and document pairs, starting from the root collection. For example: "Country", "USA", "PostalCode", 10011, "Route", 1234.
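The key record has two invariants: exactly one of id and name is populated, and namespace is always the empty string. The following Python sketch mirrors the __key__ columns in a small dataclass that enforces both. It is a hypothetical model for code that consumes loaded rows, not a type provided by BigQuery or Firestore, and the sample values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirestoreKey:
    """Hypothetical Python mirror of the __key__ RECORD columns."""
    app: str
    kind: str  # the collection ID
    path: str
    id: Optional[int] = None
    name: Optional[str] = None
    namespace: str = ""  # Firestore uses only the default (empty) namespace

    def __post_init__(self):
        # Exactly one of id and name is set; the other is null.
        if (self.id is None) == (self.name is None):
            raise ValueError("exactly one of id or name must be set")
        if self.namespace != "":
            raise ValueError("Firestore does not support custom namespaces")

    @property
    def document_id(self) -> str:
        """The document identifier as a string, whichever column holds it."""
        return self.name if self.name is not None else str(self.id)


key = FirestoreKey(app="my-app", kind="Book", path='"Book", "moby-dick"', name="moby-dick")
print(key.document_id)  # moby-dick
```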