Export data to Spanner (reverse ETL)

To provide feedback or request support for this feature, send email to bq-cloud-spanner-federation-preview@google.com

This document describes how you can set up a reverse extract, transform, and load (reverse ETL) workflow from BigQuery to Spanner. You can do this by using the EXPORT DATA statement to export data from a BigQuery table to a Spanner table.

This reverse ETL workflow combines analytic capabilities in BigQuery with low latency and high throughput in Spanner. This workflow lets you serve data to application users without exhausting quotas and limits on BigQuery.

Before you begin

Create a Spanner database including a table to receive the exported data.
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required roles

To get the permissions that you need to export BigQuery data to Spanner, ask your administrator to grant you the following IAM roles on your project:

Export data from a BigQuery table: BigQuery Data Viewer (roles/bigquery.dataViewer)
Run an export job: BigQuery User (roles/bigquery.user)
Check parameters of the Spanner instance: Cloud Spanner Viewer (roles/spanner.viewer)
Write data to a Spanner table: Cloud Spanner Database User (roles/spanner.databaseUser)

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Limitations

The following BigQuery data types don't have equivalents in Spanner and are not supported:

Spanner database dialect	Unsupported BigQuery types
All dialects	`STRUCT` `GEOGRAPHY` `DATETIME` `RANGE` `TIME`
GoogleSQL	`BIGNUMERIC`: The supported `NUMERIC` type is not wide enough. Consider adding explicit casts to the `NUMERIC` type in the query.

The maximum size of an exported row cannot exceed 1 MiB.
Spanner enforces referential integrity during the export. If the target table is a child of another table (INTERLEAVE IN PARENT), or if the target table has foreign key constraints, the foreign keys and parent key will be validated during the export. If an exported row is is written to a table with INTERLEAVE IN PARENT and the parent row doesn't exist, the export will fail with "Parent row is missing. Row cannot be written" error. If the exported row is written to a table with foreign key constraints and is referencing a key that doesn't exist, the export will fail with "Foreign key constraint is violated" error. When exporting to multiple tables, we recommend sequencing the export to ensure that referential integrity will be maintained through the export. This usually means exporting parent tables and tables that are referenced by foreign keys before tables that reference them.

If the table that is the target of the export has foreign key constraints, or is a child of another table (INTERLEAVE IN PARENT), the parent table must be populated before a child table export, and should contain all the corresponding keys. An attempt to export a child table while a parent table does not have the complete set of relevant keys will fail.
An export job to Spanner has a maximum duration of 6 hours. For information about optimizing large export jobs, see Export optimization. Alternatively, consider splitting the input into individual blocks of data, which may be exported as individual export jobs.
Exports to Spanner are only supported for the BigQuery Enterprise or Enterprise Plus editions. The BigQuery Standard edition and on-demand compute are not supported.

Configure exports with `spanner_options` option

You can use the spanner_options option to specify a destination Spanner database and table. The configuration is expressed in the form of a JSON string, as the following example shows:

EXPORT DATA OPTIONS(
   uri="https://spanner.googleapis.com/projects/PROJECT_ID/instances/INSTANCE_ID/databases/DATABASE_ID",
  format='CLOUD_SPANNER',
   spanner_options = """{
      "table": "TABLE_NAME",
      "priority": "PRIORITY",
      "tag": "TAG",
   }"""
)

Replace the following:

PROJECT_ID: the name of your Google Cloud project.
INSTANCE_ID: the name of your database instance.
DATABASE_ID: the name of your database.
TABLE_NAME: the name of an existing destination table.
PRIORITY (optional): priority of the write requests. Allowed values: LOW, MEDIUM, HIGH. Default value: MEDIUM.
TAG (optional): request tag to help identify exporter traffic in Spanner monitoring. Default value: bq_export.

Export query requirements

To export query results to Spanner, the results must meet the following requirements:

All columns in the result set must exist in the destination table, and their types must match.
The result set must contain all NOT NULL columns for the destination table.
Column values must not exceed Spanner data size limits within tables.
Any unsupported column types must be converted to one of the supported types before exporting to Spanner.

Export data

You can use the EXPORT DATA statement to export data from a BigQuery table into a Spanner table.

The following example exports selected fields from a table that's named mydataset.table1:

EXPORT DATA OPTIONS (
  uri="https://spanner.googleapis.com/projects/PROJECT_ID/instances/INSTANCE_ID/databases/DATABASE_ID",
  format='CLOUD_SPANNER',
  spanner_options="""{ "table": "TABLE_NAME" }"""
)
AS SELECT * FROM mydataset.table1;

Replace the following:

PROJECT_ID: the name of your Google Cloud project
INSTANCE_ID: the name of your database instance
DATABASE_ID: the name of your database
TABLE_NAME: the name of an existing destination table

Export multiple results with the same `rowkey` value

When you export a result containing multiple rows with the same rowkey value, values written to Spanner end up in the same Spanner row. Only single matching BigQuery row (there is no guarantee which one) will be present in the Spanner row set produced by export.

Export optimization

To optimize the export of records from BigQuery to Spanner, you can try the following:

Increasing the number of nodes in the Spanner destination instance. Note that during the early stages of the export, increasing the number of the nodes in the instance might not immediately cause an increase in the export throughput. There can be a slight delay as the Spanner performs load-based splitting. With load-based splitting, the export throughput will grow through and stabilize shortly after. For additional details about maximizing write throughput, see Performance overview.
Specify HIGH priority within spanner_options. However, doing so can cause significant performance degradation for other workloads served by the same instance.
Avoid ordering the query results. If the result set contains all primary key columns, then the exporter automatically sorts the primary keys of the destination table to streamline writes and minimize contention.

If the destination table's primary key includes generated columns, then you should add the generated columns' expressions to the query to ensure that the exported data is sorted properly.

Pricing

For information on data export pricing, see BigQuery pricing.

After the data is exported, you're charged for storing the data in Spanner. For more information, see Spanner pricing.