Export data to Spanner (reverse ETL)

This document describes how you can set up a reverse extract, transform, and load (reverse ETL) workflow from BigQuery to Spanner. You can do this by using the EXPORT DATA statement to export data from a BigQuery table to a Spanner table.

This reverse ETL workflow combines analytic capabilities in BigQuery with low latency and high throughput in Spanner. This workflow lets you serve data to application users without exhausting quotas and limits on BigQuery.

Before you begin

Create a Spanner database including a table to receive the exported data.
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
Create an Enterprise or a higher tier reservation. You might reduce BigQuery compute costs when you run one-time exports to Spanner by setting a baseline slot capacity of zero and enabling autoscaling.

Required roles

To get the permissions that you need to export BigQuery data to Spanner, ask your administrator to grant you the following IAM roles on your project:

Export data from a BigQuery table: BigQuery Data Viewer (roles/bigquery.dataViewer)
Run an extract job: BigQuery User (roles/bigquery.user)
Check parameters of the Spanner instance: Cloud Spanner Viewer (roles/spanner.viewer)
Write data to a Spanner table: Cloud Spanner Database User (roles/spanner.databaseUser)

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Limitations

This feature is not supported in Assured Workloads.
The following BigQuery data types don't have equivalents in Spanner and are not supported:

Spanner database dialect	Unsupported BigQuery types
All dialects	`STRUCT` `GEOGRAPHY` `DATETIME` `RANGE` `TIME`
GoogleSQL	`BIGNUMERIC`: The supported `NUMERIC` type is not wide enough. Consider adding explicit casts to the `NUMERIC` type in the query.

The maximum size of an exported row cannot exceed 1 MiB.
Spanner enforces referential integrity during the export. If the target table is a child of another table (INTERLEAVE IN PARENT), or if the target table has foreign key constraints, the foreign keys and parent key will be validated during the export. If an exported row is is written to a table with INTERLEAVE IN PARENT and the parent row doesn't exist, the export will fail with "Parent row is missing. Row cannot be written" error. If the exported row is written to a table with foreign key constraints and is referencing a key that doesn't exist, the export will fail with "Foreign key constraint is violated" error. When exporting to multiple tables, we recommend sequencing the export to ensure that referential integrity will be maintained through the export. This usually means exporting parent tables and tables that are referenced by foreign keys before tables that reference them.

If the table that is the target of the export has foreign key constraints, or is a child of another table (INTERLEAVE IN PARENT), the parent table must be populated before a child table export, and should contain all the corresponding keys. An attempt to export a child table while a parent table does not have the complete set of relevant keys will fail.
An extract job to Spanner has a maximum duration of 6 hours. For information about optimizing large extract jobs, see Export optimization. Alternatively, consider splitting the input into individual blocks of data, which may be exported as individual extract jobs.
Exports to Spanner are only supported for the BigQuery Enterprise or Enterprise Plus editions. The BigQuery Standard edition and on-demand compute are not supported.
Exports to default Spanner schemas are supported. Tables from named schemas are not supported.
You cannot use continuous queries to export to Spanner tables with auto-generated primary keys.
You cannot use continuous queries to export to Spanner tables in a PostgreSQL-dialect database.
When using continuous queries to export to a Spanner table, ensure that you choose a primary key that doesn't correspond to a monotonically increasing integer in your BigQuery table. Doing so might cause performance issues in your export. For information about primary keys in Spanner, and ways to mitigate these performance issues, see Choose a primary key.

Configure exports with `spanner_options` option

You can use the spanner_options option to specify a destination Spanner database and table. The configuration is expressed in the form of a JSON string, as the following example shows:

EXPORT DATA OPTIONS(
   uri="https://spanner.googleapis.com/projects/PROJECT_ID/instances/INSTANCE_ID/databases/DATABASE_ID",
  format='CLOUD_SPANNER',
   spanner_options = """{
      "table": "TABLE_NAME",
      "change_timestamp_column": "CHANGE_TIMESTAMP",
      "priority": "PRIORITY",
      "tag": "TAG",
   }"""
)

Replace the following:

PROJECT_ID: the name of your Google Cloud project.
INSTANCE_ID: the name of your database instance.
DATABASE_ID: the name of your database.
TABLE_NAME: the name of an existing destination table.
CHANGE_TIMESTAMP: the name of a TIMESTAMP type column in a Spanner table to track the timestamp of the most recent update of the corresponding row in a BigQuery table being exported. This is an optional field for batch and continuous exports to Spanner, but recommended for use cases where the ordering of exported rows with the same primary key is important.
PRIORITY (optional): priority of the write requests. Allowed values: LOW, MEDIUM, HIGH. Default value: MEDIUM.
TAG (optional): request tag to help identify exporter traffic in Spanner monitoring. Default value: bq_export.

Export query requirements

To export query results to Spanner, the results must meet the following requirements:

All columns in the result set must exist in the destination table, and their types must match or be convertible.
The result set must contain all NOT NULL columns for the destination table.
Column values must not exceed Spanner data size limits within tables.
Any unsupported column types must be converted to one of the supported types before exporting to Spanner.

Type conversions

For ease of use, Spanner exporter automatically applies the following type conversions:

BigQuery type	Spanner type
BIGNUMERIC	NUMERIC (PostgreSQL dialect only)
FLOAT64	FLOAT32
BYTES	PROTO
INT64	ENUM

Export data

You can use the EXPORT DATA statement to export data from a BigQuery table into a Spanner table.

The following example exports selected fields from a table that's named mydataset.table1:

EXPORT DATA OPTIONS (
  uri="https://spanner.googleapis.com/projects/PROJECT_ID/instances/INSTANCE_ID/databases/DATABASE_ID",
  format='CLOUD_SPANNER',
  spanner_options="""{ "table": "TABLE_NAME" }"""
)
AS SELECT * FROM mydataset.table1;

Replace the following:

PROJECT_ID: the name of your Google Cloud project
INSTANCE_ID: the name of your database instance
DATABASE_ID: the name of your database
TABLE_NAME: the name of an existing destination table

Export multiple results with the same `rowkey` value

When you export a result containing multiple rows with the same rowkey value, values written to Spanner end up in the same Spanner row. Only single matching BigQuery row (there is no guarantee which one) will be present in the Spanner row set produced by export.

Export continuously

To continuously process an export query, you can configure it as a continuous query.

Export optimization

To optimize the export of records from BigQuery to Spanner, you can try the following:

Increasing the number of nodes in the Spanner destination instance. Note that during the early stages of the export, increasing the number of the nodes in the instance might not immediately cause an increase in the export throughput. There can be a slight delay as the Spanner performs load-based splitting. With load-based splitting, the export throughput will grow through and stabilize shortly after. For additional details about maximizing write throughput, see Performance overview.
Specify HIGH priority within spanner_options. However, doing so can cause significant performance degradation for other workloads served by the same instance.
Avoid ordering the query results. If the result set contains all primary key columns, then the exporter automatically sorts the primary keys of the destination table to streamline writes and minimize contention.

If the destination table's primary key includes generated columns, then you should add the generated columns' expressions to the query to ensure that the exported data is sorted properly.

Pricing

When you export data to Spanner using the EXPORT DATA statement, you are billed using BigQuery capacity compute pricing.

To export continuously to Spanner using a continuous query, you must have a BigQuery Enterprise or Enterprise Plus edition slot reservation and a reservation assignment that uses the CONTINUOUS job type.

BigQuery exports to Spanner that cross regional boundaries are charged using data extraction rates. For more information, see BigQuery pricing. To avoid data transfer charges, make sure that your BigQuery export runs in the same region as the Spanner default leader. Continuous query exports don't support exports that cross regional boundaries.

After the data is exported, you're charged for storing the data in Spanner. For more information, see Spanner pricing.