The BigQuery to Bigtable template is a batch pipeline that copies data from a BigQuery table into an existing Bigtable table. The template can either read the entire table or read specific records using a supplied query.
Pipeline requirements
- The source BigQuery table must exist.
- The Bigtable table must exist.
- The worker service account needs the bigquery.datasets.create permission. For more information, see Introduction to IAM.
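Before you run the template, you can confirm that the destination Bigtable table and column family exist by using the cbt CLI, and grant the worker service account a role that carries the bigquery.datasets.create permission. The following commands are a minimal sketch: my-table, cf1, and SA_EMAIL are placeholder values, and roles/bigquery.dataEditor is shown only as one example of a role that includes this permission.

# Confirm the destination table and its column families (requires the cbt CLI).
cbt -project=PROJECT_ID -instance=BIGTABLE_INSTANCE_ID ls
cbt -project=PROJECT_ID -instance=BIGTABLE_INSTANCE_ID ls my-table

# If needed, create the table and the column family the template writes into.
# my-table and cf1 are placeholder names.
cbt -project=PROJECT_ID -instance=BIGTABLE_INSTANCE_ID createtable my-table
cbt -project=PROJECT_ID -instance=BIGTABLE_INSTANCE_ID createfamily my-table cf1

# Grant the worker service account a role that includes bigquery.datasets.create.
# roles/bigquery.dataEditor is one option; adjust to your own IAM policy.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SA_EMAIL" \
    --role="roles/bigquery.dataEditor"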
Template parameters
Required parameters
- readIdColumn: The name of the BigQuery column storing the unique identifier of the row.
- bigtableWriteInstanceId: The ID of the Bigtable instance that contains the table.
- bigtableWriteTableId: The ID of the Bigtable table to write to.
- bigtableWriteColumnFamily: The name of the column family of the Bigtable table to write data into.
Optional parameters
- inputTableSpec: The BigQuery table to read from. If you specify inputTableSpec, the template reads the data directly from BigQuery storage by using the BigQuery Storage Read API (https://cloud.google.com/bigquery/docs/reference/storage). For information about limitations in the Storage Read API, see https://cloud.google.com/bigquery/docs/reference/storage#limitations. You must specify either inputTableSpec or query. If you set both parameters, the template uses the query parameter. For example, <BIGQUERY_PROJECT>:<DATASET_NAME>.<INPUT_TABLE>.
- outputDeadletterTable: The BigQuery table for messages that failed to reach the output table. If the table doesn't exist, it is created during pipeline execution. If not specified, <outputTableSpec>_error_records is used. For example, <PROJECT_ID>:<DATASET_NAME>.<DEADLETTER_TABLE>.
- query: The SQL query to use to read data from BigQuery. If the BigQuery dataset is in a different project than the Dataflow job, specify the full dataset name in the SQL query, for example: <PROJECT_ID>.<DATASET_NAME>.<TABLE_NAME>. By default, the query parameter uses GoogleSQL (https://cloud.google.com/bigquery/docs/introduction-sql), unless useLegacySql is true. You must specify either inputTableSpec or query. If you set both parameters, the template uses the query parameter. For example, select * from sampledb.sample_table.
- useLegacySql: Set to true to use legacy SQL. This parameter only applies when using the query parameter. Defaults to false.
- queryLocation: Needed when reading from an authorized view without permission to the underlying table. For example, US.
- queryTempDataset: With this option, you can set an existing dataset to create the temporary table that stores the results of the query. For example, temp_dataset.
- KMSEncryptionKey: If reading from BigQuery using a query source, use this Cloud KMS key to encrypt any temporary tables created. For example, projects/your-project/locations/global/keyRings/your-keyring/cryptoKeys/your-key.
- bigtableRpcAttemptTimeoutMs: The timeout for each Bigtable RPC attempt in milliseconds.
- bigtableRpcTimeoutMs: The total timeout for a Bigtable RPC operation in milliseconds.
- bigtableAdditionalRetryCodes: The additional retry codes. For example, RESOURCE_EXHAUSTED,DEADLINE_EXCEEDED.
- bigtableWriteAppProfile: The ID of the Bigtable application profile to use for the export. If you do not specify an app profile, Bigtable uses the default app profile (https://cloud.google.com/bigtable/docs/app-profiles#default-app-profile) of the instance.
- bigtableWriteProjectId: The ID of the Google Cloud project that contains the Bigtable instance to write data to.
- bigtableBulkWriteLatencyTargetMs: The latency target of Bigtable in milliseconds for latency-based throttling.
- bigtableBulkWriteMaxRowKeyCount: The maximum number of row keys in a Bigtable batch write operation.
- bigtableBulkWriteMaxRequestSizeBytes: The maximum bytes to include per Bigtable batch write operation.
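For example, to read with a SQL query instead of a full table, you might pass a parameter set like the following when you run the template (see the next section). This is only a sketch with illustrative values: id is a hypothetical column name, and sampledb.sample_table and temp_dataset are the placeholder values used in the descriptions above.

--parameters \
readIdColumn=id,\
query='select * from sampledb.sample_table',\
useLegacySql=false,\
queryTempDataset=temp_dataset,\
bigtableWriteInstanceId=BIGTABLE_INSTANCE_ID,\
bigtableWriteTableId=BIGTABLE_TABLE_ID,\
bigtableWriteColumnFamily=BIGTABLE_COLUMN_FAMILY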
Run the template
- Go to the Dataflow Create job from template page. Go to Create job from template
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1. For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the BigQuery to Bigtable template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
In your shell or terminal, run the template:
gcloud dataflow flex-template run JOB_NAME \
    --project=PROJECT_ID \
    --region=REGION_NAME \
    --template-file-gcs-location=gs://dataflow-templates-REGION_NAME/VERSION/flex/BigQuery_to_Bigtable \
    --parameters \
readIdColumn=READ_COLUMN_ID,\
inputTableSpec=INPUT_TABLE_SPEC,\
bigtableWriteInstanceId=BIGTABLE_INSTANCE_ID,\
bigtableWriteTableId=BIGTABLE_TABLE_ID,\
bigtableWriteColumnFamily=BIGTABLE_COLUMN_FAMILY
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- REGION_NAME: the region where you want to deploy your Dataflow job, for example us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- READ_COLUMN_ID: your BigQuery unique ID column
- INPUT_TABLE_SPEC: your BigQuery table name
- BIGTABLE_INSTANCE_ID: your Bigtable instance ID
- BIGTABLE_TABLE_ID: your Bigtable table ID
- BIGTABLE_COLUMN_FAMILY: your Bigtable table column family
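As a concrete illustration, a filled-in command might look like the following. Every value here, such as my-project, my-instance, my-table, cf1, and the id column, is a placeholder for your own resources; the template path follows the latest-version bucket layout described above.

gcloud dataflow flex-template run bq-to-bt-example \
    --project=my-project \
    --region=us-central1 \
    --template-file-gcs-location=gs://dataflow-templates-us-central1/latest/flex/BigQuery_to_Bigtable \
    --parameters \
readIdColumn=id,\
inputTableSpec=my-project:my_dataset.my_table,\
bigtableWriteInstanceId=my-instance,\
bigtableWriteTableId=my-table,\
bigtableWriteColumnFamily=cf1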
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/flexTemplates:launch
{
  "launch_parameter": {
    "jobName": "JOB_NAME",
    "parameters": {
      "readIdColumn": "READ_COLUMN_ID",
      "inputTableSpec": "INPUT_TABLE_SPEC",
      "bigtableWriteInstanceId": "BIGTABLE_INSTANCE_ID",
      "bigtableWriteTableId": "BIGTABLE_TABLE_ID",
      "bigtableWriteColumnFamily": "BIGTABLE_COLUMN_FAMILY"
    },
    "containerSpecGcsPath": "gs://dataflow-templates-LOCATION/VERSION/flex/BigQuery_to_Bigtable"
  }
}
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- LOCATION: the region where you want to deploy your Dataflow job, for example us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- READ_COLUMN_ID: your BigQuery unique ID column
- INPUT_TABLE_SPEC: your BigQuery table name
- BIGTABLE_INSTANCE_ID: your Bigtable instance ID
- BIGTABLE_TABLE_ID: your Bigtable table ID
- BIGTABLE_COLUMN_FAMILY: your Bigtable table column family
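One way to send this request is with curl, using an access token from the gcloud CLI. The sketch below mirrors the request body above and keeps the same placeholder values.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
        "launch_parameter": {
          "jobName": "JOB_NAME",
          "parameters": {
            "readIdColumn": "READ_COLUMN_ID",
            "inputTableSpec": "INPUT_TABLE_SPEC",
            "bigtableWriteInstanceId": "BIGTABLE_INSTANCE_ID",
            "bigtableWriteTableId": "BIGTABLE_TABLE_ID",
            "bigtableWriteColumnFamily": "BIGTABLE_COLUMN_FAMILY"
          },
          "containerSpecGcsPath": "gs://dataflow-templates-LOCATION/VERSION/flex/BigQuery_to_Bigtable"
        }
      }' \
  "https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/flexTemplates:launch"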
Template source code
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.