The combination of the BigQuery Data Transfer Service and a special migration agent lets you to copy data from an on-premises data warehouse system, such as Teradata, to BigQuery. This document provides an overview of the data transfer process using the BigQuery Data Transfer Service.
The data transfer is free of charge. See Pricing for more details.
To use the BigQuery Data Transfer Service for data warehouse migrations, first you download an on-premises migration agent that connects to your local data warehouse. Then you configure a transfer in the BigQuery Data Transfer Service.
The on-premises migration agent communicates with the BigQuery Data Transfer Service to copy tables from your data warehouse to BigQuery. You can monitor recurring data loads from the data warehouse to BigQuery by using the BigQuery Data Transfer Service's web UI.
The following diagram shows the simplified, overall flow of data between an on-premises data warehouse like Teradata and BigQuery.
Transfer modes and options
Because every migration has unique requirements, the migration agent can be customized in the following ways. There are three major choices when setting up a data transfer from Teradata to BigQuery:
- Extraction method: JDBC with FastConnect or Teradata Parallel Transporter (TPT)
- Automatic schema conversion or custom schema file
- On-demand transfer vs incremental transfers (Beta)
The BigQuery Data Transfer Service supports two different extraction methods for transferring data from Teradata to BigQuery:
- Extraction using JDBC driver with FastExport connection. In this mode,
a table is extracted into a collection of AVRO files to a specified
location on a local file system. Extracted files are then uploaded to a
specified Cloud Storage bucket and, after successful transfer, the files are
deleted from the local file system.
- Limitations on the amount of space in a local file system are strongly enforced, and extraction is paused until extracted files are uploaded and deleted from the local file system.
- If there are tight constraints on local storage space or TPT is not available, use this extraction method.
- JDBC driver with FastExport is the default extraction method.
- Extraction using Teradata Parallel Transporter (TPT) tbuild utility.
In this mode, an agent attempts to calculate extraction batches using
rows distributed by partitions. For each batch, a TPT extraction script
is emitted and executed, producing a set of pipe delimited files. After
each batch extraction, files are uploaded to a specified Cloud Storage
bucket and deleted from the local file system. Limitations on the amount of
space in the local file system are not enforced, so make sure the local
file system has enough space to extract the largest partition in a Teradata
- We recommend extracting with TPT and customizing your schema to indicate partition columns. This results in the fastest data extraction.
Read more about specifying the extraction method in the configuration for the migration agent section of the step-by-step transfer set up instructions.
Custom schema file
A schema file is a JSON file that describes database objects. The schema includes a set of databases, each containing a set of tables, each of which contains a set of columns. Each column has a type field — a type that is assigned to a column in BigQuery.
In a schema file, each object has a name field — a name that will be assigned to it in BigQuery. Each object also has an originalName field — the name of the matching object in the Teradata database.
The BigQuery Data Transfer Service provides automatic schema detection and data conversion during a data transfer from Teradata to BigQuery. Optionally, you can also specify a custom schema file. Schema customization is strongly recommended for some situations. Examples:
- A custom schema file is especially useful for including additional information about a table, like partitioning, that would otherwise be lost in the migration, if no schema file were specified.
- You can choose to provide a custom schema file to transform fields, like the name field of any object, or the usageType array of any column, during the data transfer.
- See Custom schema file for more detail.
On-demand or incremental transfers
When migrating data from a Teradata database instance to BigQuery, the BigQuery Data Transfer Service supports both one-time, snapshot data transfer (an "on-demand transfer") and recurring, periodic transfers of new and updated rows ("incremental transfers") (Beta). You designate the transfer as on-demand or incremental in the scheduling options when Setting up a transfer.
- On-demand data transfer
- If your table is very large and you can extract with TPT for higher performance, we recommend partitioning your Teradata table to enable allow partition-by-partition extraction. For more details, see Custom schema file.
- If your tables are small or you cannot use TPT, follow the basic instructions. Schema customization is not required.
- Incremental data transfer
- If you would like to regularly migrate changes from Teradata to BigQuery, you can use incremental mode. On a recurring basis, new and changed records from Teradata are appended to BigQuery tables.
- This method requires customizing your schema to annotate COMMIT_TIMESTAMP columns.
- Certain conditions apply when setting up incremental transfers. For more information, see incremental transfers.
Your Cloud Storage bucket must be in a region or multi-region that is compatible with the region or multi-region of the destination dataset in BigQuery.Colocate your Cloud Storage buckets for transferring data.
- If your BigQuery dataset is in a multi-regional location, the Cloud Storage bucket containing the data you're transferring must be in a regional or multi-regional bucket in the same location. For example, if your BigQuery dataset is in the EU, the Cloud Storage bucket must be in a regional or multi-regional bucket in the EU.
- If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location. For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
- Exception: If your dataset is in the US multi-regional location, you can transfer data from a Cloud Storage bucket in any regional or multi-regional location.
For detailed information about transfers and regions, see Dataset locations and transfers.
Migrating a data warehouse instance with the BigQuery Data Transfer Service is free of charge.
- Extraction, uploading to a Cloud Storage bucket, and loading data into BigQuery is free.
- Data is not automatically deleted from your Cloud Storage bucket after it is uploaded to BigQuery. Consider deleting the data from your Cloud Storage bucket to avoid additional storage costs. See Cloud Storage pricing.
Standard BigQuery Quotas & limits on load jobs apply.
See our transfers Pricing page for details.
- One-time, on-demand transfers are fully supported. Incremental transfers are in Beta. DDL/DML operations in incremental transfers are partially supported.
- Data is extracted to a folder on the local file system (on-premises). Make sure
there is adequate free space.
- When using the FastExport mode of extraction, you can set the maximum storage
space to be used, and the limit strongly enforced by the migration agent.
max-local-storagesetting in the migration agent's configuration file when setting up a transfer from Teradata to BigQuery.
- When using the TPT extraction method, make sure the file system has enough free space — at least equal to the largest table partition in the Teradata instance.
- When using the FastExport mode of extraction, you can set the maximum storage space to be used, and the limit strongly enforced by the migration agent. Set the
- The BigQuery Data Transfer Service converts schema automatically (if you don't supply a custom schema file) and transfers Teradata data to BigQuery. Data is mapped from Teradata to BigQuery types.
- Files are not automatically deleted from your Cloud Storage bucket after being uploaded from the local file system, and loaded into BigQuery. Consider deleting the data from your Cloud Storage bucket after loading it into BigQuery, to avoid additional storage costs. See Pricing.
- The speed of the extraction is bounded by your JDBC connection.
The data extracted from Teradata is not encrypted. Take appropriate steps to restrict access to the extracted files in the local file system, and ensure the Cloud Storage bucket is properly secured.
Other database resources, such as stored procedures, saved queries, views, and user-defined functions are not transferred and not in the scope of this service.