This document describes additional details, options, and Beta features available when you use the BigQuery Data Transfer Service to migrate data from Teradata to BigQuery.
Custom schema file
Making transformations with a schema file
As part of the migration, you can specify a custom schema file to change the `name` field of any object and to add a `usageType` array to any column. This is especially useful for preserving information about a table, such as partitioning, that would otherwise be lost in the migration if no schema file were specified.
The following usage types are supported:
- PARTITIONING: Only one column per table can be annotated with this usageType. The `type` field of the column should be `DATE` or `TIMESTAMP`. This column is used in the partitioned table definition for the containing `tables` object.
- CLUSTERING: Several columns in one table can be annotated with this usageType. Column types should follow the constraints for clustering in BigQuery. If a `PARTITIONING` field is specified for the same table, BigQuery uses these columns to create a clustered table. Only columns with the types `INT64`, `STRING`, `DATE`, `TIMESTAMP`, `BOOL`, `NUMERIC`, or `BIGNUMERIC` (Preview) can be marked with this usageType.
- COMMIT_TIMESTAMP: Only one column per table can be annotated with this usageType. Use this usageType to annotate an update timestamp column. This column is used to extract rows created or updated since the last transfer run. It should have a `TIMESTAMP` or `DATE` type.
Example custom schema file
{
"databases": [
{
"name": "db",
"originalName": "db",
"tables": [
{
"name": "test",
"originalName": "test",
"columns": [
{
"name": "foo",
"originalName": "foo",
"type": "INT64",
"usageType": ["CLUSTERING"]
},
{
"name": "bar",
"originalName": "bar",
"type": "DATE",
"usageType": ["PARTITIONING"]
},
{
"name": "change",
"originalName": "change",
"type": "TIMESTAMP",
"usageType": ["COMMIT_TIMESTAMP"]
}
]
}
]
}
]
}
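For reference, the schema file above would yield a table layout roughly like the following. This is only an illustrative sketch: the transfer service creates the table itself, and the project ID `myproject` is a placeholder.

```sql
-- Sketch of the BigQuery table layout produced by the example schema
-- file above; `myproject` is a hypothetical project ID. The transfer
-- service creates this table for you, so this DDL is illustrative only.
CREATE TABLE `myproject.db.test` (
  foo INT64,
  bar DATE,
  change TIMESTAMP
)
PARTITION BY bar  -- from usageType PARTITIONING on column `bar`
CLUSTER BY foo;   -- from usageType CLUSTERING on column `foo`
```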
For more information about exactly how tables are partitioned or clustered in a transfer, see the incremental transfers section.
Teradata to BigQuery data type mapping
Teradata type | BigQuery type | Type difference |
---|---|---|
INTEGER | INTEGER | |
SMALLINT | INTEGER | |
BYTEINT | INTEGER | |
BIGINT | INTEGER | |
DECIMAL, NUMERIC, NUMBER, FLOAT | NUMERIC or BIGNUMERIC | The BigQuery NUMERIC type has 38 digits of precision and 9 decimal digits of scale. The BigQuery BIGNUMERIC (Preview) type has more than 76 digits of precision (the 77th digit is partial) and exactly 38 digits of scale. |
REAL | FLOAT64 | |
CHAR | STRING | |
CHARACTER | STRING | |
VARCHAR | STRING | |
CLOB | STRING | |
JSON | STRING | |
BLOB | BYTES | |
BYTE | BYTES | |
VARBYTE | BYTES | |
DATE | DATE | |
TIME, TIME WITH TIME ZONE | TIME | BigQuery TIME is in UTC. The BigQuery Data Transfer Service extracts the respective values in UTC format. |
TIMESTAMP, TIMESTAMP WITH TIME ZONE | TIMESTAMP | BigQuery TIMESTAMP is in UTC. The BigQuery Data Transfer Service extracts the respective values in UTC format. TIMESTAMP precision is limited to microseconds. |
ARRAY | STRING | |
MULTIDIMENSIONAL ARRAY | STRING | |
HOUR | INTEGER | |
MINUTE | INTEGER | |
SECOND | INTEGER | |
DAY | INTEGER | |
MONTH | INTEGER | |
YEAR | INTEGER | |
PERIOD(DATE) | STRING | |
PERIOD(TIMESTAMP WITH TIME ZONE) | STRING | |
PERIOD(TIMESTAMP) | STRING | |
PERIOD(TIME) | STRING | |
PERIOD(TIME WITH TIME ZONE) | STRING | |
USERDEFINED | STRING | |
XML | STRING | |
ANYTYPE | STRING |
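The NUMERIC versus BIGNUMERIC distinction matters for Teradata DECIMAL or NUMBER columns with high precision or scale. A minimal illustration in BigQuery SQL (the values are arbitrary examples, not from the source document):

```sql
-- NUMERIC holds up to 38 digits of precision and 9 digits of scale.
SELECT CAST('123456789.123456789' AS NUMERIC) AS fits_in_numeric;

-- A value with 38 digits of scale exceeds NUMERIC's limits and needs
-- BIGNUMERIC (Preview) instead.
SELECT CAST('1.12345678901234567890123456789012345678' AS BIGNUMERIC)
  AS needs_bignumeric;
```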
Incremental transfers
The BigQuery Data Transfer Service supports recurring, periodic transfers of new and updated rows ("incremental transfers"). You designate the transfer as on-demand (one-time) or incremental in the scheduling options when setting up a transfer.
The source table in Teradata must have a change tracking column with the `TIMESTAMP` data type.
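For example, if the source table does not already track updates, a column like `change` from the example schema file above could be added on the Teradata side. This is only a sketch; keeping the column populated on every insert and update is up to your existing ETL processes.

```sql
-- Teradata-side sketch: add a change tracking column to the source
-- table (`change` matches the COMMIT_TIMESTAMP column in the example
-- schema file above). Populating it on every write is up to your ETL.
ALTER TABLE db.test ADD change TIMESTAMP(6);
```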
In incremental transfers, the first transfer always creates a table snapshot in BigQuery. All subsequent transfers capture, transfer, and append new and changed data to the existing table in BigQuery. This means that for changed rows, the BigQuery table can contain duplicate rows with old and new values.
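Because changed rows are appended rather than updated in place, you may want to query only the latest version of each row. A minimal sketch in BigQuery SQL, assuming a hypothetical logical key column `id` and the `change` column from the example schema file (`myproject` is a placeholder project ID):

```sql
-- Keep only the newest version of each row; `id` is a hypothetical
-- logical key and `change` is the COMMIT_TIMESTAMP column.
SELECT * EXCEPT (rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY change DESC) AS rn
  FROM `myproject.db.test`
)
WHERE rn = 1;
```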
For each transfer run, a timestamp of the transfer run is saved. For each subsequent transfer run, an agent receives the timestamp of a previous transfer run (T1) and a timestamp of when the current transfer run started (T2).
For each transfer after the initial run, the migration agent extracts data using the following per-table logic:
- If a table object in the schema file does not have a column with a usageType of `COMMIT_TIMESTAMP`, the table is skipped.
- If a table has a column with the usageType `COMMIT_TIMESTAMP`, all rows with a timestamp between T1 and T2 are extracted and appended to the existing table in BigQuery (see the sketch after this list).
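Conceptually, the extraction for a qualifying table behaves like the following Teradata query. The agent's actual extraction logic is internal to the service; the timestamps here are placeholder values standing in for T1 and T2.

```sql
-- Illustrative only: rows whose COMMIT_TIMESTAMP column (`change`)
-- falls between T1 (previous run) and T2 (current run start).
SELECT *
FROM db.test
WHERE change BETWEEN TIMESTAMP '2024-05-01 00:00:00'  -- T1 (placeholder)
                 AND TIMESTAMP '2024-05-02 00:00:00'; -- T2 (placeholder)
```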
DDL/DML operations in incremental transfers
Teradata operation | Type | Teradata-to-BigQuery support |
---|---|---|
CREATE | DDL | If the table name matches the given pattern, a new full snapshot for the table is created in BigQuery. |
DROP | DDL | Not supported |
ALTER (RENAME) | DDL | If a table name matches a given pattern, a new full snapshot for the renamed table is created in BigQuery. The previous snapshot is not deleted from BigQuery. The user is not notified of the renamed table. |
INSERT | DML | New rows are added to the matching BigQuery table. |
UPDATE | DML | Rows are not changed. Rows are added to the matching BigQuery table as new, like an INSERT. Rows from previous transfers are not updated or deleted. |
MERGE | DML | Not supported. See instead INSERT, UPDATE, and DELETE. |
DELETE | DML | Not supported |
What's next
- Learn more about data warehouse migrations to BigQuery in our solution guide.
- Read an overview about migrating from Teradata using the BigQuery Data Transfer Service.
- Learn step-by-step how to set up a Teradata data warehouse transfer to BigQuery.