Teradata SQL translation guide
This document details the similarities and differences in SQL syntax between Teradata and BigQuery to help you plan your migration. Use batch SQL translation to migrate your SQL scripts in bulk, or interactive SQL translation to translate ad-hoc queries.
Data types
This section shows equivalents between data types in Teradata and in BigQuery.
Teradata | BigQuery | Notes
---|---|---
INTEGER | INT64 |
SMALLINT | INT64 |
BYTEINT | INT64 |
BIGINT | INT64 |
DECIMAL | NUMERIC | Use BigQuery's parameterized decimal data types if you need to enforce custom digit or scale bounds (constraints). Teradata allows you to insert values of higher precision by rounding the stored value; however, it keeps the high precision in calculations. This can lead to unexpected rounding behavior compared to the ANSI standard.
FLOAT | FLOAT64 |
NUMERIC | NUMERIC | Use BigQuery's parameterized decimal data types if you need to enforce custom digit or scale bounds (constraints). Teradata allows you to insert values of higher precision by rounding the stored value; however, it keeps the high precision in calculations. This can lead to unexpected rounding behavior compared to the ANSI standard.
NUMBER | NUMERIC | Use BigQuery's parameterized decimal data types if you need to enforce custom digit or scale bounds (constraints). Teradata allows you to insert values of higher precision by rounding the stored value; however, it keeps the high precision in calculations. This can lead to unexpected rounding behavior compared to the ANSI standard.
REAL | FLOAT64 |
CHAR/CHARACTER | STRING | Use BigQuery's parameterized STRING data type if you need to enforce a maximum character length.
VARCHAR | STRING | Use BigQuery's parameterized STRING data type if you need to enforce a maximum character length.
CLOB | STRING |
JSON | JSON |
BLOB | BYTES |
BYTE | BYTES |
VARBYTE | BYTES |
DATE | DATE | BigQuery does not support custom formatting similar to what Teradata supports with DataForm in SDF.
TIME | TIME |
TIME WITH TIME ZONE | TIME | Teradata stores the TIME data type in UTC and allows you to pass an offset from UTC using the WITH TIME ZONE syntax. The TIME data type in BigQuery represents a time that's independent of any date or time zone.
TIMESTAMP | TIMESTAMP | Both Teradata and BigQuery TIMESTAMP data types have microsecond precision, but Teradata supports leap seconds while BigQuery does not. Both are usually associated with a UTC time zone.
TIMESTAMP WITH TIME ZONE | TIMESTAMP | The Teradata TIMESTAMP can be set to a different time zone system-wide, per user, or per column (using WITH TIME ZONE). The BigQuery TIMESTAMP type assumes UTC if you don't explicitly specify a time zone. Make sure you either export time zone information correctly (do not concatenate a DATE and TIME value without time zone information) so that BigQuery can convert it on import, or convert time zone information to UTC before exporting. BigQuery has DATETIME for an abstraction between civil time, which does not show a time zone when it is output, and TIMESTAMP, which is a precise point in time that always shows the UTC time zone.
ARRAY | ARRAY |
MULTI-DIMENSIONAL ARRAY | ARRAY | In BigQuery, use an array of structs, with each struct containing a field of type ARRAY (for details, see the BigQuery documentation).
INTERVAL HOUR | INT64 |
INTERVAL MINUTE | INT64 |
INTERVAL SECOND | INT64 |
INTERVAL DAY | INT64 |
INTERVAL MONTH | INT64 |
INTERVAL YEAR | INT64 |
PERIOD(DATE) | DATE, DATE | PERIOD(DATE) should be converted to two DATE columns containing the start date and end date so that they can be used with window functions.
PERIOD(TIMESTAMP WITH TIME ZONE) | TIMESTAMP, TIMESTAMP |
PERIOD(TIMESTAMP) | TIMESTAMP, TIMESTAMP |
PERIOD(TIME) | TIME, TIME |
PERIOD(TIME WITH TIME ZONE) | TIME, TIME |
UDT | STRING |
XML | STRING |
TD_ANYTYPE | STRING |
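If you need to carry length or precision constraints over, a declaration along the lines of the following sketch can do so; the table and column names are illustrative:

```sql
-- Sketch: parameterized types in a BigQuery table definition.
CREATE TABLE dataset.products (
  name STRING(100),      -- like VARCHAR(100): enforces a maximum character length
  price NUMERIC(10, 2)   -- like DECIMAL(10,2): enforces precision and scale
);
```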
For more information on type casting, see the next section.
Teradata type formatting
Teradata SQL uses a set of default formats for displaying expressions and column data, and for conversions between data types. For example, a DATE data type in INTEGERDATE mode is formatted as YY/MM/DD by default. We suggest that you use ANSIDATE mode whenever possible to ensure ANSI SQL compliance, and use this chance to clean up legacy formats.
Teradata allows automatic application of custom formats using the FORMAT clause, without changing the underlying storage, either as a data type attribute when you create a table using DDL, or in a derived expression. For example, a FORMAT specification 9.99 rounds any FLOAT value to two digits. In BigQuery, this functionality has to be converted by using the ROUND() function.
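As a sketch of that conversion, with an illustrative column name:

```sql
-- Teradata: SELECT price (FORMAT '9.99') FROM products;
-- BigQuery: apply the rounding explicitly.
SELECT ROUND(price, 2) AS price
FROM dataset.products;
```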
This functionality requires handling of intricate edge cases. For instance, when the FORMAT clause is applied to a NUMERIC column, you must take into account special rounding and formatting rules. A FORMAT clause can be used to implicitly cast an INTEGER epoch value to a DATE format. Or a FORMAT specification X(6) on a VARCHAR column truncates the column value, so you have to convert it to a SUBSTR() function. This behavior is not ANSI SQL compliant. Therefore, we suggest not migrating column formats to BigQuery.
If column formats are absolutely required, use views or user-defined functions (UDFs).
For information about the default formats that Teradata SQL uses for each data type, see the Teradata default formatting documentation.
Timestamp and date type formatting
The following table summarizes the differences in timestamp and date formatting elements between Teradata SQL and GoogleSQL.
Teradata format | Teradata description | BigQuery
---|---|---
CURRENT_TIMESTAMP | TIME and TIMESTAMP information in Teradata can have different time zone information, which is defined using WITH TIME ZONE. | If possible, use CURRENT_TIMESTAMP(), which is formatted in ISO format. However, the output format always shows the UTC time zone. (Internally, BigQuery does not have a time zone.) Note the following details on differences in the ISO format: DATETIME is formatted based on output channel conventions. In the BigQuery command-line tool and BigQuery console, it's formatted using a T separator according to RFC 3339. However, in Python and Java JDBC, a space is used as a separator. If you want to use an explicit format, use FORMAT_DATETIME(), which makes an explicit cast to a string. For example, the following expression always returns a space separator: CAST(CURRENT_DATETIME() AS STRING). Teradata supports a DEFAULT keyword in TIME columns to set the current time (timestamp); this is not used in BigQuery.
CURRENT_DATE | Dates are stored in Teradata as INT64 values using the following formula: (YEAR - 1900) * 10000 + (MONTH * 100) + DAY. Dates can be formatted as integers. | BigQuery has a separate DATE format that always returns a date in ISO 8601 format. DATE_FROM_UNIX_DATE can't be used, because it is 1970-based. Teradata supports a DEFAULT keyword in DATE columns to set the current date; this is not used in BigQuery.
CURRENT_DATE-3 | Date values are represented as integers. Teradata supports arithmetic operators for date types. | For date types, use DATE_ADD() or DATE_SUB(). BigQuery uses arithmetic operators for these data types: INT64, NUMERIC, and FLOAT64.
SYS_CALENDAR.CALENDAR | Teradata provides a view for calendar operations to go beyond integer operations. | Not used in BigQuery.
SET SESSION DATEFORM=ANSIDATE | Set the session or system date format to ANSI (ISO 8601). | BigQuery always uses ISO 8601, so make sure you convert Teradata dates and times.
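As a sketch, an explicit format string with FORMAT_DATETIME() avoids relying on channel-specific separators:

```sql
-- Returns a string such as '2024-01-31 13:45:00' regardless of the client.
SELECT FORMAT_DATETIME('%Y-%m-%d %H:%M:%S', CURRENT_DATETIME());
```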
Query syntax
This section addresses differences in query syntax between Teradata and BigQuery.
SELECT statement
Most Teradata SELECT statements are compatible with BigQuery. The following table contains a list of minor differences.
Teradata | BigQuery
---|---
SEL | Convert to SELECT. BigQuery does not use the SEL abbreviation.
SELECT ... | In BigQuery, columns cannot reference the output of other columns defined within the same select list. Prefer moving a subquery into a WITH clause: WITH flags AS ( ...
SELECT * FROM table ... | BigQuery does not use the ANY logical predicate. The same functionality can be achieved using multiple OR operators: SELECT * FROM table ... In this case, string comparison also differs. See Comparison operators.
SELECT TOP 10 * FROM table | BigQuery uses LIMIT at the end of a query instead of TOP n following the SELECT keyword.
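The WITH pattern from the table above, spelled out as a hedged, self-contained sketch (table and column names are assumptions):

```sql
-- A computed column moved into a WITH clause so that later
-- select-list items can reference it.
WITH flags AS (
  SELECT col1, col2, col1 = col2 AS this_flag
  FROM dataset.table
)
SELECT col1, col2, this_flag
FROM flags
WHERE this_flag;
```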
Comparison operators
The following table shows Teradata comparison operators that are specific to Teradata and must be converted to the ANSI SQL:2011 compliant operators used in BigQuery.
For information about operators in BigQuery, see the Operators section of the BigQuery documentation.
Teradata | BigQuery | Notes
---|---|---
exp EQ exp2, exp IN (exp2, exp3) | exp = exp2, exp IN (exp2, exp3). To keep non-ANSI semantics for NOT CASESPECIFIC, you can use RTRIM(UPPER(exp)) = RTRIM(UPPER(exp2)) | When comparing strings for equality, Teradata might ignore trailing whitespace, while BigQuery considers it part of the string. For example, 'xyz' = 'xyz ' is TRUE in Teradata but FALSE in BigQuery. Teradata also provides a NOT CASESPECIFIC column attribute that instructs Teradata to ignore case when comparing two strings. BigQuery is always case specific when comparing strings. For example, 'xYz' = 'xyz' is TRUE in Teradata but FALSE in BigQuery.
exp LE exp2 | exp <= exp2 |
exp LT exp2 | exp < exp2 |
exp NE exp2 | exp <> exp2 |
exp GE exp2 | exp >= exp2 |
exp GT exp2 | exp > exp2 |
JOIN conditions
BigQuery and Teradata support the same JOIN, ON, and USING conditions. The following table contains a list of minor differences.
Teradata | BigQuery | Notes
---|---|---
FROM A LEFT OUTER JOIN B ON A.date > B.start_date AND A.date < B.end_date | FROM A LEFT OUTER JOIN (SELECT d FROM B JOIN UNNEST(GENERATE_DATE_ARRAY(B.start_date, B.end_date)) d) B ON A.date = B.date | BigQuery supports inequality JOIN clauses for all inner joins, and for outer joins when at least one equality condition (=) is given, but not an OUTER JOIN that has only inequality conditions. Such constructs are sometimes used to query date or integer ranges. BigQuery prevents users from inadvertently creating large cross joins.
FROM A, B ON A.id = B.id | FROM A JOIN B ON A.id = B.id | Using a comma between tables in Teradata is equal to an INNER JOIN, while in BigQuery it equals a CROSS JOIN (Cartesian product). Because the comma in BigQuery legacy SQL is treated as UNION, we recommend making the operation explicit to avoid confusion.
FROM A JOIN B ON (COALESCE(A.id, 0) = COALESCE(B.id, 0)) | FROM A JOIN B ON (COALESCE(A.id, 0) = COALESCE(B.id, 0)) | No difference for scalar (constant) functions.
FROM A JOIN B ON A.id = (SELECT MAX(B.id) FROM B) | FROM A JOIN (SELECT MAX(B.id) FROM B) B1 ON A.id = B1.id | BigQuery prevents users from using subqueries, correlated subqueries, or aggregations in join predicates. This lets BigQuery parallelize queries.
Type conversion and casting
BigQuery has fewer but wider data types than Teradata, which requires BigQuery to be stricter in casting.
Teradata | BigQuery | Notes
---|---|---
CAST(long_varchar_column AS CHAR(6)) | LPAD(long_varchar_column, 6) | Casting a character column in Teradata is sometimes used as a non-standard and non-optimal way to create a padded substring.
CAST(92617 AS TIME), 92617 (FORMAT '99:99:99') | PARSE_TIME("%k%M%S", CAST(92617 AS STRING)) | Teradata performs many more implicit type conversions and rounding than BigQuery, which is generally stricter and enforces ANSI standards. (This example returns 09:26:17.)
CAST(48.5 (FORMAT 'zz') AS FLOAT) | CAST(SUBSTR(CAST(48.5 AS STRING), 0, 2) AS FLOAT64) | Floating point and numeric data types can require special rounding rules when applied with formats such as currencies. (This example returns 48.)
See also Comparison operators and column formats. Both comparisons and column formatting can behave like type casts.
QUALIFY, ROWS clauses
The QUALIFY clause in Teradata allows you to filter results for window functions. Alternatively, a ROWS phrase can be used for the same task. Both work similarly to a HAVING condition for a GROUP clause, limiting the output of what BigQuery calls window functions.
Teradata | BigQuery
---|---
SELECT col1, col2 ... | The Teradata QUALIFY clause with a window function like ROW_NUMBER(), SUM(), or COUNT() and with OVER PARTITION BY is expressed in BigQuery as a WHERE clause on a subquery that contains an analytics value. Using ROW_NUMBER(): SELECT col1, col2 ... Using ARRAY_AGG, which supports larger partitions: SELECT ...
SELECT col1, col2 ... | SELECT col1, col2 ... In BigQuery, both RANGE and ROWS can be used in the window frame clause. However, window clauses can only be used with window functions like AVG(), not with numbering functions like ROW_NUMBER().
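A hedged sketch of the QUALIFY-to-subquery conversion; the table and column names are assumptions:

```sql
-- Teradata:
--   SELECT col1, col2 FROM table
--   QUALIFY ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) = 1;
-- BigQuery equivalent using a WHERE clause on a subquery:
SELECT col1, col2
FROM (
  SELECT
    col1,
    col2,
    ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS rn
  FROM dataset.table
)
WHERE rn = 1;
```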
NORMALIZE keyword
Teradata provides the NORMALIZE keyword for SELECT clauses to coalesce overlapping periods or intervals into a single period or interval that encompasses all individual period values.
BigQuery does not support the PERIOD type, so any PERIOD type column in Teradata has to be inserted into BigQuery as two separate DATE or DATETIME fields that correspond to the start and end of the period.
Teradata | BigQuery
---|---
SELECT NORMALIZE ... | SELECT ...
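A full NORMALIZE rewrite depends on your schema, but the following hedged sketch shows the general gaps-and-islands pattern over start/end date columns; all table and column names are assumptions:

```sql
-- Merge overlapping [start_date, end_date] periods per emp_id.
SELECT
  emp_id,
  MIN(start_date) AS start_date,
  MAX(end_date) AS end_date
FROM (
  SELECT
    emp_id,
    start_date,
    end_date,
    -- Running count of group starts assigns an island number to each row.
    COUNTIF(is_new_group) OVER (PARTITION BY emp_id ORDER BY start_date) AS grp
  FROM (
    SELECT
      emp_id,
      start_date,
      end_date,
      -- A new group starts when this period begins after every earlier period ends.
      COALESCE(
        start_date > MAX(end_date) OVER (
          PARTITION BY emp_id
          ORDER BY start_date
          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
        TRUE) AS is_new_group
    FROM dataset.employment_periods
  )
)
GROUP BY emp_id, grp;
```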
Functions
The following sections list mappings between Teradata functions and BigQuery equivalents.
Aggregate functions
The following table maps common Teradata aggregate, statistical aggregate, and approximate aggregate functions to their BigQuery equivalents. BigQuery offers the following additional aggregate functions:
ANY_VALUE
APPROX_COUNT_DISTINCT
APPROX_QUANTILES
APPROX_TOP_COUNT
APPROX_TOP_SUM
COUNTIF
LOGICAL_AND
LOGICAL_OR
STRING_AGG
Teradata | BigQuery
---|---
AVG | AVG
BITAND | BIT_AND
BITNOT | Bitwise not operator (~)
BITOR | BIT_OR
BITXOR | BIT_XOR
CORR | CORR
COUNT | COUNT
COVAR_POP | COVAR_POP
COVAR_SAMP | COVAR_SAMP
MAX | MAX
MIN | MIN
REGR_AVGX | AVG(ind_var_expression)
REGR_AVGY | AVG(dep_var_expression)
REGR_COUNT | SUM( ...
REGR_INTERCEPT | AVG(dep_var_expression) - AVG(ind_var_expression) * (COVAR_SAMP(ind_var_expression, ...
REGR_R2 | (COUNT(dep_var_expression) * ...
REGR_SLOPE | - COVAR_SAMP(ind_var_expression, ...
REGR_SXX | SUM(POWER(ind_var_expression, 2)) - COUNT(ind_var_expression) * ...
REGR_SXY | SUM(ind_var_expression * dep_var_expression) - COUNT(ind_var_expression) ...
REGR_SYY | SUM(POWER(dep_var_expression, 2)) - COUNT(dep_var_expression) ...
SKEW | Custom user-defined function.
STDDEV_POP | STDDEV_POP
STDDEV_SAMP | STDDEV_SAMP, STDDEV
SUM | SUM
VAR_POP | VAR_POP
VAR_SAMP | VAR_SAMP, VARIANCE
Analytical functions and window functions
The following table maps common Teradata analytic and aggregate analytic functions to their BigQuery window function equivalents. BigQuery also offers additional window functions.
Date/time functions
The following table maps common Teradata date/time functions to their BigQuery equivalents. BigQuery offers the following additional date/time functions:
CURRENT_DATETIME
DATE_ADD
DATE_DIFF
DATE_FROM_UNIX_DATE
DATE_SUB
DATE_TRUNC
DATETIME
DATETIME_ADD
DATETIME_DIFF
DATETIME_SUB
DATETIME_TRUNC
PARSE_DATE
PARSE_DATETIME
PARSE_TIME
PARSE_TIMESTAMP
STRING
TIME
TIME_ADD
TIME_DIFF
TIME_SUB
TIME_TRUNC
TIMESTAMP
TIMESTAMP_ADD
TIMESTAMP_DIFF
TIMESTAMP_MICROS
TIMESTAMP_MILLIS
TIMESTAMP_SECONDS
TIMESTAMP_SUB
TIMESTAMP_TRUNC
UNIX_DATE
UNIX_MICROS
UNIX_MILLIS
UNIX_SECONDS
Teradata | BigQuery
---|---
ADD_MONTHS | DATE_ADD, TIMESTAMP_ADD
CURRENT_DATE | CURRENT_DATE
CURRENT_TIME | CURRENT_TIME
CURRENT_TIMESTAMP | CURRENT_TIMESTAMP
DATE + k | DATE_ADD(date_expression, INTERVAL k DAY)
DATE - k | DATE_SUB(date_expression, INTERVAL k DAY)
EXTRACT | EXTRACT(DATE), EXTRACT(TIMESTAMP)
 | FORMAT_DATE
 | FORMAT_DATETIME
 | FORMAT_TIME
 | FORMAT_TIMESTAMP
LAST_DAY | LAST_DAY. Note: This function supports both DATE and DATETIME input expressions.
MONTHS_BETWEEN | DATE_DIFF(date_expression, date_expression, MONTH)
NEXT_DAY | DATE_ADD( ...
OADD_MONTHS | DATE_SUB( ...
td_day_of_month | EXTRACT(DAY FROM date_expression)
td_day_of_week | EXTRACT(DAYOFWEEK FROM date_expression)
td_day_of_year | EXTRACT(DAYOFYEAR FROM date_expression)
td_friday | DATE_TRUNC( ...
td_monday | DATE_TRUNC( ...
td_month_begin | DATE_TRUNC(date_expression, MONTH)
td_month_end | DATE_SUB( ...
td_month_of_calendar | (EXTRACT(YEAR FROM date_expression) - 1900) * 12 + EXTRACT(MONTH FROM date_expression)
td_month_of_quarter | EXTRACT(MONTH FROM date_expression)
td_month_of_year | EXTRACT(MONTH FROM date_expression)
td_quarter_begin | DATE_TRUNC(date_expression, QUARTER)
td_quarter_end | DATE_SUB( ...
td_quarter_of_calendar | (EXTRACT(YEAR FROM date_expression) ...
td_quarter_of_year | EXTRACT(QUARTER FROM date_expression)
td_saturday | DATE_TRUNC( ...
td_sunday | DATE_TRUNC( ...
td_thursday | DATE_TRUNC( ...
td_tuesday | DATE_TRUNC( ...
td_wednesday | DATE_TRUNC( ...
td_week_begin | DATE_TRUNC(date_expression, WEEK)
td_week_end | DATE_SUB( ...
td_week_of_calendar | (EXTRACT(YEAR FROM date_expression) - 1900) * 52 + EXTRACT(WEEK FROM date_expression)
td_week_of_month | EXTRACT(WEEK FROM date_expression)
td_week_of_year | EXTRACT(WEEK FROM date_expression)
td_weekday_of_month | CAST( ...
td_year_begin | DATE_TRUNC(date_expression, YEAR)
td_year_end | DATE_SUB( ...
td_year_of_calendar | EXTRACT(YEAR FROM date_expression)
TO_DATE | PARSE_DATE
TO_TIMESTAMP | PARSE_TIMESTAMP
TO_TIMESTAMP_TZ | PARSE_TIMESTAMP
String functions
The following table maps Teradata string functions to their BigQuery equivalents. BigQuery offers the following additional string functions:
BYTE_LENGTH
CODE_POINTS_TO_BYTES
ENDS_WITH
FROM_BASE32
FROM_BASE64
FROM_HEX
NORMALIZE
NORMALIZE_AND_CASEFOLD
REGEXP_CONTAINS
REGEXP_EXTRACT
REGEXP_EXTRACT_ALL
REPEAT
REPLACE
SAFE_CONVERT_BYTES_TO_STRING
SPLIT
STARTS_WITH
STRPOS
TO_BASE32
TO_BASE64
TO_CODE_POINTS
TO_HEX
Teradata | BigQuery
---|---
ASCII | TO_CODE_POINTS(string_expression)[OFFSET(0)]
CHAR2HEXINT | TO_HEX
CHARACTER LENGTH | CHAR_LENGTH
CHARACTER LENGTH | CHARACTER_LENGTH
CHR | CODE_POINTS_TO_STRING( ...
CONCAT, (\|\| operator) | CONCAT, (\|\| operator)
CSV | Custom user-defined function.
CSVLD | Custom user-defined function.
FORMAT | FORMAT
INDEX | STRPOS(string, substring)
INITCAP | INITCAP
INSTR | Custom user-defined function.
LEFT | SUBSTR(source_string, 1, length)
LENGTH | LENGTH
LOWER | LOWER
LPAD | LPAD
LTRIM | LTRIM
NGRAM | Custom user-defined function.
NVP | Custom user-defined function.
OREPLACE | REPLACE
OTRANSLATE | Custom user-defined function.
POSITION | STRPOS(string, substring)
REGEXP_INSTR | STRPOS(source_string, ... Note: Returns first occurrence.
REGEXP_REPLACE | REGEXP_REPLACE
REGEXP_SIMILAR | IF(REGEXP_CONTAINS, 1, 0)
REGEXP_SUBSTR | REGEXP_EXTRACT, ...
REGEXP_SPLIT_TO_TABLE | Custom user-defined function.
REVERSE | REVERSE
RIGHT | SUBSTR(source_string, -1, length)
RPAD | RPAD
RTRIM | RTRIM
STRTOK. Note: Each character in the delimiter string argument is considered a separate delimiter character. The default delimiter is a space character. | SPLIT(instring, delimiter)[ORDINAL(tokennum)]. Note: The entire delimiter string argument is used as a single delimiter. The default delimiter is a comma.
STRTOK_SPLIT_TO_TABLE | Custom user-defined function.
SUBSTRING, SUBSTR | SUBSTR
TRIM | TRIM
UPPER | UPPER
Math functions
The following table maps Teradata math functions to their BigQuery equivalents. BigQuery also offers additional math functions.
Teradata | BigQuery
---|---
ABS | ABS
ACOS | ACOS
ACOSH | ACOSH
ASIN | ASIN
ASINH | ASINH
ATAN | ATAN
ATAN2 | ATAN2
ATANH | ATANH
CEILING | CEIL
CEILING | CEILING
COS | COS
COSH | COSH
EXP | EXP
FLOOR | FLOOR
GREATEST | GREATEST
LEAST | LEAST
LN | LN
LOG | LOG
MOD (% operator) | MOD
NULLIFZERO | NULLIF(expression, 0)
POWER (** operator) | POWER, POW
RANDOM | RAND
ROUND | ROUND
SIGN | SIGN
SIN | SIN
SINH | SINH
SQRT | SQRT
TAN | TAN
TANH | TANH
TRUNC | TRUNC
ZEROIFNULL | IFNULL(expression, 0), COALESCE(expression, 0)
DML syntax
This section addresses differences in data management language syntax between Teradata and BigQuery.
INSERT statement
Most Teradata INSERT statements are compatible with BigQuery. The following table shows exceptions.
DML scripts in BigQuery have slightly different consistency semantics than the equivalent statements in Teradata. For an overview of snapshot isolation and session and transaction handling, see the CREATE INDEX section elsewhere in this document.
Teradata | BigQuery
---|---
INSERT INTO table VALUES (...); | INSERT INTO table (...) VALUES (...); Teradata offers a DEFAULT keyword for non-nullable columns. Note: In BigQuery, omitting column names in the INSERT statement only works if values for all columns in the target table are included in ascending order based on their ordinal positions.
INSERT INTO table VALUES (1,2,3); | INSERT INTO table VALUES (1,2,3), ... Teradata has a concept of multi-statement request (MSR), which sends multiple INSERT statements at a time. In BigQuery, this is not recommended due to the implicit transaction boundary between statements. Use multi-value INSERT instead. BigQuery allows concurrent INSERT statements but might queue UPDATE. To improve performance, group multiple DML mutations instead of issuing single-row statements.
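A sketch of the multi-value INSERT form; the table and values are illustrative:

```sql
-- One statement inserts several rows, avoiding per-statement overhead.
INSERT INTO dataset.inventory (product, quantity)
VALUES ('top load washer', 10),
       ('front load washer', 20),
       ('dryer', 30);
```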
UPDATE statement
Most Teradata UPDATE statements are compatible with BigQuery, except for the following items:

- When you use a FROM clause, the ordering of the FROM and SET clauses is reversed in Teradata and BigQuery (see the sketch after the following table).
- In GoogleSQL, each UPDATE statement must include the WHERE keyword, followed by a condition. To update all rows in the table, use WHERE true.

As a best practice, you should group multiple DML mutations instead of issuing single UPDATE and INSERT statements. DML scripts in BigQuery have slightly different consistency semantics than equivalent statements in Teradata. For an overview of snapshot isolation and session and transaction handling, see the CREATE INDEX section elsewhere in this document.
The following table shows Teradata UPDATE
statements and
BigQuery statements that accomplish the same tasks.
For more information about UPDATE
in BigQuery, see the
BigQuery UPDATE
examples
in the DML documentation.
Teradata | BigQuery
---|---
UPDATE table_A ... | UPDATE table_A ...
UPDATE table alias ... | UPDATE table ...
UPDATE table_A ... | UPDATE table_A ...
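A sketch of the reversed clause order, with illustrative table names; BigQuery places SET before FROM:

```sql
-- Teradata (FROM before SET, illustrative):
--   UPDATE i FROM inventory i, new_arrivals n
--   SET quantity = i.quantity + n.quantity
--   WHERE i.product = n.product;
-- BigQuery (SET before FROM):
UPDATE dataset.inventory i
SET quantity = i.quantity + n.quantity
FROM dataset.new_arrivals n
WHERE i.product = n.product;
```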
DELETE and TRUNCATE statements
Both the DELETE and TRUNCATE statements are ways to remove rows from a table without affecting the table schema or indexes. TRUNCATE is not used in either Teradata or BigQuery. However, you can use DELETE statements to achieve the same effect.
In BigQuery, the DELETE statement must have a WHERE clause. To delete all rows in the table (truncate), use WHERE true. To speed up truncate operations for very large tables, we recommend using the CREATE OR REPLACE TABLE ... AS SELECT statement with a LIMIT 0 on the same table to replace itself. However, make sure to manually add partitioning and clustering information when using it.
Teradata vacuums deleted rows later. This means that DELETE operations are initially faster than in BigQuery, but they require resources later, especially large-scale DELETE operations that impact the majority of a table. To use a similar approach in BigQuery, we suggest reducing the number of DELETE operations, such as by copying the rows not to be deleted into a new table. Alternatively, you can remove entire partitions (see the sketch after the following table). Both of these options are designed to be faster operations than atomic DML mutations.
For more information about DELETE in BigQuery, see the DELETE examples in the DML documentation.
Teradata | BigQuery
---|---
BEGIN TRANSACTION; ... | Replacing the contents of a table with query output is the equivalent of a transaction. You can do this with either a query operation or a copy operation. Using a query operation: bq query --replace --destination_table table_A 'SELECT * FROM table_B'; Using a copy operation: bq cp -f table_A table_B
DELETE database.table ALL; | DELETE FROM table WHERE TRUE; Or, for very large tables, a faster way: CREATE OR REPLACE TABLE table AS SELECT * FROM table LIMIT 0;
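For example, if a table is partitioned on a DATE column, removing one day's worth of rows prunes to a single partition; a hedged sketch, with all names assumed:

```sql
-- Deletes only the 2024-01-01 partition of a table partitioned on
-- the DATE column `d`, instead of scanning the whole table.
DELETE FROM dataset.partitioned_table
WHERE d = '2024-01-01';
```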
MERGE statement
The MERGE statement can combine INSERT, UPDATE, and DELETE operations into a single "upsert" statement and perform the operations atomically. The MERGE operation must match at most one source row for each target row. BigQuery and Teradata both follow ANSI syntax.
Teradata's MERGE operation is limited to matching primary keys within one access module processor (AMP). In contrast, BigQuery has no size or column limitation for MERGE operations, so using MERGE is a useful optimization. However, if the MERGE is primarily a large delete, see optimizations for DELETE elsewhere in this document.
DML scripts in BigQuery have slightly different consistency semantics than equivalent statements in Teradata. For example, Teradata's SET tables in session mode might ignore duplicates during a MERGE operation. For an overview of handling MULTISET and SET tables, snapshot isolation, and session and transaction handling, see the CREATE INDEX section elsewhere in this document.
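A sketch of an atomic upsert with MERGE; the table and column names are illustrative:

```sql
MERGE dataset.inventory t
USING dataset.new_arrivals s
ON t.product = s.product
WHEN MATCHED THEN
  -- Existing rows are updated in place.
  UPDATE SET quantity = t.quantity + s.quantity
WHEN NOT MATCHED THEN
  -- New rows are inserted, all within one atomic statement.
  INSERT (product, quantity) VALUES (s.product, s.quantity);
```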
Rows-affected variables
In Teradata, the ACTIVITY_COUNT variable is a Teradata ANSI SQL extension populated with the number of rows affected by a DML statement. The @@row_count system variable in the Scripting feature has similar functionality. In BigQuery, it is more common to check the numDmlAffectedRows return value in the audit logs or the INFORMATION_SCHEMA views.
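Within Scripting, a sketch with an illustrative table name:

```sql
-- @@row_count holds the number of rows modified by the most recent
-- DML statement in the script.
UPDATE dataset.inventory SET quantity = 0 WHERE quantity IS NULL;
SELECT @@row_count AS rows_updated;
```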
DDL syntax
This section addresses differences in data definition language syntax between Teradata and BigQuery.
CREATE TABLE statement
Most Teradata CREATE TABLE statements are compatible with BigQuery, except for the following syntax elements, which are not used in BigQuery:

- MULTISET. See the CREATE INDEX section.
- VOLATILE. See the Temporary tables section.
- [NO] FALLBACK. See the Rollback section.
- [NO] BEFORE JOURNAL, [NO] AFTER JOURNAL
- CHECKSUM = DEFAULT | val
- DEFAULT MERGEBLOCKRATIO
- PRIMARY INDEX (col, ...). See the CREATE INDEX section.
- UNIQUE PRIMARY INDEX. See the CREATE INDEX section.
- CONSTRAINT
- DEFAULT
- IDENTITY

For more information about CREATE TABLE in BigQuery, see the BigQuery CREATE examples in the DDL documentation.
Column options and attributes
The following column specifications for the CREATE TABLE statement are not used in BigQuery:

- FORMAT 'format'. See the Teradata type formatting section.
- CHARACTER SET name. BigQuery always uses UTF-8 encoding.
- [NOT] CASESPECIFIC
- COMPRESS val | (val, ...)

Teradata extends the ANSI standard with a column TITLE option. This feature can be similarly implemented in BigQuery using the column description, as shown in the following table. Note that this option is not available for views.
Teradata | BigQuery
---|---
CREATE TABLE table ( ... | CREATE TABLE dataset.table ( ...
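For example, a sketch of setting a column description in DDL, with illustrative names:

```sql
-- The description option stands in for Teradata's column TITLE.
CREATE TABLE dataset.table (
  col1 INT64 OPTIONS(description = 'Our first column')
);
```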
Temporary tables
Teradata supports volatile tables, which are often used to store intermediate results in scripts. There are several ways to achieve something similar to volatile tables in BigQuery:

- CREATE TEMPORARY TABLE can be used in Scripting and is valid during the lifetime of the script (see the sketch after this list). If the table has to exist beyond a script, you can use the other options in this list.
- Dataset TTL: Create a dataset that has a short time to live (for example, 1 hour) so that any tables created in the dataset are effectively temporary, since they won't persist longer than the dataset's time to live. You can prefix all the table names in this dataset with temp to clearly denote that the tables are temporary.
- Table TTL: Create a table that has a table-specific short time to live using DDL statements similar to the following: CREATE TABLE temp.name (col1, col2, ...) OPTIONS(expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR));
- WITH clause: If a temporary table is needed only within the same block, use a temporary result using a WITH statement or subquery. This is the most efficient option.
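A sketch of the script-scoped option, with illustrative names:

```sql
-- Exists only for the duration of the script, then is dropped automatically.
CREATE TEMP TABLE tmp_results AS
SELECT col1, COUNT(*) AS cnt
FROM dataset.table
GROUP BY col1;

SELECT * FROM tmp_results;
```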
An often-used pattern in Teradata scripts (BTEQ) is to create a permanent table, insert a value into it, use it like a temporary table in ongoing statements, and then delete or truncate the table afterwards. In effect, this uses the table as a constant variable (a semaphore). This approach is not efficient in BigQuery, and we recommend using real variables in Scripting instead, or using CREATE OR REPLACE with AS SELECT query syntax to create a table that already has values in it.
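A sketch of the variable-based replacement, with illustrative names:

```sql
-- Instead of a one-row "constant" table, declare a script variable.
DECLARE threshold INT64 DEFAULT 10;

SELECT *
FROM dataset.inventory
WHERE quantity > threshold;
```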
CREATE VIEW statement
The following table shows equivalents between Teradata and BigQuery for the CREATE VIEW statement. The clauses for table locking such as LOCKING ROW FOR ACCESS are not needed in BigQuery.
Teradata | BigQuery | Notes
---|---|---
CREATE VIEW view_name AS SELECT ... | CREATE VIEW view_name AS SELECT ... |
REPLACE VIEW view_name AS SELECT ... | CREATE OR REPLACE VIEW |
Not supported | CREATE VIEW IF NOT EXISTS | Creates a new view only if the view does not currently exist in the specified dataset.
CREATE [UNIQUE] INDEX statement
Teradata requires indices for all tables and requires special workarounds like MULTISET tables and NoPI Tables to work with non-unique or non-indexed data.
BigQuery does not require indices. This section describes approaches in BigQuery for how to create functionality similar to how indexes are used in Teradata where there is an actual business logic need.
Indexing for performance
Because it's a column-oriented database with query and storage optimization, BigQuery doesn't need explicit indexes. BigQuery provides functionality such as partitioning and clustering as well as nested fields, which can increase query efficiency and performance by optimizing how data is stored.
Teradata does not support materialized views. However, it offers
join indexes
using the CREATE JOIN INDEX
statement, which essentially materializes data
that's needed for a join. BigQuery does not need materialized
indexes to speed up performance, just as it doesn't need dedicated spool
space for joins.
For other optimization cases, materialized views can be used.
Indexing for consistency (UNIQUE, PRIMARY INDEX)
In Teradata, a unique index can be used to prevent rows with non-unique keys in a table. If a process tries to insert or update data that has a value that's already in the index, the operation either fails with an index violation (MULTISET tables) or silently ignores it (SET tables).
Because BigQuery doesn't provide explicit indexes, a MERGE
statement can be used instead to insert only unique records into a target table
from a staging table while discarding duplicate records. However, there is no
way to prevent a user with edit permissions from inserting a duplicate record,
because BigQuery never locks during INSERT
operations.
To generate an error for duplicate records in BigQuery, you can
use a MERGE
statement from a staging table, as shown in the following
example.
Teradata | BigQuery
---|---
CREATE [UNIQUE] INDEX name; | MERGE `prototype.FIN_MERGE` t ...
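A hedged sketch of raising an error on duplicates during a MERGE; the table names are assumptions, and the ERROR() function aborts the whole statement:

```sql
MERGE dataset.target t
USING dataset.staging s
ON t.id = s.id
WHEN MATCHED THEN
  -- A match means the key already exists; fail the entire statement.
  UPDATE SET id = ERROR(FORMAT('duplicate id %t', s.id))
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value);
```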
More often, users prefer to
remove duplicates independently
in order to find errors in downstream systems.
BigQuery does not support DEFAULT
and IDENTITY
(sequences)
columns.
Indexing to achieve locking
Teradata provides resources in the
access module processor
(AMP); queries can consume all-AMP, single-AMP, or group-AMP resources. DDL
statements are all-AMP and therefore similar to a global DDL lock.
BigQuery doesn't have a lock mechanism like this and can run
concurrent queries and INSERT
statements up to your quota;
only concurrent UPDATE
DML statements have certain
concurrency implications:
UPDATE
operations against the same partition are queued to ensure snapshot
isolation, so you don't have to lock to prevent phantom reads or lost updates.
Because of these differences, the following Teradata elements are not used in BigQuery:
ON COMMIT DELETE ROWS;
ON COMMIT PRESERVE ROWS;
Procedural SQL statements
This section describes how to convert procedural SQL statements that are used
in stored procedures, functions, and triggers from Teradata
to BigQuery Scripting, procedures, or user-defined functions (UDFs).
All of these are available for system administrators to check using the
INFORMATION_SCHEMA
views.
CREATE PROCEDURE statement
Stored procedures are supported as part of BigQuery Scripting.
In BigQuery, Scripting refers to any use of control statements, whereas procedures are named scripts (with arguments if needed) that can be called from other Scripts and stored permanently, if needed. A user-defined function (UDF) can also be written in JavaScript.
Teradata | BigQuery
---|---
CREATE PROCEDURE | CREATE PROCEDURE if a name is required; otherwise, use inline with BEGIN or in a single line with CREATE TEMP FUNCTION.
REPLACE PROCEDURE | CREATE OR REPLACE PROCEDURE
CALL | CALL
The sections that follow describe ways to convert existing Teradata procedural statements to BigQuery Scripting statements that have similar functionality.
Variable declaration and assignment
BigQuery variables are valid during the lifetime of the script.
Teradata | BigQuery
---|---
DECLARE | DECLARE
SET | SET
Error condition handlers
Teradata uses handlers on status codes in procedures for error control. In
BigQuery, error handling is a core feature of the main control
flow, similar to what other languages provide with TRY ... CATCH
blocks.
Teradata | BigQuery
---|---
DECLARE EXIT HANDLER FOR SQLEXCEPTION | BEGIN ... EXCEPTION WHEN ERROR THEN
SIGNAL sqlstate | RAISE message
DECLARE CONTINUE HANDLER FOR SQLSTATE VALUE 23505; | Exception handlers that trigger for certain error conditions are not used by BigQuery. We recommend using ASSERT statements where exit conditions are used for pre-checks or debugging, because this is ANSI SQL:2011 compliant.
The SQLSTATE
variable in Teradata is similar to the @@error
system variable
in BigQuery. In BigQuery, it is more common to
investigate errors using audit logging
or the INFORMATION_SCHEMA
views.
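A sketch of the BEGIN ... EXCEPTION pattern in Scripting:

```sql
BEGIN
  -- Any statement that fails transfers control to the handler.
  SELECT 1 / 0;
EXCEPTION WHEN ERROR THEN
  -- @@error.message describes the error that was caught.
  SELECT @@error.message AS error_message;
END;
```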
Cursor declarations and operations
Because BigQuery doesn't support cursors or sessions, the following statements aren't used in BigQuery:
DECLARE cursor_name CURSOR [FOR | WITH] ...
PREPARE stmt_id FROM sql_str;
OPEN cursor_name [USING var, ...];
FETCH cursor_name INTO var, ...;
CLOSE cursor_name;
Dynamic SQL statements
The Scripting feature in BigQuery supports dynamic SQL statements like those shown in the following table.
Teradata | BigQuery
---|---
EXECUTE IMMEDIATE sql_str; | EXECUTE IMMEDIATE sql_str;
EXECUTE stmt_id [USING var,...]; | EXECUTE IMMEDIATE stmt_id USING var;
The following Dynamic SQL statements are not used in BigQuery:
PREPARE stmt_id FROM sql_str;
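In place of PREPARE and EXECUTE, a single EXECUTE IMMEDIATE with a USING clause covers the common case; a sketch with illustrative names:

```sql
DECLARE target_quantity INT64 DEFAULT 10;

-- Positional ? placeholders are bound by the USING clause.
EXECUTE IMMEDIATE
  'SELECT product FROM dataset.inventory WHERE quantity > ?'
  USING target_quantity;
```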
Flow-of-control statements
The Scripting feature in BigQuery supports flow-of-control statements like those shown in the following table.
Teradata | BigQuery
---|---
IF condition THEN stmts ELSE stmts END IF | IF condition THEN stmts ELSE stmts END IF
label_name: LOOP stmts END LOOP label_name; | GOTO-style block constructs are not used in BigQuery. We recommend rewriting them as user-defined functions (UDFs) or using ASSERT statements where they are used for error handling.
REPEAT stmts UNTIL condition END REPEAT; | WHILE condition DO stmts END WHILE
LEAVE outer_proc_label; | LEAVE is not used for GOTO-style blocks; it is used as a synonym for BREAK to leave a WHILE loop.
LEAVE label; | LEAVE is not used for GOTO-style blocks; it is used as a synonym for BREAK to leave a WHILE loop.
WITH RECURSIVE temp_table AS ( ... ); | Recursive queries (also known as recursive common table expressions (CTEs)) are not used in BigQuery. They can be rewritten using arrays of UNION ALL.
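For example, a sketch of the WHILE form in Scripting:

```sql
DECLARE i INT64 DEFAULT 0;

-- Loops until the condition becomes false.
WHILE i < 3 DO
  SET i = i + 1;
END WHILE;
```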
Flow-of-control statements that depend on cursors or sessions are not used in BigQuery, because BigQuery uses neither.
Metadata and transaction SQL statements
Teradata | BigQuery
---|---
HELP TABLE table_name; HELP VIEW view_name; | SELECT ... The same query is valid to get column information for views. For more information, see the Column view in the BigQuery INFORMATION_SCHEMA.
SELECT * FROM dbc.tables WHERE tablekind = 'T'; (Teradata DBC view) | SELECT ... For more information, see Introduction to BigQuery INFORMATION_SCHEMA.
HELP STATISTICS table_name; | APPROX_COUNT_DISTINCT(col)
COLLECT STATS USING SAMPLE ON table_name column (...); | Not used in BigQuery.
LOCKING TABLE table_name FOR EXCLUSIVE; | BigQuery always uses snapshot isolation. For details, see Consistency guarantees elsewhere in this document.
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL ... | BigQuery always uses snapshot isolation. For details, see Consistency guarantees elsewhere in this document.
BEGIN TRANSACTION; | BigQuery always uses snapshot isolation. For details, see Consistency guarantees elsewhere in this document.
EXPLAIN ... | Not used in BigQuery. Similar features are the query plan explanation in the BigQuery web UI and the slot allocation visible in the INFORMATION_SCHEMA views and in audit logging in Cloud Monitoring.
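A sketch of the column-listing query from the table above; the dataset and table names are illustrative:

```sql
-- Replaces Teradata's HELP TABLE for listing columns and types.
SELECT column_name, data_type
FROM dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'table_name';
```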
Multi-statement and multi-line SQL statements
Both Teradata and BigQuery support transactions (sessions) and therefore support statements separated by semicolons that are consistently executed together. For more information, see Multi-statement transactions.
Error codes and messages
Teradata error codes and BigQuery error codes are different. Because BigQuery provides a REST API, it relies primarily on HTTP status codes plus detailed error messages.
If your application logic is currently catching the following errors, try to eliminate the source of the error, because BigQuery will not return the same error codes.
- SQLSTATE = '02000': "Row not found"
- SQLSTATE = '21000': "Cardinality violation (Unique Index)"
- SQLSTATE = '22000': "Data violation (Data Type)"
- SQLSTATE = '23000': "Constraint Violation"
In BigQuery, it would be more common to use the INFORMATION_SCHEMA
views
or audit logging to drill down into errors.
For information about how to handle errors in Scripting, see the sections that follow.
Consistency guarantees and transaction isolation
Both Teradata and BigQuery are atomic—that is, ACID-compliant on
a per-mutation level across many rows. For example, a MERGE
operation is
completely atomic, even with multiple inserted and updated values.
Transactions
Teradata provides either Read Uncommitted (allowing dirty reads) or Serializable transaction isolation level when running in session mode (instead of auto-commit mode). In the best case, Teradata achieves strictly serializable isolation by using pessimistic locking against a row hash across all columns of rows across all partitions. Deadlocks are possible. DDL always forces a transaction boundary. Teradata Fastload jobs run independently, but only on empty tables.
BigQuery also
supports transactions.
BigQuery helps ensure
optimistic concurrency control
(first to commit wins) with
snapshot isolation,
in which a query reads the last committed data before the query starts. This
approach guarantees the same level of consistency on a per-row, per-mutation
basis and across rows within the same DML statement, yet avoids deadlocks. In
the case of multiple UPDATE
statements against the same table, BigQuery
switches to
pessimistic concurrency control
and queues
multiple UPDATE
statements, automatically retrying in case of conflicts. INSERT
DML statements and
load jobs can run concurrently and independently to append to tables.
Rollback
Teradata supports
two session rollback modes,
ANSI session mode and Teradata session mode (SET SESSION CHARACTERISTICS
and
SET SESSION TRANSACTION
), depending on which rollback mode you want. In
failure cases, the transaction might not be rolled back.
BigQuery supports the
ROLLBACK TRANSACTION
statement.
There is no ABORT
statement
in BigQuery.
Database limits
Always check the BigQuery public documentation for the latest quotas and limits. Many quotas for large-volume users can be raised by contacting the Cloud Support team. The following table shows a comparison of the Teradata and BigQuery database limits.
Limit | Teradata | BigQuery
---|---|---
Tables per database | Unrestricted | Unrestricted
Columns per table | 2,048 | 10,000
Maximum row size | 1 MB | 100 MB
Column and table name length | 128 Unicode characters | 16,384 Unicode characters
Rows per table | Unlimited | Unlimited
Maximum SQL request length | 1 MB | 1 MB (maximum unresolved GoogleSQL query length), 12 MB (maximum resolved legacy and GoogleSQL query length). Streaming: ...
Maximum request and response size | 7 MB (request), 16 MB (response) | 10 MB (request) and 10 GB (response), or virtually unlimited if you use pagination or the Cloud Storage API.
Maximum number of concurrent sessions | 120 per parsing engine (PE) | 100 concurrent queries (can be raised with a slot reservation), 300 concurrent API requests per user.
Maximum number of concurrent (fast) loads | 30 (default 5) | No concurrency limit; jobs are queued. 100,000 load jobs per project per day.