Migrating Teradata to BigQuery: SQL translation reference

This document details the similarities and differences in SQL syntax between Teradata and BigQuery to help you accelerate the planning and execution of moving your EDW (Enterprise Data Warehouse) to BigQuery in Google Cloud. Teradata data warehousing is designed to work with Teradata-specific SQL syntax. Scripts written for Teradata might need to be altered before you can use them in BigQuery, because the SQL dialects vary between the services. Google Cloud partners have tools for automating the conversion of Teradata SQL scripts. If you are interested in learning more, contact your account representative.

This document is part of a series that discusses the specifics of transitioning from Teradata.

For an overview of transitioning from an on-premises data warehouse to BigQuery on Google Cloud, see the series that begins with Migrating data warehouses to BigQuery: Introduction and overview

Data types

This section shows equivalents between data types in Teradata and in BigQuery.

Teradata BigQuery Notes
INTEGER INT64
SMALLINT INT64
BYTEINT INT64
BIGINT INT64
DECIMAL NUMERIC Unlike Teradata, BigQuery's NUMERIC data type does not enforce custom digit or scale bounds (constraints).

Teradata allows you to insert values that have higher precision and rounds the output; however, it keeps the high precision in calculations. This may lead to unexpected rounding behavior compared to the ANSI standard.
FLOAT FLOAT64
NUMERIC NUMERIC Unlike Teradata, BigQuery's NUMERIC data type does not enforce custom digit or scale bounds (constraints).

Teradata allows you to insert values that have higher precision and rounds the output; however, it keeps the high precision in calculations. This may lead to unexpected rounding behavior compared to the ANSI standard.
NUMBER NUMERIC Unlike Teradata, BigQuery's NUMERIC data type does not enforce custom digit or scale bounds (constraints).

Teradata allows you to insert values that have higher precision and rounds the output; however, it keeps the high precision in calculations. This may lead to unexpected rounding behavior compared to the ANSI standard.
REAL FLOAT64
CHAR/CHARACTER STRING Unlike Teradata's CHARACTER and CHAR data types, the STRING data type in BigQuery is variable length and does not require manually setting a maximum character length.
VARCHAR STRING Unlike Teradata's VARCHAR data type, the STRING data type in BigQuery is variable length and does not require manually setting a maximum character length.
CLOB STRING
JSON STRING
BLOB BYTES
BYTE BYTES
VARBYTE BYTES
DATE DATE BigQuery does not support custom date formatting similar to what Teradata supports with DataForm in SDF.
TIME TIME
TIME WITH TIME ZONE TIME Teradata stores the TIME data type in UTC and allows you to pass an offset from UTC using the WITH TIME ZONE syntax. The TIME data type in BigQuery represents a time that's independent of any date or time zone.
TIMESTAMP TIMESTAMP Both Teradata and BigQuery TIMESTAMP data types have microsecond precision (but Teradata supports leap seconds, while BigQuery does not).

Both Teradata and BigQuery TIMESTAMP data types are usually associated with a UTC time zone.
TIMESTAMP WITH TIME ZONE TIMESTAMP The Teradata TIMESTAMP can be set to a different time zone system-wide, per user or per column (using WITH TIME ZONE).

The BigQuery TIMESTAMP type assumes UTC if you don't explicitly specify a time zone. Make sure that you either export time zone information correctly (don't concatenate a DATE and TIME value without time zone information) so that BigQuery can convert it on import, or that you convert time zone information to UTC before exporting.

BigQuery has DATETIME as an abstraction for civil time, which does not show a time zone when it is output, and TIMESTAMP, which is a precise point in time that always shows the UTC time zone when output.
ARRAY ARRAY
MULTI-DIMENSIONAL ARRAY ARRAY In BigQuery, use an array of structs, with each struct containing a field of type ARRAY (For details, see the BigQuery documentation).
INTERVAL HOUR INT64
INTERVAL MINUTE INT64
INTERVAL SECOND INT64
INTERVAL DAY INT64
INTERVAL MONTH INT64
INTERVAL YEAR INT64
PERIOD(DATE) DATE, DATE PERIOD(DATE) should be converted to two DATE columns containing the start date and end date so that they can be used with analytic functions.
PERIOD(TIMESTAMP WITH TIME ZONE) TIMESTAMP, TIMESTAMP
PERIOD(TIMESTAMP) TIMESTAMP, TIMESTAMP
PERIOD(TIME) TIME, TIME
PERIOD(TIME WITH TIME ZONE) TIME, TIME
UDT STRING
XML STRING
TD_ANYTYPE STRING

For more information on type casting, see the next section.

Teradata type formatting

Teradata SQL uses a set of default formats for displaying expressions and column data, and for conversions between data types. For example, a PERIOD(DATE) data type in INTEGERDATE mode is formatted as YY/MM/DD by default. We suggest that you use ANSIDATE mode whenever possible to ensure ANSI SQL compliance, and use this chance to clean up legacy formats.

Teradata allows automatic application of custom formats using the FORMAT clause, without changing the underlying storage, either as a data type attribute when you create a table using DDL, or in a derived expression. For example, a FORMAT specification 9.99 rounds any FLOAT value to two digits. In BigQuery, this functionality has to be converted by using the ROUND() function.

This functionality requires handling of intricate edge cases. For instance, when the FORMAT clause is applied to a NUMERIC column, you must take into account special rounding and formatting rules. A FORMAT clause can be used to implicitly cast an INTEGER epoch value to a DATE format, and a FORMAT specification X(6) on a VARCHAR column truncates the column value, so you have to convert it to a SUBSTR() function. This behavior is not ANSI SQL compliant. Therefore, we suggest not migrating column formats to BigQuery.

If column formats are absolutely required, use views or user-defined functions (UDFs).
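As a sketch of the view approach, assume a hypothetical table mydataset.sales whose FLOAT64 column amount carried a Teradata FORMAT '9.99' attribute; the display rounding moves into a view while the stored value stays untouched:

```sql
-- Hypothetical names: mydataset.sales, order_id, amount.
-- The Teradata FORMAT '9.99' display attribute becomes an explicit
-- ROUND() in a view; the underlying column keeps full precision.
CREATE OR REPLACE VIEW mydataset.sales_formatted AS
SELECT
  order_id,
  ROUND(amount, 2) AS amount
FROM mydataset.sales;
```

Queries that previously relied on the column format read from the view instead of the base table.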

For information about the default formats that Teradata SQL uses for each data type, see the Teradata default formatting documentation.

Timestamp and date type formatting

When you convert timestamp and date formatting elements from Teradata SQL to BigQuery Standard SQL, you must pay attention to timezone differences between TIMESTAMP and DATETIME, as summarized in the following table.

Notice that there are no parentheses in the Teradata forms, because CURRENT_* are keywords, not functions.

Teradata BigQuery
CURRENT_TIMESTAMP
CURRENT_TIME
TIME and TIMESTAMP information in Teradata can have different time zone information, which is defined using WITH TIME ZONE. If possible, use CURRENT_TIMESTAMP(), which is formatted in ISO format. However, the output always shows the UTC time zone. (Internally, BigQuery does not have a time zone.)

Note the following details on differences in the ISO format.

DATETIME is formatted based on output channel conventions. In the BigQuery command-line tool and BigQuery console, it's formatted using a T separator according to RFC 3339. However, in Python and Java JDBC, a space is used as a separator.

If you want to use an explicit format, use FORMAT_DATETIME(), which performs an explicit cast to a string. For example, the following expression always returns a space separator:

CAST(CURRENT_DATETIME() AS STRING)

Teradata supports a DEFAULT keyword in TIME columns to set the current time (timestamp); this is not used in BigQuery.
CURRENT_DATE Dates are stored in Teradata as integer values using the following formula:

(YEAR - 1900) * 10000 + (MONTH * 100) + DAY

Dates can be formatted as integers.
BigQuery has a separate DATE type that always returns a date in ISO 8601 format.

DATE_FROM_UNIX_DATE can't be used, because it is 1970-based.

Teradata supports a DEFAULT keyword in DATE columns to set the current date; this is not used in BigQuery.
CURRENT_DATE-3 Date values are represented as integers in Teradata, which supports arithmetic operators on date types. In BigQuery, use DATE_ADD() or DATE_SUB() for date types.

BigQuery uses arithmetic operators only for numeric data types: INT64, NUMERIC, and FLOAT64.
SYS_CALENDAR.CALENDAR Teradata provides a view for calendar operations to go beyond integer operations. Not used in BigQuery.
SET SESSION DATEFORM=ANSIDATE Set the session or system date format to ANSI (ISO 8601). BigQuery always uses ISO 8601, so make sure you convert Teradata dates and times.
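For illustration, the integer date encoding described above can be decoded into a BigQuery DATE with plain integer arithmetic. The following sketch assumes the Teradata date arrived as a hypothetical integer column named int_date:

```sql
-- Decode Teradata integer dates, stored as
-- (YEAR - 1900) * 10000 + (MONTH * 100) + DAY.
-- 1211231 decodes to 2021-12-31, 990115 to 1999-01-15.
SELECT
  int_date,
  DATE(
    1900 + DIV(int_date, 10000),     -- year
    DIV(MOD(int_date, 10000), 100),  -- month
    MOD(int_date, 100)               -- day
  ) AS converted_date
FROM UNNEST([1211231, 990115]) AS int_date;
```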

Query syntax

This section addresses differences in query syntax between Teradata and BigQuery.

SELECT statement

Most Teradata SELECT statements are compatible with BigQuery. The following table contains a list of minor differences.

Teradata BigQuery
SEL Convert to SELECT. BigQuery does not use the SEL abbreviation.
SELECT
  (subquery) AS flag,
  CASE WHEN flag = 1 THEN ...
In BigQuery, columns cannot reference the output of other columns defined within the same select list. Prefer moving a subquery into a WITH clause.

WITH flags AS (
  subquery
),
SELECT
  CASE WHEN flags.flag = 1 THEN ...
SELECT * FROM table
WHERE A LIKE ANY ('string1', 'string2')
BigQuery does not use the ANY logical predicate.

The same functionality can be achieved using multiple OR operators:

SELECT * FROM table
WHERE col LIKE 'string1' OR
      col LIKE 'string2'


In this case, string comparison also differs. See Comparison operators.
SELECT TOP 10 * FROM table BigQuery uses LIMIT at the end of a query instead of TOP n following the SELECT keyword.

Comparison operators

The following table shows Teradata comparison operators that are specific to Teradata and must be converted to the ANSI SQL:2011 compliant operators used in BigQuery.

For information about operators in BigQuery, see the Operators section of the BigQuery documentation.

Teradata BigQuery Notes
exp EQ exp2
exp IN (exp2, exp3)
exp = exp2
exp IN (exp2, exp3)

To keep non-ANSI semantics for NOT CASESPECIFIC, you can use
RTRIM(UPPER(exp)) = RTRIM(UPPER(exp2))
When comparing strings for equality, Teradata might ignore trailing whitespace, while BigQuery considers it part of the string. For example, 'xyz' = 'xyz ' is TRUE in Teradata but FALSE in BigQuery.

Teradata also provides a NOT CASESPECIFIC column attribute that instructs Teradata to ignore case when comparing two strings. BigQuery is always case specific when comparing strings. For example, 'xYz' = 'xyz' is TRUE in Teradata but FALSE in BigQuery.
exp LE exp2 exp <= exp2
exp LT exp2 exp < exp2
exp NE exp2 exp <> exp2
exp != exp2
exp GE exp2 exp >= exp2
exp GT exp2 exp > exp2
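As a minimal sketch of the RTRIM(UPPER(...)) workaround described above:

```sql
-- Emulates Teradata's trailing-blank-insensitive, NOT CASESPECIFIC
-- comparison in BigQuery; a plain = on the same values is FALSE.
SELECT
  'xyz' = 'xyz '                             AS ansi_equal,      -- false
  RTRIM(UPPER('xyz')) = RTRIM(UPPER('XYZ ')) AS td_style_equal;  -- true
```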

JOIN conditions

BigQuery and Teradata support the same JOIN, ON, and USING conditions. The following table contains a list of minor differences.

Teradata BigQuery Notes
FROM A LEFT OUTER JOIN B ON A.date > B.start_date AND A.date < B.end_date FROM A LEFT OUTER JOIN (SELECT d FROM B JOIN UNNEST(GENERATE_DATE_ARRAY(B.start_date, B.end_date)) d) B ON A.date = B.date BigQuery supports inequality JOIN clauses in all inner joins, and in outer joins when at least one equality condition (=) is also given, but it does not support an OUTER JOIN that has only inequality conditions. Such constructs are sometimes used to query date or integer ranges. BigQuery prevents users from inadvertently creating large cross joins.
FROM A, B ON A.id = B.id FROM A JOIN B ON A.id = B.id Using a comma between tables in Teradata is equal to an INNER JOIN, while in BigQuery it equals a CROSS JOIN (Cartesian product). Because the comma in BigQuery legacy SQL is treated as UNION ALL, we recommend making the operation explicit to avoid confusion.
FROM A JOIN B ON (COALESCE(A.id , 0) = COALESCE(B.id, 0)) FROM A JOIN B ON (COALESCE(A.id , 0) = COALESCE(B.id, 0)) No difference for scalar (constant) functions.
FROM A JOIN B ON A.id = (SELECT MAX(B.id) FROM B) FROM A JOIN (SELECT MAX(B.id) FROM B) B1 ON A.id = B1.id BigQuery prevents users from using subqueries, correlated subqueries, or aggregations in join predicates. This lets BigQuery parallelize queries.
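The date-range rewrite in the first row can be sketched more fully as follows, assuming hypothetical tables A (with a date column) and B (with id, start_date, and end_date columns):

```sql
-- Teradata: FROM A LEFT OUTER JOIN B
--             ON A.date > B.start_date AND A.date < B.end_date
-- BigQuery sketch: expand each range in B into one row per day, then
-- join on equality so the OUTER JOIN is allowed. GENERATE_DATE_ARRAY
-- is inclusive on both ends; tighten the bounds if the original
-- comparison was exclusive.
SELECT a.*, b.id AS b_id
FROM A AS a
LEFT OUTER JOIN (
  SELECT B.id, d
  FROM B, UNNEST(GENERATE_DATE_ARRAY(B.start_date, B.end_date)) AS d
) AS b
ON a.date = b.d;
```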

Type conversion and casting

BigQuery has fewer but wider data types than Teradata, which requires BigQuery to be stricter in casting.

Teradata BigQuery Notes
CAST(long_varchar_column AS CHAR(6)) LPAD(long_varchar_column, 6) Casting a character column in Teradata is sometimes used as a non-standard and non-optimal way to create a padded substring.
CAST(92617 AS TIME) 92617 (FORMAT '99:99:99') PARSE_TIME("%k%M%S", CAST(92617 AS STRING))
Teradata performs many more implicit type conversions and rounding than BigQuery, which is generally stricter and enforces ANSI standards.
(This example returns 09:26:17)
CAST(48.5 (FORMAT 'zz') AS FLOAT) CAST(SUBSTR(CAST(48.5 AS STRING), 0, 2) AS FLOAT64)
Floating point and numeric data types can require special rounding rules when applied with formats such as currencies.
(This example returns 48)

See also Comparison operators and column formats. Both comparisons and column formatting can behave like type casts.

QUALIFY, ROWS clauses

The QUALIFY clause in Teradata allows you to filter results for window functions. Alternatively, a ROWS phrase can be used for the same task. These work similarly to a HAVING condition for a GROUP BY clause, limiting the output of what in BigQuery are called aggregate analytic functions.

Teradata BigQuery
SELECT col1, col2
FROM table
QUALIFY ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) = 1;
The Teradata QUALIFY clause with an analytics function like ROW_NUMBER(), SUM(), COUNT() and with OVER PARTITION BY is expressed in BigQuery as a WHERE clause on a subquery that contains the analytics value.

Using ROW_NUMBER():

SELECT col1, col2
FROM (
  SELECT col1, col2,
  ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) RN
  FROM table
) WHERE RN = 1;


Using ARRAY_AGG, which supports larger partitions:

SELECT
  result.*
FROM (
  SELECT
    ARRAY_AGG(table ORDER BY table.col2
      DESC LIMIT 1)[OFFSET(0)]
  FROM table
  GROUP BY col1
) AS result;
SELECT col1, col2,
  AVG(col1) OVER (PARTITION BY col1 ORDER BY col2 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM table;
SELECT col1, col2,
  AVG(col1) OVER (PARTITION BY col1 ORDER BY col2 RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM table;


In BigQuery, both RANGE and ROWS can be used in the window frame clause. However, window clauses can only be used with analytic functions like AVG(), not with numbering functions like ROW_NUMBER().

NORMALIZE keyword

Teradata provides the NORMALIZE keyword for SELECT clauses to coalesce overlapping periods or intervals into a single period or interval that encompasses all individual period values.

BigQuery does not support the PERIOD type, so any PERIOD type column in Teradata has to be inserted into BigQuery as two separate DATE or DATETIME fields that correspond to the start and end of the period.

Teradata BigQuery
SELECT NORMALIZE
    client_id,
    item_sid,
    BEGIN(period) AS min_date,
    END(period) AS max_date
  FROM
    table;
SELECT
  t.client_id,
  t.item_sid,
  t.min_date,
  MAX(t.dwh_valid_to) AS max_date
FROM (
  SELECT
    d1.client_id,
    d1.item_sid,
    d1.dwh_valid_to AS dwh_valid_to,
    MIN(d2.dwh_valid_from) AS min_date
  FROM
    table d1
  LEFT JOIN
    table d2
  ON
    d1.client_id = d2.client_id
    AND d1.item_sid = d2.item_sid
    AND d1.dwh_valid_to >= d2.dwh_valid_from
    AND d1.dwh_valid_from <= d2.dwh_valid_to
  GROUP BY
    d1.client_id,
    d1.item_sid,
    d1.dwh_valid_to ) t
GROUP BY
  t.client_id,
  t.item_sid,
  t.min_date;

Functions

The following sections list mappings between Teradata functions and BigQuery equivalents.

Aggregate functions

The following table maps common Teradata aggregate, statistical aggregate, and approximate aggregate functions to their BigQuery equivalents. BigQuery also offers additional aggregate functions that have no direct counterpart in Teradata; see the BigQuery documentation.

Teradata BigQuery
AVG AVG
BITAND BIT_AND
BITNOT Bitwise not operator (~)
BITOR BIT_OR
BITXOR BIT_XOR
CORR CORR
COUNT COUNT
COVAR_POP COVAR_POP
COVAR_SAMP COVAR_SAMP
MAX MAX
MIN MIN
REGR_AVGX AVG(
  IF(dep_var_expression is NULL
     OR ind_var_expression is NULL,
     NULL, ind_var_expression)
)
REGR_AVGY AVG(
  IF(dep_var_expression is NULL
     OR ind_var_expression is NULL,
     NULL, dep_var_expression)
)
REGR_COUNT SUM(
  IF(dep_var_expression is NULL
     OR ind_var_expression is NULL,
     NULL, 1)
)
REGR_INTERCEPT AVG(dep_var_expression) - AVG(ind_var_expression) * (COVAR_SAMP(ind_var_expression,
              dep_var_expression)
   / VARIANCE(ind_var_expression))
REGR_R2 POWER(
  (COUNT(dep_var_expression) *
   SUM(ind_var_expression * dep_var_expression) -
   SUM(dep_var_expression) * SUM(ind_var_expression)) /
  SQRT(
    (COUNT(ind_var_expression) *
     SUM(POWER(ind_var_expression, 2)) -
     POWER(SUM(ind_var_expression), 2)) *
    (COUNT(dep_var_expression) *
     SUM(POWER(dep_var_expression, 2)) -
     POWER(SUM(dep_var_expression), 2))),
  2)
REGR_SLOPE COVAR_SAMP(ind_var_expression,
            dep_var_expression)
/ VARIANCE(ind_var_expression)
REGR_SXX SUM(POWER(ind_var_expression, 2)) - COUNT(ind_var_expression) *
  POWER(AVG(ind_var_expression),2)
REGR_SXY SUM(ind_var_expression * dep_var_expression) - COUNT(ind_var_expression)
  * AVG(ind_var_expression) * AVG(dep_var_expression)
REGR_SYY SUM(POWER(dep_var_expression, 2)) - COUNT(dep_var_expression)
  * POWER(AVG(dep_var_expression),2)
SKEW Custom user-defined function.
STDDEV_POP STDDEV_POP
STDDEV_SAMP STDDEV_SAMP, STDDEV
SUM SUM
VAR_POP VAR_POP
VAR_SAMP VAR_SAMP, VARIANCE
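For SKEW, which has no BigQuery built-in, one workable sketch uses the sample-skewness formula n / ((n - 1)(n - 2)) * Σ((x - mean) / s)³; the table mydataset.t and column x are hypothetical:

```sql
-- Sample skewness as a replacement for Teradata SKEW.
-- stats computes the sample mean, sample standard deviation, and count
-- once; the outer query applies the sample-skewness formula.
WITH stats AS (
  SELECT AVG(x) AS avg_x, STDDEV_SAMP(x) AS sd_x, COUNT(x) AS n
  FROM mydataset.t
)
SELECT
  SAFE_DIVIDE(
    n * SUM(POW(x - avg_x, 3)),
    (n - 1) * (n - 2) * POW(sd_x, 3)) AS skew
FROM mydataset.t, stats
GROUP BY n, avg_x, sd_x;
```

Alternatively, you can wrap the same expression in a reusable UDF.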

Analytical functions

The following table maps common Teradata analytic and aggregate analytic functions to their BigQuery equivalents. BigQuery also offers additional analytic functions that have no direct counterpart in Teradata; see the BigQuery documentation.

Teradata BigQuery
ARRAY_AGG ARRAY_AGG
ARRAY_CONCAT, (|| operator) ARRAY_CONCAT_AGG, (|| operator)
BITAND BIT_AND
BITNOT Bitwise not operator (~)
BITOR BIT_OR
BITXOR BIT_XOR
CORR CORR
COUNT COUNT
COVAR_POP COVAR_POP
COVAR_SAMP COVAR_SAMP
CUME_DIST CUME_DIST
DENSE_RANK (ANSI) DENSE_RANK
FIRST_VALUE FIRST_VALUE
LAST_VALUE LAST_VALUE
MAX MAX
MIN MIN
PERCENT_RANK PERCENT_RANK
PERCENTILE_CONT, PERCENTILE_DISC PERCENTILE_CONT, PERCENTILE_DISC
RANK (ANSI) RANK
ROW_NUMBER ROW_NUMBER
STDDEV_POP STDDEV_POP
STDDEV_SAMP STDDEV_SAMP, STDDEV
SUM SUM
VAR_POP VAR_POP
VAR_SAMP VAR_SAMP, VARIANCE

Date/time functions

The following table maps common Teradata date/time functions to their BigQuery equivalents. BigQuery also offers additional date/time functions that have no direct counterpart in Teradata; see the BigQuery documentation.

Teradata BigQuery
ADD_MONTHS DATE_ADD, TIMESTAMP_ADD
CURRENT_DATE CURRENT_DATE
CURRENT_TIME CURRENT_TIME
CURRENT_TIMESTAMP CURRENT_TIMESTAMP
DATE + k DATE_ADD(date_expression, INTERVAL k DAY)
DATE - k DATE_SUB(date_expression, INTERVAL k DAY)
EXTRACT EXTRACT(DATE), EXTRACT(TIMESTAMP)
FORMAT_DATE
FORMAT_DATETIME
FORMAT_TIME
FORMAT_TIMESTAMP
LAST_DAY DATE_SUB(
  DATE_TRUNC(
    DATE_ADD(
      date_expression,
      INTERVAL 1 MONTH
    ),
    MONTH
  ),
  INTERVAL 1 DAY )
MONTHS_BETWEEN DATE_DIFF(date_expression, date_expression, MONTH)
NEXT_DAY DATE_ADD(
  DATE_TRUNC(
    date_expression,
    WEEK(day_value)
  ),
  INTERVAL 1 WEEK
)
OADD_MONTHS DATE_SUB(
  DATE_TRUNC(
    DATE_ADD(
      date_expression,
      INTERVAL num_months MONTH
    ),
    MONTH
  ),
  INTERVAL 1 DAY
)
td_day_of_month EXTRACT(DAY FROM date_expression)
EXTRACT(DAY FROM timestamp_expression)
td_day_of_week EXTRACT(DAYOFWEEK FROM date_expression)
EXTRACT(DAYOFWEEK FROM timestamp_expression)
td_day_of_year EXTRACT(DAYOFYEAR FROM date_expression)
EXTRACT(DAYOFYEAR FROM timestamp_expression)
td_friday DATE_TRUNC(
  date_expression,
  WEEK(FRIDAY)
)
td_monday DATE_TRUNC(
  date_expression,
  WEEK(MONDAY)
)
td_month_begin DATE_TRUNC(date_expression, MONTH)
td_month_end DATE_SUB(
  DATE_TRUNC(
    DATE_ADD(
      date_expression,
      INTERVAL 1 MONTH
    ),
    MONTH
  ),
  INTERVAL 1 DAY
)
td_month_of_calendar (EXTRACT(YEAR FROM date_expression) - 1900) * 12 + EXTRACT(MONTH FROM date_expression)
td_month_of_quarter EXTRACT(MONTH FROM date_expression)
- ((EXTRACT(QUARTER FROM date_expression) - 1) * 3)
td_month_of_year EXTRACT(MONTH FROM date_expression)
EXTRACT(MONTH FROM timestamp_expression)
td_quarter_begin DATE_TRUNC(date_expression, QUARTER)
td_quarter_end DATE_SUB(
  DATE_TRUNC(
    DATE_ADD(
      date_expression,
      INTERVAL 1 QUARTER
    ),
    QUARTER
  ),
  INTERVAL 1 DAY
)
td_quarter_of_calendar (EXTRACT(YEAR FROM date_expression)
- 1900) * 4
+ EXTRACT(QUARTER FROM date_expression)
td_quarter_of_year EXTRACT(QUARTER FROM date_expression)
EXTRACT(QUARTER FROM timestamp_expression)
td_saturday DATE_TRUNC(
  date_expression,
  WEEK(SATURDAY)
)
td_sunday DATE_TRUNC(
  date_expression,
  WEEK(SUNDAY)
)
td_thursday DATE_TRUNC(
  date_expression,
  WEEK(THURSDAY)
)
td_tuesday DATE_TRUNC(
  date_expression,
  WEEK(TUESDAY)
)
td_wednesday DATE_TRUNC(
  date_expression,
  WEEK(WEDNESDAY)
)
td_week_begin DATE_TRUNC(date_expression, WEEK)
td_week_end DATE_SUB(
  DATE_TRUNC(
    DATE_ADD(
      date_expression,
      INTERVAL 1 WEEK
    ),
    WEEK
  ),
  INTERVAL 1 DAY
)
td_week_of_calendar (EXTRACT(YEAR FROM date_expression) - 1900) * 52 + EXTRACT(WEEK FROM date_expression)
td_week_of_month EXTRACT(WEEK FROM date_expression)
- EXTRACT(WEEK FROM DATE_TRUNC(date_expression, MONTH))
td_week_of_year EXTRACT(WEEK FROM date_expression)
EXTRACT(WEEK FROM timestamp_expression)
td_weekday_of_month CAST(
  CEIL(
    EXTRACT(DAY FROM date_expression)
    / 7
  ) AS INT64
)
td_year_begin DATE_TRUNC(date_expression, YEAR)
td_year_end DATE_SUB(
  DATE_TRUNC(
    DATE_ADD(
      date_expression,
      INTERVAL 1 YEAR
    ),
    YEAR
  ),
  INTERVAL 1 DAY
)
td_year_of_calendar EXTRACT(YEAR FROM date_expression)
TO_DATE PARSE_DATE
TO_TIMESTAMP PARSE_TIMESTAMP
TO_TIMESTAMP_TZ PARSE_TIMESTAMP
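As a worked example of the LAST_DAY (and td_month_end) pattern above, applied to a literal:

```sql
-- Month-end pattern: advance one month, truncate to the first day of
-- that month, then step back one day. Handles leap years correctly.
SELECT
  DATE_SUB(
    DATE_TRUNC(DATE_ADD(DATE '2020-02-10', INTERVAL 1 MONTH), MONTH),
    INTERVAL 1 DAY) AS last_day;  -- 2020-02-29
```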

String functions

The following table maps Teradata string functions to their BigQuery equivalents. BigQuery also offers additional string functions that have no direct counterpart in Teradata; see the BigQuery documentation.

Teradata BigQuery
ASCII TO_CODE_POINTS(string_expression)[OFFSET(0)]
CHAR2HEXINT TO_HEX
CHARACTER LENGTH CHAR_LENGTH
CHARACTER LENGTH CHARACTER_LENGTH
CHR CODE_POINTS_TO_STRING(
  [mod(numeric_expression, 256)]
)
CONCAT, (|| operator) CONCAT, (|| operator)
CSV Custom user-defined function.
CSVLD Custom user-defined function.
FORMAT FORMAT
INDEX STRPOS(string, substring)
INITCAP Custom user-defined function.
INSTR Custom user-defined function.
LEFT SUBSTR(source_string, 1, length)
LENGTH LENGTH
LOWER LOWER
LPAD LPAD
LTRIM LTRIM
NGRAM Custom user-defined function.
NVP Custom user-defined function.
OREPLACE REPLACE
OTRANSLATE Custom user-defined function.
POSITION STRPOS(string, substring)
REGEXP_INSTR STRPOS(source_string,
REGEXP_EXTRACT(source_string, regexp_string))


Note: Returns first occurrence.
REGEXP_REPLACE REGEXP_REPLACE
REGEXP_SIMILAR IF(REGEXP_CONTAINS,1,0)
REGEXP_SUBSTR REGEXP_EXTRACT,
REGEXP_EXTRACT_ALL
REGEXP_SPLIT_TO_TABLE Custom user-defined function.
REVERSE REVERSE
RIGHT SUBSTR(source_string, -length)
RPAD RPAD
RTRIM RTRIM
STRTOK

Note: Each character in the delimiter string argument is considered a separate delimiter character. The default delimiter is a space character.
SPLIT(instring, delimiter)[ORDINAL(tokennum)]

Note: The entire delimiter string argument is used as a single delimiter. The default delimiter is a comma.
STRTOK_SPLIT_TO_TABLE Custom user-defined function
SUBSTRING, SUBSTR SUBSTR
TRIM TRIM
UPPER UPPER
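The STRTOK versus SPLIT delimiter difference noted above can be sketched as:

```sql
-- Teradata STRTOK('a,b;c', ',;', 2) treats each character of ',;' as a
-- separate delimiter and returns 'b'. BigQuery SPLIT uses the entire
-- delimiter argument as one delimiter, so split on a single delimiter
-- character at a time:
SELECT SPLIT('a,b,c', ',')[ORDINAL(2)] AS token;  -- 'b'
```

When a Teradata query relies on multiple delimiter characters at once, consider SPLIT with REGEXP_REPLACE to normalize the delimiters first, or a custom UDF.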

Math functions

The following table maps Teradata math functions to their BigQuery equivalents. BigQuery also offers additional math functions that have no direct counterpart in Teradata; see the BigQuery documentation.

Teradata BigQuery
ABS ABS
ACOS ACOS
ACOSH ACOSH
ASIN ASIN
ASINH ASINH
ATAN ATAN
ATAN2 ATAN2
ATANH ATANH
CEILING CEIL
CEILING CEILING
COS COS
COSH COSH
EXP EXP
FLOOR FLOOR
GREATEST GREATEST
LEAST LEAST
LN LN
LOG LOG
MOD (% operator) MOD
NULLIFZERO NULLIF(expression, 0)
POWER (** operator) POWER, POW
RANDOM RAND
ROUND ROUND
SIGN SIGN
SIN SIN
SINH SINH
SQRT SQRT
TAN TAN
TANH TANH
TRUNC TRUNC
ZEROIFNULL IFNULL(expression, 0), COALESCE(expression, 0)
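A small sketch of the NULLIFZERO and ZEROIFNULL rewrites, guarding a division (column names are hypothetical):

```sql
-- Teradata: amount / NULLIFZERO(quantity), ZEROIFNULL(discount)
-- BigQuery equivalents:
SELECT
  amount / NULLIF(quantity, 0) AS unit_price,  -- NULL rather than a
                                               -- division-by-zero error
  IFNULL(discount, 0)          AS discount
FROM UNNEST([STRUCT(10.0 AS amount, 0 AS quantity,
                    CAST(NULL AS FLOAT64) AS discount)]);
```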

DML syntax

This section addresses differences in data management language syntax between Teradata and BigQuery.

INSERT statement

Most Teradata INSERT statements are compatible with BigQuery. The following table shows exceptions.

DML scripts in BigQuery have slightly different consistency semantics than the equivalent statements in Teradata. For an overview of snapshot isolation and session and transaction handling, see the CREATE INDEX section elsewhere in this document.

Teradata BigQuery
INSERT INTO table VALUES (...); INSERT INTO table (...) VALUES (...);

Teradata offers a DEFAULT keyword for non-nullable columns.

Note: In BigQuery, omitting column names in the INSERT statement only works if values for all columns in the target table are included in ascending order based on their ordinal positions.
INSERT INTO table VALUES (1,2,3);
INSERT INTO table VALUES (4,5,6);
INSERT INTO table VALUES (7,8,9);
INSERT INTO table VALUES (1,2,3),
                         (4,5,6),
                         (7,8,9);

Teradata has a concept of multi-statement request (MSR), which sends multiple INSERT statements at a time. In BigQuery, this is not recommended due to the implicit transaction boundary between statements. Use multi-value INSERT instead.

BigQuery allows concurrent INSERT statements but might queue UPDATE. To improve performance, consider the following approaches:
  • Combine multiple rows in a single INSERT statement, instead of one row per INSERT operation.
  • Combine multiple DML statements (including INSERT) using a MERGE statement.
  • Use CREATE TABLE ... AS SELECT to create and populate new tables instead of using UPDATE or DELETE, particularly when querying partitioned fields or performing rollback or restore operations.

UPDATE statement

Most Teradata UPDATE statements are compatible with BigQuery, except for the following items:

  • When you use a FROM clause, the ordering of the FROM and SET clauses is reversed between Teradata and BigQuery.
  • In Standard SQL, each UPDATE statement must include the WHERE keyword, followed by a condition. To update all rows in the table, use WHERE true.

As a best practice, you should group multiple DML mutations instead of single UPDATE and INSERT statements. DML scripts in BigQuery have slightly different consistency semantics than equivalent statements in Teradata. For an overview on snapshot isolation and session and transaction handling, see the CREATE INDEX section elsewhere in this document.

The following table shows Teradata UPDATE statements and BigQuery statements that accomplish the same tasks.

For more information about UPDATE in BigQuery, see the BigQuery UPDATE examples in the DML documentation.

Teradata BigQuery
UPDATE table_A
FROM table_A, table_B
SET
  y = table_B.y,
  z = table_B.z + 1
WHERE table_A.x = table_B.x
  AND table_A.y IS NULL;
UPDATE table_A
SET
  y = table_B.y,
  z = table_B.z + 1
FROM table_B
WHERE table_A.x = table_B.x
  AND table_A.y IS NULL;
UPDATE table alias
SET x = x + 1
WHERE f(x) IN (0, 1);
UPDATE table
SET x = x + 1
WHERE f(x) IN (0, 1);
UPDATE table_A
FROM table_A, table_B
SET z = table_B.z
WHERE table_A.x = table_B.x
  AND table_A.y = table_B.y;
UPDATE table_A
SET z = table_B.z
FROM table_B
WHERE table_A.x = table_B.x
  AND table_A.y = table_B.y;

DELETE, TRUNCATE statements

Both the DELETE and TRUNCATE statements are ways to remove rows from a table without affecting the table schema or indexes. TRUNCATE is not used in either Teradata or BigQuery. However, you can use DELETE statements to achieve the same effect.

In BigQuery, the DELETE statement must have a WHERE clause. To delete all rows in the table (truncate), use WHERE true. To speed up truncate operations for very large tables, we recommend the CREATE OR REPLACE TABLE ... AS SELECT statement with LIMIT 0 on the same table, which replaces the table with an empty copy of itself. However, make sure to manually add partitioning and clustering information when you use it.

Teradata vacuums deleted rows later. This means that DELETE operations are initially faster than in BigQuery, but they require resources later, especially large-scale DELETE operations that impact the majority of a table. To use a similar approach in BigQuery, we suggest reducing the number of DELETE operations, such as by copying the rows not to be deleted into a new table. Alternatively, you can remove entire partitions. Both of these options are designed to be faster operations than atomic DML mutations.

For more information about DELETE in BigQuery, see the DELETE examples in the DML documentation.

Teradata BigQuery
BEGIN TRANSACTION;
LOCKING TABLE table_A FOR EXCLUSIVE;
DELETE FROM table_A;
INSERT INTO table_A SELECT * FROM table_B;
END TRANSACTION;
Replacing the contents of a table with query output is the equivalent of a transaction. You can do this with either a query operation or a copy operation.

Using a query operation:

bq query --replace --destination_table table_A 'SELECT * FROM table_B';

Using a copy operation:

bq cp -f table_A table_B
DELETE database.table ALL; DELETE FROM table WHERE TRUE;

Or for very large tables a faster way:
CREATE OR REPLACE table AS SELECT * FROM table LIMIT 0;
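Removing an entire partition, as suggested above, can be sketched as follows for a hypothetical table partitioned on a date column:

```sql
-- Deleting one whole partition of a table partitioned on event_date is
-- much cheaper than a large row-by-row DELETE across many partitions.
DELETE FROM mydataset.events
WHERE event_date = '2020-01-01';
```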

MERGE statement

The MERGE statement can combine INSERT, UPDATE, and DELETE operations into a single "upsert" statement and perform the operations atomically. The MERGE operation must match at most one source row for each target row. BigQuery and Teradata both follow ANSI syntax.

Teradata's MERGE operation is limited to matching primary keys within one access module processor (AMP). In contrast, BigQuery has no size or column limitation for MERGE operations, therefore using MERGE is a useful optimization. However, if the MERGE is primarily a large delete, see optimizations for DELETE elsewhere in this document.

DML scripts in BigQuery have slightly different consistency semantics than equivalent statements in Teradata. For example, Teradata's SET tables in session mode might ignore duplicates during a MERGE operation. For an overview on handling MULTISET and SET tables, snapshot isolation, and session and transaction handling, see the CREATE INDEX section elsewhere in this document.
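A minimal upsert sketch, with hypothetical tables mydataset.target and mydataset.staging:

```sql
-- Atomically update existing rows and insert new ones; each target row
-- must match at most one row in staging.
MERGE mydataset.target AS t
USING mydataset.staging AS s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value);
```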

Rows-affected variables

In Teradata, the ACTIVITY_COUNT variable is a Teradata ANSI SQL extension populated with the number of rows affected by a DML statement.

The @@row_count system variable in the Scripting feature has similar functionality. In BigQuery it would be more common to check the numDmlAffectedRows return value in the audit logs or the INFORMATION_SCHEMA views.
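In BigQuery Scripting, the @@row_count replacement for ACTIVITY_COUNT can be sketched as follows, with a hypothetical table:

```sql
-- @@row_count holds the number of rows modified by the preceding DML
-- statement in a script, similar to Teradata's ACTIVITY_COUNT.
UPDATE mydataset.t SET processed = TRUE WHERE processed IS NULL;
SELECT @@row_count AS rows_updated;
```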

DDL syntax

This section addresses differences in data definition language syntax between Teradata and BigQuery.

CREATE TABLE statement

Most Teradata CREATE TABLE statements are compatible with BigQuery, except for Teradata-specific syntax elements that are not used in BigQuery.

For more information about CREATE TABLE in BigQuery, see the CREATE examples in the BigQuery DDL documentation.

Column options and attributes

Several Teradata-specific column specifications for the CREATE TABLE statement are not used in BigQuery.

Teradata extends the ANSI standard with a column TITLE option. This feature can be implemented similarly in BigQuery by using the column description, as shown in the following table. Note that this option is not available for views.

Teradata BigQuery
CREATE TABLE table (
  col1 VARCHAR(30) TITLE 'column desc'
);
CREATE TABLE dataset.table (
  col1 STRING OPTIONS(description="column desc")
);

Temporary tables

Teradata supports volatile tables, which are often used to store intermediate results in scripts. There are several ways to achieve something similar to volatile tables in BigQuery:

  • CREATE TEMPORARY TABLE can be used in Scripting, and is valid during the lifetime of the script. If the table has to exist beyond a script, you can use the other options in this list.

  • Dataset TTL: Create a dataset that has a short time to live (for example, 1 hour) so that any tables created in the dataset are effectively temporary since they won't persist longer than the dataset's time to live. You can prefix all the table names in this dataset with temp to clearly denote that the tables are temporary.

  • Table TTL: Create a table that has a table-specific short time to live using DDL statements similar to the following:

    CREATE TABLE temp.name (col1, col2, ...)
    OPTIONS(expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR));
    
  • WITH clause: If a temporary table is needed only within the same block, use a temporary result using a WITH statement or subquery. This is the most efficient option.

An often-used pattern in Teradata scripts (BTEQ) is to create a permanent table, insert values into it, use it like a temporary table in subsequent statements, and then delete or truncate the table afterwards. In effect, this uses the table as a constant variable (a semaphore). This approach is not efficient in BigQuery, and we recommend using real variables in Scripting instead, or using CREATE OR REPLACE TABLE with AS SELECT query syntax to create a table that already has values in it.
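The recommended replacement for this BTEQ pattern can be sketched as follows; the table and column names are hypothetical:

```sql
-- Instead of creating an empty permanent table and inserting into it,
-- create (or replace) the table together with its contents in one statement.
CREATE OR REPLACE TABLE mydataset.intermediate_results AS
SELECT customer_id, SUM(amount) AS total  -- hypothetical columns
FROM mydataset.orders
GROUP BY customer_id;
```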

CREATE VIEW statement

The following table shows equivalents between Teradata and BigQuery for the CREATE VIEW statement. The clauses for table locking such as LOCKING ROW FOR ACCESS are not needed within BigQuery.

Teradata BigQuery Notes
CREATE VIEW view_name AS SELECT ... CREATE VIEW view_name AS SELECT ...
REPLACE VIEW view_name AS SELECT ... CREATE OR REPLACE VIEW view_name AS SELECT ...
Not supported CREATE VIEW IF NOT EXISTS view_name OPTIONS(view_option_list) AS SELECT ... Creates a new view only if the view does not currently exist in the specified dataset.

CREATE [UNIQUE] INDEX statement

Teradata requires indexes for all tables and requires special workarounds like MULTISET tables and NoPI tables to work with non-unique or non-indexed data.

BigQuery does not require indexes. This section describes BigQuery approaches for creating functionality similar to Teradata indexes where there is an actual business logic need.

Indexing for performance

Because it's a column-oriented database with query and storage optimization, BigQuery doesn't need explicit indexes. BigQuery provides functionality such as partitioning and clustering as well as nested fields, which can increase query efficiency and performance by optimizing how data is stored.
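For example, partitioning and clustering are declared at table creation time. A minimal sketch, with hypothetical table and column names:

```sql
-- Partition by day and cluster by a frequently filtered column so that
-- queries with matching filters scan less data.
CREATE TABLE mydataset.events
(
  event_ts TIMESTAMP,
  user_id STRING,
  payload STRING
)
PARTITION BY DATE(event_ts)
CLUSTER BY user_id;
```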

Teradata does not support materialized views. However, it offers join indexes using the CREATE JOIN INDEX statement, which essentially materializes data that's needed for a join. BigQuery does not need materialized indexes to speed up performance, just as it doesn't need dedicated spool space for joins.

For other optimization cases, materialized views can be used in BigQuery.
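A BigQuery materialized view that precomputes an aggregation, similar in spirit to a Teradata join index, might look like this (table and column names are hypothetical):

```sql
-- The materialized view is kept up to date automatically and can be used
-- to answer matching aggregate queries without rescanning the base table.
CREATE MATERIALIZED VIEW mydataset.daily_totals AS
SELECT DATE(event_ts) AS day, COUNT(*) AS events
FROM mydataset.events
GROUP BY day;
```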

Indexing for consistency (UNIQUE, PRIMARY INDEX)

In Teradata, a unique index can be used to prevent rows with non-unique keys in a table. If a process tries to insert or update data that has a value that's already in the index, the operation either fails with an index violation (MULTISET tables) or silently ignores it (SET tables).

Because BigQuery doesn't provide explicit indexes, a MERGE statement can be used instead to insert only unique records into a target table from a staging table while discarding duplicate records. However, there is no way to prevent a user with edit permissions from inserting a duplicate record, because BigQuery never locks during INSERT operations. To generate an error for duplicate records in BigQuery, you can use a MERGE statement from a staging table, as shown in the following example.

Teradata BigQuery
CREATE [UNIQUE] INDEX name;
MERGE `prototype.FIN_MERGE` t
USING `prototype.FIN_TEMP_IMPORT` m
ON t.col1 = m.col1
  AND t.col2 = m.col2
WHEN MATCHED THEN
  UPDATE SET t.col1 = ERROR(CONCAT('Encountered error for ', m.col1, ' ', m.col2))
WHEN NOT MATCHED THEN
  INSERT (col1, col2, col3, col4, col5, col6, col7, col8)
  VALUES (col1, col2, col3, col4, col5, col6, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP());

More often, users prefer to remove duplicates independently in order to find errors in downstream systems.

BigQuery does not support DEFAULT and IDENTITY (sequences) columns.
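One common way to remove duplicates independently is a window function. This is a sketch, assuming a hypothetical staging table with a logical key `id` and a timestamp column `updated_at`:

```sql
-- Keep the most recent row per id; all other rows are treated as duplicates.
CREATE OR REPLACE TABLE mydataset.deduped AS
SELECT * EXCEPT(rn)
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM mydataset.staging
)
WHERE rn = 1;
```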

Indexing to achieve locking

Teradata provides resources in the access module processor (AMP); queries can consume all-AMP, single-AMP, or group-AMP resources. DDL statements are all-AMP and are therefore similar to a global DDL lock. BigQuery doesn't have a lock mechanism like this and can run concurrent queries and INSERT statements up to your quota. Only concurrent UPDATE DML statements have concurrency implications: UPDATE operations against the same partition are queued to ensure snapshot isolation, so you don't have to lock to prevent phantom reads or lost updates.

Because of these differences, the following Teradata elements are not used in BigQuery:

  • ON COMMIT DELETE ROWS;
  • ON COMMIT PRESERVE ROWS;

Procedural SQL statements

This section describes how to convert procedural SQL statements that are used in stored procedures, functions, and triggers from Teradata to BigQuery Scripting, procedures, or user-defined functions (UDFs). All of these are available for system administrators to check using the INFORMATION_SCHEMA views.

CREATE PROCEDURE statement

Stored procedures are supported as part of BigQuery Scripting.

In BigQuery, Scripting refers to any use of control statements, whereas procedures are named scripts (with arguments if needed) that can be called from other Scripts and stored permanently, if needed. A user-defined function (UDF) can also be written in JavaScript.

Teradata BigQuery
CREATE PROCEDURE CREATE PROCEDURE if a name is required, otherwise use inline with BEGIN or in a single line with CREATE TEMP FUNCTION.
REPLACE PROCEDURE CREATE OR REPLACE PROCEDURE
CALL CALL

The sections that follow describe ways to convert existing Teradata procedural statements to BigQuery Scripting statements that have similar functionality.

Variable declaration and assignment

BigQuery variables are valid during the lifetime of the script.

Teradata BigQuery
DECLARE DECLARE
SET SET
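A minimal Scripting sketch of declaration and assignment; the table and column names are hypothetical:

```sql
-- Variables are valid for the lifetime of the script.
DECLARE target_date DATE DEFAULT CURRENT_DATE();
DECLARE row_total INT64;
SET row_total = (SELECT COUNT(*)
                 FROM mydataset.events  -- hypothetical table
                 WHERE DATE(event_ts) = target_date);
SELECT row_total;
```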

Error condition handlers

Teradata uses handlers on status codes in procedures for error control. In BigQuery, error handling is a core feature of the main control flow, similar to what other languages provide with TRY ... CATCH blocks.

Teradata BigQuery
DECLARE EXIT HANDLER FOR SQLEXCEPTION BEGIN ... EXCEPTION WHEN ERROR THEN
SIGNAL sqlstate RAISE message
DECLARE CONTINUE HANDLER FOR SQLSTATE VALUE 23505; Exception handlers that trigger for certain error conditions are not used by BigQuery.

We recommend using ASSERT statements where exit conditions are used for pre-checks or debugging, because this is ANSI SQL:2011 compliant.

The SQLSTATE variable in Teradata is similar to the @@error system variable in BigQuery. In BigQuery, it is more common to investigate errors using audit logging or the INFORMATION_SCHEMA views.
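A sketch of TRY ... CATCH-style handling in BigQuery Scripting, using the @@error system variable:

```sql
BEGIN
  -- Statements that might fail go here.
  SELECT 1 / 0;  -- deliberately raises a division-by-zero error
EXCEPTION WHEN ERROR THEN
  -- Inside the handler, @@error.message holds the error text
  -- of the statement that failed.
  SELECT @@error.message;
END;
```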

Cursor declarations and operations

Because BigQuery doesn't support cursors or sessions, the following statements aren't used in BigQuery:

Dynamic SQL statements

The Scripting feature in BigQuery supports dynamic SQL statements like those shown in the following table.

Teradata BigQuery
EXECUTE IMMEDIATE sql_str; EXECUTE IMMEDIATE sql_str;
EXECUTE stmt_id [USING var,...]; EXECUTE IMMEDIATE stmt_id USING var;
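For example, a parameterized dynamic statement in BigQuery Scripting binds values through the USING clause:

```sql
DECLARE result INT64;
-- The named parameter @x is supplied by USING ... AS x.
EXECUTE IMMEDIATE 'SELECT @x * 2' INTO result USING 21 AS x;
SELECT result;  -- 42
```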

The following Dynamic SQL statements are not used in BigQuery:

Flow-of-control statements

The Scripting feature in BigQuery supports flow-of-control statements like those shown in the following table.

Teradata BigQuery
IF condition THEN stmts ELSE stmts END IF IF condition THEN stmts ELSE stmts END IF
label: LOOP stmts END LOOP label; GOTO-style block constructs are not used in BigQuery.

We recommend rewriting them as user-defined functions (UDFs) or using ASSERT statements where they are used for error handling.
REPEAT stmts UNTIL condition END REPEAT; WHILE condition DO stmts END WHILE
LEAVE outer_proc_label; LEAVE is not used for GOTO-style blocks; it is used as a synonym for BREAK to leave a WHILE loop.
LEAVE label; LEAVE is not used for GOTO-style blocks; it is used as a synonym for BREAK to leave a WHILE loop.
WITH RECURSIVE temp_table AS ( ... ); Recursive queries (also known as recursive common table expressions (CTE)) are not used in BigQuery. They can be rewritten using arrays of UNION ALL.
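A Teradata REPEAT ... UNTIL loop can usually be rewritten as a WHILE loop with the condition inverted. A minimal sketch in BigQuery Scripting:

```sql
-- Count down from 3 to 1, mirroring REPEAT stmts UNTIL i = 0 END REPEAT.
DECLARE i INT64 DEFAULT 3;
WHILE i > 0 DO
  SELECT i;
  SET i = i - 1;
END WHILE;
```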

The following flow-of-control statements are not used in BigQuery because BigQuery doesn't use cursors or sessions:

Metadata and transaction SQL statements

Teradata BigQuery
HELP TABLE table_name;
HELP VIEW view_name;
SELECT
 * EXCEPT(is_generated, generation_expression, is_stored, is_updatable)
FROM
 mydataset.INFORMATION_SCHEMA.COLUMNS
WHERE
 table_name = 'table_name';


The same query is valid to get column information for views.
For more information, see the Column view in the BigQuery INFORMATION_SCHEMA.
SELECT * FROM dbc.tables WHERE tablekind = 'T';

(Teradata DBC view)
SELECT
 * EXCEPT(is_typed)
FROM
mydataset.INFORMATION_SCHEMA.TABLES;


For more information, see Introduction to BigQuery INFORMATION_SCHEMA.
HELP STATISTICS table_name; APPROX_COUNT_DISTINCT(col)
COLLECT STATS USING SAMPLE ON table_name column (...); Not used in BigQuery.
LOCKING TABLE table_name FOR EXCLUSIVE; BigQuery always uses snapshot isolation. For details, see Consistency guarantees elsewhere in this document.
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL ... BigQuery always uses snapshot isolation. For details, see Consistency guarantees elsewhere in this document.
BEGIN TRANSACTION;
SELECT ...
END TRANSACTION;
BigQuery always uses snapshot isolation. For details, see Consistency guarantees elsewhere in this document.
EXPLAIN ... Not used in BigQuery.

Similar features are the query plan explanation in the BigQuery web UI and the slot allocation visible in the INFORMATION_SCHEMA views and in audit logging in Cloud Monitoring.

Multi-statement and multi-line SQL statements

Teradata supports transactions (sessions), and therefore supports statements separated by semicolons that are consistently executed together. BigQuery supports the same in its Scripting feature, but runs these statements as individual queries without rollback.

Error codes and messages

Teradata error codes and BigQuery error codes are different. Because BigQuery is accessed through a REST API, it relies primarily on HTTP status codes plus detailed error messages.

If your application logic is currently catching the following errors, try to eliminate the source of the error, because BigQuery will not return the same error codes.

  • SQLSTATE = '02000'—"Row not found"
  • SQLSTATE = '21000'—"Cardinality violation (Unique Index)"
  • SQLSTATE = '22000'—"Data violation (Data Type)"
  • SQLSTATE = '23000'—"Constraint Violation"

In BigQuery, it would be more common to use the INFORMATION_SCHEMA views or audit logging to drill down into errors.

For information about how to handle errors in Scripting, see the sections that follow.

Consistency guarantees and transaction isolation

Both Teradata and BigQuery are atomic—that is, ACID-compliant on a per-mutation level across many rows. For example, a MERGE operation is completely atomic, even with multiple inserted and updated values.

Transactions

Teradata provides either Read Uncommitted (allowing dirty reads) or Serializable transaction isolation level when running in session mode (instead of auto-commit mode). In the best case, Teradata achieves strictly serializable isolation by using pessimistic locking against a row hash across all columns of rows across all partitions. Deadlocks are possible. DDL always forces a transaction boundary. Teradata Fastload jobs run independently, but only on empty tables.

BigQuery helps ensure optimistic concurrency control (first to commit wins) with snapshot isolation, in which a query reads the last committed data before the query starts. This approach guarantees the same level of consistency on a per-row, per-mutation basis and across rows within the same DML statement, yet avoids deadlocks. In the case of multiple UPDATE statements against the same table, BigQuery switches to pessimistic concurrency control and queues multiple UPDATE statements, automatically retrying in case of conflicts. INSERT DML statements and load jobs can run concurrently and independently to append to tables. However, BigQuery does not provide an explicit, manual transaction boundary or session.

Rollback

Teradata supports two session modes with different rollback behavior, ANSI session mode and Teradata session mode (SET SESSION CHARACTERISTICS and SET SESSION TRANSACTION). In failure cases, the transaction might not be rolled back.

The Scripting feature in BigQuery allows serial, dependent execution of statements, but will not roll back earlier updates if later ones fail.

There is no ABORT or ROLLBACK statement in BigQuery.

Because there is no explicit transaction boundary in BigQuery yet, there is also no concept of an explicit rollback in BigQuery. The recommended approach is to use table decorators or the FOR SYSTEM_TIME AS OF clause ("time travel" features) in BigQuery. This way, you can explicitly define a discrete snapshot before a batch run that includes DDL changes. Time travel is always enabled in BigQuery at no extra cost, so there is no need to specify FALLBACK explicitly or optimize spooling or journaling.
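A snapshot read with time travel might look like this (table name hypothetical):

```sql
-- Read the table as it was one hour ago, for example to compare
-- its state before and after a batch run.
SELECT *
FROM mydataset.events
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```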

Database limits

Always check the BigQuery public documentation for the latest quotas and limits. Many quotas for large-volume users can be raised by contacting the Cloud Support team. The following table shows a comparison of the Teradata and BigQuery database limits.

Limit Teradata BigQuery
Tables per database Unrestricted Unrestricted
Columns per table 2,048 10,000
Maximum row size 1 MB 100 MB
Column and table name length 128 Unicode chars 16,384 Unicode characters
Rows per table Unlimited Unlimited
Maximum SQL request length 1 MB 1 MB (maximum unresolved standard SQL query length)
12 MB (maximum resolved legacy and standard SQL query length)

Streaming:
  • 10 MB (HTTP request size limit)
  • 10,000 (maximum rows per request)
Maximum request and response size 7 MB (request), 16 MB (response) 10 MB (request) and 10 GB (response), or virtually unlimited if you use pagination or the Cloud Storage API.
Maximum number of concurrent sessions 120 per parsing engine (PE) 100 concurrent queries (can be raised with a slot reservation), 300 concurrent API requests per user.
Maximum number of concurrent (fast) loads 30 (default 5) No concurrency limit; jobs are queued. 100,000 load jobs per project per day.