Last updated: September 2020
This document is part of a series that discusses migrating your schema and data from Teradata to BigQuery. This document provides an overview of translating SQL statements that are written for Teradata to be compatible with BigQuery.
The series that discusses specifics of transitioning from Teradata consists of the following parts:
- Schema and data transfer quickstart
- Query translation overview (this document)
- Query translation quickstart
- SQL translation reference
For an overview of transitioning from an on-premises data warehouse to BigQuery on Google Cloud, see the series that begins with Migrating data warehouses to BigQuery: Introduction and overview
In contrast, BigQuery does not support these proprietary extensions. Therefore, some of your queries might need to be refactored during migration from Teradata to BigQuery. Having queries that only use the ANSI/ISO SQL standard that's supported by BigQuery has the added benefit that it helps ensure portability and helps your queries be agnostic to the underlying data warehouse.
This document addresses some of the challenges you might encounter while migrating SQL queries from Teradata to BigQuery. It explains when these translations should be applied in the context of an end-to-end staged migration.
Teradata SQL differences
This section briefly discusses notable differences between Teradata SQL and the BigQuery standard SQL, and some strategies for translating between the two dialects. The list of differences presented in this document is not exhaustive. For additional information, see the Teradata-to-BigQuery SQL translation reference.
Data Definition Language
Data Definition Language
(DDL) is used to define your database schema. It includes a subset of SQL
statements such as
For the most part, these statements are equivalent between Teradata SQL and standard SQL. Here is a non-exhaustive list of notable exceptions:
- Index manipulation options are not supported in
BigQuery, such as
PRIMARY INDEX. BigQuery does not use indexes when querying your data. It produces fast results thanks to its underlying model using Dremel, its storage techniques using Capacitor, and its massively parallel architecture.
which are checks applied to individual columns or an entire table.
BigQuery supports only
MULTISET, which is used to allow duplicate rows in Teradata.
CASESPECIFIC, which specifies case for character data comparisons and collations.
BigQuery supports a more concise set of data types than Teradata, with groups of Teradata types mapping into a single standard SQL data type. For instance:
BIGINTall map to
UDTand other types that contain large character fields map to
VARBYTEtypes that contain binary information map to
For dates, the main types (
TIMESTAMP) are equivalent in
Teradata and BigQuery. However, other specialized date types from
Teradata need to be mapped, such as the following:
INTERVAL_MINUTE, and other
INTERVAL_*types map to
PERIOD(TIME), and other
PERIOD(*)types map to
The SELECT statement
The syntax of the
SELECT statement is generally compatible between Teradata and
BigQuery. This section notes differences that often must be
addressed during migration.
BigQuery lets you use the following as identifiers: projects;datasets; tables or views; columns.
As a serverless product, BigQuery does not have a concept of a cluster or environment or fixed endpoint, therefore the project specifies the dataset's resource hierarchy.
SELECT statement in Teradata, fully qualified column names can be used.
BigQuery always references column names from tables or aliases,
and never from projects or datasets.
For example, here are some options to address identifiers in BigQuery.
Columns implicitly inferred from the table:
SELECT c FROM project.dataset.table
Or by using an explicit table reference:
SELECT table.c FROM project.dataset.table
Or by using an explicit table alias:
SELECT t.c FROM project.dataset.table t
SELECT statement in Teradata, aliases can be defined and referenced
within the same query. For instance, in the following snippet,
flag is defined
as a column alias, and then immediately referred to in the enclosed
SELECT F AS flag, CASE WHEN flag = 1 THEN ...
In standard SQL, references between columns within the same query are not allowed. To translate, you move the logic into a nested query:
SELECT q.*, CASE WHEN q.flag = 1 THEN ... FROM ( SELECT F AS flag, ... ) AS q
The sample placeholder
F could itself be a nested query that returns a single
Filtering with LIKE
In Teradata, the
LIKE ANY operator is used to filter the results to a given
set of possible options. For example:
SELECT* FROM t1 WHERE a LIKE ANY ('string1', 'string2')
To translate statements that have this operator to standard SQL, you can split
the list after
ANY into several
SELECT* FROM t1 WHERE a LIKE 'string1' OR a LIKE 'string2'
The QUALIFY clause
clause is a conditional clause in the
SELECT statement that filters results of
a previously computed, ordered
according to user‑specified search conditions. Its syntax consists of the
QUALIFY clause followed by the analytic function, such as
and the values you want to find:
SELECT a, b FROM t1 QUALIFY ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) = 1
Teradata customers commonly use this function as a shorthand way to rank and return results without the need for an additional subquery.
QUALIFY clause is translated to BigQuery by adding a
WHERE condition to an enclosing query:
SELECT a, b FROM ( SELECT a, b, ROW_NUMBER() OVER (PARTITION BY A ORDER BY B) row_num FROM t1 ) WHERE row_num = 1
For an example, refer to the associated quickstart.
Data Manipulation Language
Data Manipulation Language (DML)
is used to list, add, delete, and modify data in a database. It includes the
While the basic forms of these statements are the same between Teradata SQL and standard SQL, Teradata includes additional, non-standard clauses and special statement constructs that you need to convert when you migrate. The following sections present a non-exhaustive list of the most common statements, the main differences, and the recommended translations.
The INSERT statement
BigQuery is an enterprise data warehouse that focuses on Online
Analytical Processing (OLAP). Using point-specific DML statements, such as
executing a script with many
INSERT statements, is an attempt to treat
BigQuery like an Online Transaction Processing (OLTP) system,
which is not a correct approach.
BigQuery DML statements are intended for bulk updates, therefore each DML statement that modifies data initiates an implicit transaction. You should group your DML statements whenever possible to avoid unnecessary transaction overhead.
As an example, if you have the following set of statements from Teradata, running them as is in BigQuery is an anti-pattern:
INSERT INTO t1 (...) VALUES (...); INSERT INTO t1 (...) VALUES (...);
You can translate the previous script into a single
INSERT statement, which
performs a bulk operation instead:
INSERT INTO t1 VALUES (...), (...)
A typical scenario where a large number of
INSERT statements is used is when
you create a new table from an existing table. In BigQuery,
instead of using multiple
INSERT statements, create a new table and insert all
the rows in one operation using the
CREATE TABLE ... AS SELECT
The UPDATE statement
UPDATE statements in Teradata are similar to
UPDATE statements in standard
SQL. The important differences are:
- The order of the
FROMclauses is reversed.
Teradata correlation names
used as table aliases in the
UPDATEmust be removed.
- In Standard SQL, each
UPDATEstatement must include the
WHEREkeyword, followed by a condition. To update all rows in the table, use
The following example shows an
UPDATE statement from Teradata that uses
UPDATE t1 FROM t1, t2 SET b = t2.b WHERE a = t2.a;
The equivalent statement in standard SQL is the following:
UPDATE t1 SET b = t2.b FROM t2 WHERE a = t2.a;
The considerations from the previous section about executing large numbers of
DML statements in BigQuery also apply in this case. We recommend
using a single
statement instead of multiple
The DELETE statement
Standard SQL requires
DELETE statements to have a
WHERE clause. In
WHERE clauses are
if you're deleting all the rows in a table. (If specific rows are being deleted,
the Teradata DML also requires a
WHERE clause.) During translation, any
WHERE clauses must be added to scripts. This change is necessary only
when all the rows in a table will be deleted.
For instance, the following statement in Teradata SQL deletes all the rows from
a table. The
ALL clause is optional:
DELETE t1 ALL;
The translation into standard SQL is as follows:
DELETE FROM t1 WHERE TRUE;
Stored procedures in Teradata are a combination of SQL and control statements. Stored procedures can take parameters that let you build a customized interface to the Teradata Database.
Stored procedures are supported as part of BigQuery Scripting.
However, there are some cases where other features might be more appropriate. These alternatives depend on how your stored procedures are being used. For example:
- Replace triggers that are used to run periodic queries with scheduled queries.
- Replace stored procedures that control the complex execution of queries and their interdependencies with workflows defined in Cloud Composer.
- Refactor stored procedures that are used as an API into your data warehouse with parameterized queries and using the BigQuery API. This change implies that you must rebuild the logic from the stored procedure in a different programming language such as Java or Go, and that you then call SQL queries with parameters from the code.
Refactoring and replacing business logic that lives in stored procedures is a delicate task that requires expertise in both your data layer and in architecture best practices for the target platform. Depending on the complexity of your migration, you might choose to contract the services of our specialized partners.
Translating during migration
As part of your migration, you must examine Teradata SQL statements and scripts and determine whether you need to translate Teradata SQL statements into standard SQL as used in BigQuery. As with the overall recommendation for using iterative migration, we recommend approaching this task in a systematic way.
Choosing a use case
We previously defined a use case as all the datasets, data processing, and system and user interactions required in order to achieve business value. A use case includes a group of tables in the data warehouse, the upstream processes that supply data into those tables, and the downstream processes that consume data from those tables, as shown in the following diagram:
Some examples of upstream processes, also called upstream data pipelines, are feeds from data lakes, OLTP systems, a CRM, and logging applications. Some examples of downstream processes are dashboards, reports, feeds to other systems, business applications, and ad hoc queries used by analysts.
When you're choosing use cases for migration, choose use cases where the dominant part of the downstream processes are internal reports or well-defined data outputs, such as feeds or APIs. Choosing these types of use cases in the initial migration iterations has several benefits:
- They help familiarize your staff with the required translations, so that you can estimate the effort required for subsequent iterations.
- They make it easier to set up automated testing for data accuracy, because their outputs can be compared using scripts.
- They enable on-the fly visual comparisons, which are easier to present to non-technical stakeholders than raw data outputs.
To migrate a use case from Teradata to BigQuery, follow the recommendations in the schema and data transfer overview document. Whenever you move a table to BigQuery and when either downstream or upstream processes need to be modified, you also need to perform an evaluation of the queries and stored procedures involved in the change to determine whether you must translate those queries and procedures.
If translations are necessary, follow the guidelines in the Teradata SQL differences section and in the Teradata-to-BigQuery SQL translation reference to create new queries using standard SQL that conforms to ISO SQL:2011.
With the translated queries, test the downstream or upstream process in a controlled environment, following your company's best practices for testing and continuous deployment. You can also take a look at how to create software release pipelines in Google Cloud.
We recommend that you create different assets to help with the translation, starting in the early iterations of your data warehouse migration. For example, consider developing software libraries that implement common query translations where possible that are adapted to your use cases. This library will facilitate subsequent iterations. You can also use these libraries along with documentation guides for training your staff on becoming used to standard SQL, and for understanding the best options to use in BigQuery for Teradata syntax that isn't used in BigQuery.
- Continue to the next part of this series: Query translation quickstart.
- Take a look at Google's Professional Service Organization offerings and our extensive partner ecosystem, which consists of companies that can support you along your migration path.
- Read about best practices to optimize query performance.
- Read about migrating data pipelines to BigQuery.
- Explore reference architectures, diagrams, tutorials, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.