Batch SQL Translator

This document describes how to use the batch SQL translator in BigQuery to translate Teradata SQL to BigQuery Standard SQL. This document is intended for users who are familiar with the Google Cloud Console.

The batch SQL translator is part of the BigQuery Migration Service. You can use it to translate a batch of scripts written in other SQL dialects into BigQuery Standard SQL. You can also translate SQL scripting languages such as Basic Teradata Query (BTEQ).

The following SQL dialects are supported:

  • Teradata SQL
  • Basic Teradata Query (BTEQ)

Required permissions

You must have the following permissions on the project to enable the BigQuery Migration Service:

  • resourcemanager.projects.get
  • serviceusage.services.enable
  • serviceusage.services.get

You need the following permissions on the project to access and use the BigQuery Migration Service:

  • bigquerymigration.workflows.create
  • bigquerymigration.workflows.get
  • bigquerymigration.workflows.list
  • bigquerymigration.workflows.delete
  • bigquerymigration.subtasks.get
  • bigquerymigration.subtasks.list

    Alternatively, you can use the following roles to get the same permissions:

    • roles/bigquerymigration.viewer - Read-only access.
    • roles/bigquerymigration.editor - Read/write access.

You need the following permissions to access the Cloud Storage buckets for the input and output files:

  • storage.objects.get on the source Cloud Storage bucket.
  • storage.objects.list on the source Cloud Storage bucket.
  • storage.objects.create on the destination Cloud Storage bucket.

Either of the following roles grants all of the necessary Cloud Storage permissions:

  • roles/storage.objectAdmin
  • roles/storage.admin
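
If you want to confirm that your account holds these Cloud Storage permissions before starting a job, you can check with the google-cloud-storage Python client. The following is a minimal sketch; the bucket names are hypothetical placeholders for your own input and output buckets.

```python
from google.cloud import storage

client = storage.Client()

# Permissions needed on the source (input) and destination (output) buckets.
checks = {
    "my-input-bucket": ["storage.objects.get", "storage.objects.list"],
    "my-output-bucket": ["storage.objects.create"],
}

for bucket_name, permissions in checks.items():
    bucket = client.bucket(bucket_name)
    # test_iam_permissions returns the subset of permissions the caller holds.
    granted = bucket.test_iam_permissions(permissions)
    missing = set(permissions) - set(granted)
    if missing:
        print(f"{bucket_name} is missing: {sorted(missing)}")
    else:
        print(f"{bucket_name}: all required permissions are granted")
```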

Before you begin

Before you can submit a translation job, you must enable the API and upload the source files to Cloud Storage.

Enable the Migration API

Enable the BigQuery Migration API as follows:

  1. In the Cloud Console, go to the BigQuery Migration API page.

    Go to BigQuery Migration API

  2. Click Enable.
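
If you prefer to enable the API programmatically, the Service Usage API offers one way to do it. The following is a hedged sketch using the google-cloud-service-usage Python client; the project ID is a hypothetical placeholder, and you should check the exact client surface against the library's reference documentation.

```python
from google.cloud import service_usage_v1

client = service_usage_v1.ServiceUsageClient()

# Enable the BigQuery Migration API on the project (hypothetical project ID).
operation = client.enable_service(
    request=service_usage_v1.EnableServiceRequest(
        name="projects/my-project/services/bigquerymigration.googleapis.com"
    )
)
operation.result()  # Wait for the long-running enable operation to finish.
print("BigQuery Migration API enabled")
```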

Upload input files to Cloud Storage

Before you start a translation job, you must upload source files with the queries and scripts you want to translate. For information about what a source file can contain, see the Source File Format section. For information on how to create buckets and upload files to Cloud Storage, see Creating storage buckets and Uploading Objects.
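
For example, the following minimal Python sketch uploads a small Teradata SQL source file with the google-cloud-storage client. The bucket name, object path, and sample queries are hypothetical placeholders.

```python
from google.cloud import storage

# A small example source file: valid SQL statements terminated by semicolons.
teradata_sql = """\
SELECT name, sale_date FROM mydb.sales
WHERE sale_date > DATE '2021-01-01';
SELECT COUNT(*) FROM mydb.customers;
"""

client = storage.Client()
bucket = client.bucket("my-input-bucket")

# Upload under the folder you will later use as the translation source path,
# for example gs://my-input-bucket/teradata/input.
blob = bucket.blob("teradata/input/queries.sql")
blob.upload_from_string(teradata_sql)
print(f"Uploaded gs://{bucket.name}/{blob.name}")
```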

Translate from Teradata SQL

Follow these steps to start a translation, view its progress, and finally see the results. These steps assume that you have already uploaded the source files into a Cloud Storage bucket.

Console

  1. In the Cloud Console, go to the BigQuery page.

    Go to BigQuery

  2. In the navigation panel, go to SQL translation.

  3. Click Start Translation.

  4. Fill in the translation configuration dialog.

    1. The name can contain letters, numbers, or underscores.
    2. Choose a location for the translation job. For the most efficient run, choose the same location as your source file bucket.
    3. Choose Teradata SQL as the translation mode.
  5. Click Continue.

  6. Choose a source path. You can type in the path or use the Browse option. You can also exclude certain files in your bucket by adding prefix filters.

  7. Click Continue to move to the next step.

  8. Choose a destination path. Similarly to the source path, you can type in the path or use the Browse option.

  9. If you're doing simple translations, you can click Create to start the translation job. For information about using the optional settings, see the Optional settings section.

  10. You can see the status of the job in the translation jobs list.

  11. Once the translation is complete, click Show details to see the results of the translation job.

API

Call the create method with a defined workflow.

Then call the start method to start the translation workflow.
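
The following is a hedged Python sketch of that create-and-start flow, using the google-cloud-bigquery-migration client library. The project ID, location, and Cloud Storage paths are hypothetical placeholders, and the field and type names follow the v2 client as best understood here; check them against the client library reference.

```python
from google.cloud import bigquery_migration_v2

client = bigquery_migration_v2.MigrationServiceClient()
parent = "projects/my-project/locations/us"

# Translate Teradata SQL into BigQuery Standard SQL. For BTEQ scripts, the
# Teradata dialect mode would be BTEQ instead of SQL.
source_dialect = bigquery_migration_v2.Dialect(
    teradata_dialect=bigquery_migration_v2.TeradataDialect(
        mode=bigquery_migration_v2.TeradataDialect.Mode.SQL
    )
)
target_dialect = bigquery_migration_v2.Dialect(
    bigquery_dialect=bigquery_migration_v2.BigQueryDialect()
)

translation_config = bigquery_migration_v2.TranslationConfigDetails(
    gcs_source_path="gs://my-input-bucket/teradata/input",
    gcs_target_path="gs://my-output-bucket/teradata/output",
    source_dialect=source_dialect,
    target_dialect=target_dialect,
)

# A workflow with a single Teradata-to-BigQuery translation task.
task = bigquery_migration_v2.MigrationTask(
    type_="Translation_Teradata2BQ",
    translation_config_details=translation_config,
)
workflow = bigquery_migration_v2.MigrationWorkflow(
    display_name="teradata-to-bq-example"
)
workflow.tasks["translation-task"] = task

# Create the workflow, then start it.
created = client.create_migration_workflow(
    parent=parent, migration_workflow=workflow
)
client.start_migration_workflow(name=created.name)
print(f"Started workflow: {created.name}")
```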

Additional settings for BTEQ translation

The following lists the additional fields used for BTEQ translation. The BTEQ options are set in the BTEQ settings tab, after you set the destination path.

  • Dataset ID (Required): This dataset ID is used as the dataset prefix for tables referenced in the script. For example, an insert statement that inserts new data into table1 is translated to reference datasetId.table1.

  • Default path URI (Required): If specified, this path is used as the default path in .IMPORT and .EXPORT statements.

  • Path Replacement (Not required): This field can be added multiple times. Each entry pairs a file path mentioned in the BTEQ script with a replacement path; during translation, the original path is replaced with the one you provide. For example, if you provide the following file and path pair:

    • ./file1.csv
    • gs://bucket/folder1/file1*.csv

    then all mentions of file1.csv are replaced with the Cloud Storage path provided.

Optional settings

The settings in this tab control advanced translation options.

  • Schema path: A Cloud Storage path that contains data definition language (DDL) statements for table schemas. Providing table schemas can improve translation results.

  • File encoding: The encoding of the input files. The default is UTF_8. The following encodings are available:

    • UTF_8
    • ISO_8859_1
    • US_ASCII
    • UTF_16
    • UTF_16LE
    • UTF_16BE

  • Output SQL identifier cases: The output casing for table and column names. The default is to keep the original casing. The following options are available:

    • Default
    • Keep original
    • Upper case
    • Lower case

  • Special token map: If the translation input files contain SQL templates that use special tokens as placeholders, use this section to provide a map from each special token to its data type.

    The following is an example special token pair:

    • Special token: <b>
    • Data type: NUMERIC

    With this token, the following Teradata SQL statement:

    select trim(<b>) from x;

    translates to the following BigQuery Standard SQL:

    SELECT TRIM(CAST(<b> AS STRING)) FROM x;

Explore the output files

After running the translation, you can see the results and errors in the destination path that you specified. The batch SQL translator outputs the following files into the destination path:

  • Translated files
  • Error summary files

Translated files

For each input file, a corresponding output file is generated in the destination path. The output file contains the translated queries and errors.

Error summary files

The error summary files are CSV files that contain a table of all the errors encountered during translation. Each input file has a corresponding error summary file, named <source_file_name>_errors.csv.

The following is an example of an error summary file:

Input_Query | Line, Column | Category | Message | Type
"selct * from abc" | 5, 1 | SqlParseException | Non-query expression encountered in illegal context | ERROR
"RENAM TBLE DB.table1 to DB.table2;" | 1, 1 | SqlParseException | Non-query expression encountered in illegal context | ERROR

Source File Format

The source files in your Cloud Storage bucket should be valid Teradata SQL or BTEQ script files.

Source files can contain the following:

  • Valid SQL statements, terminated by a semicolon, on a single line or across multiple lines.
  • Stored procedures (SPs). Each SP must be in its own file, separate from other SPs and other SQL statements.
  • Supported BTEQ statements and valid multi-statement SQL requests, if BTEQ mode is chosen.
  • Comments annotated by /* */ or --, or by * if translating BTEQ.
  • Special placeholder tokens that are not applicable for translation. You must specify these tokens in the Special token map setting; see the Optional settings section for more information.

Source files cannot contain unsupported BTEQ statements. For more information, see the Unsupported BTEQ Statements section.

Supported BTEQ Statements

The following statements are supported in BTEQ translations.

  • .ACTIVITYCOUNT
  • .ERRORCODE
  • .EXIT/.QUIT
  • .EXPORT
  • .GOTO/.LABEL
  • .IF, .ELSEIF, .ELSE, .ENDIF
  • .IMPORT
  • .USING
  • .REPEAT

Unsupported BTEQ Statements

The following statements are not supported.

  • .MAXERROR
  • .NULL
  • .REPEATSTOP
  • .RETRY
  • .RUN

Limitations

No Schema Mapping Support

The SQL translator doesn't translate schema changes between the source data warehouse and BigQuery. If column or table names change as part of the migration, the translation doesn't update references to the old names. For example, if a column named "foo" is renamed to "bar" during migration, translation won't automatically update queries that reference "foo" to reference "bar". To update these references, perform a post-processing step that replaces those SQL identifiers in the translated output queries.
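
As an illustration of such a post-processing step, the following Python sketch rewrites renamed identifiers in a translated file. It is a naive string-level replacement, not a SQL-aware rename (it can also touch string literals or comments), so review its output; the rename map and file name are hypothetical.

```python
import re

# Identifiers renamed during the schema migration: old name -> new name.
RENAMES = {"foo": "bar"}

def rewrite_identifiers(sql: str) -> str:
    """Replace renamed identifiers, matching whole words only."""
    for old, new in RENAMES.items():
        sql = re.sub(rf"\b{re.escape(old)}\b", new, sql)
    return sql

with open("translated_query.sql") as f:
    print(rewrite_identifiers(f.read()))
```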

What's next