Migrate code with the batch SQL translator

This document describes how to use the batch SQL translator in BigQuery to translate scripts written in other SQL dialects into GoogleSQL queries. This document is intended for users who are familiar with the Google Cloud console.

The batch SQL translator is part of the BigQuery Migration Service. It can translate the following SQL dialects into GoogleSQL:

  • Amazon Redshift SQL
  • Apache HiveQL and Beeline CLI
  • IBM Netezza SQL and NZPLSQL
  • Teradata and Teradata Vantage
    • SQL
    • Basic Teradata Query (BTEQ)
    • Teradata Parallel Transport (TPT)

Additionally, translation of the following SQL dialects is supported in preview:

  • Apache Spark SQL
  • Azure Synapse T-SQL
  • MySQL SQL
  • Oracle SQL, PL/SQL, Exadata
  • PostgreSQL SQL
  • Trino or PrestoSQL
  • Snowflake SQL
  • SQL Server T-SQL
  • SQLite
  • Vertica SQL

Required permissions

You must have the following permissions on the project to enable the BigQuery Migration Service:

  • resourcemanager.projects.get
  • serviceusage.services.enable
  • serviceusage.services.get

You need the following permissions on the project to access and use the BigQuery Migration Service:

  • bigquerymigration.workflows.create
  • bigquerymigration.workflows.get
  • bigquerymigration.workflows.list
  • bigquerymigration.workflows.delete
  • bigquerymigration.subtasks.get
  • bigquerymigration.subtasks.list

    Alternatively, you can use the following roles to get the same permissions:

    • bigquerymigration.viewer - Read-only access.
    • bigquerymigration.editor - Read/write access.
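
For example, a project administrator can grant one of these roles with the gcloud CLI. This is only an illustrative sketch; the project ID and user email are placeholders:

# Grant the BigQuery Migration editor role on the project (placeholder values)
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:USER_EMAIL" \
  --role="roles/bigquerymigration.editor"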

To access the Cloud Storage buckets for input and output files:

  • storage.objects.get on the source Cloud Storage bucket.
  • storage.objects.list on the source Cloud Storage bucket.
  • storage.objects.create on the destination Cloud Storage bucket.

You can get all of the necessary Cloud Storage permissions from either of the following roles:

  • roles/storage.objectAdmin
  • roles/storage.admin

Locations

The batch SQL translator is available in the following processing locations:

  • us (US multi-region)
  • eu (EU multi-region)
  • southamerica-east1 (São Paulo)
  • us-central1 (Iowa)
  • asia-northeast1 (Tokyo)
  • asia-south1 (Mumbai)
  • asia-southeast1 (Singapore)
  • australia-southeast1 (Sydney)
  • europe-central2 (Warsaw)
  • europe-north1 (Finland)
  • europe-west1 (Belgium)
  • europe-west2 (London)
  • europe-west3 (Frankfurt)

Before you begin

Before you submit a translation job, complete the following steps:

  1. Enable the BigQuery Migration API.
  2. Collect the source files containing the SQL scripts and queries to be translated.
  3. Optional. Create a metadata file to improve the accuracy of the translation.
  4. Optional. Decide whether you need to map SQL object names in the source files to new names in BigQuery, and determine the name mapping rules to use if so.
  5. Decide what method to use to submit the translation job.
  6. Upload the source files to Cloud Storage.

Enable the BigQuery Migration API

If your Google Cloud project was created before February 15, 2022, enable the BigQuery Migration API as follows (a gcloud CLI alternative is shown after these steps):

  1. In the Google Cloud console, go to the BigQuery Migration API page.

    Go to BigQuery Migration API

  2. Click Enable.
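
Alternatively, you can enable the API from the command line with the gcloud CLI, for example:

# Enable the BigQuery Migration API for the given project (PROJECT_ID is a placeholder)
gcloud services enable bigquerymigration.googleapis.com --project=PROJECT_ID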

Collect source files

Source files must be text files that contain valid SQL for the source dialect. Source files can also include comments. Do your best to ensure the SQL is valid, using whatever methods are available to you.

Create metadata files

To help the service generate more accurate translation results, we recommend that you provide metadata files. However, this isn't mandatory.

You can use the dwh-migration-dumper command-line extraction tool to generate the metadata information, or you can provide your own metadata files. After the metadata files are prepared, include them along with the source files in the translation source folder. The translator automatically detects and uses them when translating the source files; you don't need to configure any extra settings to enable this.

To generate metadata information by using the dwh-migration-dumper tool, see Generate metadata for translation.
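
As a rough, illustrative sketch only, a metadata extraction for a Teradata source might look similar to the following. The flags and values shown here are assumptions; check the dwh-migration-dumper documentation for the exact options that your connector and version support:

# Hypothetical invocation; verify flag names against the tool's documentation
dwh-migration-dumper \
  --connector teradata \
  --host teradata.example.com \
  --user dumper_user \
  --password PASSWORD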

To provide your own metadata, collect the data definition language (DDL) statements for the SQL objects in your source system into separate text files.
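
For example, a hand-collected metadata file could be a plain text file that contains DDL such as the following (the table and column names are illustrative):

-- Example DDL collected from the source system
CREATE TABLE sales.orders (
  order_id     INTEGER NOT NULL,
  customer_id  INTEGER,
  order_date   DATE,
  total_amount DECIMAL(12,2)
);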

Map SQL object names

You can optionally perform output name mapping during batch translation. When you use output name mapping, you specify name mapping rules that change the names of SQL objects in the source system to new names in BigQuery. For example, you might have the object schema1.table1 in your source system, and you want that object to be named project1.dataset1.table1 in BigQuery. If you use output name mapping, you should define your name mapping rules prior to starting a batch translation job. You can enter these rules manually when configuring the job, or you can create a JSON file containing the name mapping rules and upload that instead.
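
Continuing the example above, a name mapping file that renames schema1.table1 to project1.dataset1.table1 might look similar to the following sketch. The name_map structure shown here is an assumption; verify the exact field names against the output name mapping reference before you use it:

{
  "name_map": [
    {
      "source": {
        "type": "RELATION",
        "schema": "schema1",
        "relation": "table1"
      },
      "target": {
        "database": "project1",
        "schema": "dataset1",
        "relation": "table1"
      }
    }
  ]
}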

Decide how to submit the translation job

You have three options for submitting a batch translation job:

  • Batch translation client: Configure a job by changing settings in a configuration file, and submit the job using the command line. This approach doesn't require you to manually upload source files to Cloud Storage. The client still uses Cloud Storage to store files during translation job processing.

    The batch translation client is an open-source Python client that lets you translate source files located on your local machine and have the translated files output to a local directory. You configure the client for basic use by changing a few settings in its configuration file. If you choose to, you can also configure the client to handle more complex tasks such as macro replacement and pre- and postprocessing of translation inputs and outputs. For more information, see the batch translation client readme.

  • Google Cloud console: Configure and submit a job using a user interface. This approach requires you to upload source files to Cloud Storage.

  • BigQuery Migration API: Configure and submit a job programmatically. This approach requires you to upload source files to Cloud Storage.

Create configuration YAML files

You can optionally create and use configuration YAML files to customize your batch translations. These files can transform your translation output in various ways. For example, you can create a configuration YAML file to change the case of a SQL object during translation.

If you want to use the Google Cloud console or the BigQuery Migration API for a batch translation job, you can upload the configuration YAML file to the Cloud Storage bucket containing the source files.

If you want to use the batch translation client, you can place the configuration YAML file in the local translation input folder.

Upload input files to Cloud Storage

If you want to use the Google Cloud console or the BigQuery Migration API to perform a translation job, you must upload the source files containing the queries and scripts you want to translate to Cloud Storage. You can also upload any metadata files or configuration YAML files to the same Cloud Storage bucket containing the source files. For more information about creating buckets and uploading files to Cloud Storage, see Create buckets and Upload objects from a filesystem.
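
For example, assuming a bucket named my-translation-bucket and a local folder named sql-sources (both names are placeholders), you could create the bucket and upload the files with the gcloud CLI:

# Create the bucket if it doesn't exist, then upload source, metadata, and config files
gcloud storage buckets create gs://my-translation-bucket --location=us
gcloud storage cp --recursive ./sql-sources/* gs://my-translation-bucket/input/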

Submit a translation job

Follow these steps to start a translation job, view its progress, and see the results.

Batch translation client

  1. Install the batch translation client and the Google Cloud CLI.

  2. Generate a gcloud CLI credential file.

  3. In the batch translation client installation directory, use the text editor of your choice to open the config.yaml file and modify the following settings (a sample file is sketched after these steps):

    • project_number: Type the project number of the project you want to use for the batch translation job. You can find this in the Project info pane on the Google Cloud console welcome page for the project.
    • gcs_bucket: Type the name of the Cloud Storage bucket that the batch translation client should use to store files during translation job processing.
    • input_directory: Type the absolute or relative path to the directory containing the source files and any metadata files.
    • output_directory: Type the absolute or relative path to the target directory for the translated files.
  4. Save the changes and close the config.yaml file.

  5. Place your source and metadata files in the input directory.

  6. Run the batch translation client using the following command:

    bin/dwh-migration-client
    

    Once the translation job is created, you can see its status in the translation jobs list in the Google Cloud console.

  7. Optional. Once the translation job is completed, delete the files that the job created in the Cloud Storage bucket you specified, in order to avoid storage costs.
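
The following is a minimal sketch of the config.yaml settings described in step 3. All values are placeholders, and your copy of the file might contain additional settings (for example, the translation type) that you should leave in place:

# Illustrative config.yaml values; replace with your own project and paths
project_number: "123456789012"
gcs_bucket: "my-translation-bucket"
input_directory: "input"
output_directory: "output"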

Console

These steps assume you have source files uploaded into a Cloud Storage bucket already.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Migration section of the navigation panel, click SQL translation.

  3. Click Start Translation.

  4. Fill in the translation configuration dialog.

    1. For Display name, type a name for the translation job. The name can contain letters, numbers, or underscores.
    2. For Processing location, select the location where you want the translation job to run. For example, if you are in Europe and you don't want your data to cross any location boundaries, select the eu region. The translation job performs best when you choose the same location as your source file bucket.
    3. For Source dialect, select the SQL dialect that you want to translate.
    4. For Target dialect, select BigQuery.
  5. Click Next.

  6. For Source location, specify the path to the Cloud Storage folder containing the files to translate. You can type the path in the format bucket_name/folder_name/ or use the Browse option.

  7. Click Next.

  8. For Target location, specify the path to the destination Cloud Storage folder for the translated files. You can type the path in the format bucket_name/folder_name/ or use the Browse option.

  9. If you're doing translations that don't need to have default object names or source-to-target name mapping specified, skip to Step 11. Otherwise, click Next.

  10. Fill in the optional settings that you need.

    1. Optional. For Default database, type a default database name to use with the source files. The translator uses this default database name to resolve SQL objects' fully qualified names where the database name is missing.

    2. Optional. For Schema search path, specify a schema to search when the translator needs to resolve SQL objects' fully qualified names in the source files where the schema name is missing. If the source files use a number of different schema names, click Add Schema Name and add a value for each schema name that might be referenced.

      The translator searches the metadata files that you provided to validate tables against their schema names. If a definitive match can't be determined from the metadata, the first schema name that you enter is used as the default. For more information on how the default schema name is used, see default schema.

    3. Optional. If you want to specify name mapping rules to rename SQL objects between the source system and BigQuery during translation, you can either provide a JSON file with the name mapping pair, or you can use the Google Cloud console to specify the values to map.

      To use a JSON file:

      1. Click Upload JSON file for name mapping.
      2. Browse to the location of a name mapping file in the appropriate format, select it, and click Open.

        Note that the file size must be less than 5 MB.

      To use the Google Cloud console:

      1. Click Add name mapping pair.
      2. Add the appropriate parts of the source object name in the Database, Schema, Relationship, and Attribute fields in the Source column.
      3. Add the parts of the target object name in BigQuery in the fields in the Target column.
      4. For Type, select the object type that describes the object you are mapping.
      5. Repeat Steps 1 - 4 until you have specified all of the name mapping pairs that you need. Note that you can only specify up to 25 name mapping pairs when using the Google Cloud console.
  11. Click Create to start the translation job.

Once the translation job is created, you can see its status in the translation jobs list.

API

Call the projects.locations.workflows.create method and supply an instance of the MigrationWorkflow resource with a supported task type.

Supported task types

  • Amazon Redshift SQL - Redshift2BigQuery_Translation
  • Apache HiveQL and Beeline CLI - HiveQL2BigQuery_Translation
  • IBM Netezza SQL and NZPLSQL - Netezza2BigQuery_Translation
  • Teradata and Teradata Vantage - Teradata2BigQuery_Translation
  • Apache Spark SQL - SparkSQL2BigQuery_Translation
  • Azure Synapse T-SQL - AzureSynapse2BigQuery_Translation
  • MySQL SQL - MySQL2BigQuery_Translation
  • Oracle SQL, PL/SQL, Exadata - Oracle2BigQuery_Translation
  • PostgreSQL SQL - Postgresql2BigQuery_Translation
  • Presto or Trino SQL - Presto2BigQuery_Translation
  • Snowflake SQL - Snowflake2BigQuery_Translation
  • SQL Server T-SQL - SQLServer2BigQuery_Translation
  • Vertica SQL - Vertica2BigQuery_Translation

Limitation

The consolidated translation API only supports translation jobs that complete in less than 1.5 hours.

Required permissions

To get the permissions that you need to create translation jobs using the consolidated translation API, ask your administrator to grant you the MigrationWorkflow Editor (roles/bigquerymigration.editor) IAM role on the parent resource. For more information about granting roles, see Manage access.

This predefined role contains the permissions required to create translation jobs using the consolidated translation API:

  • bigquerymigration.workflows.create
  • bigquerymigration.workflows.get

You might also be able to get these permissions with custom roles or other predefined roles.

Example: Create a batch translation

The following curl command creates a batch translation job where the input and output files are stored in Cloud Storage. The source_target_mapping field contains a list that maps the source entries to an optional relative path for the target output.

curl -d "{
  \"tasks\": {
     string: {
        \"type\": \"TYPE\",
        \"translation_details\": {
           \"target_base_uri\": \"TARGET_BASE\",
           \"source_target_mapping\": {
              \"source_spec\": {
                 \"base_uri\": \"BASE\"
              }
           }
        }
     }
  }
  }" \
  -H "Content-Type:application/json" \
  -H "Authorization: Bearer TOKEN" -X POST https://bigquerymigration.googleapis.com/v2alpha/projects/PROJECT_ID/locations/LOCATION/workflows

Replace the following:

  • TYPE: the task type of the translation, which determines the source and target dialect.
  • TARGET_BASE: the base URI for all translation outputs.
  • BASE: the base URI for all files read as sources for translation.
  • TOKEN: the token for authentication. To generate a token, use the gcloud auth print-access-token command or the OAuth 2.0 playground (use the scope https://www.googleapis.com/auth/cloud-platform).
  • PROJECT_ID: the project to process the translation.
  • LOCATION: the location of the project to process the translation.
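
For example, you can populate TOKEN in your shell before running the command:

# Obtain an OAuth access token for the Authorization header
TOKEN=$(gcloud auth print-access-token)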

The preceding command returns a response that includes a workflow ID written in the format projects/PROJECT_ID/locations/LOCATION/workflows/WORKFLOW_ID. To view the translation results through the API, see Consolidated translation API results.

Example: Create a translation job with string literal inputs and outputs

The following curl command creates a translation job with string literal inputs and outputs. The source_target_mapping field contains a list that maps the source literal entries to an optional relative path for the target output.

curl -d "{
  \"tasks\": {
     string: {
        \"type\": \"TYPE\",
        \"translation_details\": {
        \"source_target_mapping\": {
           \"source_spec\": {
              \"literal\": {
              \"relative_path\": \"PATH\",
              \"literal_string\": \"STRING\"
              }
           }
        },
        \"target_return_literals\": \"TARGETS\",
        }
     }
  }
  }" \
  -H "Content-Type:application/json" \
  -H "Authorization: Bearer TOKEN" -X POST https://bigquerymigration.googleapis.com/v2alpha/projects/PROJECT_ID/locations/LOCATION/workflows

Replace the following:

  • TYPE: the task type of the translation, which determines the source and target dialect.
  • PATH: the identifier of the literal entry, similar to a filename or path.
  • STRING: string of literal input data (for example, SQL) to be translated.
  • TARGETS: the expected targets that you want returned directly in the response in literal format. These should be in the target URI format (for example, GENERATED_DIR + target_spec.relative_path + source_spec.literal.relative_path). Anything not in this list is not returned in the response. The generated directory, GENERATED_DIR, is sql/ for general SQL translations.
  • TOKEN: the token for authentication. To generate a token, use the gcloud auth print-access-token command or the OAuth 2.0 playground (use the scope https://www.googleapis.com/auth/cloud-platform).
  • PROJECT_ID: the project to process the translation.
  • LOCATION: the location of the project to process the translation.

The preceding command returns a response that includes a workflow ID written in the format projects/PROJECT_ID/locations/LOCATION/workflows/WORKFLOW_ID. To view the translation results through the API, see Consolidated translation API results.

Explore the translation output

After running the translation job, you can see information about the job in the Google Cloud console. If you used the Google Cloud console to run the job, you can see job results in the destination Cloud Storage bucket that you specified. If you used the consolidated translation API to run the job, you can run another command with the workflow ID to retrieve the job results. If you used the batch translation client to run the job, you can see job results in the output directory that you specified. The batch SQL translator outputs the following files to the specified destination:

  • The translated files.
  • The translation summary report in CSV format.
  • The consumed output name mapping in JSON format.
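
If your results are in Cloud Storage, you can also list and download them with the gcloud CLI, for example (the bucket path is a placeholder):

# List the translated output, then copy it to a local directory
gcloud storage ls gs://my-translation-bucket/output/
gcloud storage cp --recursive gs://my-translation-bucket/output/ ./translated-output/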

Google Cloud console output

To see translation job details, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Migration section of the navigation panel, click SQL translation.

  3. In the list of translation jobs, locate the job for which you want to see the translation details. Then, click the translation job name.

  4. In the Results section, you can see the overall translation success rate, the number of statements processed, and the job duration.

  5. Select the Log Summary tab to see a summary of the translation issues, including issue categories, suggested actions, and how often each issue occurred. You can also select an issue category to see log messages associated with that issue category (Preview).

  6. Select the Log Messages tab to see more details about each translation issue, including the issue category, the specific issue message, and a link to the file in which the issue occurred. You can select an issue in the Log Message tab to open the Code tab that displays the input and output file if applicable (Preview).

  7. Select the Translation Configuration tab to see the translation job configuration details.

Consolidated translation API results

After the asynchronous translation is complete, retrieve the results by specifying the translation job workflow ID using the following command:

curl \
-H "Content-Type:application/json" \
-H "Authorization:Bearer TOKEN" -X GET https://bigquerymigration.googleapis.com/v2alpha/projects/PROJECT_ID/locations/LOCATION/workflows/WORKFLOW_ID

Replace the following:

  • TOKEN: the token for authentication. To generate a token, use the gcloud auth print-access-token command or the OAuth 2.0 playground (use the scope https://www.googleapis.com/auth/cloud-platform).
  • PROJECT_ID: the project to process the translation.
  • LOCATION: the location of the project to process the translation.
  • WORKFLOW_ID: the ID generated when you create a translation workflow.

Summary report

The summary report is a CSV file that contains a table of all of the warning and error messages encountered during the translation job.

To see the summary file in the Google Cloud console, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Migration section of the navigation panel, click SQL translation.

  3. In the list of translation jobs, locate the job that you are interested in, and click Show details in the Status column.

  4. In the Translation report section, click batch_translation_report.csv.

  5. On the Object details page, click the value in the Authenticated URL row to see the file in your browser.

The summary file contains the following columns:

  • Timestamp: The timestamp at which the issue occurred.
  • FilePath: The path to the source file that the issue is associated with.
  • FileName: The name of the source file that the issue is associated with.
  • ScriptLine: The line number where the issue occurred.
  • ScriptColumn: The column number where the issue occurred.
  • TranspilerComponent: The translation engine internal component where the warning or error occurred. This column might be empty.
  • Environment: The translation dialect environment associated with the warning or error. This column might be empty.
  • ObjectName: The SQL object in the source file that is associated with the warning or error. This column might be empty.
  • Severity: The severity of the issue, either warning or error.
  • Category: The translation issue category.
  • SourceType: The source of this issue. The value in this column can either be SQL, indicating an issue in the input SQL files, or METADATA, indicating an issue in the metadata package.
  • Message: The translation issue warning or error message.
  • ScriptContext: The SQL snippet in the source file that is associated with the issue.
  • Action: The action we recommend you take to resolve the issue.
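
If the report was written to a Cloud Storage destination, you can also inspect it from the command line, for example (the path is a placeholder):

# Print the summary report directly from the destination bucket
gcloud storage cat gs://my-translation-bucket/output/batch_translation_report.csv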

Code tab

The Code tab lets you review further information about the input and output files for a particular translation job. In the Code tab, you can examine the files used in a translation job, review a side-by-side comparison of an input file and its translation for any inaccuracies, and view log summaries and messages for a specific file in a job.

To access the code tab, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Migration section of the navigation panel, click SQL translation.

  3. In the list of translation jobs, locate the job that you are interested in, then click Show details in the Status column.

  4. Select the Code tab.

Consumed output name mapping file

This JSON file contains the output name mapping rules that were used by the translation job. The rules in this file might differ from the output name mapping rules that you specified for the translation job, due to either conflicts in the name mapping rules, or lack of name mapping rules for SQL objects that were identified during translation. Review this file to determine whether the name mapping rules need correction. If they do, create new output name mapping rules that address any issues you identify, and run a new translation job.

Translated files

For each source file, a corresponding output file is generated in the destination path. The output file contains the translated query.

Debug batch translated SQL queries with the interactive SQL translator

You can use the BigQuery interactive SQL translator to review or debug a SQL query using the same metadata or object mapping information as your source database. After you complete a batch translation job, BigQuery generates a translation configuration ID that contains information about the job's metadata, the object mapping, or the schema search path, as applicable to the query. You use the batch translation configuration ID with the interactive SQL translator to run SQL queries with the specified configuration.

To start an interactive SQL translation by using a batch translation configuration ID, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Migration section of the navigation menu, click SQL translation.

  3. In the list of translation jobs, locate the job that you are interested in, and then click More Options > Open Interactive Translation.

The BigQuery interactive SQL translator now opens with the corresponding batch translation configuration ID. To view the translation configuration ID for the interactive translation, click More > Translation settings in the interactive translator.

Limitations

The translator can't translate user-defined functions (UDFs) from languages other than SQL, because it can't parse them to determine their input and output data types. This causes translation of SQL statements that reference these UDFs to be inaccurate. To make sure non-SQL UDFs are properly referenced during translation, use valid SQL to create placeholder UDFs with the same signatures.

For example, say you have a UDF written in C that calculates the sum of two integers. To make sure that SQL statements that reference this UDF are correctly translated, create a placeholder SQL UDF that shares the same signature as the C UDF, as shown in the following example:

CREATE FUNCTION Test.MySum (a INT, b INT)
  RETURNS INT
  LANGUAGE SQL
  RETURN a + b;

Save this placeholder UDF in a text file, and include that file as one of the source files for the translation job. This enables the translator to learn the UDF definition and identify the expected input and output data types.

Quota and limits

  • BigQuery Migration API quotas apply.
  • Each project can have at most 10 active translation tasks.
  • While there is no hard limit on the total number of source and metadata files, we recommend keeping the number of files to under 1000 for better performance.

Troubleshoot translation errors

RelationNotFound or AttributeNotFound translation issues

Translation works best with metadata DDLs. When SQL object definitions can't be found, the translation engine raises RelationNotFound or AttributeNotFound issues. We recommend using the metadata extractor to generate metadata packages so that all object definitions are present. Adding metadata is the recommended first step for resolving most translation errors, because it can often also fix other errors that are indirectly caused by a lack of metadata.

For more information, see Generate metadata for translation and assessment.

Pricing

There is no charge to use the batch SQL translator. However, the storage used for input and output files incurs normal Cloud Storage fees. For more information, see Storage pricing.

What's next

Learn more about the following steps in data warehouse migration: