
Migrate code with the batch SQL translator

This document describes how to use the batch SQL translator in BigQuery to translate scripts written in other SQL dialects into Google Standard SQL queries. This document is intended for users who are familiar with the Google Cloud console.

The batch SQL translator is part of the BigQuery Migration Service. The batch SQL translator can translate the following SQL dialects into Google Standard SQL:

  • Amazon Redshift SQL
  • Teradata SQL (except SPL)

Additionally, translation of the following SQL dialects is supported in preview:

  • Apache HiveQL
  • Apache Spark SQL
  • Azure Synapse T-SQL
  • Basic Teradata Query (BTEQ)
  • IBM Netezza SQL/NZPLSQL
  • MySQL SQL
  • Oracle SQL, PL/SQL, Exadata
  • Presto SQL
  • PostgreSQL SQL
  • Snowflake SQL
  • SQL Server T-SQL
  • Teradata SPL
  • Vertica SQL

Required permissions

You must have the following permissions on the project to enable the BigQuery Migration Service:

  • resourcemanager.projects.get
  • serviceusage.services.enable
  • serviceusage.services.get

You need the following permissions on the project to access and use the BigQuery Migration Service:

  • bigquerymigration.workflows.create
  • bigquerymigration.workflows.get
  • bigquerymigration.workflows.list
  • bigquerymigration.workflows.delete
  • bigquerymigration.subtasks.get
  • bigquerymigration.subtasks.list

    Alternatively, you can use the following roles to get the same permissions:

    • bigquerymigration.viewer - Read-only access.
    • bigquerymigration.editor - Read/write access.

You need the following permissions to access the Cloud Storage buckets for input and output files:

  • storage.objects.get on the source Cloud Storage bucket.
  • storage.objects.list on the source Cloud Storage bucket.
  • storage.objects.create on the destination Cloud Storage bucket.

Either of the following roles grants all of the necessary Cloud Storage permissions:

  • roles/storage.objectAdmin
  • roles/storage.admin

Before you begin

Before you submit a translation job, complete the following steps:

  1. Enable the BigQuery Migration API.
  2. Collect the source files containing the SQL scripts and queries to be translated.
  3. Optional. Create a metadata file to improve the accuracy of the translation.
  4. Optional. Decide whether you need to map SQL object names in the source files to new names in BigQuery, and if so, determine which name mapping rules to use.
  5. Decide what method to use to submit the translation job.
  6. Upload the source files to Cloud Storage.

Enable the BigQuery Migration API

If your Google Cloud project was created before February 15, 2022, enable the BigQuery Migration API as follows:

  1. In the Google Cloud console, go to the BigQuery Migration API page.

    Go to BigQuery Migration API

  2. Click Enable.

Collect source files

Source files must be text files that contain valid SQL for the source dialect. Source files can also include comments. Do your best to ensure the SQL is valid, using whatever methods are available to you.

Create metadata files

To help the service generate more accurate translation results, we recommend that you provide metadata files. However, this isn't mandatory.

You can use the dwh-migration-dumper command-line tool to generate the metadata information, or you can provide your own metadata files. After the metadata files are prepared, include them along with the source files in the translation source folder. The translator automatically detects and uses them when translating the source files; you don't need to configure any extra settings to enable this.

To generate metadata information by using the dwh-migration-dumper tool, see Generate metadata for translation.

To provide your own metadata, collect the data definition language (DDL) statements for the SQL objects in your source system into separate text files.

Map SQL object names

You can optionally perform output name mapping during batch translation. When you use output name mapping, you specify name mapping rules that change the names of SQL objects in the source system to new names in BigQuery. For example, you might have the object schema1.table1 in your source system, and you want that object to be named project1.dataset1.table1 in BigQuery. If you use output name mapping, you should define your name mapping rules prior to starting a batch translation job. You can enter these rules manually when configuring the job, or you can create a JSON file containing the name mapping rules and upload that instead.
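As a sketch, a name mapping file for the schema1.table1 example above might look like the following. The name_map structure and the RELATION type string follow the documented name mapping format, but treat the exact field layout, particularly how a two-part source name maps onto the database and relation fields, as an assumption to verify:

```json
{
  "name_map": [
    {
      "source": {
        "type": "RELATION",
        "database": "schema1",
        "relation": "table1"
      },
      "target": {
        "database": "project1",
        "schema": "dataset1",
        "relation": "table1"
      }
    }
  ]
}
```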

Decide how to submit the translation job

You have three options for submitting a batch translation job:

  • Batch translation client: Configure a job by changing settings in a configuration file, and submit the job using the command line. This approach doesn't require you to manually upload source files to Cloud Storage. The client still uses Cloud Storage to store files during translation job processing.

    The batch translation client is an open-source Python client that allows you to translate source files located on your local machine, and have the translated files output to a local directory. You configure the client for basic use by changing a few settings in its configuration file. If you choose to, you can also configure the client to address more complex tasks like macro replacement, and pre- and postprocessing of translation inputs and outputs. For more information, see the batch translation client readme.

  • Google Cloud console: Configure and submit a job using a user interface. This approach requires you to upload source files to Cloud Storage.

  • BigQuery Migration API: Configure and submit a job programmatically. This approach requires you to upload source files to Cloud Storage.

Upload input files to Cloud Storage

If you want to use the Google Cloud console or the BigQuery Migration API to perform a translation job, you must upload the source files containing the queries and scripts you want to translate to Cloud Storage. You should also upload any metadata files you chose to create. For information on how to create buckets and upload files to Cloud Storage, see Creating storage buckets and Uploading Objects.

Submit a translation job

Follow these steps to start a translation job, view its progress, and see the results.

Batch translation client

  1. Install the batch translation client and the Google Cloud CLI.

  2. Generate a gcloud CLI credential file.

  3. In the batch translation client installation directory, use the text editor of your choice to open the config.yaml file and modify the following settings:

    • project_number: Type the project number of the project you want to use for the batch translation job. You can find this in the Project info pane on the Google Cloud console dashboard for the project.
    • gcs_bucket: Type the name of the Cloud Storage bucket that the batch translation client should use to store files during translation job processing.
    • input_directory: Type the absolute or relative path to the directory containing the source files and any metadata files.
    • output_directory: Type the absolute or relative path to the target directory for the translated files.
  4. Save the changes and close the config.yaml file.

  5. Place your source and metadata files in the input directory.

  6. Run the batch translation client using the following command:

    bin/dwh-migration-client
    

    Once the translation job is created, you can see its status in the translation jobs list in the Google Cloud console.

  7. Optional. After the translation job is completed, delete the files that the job created in the Cloud Storage bucket you specified to avoid storage costs.
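For reference, a minimal config.yaml might look like the following sketch. It assumes the settings described in step 3 plus a dialect-specific translation_type field; the exact field names and value formats are assumptions, so verify them against the batch translation client readme:

```yaml
# Sketch of a batch translation client config.yaml (field names are
# assumptions; verify against the client readme).
translation_type: Translation_Teradata2BQ  # dialect-specific job type
project_number: "123456789012"             # from the Project info pane
gcs_bucket: my-translation-bucket          # staging bucket for the job
input_directory: ./input                   # source and metadata files
output_directory: ./output                 # translated files are written here
```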

Console

These steps assume that you have already uploaded your source files to a Cloud Storage bucket.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Migration section of the navigation panel, click SQL translation.

  3. Click Start Translation.

  4. Fill in the translation configuration dialog.

    1. For Display name, type a name for the translation job. The name can contain letters, numbers or underscores.
    2. For Processing location, select the location where you want the translation job to run. For example, if you are in Europe and you don't want your data to cross any location boundaries, select the eu region. The translation job performs best when you choose the same location as your source file bucket.
    3. For Source dialect, select the SQL dialect that you want to translate.
    4. For Target dialect, select BigQuery.
  5. Click Next.

  6. For Source location, specify the path to the Cloud Storage folder containing the files to translate. You can type the path in the format bucket_name/folder_name/ or use the Browse option.

  7. Click Next.

  8. For Target location, specify the path to the destination Cloud Storage folder for the translated files. You can type the path in the format bucket_name/folder_name/ or use the Browse option.

  9. If you're doing translations that don't need to have default object names or source-to-target name mapping specified, skip to Step 11. Otherwise, click Next.

  10. Fill in the optional settings that you need.

    1. Optional. For Default database, type a default database name to use with the source files. The translator uses this default database name to resolve SQL objects' fully qualified names where the database name is missing.

    2. Optional. For Schema search path, specify a schema to search when the translator needs to resolve SQL objects' fully qualified names in the source files where the schema name is missing. If the source files use a number of different schema names, click Add Schema Name and add a value for each schema name that might be referenced.

      The translator searches through the metadata files you provided to validate tables with their schema names. If a definite option can't be determined from the metadata, the first schema name you enter is used as the default. For more information on how the default schema name is used, see default schema.

    3. Optional. If you want to specify name mapping rules to rename SQL objects between the source system and BigQuery during translation, you can either provide a JSON file with the name mapping pair, or you can use the Google Cloud console to specify the values to map.

      To use a JSON file:

      1. Click Upload JSON file for name mapping.
      2. Browse to the location of a name mapping file in the appropriate format, select it, and click Open.

        Note that the file size must be less than 5 MB.

      To use the Google Cloud console:

      1. Click Add name mapping pair.
      2. Add the appropriate parts of the source object name in the Database, Schema, Relationship, and Attribute fields in the Source column.
      3. Add the parts of the target object name in BigQuery in the fields in the Target column.
      4. For Type, select the object type that describes the object you are mapping.
      5. Repeat Steps 1 - 4 until you have specified all of the name mapping pairs that you need. Note that you can only specify up to 25 name mapping pairs when using the Google Cloud console.
  11. Click Create to start the translation job.

Once the translation job is created, you can see its status in the translation jobs list.

API

These steps assume that you have already uploaded your source files to a Cloud Storage bucket.

  1. Call the create method with a defined workflow.
  2. Call the start method to start the translation workflow.

You can use the BigQuery Migration API client libraries for Go, Java, and Python to make it easier to configure and submit batch translation jobs programmatically.
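The shape of the workflow you pass to the create method can be sketched as a plain dictionary. This is a sketch only: the task type string, field names, and gs:// paths below are illustrative assumptions that you should verify against the BigQuery Migration API reference.

```python
# Sketch of a translation workflow body for the BigQuery Migration API.
# Field names and the task type string are assumptions to verify against
# the API reference; the gs:// paths are placeholders.
workflow = {
    "display_name": "teradata-to-bq",
    "tasks": {
        "translation-task": {
            # Source-dialect-specific task type (assumed naming pattern).
            "type": "Translation_Teradata2BQ",
            "translation_details": {
                "source_target_mapping": [
                    {
                        # Where the source files were uploaded.
                        "source_spec": {"base_uri": "gs://my-bucket/input"},
                        # Where translated output is written, relative to the
                        # job's output location.
                        "target_spec": {"relative_path": "translated"},
                    }
                ],
            },
        }
    },
}
```

A body like this would then be passed to the create method and the resulting workflow started with the start method, for example through the Python client library's MigrationServiceClient.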

Explore the translation output

After running the translation job, you can see information about the job in the Google Cloud console. If you used the Google Cloud console or the BigQuery Migration API to run the job, you can see job results in the destination Cloud Storage bucket that you specified. If you used the batch translation client to run the job, you can see job results in the output directory that you specified. The batch SQL translator outputs the following files to the specified destination:

  • The translated files.
  • The translation summary report in CSV format.
  • The consumed output name mapping in JSON format.

Google Cloud console output

To see translation job details, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Migration section of the navigation panel, click SQL translation.

  3. In the list of translation jobs, locate the one you are interested in, and click the translation job name.

  4. In the Results section, you can see the overall translation success rate, the number of statements processed, and the job duration.

  5. Select the Actions tab to see translation issues and how often they occurred. Each action listed has child rows that show the issue category, the specific issue message, and a link to the file in which the issue occurred.

  6. Select the Translation Configuration tab to see the translation job configuration details.

Summary report

The summary report is a CSV file that contains a table of all of the warning and error messages encountered during the translation job.

To see the summary file in the Google Cloud console, follow these steps:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Migration section of the navigation panel, click SQL translation.

  3. In the list of translation jobs, locate the one you are interested in, and click Show details in the Status column.

  4. In the Translation report section, click batch_translation_report.csv.

  5. On the Object details page, click the value in the Authenticated URL row to see the file in your browser.

The following table describes the summary file columns:

| Column | Description |
| --- | --- |
| Timestamp | The timestamp at which the issue occurred. |
| FilePath | The path to the source file that the issue is associated with. |
| FileName | The name of the source file that the issue is associated with. |
| ScriptLine | The line number where the issue occurred. |
| ScriptColumn | The column number where the issue occurred. |
| TranspilerComponent | The translation engine internal component where the warning or error occurred. This column might be empty. |
| Environment | The translation dialect environment associated with the warning or error. This column might be empty. |
| ObjectName | The SQL object in the source file that is associated with the warning or error. This column might be empty. |
| Severity | The severity of the issue, either warning or error. |
| Category | The translation issue category. |
| Message | The translation issue warning or error message. |
| ScriptContext | The SQL snippet in the source file that is associated with the issue. |
| Action | The recommended action to resolve the issue. |
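Because the summary report is plain CSV with the columns above, you can triage it with standard tools. The following sketch tallies issues by severity; the sample rows and their values are made up for illustration:

```python
import csv
import io
from collections import Counter

# Illustrative sample of a batch_translation_report.csv (values are made up).
sample = """Timestamp,FilePath,FileName,ScriptLine,ScriptColumn,TranspilerComponent,Environment,ObjectName,Severity,Category,Message,ScriptContext,Action
2024-01-01T00:00:00,gs://bucket/in,query1.sql,10,5,,,orders,ERROR,SyntaxError,Unexpected token,SELECT TOP,Fix the statement
2024-01-01T00:00:01,gs://bucket/in,query2.sql,3,1,,,users,WARNING,UnsupportedFeature,Feature approximated,QUALIFY,Review output
"""

# Count issues by severity, as you might when triaging a large report.
counts = Counter(row["Severity"] for row in csv.DictReader(io.StringIO(sample)))
print(counts["ERROR"], counts["WARNING"])  # → 1 1
```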

Consumed output name mapping file

This JSON file contains the output name mapping rules that were used by the translation job. The rules in this file might differ from the output name mapping rules that you specified for the translation job, due to either conflicts in the name mapping rules, or lack of name mapping rules for SQL objects that were identified during translation. Review this file to determine whether the name mapping rules need correction. If they do, create new output name mapping rules that address any issues you identify, and run a new translation job.

Translated files

For each source file, a corresponding output file is generated in the destination path. The output file contains the translated query.

Limitations

The translator can't translate user-defined functions (UDFs) from languages other than SQL, because it can't parse them to determine their input and output data types. This causes translation of SQL statements that reference these UDFs to be inaccurate. To make sure non-SQL UDFs are properly referenced during translation, use valid SQL to create placeholder UDFs with the same signatures.

For example, say you have a UDF written in C that calculates the sum of two integers. To make sure that SQL statements that reference this UDF are correctly translated, create a placeholder SQL UDF that shares the same signature as the C UDF, as shown in the following example:

CREATE FUNCTION Test.MySum (a INT, b INT)
  RETURNS INT
  LANGUAGE SQL
  RETURN a + b;

Save this placeholder UDF in a text file, and include that file as one of the source files for the translation job. This enables the translator to learn the UDF definition and identify the expected input and output data types.

Quota and limits

  • BigQuery Migration API quotas apply.
  • Each project can have at most 10 active translation tasks.
  • While there is no hard limit on the total number of source and metadata files, we recommend keeping the number of files to under 1000 for better performance.

What's next

Learn more about the following steps in data warehouse migration: