Migrating data from Teradata

The BigQuery Data Transfer Service, combined with a special migration agent, allows you to copy your data from an on-premises Teradata data warehouse to BigQuery. This document describes the process of migrating data from Teradata by using the BigQuery Data Transfer Service.

Before you begin

To ensure a successful Teradata data warehouse migration, make sure you have met the following prerequisites.

Google Cloud Platform requirements

  1. Choose or create a Google Cloud Platform project to store your migration data.

  2. Enable these Google Cloud Platform APIs.

    Console

    In the Google Cloud Platform Console, click the ENABLE button on both of the following pages.

    BigQuery is automatically enabled in new projects. For an existing project, you may need to enable the BigQuery API.

    Example:

    Enable API

    A green checkmark indicates that you've already enabled the API.

    Enabled API

    CLI

    You can optionally use the gcloud command-line interface (CLI) to enable the APIs.

    You can issue gcloud commands in Cloud Shell, or you can download the gcloud CLI tool and install it on your local machine.

    Enter the following gcloud commands:

      gcloud services enable bigquerydatatransfer.googleapis.com
      gcloud services enable storage-component.googleapis.com
      gcloud services enable pubsub.googleapis.com
    

    BigQuery is automatically enabled in new projects. For an existing project, also enable the BigQuery API.

      gcloud services enable bigquery.googleapis.com
    
  3. Create a BigQuery dataset to store your data. You do not need to create any tables.

  4. Create a Cloud IAM service account. See Creating and managing service accounts for information on creating a service account.

  5. Grant the service account the following Cloud IAM roles. See Granting roles to a service account.

    • BigQuery: The predefined Cloud IAM role bigquery.admin.
    • Cloud Storage: The predefined Cloud IAM role storage.objectAdmin.
  6. Create a Cloud Storage bucket for staging the data. You use this bucket name later in the migration process. (A command-line sketch covering steps 3 through 6 appears after this list.)

  7. Allow pop-ups in your browser from bigquery.cloud.google.com so that you can view the permissions window when you set up the transfer. You must grant the BigQuery Data Transfer Service permission to manage your transfer.
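
As an alternative to the console, the dataset, service account, role grants, and staging bucket from steps 3 through 6 can also be created from the command line. The following is a minimal sketch, assuming hypothetical names: project my-project, service account td2bq-agent, dataset mydataset, and bucket my-staging-bucket.

# Create the service account for the migration agent (step 4).
gcloud iam service-accounts create td2bq-agent \
    --display-name="Teradata migration agent"

# Grant the required roles (step 5).
gcloud projects add-iam-policy-binding my-project \
    --member='serviceAccount:td2bq-agent@my-project.iam.gserviceaccount.com' \
    --role='roles/bigquery.admin'
gcloud projects add-iam-policy-binding my-project \
    --member='serviceAccount:td2bq-agent@my-project.iam.gserviceaccount.com' \
    --role='roles/storage.objectAdmin'

# Create the destination dataset (step 3) and the staging bucket (step 6).
bq mk --dataset my-project:mydataset
gsutil mb gs://my-staging-bucket/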

On-premises requirements

  1. Local machine requirements
    • The migration agent communicates with the Teradata instance over a JDBC connection and with Google Cloud Platform APIs. Ensure that network access is not blocked by a firewall.
    • Ensure that Java Runtime Environment 8 or above is installed.
    • Ensure you have the minimum recommended storage space as described in the staging space requirements.
  2. Teradata connection details
    • Username and password of a user with read access to the system tables and the tables that are being migrated.
    • Hostname and port number to connect to the Teradata instance.
  3. Download the required JDBC drivers from Teradata: tdgssconfig.jar and terajdbc4.jar.
  4. Download your Google Cloud Platform credentials.
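
If the credentials you download are a JSON key for the service account created earlier, one way to retrieve them is with gcloud. A sketch, assuming the hypothetical account name from the earlier example and a local path on the agent machine:

# Create and download a JSON key for the migration service account (hypothetical names and path).
gcloud iam service-accounts keys create /usr/local/migration/credentials.json \
    --iam-account=td2bq-agent@my-project.iam.gserviceaccount.com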

Migration options

Because every migration has unique requirements, the migration service can be customized:

  • Using JDBC FastExport vs Teradata Parallel Transporter (TPT)
    • We recommend extracting with TPT and customizing your schema to indicate partition columns. This will result in the fastest data extraction.
    • If there are tight constraints on local storage space or TPT is not available, use default extraction.
  • Schema customization
    • The Migration Service converts your schema automatically, but optionally you can further customize the schema. Schema customization is strongly recommended for some situations, as explained below.
  • One-time data movement
    • If your table is very large and you can extract with TPT for higher performance, we recommend partitioning your Teradata table to allow partition-by-partition extraction. For more details, see optional schema file.
    • If your tables are small or you cannot use TPT, follow the basic instructions. Schema customization is not required.
  • Incremental data movement
    • If you would like to incrementally migrate changes from Teradata to BigQuery, you can use incremental mode. On a regular basis, new and changed records from Teradata will be appended to BigQuery tables.
    • This method requires customizing your schema to annotate COMMIT_TIMESTAMP columns. For more details, see incremental transfers.

Setting up a Teradata migration

This section describes the step-by-step process of setting up a data migration from Teradata to BigQuery. The steps are:

  • Download the migration agent.
  • Set up a BigQuery Data Transfer Service transfer.
  • Initialize the migration agent.
  • Run the migration agent.

Downloading the migration agent

Download the migration agent to the local machine where the data warehouse is located.

After you've installed the migration agent, you will set up a BigQuery Data Transfer Service transfer, initialize the migration agent, and then run the agent to start the data migration.

Setting up a transfer

Create a transfer with the BigQuery Data Transfer Service.

Console

  1. Go to the BigQuery web UI in the GCP Console.

    Go to the GCP Console

  2. Click Transfers.

  3. Click Add Transfer.

  4. On the New Transfer page:

    • For Source, choose Migration: Teradata.
    • For Display name, enter a name for the transfer such as My Migration. The display name can be any value that allows you to easily identify the transfer if you need to modify it later.
    • (Optional) For Schedule, you can leave the default value of every 24 hours (based on creation time), or click Edit to change the time.
    • For Destination dataset, choose the appropriate dataset.

      New Teradata migration general

  5. Under Data Source Details, continue with specific details for your Teradata transfer.

    • For Database type, choose Teradata.
    • For Cloud Storage bucket, browse for the name of the Cloud Storage bucket for staging the migration data. Do not type in the prefix gs:// – enter only the bucket name.
    • For Database name, enter the name of the source database in Teradata.
    • For Table name patterns, enter a pattern for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax.
    • For Service account email, enter the email attached to the IAM service account you created.
    • (Optional) For Schema file path, enter the path and file name of a JSON schema file. If no schema file is entered, BigQuery automatically detects the table schema by using the source data being transferred. You can create your own schema file, as shown in the screenshot example below, or you can use the migration agent to help you create a schema file. For information on creating a schema file, see the initializing the migration agent section.

      New Teradata migration

    • (Optional) In the Notification options section:

      • Click the toggle to enable email notifications. When you enable this option, the transfer administrator receives an email notification when a transfer run fails.
      • For Select a Cloud Pub/Sub topic, choose your topic name or click Create a topic. This option configures Cloud Pub/Sub run notifications for your transfer.

        Cloud Pub/Sub topic

  6. Click Save.

  7. The GCP Console will display all the transfer setup details, including a Resource name for this transfer. Note the resource name, because you will need to enter it later when you run the migration agent.

    Transfer confirmation

Classic UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click Transfers.

  3. Click Add Transfer.

  4. On the New Transfer page:

    • For Source, choose Migration: Teradata.
    • For Display name, enter a name for the transfer such as My Migration. The display name can be any value that allows you to easily identify the transfer if you need to modify it later.
    • (Optional) For Schedule, you can leave the default value of every 24 hours (based on creation time), or click Edit to change the time.

      Query schedule

    • For Destination dataset, choose the appropriate dataset.

    • For Database type, choose Teradata.

    • For Cloud Storage bucket, enter the name of the Cloud Storage bucket for staging the migration data. Do not include the prefix gs:// – enter only the bucket name.

    • For Database name, enter the name of the source database in Teradata.

    • For Table name patterns, enter one or more patterns for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax.

    • For Service account email, enter the email attached to the IAM service account you created.

    • (Optional) For Schema file path, enter the path and file name of a JSON schema file. If no schema file is entered, BigQuery automatically detects the table schema by using the source data being transferred. You can create your own schema file, as shown in the screenshot example below, or you can use the migration agent to help you create a schema file. For information on creating a schema file, see the initializing the migration agent section.

      New Teradata migration

    • (Optional) Expand the Advanced section and configure run notifications for your transfer.

    • For Cloud Pub/Sub topic, enter your Cloud Pub/Sub topic name, for example, projects/myproject/topics/mytopic.

    • Check Send email notifications to allow email notifications of transfer run failures.

      Cloud Pub/Sub topic

  5. Click Add.

  6. When prompted, click Allow to give the BigQuery Data Transfer Service permission to manage your transfer. You must allow browser pop-ups from bigquery.cloud.google.com to view the permissions window.

    Allow transfer

  7. The web UI will display all the transfer setup details, including a Resource name for this transfer. Note the resource name. You need to enter it later when you run the migration agent.

    Transfer Confirmation

CLI

Enter the bq mk command and supply the transfer creation flag — --transfer_config. The following flags are also required:

  • --data_source
  • --display_name
  • --target_dataset
  • --params
bq mk \
--transfer_config \
--project_id=project ID \
--target_dataset=dataset \
--display_name=name \
--params='parameters' \
--data_source=data source

Where:

  • project ID is your project ID. If --project_id isn't supplied to specify a particular project, the default project is used.
  • dataset is the dataset you wish to target (--target_dataset) for the transfer configuration.
  • name is the display name (--display_name) for the transfer configuration. The transfer's display name can be any value that allows you to easily identify the transfer if you need to modify it later.
  • parameters contains the parameters (--params) for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'.
    • For Teradata migrations, these parameters are required: bucket, database_type, agent_service_account, database_name, table_name_patterns.
      • bucket is the Cloud Storage bucket that will act as a staging area during the migration.
      • database_type is Teradata.
      • agent_service_account is the email address associated with the service account that you created.
      • database_name is the name of the source database in Teradata.
      • table_name_patterns is one or more patterns for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax.
  • data_source is the data source (--data_source): on_premises.

For example, the following command creates a Teradata transfer named My Transfer using Cloud Storage bucket mybucket and target dataset mydataset. The transfer will migrate all tables from the Teradata data warehouse mydatabase and the optional schema file is myschemafile.json. The transfer is created in the default project:

bq mk \
--transfer_config \
--project_id=123456789876 \
--target_dataset=MyDataset \
--display_name='My Migration' \
--params='{"bucket": "mybucket", "database_type": "Teradata",
"database_name":"mydatabase", "table_name_patterns": ".*",
"agent_service_account":"myemail@mydomain.com", "schema_file_path":
"gs://mybucket/myschemafile.json"}' \
--data_source=on_premises

After running the command, you receive a message like the following:

[URL omitted] Please copy and paste the above URL into your web browser and follow the instructions to retrieve an authentication code.

Follow the instructions and paste the authentication code on the command line.
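
If you did not note the transfer configuration's Resource name, which you need later when you run the migration agent, you can list transfer configurations with bq. A sketch, assuming the configuration is in the us location:

# List transfer configurations, including their full resource names
# (projects/.../locations/us/transferConfigs/...).
bq ls --transfer_config --transfer_location=us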

API

Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.
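
For example, the following is a minimal curl sketch of such a request. The project, location, dataset, bucket, and service account values are placeholders, and the params mirror those described in the CLI tab:

# Create a Teradata transfer configuration through the REST API (placeholder values).
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
        "displayName": "My Migration",
        "dataSourceId": "on_premises",
        "destinationDatasetId": "mydataset",
        "params": {
          "bucket": "mybucket",
          "database_type": "Teradata",
          "database_name": "mydatabase",
          "table_name_patterns": ".*",
          "agent_service_account": "td2bq-agent@myproject.iam.gserviceaccount.com"
        }
      }' \
  "https://bigquerydatatransfer.googleapis.com/v1/projects/myproject/locations/us/transferConfigs"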

Migration Agent

You can optionally create the transfer directly from the migration agent, in the initializing the migration agent section.

To create the transfer from the migration agent, first you will need to grant the Cloud IAM role serviceAccountTokenCreator to the service account you created.

You can grant the Cloud IAM role with the gcloud command-line tool:

gcloud projects add-iam-policy-binding user_project_id \
--member='serviceAccount:service-user_project_number@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com' \
--role='roles/iam.serviceAccountTokenCreator'
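
In the member address above, user_project_number is the project's numeric project number rather than its ID. If you need to look it up, one way (a sketch, reusing the user_project_id placeholder) is:

# Print the numeric project number for a project ID.
gcloud projects describe user_project_id --format='value(projectNumber)'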

Once you've granted the serviceAccountTokenCreator role to the service account, you can move on to downloading the migration agent, and then set up the transfer as part of the initialization step.

Initializing the migration agent

Before starting a data migration for the first time, initialize the migration agent. Initialization is required only once for each migration transfer you set up, whether or not the transfer is recurring.

This session will not start the migration; it is only for initial configuration.

  1. Open a new session. On the command line, issue a command to run the jar file with the --initialize flag, in this form:

    java -cp \
    OS-specific-separated-paths-to-jars (JDBC and agent) \
    com.google.cloud.bigquery.dms.Agent \
    --initialize
    

    Unix, Linux, Mac OS

    java -cp \
    /usr/local/migration/Teradata/JDBC/tdgssconfig.jar:/usr/local/migration/Teradata/JDBC/terajdbc4.jar:/usr/local/migration/mirroring-agent.jar \
    com.google.cloud.bigquery.dms.Agent \
    --initialize
    

    Windows

    Copy all the files into the C:\migration folder (or adjust the paths in the command), then run:

    java -cp C:\migration\tdgssconfig.jar;C:\migration\terajdbc4.jar;C:\migration\mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --initialize
    
  2. When prompted, enter the following parameters:

    • Whether you want to save the Teradata Parallel Transporter (TPT) template to disk. If you are planning to use the TPT extraction method, you can modify the saved template with parameters that suit your Teradata instance.
    • The URI of the source database. Include the port number, if needed.
    • Path to a local scratch space for the migration. Ensure you have the minimum recommended storage space as described in staging space requirements.
    • Whether you will use Teradata Parallel Transporter (TPT) as the extraction method.
    • (Optional) Path to a database credential file.
  3. When prompted for a BigQuery Data Transfer Service Resource name:

    You can enter the Resource name of the transfer you configured in the BigQuery web UI, or you can create the transfer at this time, through the migration agent itself. Optionally, you can use the migration agent initialization command to create a schema file. See the Migration Agent tab below for this option.

    Console

    Enter the Resource name of the transfer you previously set up in the Console tab of the section on setting up a transfer.

    Classic UI

    Enter the Resource name of the transfer you previously set up in the Classic UI tab of the section on setting up a transfer.

    Migration Agent

    • Enter the Google Cloud Platform Project ID.
    • Enter the name of the source database in Teradata.
    • Enter a pattern for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax.
    • Optional: Enter the path to a local JSON schema file (recommended for recurring transfers). This schema file will be uploaded to your Cloud Storage bucket.
      • Alternatively, choose to create a new schema file. In this case, you will be prompted for a Teradata username and password, and the agent will produce a JSON file with the converted schema. The file will be created in a local folder, following this pattern: <localpath>/schema/DO_NOT_REMOVE_td2bq_table_schemas_<string>.json. After uploading to your Cloud Storage bucket, the path and file name will follow this pattern: gs://mybucket/myproject_id/schema/DO_NOT_REMOVE_td2bq_table_schemas_<string>.json.
      • Modify the schema file to mark partitioning, clustering, primary keys and change tracking columns, and verify that you want to use this schema for the transfer configuration. See the optional schema file section for tips.
    • Enter the name of the destination dataset in BigQuery.
    • Enter the name of Cloud Storage bucket where migration data will be staged before loading to BigQuery.
    • Enter a name for the transfer configuration.
  4. After you enter all the requested parameters, the migration agent creates a configuration file and writes it to the local path you provided. See the next section for a closer look at the configuration file.

Configuration file for the migration agent

The configuration file created in the initialization step will look like this example:


   {
    "agent-id": "0eebc1ad-621d-4386-87f1-8eca96991f63",
    "transfer-configuration": {
      "project-id": "123456789876",
      "location": "us",
      "id": "5d533a90-0000-2e89-89a0-94eb2c059a76"
    },
    "source-type": "teradata",
    "console-log": false,
    "silent": false,
    "teradata-config": {
      "connection": {
       "host": "localhost"
      },
      "local-processing-space": "extracted",
      "database-credentials-file-path": "",
      "max-local-storage": "200GB",
      "use-tpt": false,
      "max-sessions": 0,
      "max-parallel-upload": 1,
      "max-unload-file-size": "2GB"
     }
   }
   

All options for the migration agent configuration file

  • transfer-configuration: information about this transfer configuration in BigQuery Data Transfer Service.
  • teradata-config: information specific for this Teradata extraction:

    • connection: information about the hostname and port
    • local-processing-space: the folder where the agent extracts table data before uploading it to Cloud Storage.
    • Optional: database-credentials-file-path: the path to a file that contains credentials for connecting to the Teradata database automatically. The file should contain two lines, for example:
      username=abc
      password=123
      
      If no path is provided, you will be prompted for a username and password when you start an agent.
    • max-local-storage: controls the maximum amount of local storage to use for the extraction in the specified staging directory. The default value is 200GB. The supported format is a number followed by a unit: KB, MB, GB, or TB (for example, 250GB).

      In all extraction modes, files are deleted from your staging directory after they are uploaded to Cloud Storage.

      The actual amount of staging space required depends upon the type of extraction:

      • In the default extraction, small chunks of data will be written and continually uploaded to your specified Cloud Storage bucket. Extraction will pause when your specified limit is reached.
      • In extraction with Teradata Parallel Transporter (TPT) without a partitioning column, your whole table is extracted, regardless of the max-local-storage setting.
      • In extraction with Teradata Parallel Transporter (TPT) with a partitioning column, the agent extracts sets of partitions. The staging storage requirement is up to the larger of max-local-storage or the size of your table's largest partition extracted to CSV format.
    • use-tpt: directs the migration agent to use Teradata Parallel Transporter (TPT) as an extraction method.

      For each table, the migration agent will generate a TPT script, start a tbuild process and wait for completion. Once the tbuild process completes, the agent will list and upload the extracted files to Cloud Storage and delete the TPT script.

      To use the TPT extraction method:

      • The tbuild utility must be installed and available on the path so that the migration agent can start the tbuild process.
      • The local extraction folder should have enough space for extracting the largest table's partition in CSV format. Due to formatting, a CSV file will be larger than the size of the original table in Teradata.
    • max-sessions: specifies the maximum number of sessions used by the export job (either FastExport or TPT). If set to 0, then the Teradata database will determine the maximum number of sessions for each export job.

    • max-parallel-upload: determines the number of files uploaded to Cloud Storage in parallel. Depending on your network bandwidth and other settings (such as DLP scanning), increasing this parameter could improve performance.

    • max-unload-file-size: determines the maximum extracted file size. This parameter is not enforced for TPT extractions.
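
Before enabling use-tpt or tuning max-local-storage, it can help to confirm on the agent machine that the tbuild utility is on the path and that the staging directory has enough free space. A quick sketch with generic shell commands; the staging path is a placeholder:

# Verify that the TPT tbuild utility is available to the migration agent.
command -v tbuild || echo "tbuild not found on PATH"

# Check free space in the local staging directory.
df -h /usr/local/migration/extracted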

Running the migration agent

After initializing the migration agent and creating the configuration file, use the steps below to run the agent and start the migration.

  1. Run the agent, specifying the classpath to the JDBC drivers and the path to the configuration file created in the previous initialization step.

  2. On the command line, issue a command in this form:

    java -cp \
    OS-specific-separated-paths-to-jars (JDBC and agent) \
    com.google.cloud.bigquery.dms.Agent \
    --configuration-file=path to configuration file

    Unix, Linux, Mac OS

    java -cp \
    /usr/local/migration/Teradata/JDBC/tdgssconfig.jar:/usr/local/migration/Teradata/JDBC/terajdbc4.jar:mirroring-agent.jar \
    com.google.cloud.bigquery.dms.Agent \
    --configuration-file=config.json
    

    Windows

    Copy all the files into the C:\migration folder (or adjust the paths in the command), then run:

    java -cp C:\migration\tdgssconfig.jar;C:\migration\terajdbc4.jar;C:\migration\mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --configuration-file=config.json
    

    When you are ready to proceed with the migration, press Enter. The agent proceeds if the classpath provided during initialization is valid.

  3. When prompted, enter the username and password for the database connection. If the username and password are valid, the data migration starts.

    Optional: Instead of entering the username and password each time, you can provide a credentials file to the agent. For more information, see the optional parameter database-credentials-file-path in the agent configuration file.

  4. Leave this session open until the migration is completed. If you created a recurring migration transfer, keep this session open indefinitely. If this session is interrupted, current and future transfer runs will fail. (One way to keep the agent running on a remote machine is sketched after this list.)

  5. Periodically check that the agent is running. If a transfer run is in progress and no agent responds within 24 hours, the transfer run will fail.

  6. If the migration agent dies while the transfer is in progress or scheduled, the BigQuery Data Transfer Service web UI shows the error status and prompts you to restart the agent. To start the migration agent again, repeat the command shown at the beginning of this section; you do not need to repeat the initialization command. The transfer will resume from the point where tables were not completed.
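
To keep the agent session open as required in step 4, on Unix-like systems you can run the agent inside a detachable terminal session such as GNU screen (or tmux), so that an interrupted SSH connection does not stop the agent. A sketch, not part of the original instructions:

# Start a detachable session for the migration agent (requires GNU screen).
screen -S td2bq-migration

# Inside the screen session, run the agent command shown above and answer its prompts.
# Detach with Ctrl-A then D; the agent keeps running. Reattach later with:
screen -r td2bq-migration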

Optional schema file

About schema files

A schema file is a JSON file that describes database objects. The schema includes a set of databases, each containing a set of tables, each of which contains a set of columns. Each column has a type field — a type that is assigned to a column in BigQuery.

Each object has a name field — a name that will be assigned to it in BigQuery. Each object also has an originalName field — the name of the matching object in the Teradata database.

Making transformations with a schema file

As part of the migration, you can use the schema file to change the name fields of any object, and can add the usageType array to any column. This is especially useful for including additional information about a table, like partitioning, that would otherwise be lost in the migration, if no schema file were specified.

The following usage types are supported:

  • PARTITIONING: only one column per table can be annotated with this usageType. The type field of the column should be DATE or TIMESTAMP. This column is used in the partitioned table definition for its containing table object.
  • CLUSTERING: several columns in one table can be annotated with this usageType. Column types should follow the constraints for clustering in BigQuery. If a PARTITIONING field is specified for the same table, BigQuery will use these columns to create a clustered table. Only columns with the types INT64, STRING, DATE, TIMESTAMP, BOOL or NUMERIC can be marked with this usageType.
  • COMMIT_TIMESTAMP: only one column per table can be annotated with this usageType. Use this usageType to annotate an update timestamp column. This column will be used to extract rows created/updated since the last transfer run. It should have a TIMESTAMP or DATE type.

Example of a modified schema file:


{
  "databases": [
    {
      "name": "db",
      "originalName": "db",
      "tables": [
        {
          "name": "test",
          "originalName": "test",
          "columns": [
            {
              "name": "foo",
              "originalName": "foo",
              "type": "INT64",
              "usageType": ["CLUSTERING"]
            },
            {
              "name": "bar",
              "originalName": "bar",
              "type": "DATE",
              "usageType": ["PARTITIONING"]
            },
            {
              "name": "change",
              "originalName": "change",
              "type": "TIMESTAMP",
              "usageType": ["COMMIT_TIMESTAMP"]
            }
          ]
        }
      ]
    }
  ]
}

For more information on exactly how tables are partitioned or clustered in a transfer, see the incremental transfers section.

Tracking progress of the migration

You can view the status of the migration in the BigQuery Data Transfer Service web UI. You can also set up Cloud Pub/Sub or email notifications. See BigQuery Data Transfer Service notifications.

The BigQuery Data Transfer Service schedules and initiates transfer runs on the schedule specified when the transfer configuration was created. It is important that the migration agent is running while a transfer run is active. If there are no updates from the agent side within 24 hours, the transfer run will fail.
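
In addition to the web UI, you can check the state of transfer runs from the command line with the bq tool. A sketch, using as a placeholder the Resource name built from the project, location, and id shown in the earlier configuration file example:

# List transfer runs for a transfer configuration (placeholder resource name).
bq ls --transfer_run \
projects/123456789876/locations/us/transferConfigs/5d533a90-0000-2e89-89a0-94eb2c059a76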

Example of migration status in the BigQuery Data Transfer Service web UI:

Migration status

Incremental transfers

For each transfer run, a timestamp is saved. For each subsequent transfer run, the agent receives the timestamp of the previous transfer run (T1) and the timestamp of when the current transfer run started (T2).

For each transfer after the initial run, the migration agent extracts data using the following per-table logic:

  • If a table object in a schema file does not have a column with a usageType of COMMIT_TIMESTAMP, then the table will be skipped.
  • If a table has a column with the usageType of COMMIT_TIMESTAMP, then all rows with timestamp between T1 and T2 will be extracted and appended to the existing table in BigQuery.

Upgrading the migration agent

If a new version of the migration agent is available, you will need to manually update the migration agent. To receive notices about the BigQuery Data Transfer Service, subscribe to the release notes.
