Migrating data from Teradata

The combination of the BigQuery Data Transfer Service and a special migration agent allows you to copy your data from a Teradata on-premises data warehouse instance to BigQuery. This document describes the step-by-step process of migrating data from Teradata using the BigQuery Data Transfer Service.

Before you begin

To ensure a successful Teradata data warehouse migration, make sure you have met the following prerequisites.

Google Cloud Platform requirements

  1. Choose or create a Google Cloud Platform project to store your migration data. You must have owner permissions on the Google Cloud Platform project to set up the transfer.

  2. Enable these Google Cloud Platform APIs.

    Console

    In the Google Cloud Platform Console, click the ENABLE button on both of the following pages.

    BigQuery is automatically enabled in new projects. For an existing project, you may need to enable the BigQuery API.

    A green checkmark indicates that you've already enabled the API.

    CLI

    You can optionally enable the APIs with the gcloud command-line interface (CLI).

    You can issue gcloud commands in Cloud Shell, or you can download the CLI tool and install it on the local machine.

    Enter the following gcloud commands:

      `gcloud services enable bigquerydatatransfer.googleapis.com`
      `gcloud services enable storage-component.googleapis.com`
      `gcloud services enable pubsub.googleapis.com`
    

    BigQuery is automatically enabled in new projects. For an existing project, also enable the BigQuery API.

      `gcloud services enable bigquery.googleapis.com`
    
  3. Create a BigQuery dataset to store your data. You do not need to create any tables.

  4. Create a Cloud Identity and Access Management service account. See Creating and managing service accounts for information about creating a service account.

  5. Grant the service account the following Cloud IAM roles. See Granting roles to a service account.

    • BigQuery: The predefined Cloud IAM role bigquery.admin.
    • Cloud Storage: The predefined Cloud IAM role storage.objectAdmin.
  6. Create a Cloud Storage bucket for staging the data. You use this bucket name later in the migration process. A command-line sketch of steps 3 through 6 follows this list.

  7. Allow pop-ups in your browser from bigquery.cloud.google.com so that you can view the permissions window when you set up the transfer. You must grant the BigQuery Data Transfer Service permission to manage your transfer.
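
If you prefer the command line, steps 3 through 6 above can also be scripted. The following is a minimal sketch using gcloud, bq, and gsutil; the project, dataset, service account, and bucket names are illustrative, so substitute your own values:

  # Illustrative names; replace them with your own values.
  gcloud config set project myproject

  # Step 3: create the destination dataset (no tables are needed).
  bq mk --dataset myproject:mydataset

  # Steps 4 and 5: create the service account and grant the required roles.
  gcloud iam service-accounts create td2bq-agent
  gcloud projects add-iam-policy-binding myproject \
    --member='serviceAccount:td2bq-agent@myproject.iam.gserviceaccount.com' \
    --role='roles/bigquery.admin'
  gcloud projects add-iam-policy-binding myproject \
    --member='serviceAccount:td2bq-agent@myproject.iam.gserviceaccount.com' \
    --role='roles/storage.objectAdmin'

  # Step 6: create the Cloud Storage staging bucket.
  gsutil mb gs://mybucket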

On-premises requirements

  1. Local machine requirements
    • The migration agent communicates with the Teradata instance over a JDBC connection and with Google Cloud Platform APIs over the network. Ensure that network access is not blocked by a firewall.
    • Ensure that Java Runtime Environment 8 or higher is installed.
    • Ensure you have the minimum recommended storage space as described in the staging space requirements.
  2. Teradata connection details
    • Username and password of a user with read access to the system tables and the tables that are being migrated.
    • Hostname and port number to connect to the Teradata instance.
  3. Download the required JDBC drivers from Teradata: tdgssconfig.jar and terajdbc4.jar.
  4. Download your Google Cloud Platform credentials.
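
    For example, if these credentials are a service account key file for the service account that you created earlier, you can generate one with gcloud. This is a sketch only; the account name td2bq-agent and the output path are illustrative:

      gcloud iam service-accounts keys create /usr/local/migration/credentials.json \
        --iam-account=td2bq-agent@myproject.iam.gserviceaccount.com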

Transfer modes and options

Because every migration has unique requirements, the migration agent can be customized. There are three major choices to make when setting up a data transfer from Teradata to BigQuery:

Extraction method

The BigQuery Data Transfer Service supports two different extraction methods for transferring data from Teradata to BigQuery:

  1. Extraction using JDBC driver with FastExport connection. In this mode, a table is extracted as a collection of Avro files to a specified location on a local file system. Extracted files are then uploaded to a specified Cloud Storage bucket and, after successful transfer, the files are deleted from the local file system.
    • Limitations on the amount of space in a local file system are strongly enforced, and extraction is paused until extracted files are uploaded and deleted from the local file system.
    • If there are tight constraints on local storage space or TPT is not available, use this extraction method.
    • JDBC driver with FastExport is the default extraction method.
  2. Extraction using Teradata Parallel Transporter (TPT) tbuild utility. In this mode, an agent attempts to calculate extraction batches using rows distributed by partitions. For each batch, a TPT extraction script is emitted and executed, producing a set of pipe delimited files. After each batch extraction, files are uploaded to a specified Cloud Storage bucket and deleted from the local file system. Limitations on the amount of space in the local file system are not enforced, so make sure the local file system has enough space to extract the largest partition in a Teradata table.
    • We recommend extracting with TPT and customizing your schema to indicate partition columns. This results in the fastest data extraction.

Read more about specifying the extraction method in the Configuration file for the migration agent section of the step-by-step transfer setup instructions.

Custom schema file

A schema file is a JSON file that describes database objects. The schema includes a set of databases, each containing a set of tables, each of which contains a set of columns. Each column has a type field — a type that is assigned to a column in BigQuery.

In a schema file, each object has a name field — a name that will be assigned to it in BigQuery. Each object also has an originalName field — the name of the matching object in the Teradata database.

The BigQuery Data Transfer Service provides automatic schema detection and data conversion during a data transfer from Teradata to BigQuery. Optionally, you can also specify a custom schema file. Schema customization is strongly recommended for some situations. Examples:

  • A custom schema file is especially useful for including additional information about a table, such as partitioning, that would otherwise be lost in the migration if no schema file were specified.
  • You can provide a custom schema file to transform fields during the data transfer, such as the name field of any object or the usageType array of any column.
  • See Custom schema file for more detail. A sketch of such a file follows this list.
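
The following is a minimal sketch of what a custom schema file might look like, using the fields described above (name, originalName, type, and usageType). The database, table, and column names are illustrative, and the type and usageType values are examples only; base a real file on one generated by the migration agent during initialization:

  {
    "databases": [
      {
        "name": "mydatabase",
        "originalName": "mydatabase",
        "tables": [
          {
            "name": "orders",
            "originalName": "orders",
            "columns": [
              {
                "name": "order_id",
                "originalName": "order_id",
                "type": "INT64",
                "usageType": ["DEFAULT"]
              },
              {
                "name": "order_date",
                "originalName": "order_date",
                "type": "TIMESTAMP",
                "usageType": ["PARTITIONING"]
              }
            ]
          }
        ]
      }
    ]
  }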

On-demand or incremental transfers

When migrating data from a Teradata database instance to BigQuery, the BigQuery Data Transfer Service supports both one-time, snapshot data transfer (an "on-demand transfer") and recurring, periodic transfers of new and updated rows ("incremental transfers") (Beta). You designate the transfer as on-demand or incremental in the scheduling options when Setting up a transfer.

  • On-demand data transfer
    • If your table is very large and you can extract with TPT for higher performance, we recommend partitioning your Teradata table to allow partition-by-partition extraction. For more details, see Custom schema file.
    • If your tables are small or you cannot use TPT, follow the basic instructions. Schema customization is not required.
  • Incremental data transfer
    • If you would like to regularly migrate changes from Teradata to BigQuery, you can use incremental mode. On a recurring basis, new and changed records from Teradata are appended to BigQuery tables.
    • This method requires customizing your schema to annotate COMMIT_TIMESTAMP columns.
    • Certain conditions apply when setting up incremental transfers. For more information, see incremental transfers.
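
For incremental transfers, the schema annotation mentioned above is made in the same custom schema file: the column that records when a row was last changed carries COMMIT_TIMESTAMP in its usageType array. A minimal sketch of such a column entry, with an illustrative column name:

  {
    "name": "last_update_ts",
    "originalName": "last_update_ts",
    "type": "TIMESTAMP",
    "usageType": ["COMMIT_TIMESTAMP"]
  }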

Setting up a Teradata migration

This section describes the step-by-step process of setting up a data migration from Teradata to BigQuery. The steps are:

  • Download the migration agent.
  • Set up a BigQuery Data Transfer Service transfer.
  • Initialize the migration agent.
  • Run the migration agent.

Downloading the migration agent

Use this link to download the migration agent to the local machine where the data warehouse is located.

After you've installed the migration agent, you will set up a BigQuery Data Transfer Service transfer, initialize the migration agent, and then run the agent to start the data migration.

Setting up a transfer

Create a transfer with the BigQuery Data Transfer Service.

Console

  1. In the Google Cloud Platform Console, go to the BigQuery web UI.


  2. Click Transfers.

  3. Click Create Transfer.

  4. Under Source type:

    • Choose Migration: Teradata.
    • For Transfer config name, enter a display name for the transfer such as My Migration. The display name can be any value that lets you easily identify the transfer if you need to modify it later.
    • (Optional) For Schedule options, you can leave the default value of Daily (based on creation time) or choose another time.
    • For Destination settings, choose the appropriate dataset.


  5. Under Data Source Details, continue with specific details for your Teradata transfer.

    • For Database type, choose Teradata.
    • For Cloud Storage bucket, browse for the name of the Cloud Storage bucket for staging the migration data. Do not type in the prefix gs:// – enter only the bucket name.
    • For Database name, enter the name of the source database in Teradata.
    • For Table name patterns, enter a pattern for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax.
    • For Service account email, enter the email attached to the Cloud Identity and Access Management service account you created.
    • (Optional) For Schema file path, enter the path and filename of a JSON schema file. If no schema file is entered, BigQuery automatically detects the table schema by using the source data being transferred. You can create your own schema file, or you can use the migration agent to help you create a schema file. For information on creating a schema file, see the initializing the migration agent section.


    • (Optional) In the Notification options section:

      • Click the toggle to enable email notifications. When you enable this option, the transfer administrator receives an email notification when a transfer run fails.
      • For Select a Cloud Pub/Sub topic, choose your topic name or click Create a topic. This option configures Cloud Pub/Sub run notifications for your transfer.


  6. Click Save.

  7. The GCP Console displays all the transfer setup details, including a Resource name for this transfer. Note the resource name, because you will need to enter it later when you run the migration agent.


Classic UI

  1. Go to the BigQuery web UI.


  2. Click Transfers.

  3. Click Add Transfer.

  4. On the New Transfer page:

    • For Source, choose Migration: Teradata.
    • For Display name, enter a name for the transfer such as My Migration. The display name can be any value that allows you to easily identify the transfer if you need to modify it later.
    • (Optional) For Schedule, you can leave the default value of every 24 hours (based on creation time), or click Edit to change the time.


    • For Destination dataset, choose the appropriate dataset.

    • For Database type, choose Teradata.

    • For Cloud Storage bucket, enter the name of the Cloud Storage bucket for staging the migration data. Do not include the prefix gs:// – enter only the bucket name.

    • For Database name, enter the name of the source database in Teradata.

    • For Table name patterns, enter a pattern (or patterns) for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax.

    • For Service account email, enter the email attached to the Cloud Identity and Access Management service account you created.

    • (Optional) For Schema file path, enter the path and filename of a JSON schema file. If no schema file is entered, BigQuery automatically detects the table schema by using the source data being transferred. You can create your own schema file, or you can use the migration agent to help you create a schema file. For information about creating a schema file, see the initializing the migration agent section.


    • (Optional) Expand the Advanced section and configure run notifications for your transfer.

    • For Cloud Pub/Sub topic, enter your Cloud Pub/Sub topic name, for example, projects/myproject/topics/mytopic.

    • Check Send email notifications to allow email notifications of transfer run failures.


  5. Click Add.

  6. When prompted, click Allow to give the BigQuery Data Transfer Service permission to manage your transfer. You must allow browser pop-ups from bigquery.cloud.google.com to view the permissions window.


  7. The web UI will display all the transfer setup details, including a Resource name for this transfer. Note the resource name. You need to enter it later when you run the migration agent.


CLI

Enter the bq mk command and supply the transfer creation flag — --transfer_config. The following flags are also required:

  • --data_source
  • --display_name
  • --target_dataset
  • --params
bq mk \
--transfer_config \
--project_id=project ID \
--target_dataset=dataset \
--display_name=name \
--params='parameters' \
--data_source=data source

Where:

  • project ID is your project ID. If --project_id isn't supplied to specify a particular project, the default project is used.
  • dataset is the dataset you wish to target (--target_dataset) for the transfer configuration.
  • name is the display name (--display_name) for the transfer configuration. The transfer's display name can be any value that allows you to easily identify the transfer if you need to modify it later.
  • parameters contains the parameters (--params) for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'.
    • For Teradata migrations, these parameters are required: bucket, database_type, agent_service_account, database_name, and table_name_patterns.
      • bucket is the Cloud Storage bucket that will act as a staging area during the migration.
      • database_type is Teradata.
      • agent_service_account is the email address associated with the service account that you created.
      • database_name is the name of the source database in Teradata.
      • table_name_patterns is a pattern (or patterns) for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax.
  • data_source is the data source (--data_source): on_premises.

For example, the following command creates a Teradata transfer named My Transfer using Cloud Storage bucket mybucket and target dataset mydataset. The transfer will migrate all tables from the Teradata data warehouse mydatabase and the optional schema file is myschemafile.json.

bq mk \
--transfer_config \
--project_id=123456789876 \
--target_dataset=MyDataset \
--display_name='My Migration' \
--params='{"bucket": "mybucket", "database_type": "Teradata",
"database_name":"mydatabase", "table_name_patterns": ".*",
"agent_service_account":"myemail@mydomain.com", "schema_file_path":
"gs://mybucket/myschemafile.json"}' \
--data_source=on_premises

After running the command, you receive a message like the following:

[URL omitted] Please copy and paste the above URL into your web browser and follow the instructions to retrieve an authentication code.

Follow the instructions and paste the authentication code on the command line.

API

Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.

Migration Agent

You can optionally create the transfer directly from the migration agent, in the initializing the migration agent section.

To create the transfer from the migration agent, first you will need to grant the Cloud IAM role serviceAccountTokenCreator to the service account you created.

You can grant the Cloud IAM role with the following gcloud command:

gcloud projects add-iam-policy-binding user_project_id \
--member='serviceAccount:service-user_project_number@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com' \
--role='roles/iam.serviceAccountTokenCreator'

Once you've granted the serviceAccountTokenCreator role to the service account, you can move on to downloading the migration agent, and then set up the transfer as part of the initialization step.

Initializing the migration agent

When you are starting a data migration for the first time, initialize the migration agent. Initialization is required only once for each migration transfer you set up, whether or not the transfer is recurring.

This session will not start the migration; it is only for initial configuration.

  1. Open a new session. On the command line, issue a command to run the agent jar file with the initialization flag, in this form:

    java -cp \
    OS-specific-separated-paths-to-jars (JDBC and agent) \
    com.google.cloud.bigquery.dms.Agent \
    --initialize
    

    Unix, Linux, Mac OS

    java -cp \
    /usr/local/migration/Teradata/JDBC/tdgssconfig.jar:/usr/local/migration/Teradata/JDBC/terajdbc4.jar:/usr/local/migration/mirroring-agent.jar \
    com.google.cloud.bigquery.dms.Agent \
    --initialize
    

    Windows

    Copy all the files into the C:\migration folder (or adjust the paths in the command), then run:

    java -cp C:\migration\tdgssconfig.jar;C:\migration\terajdbc4.jar;C:\migration\mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --initialize
    
  2. When prompted, enter the following parameters:

    • Whether you want to save the Teradata Parallel Transporter (TPT) template to disk. If you are planning to use the TPT extraction method, you can modify the saved template with parameters that suit your Teradata instance.
    • The URI of the source database. Include the port number, if needed.
    • Path to a local scratch space for the migration. Ensure you have the minimum recommended storage space as described in staging space requirements.
    • Whether you will use Teradata Parallel Transporter (TPT) as the extraction method.
    • (Optional) Path to a database credential file.
  3. When prompted for a BigQuery Data Transfer Service Resource name:

    You can enter the Resource name of the transfer you configured in the BigQuery web UI, or you can create the transfer at this time, through the migration agent itself. Optionally, you can use the migration agent initialization command to create a schema file. See the Migration Agent tab below for this option.

    Console

    Enter the Resource name of the transfer you previously set up in the Console tab of the section on setting up a transfer.

    Classic UI

    Enter the Resource name of the transfer you previously set up in the Classic UI tab of the section on setting up a transfer.

    Migration Agent

    • Enter the Google Cloud Platform Project ID.
    • Enter the name of the source database in Teradata.
    • Enter a pattern for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax.
    • Optional: Enter the path to a local JSON schema file (recommended for recurring transfers). This schema file will be uploaded to your Cloud Storage bucket.
      • Alternatively, choose to create a new schema file. In this case, you will be prompted for a Teradata username and password, and the agent will produce a JSON file with the converted schema. The file will be created in a local folder, following this pattern: <localpath>/schema/DO_NOT_REMOVE_td2bq_table_schemas_<string>.json. After uploading to your Cloud Storage bucket, the path and file name will follow this pattern: gs://mybucket/myproject_id/schema/DO_NOT_REMOVE_td2bq_table_schemas_<string>.json.
      • Modify the schema file to mark partitioning, clustering, primary keys and change tracking columns, and verify that you want to use this schema for the transfer configuration. See the optional schema file section for tips.
    • Enter the name of the destination dataset in BigQuery.
    • Enter the name of the Cloud Storage bucket where migration data will be staged before loading to BigQuery.
    • Enter a name for the transfer configuration.
  4. After you enter all the requested parameters, the migration agent creates a configuration file and puts it into the local path that you provided. See the next section for a closer look at the configuration file.

Configuration file for the migration agent

The configuration file created in the initialization step will look like this example:


   {
    "agent-id": "0eebc1ad-621d-4386-87f1-8eca96991f63",
    "transfer-configuration": {
      "project-id": "123456789876",
      "location": "us",
      "id": "5d533a90-0000-2e89-89a0-94eb2c059a76"
    },
    "source-type": "teradata",
    "console-log": false,
    "silent": false,
    "teradata-config": {
      "connection": {
       "host": "localhost"
      },
      "local-processing-space": "extracted",
      "database-credentials-file-path": "",
      "max-local-storage": "200GB",
      "use-tpt": false,
      "max-sessions": 0,
      "max-parallel-upload": 1,
      "max-unload-file-size": "2GB"
     }
   }
   

All options for the migration agent configuration file

  • transfer-configuration: Information about this transfer configuration in BigQuery Data Transfer Service.
  • teradata-config: Information specific for this Teradata extraction:

    • connection: Information about the hostname and port
    • local-processing-space: The local folder where the agent extracts table data before uploading it to Cloud Storage.
    • database-credentials-file-path: (Optional) The path to a file that contains credentials for connecting to the Teradata database automatically. The file should contain two lines, for example:
      username=abc
      password=123
      
      When using a credentials file, take care to control access to the folder where you store it on the local file system, because it will not be encrypted. If no path is provided, you will be prompted for a username and password when you start an agent.
    • max-local-storage: The maximum amount of local storage to use for the extraction in the specified staging directory. The default value is 200GB. The supported format is a number followed by KB, MB, GB, or TB, for example 250GB.

      In all extraction modes, files are deleted from your local staging directory after they are uploaded to Cloud Storage.

      The actual amount of staging space required depends upon the method of extraction:

      • In the default extraction method (JDBC driver with FastExport), small chunks of data are written and continually uploaded to your specified Cloud Storage bucket. Extraction pauses when the specified max-local-storage limit is reached.
      • In extraction with Teradata Parallel Transporter (TPT) without a partitioning column, your whole table is extracted, regardless of the max-local-storage setting.
      • In extraction with Teradata Parallel Transporter (TPT) with a partitioning column, the agent extracts sets of partitions. The staging storage required is up to the larger of max-local-storage or the size of your table's largest partition extracted to CSV format.
    • use-tpt: Directs the migration agent to use Teradata Parallel Transporter (TPT) as an extraction method.

      For each table, the migration agent generates a TPT script, starts a tbuild process and waits for completion. Once the tbuild process completes, the agent lists and uploads the extracted files to Cloud Storage, and then deletes the TPT script.

      To use the TPT extraction method:

      • The tbuild utility must be installed and available on the machine that runs the migration agent, so the agent can start tbuild processes.
      • The local extraction folder should have enough space for extracting the largest table's partition in CSV format. Due to formatting, a CSV file will be larger than the size of the original table in Teradata.
    • max-sessions: Specifies the maximum number of sessions used by the export job (either FastExport or TPT). If set to 0, then the Teradata database will determine the maximum number of sessions for each export job.

    • max-parallel-upload: Determines the number of files uploaded to Cloud Storage in parallel. Depending on your network bandwidth and other settings (such as DLP scanning), increasing this parameter could improve performance.

    • max-unload-file-size: Determines the maximum extracted file size. This parameter is not enforced for TPT extractions.
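
For example, to switch to TPT extraction and raise the local storage and parallel upload limits, you could edit the corresponding fields of the generated configuration file. This is an illustrative fragment only; the keys match the example configuration file above, and the values shown are examples rather than recommendations:

   "teradata-config": {
     "connection": {
      "host": "teradata.example.com"
     },
     "local-processing-space": "/var/migration/extracted",
     "database-credentials-file-path": "",
     "max-local-storage": "500GB",
     "use-tpt": true,
     "max-sessions": 8,
     "max-parallel-upload": 4,
     "max-unload-file-size": "2GB"
    }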

Running the migration agent

After initializing the migration agent and creating the configuration file, use the following steps to run the agent and start the migration:

  1. Start the agent, supplying the classpath to the JDBC drivers and the path to the configuration file created in the previous initialization step.

  2. Run the agent in this form:

    java -cp \
    OS-specific-separated-paths-to-jars (JDBC and agent) \
    com.google.cloud.bigquery.dms.Agent \
    --configuration-file=path to configuration file
    Unix, Linux, Mac OS

    java -cp \
    /usr/local/migration/Teradata/JDBC/tdgssconfig.jar:/usr/local/migration/Teradata/JDBC/terajdbc4.jar:mirroring-agent.jar \
    com.google.cloud.bigquery.dms.Agent \
    --configuration-file=config.json
    

    Windows

    Copy all the files into the C:\migration folder (or adjust the paths in the command), then run:

    java -cp C:\migration\tdgssconfig.jar;C:\migration\terajdbc4.jar;C:\migration\mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --configuration-file=config.json
    

    When you are ready to proceed with the migration, press Enter. The agent proceeds if the classpath provided during initialization is valid.

  3. When prompted, enter the username and password for the database connection. If the username and password are valid, the data migration starts.

    Optional: Instead of entering the username and password each time, you can provide a credentials file to the agent. See the optional parameter database-credentials-file-path in the agent configuration file for more information. When using a credentials file, take appropriate steps to control access to the folder where you store it on the local file system, because it will not be encrypted.

  4. Leave this session open until the migration is completed. If you created a recurring migration transfer, keep this session open indefinitely. If this session is interrupted, current and future transfer runs will fail.

  5. Periodically check that the agent is still running. If a transfer run is in progress and no agent responds within 24 hours, the transfer run will fail.

  6. If the migration agent dies while the transfer is in progress or scheduled, the BigQuery Data Transfer Service web UI shows the error status and prompts you to restart the agent. To start the migration agent again, resume from the beginning of this section (Running the migration agent) and rerun the command for running the agent. You do not need to repeat the initialization command. The transfer will resume from the point where tables were not completed.

Tracking progress of the migration

You can view the status of the migration in the BigQuery Data Transfer Service web UI. You can also set up Cloud Pub/Sub or email notifications. See BigQuery Data Transfer Service notifications.

The BigQuery Data Transfer Service schedules and initiates a transfer run on the schedule specified when the transfer configuration was created. It is important that the migration agent is running while a transfer run is active. If there are no updates from the agent side within 24 hours, the transfer run will fail.
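
You can also check recent transfer runs from the command line with the bq tool. The following is a sketch that assumes your bq version supports the transfer-run listing flags; the resource name is the one you noted when you set up the transfer:

  bq ls --transfer_run --run_attempt='LATEST' \
  projects/123456789876/locations/us/transferConfigs/5d533a90-0000-2e89-89a0-94eb2c059a76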


Upgrading the migration agent

If a new version of the migration agent is available, you will need to manually update the migration agent. To receive notices about the BigQuery Data Transfer Service, subscribe to the release notes.

What's next
