Migrating data from Teradata

The BigQuery Data Transfer Service, combined with a special migration agent, lets you copy your data from an on-premises Teradata data warehouse to BigQuery. This document describes the process of migrating data from Teradata by using the BigQuery Data Transfer Service.

Before you begin

To ensure a successful Teradata data warehouse migration, make sure you have met the following prerequisites.

Google Cloud Platform requirements

  1. Choose or create a Google Cloud Platform project to store your migration data.

  2. Enable these Google Cloud Platform APIs.

    Console

    In the Google Cloud Platform Console, click the ENABLE button on both of the following pages.

    Enable API

    BigQuery is automatically enabled in new projects. For an existing project, you may need to enable the BigQuery API.

    A green checkmark indicates that you've already enabled the API.

    Enabled API

    CLI

    You can optionally use the gcloud command-line interface (CLI) to enable the APIs.

    You can issue gcloud commands in Cloud Shell, or you can download the CLI tool and install it on your local machine.

    Enter the following gcloud commands:

      gcloud services enable bigquerydatatransfer.googleapis.com
      gcloud services enable storage-component.googleapis.com
    

    BigQuery is automatically enabled in new projects. For an existing project, also enable the BigQuery API.

      gcloud services enable bigquery.googleapis.com
    
  3. Create a BigQuery dataset to store your data. You do not need to create any tables. (Steps 3 through 6 can also be performed from the command line, as sketched after this list.)

  4. Create a Cloud IAM service account. See Creating and managing service accounts for information on creating a service account.

  5. Grant the service account the following Cloud IAM roles. See Granting roles to a service account.

    • BigQuery: The predefined Cloud IAM role bigquery.admin.
    • Cloud Storage: The predefined Cloud IAM role storage.objectAdmin.
  6. Create a Cloud Storage bucket for staging the data. You use this bucket name later in the migration process.

  7. Allow pop-ups in your browser from bigquery.cloud.google.com so that you can view the permissions window when you set up the transfer. You must grant the BigQuery Data Transfer Service permission to manage your transfer.
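
If you prefer to script the setup, the resources in steps 3 through 6 can also be created with the bq, gcloud, and gsutil command-line tools. The following is a minimal sketch, assuming those tools are installed and authenticated; the project, dataset, service account, and bucket names are placeholders for your own values.

# Step 3: create the destination BigQuery dataset.
bq mk --dataset my-project:mydataset

# Step 4: create a service account for the migration agent.
gcloud iam service-accounts create td2bq-agent --project=my-project

# Step 5: grant the BigQuery and Cloud Storage roles to the service account.
gcloud projects add-iam-policy-binding my-project \
  --member='serviceAccount:td2bq-agent@my-project.iam.gserviceaccount.com' \
  --role='roles/bigquery.admin'
gcloud projects add-iam-policy-binding my-project \
  --member='serviceAccount:td2bq-agent@my-project.iam.gserviceaccount.com' \
  --role='roles/storage.objectAdmin'

# Step 6: create the Cloud Storage bucket for staging the data.
gsutil mb -p my-project gs://my-staging-bucket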

On-premises requirements

  1. Local machine requirements
    • The migration agent communicates with the Teradata instance over a JDBC connection and with Google Cloud Platform APIs. Ensure that network access is not blocked by a firewall.
    • Ensure that Java Runtime Environment 8 or above is installed.
    • Ensure you have the minimum recommended storage space as described in the staging space requirements.
  2. Teradata connection details
    • Username and password of a user with read access to the system tables and the tables that are being migrated.
    • Hostname and port number to connect to the Teradata instance.
  3. Download the required JDBC drivers from Teradata: tdgssconfig.jar and terajdbc4.jar.
  4. Download your Google Cloud Platform credentials.
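
The following is a minimal sketch for checking these local requirements from a shell. The jar paths mirror the examples used later in this document, and the service account, project, and key file names are placeholders for your own values.

# Confirm that Java Runtime Environment 8 or later is installed.
java -version

# Confirm that the required Teradata JDBC drivers are in place (illustrative paths).
ls /usr/local/migration/Teradata/JDBC/tdgssconfig.jar \
   /usr/local/migration/Teradata/JDBC/terajdbc4.jar

# One way to download your Google Cloud Platform credentials is to create a
# JSON key for the service account you created earlier.
gcloud iam service-accounts keys create credentials.json \
  --iam-account=td2bq-agent@my-project.iam.gserviceaccount.com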

Migration options

Because every migration has unique requirements, the migration service can be customized:

  • Using JDBC FastExport vs Teradata Parallel Transporter (TPT)
    • We recommend extracting with TPT and customizing your schema to indicate partition columns. This will result in the fastest data extraction.
    • If there are tight constraints on local storage space or TPT is not available, use default extraction.
    • For more details, see optional agent flags.
  • Schema customization
    • The Migration Service converts your schema automatically, but optionally you can further customize the schema. Schema customization is strongly recommended for some situations, as explained below.
  • One-time data movement
    • If your table is very large and you can extract with TPT for higher performance, we recommend schema customization with a partitioning scheme. This allows partition-by-partition extraction. For more details, see optional schema file.
    • If your tables are small or you cannot use TPT, follow the basic instructions. Schema customization is not required.
  • Incremental data movement
    • If you would like to incrementally migrate changes from Teradata to BigQuery, you can use incremental mode. On a regular basis, new and changed records from Teradata will be appended to BigQuery tables.
    • This method requires customizing your schema to annotate COMMIT_TIMESTAMP columns. For more details, see incremental transfers.

Setting up a Teradata migration

This section describes the step-by-step process of setting up a data migration from Teradata to BigQuery. The steps are:

  • Download the migration agent.
  • Set up a BigQuery Data Transfer Service transfer.
  • Initialize the migration agent.
  • Run the migration agent.

Downloading the migration agent

Download the migration agent to the local machine where the data warehouse is located.

After you've downloaded the migration agent, you will set up a BigQuery Data Transfer Service transfer, initialize the migration agent, and then run the agent to start the data migration.

Setting up a transfer

Create a transfer with the BigQuery Data Transfer Service.

Classic UI

  1. Go to the BigQuery web UI.

    Go to the BigQuery web UI

  2. Click Transfers.

  3. Click Add Transfer.

  4. On the New Transfer page:

    • For Source, choose Migration: Teradata.
    • For Display name, enter a name for the transfer such as My Migration. The display name can be any value that allows you to easily identify the transfer if you need to modify it later.
    • (Optional) For Schedule, you can leave the default value of every 24 hours (based on creation time), or click Edit to change the time.

      Query schedule

    • For Destination dataset, choose the appropriate dataset.

    • For Table name patterns, enter a pattern (or patterns) for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax; for example, mydatabase..* matches all tables in the database mydatabase.

    • For Database type, choose Teradata.

    • For Cloud Storage bucket, enter the name of the Cloud Storage bucket for staging the migration data. Do not include the prefix gs:// – only enter the bucket name.

    • For Service account email, enter the email attached to the IAM service account you created.

    • (Optional) For Schema file path, enter the path and file name of a JSON schema file. If no schema file is entered, BigQuery automatically detects the table schema by using the source data being transferred. You can create your own schema file, as shown in the screenshot example below, or you can use the migration agent to help you create a schema file. For information on creating a schema file, see the initializing the migration agent section.

      New Teradata migration

    • (Optional) Expand the Advanced section and configure run notifications for your transfer. Transfer run notifications are currently in alpha.

    • For Cloud Pub/Sub topic, enter your Cloud Pub/Sub topic name, for example, projects/myproject/topics/mytopic.

    • Check Send email notifications to allow email notifications of transfer run failures.

      Cloud Pub/Sub topic

  5. Click Add.

  6. When prompted, click Allow to give the BigQuery Data Transfer Service permission to manage your transfer. You must allow browser pop-ups from bigquery.cloud.google.com to view the permissions window.

    Allow transfer

  7. The web UI will display all the transfer setup details, including a Resource name for this transfer. Note the resource name. You need to enter it later when you run the migration agent.

    Transfer Confirmation

CLI

Enter the bq mk command and supply the transfer creation flag — --transfer_config. The following flags are also required:

  • --data_source
  • --display_name
  • --target_dataset
  • --params
bq mk \
--transfer_config \
--project_id=project_id \
--target_dataset=dataset \
--display_name=name \
--params='parameters' \
--data_source=data_source

Where:

  • project_id is your project ID. If --project_id isn't supplied to specify a particular project, the default project is used.
  • dataset is the target dataset for the transfer configuration.
  • name is the display name for the transfer configuration. The transfer name can be any value that allows you to easily identify the transfer if you need to modify it later.
  • parameters contains the parameters for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'.
    • For Teradata migrations, these parameters are required: bucket, database_type, table_name_patterns, agent_service_account.
      • bucket is the Cloud Storage bucket that will act as a staging area during the migration.
      • database_type is Teradata.
      • agent_service_account is the email address associated with the service account that you created.
      • table_name_patterns is a pattern (or patterns) for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax.
  • data_source is the data source — on_premises.

For example, the following command creates a Teradata transfer named My Migration using the Cloud Storage bucket mybucket and the target dataset MyDataset. The transfer migrates all tables from the Teradata data warehouse mydatabase and uses the optional schema file myschemafile.json. The transfer is created in the default project:

bq mk \
--transfer_config \
--target_dataset=MyDataset \
--display_name='My Migration' \
--params='{"bucket": "mybucket", "database_type": "Teradata",
"table_name_patterns": "mydatabase..*", "agent_service_account":
"myemail@mydomain.com", "schema_file_path":
"gs://mybucket/myschemafile.json"}' \
--data_source=on_premises

After running the command, you receive a message like the following:

[URL omitted] Please copy and paste the above URL into your web browser and follow the instructions to retrieve an authentication code.

Follow the instructions and paste the authentication code on the command line.

API

Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.
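
For example, a minimal sketch of the underlying REST call is shown below. The project, location, dataset, bucket, service account, and table name pattern values are placeholders that mirror the CLI example above, the request body follows the TransferConfig resource, and the access token is obtained with gcloud.

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://bigquerydatatransfer.googleapis.com/v1/projects/my-project/locations/us/transferConfigs" \
  -d '{
        "destinationDatasetId": "MyDataset",
        "displayName": "My Migration",
        "dataSourceId": "on_premises",
        "params": {
          "bucket": "mybucket",
          "database_type": "Teradata",
          "table_name_patterns": "mydatabase..*",
          "agent_service_account": "myemail@mydomain.com"
        }
      }'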

Migration Agent

You can optionally create the transfer directly from the migration agent, in the initializing the migration agent section.

To create the transfer from the migration agent, first you will need to grant the Cloud IAM role iam.serviceAccountTokenCreator to the service account you created.

For example, you can grant the Cloud IAM role with the gcloud command-line tool:

gcloud projects add-iam-policy-binding user_project_id \
--member='serviceAccount:service-user_project_number@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com' \
--role='roles/iam.serviceAccountTokenCreator'

Once you've granted the iam.serviceAccountTokenCreator role, you can move on to downloading the migration agent, and then set up the transfer as part of the initialization step.

Initializing the migration agent

When you are starting a data migration for the first time, initialize the migration agent. Initialization is required only once per migration transfer, whether or not the transfer is recurring.

This session will not start the migration; it is only for initial configuration.

  1. Open a new session. On the command line, you issue a command to run the jar file, with some particular flags, in this form:

    java -cp \
    OS-specific-separated-paths-to-jars (JDBC and agent) \
    com.google.cloud.bigquery.dms.Agent \
    --initialize
    

    Unix, Linux, Mac OS

    java -cp \
    /usr/local/migration/Teradata/JDBC/tdgssconfig.jar:/usr/local/migration/Teradata/JDBC/terajdbc4.jar:/usr/local/migration/mirroring-agent.jar \
    com.google.cloud.bigquery.dms.Agent \
    --initialize
    

    Windows

    Copy all the files into the C:\migration folder (or adjust the paths in the command), then run:

    java -cp C:\migration\tdgssconfig.jar;C:\migration\terajdbc4.jar;C:\migration\mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --initialize
    
  2. When prompted, enter the following parameters:

    • The URI of the source database (the data warehouse you're migrating). Include the port number, if needed.
    • Path to a local scratch space for the migration. Ensure you have the minimum recommended storage space as described in staging space requirements.
  3. When prompted for a BigQuery Data Transfer Service Resource name:

    You can enter the Resource name of the transfer you configured in the BigQuery web UI, or you can create the transfer at this time, through the migration agent itself. Optionally, you can use the migration agent initialization command to create a schema file. See the Migration Agent tab below for this option.

    Classic UI

    Enter the Resource name of the transfer you previously set up in the section on setting up a transfer.

    Migration Agent

    • Enter the Google Cloud Platform Project ID
    • Enter a table name pattern for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax, like this: databaseName.tableRegex.
    • Enter the Teradata source database name to migrate
    • Optional: Enter the path to a local JSON schema file (recommended for recurring transfers). This schema file will be uploaded to your Cloud Storage bucket.
      • Choose to create a new schema file. In this case, you will be prompted for a Teradata username and password, and the agent will produce a JSON file with the converted schema. The file is created in a local folder, following this pattern: <localpath>/schema/DO_NOT_REMOVE_td2bq_table_schemas_<string>.json. After it is uploaded to your Cloud Storage bucket, the path and file name follow this pattern: gs://mybucket/myproject_id/schema/DO_NOT_REMOVE_td2bq_table_schemas_<string>.json.
      • Modify the schema file to mark partitioning, clustering, primary keys and change tracking columns, and verify that you want to use this schema for the transfer configuration. See the optional schema file section for tips.
    • Enter the name of the destination dataset in BigQuery.
    • Enter the name of Cloud Storage bucket where migration data will be staged before loading to BigQuery.
    • Enter a name for the transfer configuration.
  4. After you enter all the requested parameters, the migration agent creates a configuration file and puts it into the local path provided in the parameters.

    Example:

    
    {
      "connection": {
        "host": "myTeradata",
        "port": 1025
      },
      "cloud": {
        "projectId": "my-project",
        "location": "us",
        "transferConfigurationId": "5b1198ca-0000-2267-9130-001a114f590c"
      },
      "localProcessingSpace": "/usr/local/home/username/bigquery_migration"
    }
    

Running the migration agent

After initializing the migration agent, use the steps below to run the agent and start the migration.

  1. Run the agent, supplying the classpath to the JDBC drivers and the path to the configuration file created in the previous initialization step.

    java -cp \
    OS-specific-separated-paths-to-jars (JDBC and agent) \
    com.google.cloud.bigquery.dms.Agent \
    --configuration_file=path to configuration file

    Unix, Linux, Mac OS

    java -cp \
    /usr/local/migration/Teradata/JDBC/tdgssconfig.jar:/usr/local/migration/Teradata/JDBC/terajdbc4.jar:mirroring-agent.jar \
    com.google.cloud.bigquery.dms.Agent \
    --configuration_file=config.json
    

    Windows

    Copy all the files into the C:\migration folder (or adjust the paths in the command), then run:

    java -cp C:\migration\tdgssconfig.jar;C:\migration\terajdbc4.jar;C:\migration\mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --configuration_file=config.json
    

    When you are ready to proceed with the migration, press Enter; the agent proceeds if the classpath provided during initialization is valid.

  2. When prompted, enter the username and password for the database connection. If the username and password are valid, the data migration starts.

    Optional: Instead of entering the username and password each time, you can pass a credentials file to the agent with a flag on the command that starts the migration. See optional agent flags for more information.

  3. Leave this session open until the migration is completed. If you created a recurring migration transfer, keep this session open indefinitely. If this session is interrupted, current and future transfer runs will fail.

  4. Periodically check that the agent is running. If a transfer run is in progress and no agent responds within 24 hours, the transfer run will fail.

  5. If the migration agent dies while the transfer is in progress or scheduled, the BigQuery Data Transfer Service web UI shows the error status and prompts you to restart the agent. To start the migration agent again, resume from the beginning of this section by rerunning the command for running the migration agent. You do not need to repeat the initialization command. The transfer resumes from the point where tables were not completed.

Options

Optional schema file

About schema files

A schema file is a JSON file that describes database objects. The schema includes a set of databases, each containing a set of tables, each of which contains a set of columns. Each column has a type field: the type that is assigned to the column in BigQuery.

Each object has a name field: the name that will be assigned to it in BigQuery. Each object also has an originalName field: the name of the matching object in the Teradata database.

Making transformations with a schema file

As part of the migration, you can use the schema file to change the name fields of any object, and you can add the usageType array to any column. This is especially useful for including additional information about a table, such as partitioning, that would otherwise be lost in the migration if no schema file were specified.

The following usage types are supported:

  • PARTITIONING: only one column per table can be annotated with this usageType. The type field of the column should be DATE or TIMESTAMP. This column is used in the partitioned table definition for the containing tables object.
  • CLUSTERING: several columns in one table can be annotated with this usageType. Column types should follow the constraints for clustering in BigQuery. If a PARTITIONING field is specified for the same table, BigQuery will use these columns to create a clustered table. Only columns with the types INT64, STRING, DATE, TIMESTAMP, BOOL or NUMERIC can be marked with this usageType.
  • COMMIT_TIMESTAMP: only one column per table can be annotated with this usageType. Use this usageType to annotate an update timestamp column. This column is used to extract rows created or updated since the last transfer run. It should have a TIMESTAMP or DATE type.

Example of a modified schema file:


{
  "databases": [
    {
      "name": "db",
      "originalName": "db",
      "tables": [
        {
          "name": "test",
          "originalName": "test",
          "columns": [
            {
              "name": "foo",
              "originalName": "foo",
              "type": "INT64",
              "usageType": ["CLUSTERING"]
            },
            {
              "name": "bar",
              "originalName": "bar",
              "type": "DATE",
              "usageType": ["PARTITIONING"]
            },
            {
              "name": "change",
              "originalName": "change",
              "type": "TIMESTAMP",
              "usageType": ["COMMIT_TIMESTAMP"]
            }
          ]
        }
      ]
    }
  ]
}

For more information on exactly how tables are partitioned or clustered in a transfer, see the incremental transfers section.

Optional agent flags

  • --database_credentials_file=path to credentials file: You can create a credentials properties file to pass Teradata credentials to the migration agent automatically. The file should contain two lines; for example, for the username/password dbc/dbc:

    username=dbc
    password=dbc
    
  • --tpt: This flag will direct the migration agent to use Teradata Parallel Transporter as an extraction method. For each table, the migration agent will generate a TPT script, start a tbuild process and wait for completion. Once the tbuild process completes, the agent will list and upload the extracted files to Cloud Storage and delete the TPT script.

    To use this extraction method:

    • The tbuild utility should be installed and available for the migration agent to use.
    • The local extraction folder should have enough space for extracting the largest table in CSV format. Due to formatting, a CSV file will be larger than the size of the original table in Teradata.
  • --max_local_storage: This flag controls the maximum amount of local storage space to use in the specified staging directory. The default value is 200GB. The supported format is a number followed by KB, MB, GB, or TB; for example, 50GB.

    In all extraction modes, files are deleted from your staging directory after they're uploaded to Cloud Storage.

    The actual amount of staging space required depends upon the type of extraction:

    • In the default extraction, small chunks of data will be written and continually uploaded to your specified Cloud Storage bucket. Extraction will pause when your specified limit is reached.
    • In extraction with Teradata Parallel Transporter without a partitioning column, your whole table is extracted, regardless of the max_local_storage setting.
    • In extraction with Teradata Parallel Transporter with a partitioning column, the agent will extract sets of partitions. The staging storage requirements are up to the larger of max_local_storage or twice the size of your largest partition.
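
As a sketch, the run command from the running the migration agent section could combine these optional flags as follows; the jar paths, credentials file name, and storage limit are illustrative values.

java -cp \
/usr/local/migration/Teradata/JDBC/tdgssconfig.jar:/usr/local/migration/Teradata/JDBC/terajdbc4.jar:/usr/local/migration/mirroring-agent.jar \
com.google.cloud.bigquery.dms.Agent \
--configuration_file=config.json \
--database_credentials_file=credentials.properties \
--tpt \
--max_local_storage=50GB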

Tracking progress of the migration

You can view the status of the migration in the BigQuery Data Transfer Service web UI. You can also set up Cloud Pub/Sub or email notifications. See BigQuery Data Transfer Service notifications.

The BigQuery Data Transfer Service schedules and initiates a transfer run on the schedule specified when the transfer configuration was created. It is important that the migration agent is running while a transfer run is active. If there are no updates from the agent side within 24 hours, the transfer run will fail.

Example of migration status in the BigQuery Data Transfer Service web UI:

Migration status

Incremental transfers

For each transfer run, a timestamp of the run is saved. For each subsequent transfer run, the agent receives the timestamp of the previous transfer run (T1) and the timestamp of when the current transfer run started (T2).

For each transfer run after the initial run, the migration agent extracts data using the following per-table logic:

  • If a table object in the schema file does not have a column with a usageType of COMMIT_TIMESTAMP, the table is skipped.
  • If a table has a column with the usageType of COMMIT_TIMESTAMP, all rows with a timestamp between T1 and T2 are extracted and appended to the existing table in BigQuery. For example, with the schema file shown earlier, rows of db.test whose change value falls between T1 and T2 are extracted and appended on each run.

Upgrading the migration agent

If, during beta, a new version of the migration agent becomes available, you will need to manually update the migration agent. To receive notices about the BigQuery Data Transfer Service, subscribe to the BigQuery release notes.

What's next
