Continuous data replication to BigQuery using Striim

By: Edward Bell, Solutions Architect, Striim, Inc.

This tutorial demonstrates how to migrate a MySQL database to BigQuery using Striim. Striim is a comprehensive streaming extract, transform, and load (ETL) platform that enables online database migrations and continuous streaming replication from on-premises and cloud data sources to Google Cloud data services.

This tutorial focuses on the implementation of a continuous replication from Cloud SQL for MySQL to BigQuery. It is intended for database administrators, IT professionals, and data architects interested in taking advantage of BigQuery capabilities.

Objectives

  • Use Google Cloud Marketplace to deploy Striim.
  • Use Striim to read from a source Cloud SQL for MySQL database.
  • Use Striim to continuously replicate from Cloud SQL for MySQL to BigQuery.

Costs

This tutorial uses the following billable components of Google Cloud:

  • Compute Engine
  • Cloud SQL
  • BigQuery

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

This tutorial also uses Striim, which includes a trial period through the Cloud Marketplace.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  2. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  3. Enable the Compute Engine and BigQuery APIs.

    Enable the APIs

  4. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

  5. Set the default compute zone to us-central1-a:

    gcloud config set compute/zone us-central1-a
    export COMPUTE_ZONE=us-central1-a
    

    This zone is where you deploy your database and compute resources to. For more information about zones, see Geography and regions.

Creating a Cloud SQL for MySQL instance

You create a Cloud SQL for MySQL instance that you later connect to Striim. In this case, the instance acts as the source transactional system that you later replicate. In a real-world scenario, the source database can be one of many transactional database systems.

  1. In Cloud Shell, create the environment variables to create the instance:

    CSQL_NAME=striim-sql-src
    CSQL_USERNAME=striim-user
    CSQL_USER_PWD=$(openssl rand -base64 18)
    CSQL_ROOT_PWD=$(openssl rand -base64 18)
    

    If you close the Cloud Shell session, you lose the variables.

  2. Make a note of the CSQL_USER_PWD and CSQL_ROOT_PWD passwords that were generated:

    echo $CSQL_USER_PWD
    echo $CSQL_ROOT_PWD
    
  3. Create the Cloud SQL for MySQL instance:

    gcloud sql instances create $CSQL_NAME \
        --root-password=$CSQL_ROOT_PWD --zone=$COMPUTE_ZONE \
        --tier=db-n1-standard-2 --enable-bin-log
    
  4. Create a Cloud SQL for MySQL user that Striim can connect to:

    gcloud sql users create $CSQL_USERNAME --instance $CSQL_NAME \
        --password $CSQL_USER_PWD --host=%
    

    The Cloud SQL for MySQL database is set up for Striim to read.

  5. Find the IP address of the Cloud SQL for MySQL instance and make a note of it:

    gcloud sql instances describe $CSQL_NAME --format='get(ipAddresses.ipAddress)'
    

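Because the variables are lost when the Cloud Shell session closes, you may want to snapshot them to a file that a future session can `source`. The following helper is a sketch, not part of the tutorial; the file name is an arbitrary choice:

```shell
# Hypothetical helper: snapshot the CSQL_* variables to a file so that a
# future Cloud Shell session can restore them with `source`.
save_striim_env() {
  local file=${1:-"$HOME/striim-env.sh"}
  local var
  for var in CSQL_NAME CSQL_USERNAME CSQL_USER_PWD CSQL_ROOT_PWD; do
    # %q escapes each value so it sources back verbatim
    printf '%s=%q\n' "$var" "${!var}"
  done > "$file"
  chmod 600 "$file"   # the file contains passwords
}

# Usage: save_striim_env          # writes ~/striim-env.sh
#        source ~/striim-env.sh   # in a new session
```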
Setting up Striim

To set up an instance of the Striim server software, you use the Cloud Marketplace.

  1. In the Google Cloud console, go to the Striim page in the Cloud Marketplace.

    Go to Striim in the Cloud Marketplace

  2. Click Launch.

  3. In the New Striim Deployment window, complete the following fields:

    • Select the project that you created or selected to use for this tutorial.
    • In the Zone drop-down menu, select us-central1-a.
    • If you accept the terms of service, select the I accept the Google Cloud Marketplace Terms of Service checkbox.

      Cloud Marketplace solutions typically come with various resources that launch to support the software. Review the monthly billing estimate before launching the solution.

    • Leave all other settings at their default values.

  4. Click Deploy.

  5. In the Google Cloud console, go to the Deployments page.

    Go to Deployments

  6. To review the deployment details of the Striim instance, click the name of the Striim instance. Make a note of the deployment name and the name of the VM that was deployed.

  7. To allow Striim to communicate with Cloud SQL for MySQL, add the Striim server's IP address to the Cloud SQL for MySQL instance's authorized networks:

    STRIIM_VM_NAME=STRIIM_VM_NAME
    STRIIMVM_ZONE=us-central1-a
    gcloud sql instances patch $CSQL_NAME \
        --authorized-networks=$(gcloud compute instances describe $STRIIM_VM_NAME \
        --format='get(networkInterfaces[0].accessConfigs[0].natIP)' \
        --zone=$STRIIMVM_ZONE)
    

    Replace the following:

    • STRIIM_VM_NAME: the name of the VM that you deployed with Striim.
  8. In the Google Cloud console, on the deployment instance details page, click Visit the site to open the Striim web UI.

  9. In the Striim configuration wizard, configure the following:

    • Review the end-user license agreement. If you accept the terms, click Accept Striim EULA and Continue.
    • Enter your contact information.
    • Enter the Cluster Name, Admin, Sys, and Striim Key passwords of your choice. Make a note of these passwords. Click Save and Continue.
    • Leave the key field blank to enable the trial, and then click Save and Continue.
  10. Click Launch. It takes about a minute for Striim to be configured. When done, click Log In.

  11. Log in to the Striim administrator console with the admin user and the administrator password that you previously set. Keep this window open because you return to it in a later step.
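Before moving on, you can optionally confirm that the Striim VM can reach the Cloud SQL instance on the MySQL port. This generic check is a sketch, not part of the tutorial; run it from an SSH session on the Striim VM, and note that CSQL_IP is a placeholder for the instance IP address you noted earlier:

```shell
# Return "open" if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null && echo open || echo closed
}

# Usage (CSQL_IP is a placeholder for the Cloud SQL IP address):
# port_open CSQL_IP 3306
```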

Setting up Connector/J

Use MySQL Connector/J to connect Striim to your Cloud SQL for MySQL instance. As of this writing, 5.1.49 is the latest version of Connector/J.

  1. In the Google Cloud console, go to the Deployments page.

    Go to Deployments

  2. For the Striim instance, click SSH to automatically connect to the instance.

  3. Download Connector/J to the instance and extract it:

    wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.49.tar.gz
    tar -xvzf mysql-connector-java-5.1.49.tar.gz
    
  4. Copy the file to the Striim library path, allow it to be executable, and change ownership of the file that you downloaded:

    sudo cp ~/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49.jar /opt/striim/lib
    sudo chmod +x /opt/striim/lib/mysql-connector-java-5.1.49.jar
    sudo chown striim /opt/striim/lib/mysql-connector-java-5.1.49.jar
    
  5. To recognize the new library, restart the Striim server:

    sudo systemctl stop striim-node
    sudo systemctl stop striim-dbms
    sudo systemctl start striim-dbms
    sudo systemctl start striim-node
    
  6. Go back to the browser window with the administration console in it. Reload the page, and then log in using the admin user credentials.

    It can take a couple of minutes for the server to finish restarting, so you might see a browser error during that time. If you encounter an error, reload the page and log in again.
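Rather than reloading the page by hand, you can poll the web UI until it responds. This helper is a sketch, not part of the tutorial; 9080 is Striim's default web UI port, so verify the port and URL for your deployment:

```shell
# Poll a URL until it responds, so you know when the restart has finished.
wait_for_url() {
  local url=$1 max_tries=${2:-30}
  local i
  for ((i = 1; i <= max_tries; i++)); do
    if curl -fsS -o /dev/null "$url"; then
      echo "up after $i attempt(s)"
      return 0
    fi
    sleep 2
  done
  echo "gave up after $max_tries attempts" >&2
  return 1
}

# Example (assumes the default Striim web UI port; adjust if yours differs):
# wait_for_url "http://STRIIM_VM_IP:9080" 60
```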

Loading sample transactions to Cloud SQL

Before you can configure your first Striim app, load transactions into the MySQL instance.

  1. In Cloud Shell, connect to the instance using the Cloud SQL for MySQL instance credentials that you previously set:

    gcloud sql connect $CSQL_NAME --user=$CSQL_USERNAME
    
  2. Create a sample database and load some transactions into it:

    CREATE DATABASE striimdemo;
    USE striimdemo;
    CREATE TABLE ORDERS (ORDER_ID Integer, ORDER_DATE VARCHAR(50), ORDER_MODE VARCHAR(8), CUSTOMER_ID Integer, ORDER_STATUS Integer, ORDER_TOTAL Float, SALES_REP_ID Integer, PROMOTION_ID Integer, PRIMARY KEY (ORDER_ID));
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1001, 1568927976017, 'In-Store', 1001, 9, 34672.59, 331, 9404);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1002, 1568928036017, 'In-Store', 1002, 1, 28133.14, 619, 2689);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1003, 1568928096017, 'CompanyB', 1003, 1, 37367.95, 160, 30888);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1004, 1568928156017, 'CompanyA', 1004, 1, 7737.02, 362, 89488);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1005, 1568928216017, 'CompanyA', 1005, 9, 15959.91, 497, 78454);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1006, 1568928276017, 'In-Store', 1006, 1, 82531.55, 399, 22488);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1007, 1568928336017, 'CompanyA', 1007, 7, 52929.61, 420, 66256);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1008, 1568928396017, 'Online', 1008, 1, 26912.56, 832, 7262);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1009, 1568928456017, 'CompanyA', 1009, 1, 97706.08, 124, 12185);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1010, 1568928516017, 'CompanyB', 1010, 1, 47539.16, 105, 17868);
    
  3. To check the load, count the records and confirm that 10 rows were inserted:

    SELECT COUNT(*) FROM ORDERS;
    
  4. Exit the Cloud SQL for MySQL session:

    exit
    

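Typing each INSERT statement by hand is error-prone. As an alternative, a small generator script can produce equivalent rows; this is a sketch, not part of the tutorial, and the value ranges are illustrative:

```shell
# Emit `count` INSERT statements for the ORDERS table, starting at ORDER_ID
# `start`. Dates advance one minute per row from the same epoch-millisecond
# base used in the tutorial; amounts and IDs are random illustrative values.
generate_orders() {
  local start=$1 count=$2
  local modes=('In-Store' 'Online' 'CompanyA' 'CompanyB')
  local i
  for ((i = 0; i < count; i++)); do
    printf "INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (%d, %d, '%s', %d, 1, %d.%02d, %d, %d);\n" \
      "$((start + i))" "$((1568927976017 + i * 60000))" "${modes[i % 4]}" \
      "$((start + i))" "$((RANDOM % 90000 + 1000))" "$((RANDOM % 100))" \
      "$((RANDOM % 900 + 100))" "$((RANDOM % 90000 + 1000))"
  done
}

# Example (the mysql client still prompts for the password):
# generate_orders 1001 10 | gcloud sql connect $CSQL_NAME --user=$CSQL_USERNAME
```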
Creating a BigQuery target dataset

In this section, you create a BigQuery dataset and set up service account credentials so that Striim can write to the target dataset.

  1. In Cloud Shell, create a BigQuery dataset:

    bq --location=US mk -d \
        --description "Test Target for Striim." striimdemo
    

    For this tutorial, you create the dataset in the US multi-region.

  2. Create a new target table:

    bq mk \
        --table \
        --description "Striim Table" \
        --label organization:striimlab striimdemo.orders order_id:INTEGER,order_date:STRING,order_mode:STRING,customer_id:INTEGER,order_status:INTEGER,order_total:FLOAT,sales_rep_id:INTEGER,promotion_id:INTEGER
    
  3. Create a service account for Striim to connect to BigQuery:

    gcloud iam service-accounts create striim-bq \
        --display-name striim-bq
    export sa_striim_bq=$(gcloud iam service-accounts list \
        --filter="displayName:striim-bq" --format='value(email)')
    export PROJECT=$(gcloud info \
        --format='value(config.project)')
    gcloud projects add-iam-policy-binding $PROJECT \
        --role roles/bigquery.dataEditor \
        --member serviceAccount:$sa_striim_bq
    gcloud projects add-iam-policy-binding $PROJECT \
        --role roles/bigquery.user --member serviceAccount:$sa_striim_bq
    gcloud iam service-accounts keys create ~/striim-bq-key.json \
        --iam-account $sa_striim_bq
    

    A key called striim-bq-key.json is created in your home path.

  4. Move the newly generated key to the server:

    gcloud compute scp ~/striim-bq-key.json $STRIIM_VM_NAME:~ \
        --zone=$COMPUTE_ZONE
    
  5. Move the key to the /opt/striim directory:

    gcloud compute ssh \
        --zone=$COMPUTE_ZONE $STRIIM_VM_NAME \
        -- 'sudo cp ~/striim-bq-key.json /opt/striim && sudo chown striim /opt/striim/striim-bq-key.json'
    

    You are now ready to create a Striim app.
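Before wiring the key into Striim, you can sanity-check that the file is a valid service account key. This helper is a sketch, not part of the tutorial, and uses only python3's standard json module:

```shell
# Parse the key file and print its client_email; fails if the file is not
# a service account key.
check_sa_key() {
  python3 -c '
import json, sys
with open(sys.argv[1]) as f:
    key = json.load(f)
assert key.get("type") == "service_account", "not a service account key"
print(key["client_email"])
' "$1"
}

# Usage: check_sa_key ~/striim-bq-key.json
```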

Creating an online database migration

An online database migration moves data from a source database (either on-premises or hosted by a cloud provider) to a target database or data warehouse in Google Cloud. The source database remains fully accessible to the business app during the migration, with minimal performance impact on the source.

In an online migration, you perform an initial bulkload, and also continuously capture any changes. You then synchronize the two databases to ensure that data isn't lost. Typically both databases are retained for long periods of time to test and verify that the app and users aren't impacted by switching to a new cloud database.

If you want to focus on creating a change data capture (CDC) pipeline, see the Creating a continuous Cloud SQL for MySQL to BigQuery data pipeline section.

Create the source connection

  1. In the Google Cloud console, on the instance details page, click Visit the site to open the Striim web UI.
  2. In the Striim web UI, click Apps.

    Apps tile in Striim web UI.

  3. Click Add App.

  4. Click Start from Scratch.

  5. In the Name field, enter MySQLToBigQuery_initLoad.

  6. In the Namespace drop-down menu, select the default Admin namespace. This label is used to organize your apps.

  7. Click Save.

  8. On the Flow Designer page, to do a one-time initial bulkload of data, from the Sources pane, drag Database to the flow design palette in the center of the screen and enter the following connection properties:

    • In the Name field, enter mysql_source.
    • Leave the Adapter field at the default value of DatabaseReader.
    • In the Connection URL field, enter jdbc:mysql://PRIMARY_ADDRESS:3306/striimdemo. Replace PRIMARY_ADDRESS with the IP address of the Cloud SQL instance that you created in the previous section.
    • In the Username field, enter the username that you set as the CSQL_USERNAME environment variable, striim-user.
    • In the Password field, enter the CSQL_USER_PWD value that you made a note of when you created a Cloud SQL for MySQL instance.
    • To see more configuration properties, click Show optional properties.
    • In the Tables field, enter striimdemo.ORDERS.
    • For Output to, select New output.
    • In the New output field, enter stream_CloudSQLMySQLInitLoad.
    • Click Save.

    Flow Designer page showing the Sources pane.

  9. To test the configuration settings to make sure that Striim can successfully connect to Cloud SQL for MySQL, click Created, and then select Deploy App.

  10. In the Deployment window, you can specify where parts of your app run within your deployment topology. For this tutorial, select Default, and click Deploy.

  11. To preview your data as it flows through the Striim pipeline, click mysql_source Database reader, and then click Preview on run.

    Preview button to preview the data flow.

  12. Click Deployed, and then click Start App.

    The Striim app starts running, and data flows through the pipeline. Because the pipeline contains only a source component, any errors at this point indicate a problem connecting to the source database. If the app runs successfully but no data flows through, the source database typically contains no data.

    Visualization of data flowing through the pipeline.

  13. After you've successfully connected to your source database and tested that it can read data, click Running, and then select Stop App.

  14. Click Stopped, and then select Undeploy App. You are now ready to connect this flow to BigQuery.

Perform an initial load into BigQuery

  1. In the Striim web UI, click mysql_source Database reader.

  2. Click Connect to next component, select Connect next Target component, and then complete the following fields:

    • In the Name field, enter bq_target.
    • In the Adapter field, enter BigQueryWriter.
    • The Tables property is a list of source/target pairs. Within each pair, the source and target are separated by a comma, and pairs are separated by semicolons: srcSchema1.srcTable1,tgtSchema1.tgtTable1;srcSchema2.srcTable2,tgtSchema2.tgtTable2. For this tutorial, enter striimdemo.ORDERS,striimdemo.orders.
    • The Service Account Key property requires the fully qualified path and name of the key file that you previously generated. For this tutorial, enter /opt/striim/striim-bq-key.json.
    • In the Project ID field, enter your Google Cloud project ID.

    Component page showing the Target component pane.

  3. Click Save.

  4. To deploy the app and preview the data flow, do the following:

    • Click Created, and then select Deploy App.
    • In the Deployment window, select Default, and then click Deploy.
    • To preview your data as it flows through the Striim pipeline, click mysql_source Database reader, and then click Preview on run.
    • Click Deployed, and then click Start App.
  5. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  6. Click the striimdemo database.

  7. In the query editor, enter the following query, and then click Run:

    SELECT COUNT(*) AS ORDERS, AVG(ORDER_TOTAL) AS ORDERS_AVE, SUM(ORDER_TOTAL) AS ORDERS_SUM FROM striimdemo.orders;

    It can take up to 90 seconds for the transactions to fully replicate to BigQuery due to the default configuration settings. After replication completes, the results table shows 10 orders, with an average order value of about 43148.96 and a total of about 431489.57.

    Query editor displaying results of replicated data.

    You have successfully set up your Striim environment and pipeline to perform a batch load.
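As a sanity check, you can recompute the expected aggregates locally from the ten ORDER_TOTAL values inserted earlier and compare them with the query result. This is a sketch, not part of the tutorial:

```shell
# Recompute COUNT, AVG, and SUM over the ten ORDER_TOTAL values from the
# sample INSERT statements, for comparison with the BigQuery output.
totals='34672.59 28133.14 37367.95 7737.02 15959.91
82531.55 52929.61 26912.56 97706.08 47539.16'
printf '%s\n' $totals | awk '{ s += $1; n++ } END { printf "orders=%d avg=%.2f sum=%.2f\n", n, s / n, s }'
```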

Creating a continuous data pipeline from Cloud SQL for MySQL to BigQuery

With an initial one-time bulkload in place, you can now set up a continuous replication pipeline. This pipeline is similar to the bulk pipeline that you created, but with a different source object.

Create a CDC source

  1. In the Striim web UI, click Home.
  2. Click Apps.

    Navigation to Apps button.

  3. Click Start from Scratch.

  4. In the Name field, enter MySQLToBigQuery_cdc.

  5. In the Namespace drop-down menu, select the default Admin namespace.

  6. On the Flow Designer page, drag a MySQL CDC source reader to the center of the design palette.

    Flow Designer page showing the MySQL CDC source.

  7. Configure your new MySQL CDC source with the following information:

    • In the Name field, enter mysql_cdc_source.
    • Leave the Adapter field at the default value of MysqlReader.
    • In the Connection URL field, enter jdbc:mysql://PRIMARY_ADDRESS:3306/striimdemo.
    • Enter the username and password that you used in the previous section.
    • To see more configuration properties, click Show optional properties.
    • In the Tables field, enter striimdemo.ORDERS.
    • For Output to, select New output.
    • In the New output field, enter stream_CloudSQLMySQLCDCLoad.
    • Click Save.

Load new transactions into BigQuery

  1. In the Striim web UI, click MysqlReader.
  2. Click Connect to next component, and then select Connect next Target component.

    • In the Name field, enter bq_cdc_target.
    • In the Adapter field, enter BigQueryWriter.
    • The Tables property is a list of source/target pairs. Within each pair, the source and target are separated by a comma, and pairs are separated by semicolons: srcSchema1.srcTable1,tgtSchema1.tgtTable1;srcSchema2.srcTable2,tgtSchema2.tgtTable2. For this tutorial, use striimdemo.ORDERS,striimdemo.orders.

    • The Service Account Key property requires the fully qualified path and name of the key file that you previously generated. For this tutorial, enter /opt/striim/striim-bq-key.json.

    • In the Project ID field, enter your Cloud project ID.

  3. Click Save.

  4. To deploy the app and preview the data flow, do the following:

    • Click Created, and then select Deploy App.
    • In the Deployment window, select Default, and then click Deploy.
    • To preview your data as it flows through the Striim pipeline, click MysqlReader, and then click Preview on run.
    • Click Deployed, and then click Start App.
  5. In Cloud Shell, connect to your Cloud SQL for MySQL instance:

    gcloud sql connect $CSQL_NAME --user=$CSQL_USERNAME
    
  6. Connect to your database and load new transactions into it:

    USE striimdemo;
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1011, 1568928576017, 'In-Store', 1011, 9, 13879.56, 320, 88252);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1012, 1568928636017, 'CompanyA', 1012, 1, 19729.99, 76, 95203);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1013, 1568928696017, 'In-Store', 1013, 5, 7286.68, 164, 45162);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1014, 1568928756017, 'Online', 1014, 1, 87268.61, 909, 70407);
    INSERT INTO ORDERS (ORDER_ID, ORDER_DATE, ORDER_MODE, CUSTOMER_ID, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID) VALUES (1015, 1568928816017, 'CompanyB', 1015, 1, 69744.13, 424, 79401);
    
  7. In the Striim web UI, the Transactions view page now populates with transactions, showing that data is flowing.

    Transactions view page showing data flowing through the pipeline.

  8. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  9. Click the striimdemo database.

  10. To verify that your data is successfully replicated, in the query editor, enter the following query, and then click Run:

    SELECT COUNT(*) AS ORDERS, AVG(ORDER_TOTAL) AS ORDERS_AVE, SUM(ORDER_TOTAL) AS ORDERS_SUM FROM striimdemo.orders;

    It can take up to 90 seconds for the transactions to fully replicate to BigQuery due to the default configuration settings. After the new rows are replicated, the results table shows 15 orders, with an average order value of about 41959.90 and a total of about 629398.54.

    Data tab showing transactions replicated to target.
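To derive the expected aggregates after the CDC rows land, you can again recompute them locally, this time over the ten initial values plus the five new ones. This is a sketch, not part of the tutorial:

```shell
# Recompute COUNT, AVG, and SUM over all fifteen ORDER_TOTAL values
# (ten from the initial load plus the five CDC inserts above).
totals='34672.59 28133.14 37367.95 7737.02 15959.91
82531.55 52929.61 26912.56 97706.08 47539.16
13879.56 19729.99 7286.68 87268.61 69744.13'
printf '%s\n' $totals | awk '{ s += $1; n++ } END { printf "orders=%d avg=%.2f sum=%.2f\n", n, s / n, s }'
```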

Congratulations, you have successfully set up a streaming replication pipeline from Cloud SQL for MySQL to BigQuery.

Clean up

The easiest way to eliminate billing is to delete the Google Cloud project that you created for the tutorial. Alternatively, you can delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next