Change schemas in a SQL Server replication job

This page describes how to change schemas in Cloud Data Fusion when you replicate data from a Microsoft SQL Server database to a BigQuery table. To ensure the replication job continues to replicate data from a SQL Server source after the schema changes, perform these steps.

Change the schema

Go to your instance:
1. In the Google Cloud console, go to the Cloud Data Fusion page.
2. To open the instance in the Cloud Data Fusion Studio, click Instances, and then click View instance.
  
  Go to Instances
In the Cloud Data Fusion web interface, click Replication.
Stop the replication job.
Apply a supported schema change to the source table, such as Add nullable column.
Create a new capture table using sys.sp_cdc_enable_table procedure with a unique value for parameter @capture_instance.
Apply the same schema change on the target table in BigQuery.
Restart the replication job.
Optional: When the job starts streaming new changes that occur after you stop the job, you can drop the old capture table using the sys.sp_cdc_disable_table stored procedure. Set the parameter @capture_instance to the old capture instance name.

After you apply a schema change to the source table and before you create a new capture table (as you did in the previous steps), all changes are captured by the old capture with the old schema. If you added a column, any change event during that time doesn't contain data for the new column. If your application cannot handle such a transition, follow these steps:

Suspend all applications generating database records.
Ensure that the replication job has processed all existing events.
Stop the replication job.
Apply the schema change to the source table, such as Add nullable column.
Create a new capture table using the sys.sp_cdc_enable_table procedure with a unique value for parameter @capture_instance.
Resume the applications you suspended in the first step.
Apply the same schema change on the target table in BigQuery.
Restart the replication job.
Optional: When the job starts streaming new changes that occur after you created the new capture table in the previous step, you can drop the old capture table using the sys.sp_cdc_disable_table stored procedure. Set the parameter @capture_instance to the old capture instance name.

What's next

Learn more about Replication.
Refer to the supported schema changes.
Refer to the Replication API.