Isolation levels in replication

This page describes how to enable transaction isolation in Microsoft SQL Server and MySQL database snapshots in Cloud Data Fusion replication jobs.

When you set up a replication job for a database, the job takes an initial snapshot of the source tables. To ensure data consistency, place locks on those tables.

After the initial snapshot, incremental changes at the source are captured and applied to the BigQuery target as part of the ongoing replication process.

SQL Server

To capture changes in the source tables in a SQL Server database, the replication job uses a Debezium connector. During the snapshotting phase, Debezium acquires locks according to the configured snapshot.isolation.mode.

The following table compares the supported isolation modes for replication jobs.

Isolation mode Acquired locks Data consistency
read_uncommitted None No.
read_committed Shared locks on a batch of rows at a time Partial. An added record can appear twice: once in the initial snapshot and once in the streaming phase.
repeatable_read
(default)
Shared locks on all rows Partial. An added record can appear twice: once in the initial snapshot and once in the streaming phase.
snapshot None Full.
exclusive Exclusive lock on all tables Full.

For more information about isolation modes, see Set transaction isolation level.

By default, the snapshot isolation mode is repeatable_read. This mode takes shared locks on all data that's read during the snapshotting phase. It prevents other transactions from modifying the existing rows, and can potentially allow insertion of new records (see lock escalation).

Replication with snapshot isolation is recommended if it's already enabled on the source database because it provides full data consistency without locking the tables. If it's not enabled, learn more about the impact of row versioning-based isolation levels in the SQL Server Database Engine before you enable it.

As an alternative, use the read_committed isolation mode, which doesn't lock the tables during the snapshotting phase.

Enable snapshot isolation in a replication job

  1. Enable snapshot isolation in the SQL Server database:

    ALTER DATABASE DATABASE_NAME
    SET ALLOW_SNAPSHOT_ISOLATION ON
    

    Replace DATABASE_NAME with the name of the SQL Server database.

  2. Set the runtime argument snapshot.isolation.mode to snapshot. For more information, see Pass a runtime argument to a replication job.

MySQL

To capture changes in the source tables in a MySQL database, the replication job uses a Debezium connector. During the snapshotting phase, Debezium acquires locks according to the configured snapshot.locking.mode.

By default, the snapshot locking mode is minimal. In this mode, the connector holds the global read lock for the initial portion of the snapshot as it reads the database schemas and other metadata. Then the connector fetches all rows through a consistent read, using the REPEATABLE READ transaction, which doesn't lock the tables.

To prevent any locks, set the mode to none.

As an alternative, to prevent locks on MySQL databases running on Cloud SQL, replicate from the Replica instead of the transactional database.

Change the locking behavior during the snapshot for MySQL

  • To change snapshot locking behavior in the MySQL database, set the runtime argument, snapshot.locking.mode property, to an appropriate locking mode value.

For more information, see Pass a Debezium argument to a replication job.

Limitations

  • Replication in Cloud Data Fusion supports Debezium Connector version 1.3.

Oracle sources in Cloud Data Fusion

Replication from Oracle sources in Cloud Data Fusion is powered by Datastream. Datastream doesn't lock tables.

What's next