This page describes how to enable transaction isolation in Microsoft SQL Server and MySQL database snapshots in Cloud Data Fusion replication jobs.
When you set up a replication job for a database, the job takes an initial snapshot of the source tables. To ensure data consistency, place locks on those tables.
After the initial snapshot, incremental changes at the source are captured and applied to the BigQuery target as part of the ongoing replication process.
SQL Server
To capture changes in the source tables in a SQL Server database, the
replication job uses a Debezium connector. During the
snapshotting
phase,
Debezium acquires locks according to the configured
snapshot.isolation.mode
.
The following table compares the supported isolation modes for replication jobs.
Isolation mode | Acquired locks | Data consistency |
---|---|---|
read_uncommitted |
None | No. |
read_committed |
Shared locks on a batch of rows at a time | Partial. An added record can appear twice: once in the initial snapshot and once in the streaming phase. |
repeatable_read (default) |
Shared locks on all rows | Partial. An added record can appear twice: once in the initial snapshot and once in the streaming phase. |
snapshot |
None | Full. |
exclusive |
Exclusive lock on all tables | Full. |
For more information about isolation modes, see Set transaction isolation level.
By default, the snapshot isolation mode is repeatable_read
. This mode takes
shared locks on all data that's read during the snapshotting phase. It
prevents other transactions from modifying the existing rows, and can
potentially allow insertion of new records (see
lock escalation).
Replication with snapshot isolation is recommended if it's already enabled on the source database because it provides full data consistency without locking the tables. If it's not enabled, learn more about the impact of row versioning-based isolation levels in the SQL Server Database Engine before you enable it.
As an alternative, use the read_committed
isolation mode, which
doesn't lock the tables during the snapshotting phase.
Enable snapshot isolation in a replication job
Enable snapshot isolation in the SQL Server database:
ALTER DATABASE DATABASE_NAME SET ALLOW_SNAPSHOT_ISOLATION ON
Replace
DATABASE_NAME
with the name of the SQL Server database.Set the runtime argument
snapshot.isolation.mode
tosnapshot
. For more information, see Pass a runtime argument to a replication job.
MySQL
To capture changes in the source tables in a MySQL database, the
replication job uses a Debezium connector. During the
snapshotting
phase,
Debezium acquires locks according to the configured
snapshot.locking.mode
.
By default, the snapshot locking mode is minimal
. In this mode, the
connector holds the global read lock for the initial portion of the snapshot
as it reads the database schemas and other metadata. Then the connector
fetches all rows through a consistent read, using the REPEATABLE READ
transaction, which doesn't lock the tables.
To prevent any locks, set the mode to none
.
As an alternative, to prevent locks on MySQL databases running on Cloud SQL, replicate from the Replica instead of the transactional database.
Change the locking behavior during the snapshot for MySQL
- To change snapshot locking behavior in the MySQL database, set the runtime
argument,
snapshot.locking.mode
property, to an appropriate locking mode value.
For more information, see Pass a Debezium argument to a replication job.
Limitations
- Replication in Cloud Data Fusion supports Debezium Connector version 1.3.
Oracle sources in Cloud Data Fusion
Replication from Oracle sources in Cloud Data Fusion is powered by Datastream. Datastream doesn't lock tables.
What's next
- Learn more about Replication.