After you deploy a replication job, you cannot edit or add tables to
it. Instead, add the tables to a new or duplicate replication job.
Option 1: Create a new replication job
Adding the tables to a new job is the simplest approach. It avoids
reloading the historical data for all of the existing tables and prevents
data inconsistency issues.
The drawbacks are the increased overhead of managing multiple
replication jobs and the consumption of more compute resources, as
each job runs on a separate ephemeral Dataproc cluster by
default. The latter can be mitigated to some extent by using a shared static
Dataproc cluster for both jobs.
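One way to share a static cluster is to point both replication jobs at the same compute profile. The following sketch builds the runtime-preferences payload that selects a compute profile through the CDAP REST API that backs Cloud Data Fusion; the `system.profile.name` key format and the profile name are assumptions to verify against your instance, not a definitive recipe:

```python
# Hypothetical sketch: select a shared static Dataproc cluster for a
# replication job by setting its compute-profile runtime preference.
# The preference key, the "scope:name" format, and the profile name are
# assumptions; check your instance's compute profiles before using this.
import json


def profile_preferences(profile_name: str, scope: str = "USER") -> dict:
    """Build a preferences payload that selects a compute profile."""
    return {"system.profile.name": f"{scope}:{profile_name}"}


payload = json.dumps(profile_preferences("shared-static-cluster"))
# A real call would PUT this payload to the job's preferences endpoint,
# for example (placeholder path):
# PUT {api}/v3/namespaces/default/apps/{replication_job}/preferences
```

Applying the same payload to both jobs would make them run on the cluster that the shared profile describes instead of provisioning separate ephemeral clusters.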
Option 2: Stop the current replication job and create a duplicate
If you duplicate the replication job to add the tables, consider the
following:
- Enabling the snapshot for the duplicate job results in a historical load of
  all the tables from scratch. This approach is recommended if you can't use
  the previous option of running separate jobs.
- Disabling the snapshot to prevent the historical load can result in data
  loss, because events can be missed between the time the old pipeline stops
  and the new one starts. Creating an overlap between the two jobs to mitigate
  this issue isn't recommended, because it can also result in data loss:
  historical data for the new tables isn't replicated. Running both jobs
  against the same target BigQuery dataset can also cause data inconsistency.
To create a duplicate replication job, follow these steps:

1. Stop the existing pipeline.
2. On the Replication jobs page, locate the job that you want to duplicate,
   click more_vert, and then click Duplicate.
3. Enable the snapshot:
   1. Go to Configure source.
   2. In the Replicate existing data field, select Yes.
4. Add the tables in the Select tables and transformations window, and then
   follow the wizard to deploy the replication pipeline.
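Step 1 can also be done programmatically. The sketch below builds the program-lifecycle stop URL exposed by the CDAP REST API that backs Cloud Data Fusion; the endpoint shape, the `workers` program type, and the `DeltaWorker` program name are assumptions to verify against your instance:

```python
# Hypothetical sketch: stop a replication pipeline through the CDAP
# program-lifecycle API. URL shape, program type, and program name are
# assumptions; the endpoint and app name below are placeholders.


def stop_program_url(api_endpoint: str, namespace: str, app: str,
                     program_type: str, program: str) -> str:
    """Build the assumed CDAP 'stop program' URL for a pipeline."""
    return (f"{api_endpoint}/v3/namespaces/{namespace}/apps/{app}"
            f"/{program_type}/{program}/stop")


url = stop_program_url(
    "https://example-instance.example.com/api",  # placeholder endpoint
    "default", "orders-replication", "workers", "DeltaWorker")
# A real call would POST to this URL with an OAuth 2.0 bearer token, e.g.:
# requests.post(url, headers={"Authorization": f"Bearer {token}"})
```

After the stopped job's duplicate is deployed, the same lifecycle endpoint with `start` in place of `stop` would start the new pipeline.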
Last updated 2025-08-07 UTC.