This page provides an overview of stream concurrency controls, such as the maximum number of concurrent change data capture (CDC) tasks and backfill tasks. You can control stream performance by increasing or decreasing the values of these parameters.
Concurrency controls overview
By using the concurrency controls, you can either achieve faster backfill and CDC, or balance the load on the source database. If you require higher throughput and can afford a higher load on the database, you can increase the concurrency of CDC and backfill tasks. Conversely, if your database is experiencing a high load and you want to protect it from being overloaded, you can reduce the values of these parameters.
Maximum number of CDC tasks
The maxConcurrentCdcTasks parameter lets you control the number of CDC tasks that a stream runs in parallel. To increase CDC throughput, raise the value of this parameter so that Datastream can process more CDC log files at the same time.
The key characteristics of the parameter include:
- The default value is 5. You can set this parameter to any value between 1 and 50, inclusive.
- The parameter is applicable only to Oracle and MySQL sources.
- The parameter has an effect only if more database log files are available to read than there are CDC tasks. Log file settings are controlled by the source database configuration parameters: the maximum log file size and the maximum log rotation time interval. For more information about these parameters, refer to the Oracle and MySQL documentation.
- If you decrease the number of concurrent CDC tasks, Datastream might lag behind the database logs, which might eventually lead to log position loss and stream failure.
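The relationship between available log files and CDC tasks can be sketched with a simplified model. This is only an illustration, not how Datastream schedules work internally: it assumes each CDC task reads one log file at a time, and the function name `effective_cdc_parallelism` is hypothetical.

```python
MIN_TASKS = 1   # lower bound stated on this page
MAX_TASKS = 50  # upper bound stated on this page
DEFAULT_CDC_TASKS = 5  # default stated on this page


def effective_cdc_parallelism(available_log_files: int,
                              max_concurrent_cdc_tasks: int = DEFAULT_CDC_TASKS) -> int:
    """Estimate how many CDC tasks can be busy at once.

    Assumption (not from the Datastream docs): one task reads one log file
    at a time, so extra tasks beyond the number of readable files sit idle.
    """
    if not MIN_TASKS <= max_concurrent_cdc_tasks <= MAX_TASKS:
        raise ValueError(
            f"maxConcurrentCdcTasks must be between {MIN_TASKS} and {MAX_TASKS}"
        )
    if available_log_files < 0:
        raise ValueError("available_log_files cannot be negative")
    return min(available_log_files, max_concurrent_cdc_tasks)
```

Under this model, raising the parameter from 5 to 10 changes nothing when only 3 log files are waiting to be read, which is why the log file size and rotation settings on the source matter as well.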
Maximum number of backfill tasks
The maxConcurrentBackfillTasks parameter lets you control the number of backfill tasks that a stream can run in parallel. You can increase or decrease this value to control the backfill throughput.
The key characteristics of the parameter include:
- The default value is 15. You can set this parameter to any value between 1 and 50, inclusive.
- There is a high risk associated with increasing the backfill concurrency, because backfill tasks have a significant impact on database performance. Each backfill task runs an unfiltered SELECT query on a table, and for large tables, such queries return a large number of rows.
- Decreasing the backfill concurrency has no negative impact on the source database. The only cost is that the backfill takes longer to complete.
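The trade-off described in the list above can be approximated with a back-of-the-envelope calculation. The sketch below is a rough model under stated assumptions, not Datastream's actual scheduler: it assumes one backfill task per table at a time and tables of similar size, and the function name `backfill_load_estimate` is hypothetical.

```python
import math

DEFAULT_BACKFILL_TASKS = 15  # default stated on this page


def backfill_load_estimate(table_count: int,
                           max_concurrent_backfill_tasks: int = DEFAULT_BACKFILL_TASKS):
    """Return (peak_concurrent_selects, waves) for a backfill.

    Assumptions (illustrative only): each task scans one table with a single
    unfiltered SELECT, and all tables take about the same time to scan.
    """
    if not 1 <= max_concurrent_backfill_tasks <= 50:
        raise ValueError("maxConcurrentBackfillTasks must be between 1 and 50")
    if table_count < 0:
        raise ValueError("table_count cannot be negative")
    # At most this many full-table SELECT queries hit the source at once.
    peak_concurrent_selects = min(table_count, max_concurrent_backfill_tasks)
    # Number of sequential rounds needed to cover every table.
    waves = math.ceil(table_count / max_concurrent_backfill_tasks)
    return peak_concurrent_selects, waves
```

For 40 tables, the default of 15 gives a peak of 15 concurrent queries over 3 waves, while a value of 5 gives a peak of 5 concurrent queries over 8 waves: a lighter load on the source in exchange for a longer backfill.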
Change the values of concurrency controls
You can change the values of concurrency control parameters using the Datastream API.
- To learn how to increase or decrease the number of concurrent CDC tasks, see Change the number of maximum concurrent CDC tasks.
- To learn how to increase or decrease the number of concurrent backfill tasks, see Change the number of maximum concurrent backfill tasks.
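As a rough illustration of what such an API call might look like, the sketch below only builds the URL and JSON body for a PATCH request on a stream; it does not send anything or handle authentication. The helper name `build_concurrency_patch` is hypothetical, and the field paths (for example sourceConfig.mysqlSourceConfig.maxConcurrentCdcTasks) are assumptions that should be checked against the linked pages and the Stream resource reference before use.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://datastream.googleapis.com/v1"
CONCURRENCY_PARAMETERS = ("maxConcurrentCdcTasks", "maxConcurrentBackfillTasks")


def build_concurrency_patch(project: str, location: str, stream_id: str,
                            source_config_field: str, parameter: str, value: int):
    """Build (url, json_body) for a stream update that changes one parameter.

    source_config_field is the source-specific config object, assumed here to
    be a name such as "mysqlSourceConfig" or "oracleSourceConfig".
    """
    if parameter not in CONCURRENCY_PARAMETERS:
        raise ValueError(f"unknown parameter: {parameter}")
    if not 1 <= value <= 50:
        raise ValueError(f"{parameter} must be between 1 and 50")
    # The update mask names the single field being changed.
    field_path = f"sourceConfig.{source_config_field}.{parameter}"
    url = (
        f"{API_ROOT}/projects/{project}/locations/{location}"
        f"/streams/{stream_id}?" + urlencode({"updateMask": field_path})
    )
    body = {"sourceConfig": {source_config_field: {parameter: value}}}
    return url, json.dumps(body)


url, body = build_concurrency_patch(
    "my-project", "us-central1", "my-stream",
    "mysqlSourceConfig", "maxConcurrentCdcTasks", 3,
)
print(url)
print(body)
```

Restricting the update mask to one field keeps the request from touching any other part of the stream configuration. The resulting URL and body would be sent as an authenticated PATCH request with whatever HTTP client you normally use.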
What's next
- See managing streams to learn more about how to use the Datastream API.
- See the Datastream API reference documentation to learn more about the Stream resource.