Stream concurrency controls

This page provides an overview of stream concurrency controls, such as the maximum number of concurrent change data capture (CDC) tasks and backfill tasks. You can control stream performance by increasing or decreasing the values of these parameters.

Concurrency controls overview

Concurrency controls let you trade throughput against load on the source database. If you need higher throughput and the database can absorb more load, increase the concurrency of CDC and backfill tasks. Conversely, if the database is already under heavy load and you want to protect it from being overloaded, reduce the values of these parameters.

Maximum number of CDC tasks

The maxConcurrentCdcTasks parameter lets you control the number of CDC tasks that a stream runs in parallel. To increase CDC throughput, raise the value of this parameter so that Datastream can process more CDC log files at the same time.

The key characteristics of the parameter include:

  • The default value is 5. You can set this parameter to any value between 1 and 50, inclusive.
  • The parameter is applicable only to Oracle and MySQL sources.
  • The parameter has an effect only if more database log files are available to read than there are CDC tasks. The log file settings are controlled by the source database configuration parameters: the maximum log file size and the maximum log rotation time interval. For more information about these parameters, refer to the Oracle and MySQL documentation.
  • If you decrease the number of concurrent CDC tasks, Datastream might lag behind the database logs, which might eventually lead to log position loss and stream failure.
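
The condition under which the CDC limit matters can be illustrated with a deliberately simplified model. The sketch below assumes that each CDC task reads one log file at a time, which is an illustrative assumption rather than a description of Datastream's actual scheduler. The function names are hypothetical.

```python
def effective_cdc_parallelism(max_concurrent_cdc_tasks: int,
                              available_log_files: int) -> int:
    """Return how many log files are read in parallel under the simplified model.

    Assumption (not from the Datastream documentation): one task reads
    exactly one log file at a time.
    """
    if max_concurrent_cdc_tasks < 1:
        raise ValueError("max_concurrent_cdc_tasks must be at least 1")
    if available_log_files < 0:
        raise ValueError("available_log_files cannot be negative")
    # Parallelism is capped by whichever is smaller: the task limit
    # or the number of log files that are waiting to be read.
    return min(max_concurrent_cdc_tasks, available_log_files)


def raising_limit_helps(current_limit: int, available_log_files: int) -> bool:
    """True only when more log files are waiting than there are CDC tasks."""
    return available_log_files > current_limit


# With the default limit of 5 and only 3 log files, raising the limit changes nothing.
print(effective_cdc_parallelism(5, 3), raising_limit_helps(5, 3))
# With 12 log files waiting, the limit of 5 is the bottleneck.
print(effective_cdc_parallelism(5, 12), raising_limit_helps(5, 12))
```

Under this model, a higher limit is useful only while the source database produces log files faster than the current number of tasks can consume them.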

Maximum number of backfill tasks

The maxConcurrentBackfillTasks parameter lets you control the number of backfill tasks that a stream can run in parallel. You can increase or decrease this value to control the backfill throughput.

The key characteristics of the parameter include:

  • The default value is 15. You can set this parameter to any value between 1 and 50, inclusive.
  • Increasing the backfill concurrency carries a high risk, because backfill tasks have a significant impact on database performance. Each backfill task runs an unfiltered SELECT query on a table, and for large tables, such queries return a large number of rows.
  • Decreasing the backfill concurrency has no negative impact on the source database. The only cost is that the backfill takes longer to complete.
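
The defaults and allowed ranges listed on this page for both parameters can be checked on the client side before a value is sent anywhere. The following is a minimal sketch. The ConcurrencySettings class and its field names are hypothetical helpers, not part of any Datastream client library. Only the numeric limits and defaults come from this page.

```python
from dataclasses import dataclass

# Values taken from this page: both parameters accept 1 to 50, inclusive.
MIN_TASKS = 1
MAX_TASKS = 50
DEFAULT_CDC_TASKS = 5
DEFAULT_BACKFILL_TASKS = 15


@dataclass(frozen=True)
class ConcurrencySettings:
    """Hypothetical container for the two concurrency control values."""
    max_concurrent_cdc_tasks: int = DEFAULT_CDC_TASKS
    max_concurrent_backfill_tasks: int = DEFAULT_BACKFILL_TASKS

    def __post_init__(self) -> None:
        for name in ("max_concurrent_cdc_tasks", "max_concurrent_backfill_tasks"):
            value = getattr(self, name)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} must be an integer")
            if not MIN_TASKS <= value <= MAX_TASKS:
                raise ValueError(
                    f"{name} must be between {MIN_TASKS} and {MAX_TASKS}, got {value}"
                )


# Lower the backfill concurrency to reduce load on a busy source database.
settings = ConcurrencySettings(max_concurrent_backfill_tasks=4)
print(settings)
```

Validating early keeps an out-of-range value from reaching the API, where it would otherwise surface as a request error.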

Change the values of concurrency controls

You can change the values of concurrency control parameters using the Datastream API.
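
As a rough illustration, the sketch below builds the URL and JSON body for a PATCH request that updates these parameters on an existing stream, without sending anything. It assumes that the parameters live under sourceConfig.mysqlSourceConfig or sourceConfig.oracleSourceConfig in the stream resource and that the request uses an updateMask query parameter. Verify both against the Datastream API reference before relying on them. The function name and the project, location, and stream values are hypothetical.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://datastream.googleapis.com/v1"

# Assumed mapping from source type to the stream's source config field.
SOURCE_CONFIG_FIELDS = {
    "mysql": "mysqlSourceConfig",
    "oracle": "oracleSourceConfig",
}


def build_concurrency_patch(project, location, stream, source_type,
                            max_cdc_tasks=None, max_backfill_tasks=None):
    """Return (url, body) for a PATCH request that changes concurrency controls."""
    config_field = SOURCE_CONFIG_FIELDS[source_type]

    values = {}
    if max_cdc_tasks is not None:
        values["maxConcurrentCdcTasks"] = max_cdc_tasks
    if max_backfill_tasks is not None:
        values["maxConcurrentBackfillTasks"] = max_backfill_tasks
    if not values:
        raise ValueError("specify at least one parameter to change")

    # The update mask names only the fields being changed (assumed field paths).
    mask = ",".join(f"sourceConfig.{config_field}.{name}" for name in values)
    url = (f"{API_ROOT}/projects/{project}/locations/{location}"
           f"/streams/{stream}?{urlencode({'updateMask': mask})}")
    body = json.dumps({"sourceConfig": {config_field: values}})
    return url, body


url, body = build_concurrency_patch(
    "my-project", "us-central1", "my-stream", "mysql", max_cdc_tasks=3)
print("PATCH", url)
print(body)
```

The resulting request can be sent with any HTTP client, together with an OAuth 2.0 bearer token in the Authorization header. Listing only the changed fields in the update mask is intended to leave the rest of the stream configuration untouched.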

What's next