Configuring environments for DAG serialization

In a normal Cloud Composer environment, Directed Acyclic Graphs (DAGs) are processed continuously by the Airflow scheduler and web server. You can improve the reliability and performance of the Airflow web server by enabling DAG serialization, which forces the scheduler to process DAG files before they are sent to the web server.

How it works

Without DAG serialization, DAGs are processed simultaneously by the scheduler and web server, and the web server loads the entire DAG bag as soon as it starts. Enabling DAG serialization forces the scheduler to parse all DAG files before the web server starts, storing the results in a serialized DAG table. The web server then loads each DAG on demand from the table for processing. Serializing DAGs in this way reduces the web server's CPU and memory usage, especially when processing a large number of DAGs.
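The flow described above can be illustrated with a toy model. This is only a sketch of the idea, not Airflow's implementation: the in-memory dictionary stands in for the serialized DAG table, and the function names scheduler_serialize and webserver_load are hypothetical.

```python
import json

# Toy stand-in for the serialized DAG table (an assumption for illustration;
# in a real environment this is a table in the Airflow database).
serialized_dag_table = {}


def scheduler_serialize(dag_id, tasks):
    """Scheduler side: parse a DAG once and store a JSON snapshot of it."""
    serialized_dag_table[dag_id] = json.dumps({"dag_id": dag_id, "tasks": tasks})


def webserver_load(dag_id):
    """Web server side: load a single DAG on demand from the table."""
    return json.loads(serialized_dag_table[dag_id])


# The scheduler serializes every DAG up front.
scheduler_serialize("example_dag", ["extract", "transform", "load"])
scheduler_serialize("other_dag", ["start", "end"])

# The web server deserializes only the DAG it needs, without parsing DAG files.
dag = webserver_load("example_dag")
print(dag["tasks"])
```

The point of the sketch is that the web server never touches the DAG files themselves; it only reads the one snapshot it was asked for.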

Prerequisites and limitations

  • DAG serialization can only be enabled on Cloud Composer environments that use both Composer version 1.8.2 or newer and Airflow version 1.10.3 or newer. See the Cloud Composer version list for all available versions.

  • DAG serialization can't be enabled at the same time as asynchronous DAG loading.

  • Enabling DAG serialization disables all Airflow web server plugins for Cloud Composer. This doesn't affect scheduler or worker plugins, such as Airflow operators and sensors.
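The first two conditions can be checked mechanically. The following is a minimal sketch, assuming the environment's image version string has the form composer-X.Y.Z-airflow-A.B.C; that format, the function names, and the async_dag_loading_enabled flag are assumptions made for illustration.

```python
import re

MIN_COMPOSER = (1, 8, 2)   # minimum Composer version stated above
MIN_AIRFLOW = (1, 10, 3)   # minimum Airflow version stated above


def parse_image_version(image_version):
    """Split a string such as 'composer-1.8.2-airflow-1.10.3' into two version tuples."""
    match = re.fullmatch(
        r"composer-(\d+)\.(\d+)\.(\d+)-airflow-(\d+)\.(\d+)\.(\d+)", image_version
    )
    if match is None:
        raise ValueError(f"unrecognized image version: {image_version!r}")
    numbers = tuple(int(part) for part in match.groups())
    return numbers[:3], numbers[3:]


def can_enable_dag_serialization(image_version, async_dag_loading_enabled):
    """Return True when both version minimums are met and async DAG loading is off."""
    composer, airflow = parse_image_version(image_version)
    return (
        composer >= MIN_COMPOSER
        and airflow >= MIN_AIRFLOW
        and not async_dag_loading_enabled
    )


print(can_enable_dag_serialization("composer-1.8.2-airflow-1.10.3", False))
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "1.10" would sort before "1.8".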

Enabling DAG serialization

To enable DAG serialization, you must specify the following configuration parameters:

Section      Key                                   Value
core         store_serialized_dags                 True
core         store_dag_code                        True
core         min_serialized_dag_update_interval    30
scheduler    dag_dir_list_interval                 30

[core] min_serialized_dag_update_interval controls how frequently the serialized DAG is updated in the database, while [scheduler] dag_dir_list_interval controls how frequently removed DAGs are deleted from the database. We recommend setting both values to 30 seconds, because a high update frequency can negatively impact performance.

Overriding Airflow configurations

There are two ways to override Airflow configurations: in the Google Cloud Console or with the gcloud command-line tool.
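As a sketch of the command-line route, the snippet below assembles a gcloud composer environments update command from the four overrides in the table above. It assumes the --update-airflow-configs flag with keys written as section-key; the environment name my-environment and the location us-central1 are placeholders, and build_update_command is a hypothetical helper. The script only prints the command and does not run it.

```python
import shlex

# Overrides from the table above, keyed by (section, key).
SERIALIZATION_OVERRIDES = {
    ("core", "store_serialized_dags"): "True",
    ("core", "store_dag_code"): "True",
    ("core", "min_serialized_dag_update_interval"): "30",
    ("scheduler", "dag_dir_list_interval"): "30",
}


def build_update_command(environment, location, overrides):
    """Build the gcloud argument list; keys are assumed to be 'section-key=value'."""
    pairs = ",".join(
        f"{section}-{key}={value}" for (section, key), value in overrides.items()
    )
    return [
        "gcloud", "composer", "environments", "update", environment,
        "--location", location,
        f"--update-airflow-configs={pairs}",
    ]


# Placeholder environment name and location; replace with your own.
command = build_update_command("my-environment", "us-central1", SERIALIZATION_OVERRIDES)
print(shlex.join(command))
```

Keeping the overrides in one dictionary means the same helper can produce the command for disabling serialization by passing the two store_* keys with the value False.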

Disabling DAG serialization

To disable DAG serialization, use Airflow configuration overrides to set [core] store_serialized_dags and [core] store_dag_code to False.

References

To learn more about DAG serialization, read the relevant article in the Airflow documentation.