Cloud Composer synchronizes specific folders in your environment's bucket to Airflow components that run in your environment. See Data stored in Cloud Storage for more information. This page describes issues that can disrupt the synchronization process and how to troubleshoot them.
Common Issues
The following sections describe symptoms and potential fixes for some common file synchronization issues.
Handling a large number of DAGs and plugins in dags and plugins folders
Contents of the /dags and /plugins folders are synchronized from your environment's bucket to the local file systems of Airflow workers and schedulers.
The more data stored in these folders, the longer it takes to perform the synchronization. To address such situations:
- Limit the number of files in the /dags and /plugins folders. Store only the minimum of required files.
- Increase the disk space available to Airflow schedulers and workers.
- Increase CPU and memory of Airflow schedulers and workers, so that the sync operation is performed faster.
- In case of a very large number of DAGs, divide DAGs into batches, compress them into zip archives, and deploy these archives into the /dags folder. This approach speeds up the DAGs syncing process. Airflow components extract zip archives before processing DAGs.
- Generating DAGs in a programmatic way might also be a method for limiting the number of DAG files stored in the /dags folder. See the Programmatic DAGs section in the DAGs Troubleshooting page to avoid problems with scheduling and executing DAGs generated programmatically.
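The batching approach from the list above can be sketched as follows. This is a minimal illustration, not a Cloud Composer tool: the function name zip_dags_in_batches, the archive naming scheme, and the default batch size of 100 are assumptions made for this example. The sketch packs only top-level .py files and stores each one at the root of its archive, following Airflow's packaged DAG convention.

```python
import zipfile
from pathlib import Path


def zip_dags_in_batches(dag_dir, out_dir, batch_size=100):
    """Pack top-level .py files from dag_dir into zip archives.

    Each archive holds at most batch_size files. Returns the list of
    archive paths that were written. All names here are hypothetical.
    """
    dag_files = sorted(Path(dag_dir).glob("*.py"))
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    archives = []
    for start in range(0, len(dag_files), batch_size):
        batch = dag_files[start:start + batch_size]
        archive = out_path / f"dags_batch_{start // batch_size:03d}.zip"
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for dag_file in batch:
                # Store each DAG file at the root of the archive.
                zf.write(dag_file, arcname=dag_file.name)
        archives.append(archive)
    return archives
```

The resulting archives would then be uploaded to the /dags folder in place of the individual files. DAGs that import helper modules need those modules inside the same archive, which this sketch does not handle.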
Anti-patterns impacting DAGs and plugins syncing to schedulers, workers and web servers
Cloud Composer synchronizes the content of the /dags and /plugins folders to schedulers and workers. Certain objects in the /dags and /plugins folders might prevent this synchronization from working correctly or slow it down.
- The /dags folder is synchronized to schedulers and workers. This folder is not synchronized to the web server.
- The /plugins folder is synchronized to schedulers, workers and web servers.
You might encounter the following issues:
- You uploaded gzip-compressed files that use compression transcoding to the /dags and /plugins folders. This usually happens if you use the --gzip-local-all flag in a gcloud storage cp command to upload data to the bucket.
  Solution: Delete the object that used compression transcoding and re-upload it to the bucket.
- One of the objects is named "." (a single period). Such an object is not synchronized to schedulers and workers, and it might stop the synchronization altogether.
  Solution: Rename the object.
- A folder and a DAG Python file have the same name, for example a.py. In this case, the DAG file is not properly synchronized to Airflow components.
  Solution: Remove the folder that has the same name as the DAG Python file.
- One of the objects in the /dags or /plugins folders contains a / symbol at the end of the object's name. Such objects can interfere with the synchronization process because the / symbol means that an object is a folder, not a file.
  Solution: Remove the / symbol from the name of the problematic object.
- The /dags and /plugins folders contain unnecessary files. Sometimes DAGs and plugins that you implement come with additional files, such as files that store tests for these components. These files are also synchronized, which increases the time needed to copy data to schedulers, workers and web servers.
  Solution: Don't store any additional and unnecessary files in the /dags and /plugins folders.
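Three of the object-level issues above (gzip content encoding, an object named ".", and a trailing / in the name) can be checked before upload or against a bucket listing. The following is a hedged sketch only: find_sync_antipatterns is a hypothetical helper, and it assumes you have already collected each object's name and Content-Encoding metadata into a dictionary by whatever means you prefer. It does not call Cloud Storage itself.

```python
def find_sync_antipatterns(objects):
    """Return sorted (object_name, reason) pairs for objects matching a listed issue.

    `objects` maps an object name, such as "dags/a.py", to the value of its
    Content-Encoding metadata, or None when the object has none.
    This helper is illustrative and not part of any Google Cloud library.
    """
    findings = []
    for name, content_encoding in objects.items():
        # Objects uploaded with gzip content encoding use compression transcoding.
        if content_encoding == "gzip":
            findings.append((name, "stored with gzip Content-Encoding"))
        if name.endswith("/"):
            findings.append((name, "name ends with '/'"))
        elif name.split("/")[-1] == ".":
            findings.append((name, "object is named '.'"))
    return sorted(findings)
```

For example, passing {"dags/b.py": "gzip", "dags/.": None} reports both entries, each with its reason. Treat the results as candidates to review manually rather than as a definitive diagnosis.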
Workers and DAG processors generate 'Is a directory' or 'Is a file' errors
This problem happens because objects can have overlapping namespaces in Cloud Storage, while the Airflow components of your environment use conventional Linux file systems. In Cloud Storage, it is possible to add both a folder and an object with the same name to a bucket. When the bucket is synchronized to the environment's Airflow components, an error is generated:
- If the object was synchronized first, then contents of the folder won't be synchronized and the 'Is a file' error is generated.
- If the folder was synchronized first, then the object isn't synchronized and the 'Is a directory' error is generated.
Example:
Done [Errno 21] Is a directory: '/home/airflow/gcs/dags/...'
Both of these errors can lead to task failures because of missing files. In some cases, the synchronization process can be disrupted and other objects in the environment's bucket won't be synchronized either.
Solution:
To fix this problem, make sure that there are no overlapping namespaces in the environment's bucket.
For example, if both /dags/misc (an object) and /dags/misc/example_file.txt (another object) are in a bucket, rename either the misc object or the folder where example_file.txt is located, so that there's no overlap.
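One way to look for such overlaps is to compare every object name against the parent paths of all other objects. The sketch below assumes you already have a flat list of object names from the bucket. The function name find_namespace_overlaps is hypothetical, and the code works purely on strings without contacting Cloud Storage.

```python
def find_namespace_overlaps(object_names):
    """Return sorted names that are both an object and a folder of another object.

    Illustrative helper only: it takes plain object names such as
    "dags/misc/example_file.txt" and performs no Cloud Storage calls.
    """
    names = set(object_names)
    overlaps = set()
    for name in names:
        parts = name.split("/")
        # Check every proper parent path, for example "dags" and "dags/misc".
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            if parent in names:
                overlaps.add(parent)
    return sorted(overlaps)
```

With the names from the example above, the function returns dags/misc, which is the object to rename or the folder to move.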