This page describes what data Cloud Composer stores for your environment in Cloud Storage.
When you create an environment, Cloud Composer creates a
Cloud Storage bucket and associates the bucket
with your environment. The name of the bucket is based on the environment region,
the environment name, and a random ID.
Cloud Composer stores the source code for your workflows (DAGs) and their dependencies in specific folders in Cloud Storage and uses Cloud Storage FUSE to map the folders to the Airflow instances in your Cloud Composer environment.
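The bucket-to-local mapping can be illustrated with a small helper. The function `fuse_path` is a hypothetical sketch, not part of any Composer or Airflow API; the `/home/airflow/gcs` mount point is the directory where Cloud Storage FUSE mounts the bucket's folders on the Airflow instances.

```python
# Hypothetical helper: translate a gs:// object path in the environment's
# bucket into the local path where Cloud Storage FUSE mounts it on the
# Airflow instances. Not part of any Composer or Airflow API.
GCS_FUSE_MOUNT = "/home/airflow/gcs"

def fuse_path(gs_url: str) -> str:
    """Map gs://<bucket>/<folder>/<object> to its FUSE-mounted local path."""
    if not gs_url.startswith("gs://"):
        raise ValueError(f"expected a gs:// URL, got {gs_url!r}")
    # Drop the scheme and the bucket name; keep the object path.
    _, _, object_path = gs_url[len("gs://"):].partition("/")
    return f"{GCS_FUSE_MOUNT}/{object_path}"

# A DAG uploaded to the bucket's dags/ folder appears to Airflow at:
# fuse_path("gs://us-central1-example-bucket/dags/my_dag.py")
#   -> "/home/airflow/gcs/dags/my_dag.py"
```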
Folders in the Cloud Storage bucket
|Folder|Storage path|Mapped directory|Description|
|---|---|---|---|
|DAGs|gs://bucket-name/dags|/home/airflow/gcs/dags|Stores the DAGs for your environment. Only the DAGs in this folder are scheduled for your environment.|
|Plugins|gs://bucket-name/plugins|/home/airflow/gcs/plugins|Stores your custom plugins, such as custom in-house Airflow operators, hooks, sensors, or interfaces.|
|Data|gs://bucket-name/data|/home/airflow/gcs/data|Stores the data that tasks produce and use. This folder is mounted on all worker nodes.|
|Logs|gs://bucket-name/logs|/home/airflow/gcs/logs|Stores the Airflow logs for tasks. Logs are also available in the Airflow web interface.|
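One way to place a DAG file into the dags/ folder is the `gcloud composer environments storage dags import` command. The sketch below only assembles that command as an argument list; the environment name, location, and file name are placeholders, and nothing is executed.

```python
# Sketch: assemble (but do not run) the gcloud command that uploads a local
# DAG file into the environment's dags/ folder. The environment name,
# location, and file name passed in below are placeholders.
def dags_import_command(environment: str, location: str, source: str) -> list:
    return [
        "gcloud", "composer", "environments", "storage", "dags", "import",
        "--environment", environment,
        "--location", location,
        "--source", source,
    ]

cmd = dags_import_command("example-environment", "us-central1", "my_dag.py")
# Run it with subprocess.run(cmd, check=True) once the placeholders are real.
```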
DAGs and plugins: By default, Cloud Composer provisions
100 GB capacity for your environment, the
dags/ folder, and the
plugins/ folder.
To avoid a workflow failure, store your DAGs, plugins, and Python modules
in the dags/ or plugins/ folders, even if your Python modules
do not contain DAGs or plugins. For example, store the
py_file that a
DataFlowPythonOperator references in dags/ or plugins/.
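A parse-time check can guard against referencing a py_file from outside dags/ or plugins/. The validator below is a hypothetical sketch, not something Airflow or the Dataflow operators provide; the `/home/airflow/gcs/...` prefixes follow the mapped directories in the table above.

```python
# Hypothetical guard: verify that a py_file referenced by an operator lives
# under the FUSE-mounted dags/ or plugins/ folders, the folders where
# workflow code belongs.
SYNCED_CODE_FOLDERS = ("/home/airflow/gcs/dags/", "/home/airflow/gcs/plugins/")

def is_synced_code_path(py_file: str) -> bool:
    """Return True if py_file is stored in dags/ or plugins/."""
    return py_file.startswith(SYNCED_CODE_FOLDERS)

# is_synced_code_path("/home/airflow/gcs/dags/jobs/wordcount.py")  -> True
# is_synced_code_path("/tmp/wordcount.py")                         -> False
```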
Data and logs: The
data/ folder and the
logs/ folder are not subject to capacity limits.
To avoid a webserver error, make sure that any data the webserver needs to
parse a DAG (not run it) is available in the
dags/ folder. Otherwise, the
webserver can't access the data or load the Airflow web interface.
When you modify DAGs or plugins in the Cloud Storage bucket, Cloud Composer synchronizes the data across all the nodes in the cluster.
Cloud Composer synchronizes the dags/ and plugins/ folders
uni-directionally by copying them locally. Unidirectional syncing means that local
changes in these folders are overwritten. The data/ and
logs/ folders synchronize
bi-directionally by using Cloud Storage FUSE.
Data is not synchronized to the webserver because of limited capacity and because the webserver parses but doesn't run DAGs in a Cloud Composer environment. The workers run the DAGs.
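The unidirectional copy for dags/ and plugins/ can be modeled as "bucket wins": the bucket's view of a folder replaces the local copy, so local-only edits disappear on the next sync. A toy sketch with dicts standing in for folder contents (this is not Composer's actual sync mechanism):

```python
# Toy model of unidirectional sync: the bucket copy replaces the local copy.
# Dicts map file names to contents; this is not Composer's real mechanism.
def sync_one_way(bucket_folder: dict, local_folder: dict) -> dict:
    """Return the new local contents after a bucket -> local sync."""
    return dict(bucket_folder)

bucket = {"my_dag.py": "v2"}
local = {"my_dag.py": "v1-edited-locally", "scratch.py": "temp"}
# After sync, the local edit and the local-only file are both gone:
local = sync_one_way(bucket, local)
# local == {"my_dag.py": "v2"}
```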