Data Stored in Cloud Storage



This page describes what data Cloud Composer stores for your environment in Cloud Storage.

When you create an environment, Cloud Composer creates a Cloud Storage bucket and associates the bucket with your environment. The name of the bucket is based on the environment region, name, and a random ID such as us-central1-b1-6efannnn-bucket.
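The naming pattern can be sketched as a small helper. This is illustrative only and not a Cloud Composer API; it simply assembles the `<region>-<environment-name>-<random-id>-bucket` pattern described above.

```python
def environment_bucket_name(region: str, env_name: str, random_id: str) -> str:
    # Illustrative helper: builds a name following the
    # <region>-<environment-name>-<random-id>-bucket pattern.
    # Cloud Composer generates this name itself when it creates the bucket.
    return f"{region}-{env_name}-{random_id}-bucket"

print(environment_bucket_name("us-central1", "b1", "6efannnn"))
# us-central1-b1-6efannnn-bucket
```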

Cloud Composer stores the source code for your workflows (DAGs) and their dependencies in specific folders in Cloud Storage and uses Cloud Storage FUSE to map the folders to the Airflow instances in your Cloud Composer environment.

Folders in the Cloud Storage bucket

| Folder | Storage path | Mapped directory | Description |
| --- | --- | --- | --- |
| DAGs | gs://bucket-name/dags | /home/airflow/gcs/dags | Stores the DAGs for your environment. Only the DAGs in this folder are scheduled for your environment. |
| Plugins | gs://bucket-name/plugins | /home/airflow/gcs/plugins | Stores your custom plugins, such as custom in-house Airflow operators, hooks, sensors, or interfaces. |
| Data | gs://bucket-name/data | /home/airflow/gcs/data | Stores the data that tasks produce and use. This folder is mounted on all worker nodes. |
| Logs | gs://bucket-name/logs | | Stores the Airflow logs for tasks. Logs are also available in the Airflow web interface. |
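The storage-path-to-mapped-directory relationship in the table can be expressed as a simple translation. The following sketch is illustrative only (the actual mapping is done by Cloud Storage FUSE, not by your code); `mounted_path` is a hypothetical helper name.

```python
def mounted_path(gcs_path: str, bucket_name: str) -> str:
    """Translate a gs://<bucket>/<folder>/... object path into the
    directory where Cloud Storage FUSE exposes it inside the Airflow
    instances, following the table above. Illustrative only."""
    prefix = f"gs://{bucket_name}/"
    if not gcs_path.startswith(prefix):
        raise ValueError(f"{gcs_path} is not in bucket {bucket_name}")
    return "/home/airflow/gcs/" + gcs_path[len(prefix):]

print(mounted_path("gs://bucket-name/dags/my_dag.py", "bucket-name"))
# /home/airflow/gcs/dags/my_dag.py
```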

Capacity considerations

DAGs and plugins: By default, Cloud Composer provisions 100 GB of capacity for your environment, including the dags/ and plugins/ folders.

To avoid a workflow failure, store your DAGs, plugins, and Python modules in the dags/ or plugins/ folders—even if your Python modules do not contain DAGs or plugins. For example, you should store the py_file that a DataFlowPythonOperator references in dags/ or plugins/.
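The placement rule above can be sketched as a check on a file's mounted path. This is a hypothetical helper, not part of Cloud Composer; the directory constants follow the mapping table earlier in this page.

```python
import os

# Mapped directories from the table above: only files under these two
# folders are copied to every Airflow node.
DAGS_DIR = "/home/airflow/gcs/dags"
PLUGINS_DIR = "/home/airflow/gcs/plugins"

def is_synced_to_workers(path: str) -> bool:
    """Return True if `path` lives under dags/ or plugins/, the folders
    whose contents Cloud Composer distributes to all Airflow nodes.
    Illustrative only."""
    path = os.path.normpath(path)
    return any(
        path == folder or path.startswith(folder + os.sep)
        for folder in (DAGS_DIR, PLUGINS_DIR)
    )

print(is_synced_to_workers("/home/airflow/gcs/dags/utils/helpers.py"))  # True
print(is_synced_to_workers("/home/airflow/gcs/data/input.csv"))         # False
```

A shared Python module passes this check only when it sits in dags/ or plugins/, which is why a py_file stored elsewhere can cause a workflow failure.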

Data and logs: The data/ folder and logs/ folder are not subject to capacity limits.

To avoid a webserver error, make sure that any data the webserver needs to parse a DAG (not to run it) is available in the dags/ folder. Otherwise, the webserver can't access the data, and the Airflow web interface might fail to load.

Data synchronization

When you modify DAGs or plugins in the Cloud Storage bucket, Cloud Composer synchronizes the data across all the nodes in the cluster.

Cloud Composer synchronizes the dags/ and plugins/ folders unidirectionally by copying them to the local storage of each node. Unidirectional syncing means that local changes in these folders are overwritten.

The data/ and logs/ folders synchronize bi-directionally by using Cloud Storage FUSE.

Data is not synchronized to the webserver because of limited capacity and because the webserver parses but doesn't run DAGs in a Cloud Composer environment. The workers run the DAGs.
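The synchronization rules in this section can be summarized in one place. The mapping below is only a restatement of the behavior described above, not a configuration that Cloud Composer exposes.

```python
# Summary of the per-folder synchronization behavior described above.
# Illustrative only; this is not a Cloud Composer setting.
FOLDER_SYNC = {
    "dags/":    "one-way copy, bucket to Airflow nodes (local changes overwritten)",
    "plugins/": "one-way copy, bucket to Airflow nodes (local changes overwritten)",
    "data/":    "two-way via Cloud Storage FUSE (not synchronized to the webserver)",
    "logs/":    "two-way via Cloud Storage FUSE (not synchronized to the webserver)",
}

for folder, behavior in FOLDER_SYNC.items():
    print(f"{folder:10} {behavior}")
```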

What's next