Data Stored in Cloud Storage

This page describes what data Cloud Composer stores for your environment in Cloud Storage.

When you create an environment, Cloud Composer creates a Cloud Storage bucket and associates the bucket with your environment. The bucket name is based on the environment's region and name plus a random ID, for example us-central1-b1-6efannnn-bucket.

Cloud Composer stores the source code for your workflows (DAGs) and their dependencies in specific folders in Cloud Storage and uses Cloud Storage FUSE to map the folders to the Airflow instances in your Cloud Composer environment.
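
For example, you can add a DAG to your environment by uploading it to the DAGs folder of the environment's bucket. The following is a minimal sketch, assuming the google-cloud-storage Python client library; the bucket name and file name are hypothetical placeholders.

    from google.cloud import storage

    # Hypothetical bucket name; use the bucket associated with your environment.
    BUCKET_NAME = "us-central1-example-environment-nnnn-bucket"

    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)

    # Objects under dags/ are mapped to /home/airflow/gcs/dags on the Airflow instances.
    bucket.blob("dags/my_dag.py").upload_from_filename("my_dag.py")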

Folders in the Cloud Storage bucket

DAGs
  Description: Stores the DAGs for your environment. Only the DAGs in this folder are scheduled for your environment.
  Storage path: gs://bucket-name/dags
  Mapped directory: /home/airflow/gcs/dags

Plugins
  Description: Stores your custom plugins, such as custom in-house Airflow operators, hooks, sensors, or interfaces.
  Storage path: gs://bucket-name/plugins
  Mapped directory: /home/airflow/gcs/plugins

Data
  Description: Stores the data that tasks produce and use. This folder is mounted on all worker nodes; a sketch of a task that writes to it follows this table.
  Storage path: gs://bucket-name/data
  Mapped directory: /home/airflow/gcs/data

Logs
  Description: Stores the Airflow logs for tasks. Logs are also available in the Airflow web interface.
  Storage path: gs://bucket-name/logs
  Mapped directory: /home/airflow/gcs/logs
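
The following is a minimal sketch, assuming Airflow 2, of a DAG whose task writes a file to the mounted Data folder. The DAG ID, task, and file name are hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # The data/ folder of the environment bucket, mounted on worker nodes.
    DATA_DIR = "/home/airflow/gcs/data"

    def write_report():
        # Files written here are stored in gs://bucket-name/data through Cloud Storage FUSE.
        with open(f"{DATA_DIR}/report.txt", "w") as f:
            f.write("example output produced by a task\n")

    with DAG(
        dag_id="write_to_data_folder",
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,
        catchup=False,
    ):
        PythonOperator(task_id="write_report", python_callable=write_report)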

Capacity considerations

DAGs and plugins: By default, Cloud Composer provisions 100 GB of capacity for your environment, the DAGs folder, and the Plugins folder. To avoid workflow failures, store only the DAGs and plugins for your environment in these folders.
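
To see how much of this capacity the DAGs and Plugins folders currently use, you can sum the object sizes under their prefixes. The following is a minimal sketch, assuming the google-cloud-storage Python client library and a hypothetical bucket name.

    from google.cloud import storage

    # Hypothetical bucket name; use the bucket associated with your environment.
    BUCKET_NAME = "us-central1-example-environment-nnnn-bucket"

    client = storage.Client()
    for prefix in ("dags/", "plugins/"):
        # Sum the size of every object stored under the folder's prefix.
        total_bytes = sum(blob.size for blob in client.list_blobs(BUCKET_NAME, prefix=prefix))
        print(f"{prefix} uses {total_bytes / 1024**3:.2f} GiB")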

Data and logs: The Data folder and the Logs folder are not subject to capacity limits. However, to avoid a webserver error, make sure that any data the webserver needs to parse a DAG (not run it) is available in the DAGs folder. Otherwise, the webserver can't access the data or load the Airflow web interface.
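
For example, if a DAG reads a configuration file at parse time, that file must also live in the DAGs folder so that the webserver can read it. The following is a minimal sketch, assuming Airflow 2.3 or later; the config file, DAG ID, and task names are hypothetical.

    import json
    import os
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    # Read at parse time, so the file must be in the dags/ folder
    # (/home/airflow/gcs/dags), not in the data/ folder.
    config_path = os.path.join(os.path.dirname(__file__), "config.json")
    with open(config_path) as f:
        config = json.load(f)

    with DAG(
        dag_id="parse_time_config_example",
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,
        catchup=False,
    ):
        for name in config["task_names"]:
            EmptyOperator(task_id=name)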

Currently, you cannot change the storage capacity.

Data synchronization

When you modify the DAGs or plugins in the Cloud Storage bucket, Cloud Composer synchronizes the data across all the nodes in the cluster. Cloud Composer synchronizes the DAGs and Plugins folders uni-directionally by copying them locally, and synchronizes the Data and Logs folders bi-directionally by using Cloud Storage FUSE.

Data is not synchronized to the webserver because of its limited capacity and because, in a Cloud Composer environment, the webserver parses DAGs but doesn't run them. The workers run the DAGs.
