This page describes the architecture of Cloud Composer 2 environments.
Environment architecture configurations
Cloud Composer 2 environments can have the following architecture configurations: Public IP, Private IP with Private Service Connect (PSC), and Private IP with VPC peerings. Each configuration slightly alters the architecture of environment resources.
Customer and tenant projects
When you create an environment, Cloud Composer distributes the environment's resources between a tenant and a customer project.
The customer project is a Google Cloud project where you create your environments. You can create more than one environment in a single customer project.
The tenant project is a Google-managed project that provides unified access control and an additional layer of data security for your environment. Each Cloud Composer environment has its own tenant project.
Environment components
A Cloud Composer environment consists of environment components.
An environment component is an element of a managed Airflow infrastructure that runs on Google Cloud, as a part of your environment.
Environment components run either in the tenant or in the customer project of your environment.
Some of your environment's components are based on standalone Google Cloud products. Quotas and limits for these products also apply to your environments. For example, Cloud Composer environments use VPC peerings. Quotas on the maximum number of VPC peerings apply to your customer project, so once your project reaches this maximum number of peerings, you cannot create additional environments.
Environment's cluster
Environment's cluster is an Autopilot mode, VPC-native Google Kubernetes Engine cluster:
- Environment nodes are VMs in the environment's cluster.
- Pods in the environment's cluster run containers with other environment components, such as Airflow workers and schedulers. Pods run on environment nodes.
- Workload resources of your environment's cluster manage sets of pods in your environment's cluster. Many components of your environment are implemented as different types of workload resources. For example, Airflow schedulers run as Deployments. In addition to Deployments, your environment also has StatefulSets, DaemonSets, and Jobs workload types. The sketch after this list shows one way to inspect these workloads.
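Because these components are regular Kubernetes workload resources, you can inspect them with standard Kubernetes tooling. The following is a minimal sketch using the kubernetes Python client; it assumes you have already fetched credentials for the environment's cluster, for example with gcloud container clusters get-credentials:

```python
# A minimal sketch, assuming the `kubernetes` Python client and a local
# kubeconfig that points at the environment's cluster.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Deployments include components such as the Airflow scheduler and web server.
for deployment in apps.list_deployment_for_all_namespaces().items:
    print("Deployment:", deployment.metadata.namespace, deployment.metadata.name)

# StatefulSets include components such as the Redis queue.
for stateful_set in apps.list_stateful_set_for_all_namespaces().items:
    print("StatefulSet:", stateful_set.metadata.namespace, stateful_set.metadata.name)
```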
By default, Cloud Composer enables node auto-upgrades and node auto-repair to protect your environment's cluster from security vulnerabilities. These operations happen during maintenance windows that you specify for your environment.
Airflow schedulers, triggerer, workers, and Redis queue
Airflow schedulers control the scheduling of DAG runs and individual tasks from DAGs. Schedulers distribute tasks to Airflow workers by using a Redis queue, which runs as an application in your environment's cluster. Airflow schedulers run as Deployments in your environment's cluster.
Airflow workers execute individual tasks from DAGs by taking them from the Redis queue. Airflow workers run as Custom Resources in your environment's cluster.
Airflow triggerer asynchronously monitors all deferred tasks in your environment. By default, the triggerer is disabled in your environment. After you enable it, you can use deferrable operators in your DAGs, as in the sketch at the end of this section. Even if the triggerer is disabled, your environment's cluster still runs a workload for it, with zero pods. If the triggerer is enabled, it is billed in the same way as other environment components, with Cloud Composer Compute SKUs.
Redis queue holds a queue of individual tasks from your DAGs. Airflow schedulers fill the queue; Airflow workers take their tasks from it. Redis queue runs as a StatefulSet application in your environment's cluster, so that messages persist across container restarts.
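For example, the following is a minimal sketch of a DAG that uses a deferrable operator. It assumes that the triggerer is enabled in your environment; the DAG id, schedule, and target time are illustrative:

```python
# A minimal sketch of a DAG with a deferrable operator; requires the triggerer.
import datetime

from airflow import DAG
from airflow.sensors.date_time import DateTimeSensorAsync

with DAG(
    dag_id="deferrable_example",
    start_date=datetime.datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
):
    # While this sensor waits, the task is deferred to the triggerer
    # instead of occupying an Airflow worker slot.
    DateTimeSensorAsync(
        task_id="wait_one_hour",
        target_time="{{ data_interval_end + macros.timedelta(hours=1) }}",
    )
```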
Airflow web server
Airflow web server runs the Airflow UI of your environment.
In Cloud Composer 2, the Airflow web server runs as a Deployment in your environment's cluster.
Identity-Aware Proxy is not used for access in Cloud Composer 2.
Airflow database
Airflow database is a Cloud SQL instance that runs in the tenant project of your environment. It hosts the Airflow metadata database.
To protect sensitive connection and workflow information, Cloud Composer allows database access only to the service account of your environment.
Environment's bucket
Environment's bucket is a Cloud Storage bucket that stores DAGs, plugins, data dependencies, and Airflow logs. Environment's bucket resides in the customer project.
When you upload your DAG files to the /dags folder in your environment's bucket, Cloud Composer synchronizes the DAGs to workers, schedulers, and the web server of your environment. You can store your workflow artifacts in the data/ and logs/ folders without worrying about size limitations, and retain full access control of your data.
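For example, the following minimal sketch uploads a DAG file to the /dags folder with the google-cloud-storage Python client. The bucket name and file names are placeholders for your own values:

```python
# A minimal sketch, assuming the google-cloud-storage client library.
from google.cloud import storage


def upload_dag(bucket_name: str, local_path: str, dag_file_name: str) -> None:
    """Uploads a local DAG file to the /dags folder of the environment's bucket."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(f"dags/{dag_file_name}")
    blob.upload_from_filename(local_path)


# Cloud Composer then synchronizes the uploaded DAG to the workers,
# schedulers, and web server of the environment.
upload_dag("us-central1-example-environment-bucket", "my_dag.py", "my_dag.py")
```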
Other environment components
A Cloud Composer environment has several additional environment components:
- Cloud SQL Storage. Stores the Airflow database backups. Cloud Composer backs up the Airflow metadata daily to minimize potential data loss. Cloud SQL Storage runs in the tenant project of your environment. You cannot access the Cloud SQL Storage contents.
- Cloud SQL Proxy. Connects other components of your environment to the Airflow database. Your Public IP environment can have one or more Cloud SQL Proxy instances, depending on the volume of traffic towards the Airflow database. In a Public IP environment, Cloud SQL Proxy runs as a Deployment in your environment's cluster. When deployed in your environment's cluster, Cloud SQL Proxy also authorizes access to your Cloud SQL instance from an application, client, or other Google Cloud service.
- Airflow monitoring. Reports environment metrics to Cloud Monitoring and triggers the airflow_monitoring DAG. The airflow_monitoring DAG reports the environment health data, which is later used, for example, on the monitoring dashboard of your environment. Airflow monitoring runs as a Deployment in your environment's cluster.
- Composer Agent. Performs environment operations such as creating, updating, upgrading, and deleting environments. In general, this component is responsible for introducing changes to your environment. Composer Agent runs as a Job in your environment's cluster.
- Airflow InitDB. Creates a Cloud SQL instance and initial database schema. Airflow InitDB runs as a Job in your environment's cluster.
- FluentD. Collects logs from all environment components and uploads the logs to Cloud Logging. Runs as a DaemonSet in your environment's cluster.
- Pub/Sub subscriptions. Your environment communicates with its GKE service agent through Pub/Sub subscriptions. It relies on Pub/Sub's default behavior to manage messages. Do not delete .*-composer-.* Pub/Sub topics. Pub/Sub supports a maximum of 10,000 topics per project. The sketch after this list shows one way to check that these topics are intact.
- PSC endpoint. Connects Airflow schedulers and workers to the Airflow database in the Private IP with PSC architecture.
- Customer Metrics Stackdriver Adapter. Reports metrics of your environment, for autoscaling. This component runs as a Deployment in your environment's cluster.
- Airflow Worker Set Controller. Automatically scales your environment based on metrics from Customer Metrics Stackdriver Adapter. This component runs as a Deployment in your environment's cluster.
- Cloud Storage FUSE. Mounts your environment's bucket as a file system on Airflow workers, schedulers, and the web server, so that these components can access the data from the bucket. Runs as a DaemonSet in your environment's cluster.
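As a quick sanity check for the Pub/Sub topics mentioned above, the following minimal sketch lists the Composer-related topics in a project so you can verify that none of them were deleted. It assumes the google-cloud-pubsub client library; the project id is a placeholder:

```python
# A minimal sketch, assuming the google-cloud-pubsub client library.
import re

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()

# Topics matching .*-composer-.* back the environment's communication
# with its GKE service agent and must not be deleted.
for topic in publisher.list_topics(request={"project": "projects/my-project-id"}):
    if re.search(r".*-composer-.*", topic.name):
        print(topic.name)
```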
Public IP environment architecture
In a Public IP environment architecture for Cloud Composer 2:
- The tenant project hosts a Cloud SQL instance and Cloud SQL storage.
- The customer project hosts all other components of the environment.
- Airflow schedulers and workers in the customer project communicate with the Airflow database through a Cloud SQL proxy instance located in the customer project.
Private IP environment architecture
By default, Cloud Composer 2 uses Private Service Connect, so that your Private IP environments communicate internally without the use of VPC peerings. It's also possible to use VPC peerings instead of Private Service Connect in your environment, but this is a non-default option.
In the Private IP environment architecture:
- The tenant project hosts a Cloud SQL instance and Cloud SQL storage.
- The customer project hosts all other components of the environment.
- Airflow schedulers and workers connect to the Airflow database through the configured PSC endpoint.
Integration with Cloud Logging and Cloud Monitoring
Cloud Composer integrates with Cloud Logging and Cloud Monitoring of your Google Cloud project, so that you have a central place to view the Airflow service and workflow logs.
Cloud Monitoring collects and ingests metrics, events, and metadata from Cloud Composer to generate insights through dashboards and charts.
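For example, the following minimal sketch reads the composer.googleapis.com/environment/healthy metric, which is derived from the airflow_monitoring DAG, by using the google-cloud-monitoring client library. The project id and time window are placeholders:

```python
# A minimal sketch, assuming the google-cloud-monitoring client library.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": "projects/my-project-id",
        "filter": 'metric.type="composer.googleapis.com/environment/healthy"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        # True means the environment was healthy at that point in time.
        print(series.resource.labels["environment_name"], point.value.bool_value)
```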
Because of the streaming nature of Cloud Logging, you can view the logs that the Airflow scheduler and workers emit immediately instead of waiting for Airflow logs to appear in the Cloud Storage bucket of your environment. Because the Cloud Logging logs for Cloud Composer are based on google-fluentd, you have access to all logs produced by Airflow schedulers and workers.
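As an illustration, the following minimal sketch reads recent Airflow worker log entries with the google-cloud-logging client library. The project id is a placeholder, and airflow-worker is one of the per-component log names:

```python
# A minimal sketch, assuming the google-cloud-logging client library.
from google.cloud import logging

client = logging.Client(project="my-project-id")

# Airflow component logs are attached to the cloud_composer_environment
# monitored resource type.
log_filter = (
    'resource.type="cloud_composer_environment" '
    'log_id("airflow-worker")'
)

for entry in client.list_entries(filter_=log_filter, max_results=10):
    print(entry.timestamp, entry.payload)
```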
To limit the number of logs in your Google Cloud project, you can stop all logs ingestion, for example by creating a logs exclusion. Do not disable Logging itself.