Environment architecture


This page describes the architecture of Cloud Composer environments.

Environment architecture configurations

Cloud Composer 2 environments can have the following architecture configurations:

  • Public IP environment architecture
  • Private IP environment architecture
  • Highly resilient Private IP architecture

Each configuration slightly changes the arrangement of the environment's resources.

Customer and tenant projects

When you create an environment, Cloud Composer distributes the environment's resources between a tenant and a customer project:

  • Customer project is a Google Cloud project where you create your environments. You can create more than one environment in a single customer project.

  • Tenant project is a Google-managed project. The tenant project provides unified access control and an additional layer of data security for your environment. Each Cloud Composer environment has its own tenant project.

Environment components

A Cloud Composer environment consists of environment components.

An environment component is an element of a managed Airflow infrastructure that runs on Google Cloud, as a part of your environment. Environment components run either in the tenant or in the customer project of your environment.

Environment's cluster

Environment's cluster is the Autopilot mode, VPC-native Google Kubernetes Engine cluster of your environment.

By default, Cloud Composer enables node auto-upgrades and node auto-repair to protect your environment's cluster from security vulnerabilities. These operations happen during maintenance windows that you specify for your environment.

Environment's bucket

Environment's bucket is a Cloud Storage bucket that stores DAGs, plugins, data dependencies, and Airflow logs. This bucket is located in the customer project.

When you upload your DAG files to the /dags folder in your environment's bucket, Cloud Composer synchronizes the DAGs to Airflow components of your environment.
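For example, a minimal DAG file that you could upload to the /dags folder might look like the following sketch. The DAG ID and schedule are illustrative placeholders, and the example assumes Airflow 2, which Cloud Composer 2 environments run.

```python
"""Minimal example DAG. Upload this file to the /dags folder of the
environment's bucket; Cloud Composer then syncs it to the Airflow
components of the environment. The DAG ID and schedule are placeholders."""
import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_sync_check",  # hypothetical DAG ID
    start_date=datetime.datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # A trivial task that confirms the DAG was parsed and scheduled.
    hello = BashOperator(task_id="hello", bash_command="echo 'DAG synced'")
```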

Airflow web server

Airflow web server runs the Airflow UI of your environment.

Cloud Composer provides access to the interface based on user identities and IAM policy bindings defined for users.

Airflow database

Airflow database is a Cloud SQL instance that runs in the tenant project of your environment. It hosts the Airflow metadata database.

To protect sensitive connection and workflow information, Cloud Composer allows database access only to the service account of your environment.

Other Airflow components

Other Airflow components that run in your environment are:

  • Airflow schedulers parse DAG definition files, schedule DAG runs based on the schedule interval, and queue tasks for execution by Airflow workers. In Cloud Composer 2, Airflow DAG processors run as part of the scheduler components.

  • Airflow triggerers asynchronously monitor all deferred tasks in your environment. If you set the number of triggerers in your environment above zero, then you can use deferrable operators in your DAGs (see the sketch after this list).

  • Airflow workers execute tasks that are scheduled by Airflow schedulers. The number of workers in your environment scales dynamically between the minimum and maximum that you configure, depending on the number of queued tasks.
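As an illustration of the triggerer requirement mentioned above, the following hedged sketch uses TimeDeltaSensorAsync, a deferrable sensor built into Airflow 2. A deferred task releases its worker slot and is monitored by a triggerer while it waits, so this DAG completes only if the environment runs at least one triggerer. The DAG ID is a placeholder.

```python
"""Sketch of a DAG with a deferrable operator. TimeDeltaSensorAsync
defers instead of blocking a worker: the wait runs in a triggerer,
and the worker slot is freed until the sensor resumes."""
import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.time_delta import TimeDeltaSensorAsync

with DAG(
    dag_id="example_deferrable_wait",  # hypothetical DAG ID
    start_date=datetime.datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Deferred wait: handled by a triggerer, not an occupied worker slot.
    wait = TimeDeltaSensorAsync(
        task_id="wait_one_hour",
        delta=datetime.timedelta(hours=1),
    )
    done = BashOperator(task_id="done", bash_command="echo 'resumed'")
    wait >> done
```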

Public IP environment architecture

Figure 1. Public IP environment architecture: Cloud Composer environment resources in the tenant project and the customer project

In a Public IP environment architecture for Cloud Composer 2:

  • The tenant project hosts a Cloud SQL instance and Cloud SQL storage.
  • The customer project hosts all other components of the environment.
  • Airflow schedulers and workers in the customer project communicate with the Airflow database through a Cloud SQL proxy instance located in the customer project.

Private IP environment architecture

Figure 2. Private IP environment architecture with Private Service Connect: Cloud Composer environment resources in the tenant project and the customer project

By default, Cloud Composer 2 uses Private Service Connect (PSC), so that your Private IP environments communicate internally without VPC peerings. It's also possible, as a non-default option, to use VPC peerings instead of Private Service Connect in your environment.

In the Private IP environment architecture:

  • The tenant project hosts a Cloud SQL instance and Cloud SQL storage.
  • The customer project hosts all other components of the environment.
  • Airflow schedulers and workers connect to the Airflow database through the configured PSC endpoint.

Highly resilient Private IP architecture

Figure 3. Highly resilient Private IP environment architecture: Cloud Composer environment resources in the tenant project and the customer project

Highly resilient Cloud Composer environments are Cloud Composer 2 environments that use built-in redundancy and failover mechanisms to reduce the environment's susceptibility to zonal failures and single-point-of-failure outages.

In this type of Private IP environment:

  • The Cloud SQL instance of your environment is configured for high availability (it is a regional instance). A regional instance consists of a primary instance and a standby instance.
  • Your environment runs two Airflow schedulers, two web servers, and, if triggerers are used, a minimum of two triggerers (up to ten in total). These pairs of components run in two separate zones.
  • The minimum number of workers is set to two, and your environment's cluster distributes worker instances between zones. In case of a zonal outage, affected worker instances are rescheduled in a different zone.

Integration with Cloud Logging and Cloud Monitoring

Cloud Composer integrates with Cloud Logging and Cloud Monitoring of your Google Cloud project, so that you have a central place to view Airflow and DAG logs.

Cloud Monitoring collects and ingests metrics, events, and metadata from Cloud Composer to generate insights through dashboards and charts.
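For example, the following sketch reads one Cloud Composer time series through the Cloud Monitoring API. It assumes the google-cloud-monitoring Python client library is installed; the project ID is a placeholder, and composer.googleapis.com/environment/healthy is one of the metrics that Cloud Composer reports.

```python
"""Sketch: query a Cloud Composer metric from Cloud Monitoring.
Assumes google-cloud-monitoring is installed; PROJECT_ID is a placeholder."""
import time

from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
# Look at the last hour of data points.
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)
results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": 'metric.type = "composer.googleapis.com/environment/healthy"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    env = series.resource.labels.get("environment_name", "unknown")
    if series.points:
        # The healthy metric is a boolean time series.
        print(env, series.points[0].value.bool_value)
```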

Because of the streaming nature of Cloud Logging, you can view logs emitted by Airflow components immediately instead of waiting for Airflow logs to appear in the Cloud Storage bucket of your environment.
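As a sketch of that streaming access, the snippet below lists recent log entries from Airflow components through the Cloud Logging API instead of reading log files from the environment's bucket. It assumes the google-cloud-logging Python client library; the project ID is a placeholder, and the filter uses the cloud_composer_environment monitored resource type under which Cloud Composer environments emit logs.

```python
"""Sketch: read recent Airflow component logs from Cloud Logging.
Assumes google-cloud-logging is installed; PROJECT_ID is a placeholder."""
from google.cloud import logging as cloud_logging

PROJECT_ID = "your-project-id"  # placeholder

client = cloud_logging.Client(project=PROJECT_ID)
# Cloud Composer environments emit logs under this resource type.
log_filter = 'resource.type="cloud_composer_environment" severity>=WARNING'
for entry in client.list_entries(filter_=log_filter, max_results=20):
    print(entry.timestamp, entry.severity, entry.payload)
```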

To limit the number of logs stored in your Google Cloud project, you can stop the ingestion of all logs, for example by using a logs exclusion. Do not disable Logging itself.
