This page provides an overview of the features and capabilities of Cloud Composer.
To learn more about the differences between Cloud Composer 1 and Cloud Composer 2, see the versioning overview.
Cloud Composer is a managed Apache Airflow service that helps you create, schedule, monitor, and manage workflows.
A Cloud Composer environment is a wrapper around Apache Airflow. Cloud Composer creates the following components for each environment:
- GKE cluster: The Airflow schedulers, workers, and Redis Queue run as GKE workloads on a single cluster, and are responsible for processing and executing DAGs. The cluster also hosts other Cloud Composer components like Composer Agent and Airflow Monitoring, which help manage the Cloud Composer environment, gather logs to store in Cloud Logging, and gather metrics to upload to Cloud Monitoring.
- Web server: The web server runs the Apache Airflow web interface, and Identity-Aware Proxy protects the interface. For more information, see Airflow Web Interface.
- Database: The database holds the Apache Airflow metadata.
- Cloud Storage bucket: Cloud Composer associates a Cloud Storage bucket with the environment. The associated bucket stores the DAGs, logs, custom plugins, and data for the environment. For more information about the storage bucket for Cloud Composer, see Data Stored in Cloud Storage.
To access and manage your Airflow environments, you can use the following Airflow-native tools:
- Web interface: You can access the Airflow web interface from the Google Cloud console or by direct URL with the appropriate permissions. For information, see Airflow Web Interface.
- Command-line tools: After you install the Google Cloud CLI, you can run gcloud composer environments commands to issue Airflow command-line commands to Cloud Composer environments. For information, see Airflow Command-line Interface.
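As an illustration, the gcloud invocation that proxies an Airflow CLI command to an environment can be assembled programmatically. This is a sketch with a hypothetical helper; the environment name and location are placeholders, and the argument order follows the gcloud composer environments run syntax (verify it with gcloud composer environments run --help):

```python
def build_airflow_cli_invocation(environment, location, subcommand, args=()):
    """Assemble the argument list for proxying an Airflow CLI command
    through gcloud. Hypothetical helper; environment and location are
    placeholders you would replace with your own values."""
    cmd = ["gcloud", "composer", "environments", "run",
           environment, "--location", location]
    cmd += subcommand.split()   # e.g. "dags list" (Airflow 2 CLI syntax)
    if args:
        cmd += ["--", *args]    # arguments after "--" are passed to Airflow itself
    return cmd
```

The resulting list could then be executed with subprocess.run(cmd, check=True), which requires an installed and authenticated Google Cloud CLI.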
In addition to native tools, the Cloud Composer REST and RPC APIs provide programmatic access to your Airflow environments. For more information, see APIs and References.
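For example, listing the environments in a project and location maps to a single REST call. The sketch below assumes you already hold an OAuth 2.0 access token (for instance from gcloud auth print-access-token); the project and location values are placeholders:

```python
import urllib.request

COMPOSER_API = "https://composer.googleapis.com/v1"

def environments_url(project: str, location: str) -> str:
    """Build the REST URL for the environments collection of a
    project and location (sketch; values are placeholders)."""
    return f"{COMPOSER_API}/projects/{project}/locations/{location}/environments"

def list_environments(project, location, access_token):
    # Requires valid credentials and network access; the token would
    # come from e.g. `gcloud auth print-access-token`.
    req = urllib.request.Request(
        environments_url(project, location),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```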
In general, the configurations that Cloud Composer provides for Apache Airflow are the same as the configurations for a locally hosted Airflow deployment. Some Airflow configurations are preconfigured in Cloud Composer, and you cannot change their values. You specify other configurations when you create or update your environment. For more information, see Blocked Airflow Configurations.
Airflow DAGs (workflows)
An Apache Airflow DAG is a workflow: a collection of tasks with additional task dependencies. Cloud Composer uses Cloud Storage to store DAGs. To add or remove DAGs from your Cloud Composer environment, you add or remove the DAG files in the Cloud Storage bucket associated with the environment. After you move DAGs to the bucket, Cloud Composer automatically adds and schedules them in your environment.
In addition to scheduling DAGs, you can trigger DAGs manually or in response to events, such as changes that occur in the associated Cloud Storage bucket. For more information, see Triggering DAGs.
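A DAG file that you copy into the dags/ folder of the environment's bucket is an ordinary Python module. The following is a minimal sketch assuming Airflow 2; the DAG name, schedule, and task are placeholders, not part of any Cloud Composer API:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal example DAG. Once this file is copied to the dags/ folder of
# the environment's Cloud Storage bucket, Cloud Composer picks it up
# and schedules it automatically.
with DAG(
    dag_id="example_hello",           # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    hello = BashOperator(
        task_id="say_hello",
        bash_command="echo hello",
    )
```

This file is configuration-as-code for the scheduler; it only runs inside an Airflow deployment, not as a standalone script.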
You can install custom plugins, such as custom, in-house Apache Airflow operators, hooks, sensors, or interfaces, into your Cloud Composer environment. For more information, see Installing Custom Plugins.
If your dependencies are not available in the Python Package Index, you can also use the plugins feature to install them.
You manage security at the Google Cloud project level and can assign Identity and Access Management (IAM) roles that prevent individual users from modifying or creating environments. If someone does not have access to your project or does not have an appropriate Cloud Composer IAM role, that person cannot access any of your environments. For more information, see Access control.
Logging and monitoring
Cloud Composer provides audit logs, such as Admin Activity audit logs, for your Google Cloud projects. For information, see Viewing Audit Logs.
Networking and security
For additional security and networking flexibility, Cloud Composer also supports the following features.
Shared VPC
Shared VPC enables shared network resource management from a central host project to enforce consistent network policies across projects.
When Cloud Composer participates in a shared VPC, the Cloud Composer environment is in a service project and can invoke services hosted in other Google Cloud projects. Resources within your service projects communicate securely across project boundaries using internal IP addresses. For network and host project requirements, see Configuring shared VPC.
VPC-native Cloud Composer environment
In this configuration, Cloud Composer deploys a VPC-native GKE cluster using alias IP addresses in your environment. When you use VPC-native clusters, GKE automatically chooses a secondary range. For specific networking requirements, you can also configure the secondary ranges for your GKE pods and GKE services during Cloud Composer environment configuration.
Private IP Cloud Composer environment
With private IP, Cloud Composer workflows are fully isolated from the public internet.
In this configuration, Cloud Composer deploys a VPC-native GKE cluster using alias IP addresses in the customer project. The GKE cluster for your environment is configured as a private cluster, and the Cloud SQL instance is configured for private IP.
Cloud Composer also creates a peering connection between your customer project's VPC network and your tenant project's VPC network.
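The private IP configuration described above is requested at environment creation time. The sketch below builds the corresponding gcloud invocation as an argument list; the flag names assume the Cloud Composer 1 gcloud surface and should be verified with gcloud composer environments create --help, and the environment name and location are placeholders:

```python
def private_ip_create_invocation(environment, location):
    """Sketch of a gcloud invocation that creates a private IP
    Cloud Composer environment. Flag names are assumptions to verify
    against the current gcloud reference."""
    return [
        "gcloud", "composer", "environments", "create", environment,
        "--location", location,
        "--enable-ip-alias",             # VPC-native (alias IP) GKE cluster
        "--enable-private-environment",  # private GKE cluster and private Cloud SQL
    ]
```

As with the other sketches, the list would be run with subprocess.run against an authenticated Google Cloud CLI.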
Data lineage integration with Dataplex
Running Cloud Composer DAGs can often result in creating or updating data sources such as BigQuery tables in your project. Data lineage is a Dataplex feature that lets you track how data moves through your systems: where it comes from, where it is passed to, and what transformations are applied to it.