This page provides an overview of the features and capabilities of Cloud Composer.
Cloud Composer is a managed Apache Airflow service that helps you create, schedule, monitor, and manage workflows.
A Cloud Composer environment is a wrapper around Apache Airflow. Cloud Composer creates the following components for each environment:
- GKE Cluster: The Airflow scheduler, workers, and Redis Queue run as GKE workloads on a single cluster, and are responsible for processing and executing DAGs. The cluster also hosts other Cloud Composer components like the Composer Agent and Airflow Monitoring, which help manage the Cloud Composer environment, gather logs to store in Cloud Logging, and gather metrics to upload to Cloud Monitoring.
- Web server: The web server runs the Apache Airflow web interface, and Identity-Aware Proxy protects the interface. For more information, see Airflow Web Interface.
- Database: The database holds the Apache Airflow metadata.
- Cloud Storage bucket: Cloud Composer associates a Cloud Storage bucket with the environment. The associated bucket stores the DAGs, logs, custom plugins, and data for the environment. For more information about the storage bucket for Cloud Composer, see Data Stored in Cloud Storage.
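As a rough illustration of how the associated bucket is organized, the sketch below builds `gs://` URIs for the kinds of content listed above. It assumes the bucket uses top-level folders named `dags`, `plugins`, `data`, and `logs`; the helper name `bucket_uri` and the bucket name `example-bucket` are hypothetical and are not part of any Cloud Composer API.

```python
# Top-level folders assumed to exist in the environment's bucket.
BUCKET_FOLDERS = ("dags", "plugins", "data", "logs")


def bucket_uri(bucket: str, folder: str, relative_path: str = "") -> str:
    """Return the gs:// URI of a folder, or of an object inside it."""
    if folder not in BUCKET_FOLDERS:
        raise ValueError(f"unknown folder: {folder!r}")
    base = f"gs://{bucket}/{folder}"
    if not relative_path:
        return base
    # Strip leading slashes so the result never contains "//" after the folder.
    return f"{base}/{relative_path.lstrip('/')}"


if __name__ == "__main__":
    print(bucket_uri("example-bucket", "dags", "daily_report.py"))
```

The helper only formats strings. It does not contact Cloud Storage, so the real bucket name still has to come from the environment's details.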
To access and manage your Airflow environments, you can use the following Airflow-native tools:
- Web interface: You can access the Airflow web interface from the Google Cloud Console or by direct URL with the appropriate permissions. For information, see Airflow Web Interface.
- Command line tools: After you install the Cloud SDK, you can run gcloud composer environments commands to issue Airflow command-line commands to Cloud Composer environments. For information, see Airflow Command-line Interface.
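The command-line route described above can be sketched as a small helper that assembles a `gcloud composer environments run` invocation, which forwards an Airflow subcommand to an environment. The function name `composer_run_command`, the environment name `example-environment`, and the location `us-central1` are hypothetical. The helper only builds the argument list and does not execute anything.

```python
import shlex


def composer_run_command(environment: str, location: str,
                         subcommand: str, *args: str) -> list[str]:
    """Build the argv for forwarding an Airflow subcommand to an environment."""
    cmd = [
        "gcloud", "composer", "environments", "run", environment,
        "--location", location,
        # An Airflow subcommand may consist of several words.
        *subcommand.split(),
    ]
    if args:
        # Arguments after "--" are passed through to the Airflow subcommand.
        cmd += ["--", *args]
    return cmd


if __name__ == "__main__":
    argv = composer_run_command("example-environment", "us-central1", "version")
    print(shlex.join(argv))
```

Returning a list rather than a single string means the result can be passed to `subprocess.run` without shell quoting concerns, assuming the Cloud SDK is installed and authenticated.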
In addition to native tools, the Cloud Composer REST and RPC APIs provide programmatic access to your Airflow environments. For more information, see APIs & References.
In general, the configurations that Cloud Composer provides for Apache Airflow are the same as the configurations for a locally-hosted Airflow deployment. Some Airflow configurations are preconfigured in Cloud Composer, and you cannot change those properties. You specify other configurations when you create or update your environment. For more information, see Blocked Airflow Configurations.
Airflow DAGs (workflows)
An Apache Airflow DAG is a workflow: a collection of tasks and the dependencies between them. Cloud Composer uses Cloud Storage to store DAGs. To add or remove DAGs from your Cloud Composer environment, you add or remove the DAGs from the Cloud Storage bucket associated with the environment. After you've moved DAGs to the storage bucket, DAGs are automatically added and scheduled in your environment.
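To make the idea of tasks with dependencies concrete, the sketch below models a small workflow with Python's standard `graphlib` module. It does not use the Airflow API, and the task names are hypothetical. It only shows how dependencies constrain the order in which tasks can run, and that a cycle makes the graph invalid.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
WORKFLOW = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"extract"},
    "load": {"clean", "aggregate"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid run order, or raise ValueError if there is a cycle."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"not a DAG: {exc.args[1]}") from exc


if __name__ == "__main__":
    print(execution_order(WORKFLOW))
```

In this example `clean` and `aggregate` do not depend on each other, so either may come first. A scheduler is free to run such tasks in parallel.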
In addition to scheduling DAGs, you can trigger DAGs manually or in response to events, such as changes that occur in the associated Cloud Storage bucket. For more information, see Triggering DAGs.
You can install custom plugins, such as custom, in-house Apache Airflow operators, hooks, sensors, or interfaces, into your Cloud Composer environment. For more information, see Installing Custom Plugins.
If the Python dependencies that you need are not in the package index, you can also use the plugins feature.
You manage security at the Google Cloud project level and can assign Identity and Access Management (IAM) roles that prevent individual users from modifying or creating environments. If someone does not have access to your project or does not have an appropriate Cloud Composer IAM role, that person cannot access any of your environments. For more information, see Cloud Composer Access Control.
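As an illustration of project-level access control, the sketch below adds a member to a role in an IAM policy represented as a plain dictionary in the usual `bindings` shape. The role ID `roles/composer.user` and the member `user:alice@example.com` are examples and should be checked against the Cloud Composer access control documentation. The helper `add_binding` is hypothetical and does not call any Google Cloud API.

```python
def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of the policy in which `member` holds `role`."""
    # Copy each binding so the caller's policy is not modified.
    bindings = [
        {**b, "members": list(b.get("members", []))}
        for b in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No existing binding for this role, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


if __name__ == "__main__":
    updated = add_binding({}, "roles/composer.user", "user:alice@example.com")
    print(updated)
```

A user who appears in no Cloud Composer role binding for the project has no access to its environments, which matches the behavior described above.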
Logging and monitoring
You can view Airflow logs that are associated with single DAG tasks in the Airflow web interface and in the logs folder in the associated Cloud Storage bucket.
Streaming logs are available for Cloud Composer. You can access the streaming logs in Logs Viewer in the Google Cloud Console and by using Google Cloud's operations suite. For information about using Google Cloud's operations suite, see Monitoring Cloud Composer Environments.
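When looking for streaming logs, a filter that narrows results to one environment can help. The sketch below assembles such a filter string. It assumes the monitored resource type is `cloud_composer_environment` with the labels `environment_name` and `location`; verify these names against the logging documentation before relying on them. The function name and the example values are hypothetical.

```python
from typing import Optional


def composer_log_filter(environment: str, location: str,
                        min_severity: Optional[str] = None) -> str:
    """Build a log filter string scoped to one Cloud Composer environment."""
    clauses = [
        'resource.type="cloud_composer_environment"',
        f'resource.labels.environment_name="{environment}"',
        f'resource.labels.location="{location}"',
    ]
    if min_severity:
        # For example "ERROR" keeps entries at ERROR severity and above.
        clauses.append(f"severity>={min_severity}")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(composer_log_filter("example-environment", "us-central1", "ERROR"))
```

The resulting string can be pasted into the log query box or passed to a logging client, assuming the resource type and labels match your project.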
Cloud Composer also provides audit logs, such as Admin Activity audit logs, for your Google Cloud projects. For information, see Viewing Audit Logs.
Networking and security
By default, Cloud Composer deploys a route-based GKE cluster that uses the default VPC network for machine communications. For additional security and networking flexibility, Cloud Composer also supports the following features.
Shared VPC
Shared VPC enables shared network resource management from a central host project to enforce consistent network policies across projects.
When Cloud Composer participates in a shared VPC, the Cloud Composer environment is in a service project and can invoke services hosted in other Google Cloud projects. Resources within your service projects communicate securely across project boundaries using internal IP addresses. For network and host project requirements, see Configuring shared VPC.
VPC-native Cloud Composer environment
In this configuration, Cloud Composer deploys a VPC-native GKE cluster using alias IP addresses in your environment. When you use VPC-native clusters, GKE automatically chooses a secondary range. For specific networking requirements, you can also configure the secondary ranges for your GKE pods and GKE services during Cloud Composer environment configuration.
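If you choose the secondary ranges yourself, it can be useful to sanity-check them before creating the environment. The sketch below uses Python's standard `ipaddress` module to confirm that a pods range and a services range do not overlap and to report how many addresses each provides. The CIDR blocks shown are hypothetical examples, not recommended values, and the check does not cover any other requirements that Cloud Composer or GKE may impose.

```python
import ipaddress


def check_secondary_ranges(pods_cidr: str, services_cidr: str) -> tuple[int, int]:
    """Validate two secondary ranges and return their address counts."""
    # ip_network raises ValueError for malformed CIDRs or set host bits.
    pods = ipaddress.ip_network(pods_cidr)
    services = ipaddress.ip_network(services_cidr)
    if pods.overlaps(services):
        raise ValueError(f"{pods} overlaps {services}")
    return pods.num_addresses, services.num_addresses


if __name__ == "__main__":
    pod_count, service_count = check_secondary_ranges("10.0.0.0/14", "10.4.0.0/20")
    print(f"pods: {pod_count} addresses, services: {service_count} addresses")
```

Running the check locally catches typos and overlapping ranges early, before an environment creation request fails.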
Private IP Cloud Composer environment
With private IP, Cloud Composer workflows are fully isolated from the public internet.
In this configuration, Cloud Composer deploys a VPC-native GKE cluster using alias IP addresses in the customer project. The GKE cluster for your environment is configured as a private cluster, and the Cloud SQL instance is configured for private IP. Cloud Composer also creates a peering connection between your customer project's VPC network and your tenant project's VPC network.