This page provides an overview of the features and capabilities of Cloud Composer.
Cloud Composer is a managed Apache Airflow service that helps you create, schedule, monitor, and manage workflows.
A Cloud Composer environment is a wrapper around Apache Airflow. Cloud Composer creates the following components for each environment:
- Web server: The web server runs the Apache Airflow web interface, and Cloud Identity-Aware Proxy protects the interface. For more information, see Airflow Web Interface.
- Database: The database holds the Apache Airflow metadata.
- Cloud Storage bucket: Cloud Composer associates a Cloud Storage bucket with the environment. The associated bucket stores the DAGs, logs, custom plugins, and data for the environment. For more information about the storage bucket for Cloud Composer, see Data Stored in Cloud Storage.
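As a rough illustration of how the bucket contents listed above might be addressed, the following sketch builds gs:// URIs for each kind of file. The folder prefixes (dags/, plugins/, data/, logs/), the helper name object_uri, and the bucket name are assumptions made for this example and are not taken from this page.

```python
# Assumed folder prefixes inside the environment's bucket; they mirror the
# four kinds of content named above (DAGs, logs, custom plugins, data).
BUCKET_FOLDERS = {
    "dags": "dags/",
    "plugins": "plugins/",
    "data": "data/",
    "logs": "logs/",
}


def object_uri(bucket: str, kind: str, filename: str) -> str:
    """Return the gs:// URI for a file of the given kind (hypothetical helper)."""
    if kind not in BUCKET_FOLDERS:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"gs://{bucket}/{BUCKET_FOLDERS[kind]}{filename}"


# "example-bucket" is a placeholder, not a real environment bucket.
print(object_uri("example-bucket", "dags", "my_dag.py"))
```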
To access and manage your Airflow environments, you can use the following Airflow-native tools:
- Web interface: You can access the Airflow web interface from the Google Cloud Platform Console or by direct URL with the appropriate permissions. For information, see Airflow Web Interface.
- Command line tools: After you install the Cloud SDK, you can run gcloud composer environments commands to issue Airflow command-line commands to Cloud Composer environments. For information, see Airflow Command-line Interface.
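The command-line route can be sketched as a small helper that assembles a gcloud composer environments run invocation. The environment name, the location, and the Airflow subcommand shown here are placeholders; the subcommands that are available depend on the Airflow version in your environment. The helper only builds the argument list and does not execute anything.

```python
import shlex
from typing import List, Sequence


def airflow_command(
    environment: str,
    location: str,
    subcommand: str,
    args: Sequence[str] = (),
) -> List[str]:
    """Build the argv for running an Airflow CLI subcommand through gcloud."""
    argv = [
        "gcloud", "composer", "environments", "run",
        environment, "--location", location, subcommand,
    ]
    if args:
        # Arguments after "--" are passed through to the Airflow subcommand.
        argv.append("--")
        argv.extend(args)
    return argv


# Placeholder values for illustration only.
command = airflow_command("example-environment", "us-central1", "list_dags")
print(shlex.join(command))
```

To run the command for real, the list could be passed to subprocess.run on a machine where the Cloud SDK is installed and authenticated.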
In addition to native tools, the Cloud Composer REST and RPC APIs provide programmatic access to your Airflow environments. For more information, see APIs & References.
In general, the configurations that Cloud Composer provides for Apache Airflow are the same as the configurations for a locally hosted Airflow deployment. Some Airflow configurations are preconfigured in Cloud Composer, and you cannot change those configuration properties. You specify other configurations when you create or update your environment. For more information, see Blocked Airflow Configurations.
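As a sketch of how configuration values might be supplied when updating an environment, the helper below formats overrides for the gcloud --update-airflow-configs flag, using section-property=value pairs. The helper name and the example property are illustrative assumptions. Whether a given property can be overridden depends on the blocked configurations list, which this sketch does not check.

```python
from typing import Dict, Tuple


def override_flag(overrides: Dict[Tuple[str, str], str]) -> str:
    """Format (section, property) -> value overrides as a single gcloud flag.

    Hypothetical helper: it only builds the flag string and does not verify
    that a property is allowed to be overridden.
    """
    pairs = [
        f"{section}-{prop}={value}"
        for (section, prop), value in sorted(overrides.items())
    ]
    return "--update-airflow-configs=" + ",".join(pairs)


# Example property chosen for illustration only.
print(override_flag({("core", "dags_are_paused_at_creation"): "True"}))
```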
Airflow DAGs (workflows)
An Apache Airflow DAG is a workflow: a collection of tasks with additional task dependencies. Cloud Composer uses Cloud Storage to store DAGs. To add or remove DAGs from your Cloud Composer environment, you add or remove the DAGs from the Cloud Storage bucket associated with the environment. After you've moved DAGs to the storage bucket, DAGs are automatically added and scheduled in your environment.
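The idea of a workflow as tasks plus dependencies can be illustrated without Airflow itself. The sketch below uses Python's standard graphlib module to derive a valid run order from a dependency map. It is not Airflow code, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks that must finish before it starts.
# Task names are hypothetical.
dependencies: Dict[str, Set[str]] = {
    "extract": set(),
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "load": {"transform_a", "transform_b"},
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every task runs after its dependencies."""
    # Raises graphlib.CycleError if the dependencies contain a cycle,
    # which is why the graph must be acyclic.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The two transform tasks have no dependency on each other, so either may come first in the result. A scheduler is free to run such tasks in parallel.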
In addition to scheduling DAGs, you can trigger DAGs manually or in response to events, such as changes that occur in the associated Cloud Storage bucket. For more information, see Triggering DAGs.
You can install custom plugins, such as custom, in-house Apache Airflow operators, hooks, sensors, or interfaces, into your Cloud Composer environment. For more information, see Installing Custom Plugins.
If the Python dependencies that you need are not in the package index, you can also use the plugins feature.
You manage security at the GCP project level and can assign Cloud Identity and Access Management (IAM) roles that prevent individual users from modifying or creating environments. If someone does not have access to your project or does not have an appropriate Cloud Composer IAM role, that person cannot access any of your environments. For more information, see Cloud Composer Access Control.
Logging and monitoring
You can view Airflow logs that are associated with single DAG tasks in the Airflow web interface and in the logs folder in the associated Cloud Storage bucket.
Streaming logs are available for Cloud Composer. You can access the streaming logs in Logs Viewer in the Google Cloud Platform Console and by using Stackdriver. For information about using Stackdriver, see Monitoring Cloud Composer Environments.
Cloud Composer also provides audit logs, such as Admin Activity audit logs, for your GCP projects. For information, see Viewing Audit Logs.
Networking and security
During environment creation, Cloud Composer provides the following configuration options:
- Cloud Composer environment with a route-based GKE cluster (default)
- Private IP Cloud Composer environment
- Cloud Composer environment with a VPC Native GKE cluster using alias IP addresses
- Shared VPC
Features not yet available
VPC Service Controls
VPC Service Controls enables service perimeter configuration around VPC resources and Google-managed services to control the movement of data across the perimeter boundary.