Cloud Composer overview

Cloud Composer is a fully managed workflow orchestration service that lets you create, schedule, monitor, and manage workflow pipelines spanning clouds and on-premises data centers.

Cloud Composer is built on the popular Apache Airflow open source project and operates using the Python programming language.

By using Cloud Composer instead of a local instance of Apache Airflow, you can benefit from the best of Airflow with no installation or management overhead. Cloud Composer helps you create managed Airflow environments quickly and use Airflow-native tools, such as the powerful Airflow web interface and command-line tools, so you can focus on your workflows and not your infrastructure.

Apache Airflow and Cloud Composer

Workflows, DAGs, and tasks

In data analytics, a workflow represents a series of tasks for ingesting, transforming, analyzing, or utilizing data. In Airflow, workflows are created using DAGs, or "Directed Acyclic Graphs".

Figure 1. Relationship between DAGs and tasks

A DAG is a collection of tasks that you want to schedule and run, organized in a way that reflects their relationships and dependencies. DAGs are created in Python scripts, which define the DAG structure (tasks and their dependencies) using code.

Each task in a DAG can represent almost anything—for example, one task might perform any of the following functions:

  • Preparing data for ingestion
  • Monitoring an API
  • Sending an email
  • Running a pipeline

A DAG is not concerned with what each constituent task does; its purpose is to ensure that each task runs at the right time, in the right order, and with the right error handling.
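
For illustration, here is a minimal sketch of what a DAG script can look like. The task names, schedule, and commands are hypothetical, and it assumes the Airflow 2 operator imports:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # A sketch of a DAG: three hypothetical tasks that run daily,
    # retry once on failure, and execute in a fixed order.
    with DAG(
        dag_id="example_data_workflow",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",  # the right time
        default_args={
            "retries": 1,  # the right error handling
            "retry_delay": timedelta(minutes=5),
        },
        catchup=False,
    ) as dag:
        prepare_data = BashOperator(task_id="prepare_data", bash_command="echo prepare")
        run_pipeline = BashOperator(task_id="run_pipeline", bash_command="echo run")
        send_email = BashOperator(task_id="send_email", bash_command="echo notify")

        # The >> operator declares dependencies: the right order.
        prepare_data >> run_pipeline >> send_email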

For more information on DAGs and tasks, see the Apache Airflow documentation.

Cloud Composer environments

To run workflows, you first need to create an environment. Airflow depends on many microservices to run, so Cloud Composer provisions Google Cloud components to run your workflows. These components are collectively known as a Cloud Composer environment.

Environments are self-contained Airflow deployments based on Google Kubernetes Engine. They work with other Google Cloud services using connectors built into Airflow. You can create one or more environments in a single Google Cloud project. You can create Cloud Composer environments in any supported region.
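
As a sketch, creating an environment is a single gcloud command; ENVIRONMENT_NAME and LOCATION are placeholders, and additional flags (such as --image-version) select a specific Composer and Airflow version:

    gcloud composer environments create ENVIRONMENT_NAME \
        --location LOCATION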

For an in-depth look at the components of an environment, see Cloud Composer environment architecture.

Cloud Composer features

When using Cloud Composer, you can manage and use features such as:

  • Airflow environments
  • Airflow management
  • Airflow configuration
  • Airflow DAGs (workflows)
  • Custom Apache Airflow plugins

To learn how Cloud Composer works with Airflow features such as Airflow DAGs, Airflow configuration parameters, custom plugins, and Python dependencies, see Cloud Composer features.

Frequently Asked Questions

What version of Apache Airflow does Cloud Composer use?

Cloud Composer supports both Airflow 1 and Airflow 2.

Cloud Composer environments are based on Cloud Composer images. When you create an environment, you can select an image with a specific Airflow version.

You have control over the Apache Airflow version of your environment, and you can decide to upgrade your environment to a newer version of the Cloud Composer image. Each Cloud Composer release supports several Apache Airflow versions.
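
As a sketch, an upgrade to a newer Cloud Composer image also goes through gcloud; ENVIRONMENT_NAME and LOCATION are placeholders, and the version string is illustrative:

    # Upgrade an existing environment to a newer image. The image version
    # string encodes both the Composer and the Airflow version.
    gcloud composer environments update ENVIRONMENT_NAME \
        --location LOCATION \
        --image-version composer-2.1.11-airflow-2.4.3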

Can I use the native Airflow UI and CLI?

You can access the Apache Airflow web interface of your environment. Each of your environments has its own Airflow UI. For more information about accessing the Airflow UI, see Airflow web interface.
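
For example, the URL of an environment's Airflow UI is stored in the environment's configuration and can be read with gcloud; ENVIRONMENT_NAME and LOCATION are placeholders:

    gcloud composer environments describe ENVIRONMENT_NAME \
        --location LOCATION \
        --format="value(config.airflowUri)"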

To run Airflow CLI commands in your environments, you use gcloud commands. For more information about running Airflow CLI commands in Cloud Composer environments, see Airflow command-line interface.
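
For example, the Airflow 2 CLI command dags list runs through gcloud like this; ENVIRONMENT_NAME and LOCATION are placeholders:

    gcloud composer environments run ENVIRONMENT_NAME \
        --location LOCATION \
        dags list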

Can I use my own database as the Airflow Metadata DB?

Cloud Composer uses a managed database service for the Airflow Metadata DB. It is not possible to use a user-provided database as the Airflow Metadata DB.

Can I use my own cluster as a Cloud Composer cluster?

Cloud Composer uses the Google Kubernetes Engine service to create, manage, and delete the environment clusters where Airflow components run. These clusters are fully managed by Cloud Composer.

It is not possible to build a Cloud Composer environment based on a self-managed Google Kubernetes Engine cluster.

Can I use my own container registry?

Cloud Composer uses the Artifact Registry service to manage the container image repositories used by Cloud Composer environments. It is not possible to replace it with a user-provided container registry.

Are Cloud Composer environments zonal or regional?

Cloud Composer 1 environments are zonal.

Cloud Composer 2 environments have a zonal Airflow Metadata DB and a regional Airflow scheduling and execution layer. Airflow schedulers, workers, and web servers run in the Airflow execution layer.

What's next