This page explains how to create a Cloud Composer environment.
A Cloud Composer environment runs the Apache Airflow software. When creating a new environment in a Google Cloud (GCP) project, you can specify several parameters, such as the Compute Engine machine type or the number of nodes in the cluster.
Before you begin
Most `gcloud composer` commands require a location. You can specify the location by using the `--location` flag or by setting the default location.

Some Airflow configurations are preconfigured for Cloud Composer, and you cannot change them.
It takes approximately 25 minutes for the system to create your environment.
Shared VPC: There are specific network requirements to use Shared VPC with Cloud Composer. For information, see Configuring shared VPC.
Private IP: There are specific network and peering requirements to create a private IP Cloud Composer environment. For information, see Configuring private IP.
Access control
The following permissions are required to create Cloud Composer environments: (1) `composer.environments.create` and (2) `iam.serviceAccounts.actAs` on the service account under which the environment will run. For more information, see Cloud Composer Access Control.

By default, Cloud Composer environments run using the Compute Engine default service account. During environment creation, you can specify a custom service account. At minimum, this service account requires the permissions that the `composer.worker` role provides to access resources in the Cloud Composer environment. You must also be authorized to "act as" the service account by having the `iam.serviceAccounts.actAs` permission on the service account or on the project that contains it. Make sure you have been granted one of the roles that includes this permission, such as Service Account User, Owner, or Editor. See Understanding service accounts for additional details.

If your custom service account needs to access other resources in your Google Cloud project during task execution, you can grant the service account the required roles. Alternatively, you can provide the relevant credentials as an Airflow connection, then reference the connection in the operator.
Google-managed service accounts are displayed in your project's IAM policy and in the Cloud Console. For example, `service-PROJECT_ID@cloudcomposer-accounts.iam.gserviceaccount.com` is the name of a Google-managed service account used by the Cloud Composer service to manage (create, update, and delete) Cloud Composer environments in your project.

To deploy Cloud Composer environments inside a security perimeter, see Configuring VPC Service Controls. When used with Cloud Composer, VPC Service Controls has several known limitations.
Creating a new environment
To create a Cloud Composer environment:
Console
- Open the Create Environment page in the Google Cloud Console.
- Enter a name for your environment.
The name must start with a lowercase letter followed by up to 62 lowercase letters, numbers, or hyphens, and cannot end with a hyphen. The environment name is used to create subcomponents for the environment, so you must provide a name that is also valid as a Cloud Storage bucket name. See Bucket naming guidelines for a list of restrictions.
Under Node configuration, specify the settings for nodes in the Google Kubernetes Engine cluster. If you do not specify a setting, the default is used.
Setting | Description |
---|---|
Node count | The number of Google Kubernetes Engine nodes used to run the environment. The default is 3 nodes. Once you specify the number of nodes, it stays fixed until you update your environment. |
Location | (Required) The Compute Engine region where the environment is created. |
Zone | The Compute Engine zone where the virtual machine instances that run Apache Airflow are created. A random zone within the location is selected if unspecified. |
Machine type | The Compute Engine machine type used for cluster instances. The machine type determines the number of CPUs and the amount of memory for your environment. The default machine type is `n1-standard-1`. |
Disk size | The disk size in GB used for the node VM instances. The minimum size is 20 GB. The default size is 100 GB. |
OAuth Scopes | The set of Google API scopes made available on all node VM instances. The default is `https://www.googleapis.com/auth/cloud-platform`, which must be included in the list of specified scopes. |
Service account | The Google Cloud service account to be used by the node VM instances. The default Compute Engine service account is used if unspecified. |
Tags | The list of instance tags applied to all the node VM instances. Tags are used to identify valid sources or targets for network firewalls. Each tag within the list must comply with RFC 1035. |
Image version | The Cloud Composer version to use for your environment (includes the Cloud Composer and Airflow versions). For default version information, see the Versions list. |
Python version | The Python version to use for your environment. Supported versions are Python 2 and Python 3. The default version is 3. |

Under Cloud SQL configuration, specify the settings for the Cloud SQL instance running the Airflow database. If you do not specify a setting, the default is used.
Setting | Description |
---|---|
Cloud SQL machine type | The machine type for the Cloud SQL instance running the Airflow database. The machine type determines the number of CPUs and the amount of memory for your environment. |

Under Network configuration, specify the network settings for the Google Kubernetes Engine cluster. If you do not specify a setting, the default is used.
Setting | Description |
---|---|
Enable VPC-native (using alias IP) | Creates a VPC-native GKE cluster with alias IPs for your environment. The default is a routes-based GKE cluster. Required for a private IP Cloud Composer environment. |
Network | The Virtual Private Cloud network that is used for machine communications. A network is required in order to specify a subnetwork. The default network is used if unspecified. Shared VPC requires a host project. |
Subnetwork | The Virtual Private Cloud subnetwork that is used for machine communications. If your network uses a custom-mode network, the subnetwork is required. |
Pod IP Address Allocation | The secondary range used to allocate IP addresses for pods in the GKE cluster. If unspecified, a new secondary range is created. This setting is permanent. |
Service IP Address Allocation | The secondary range used to reserve space for Cloud Composer services. If unspecified, a new secondary range is created. This setting is permanent. |
Private IP environment | Enables a private IP Cloud Composer environment. Disabled by default. |
Access GKE master using its external IP address | Enables public access to the GKE cluster master. Requires Private IP environment. |
GKE Master IP range | The private RFC 1918 range for the master's VPC. If unspecified, the default value `172.16.0.0/28` is used. Required for a Private IP environment. |

Ensure that secondary ranges are large enough to accommodate the cluster's size and anticipated growth. For example, the network prefixes of the secondary ranges for a 3-node Cloud Composer environment should be no longer than:
- Pods: `/22`
- Services: `/27`
See Creating a VPC-native cluster for guidelines on configuring secondary ranges for Pods and Services.
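As a rough sanity check on these prefix lengths, you can compare a range's address count against the cluster's needs. The sketch below assumes GKE's documented behavior of reserving a /24 (256 addresses, about twice the default 110 maximum pods per node) from the pod range for each node; treat the numbers as illustrative rather than an official sizing formula:

```python
import ipaddress

def addresses_in(cidr: str) -> int:
    """Total number of addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses

# Illustrative check for a 3-node environment, assuming GKE reserves a
# /24 (256 addresses) per node from the pod secondary range.
nodes = 3
pod_addresses_needed = nodes * 256        # 768

print(addresses_in("10.0.0.0/22"))        # 1024 addresses in a /22
print(pod_addresses_needed <= addresses_in("10.0.0.0/22"))  # True: a /22 fits
print(addresses_in("10.0.4.0/27"))        # 32 addresses in a /27 for services
```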
Under Web server configuration, specify the IP ranges that can access the Airflow web server for your environment and a machine type for the Airflow web server.
Setting | Description |
---|---|
Allow access from all IP addresses (default) | All IP ranges can access the Airflow web server. |
Allow access only from specific IP addresses | Only specific IP ranges can access the web server. To add a new range, click Add IP range. To remove a range, click the trash button for that row. To deny all IP ranges, delete all rows. |
Web server machine type | The machine type for the Compute Engine instance that runs the Airflow web server. The machine type determines the number of CPUs and the amount of memory for your environment. |

(Preview) Under Maintenance windows, you can set custom time windows for Cloud Composer to perform environment maintenance. Your environment may be temporarily unavailable during these windows, so choose hours (for example, weekends or off-peak times) when you are less likely to run workflows. Your maintenance windows must encompass at least 12 hours per week in total. You can also set maintenance windows after environment creation, from the Environment configuration tab of the Environment details page.
(Optional) To override the default values in the Airflow configuration file (`airflow.cfg`), click Add Airflow configuration property.

(Optional) To configure environment variables, click Add environment variable. See Environment Variables for requirements.
(Optional) To add a label, click Add labels.
Label keys and label values can only contain letters, numbers, dashes, and underscores. Label keys must start with a letter or number.
Click Create.
gcloud
gcloud composer environments create ENVIRONMENT_NAME \
    --location LOCATION \
    OTHER_ARGUMENTS
The following parameters are required:
- `ENVIRONMENT_NAME` is the name of the environment. It must match the pattern `^[a-z](?:[-0-9a-z]{0,61}[0-9a-z])?$`. The environment name is used to create subcomponents for the environment, so you must provide a name that is also valid as a Cloud Storage bucket name. See Bucket naming guidelines for a list of restrictions.
- `LOCATION` is the Compute Engine region where the environment is located. Ensure that the location you specify is one where Cloud Composer is available.
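You can check a candidate name against this pattern locally before calling gcloud; a quick sketch in Python (the regex is the one given above):

```python
import re

# Pattern from this page: starts with a lowercase letter, then up to 62
# lowercase letters, digits, or hyphens, and must not end with a hyphen.
ENV_NAME_RE = re.compile(r"^[a-z](?:[-0-9a-z]{0,61}[0-9a-z])?$")

def is_valid_environment_name(name: str) -> bool:
    """Return True if `name` is a valid Cloud Composer environment name."""
    return ENV_NAME_RE.match(name) is not None

print(is_valid_environment_name("test-environment"))   # True
print(is_valid_environment_name("Test-Environment"))   # False: uppercase letters
print(is_valid_environment_name("ends-with-hyphen-"))  # False: trailing hyphen
```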
The following parameters are optional:
- `airflow-configs` is a list of SECTION_NAME-PROPERTY_NAME=VALUE Airflow configuration overrides. The section name and property name must be separated by a hyphen.
- `cloud-sql-machine-type` is the machine type for the Cloud SQL instance running the Airflow database. The machine type determines the number of CPUs and the amount of memory for your environment. The default machine type is `db-n1-standard-2`. Possible values for this parameter are: `db-n1-standard-2`, `db-n1-standard-4`, `db-n1-standard-8`, and `db-n1-standard-16`.
- `disk-size` is the disk size in GB used for the node VMs. The minimum size is 20 GB. The default disk size is 100 GB.
- `env-variables` is a list of NAME=VALUE environment variables that are set on the Airflow scheduler, worker, and web server processes.
- `enable-private-environment` enables a private IP Cloud Composer environment.
- `master-ipv4-cidr` is the private RFC 1918 range for the master's VPC. Required when `enable-private-environment` is true.
- `enable-private-endpoint` restricts access to the GKE cluster master to its private IP address, disabling access through the external IP address. Requires `enable-private-environment`.
- `enable-ip-alias` enables VPC-native (using alias IP addresses). Required when `enable-private-environment` is true, or to configure secondary ranges for pods and services:
  - `cluster-secondary-range-name` or `cluster-ipv4-cidr` configures the secondary range for pods.
  - `services-secondary-range-name` or `services-ipv4-cidr` configures the secondary range for services.
- Preview: `max-pods-per-node` configures the maximum number of pods per node in the GKE cluster allocated during environment creation. Lowering this value reduces IP address consumption by the Cloud Composer Kubernetes cluster. For more information, see Optimizing IP address allocation. This value can only be set during environment creation, and only if the environment is VPC-native. The range of possible values is 8-110, and the default is 32. While in Preview, this parameter requires you to use the `gcloud beta composer` command.
- `image-version` is the Cloud Composer version and Airflow version to use for your environment, in the form `composer-a.b.c-airflow-x.y.z`. For version alias and default version information, see Cloud Composer Versioning.
- `labels` are user-specified labels that are attached to the environment and its resources.
- `machine-type` is the Compute Engine machine type. The machine type determines the number of CPUs and the amount of memory for your environment. The default machine type is `n1-standard-1`.
- `network` is the Virtual Private Cloud network used for machine communications.
  - A network is required in order to specify a subnetwork. The default network is used if unspecified.
  - When using Shared VPC, the network's relative resource name must be provided using the format `projects/HOST_PROJECT_ID/global/networks/NETWORK_ID`. For Shared VPC subnetwork requirements, see `subnetwork` below.
- `node-count` is the number of GKE nodes used to run the environment. The default node count is 3. Once you specify the number of nodes, it stays fixed until you update your environment.
- `oauth-scopes` is the set of Google API scopes made available on all of the node VMs. The default OAuth scope is `https://www.googleapis.com/auth/cloud-platform` and must be included in the list of scopes if specified.
- `python-version` is the Python version to use for your environment. Supported versions are Python 2 and Python 3. The default version is 2.
- `subnetwork` is the Compute Engine subnetwork to which the environment is connected.
  - If your network uses a custom-mode network, the subnetwork is required.
  - When creating a Shared VPC environment using gcloud, you must use the secondary IP ranges composer-pods and composer-services. You can specify different secondary range names by using the Cloud Composer API. The subnetwork name must also be specified as a relative resource name, using the format `projects/HOST_PROJECT_ID/regions/REGION_ID/subnetworks/SUBNET_ID`.
- `service-account` is the Google Cloud service account to be used by the node VM instances. The default Compute Engine service account is used if unspecified.
- `tags` is the list of instance tags applied to all the node VMs. Tags are used to identify valid sources or targets for network firewalls. Each tag within the list must comply with RFC 1035.
- `web-server-machine-type` is the machine type for the Compute Engine instance that runs the Airflow web server. The machine type determines the number of CPUs and the amount of memory for your environment. The default machine type is `composer-n1-webserver-2`. Possible values for this parameter are: `composer-n1-webserver-2`, `composer-n1-webserver-4`, and `composer-n1-webserver-8`.
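The IP savings from `max-pods-per-node` come from how GKE sizes each node's pod CIDR: per GKE's documentation, each node receives the smallest power-of-two block that holds at least twice its maximum pods. The sketch below illustrates that rule; it is a back-of-the-envelope helper, not an official API:

```python
import math

def node_pod_cidr_prefix(max_pods_per_node: int) -> int:
    """Smallest CIDR prefix whose block holds at least twice
    max_pods_per_node addresses (GKE's per-node pod range sizing rule)."""
    needed = 2 * max_pods_per_node
    bits = math.ceil(math.log2(needed))
    return 32 - bits

print(node_pod_cidr_prefix(110))  # 24: GKE's default of 110 pods needs a /24 per node
print(node_pod_cidr_prefix(32))   # 26: Composer's default of 32 pods needs only a /26
print(node_pod_cidr_prefix(8))    # 28: the lowest allowed value needs only a /28
```

Halving the per-node block from /24 to /26 is what makes a smaller pod secondary range workable for the same node count.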
The following example creates an environment running the latest supported Cloud Composer image version in the `us-central1` region that uses the `n1-standard-2` machine type with a `beta` environment label:
gcloud beta composer environments create test-environment \
    --location us-central1 \
    --zone us-central1-f \
    --machine-type n1-standard-2 \
    --image-version composer-latest-airflow-x.y.z \
    --labels env=beta
The following Shared VPC example creates an environment in the host project. The environment is in the `us-central1` region and uses the `n1-standard-2` machine type with a `beta` environment label:
gcloud beta composer environments create host-project-environment \
    --network vpc-network-name \
    --subnetwork vpc-subnetwork-name \
    --location us-central1 \
    --zone us-central1-f \
    --machine-type n1-standard-2 \
    --labels env=beta
Preview: Custom maintenance windows
You can set custom time windows for Cloud Composer to perform environment maintenance. Your environment may be temporarily unavailable during these windows, so choose hours (e.g. weekend or off-peak) when you are less likely to run workflows. Your maintenance windows must encompass at least 12 hours per week in total. Use the following optional parameters:
- `maintenance-window-start` sets the start time of a custom maintenance window.
- `maintenance-window-end` sets the end time of a custom maintenance window.
- `maintenance-window-recurrence` sets the days on which the maintenance window recurs.
For example:
gcloud beta composer environments create test-environment \
    --location us-central1 \
    --zone us-central1-f \
    --machine-type n1-standard-2 \
    --image-version composer-latest-airflow-x.y.z \
    --labels env=beta \
    --maintenance-window-start='2019-08-01T01:00:00Z' \
    --maintenance-window-end='2019-08-01T07:00:00Z' \
    --maintenance-window-recurrence='FREQ=WEEKLY;BYDAY=SA,SU'
This creates an environment with a maintenance window between 01:00 and 07:00 (UTC) every Saturday and Sunday. Days of the week (Sunday through Saturday) are represented as follows: `SU`, `MO`, `TU`, `WE`, `TH`, `FR`, `SA`. Using `FREQ=DAILY` sets the maintenance window to recur every day.
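To verify the 12-hour-per-week minimum for a given window and recurrence, you can total the window length across the recurring days. This sketch handles only the simplified RRULE forms shown on this page (`FREQ=WEEKLY;BYDAY=...` and `FREQ=DAILY`), not the full RRULE grammar:

```python
def weekly_maintenance_hours(window_hours: float, recurrence: str) -> float:
    """Total maintenance hours per week for a simplified RRULE.

    Supports only the forms used on this page:
    'FREQ=WEEKLY;BYDAY=...' and 'FREQ=DAILY'.
    """
    parts = dict(p.split("=", 1) for p in recurrence.split(";"))
    if parts.get("FREQ") == "DAILY":
        days = 7
    else:
        days = len(parts.get("BYDAY", "").split(","))
    return window_hours * days

# The example above: a 6-hour window (01:00-07:00 UTC) on Saturday and Sunday.
print(weekly_maintenance_hours(6.0, "FREQ=WEEKLY;BYDAY=SA,SU"))  # 12.0: meets the minimum
print(weekly_maintenance_hours(2.0, "FREQ=DAILY"))               # 14.0: also meets it
```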
API
To create a new Cloud Composer environment with the Cloud Composer REST API, construct an `environments.create` API request, filling in the `Environment` resource with your configuration information.
Terraform
To configure this environment using Terraform, add the following resource block to your Terraform configuration and run `terraform apply`.
resource "google_composer_environment" "example-resource" {
  name   = "ENVIRONMENT_NAME"
  region = "LOCATION"
}
The following parameters are required:
- `name`, where ENVIRONMENT_NAME is the name of the environment. It must match the pattern `^[a-z](?:[-0-9a-z]{0,61}[0-9a-z])?$`. The environment name is used to create subcomponents for the environment, so you must provide a name that is also valid as a Cloud Storage bucket name. See Bucket naming guidelines for a list of restrictions.
- `region`, where LOCATION is the Compute Engine region where the environment is located. Ensure that the location you specify is one where Cloud Composer is available.
Usage of additional optional parameters is defined in the Terraform Argument Reference.
The following example creates an environment running the latest supported Cloud Composer image version in the `us-central1` region that uses the `n1-standard-2` machine type with a `beta` environment label. To configure this environment using Terraform, add the following resource block to your Terraform configuration and run `terraform apply`:
resource "google_composer_environment" "example-resource" {
  name   = "example-environment"
  region = "us-central1"

  config {
    node_config {
      zone         = "us-central1-f"
      machine_type = "n1-standard-2"
    }

    software_config {
      image_version = "composer-latest-airflow-x.y.z"
    }
  }

  labels = {
    env = "beta"
  }
}
The following Shared VPC example creates an environment in the host project. The environment is in the `us-central1` region and uses the `n1-standard-2` machine type with a `beta` environment label. To configure this environment using Terraform, add the following resource block to your Terraform configuration and run `terraform apply`:
resource "google_composer_environment" "example-resource" {
  name   = "host-project-environment"
  region = "us-central1"

  config {
    node_config {
      zone         = "us-central1-f"
      machine_type = "n1-standard-2"
      network      = "vpc-network-name"
      subnetwork   = "vpc-subnetwork-name"
    }

    software_config {
      image_version = "composer-latest-airflow-x.y.z"
    }
  }

  labels = {
    env = "beta"
  }
}
Configuring email notifications
Configuring SendGrid email services
To receive notifications, configure your environment variables to send email through the SendGrid email service.
If you haven't already, sign up with SendGrid via the Google Cloud Console and create an API key. As a Google Cloud developer, you can start with 12,000 free emails per month.
In the Cloud Console, open the Create Environment page.
Under Node configuration, click Add environment variable.
Enter the following environment variables:
Name | Value |
---|---|
`SENDGRID_MAIL_FROM` | The From: email address, such as `noreply-composer@<your-domain>`. |
`SENDGRID_API_KEY` | Your SendGrid API key. |

To test the SendGrid configuration:
- Create a test DAG that uses the `EmailOperator`.
- Upload the DAG to your environment and check that the EmailOperator task succeeds.
- Sign in to SendGrid with your SendGrid credentials.
- In the SendGrid UI, go to the Activity page.
- Search the list for the email. You should see that SendGrid processed and delivered the email.
- If the email is not processed and delivered:
  - Check your SendGrid configuration.
  - Verify that the `SENDGRID_MAIL_FROM` and `SENDGRID_API_KEY` environment variables are correct.
  - Check the spam filter in your email client.
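A minimal test DAG for the first step above might look like the following sketch. It uses Airflow 1.10-era imports; the DAG id and recipient address are placeholders that you should replace with your own values:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.email_operator import EmailOperator

# Placeholder recipient; replace with an address you can check.
TEST_RECIPIENT = "you@example.com"

with DAG(
    dag_id="test_sendgrid_email",
    start_date=datetime(2019, 8, 1),
    schedule_interval=None,  # trigger manually from the Airflow UI
) as dag:
    send_test_email = EmailOperator(
        task_id="send_test_email",
        to=TEST_RECIPIENT,
        subject="Cloud Composer SendGrid test",
        html_content="This is a test email sent through SendGrid.",
    )
```

Upload the file to the `dags/` folder of your environment's bucket and trigger the DAG manually; the task succeeds only if Airflow could hand the message to SendGrid.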
Configuring third-party SMTP services
To send email through a third-party SMTP service, override the `email_backend` Airflow configuration option and configure other SMTP-related parameters.

To configure a third-party SMTP service, override the following Airflow configuration options:
Section | Key | Value |
---|---|---|
email |
email_backend |
airflow.utils.email.send_email_smtp |
smtp |
smtp_host |
The hostname for the SMTP server. |
smtp |
smtp_user |
The user name on the SMTP server. |
smtp |
smtp_port |
The port for the SMTP server. Port 25 is not available. You can use other ports, such as standard SMTP ports 465 and 587. |
smtp |
smtp_password |
Setting a password via smtp_password is not supported. To set an SMTP password, follow instructions provided in Configuring an SMTP password. |
smtp |
smtp_mail_from |
The From: email address, such as noreply-composer@ . |
smtp |
smtp_starttls |
For enhanced security, set to True . |
smtp |
smtp_ssl |
For enhanced security, set to True . |
For other SMTP configuration options, see the `default_airflow.cfg` for your Airflow release.
Configuring an SMTP password for a third-party SMTP service
Keeping an SMTP password in plain text in the Airflow configuration file is a poor security practice, which is why Cloud Composer does not support this method. Instead, you can use one of two other methods for configuring an SMTP password.
Using a command to retrieve an SMTP password
You can use a configuration override to specify a command that obtains the SMTP password. When communicating with your SMTP service, Airflow uses this command to get the value of the password.
To use this method, override the following Airflow configuration property:
Section | Key | Value |
---|---|---|
smtp |
smtp_password_cmd |
Specify a command that returns the SMTP password. |
Using a secret stored in Secret Manager to retrieve an SMTP password
You can configure Secret Manager as your Airflow secrets backend.
Once you configure Secret Manager for your Composer environment, you can store an SMTP password in Secret Manager:
Create a new secret:
echo -n "SMTP_PASSWORD" | gcloud beta secrets create airflow-config-smtp-password \
    --data-file=- \
    --replication-policy=automatic
Replace SMTP_PASSWORD with your SMTP password.

Configure Airflow to obtain the SMTP password from Secret Manager. To do so, override the following Airflow configuration property:
Section | Key | Value |
---|---|---|
smtp | smtp_password_secret | smtp-password |
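Note how the secret name and the configuration value line up: the secret was created as `airflow-config-smtp-password`, while `smtp_password_secret` is set to just `smtp-password`. Assuming the secrets backend's default `airflow-config` prefix (an assumption here; check your backend configuration for the actual prefix), the lookup can be sketched as:

```python
# Sketch of how the Secret Manager backend derives the secret name from an
# Airflow *_secret configuration value, assuming the default "airflow-config"
# prefix (an assumption; verify against your secrets backend configuration).
CONFIG_PREFIX = "airflow-config"

def secret_name_for_config(config_value: str, prefix: str = CONFIG_PREFIX) -> str:
    """Join the backend prefix and the *_secret configuration value."""
    return f"{prefix}-{config_value}"

# smtp_password_secret = "smtp-password" resolves to the secret created above.
print(secret_name_for_config("smtp-password"))  # airflow-config-smtp-password
```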