Creating environments

This page explains how to create a Cloud Composer environment and override default Airflow environment settings during the creation process.

A Cloud Composer environment runs the Apache Airflow software. When creating a new environment in a Google Cloud (GCP) project, you can specify several parameters, such as the Compute Engine machine type or the number of nodes in the cluster.

Before you begin

Access control

  • The following permissions are required to create Cloud Composer environments: (1) composer.environments.create and (2) iam.serviceAccounts.actAs on the service account under which the environment will run. For more information, see Cloud Composer Access Control.

  • By default, Cloud Composer environments run using the Compute Engine default service account. During environment creation, you can specify a custom service account. At minimum, this service account requires the permissions that the composer.worker role provides to access resources in the Cloud Composer environment. You must also be authorized to "act as" the service account by having the iam.serviceAccounts.actAs permission on the service account or on the project that contains the service account. Make sure you have been granted one of the roles that includes this permission, such as Service Account User, Owner, or Editor. See Understanding service accounts for additional details.

  • If your custom service account needs to access other resources in your Google Cloud project during task execution, you can grant the service account the required roles. Alternatively, you can provide the relevant credentials as an Airflow connection, then reference the connection in the operator.

  • You might see some additional Google-owned service accounts in your project's IAM policy or in the Cloud Console (for example, service-PROJECT_ID@cloudcomposer-accounts.iam.gserviceaccount.com). For information about the types and roles available, see Service Accounts.

  • Domain restricted sharing for Cloud Composer is currently in Beta. If you have enabled the domain restricted sharing policy, you must use the Beta API when creating a Cloud Composer environment. Please refer to Beta Feature Support to learn how to deploy a Cloud Composer environment using the Beta API.

  • Support for VPC Service Controls is currently in Beta. See Configuring VPC Service Controls to learn how to deploy Cloud Composer environments inside a security perimeter. For more information, see the VPC Service Controls known limitations.

Beta features

This section lists features that are currently available in Beta.

  • Web server network access control: This feature lets you specify the IP ranges that can access the Airflow web server for your environment.

  • Machine type for Airflow web server: This parameter lets you specify the type of the Google App Engine Virtual Machine to run the Airflow web server.

  • Machine type for Airflow database: This parameter lets you specify the machine type of the Cloud SQL instance that runs the Airflow database.

Creating a new environment

To create a Cloud Composer environment:

Console

  1. Open the Create Environment page in the Google Cloud Console.


  2. Enter a name for your environment.

    The name must start with a lowercase letter followed by up to 63 lowercase letters, numbers, or hyphens, and cannot end with a hyphen. The environment name is used to create subcomponents for the environment, so you must provide a name that is also valid as a Cloud Storage bucket name. See Bucket naming guidelines for a list of restrictions.

  3. Under Node configuration, specify the settings for nodes in the Google Kubernetes Engine cluster. If you do not specify a setting, the default is used.

    Setting | Description
    Node count | The number of Google Kubernetes Engine nodes used to run the environment. The default is 3 nodes. The node count is the only Google Kubernetes Engine cluster setting that you can change after environment creation.
    Location | (Required) The Compute Engine region where the environment is created.
    Zone suffix | The Compute Engine zone where the virtual machine instances that run Apache Airflow are created. A random zone within the location is selected if unspecified.
    Machine type | The Compute Engine machine type used for cluster instances. The machine type determines the number of CPUs and the amount of memory for your environment. The default machine type is n1-standard-1.
    Disk size | The disk size in GB used for the node VM instances. The minimum size is 20 GB. The default size is 100 GB.
    OAuth Scopes | The set of Google API scopes made available on all node VM instances. The default is https://www.googleapis.com/auth/cloud-platform and must be included in the list of specified scopes.
    Service account | The Google Cloud service account to be used by the node VM instances. The default Compute Engine service account is used if unspecified.
    Tags | The list of instance tags applied to all the node VM instances. Tags are used to identify valid sources or targets for network firewalls. Each tag within the list must comply with RFC 1035.
    Image version | The Cloud Composer version to use for your environment (includes Cloud Composer and Airflow version). For default version information, see Versions list.
    Python version | The Python version to use for your environment. Supported versions are Python 2 and Python 3. The default version is 3.
  4. Under Network configuration, specify the network settings for the Google Kubernetes Engine cluster. If you do not specify a setting, the default is used.

    Setting | Description
    Enable VPC-native (using alias IP) | Creates a VPC-native GKE cluster with alias IPs for your environment. The default is a routes-based GKE cluster. Required for a private IP Cloud Composer environment.
    Network | The Virtual Private Cloud network that is used for machine communications. The network is required to specify a subnetwork. The default network is used if unspecified. Shared VPC requires a host project.
    Subnetwork | The Virtual Private Cloud subnetwork that is used for machine communications. If your network uses a custom-mode network, the subnetwork is required.
    Pod IP Address Allocation | The secondary range to allocate IP addresses for pods in the GKE cluster. If unspecified, a new secondary range is created. This setting is permanent.
    Service IP Address Allocation | The secondary range to reserve space for Cloud Composer services. If unspecified, a new secondary range is created. This setting is permanent.
    Private IP environment | Enables a private IP Cloud Composer environment. Disabled by default.
    Access GKE master using its external IP address | Enables public access to the GKE cluster master. Requires Private IP environment.
    GKE Master IP range | The private RFC 1918 range for the master's VPC. If unspecified, uses the default value 172.16.0.0/28. Required for Private IP environment.

    Ensure that secondary ranges are large enough to accommodate the cluster's size and anticipated growth. For example, the network prefixes of the secondary ranges for a 3-node Cloud Composer environment should be no longer than:

    • Pods: /22
    • Services: /27

    See Creating a VPC-native cluster for guidelines on configuring secondary ranges for Pods and Services.

  5. (Beta) Under Web server network access control, specify the IP ranges that can access the Airflow web server for your environment.

    Setting | Description
    Allow access from all IP addresses (default) | All IP ranges can access the Airflow web server.
    Allow access only from specific IP addresses | Only specific IP ranges can access the web server. To add a new range, click Add IP range. To remove a range, click the trash button for that row. To deny all IP ranges, delete all rows.
  6. (Optional) To change or override the default values in the Airflow configuration file (airflow.cfg), click Add Airflow configuration property.

  7. (Optional) To configure environment variables, click Add environment variable. See Environment Variables for requirements.

  8. (Optional) To add a label, click Add labels.

    Label keys and label values can only contain letters, numbers, dashes, and underscores. Label keys must start with a letter or number.

  9. Click Create.
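
The secondary range guidance in step 4 can be sanity-checked before you create the environment. The following is a minimal sketch that uses Python's standard ipaddress module. The function name and the example CIDR blocks are hypothetical, and the /22 and /27 limits are taken from the 3-node guidance above; this is not part of any Cloud Composer tooling.

```python
import ipaddress

# Longest network prefixes suggested in step 4 for a 3-node environment.
MAX_POD_PREFIX = 22
MAX_SERVICE_PREFIX = 27


def range_is_large_enough(cidr: str, max_prefix: int) -> bool:
    """Return True if the CIDR block's prefix is no longer than max_prefix."""
    network = ipaddress.ip_network(cidr, strict=True)
    return network.prefixlen <= max_prefix


# Hypothetical secondary ranges, used only for illustration.
pods = "10.4.0.0/22"
services = "10.8.0.0/27"

print(range_is_large_enough(pods, MAX_POD_PREFIX))
print(range_is_large_enough(services, MAX_SERVICE_PREFIX))
# Number of addresses available in the pod range.
print(ipaddress.ip_network(pods).num_addresses)
```

A longer prefix means a smaller range, so a /24 pod range would fail this check while a /20 would pass.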

gcloud

gcloud composer environments create ENVIRONMENT_NAME \
    --location LOCATION \
    OTHER_ARGUMENTS

The following parameters are required:

  • ENVIRONMENT_NAME is the name of the environment. Must match the pattern: ^[a-z](?:[-0-9a-z]{0,62}[0-9a-z])?$. The environment name is used to create subcomponents for the environment, so you must provide a name that is also valid as a Cloud Storage bucket name. See Bucket naming guidelines for a list of restrictions.
  • LOCATION is the Compute Engine region where the environment is located. Ensure that the location you specify is one where Composer is available.
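
To catch naming mistakes before running the command, you can test a candidate name against the pattern quoted above. This is a minimal sketch using Python's standard re module; the function name is hypothetical, and the check covers only the pattern, not the additional Cloud Storage bucket naming restrictions.

```python
import re

# Pattern quoted in the ENVIRONMENT_NAME description above.
ENVIRONMENT_NAME_PATTERN = re.compile(r"^[a-z](?:[-0-9a-z]{0,62}[0-9a-z])?$")


def is_valid_environment_name(name: str) -> bool:
    """Return True if name matches the documented environment name pattern."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return ENVIRONMENT_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["test-environment", "Test-Environment", "env-", "1env"]:
    print(candidate, is_valid_environment_name(candidate))
```

Of the four candidates, only test-environment passes: the others contain uppercase letters, end with a hyphen, or start with a digit.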

The following parameters are optional:

  • airflow-configs is a list of SECTION_NAME-PROPERTY_NAME=VALUE Airflow configuration overrides. The section name and property name must be separated by a hyphen.
  • cloud-sql-machine-type is the machine type used for the Cloud SQL instance that runs the Airflow database. The machine type determines the number of CPUs and the amount of memory for your environment. The default machine type is db-n1-standard-2. This parameter is in Beta, and requires the gcloud beta composer environments create command. The possible values for this parameter are db-n1-standard-2, db-n1-standard-4, db-n1-standard-8, and db-n1-standard-16. For the specifications of these machine types, see the Cloud SQL page.
  • disk-size is the disk size in GB used for the node VMs. The minimum size is 20 GB. The default disk size is 100 GB.
  • env-variables is a list of NAME=VALUE environment variables that are set on the Airflow scheduler, worker, and web server processes.
  • enable-private-environment enables a private IP Cloud Composer environment.
    • master-ipv4-cidr is the private RFC 1918 range for the master's VPC. Required when enable-private-environment is true.
  • enable-private-endpoint restricts access to the GKE cluster master to its private IP address, which disables public access to the master. Requires enable-private-environment.
  • enable-ip-alias enables VPC-native networking using alias IP addresses. Required when enable-private-environment is true or to configure secondary ranges for pods and services:
    • cluster-secondary-range-name or cluster-ipv4-cidr configures the secondary range for pods.
    • services-secondary-range-name or services-ipv4-cidr configures the secondary range for services.
  • image-version is the composer-addon version and Airflow version to use for your environment in the form composer-a.b.c-airflow-x.y.z. For version alias and default version information, see Cloud Composer Versioning.
  • labels are user-specified labels that are attached to the environment and its resources.
  • machine-type is the Compute Engine machine type. The machine type determines the number of CPUs and the amount of memory for your environment. The default machine type is n1-standard-1.
  • network is the Virtual Private Cloud network used for machine communications.
    • The network is required to specify a subnetwork. The default network is used if unspecified.
    • When using Shared VPC, the network's relative resource name must be provided using the format projects/HOST_PROJECT_ID/global/networks/NETWORK_ID. For Shared VPC subnetwork requirements, see subnetwork below.
  • node-count is the number of GKE nodes used to run the environment. The default node count is 3. The node count is the only Google Kubernetes Engine cluster setting that you can change after environment creation.
  • oauth-scopes is the set of Google API scopes made available on all of the node VMs. The default OAuth scope is https://www.googleapis.com/auth/cloud-platform and must be included in the list of scopes if specified.
  • python-version is the Python version to use for your environment. Supported versions are Python 2 and Python 3. The default version is 2.
  • subnetwork is the Compute Engine subnetwork to which the environment is connected.
    • If your network uses a custom-mode network, the subnetwork is required.
    • When creating a Shared VPC environment using gcloud, you must use the secondary IP ranges composer-pods and composer-services. You can specify different secondary range names by using the Cloud Composer API. The subnetwork name must also be specified as a relative resource name using the format projects/HOST_PROJECT_ID/regions/REGION_ID/subnetworks/SUBNET_ID.
  • service-account is the Google Cloud service account to be used by the node VM instances. The default Compute Engine service account is used if unspecified.
  • tags is the list of instance tags applied to all the node VMs. Tags are used to identify valid sources or targets for network firewalls. Each tag within the list must comply with RFC 1035.
  • web-server-machine-type is the machine type used to run the Airflow web server. The machine type determines the number of CPUs and the amount of memory for your environment. The default machine type is composer-n1-webserver-2. This parameter is in Beta, and requires the gcloud beta composer environments create command. The possible values for this parameter are composer-n1-webserver-2, composer-n1-webserver-4, and composer-n1-webserver-8.
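
If you generate the command from a script, the parameters above map directly to command-line flags. The following is a minimal sketch that only assembles and prints the argument list; it does not run gcloud. The function name build_create_command and the keyword-to-flag convention are hypothetical, and the sketch does not validate that a flag exists or that its value is acceptable.

```python
import shlex


def build_create_command(name, location, **options):
    """Build the argument list for `gcloud composer environments create`.

    Keyword options correspond to the optional parameters listed above,
    with underscores in place of hyphens (node_count -> --node-count).
    """
    command = ["gcloud", "composer", "environments", "create", name,
               "--location", location]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            # Switches such as enable-ip-alias take no value.
            command.append(flag)
            continue
        if isinstance(value, dict):
            # Dictionary-valued parameters such as labels use KEY=VALUE pairs.
            value = ",".join(f"{k}={v}" for k, v in value.items())
        command += [flag, str(value)]
    return command


command = build_create_command(
    "test-environment",
    "us-central1",
    machine_type="n1-standard-2",
    node_count=3,
    labels={"env": "beta"},
)
print(shlex.join(command))
```

Building a list rather than a single string keeps each argument separate, so the result can be passed to a process launcher without additional quoting.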

The following example creates an environment running the latest supported Cloud Composer image version in the us-central1 region that uses the n1-standard-2 machine type with a beta environment label:

gcloud beta composer environments create test-environment \
    --location us-central1 \
    --zone us-central1-f \
    --machine-type n1-standard-2 \
    --image-version composer-latest-airflow-x.y.z \
    --labels env=beta  

The following Shared VPC example creates an environment in the host project. The environment is in the us-central1 region and uses the n1-standard-2 machine type with a beta environment label:

gcloud beta composer environments create host-project-environment \
    --network vpc-network-name \
    --subnetwork vpc-subnetwork-name \
    --location us-central1 \
    --zone us-central1-f \
    --machine-type n1-standard-2 \
    --labels env=beta  

API

To create a new Cloud Composer environment with the Cloud Composer REST API, construct an environments.create API request, filling in the Environment resource with your configuration information.

Terraform

To configure this environment using Terraform, add the following resource block to your Terraform configuration and run terraform apply.

resource "google_composer_environment" "example-resource" {
  name   = "ENVIRONMENT_NAME"
  region = "LOCATION"
}

The following parameters are required:

  • name, where ENVIRONMENT_NAME is the name of the environment. Must match the pattern: ^[a-z](?:[-0-9a-z]{0,62}[0-9a-z])?$. The environment name is used to create subcomponents for the environment, so you must provide a name that is also valid as a Cloud Storage bucket name. See Bucket naming guidelines for a list of restrictions.
  • region, where LOCATION is the Compute Engine region where the environment is located. Ensure that the location you specify is one where Composer is available.

For additional optional parameters, see the Terraform Argument Reference.

The following example creates an environment running the latest supported Cloud Composer image version in the us-central1 region that uses the n1-standard-2 machine type with a beta environment label. To configure this environment using Terraform, add the following resource block to your Terraform configuration and run terraform apply:

resource "google_composer_environment" "example-resource" {
  name   = "example-environment"
  region = "us-central1"

  config {
    node_config {
      zone = "us-central1-f"
      machine_type = "n1-standard-2"
    }
    software_config {
      image_version = "composer-latest-airflow-x.y.z"
    }
  }
  labels = {"env": "beta"}
}

The following Shared VPC example creates an environment in the host project. The environment is in the us-central1 region and uses the n1-standard-2 machine type with a beta environment label. To configure this environment using Terraform, add the following resource block to your Terraform configuration and run terraform apply:

resource "google_composer_environment" "example-resource" {
  name   = "host-project-environment"
  region = "us-central1"

  config {
    node_config {
      zone = "us-central1-f"
      machine_type = "n1-standard-2"
      network = "vpc-network-name"
      subnetwork = "vpc-subnetwork-name"
    }
    software_config {
      image_version = "composer-latest-airflow-x.y.z"
    }
  }
  labels = {"env": "beta"}
}

Configuring email notifications

Configuring SendGrid email services

To receive notifications, configure your environment variables to send email through the SendGrid email service.

  1. If you haven't already, sign up with SendGrid via the Google Cloud Console and create an API key. As a Google Cloud developer, you can start with 12,000 free emails per month.

  2. In the Cloud Console, open the Create Environment page.


  3. Under Node configuration, click Add environment variable.

  4. Enter the following environment variables:

    Name | Value
    SENDGRID_MAIL_FROM | The From: email address, such as noreply-composer@<your-domain>.
    SENDGRID_API_KEY | Your SendGrid API key.
  5. To test SendGrid configuration:

    1. Create a test DAG that uses the EmailOperator.
    2. Upload the DAG to your environment and check that the EmailOperator task succeeds.
    3. Sign in to SendGrid with your SendGrid credentials.
    4. In the SendGrid UI, go to the Activity page.
    5. Search the list for the email. You should see that SendGrid processed and delivered the email.
    6. If the email is not processed and delivered:
      • Check your SendGrid configuration.
      • Verify that the SENDGRID_MAIL_FROM and SENDGRID_API_KEY environment variables are correct.
      • Check the spam filter in your email client.

Configuring third-party SMTP services

To send email through a third-party SMTP service, you must override the email_backend Airflow configuration.

  1. Open the Create Environment page.


  2. Under Airflow configuration overrides, click Add Airflow configuration override.
  3. Enter the following configuration properties:

    Section | Key | Value
    email | email_backend | airflow.utils.email.send_email_smtp
    smtp | smtp_host | The hostname for the SMTP server.
    smtp | smtp_user | The user name on the SMTP server.
    smtp | smtp_port | A port other than port 25. Port 25 is blocked.
    smtp | smtp_password | The default SMTP password for Airflow. You cannot configure a new password.
    smtp | smtp_mail_from | The From: email address, such as noreply-composer@<your-domain>.
    smtp | smtp_starttls | For enhanced security, set to True.
    smtp | smtp_ssl | For enhanced security, set to True.

For other SMTP configurations, see the default_airflow.cfg for your Airflow release.
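
Before entering the SMTP overrides, it can help to check them against the constraints in the table above. The following is a minimal sketch, not a Cloud Composer or Airflow API: the function name, the choice to treat the listed host, user, port, and From address keys as required, and all example values (including smtp.example.com) are assumptions made for illustration.

```python
# Keys from the table above that this sketch treats as required (an assumption).
REQUIRED_SMTP_KEYS = ("smtp_host", "smtp_user", "smtp_port", "smtp_mail_from")
SMTP_BACKEND = "airflow.utils.email.send_email_smtp"


def check_smtp_overrides(overrides):
    """Return a list of problems found in a {(section, key): value} mapping."""
    problems = []
    if overrides.get(("email", "email_backend")) != SMTP_BACKEND:
        problems.append("email.email_backend must be " + SMTP_BACKEND)
    for key in REQUIRED_SMTP_KEYS:
        if not overrides.get(("smtp", key)):
            problems.append(f"smtp.{key} is not set")
    if str(overrides.get(("smtp", "smtp_port"))) == "25":
        problems.append("smtp.smtp_port 25 is blocked; use another port")
    return problems


# Hypothetical values, used only for illustration.
example = {
    ("email", "email_backend"): SMTP_BACKEND,
    ("smtp", "smtp_host"): "smtp.example.com",
    ("smtp", "smtp_user"): "composer-mailer",
    ("smtp", "smtp_port"): "587",
    ("smtp", "smtp_mail_from"): "noreply-composer@example.com",
    ("smtp", "smtp_starttls"): "True",
}
print(check_smtp_overrides(example))  # prints: []
```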

Overriding Airflow configurations

When you create or update an environment, you can override Apache Airflow configuration properties. Some properties are blocked.

Console

  1. Open the Create Environment page.


  2. Under Airflow configuration overrides, click Add Airflow configuration override.

  3. Enter the Section, Key, and new Value for the configuration.

For example:

Section | Key | Value
webserver | dag_orientation | RL

gcloud

To override Airflow configurations when you create an environment:

gcloud composer environments create ENVIRONMENT_NAME \
    --location LOCATION \
    --airflow-configs=KEY=VALUE,KEY=VALUE,...

where:

  • ENVIRONMENT_NAME is the name of the environment.
  • LOCATION is the Compute Engine region where the environment is located.
  • KEY=VALUE is the configuration section and the property name separated by a hyphen, such as core-print_stats_interval, and its corresponding value.

For example:

gcloud composer environments create test-environment \
    --location us-central1 \
    --airflow-configs=core-load_examples=True,webserver-dag_orientation=TB

The command terminates when the operation is finished. To avoid waiting, use the --async flag. See the 'gcloud composer environments update' reference page for additional examples.
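
When overrides are kept in a structured form, the value for --airflow-configs can be generated rather than typed by hand. This is a minimal sketch: the function name is hypothetical, the property values are illustrative, and the code assumes that no section, property, or value contains a comma or an equals sign.

```python
def format_airflow_configs(overrides):
    """Render {section: {property: value}} as a --airflow-configs value."""
    pairs = []
    for section, properties in overrides.items():
        for name, value in properties.items():
            # Section and property are joined by a hyphen, then "=VALUE".
            pairs.append(f"{section}-{name}={value}")
    return ",".join(pairs)


overrides = {
    "core": {"dags_are_paused_at_creation": True},
    "webserver": {"dag_orientation": "TB"},
}
print("--airflow-configs=" + format_airflow_configs(overrides))
# prints: --airflow-configs=core-dags_are_paused_at_creation=True,webserver-dag_orientation=TB
```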

API

To override Airflow properties during the creation of the Cloud Composer environment with the Cloud Composer REST API, fill in the Environment resource's optional airflowConfigOverrides field when constructing the environments.create request.
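
As a rough illustration, the request body might be assembled as follows. This sketch only builds and prints JSON; it does not call the API. The function name and the project ID are hypothetical, and the placement of airflowConfigOverrides under config.softwareConfig reflects an understanding of the Environment resource that you should confirm against the API reference for the version you use.

```python
import json


def environment_body(project_id, location, name, airflow_overrides):
    """Build a request body with Airflow overrides for environments.create."""
    return {
        "name": f"projects/{project_id}/locations/{location}/environments/{name}",
        "config": {
            "softwareConfig": {
                # Keys use the SECTION-PROPERTY form, as with gcloud.
                "airflowConfigOverrides": dict(airflow_overrides),
            },
        },
    }


body = environment_body(
    "my-project",  # hypothetical project ID
    "us-central1",
    "test-environment",
    {"webserver-dag_orientation": "TB"},
)
print(json.dumps(body, indent=2))
```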

What's next