Scale environments

This page describes how to scale Cloud Composer environments in Cloud Composer 2.

Scale vertically and horizontally

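Options for horizontal scaling:

  • Adjust the minimum and maximum number of workers
  • Adjust the number of schedulers
  • Adjust the number of triggerers

Options for vertical scaling:

  • Adjust worker, scheduler, triggerer and web server scale and performance parameters
  • Adjust the environment size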

Adjust the minimum and maximum number of workers

You can set the minimum and maximum number of workers for your environment. Cloud Composer automatically scales your environment within the set limits. You can adjust these limits at any time.
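As a rough illustration of how the limits bound autoscaling, the following Python sketch clamps a hypothetical required worker count to the configured range. The function name and the `required` input are assumptions made for illustration only; the actual autoscaling logic that Cloud Composer uses is not described on this page.

```python
def bounded_worker_count(required: int, workers_min: int, workers_max: int) -> int:
    """Return the worker count after applying the configured limits.

    `required` is a hypothetical estimate of how many workers the load needs.
    """
    if workers_min > workers_max:
        raise ValueError("workers_min must not exceed workers_max")
    # Never drop below the minimum and never exceed the maximum.
    return max(workers_min, min(required, workers_max))


if __name__ == "__main__":
    # Limits taken from the example on this page: 2 to 6 workers.
    for required in (1, 4, 9):
        print(required, "->", bounded_worker_count(required, 2, 6))
```

With limits of 2 and 6, a load that needs one worker still runs two, and a load that needs nine is capped at six.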

Console

  1. Go to the Environments page in the Google Cloud console:

  2. Select your environment.

  3. Go to the Environment configuration tab.

  4. In the Resources > Workloads configuration item, click Edit.

  5. In the Workloads configuration dialog, in the Workers autoscaling section, adjust the limits for Airflow workers:

    • In the Minimum number of workers field, specify the number of Airflow workers that your environment must always run. The number of workers in your environment does not go below this number, even if a lower number of workers can handle the load.

    • In the Maximum number of workers field, specify the maximum number of Airflow workers that your environment can run. The number of workers in your environment does not go above this number, even if a higher number of workers is required to handle the load.

  6. Click Save.

gcloud

Run the following Google Cloud CLI command:

gcloud composer environments update ENVIRONMENT_NAME \
  --location LOCATION \
  --min-workers WORKERS_MIN \
  --max-workers WORKERS_MAX

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • WORKERS_MIN with the minimum number of Airflow workers that your environment can run. The number of workers in your environment does not go below this number, even if a lower number of workers can handle the load.
  • WORKERS_MAX with the maximum number of Airflow workers that your environment can run. The number of workers in your environment does not go above this number, even if a higher number of workers is required to handle the load.

Example:

gcloud composer environments update example-environment \
  --location us-central1 \
  --min-workers 2 \
  --max-workers 6

API

  1. Construct an environments.patch API request.

  2. In this request:

    1. In the updateMask parameter, specify the config.workloadsConfig.worker.minCount,config.workloadsConfig.worker.maxCount mask.

    2. In the request body, in the minCount and maxCount fields, specify the new worker limits.

"config": {
  "workloadsConfig": {
    "worker": {
      "minCount": WORKERS_MIN,
      "maxCount": WORKERS_MAX
    }
  }
}

Replace:

  • WORKERS_MIN with the minimum number of Airflow workers that your environment can run. The number of workers in your environment does not go below this number, even if a lower number of workers can handle the load.
  • WORKERS_MAX with the maximum number of Airflow workers that your environment can run. The number of workers in your environment does not go above this number, even if a higher number of workers is required to handle the load.

Example:

// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.workloadsConfig.worker.minCount,
// config.workloadsConfig.worker.maxCount

"config": {
  "workloadsConfig": {
    "worker": {
      "minCount": 2,
      "maxCount": 6
    }
  }
}
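The request above can also be assembled programmatically. The following Python sketch only builds the URL and JSON body for the patch call and does not send anything; the function name `worker_limits_patch` and the constant `API_ROOT` are hypothetical helpers, and authentication is left out entirely.

```python
import json

# Endpoint root as shown in the example request on this page.
API_ROOT = "https://composer.googleapis.com/v1"


def worker_limits_patch(project, location, environment, workers_min, workers_max):
    """Build the URL and JSON body for a worker-limits patch (nothing is sent)."""
    if workers_min > workers_max:
        raise ValueError("workers_min must not exceed workers_max")
    # One mask path per field that the request changes.
    update_mask = ",".join([
        "config.workloadsConfig.worker.minCount",
        "config.workloadsConfig.worker.maxCount",
    ])
    url = (
        f"{API_ROOT}/projects/{project}/locations/{location}"
        f"/environments/{environment}?updateMask={update_mask}"
    )
    body = {
        "config": {
            "workloadsConfig": {
                "worker": {"minCount": workers_min, "maxCount": workers_max}
            }
        }
    }
    return url, json.dumps(body, indent=2)


if __name__ == "__main__":
    url, body = worker_limits_patch(
        "example-project", "us-central1", "example-environment", 2, 6
    )
    print("PATCH", url)
    print(body)
```

Keeping the mask and the body in one helper makes it harder for the two to drift apart, which is a common source of patch requests that silently change nothing.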

Terraform

The min_count and max_count fields in the workloads_config.worker block specify the minimum and maximum number of workers in your environment:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    workloads_config {
      worker {
        min_count = WORKERS_MIN
        max_count = WORKERS_MAX
      }
    }

  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • WORKERS_MIN with the minimum number of Airflow workers that your environment can run. The number of workers in your environment does not go below this number, even if a lower number of workers can handle the load.
  • WORKERS_MAX with the maximum number of Airflow workers that your environment can run. The number of workers in your environment does not go above this number, even if a higher number of workers is required to handle the load.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    workloads_config {
      worker {
        min_count = 2
        max_count = 6
      }
    }

  }
}

Adjust the number of schedulers

Your environment can run more than one Airflow scheduler at the same time. Use multiple schedulers to distribute load between several scheduler instances for better performance and reliability.

You can have up to 10 schedulers in your environment.

Increasing the number of schedulers does not always improve Airflow performance. For example, having only one scheduler might provide better performance than having two. This might happen when the extra scheduler is not utilized, and thus consumes resources of your environment without contributing to overall performance. The actual scheduler performance depends on the number of Airflow workers, the number of DAGs and tasks that run in your environment, and the configuration of both Airflow and the environment.

We recommend starting with two schedulers and then monitoring the performance of your environment. If you change the number of schedulers, you can always scale your environment back to the original number of schedulers.

For more information about configuring multiple schedulers, see the Airflow documentation.

To change the number of schedulers for your environment:

Console

  1. Go to the Environments page in the Google Cloud console:

  2. Select your environment.

  3. Go to the Environment configuration tab.

  4. In the Resources > Workloads configuration item, click Edit.

  5. In the Workloads configuration dialog, in the Number of schedulers drop-down list, set the number of schedulers for your environment.

  6. Click Save.

gcloud

Run the following Google Cloud CLI command:

gcloud composer environments update ENVIRONMENT_NAME \
  --location LOCATION \
  --scheduler-count SCHEDULER_COUNT

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • SCHEDULER_COUNT with the number of schedulers.

Example:

gcloud composer environments update example-environment \
  --location us-central1 \
  --scheduler-count 2

API

  1. Create an environments.patch API request.

  2. In this request:

    1. In the updateMask parameter, specify the config.workloadsConfig.scheduler mask.

    2. In the request body, in the count field, specify the number of schedulers.

"config": {
  "workloadsConfig": {
    "scheduler": {
      "count": SCHEDULER_COUNT
    }
  }
}

Replace:

  • SCHEDULER_COUNT with the number of schedulers.

Example:

// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.workloadsConfig.scheduler

"config": {
  "workloadsConfig": {
    "scheduler": {
      "count": 2
    }
  }
}

Terraform

The count field in the workloads_config.scheduler block specifies the number of schedulers in your environment:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    workloads_config {
      scheduler {
        count = SCHEDULER_COUNT
      }
    }

  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • SCHEDULER_COUNT with the number of schedulers.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    workloads_config {

      scheduler {
        count = 2
      }

    }
  }
}

Adjust the number of triggerers

By default, the Airflow triggerer is disabled in your environment, and the number of triggerers is set to 0. After you set the number of triggerers to 1, the triggerer is enabled and you can use deferrable operators in your DAGs.

Even if the triggerer is disabled, your environment's cluster still runs a workload for it, with zero pods. If the triggerer is enabled, it is billed like other environment components, with Cloud Composer Compute SKUs.

Console

  1. In Google Cloud console, go to the Environments page.

  2. In the list of environments, click the name of your environment. The Environment details page opens.

  3. Go to the Environment configuration tab.

  4. In the Resources > Workloads item, click Edit. The Workloads configuration pane opens.

  5. Select Enable triggerer. As an option, you can also adjust the CPU and memory for the triggerer.

  6. Click Save and wait until your environment is updated.

gcloud

Run the following Google Cloud CLI command:

gcloud beta composer environments update ENVIRONMENT_NAME \
  --location LOCATION \
  --triggerer-count TRIGGERER_COUNT

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • TRIGGERER_COUNT with the number of triggerers.

Example:

gcloud beta composer environments update example-environment \
  --location us-central1 \
  --triggerer-count 1

API

  1. Create an environments.patch API request.

  2. In this request:

    1. In the updateMask parameter, specify the config.workloadsConfig.triggerer mask.

    2. Your environment can have only one triggerer. In the request body, specify the number of triggerers in the following way:

      • To enable the Airflow triggerer, set the count value to 1.
      • To disable the Airflow triggerer, set the count value to 0.

  "config": {
    "workloadsConfig": {
      "triggerer": {
        "count": 1
      }
    }
  }

The following example enables the triggerer and sets default CPU and memory parameters for it. If you want to use custom parameters, specify them in the same API call.

// PATCH https://composer.googleapis.com/v1beta1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.workloadsConfig.triggerer

"config": {
  "workloadsConfig": {
    "triggerer": {
      "count": 1
    }
  }
}

Terraform

The count field in the workloads_config.triggerer block specifies the number of triggerers in your environment:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    workloads_config {
      triggerer {
        count = TRIGGERER_COUNT
      }
    }

  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • TRIGGERER_COUNT with the number of triggerers.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    workloads_config {

      triggerer {
        count = 1
      }

    }
  }
}

Adjust worker, scheduler, triggerer and web server scale and performance parameters

You can specify the number of CPUs and the amount of memory and disk space used by your environment. This lets you increase the performance of your environment, in addition to the horizontal scaling provided by multiple workers and schedulers.

Console

  1. In Google Cloud console, go to the Environments page.

  2. In the list of environments, click the name of your environment. The Environment details page opens.

  3. Go to the Environment configuration tab.

  4. In the Resources > Workloads item, click Edit. The Workloads configuration pane opens.

  5. In the Number of schedulers and Number of triggerers drop-down lists, select the number of schedulers and triggerers in your environment.

  6. In the Workloads configuration pane, in the CPU, Memory, and Storage fields, specify the number of CPUs, memory, and storage for Airflow schedulers, triggerers, web server, and workers.

  7. Click Save.

gcloud

The following arguments control the CPU, memory, and disk space parameters of Airflow schedulers, triggerers, web server, and workers. Each scheduler, triggerer, and worker uses the specified amount of resources.

  • --scheduler-cpu specifies the number of CPUs for an Airflow scheduler.
  • --scheduler-memory specifies the amount of memory for an Airflow scheduler.
  • --scheduler-storage specifies the amount of disk space for an Airflow scheduler.
  • --triggerer-cpu specifies the number of CPUs for an Airflow triggerer.
  • --triggerer-memory specifies the amount of memory for an Airflow triggerer.
  • --web-server-cpu specifies the number of CPUs for the Airflow web server.
  • --web-server-memory specifies the amount of memory for the Airflow web server.
  • --web-server-storage specifies the amount of disk space for the Airflow web server.
  • --worker-cpu specifies the number of CPUs for an Airflow worker.
  • --worker-memory specifies the amount of memory for an Airflow worker.
  • --worker-storage specifies the amount of disk space for an Airflow worker.

gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --scheduler-cpu SCHEDULER_CPU \
    --scheduler-memory SCHEDULER_MEMORY \
    --scheduler-storage SCHEDULER_STORAGE \
    --triggerer-cpu TRIGGERER_CPU \
    --triggerer-memory TRIGGERER_MEMORY \
    --web-server-cpu WEB_SERVER_CPU \
    --web-server-memory WEB_SERVER_MEMORY \
    --web-server-storage WEB_SERVER_STORAGE \
    --worker-cpu WORKER_CPU \
    --worker-memory WORKER_MEMORY \
    --worker-storage WORKER_STORAGE

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • SCHEDULER_CPU with the number of CPUs for a scheduler, in vCPU units.
  • SCHEDULER_MEMORY with the amount of memory for a scheduler.
  • SCHEDULER_STORAGE with the disk size for a scheduler.
  • TRIGGERER_CPU with the number of CPUs for a triggerer, in vCPU units.
  • TRIGGERER_MEMORY with the amount of memory for a triggerer.
  • WEB_SERVER_CPU with the number of CPUs for the web server, in vCPU units.
  • WEB_SERVER_MEMORY with the amount of memory for the web server.
  • WEB_SERVER_STORAGE with the disk size for the web server.
  • WORKER_CPU with the number of CPUs for a worker, in vCPU units.
  • WORKER_MEMORY with the amount of memory for a worker.
  • WORKER_STORAGE with the disk size for a worker.

Example:

gcloud composer environments update example-environment \
    --location us-central1 \
    --scheduler-cpu 0.5 \
    --scheduler-memory 2.5GB \
    --scheduler-storage 2GB \
    --triggerer-cpu 1 \
    --triggerer-memory 1GB \
    --web-server-cpu 1 \
    --web-server-memory 2.5GB \
    --web-server-storage 2GB \
    --worker-cpu 1 \
    --worker-memory 2GB \
    --worker-storage 2GB
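Because each instance of a component uses the specified amount of resources, the combined footprint grows with the instance count. The following Python sketch multiplies per-instance values by a count and sums them. The `Workload` class and `total_resources` function are hypothetical names, and the counts (two schedulers, six workers at the maximum) are borrowed from the earlier examples on this page. The result is plain arithmetic and says nothing about pricing or about other environment components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Per-instance resources for one Airflow component (illustrative only)."""
    count: int
    cpu: float          # vCPUs per instance
    memory_gb: float    # memory per instance, in GB
    storage_gb: float   # disk space per instance, in GB


def total_resources(workloads):
    """Sum count * per-instance resources across all given workloads."""
    return {
        "cpu": sum(w.count * w.cpu for w in workloads),
        "memory_gb": sum(w.count * w.memory_gb for w in workloads),
        "storage_gb": sum(w.count * w.storage_gb for w in workloads),
    }


if __name__ == "__main__":
    # Per-instance values from the gcloud example above.
    schedulers = Workload(count=2, cpu=0.5, memory_gb=2.5, storage_gb=2)
    workers_at_max = Workload(count=6, cpu=1, memory_gb=2, storage_gb=2)
    print(total_resources([schedulers, workers_at_max]))
```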

API

  1. Create an environments.patch API request.

  2. In this request:

    1. In the updateMask parameter, specify the fields that you want to update. For example, to update all parameters for schedulers, specify the config.workloadsConfig.scheduler.cpu,config.workloadsConfig.scheduler.memoryGb,config.workloadsConfig.scheduler.storageGb mask.

       When you update triggerer parameters, specify the config.workloadsConfig.triggerer mask. It is not possible to specify masks for individual parameters of the triggerer.

    2. In the request body, specify the scale and performance parameters.

  "config": {
    "workloadsConfig": {
      "scheduler": {
        "cpu": SCHEDULER_CPU,
        "memoryGb": SCHEDULER_MEMORY,
        "storageGb": SCHEDULER_STORAGE
      },
      "triggerer": {
        "count": 1,
        "cpu": TRIGGERER_CPU,
        "memoryGb": TRIGGERER_MEMORY
      },
      "webServer": {
        "cpu": WEB_SERVER_CPU,
        "memoryGb": WEB_SERVER_MEMORY,
        "storageGb": WEB_SERVER_STORAGE
      },
      "worker": {
        "cpu": WORKER_CPU,
        "memoryGb": WORKER_MEMORY,
        "storageGb": WORKER_STORAGE
      }
    }
  }

Replace:

  • SCHEDULER_CPU with the number of CPUs for a scheduler, in vCPU units.
  • SCHEDULER_MEMORY with the amount of memory for a scheduler, in GB.
  • SCHEDULER_STORAGE with the disk size for a scheduler, in GB.
  • TRIGGERER_CPU with the number of CPUs for a triggerer, in vCPU units.
  • TRIGGERER_MEMORY with the amount of memory for a triggerer, in GB.
  • WEB_SERVER_CPU with the number of CPUs for the web server, in vCPU units.
  • WEB_SERVER_MEMORY with the amount of memory for the web server, in GB.
  • WEB_SERVER_STORAGE with the disk size for the web server, in GB.
  • WORKER_CPU with the number of CPUs for a worker, in vCPU units.
  • WORKER_MEMORY with the amount of memory for a worker, in GB.
  • WORKER_STORAGE with the disk size for a worker, in GB.

Example:

// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.workloadsConfig.scheduler.cpu,
// config.workloadsConfig.scheduler.memoryGb,
// config.workloadsConfig.scheduler.storageGb,
// config.workloadsConfig.triggerer,
// config.workloadsConfig.webServer.cpu,
// config.workloadsConfig.webServer.memoryGb,
// config.workloadsConfig.webServer.storageGb,
// config.workloadsConfig.worker.cpu,
// config.workloadsConfig.worker.memoryGb,
// config.workloadsConfig.worker.storageGb

"config": {
  "workloadsConfig": {
    "scheduler": {
      "cpu": 0.5,
      "memoryGb": 2.5,
      "storageGb": 2
    },
    "triggerer": {
      "count": 1,
      "cpu": 1,
      "memoryGb": 1
    },
    "webServer": {
      "cpu": 0.5,
      "memoryGb": 2.5,
      "storageGb": 2
    },
    "worker": {
      "cpu": 1,
      "memoryGb": 2,
      "storageGb": 2
    }
  }
}
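Writing a long mask by hand is error-prone, so it can help to derive it from the request body. The following Python sketch does that for the workloadsConfig part of the body, emitting one path per field, except for the triggerer, which gets a single component-level path because its parameters cannot be masked individually. The function name `workloads_update_mask` is a hypothetical helper, not part of any Google library.

```python
def workloads_update_mask(workloads_config: dict) -> str:
    """Derive an updateMask string from a workloadsConfig dictionary."""
    paths = []
    for component, params in workloads_config.items():
        prefix = f"config.workloadsConfig.{component}"
        if component == "triggerer":
            # Triggerer parameters are masked as a whole, not field by field.
            paths.append(prefix)
        else:
            # One mask entry per field present in the body.
            paths.extend(f"{prefix}.{field}" for field in params)
    return ",".join(paths)


if __name__ == "__main__":
    body = {
        "scheduler": {"cpu": 0.5, "memoryGb": 2.5, "storageGb": 2},
        "triggerer": {"count": 1, "cpu": 1, "memoryGb": 1},
    }
    print(workloads_update_mask(body))
```

Deriving the mask from the body also guarantees that the field names in both places use the same capitalization.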

Terraform

The following blocks in the workloads_config block control the CPU, memory, and disk space parameters of Airflow schedulers, triggerers, web server, and workers. Each scheduler, triggerer, and worker uses the specified amount of resources.

  • The scheduler.cpu field specifies the number of CPUs for an Airflow scheduler.
  • The scheduler.memory_gb field specifies the amount of memory for an Airflow scheduler.
  • The scheduler.storage_gb field specifies the amount of disk space for a scheduler.
  • The triggerer.cpu field specifies the number of CPUs for an Airflow triggerer.
  • The triggerer.memory_gb field specifies the amount of memory for an Airflow triggerer.
  • The web_server.cpu field specifies the number of CPUs for the Airflow web server.
  • The web_server.memory_gb field specifies the amount of memory for the Airflow web server.
  • The web_server.storage_gb field specifies the amount of disk space for the Airflow web server.
  • The worker.cpu field specifies the number of CPUs for an Airflow worker.
  • The worker.memory_gb field specifies the amount of memory for an Airflow worker.
  • The worker.storage_gb field specifies the amount of disk space for an Airflow worker.

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    workloads_config {

      scheduler {
        cpu = SCHEDULER_CPU
        memory_gb = SCHEDULER_MEMORY
        storage_gb = SCHEDULER_STORAGE
      }
      triggerer {
        cpu = TRIGGERER_CPU
        memory_gb = TRIGGERER_MEMORY
        count = 1
      }
      web_server {
        cpu = WEB_SERVER_CPU
        memory_gb = WEB_SERVER_MEMORY
        storage_gb = WEB_SERVER_STORAGE
      }
      worker {
        cpu = WORKER_CPU
        memory_gb = WORKER_MEMORY
        storage_gb = WORKER_STORAGE
      }
    }

  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • SCHEDULER_CPU with the number of CPUs for a scheduler, in vCPU units.
  • SCHEDULER_MEMORY with the amount of memory for a scheduler, in GB.
  • SCHEDULER_STORAGE with the disk size for a scheduler, in GB.
  • TRIGGERER_CPU with the number of CPUs for a triggerer, in vCPU units.
  • TRIGGERER_MEMORY with the amount of memory for a triggerer, in GB.
  • WEB_SERVER_CPU with the number of CPUs for the web server, in vCPU units.
  • WEB_SERVER_MEMORY with the amount of memory for the web server, in GB.
  • WEB_SERVER_STORAGE with the disk size for the web server, in GB.
  • WORKER_CPU with the number of CPUs for a worker, in vCPU units.
  • WORKER_MEMORY with the amount of memory for a worker, in GB.
  • WORKER_STORAGE with the disk size for a worker, in GB.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    workloads_config {

      scheduler {
        cpu = 0.5
        memory_gb = 1.875
        storage_gb = 1
      }
      triggerer {
        cpu = 0.5
        memory_gb = 0.5
        count = 1
      }
      web_server {
        cpu = 0.5
        memory_gb = 1.875
        storage_gb = 1
      }
      worker {
        cpu = 0.5
        memory_gb = 1.875
        storage_gb = 1
      }
    }

  }
}

Adjust the environment size

The Environment size controls the performance parameters of the managed Cloud Composer infrastructure that includes the Airflow database. Consider selecting a larger environment size if you want to run a large number of DAGs and tasks.

Console

  1. Go to the Environments page in the Google Cloud console:

  2. Select your environment.

  3. Go to the Environment configuration tab.

  4. In the Resources > Core infrastructure item, click Edit.

  5. In the Core infrastructure dialog, in the Environment size field, specify the environment size.

  6. Click Save.

gcloud

The --environment-size argument controls the environment size:

gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --environment-size ENVIRONMENT_SIZE

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • ENVIRONMENT_SIZE with small, medium, or large.

Example:

gcloud composer environments update example-environment \
    --location us-central1 \
    --environment-size medium

API

  1. Create an environments.patch API request.

  2. In this request:

    1. In the updateMask parameter, specify the config.environmentSize mask.

    2. In the request body, specify the environment size.

  "config": {
    "environmentSize": "ENVIRONMENT_SIZE"
  }

Replace:

  • ENVIRONMENT_SIZE with the environment size, ENVIRONMENT_SIZE_SMALL, ENVIRONMENT_SIZE_MEDIUM, or ENVIRONMENT_SIZE_LARGE.

Example:

// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.environmentSize

"config": {
  "environmentSize": "ENVIRONMENT_SIZE_MEDIUM"
}
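The gcloud argument takes small, medium, or large, while the API and Terraform take the longer ENVIRONMENT_SIZE_* values. If a script needs to accept the short form and produce the long one, a lookup along the following lines is one option. The function name `api_environment_size` is hypothetical; the sketch assumes only the three sizes listed on this page.

```python
# Short names accepted by gcloud, mapped to the values used by the API.
_API_SIZES = {
    "small": "ENVIRONMENT_SIZE_SMALL",
    "medium": "ENVIRONMENT_SIZE_MEDIUM",
    "large": "ENVIRONMENT_SIZE_LARGE",
}


def api_environment_size(cli_size: str) -> str:
    """Translate a short size name into the corresponding API value."""
    key = cli_size.strip().lower()
    if key not in _API_SIZES:
        raise ValueError(
            f"unknown environment size {cli_size!r}; "
            f"expected one of {sorted(_API_SIZES)}"
        )
    return _API_SIZES[key]


if __name__ == "__main__":
    print(api_environment_size("medium"))
```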

Terraform

The environment_size field in the config block controls the environment size:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    environment_size = "ENVIRONMENT_SIZE"

  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • ENVIRONMENT_SIZE with the environment size, ENVIRONMENT_SIZE_SMALL, ENVIRONMENT_SIZE_MEDIUM, or ENVIRONMENT_SIZE_LARGE.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    environment_size = "ENVIRONMENT_SIZE_SMALL"

  }
}
