Scale environments

This page describes how to scale Cloud Composer environments in Cloud Composer 2.

For information about how environment scaling works, see Environment scaling.

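Options for horizontal scaling:

  • Adjust the minimum and maximum number of workers
  • Adjust the number of schedulers

Options for vertical scaling:

  • Adjust worker, scheduler, and web server scale and performance parameters
  • Adjust the environment size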

Adjust the minimum and maximum number of workers

You can set the minimum and maximum number of workers for your environment. Cloud Composer automatically scales your environment within the set limits. You can adjust these limits at any time.
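
The effect of the two limits can be pictured as a clamp. The following is a minimal sketch, assuming a hypothetical `desired` value that stands for the number of workers the current load calls for. It illustrates only how the limits bound the result, not how Cloud Composer computes the desired number.

```python
def clamp_worker_count(desired: int, min_workers: int, max_workers: int) -> int:
    """Return the worker count after applying the configured limits."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    # Never go below the minimum or above the maximum.
    return max(min_workers, min(desired, max_workers))


# Limits of 2 and 6, the values used in the examples on this page.
print(clamp_worker_count(1, 2, 6))  # 2: the minimum applies even under low load
print(clamp_worker_count(4, 2, 6))  # 4: within the limits, unchanged
print(clamp_worker_count(9, 2, 6))  # 6: the maximum applies even under high load
```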

Console

  1. Go to the Environments page in the Google Cloud Console.

  2. Select your environment.

  3. Go to the Environment configuration tab.

  4. In the Resources > Workloads configuration item, click Edit.

  5. In the Workloads configuration dialog, in the Workers autoscaling section, adjust the limits for Airflow workers:

    • In the Minimum number of workers field, specify the number of Airflow workers that your environment must always run. The number of workers in your environment does not go below this number, even if a lower number of workers can handle the load.

    • In the Maximum number of workers field, specify the maximum number of Airflow workers that your environment can run. The number of workers in your environment does not go above this number, even if a higher number of workers is required to handle the load.

  6. Click Save.

gcloud

Run the following gcloud beta composer command:

gcloud beta composer environments update ENVIRONMENT_NAME \
  --location LOCATION \
  --min-workers WORKERS_MIN \
  --max-workers WORKERS_MAX

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the Compute Engine region where the environment is located.
  • WORKERS_MIN with the minimum number of Airflow workers that your environment must always run. The number of workers in your environment does not go below this number, even if a lower number of workers can handle the load.
  • WORKERS_MAX with the maximum number of Airflow workers that your environment can run. The number of workers in your environment does not go above this number, even if a higher number of workers is required to handle the load.

Example:

gcloud beta composer environments update example-environment \
  --location us-central1 \
  --min-workers 2 \
  --max-workers 6

API

  1. Construct an environments.patch beta API request.

  2. In this request:

    1. In the updateMask parameter, specify the config.workloadsConfig.worker.minCount,config.workloadsConfig.worker.maxCount mask.

    2. In the request body, in the minCount and maxCount fields, specify the new worker limits.

"config": {
  "workloadsConfig": {
    "worker": {
      "minCount": WORKERS_MIN,
      "maxCount": WORKERS_MAX
    }
  }
}

Replace:

  • WORKERS_MIN with the minimum number of Airflow workers that your environment must always run. The number of workers in your environment does not go below this number, even if a lower number of workers can handle the load.
  • WORKERS_MAX with the maximum number of Airflow workers that your environment can run. The number of workers in your environment does not go above this number, even if a higher number of workers is required to handle the load.

Example:

// PATCH https://composer.googleapis.com/v1beta1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.workloadsConfig.worker.minCount,
// config.workloadsConfig.worker.maxCount

"config": {
  "workloadsConfig": {
    "worker": {
      "minCount": 2,
      "maxCount": 6
    }
  }
}
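
The request above can also be assembled programmatically. The following is a minimal sketch that only builds the URL and the JSON body from the example and does not send anything; the function name `build_worker_limits_patch` is hypothetical, and authentication is out of scope here.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://composer.googleapis.com/v1beta1"


def build_worker_limits_patch(project, location, environment, min_count, max_count):
    """Return (url, body) for an environments.patch request that sets worker limits."""
    name = f"projects/{project}/locations/{location}/environments/{environment}"
    mask = ",".join([
        "config.workloadsConfig.worker.minCount",
        "config.workloadsConfig.worker.maxCount",
    ])
    # safe="," keeps the commas in the mask readable instead of percent-encoding them.
    url = f"{API_ROOT}/{name}?{urlencode({'updateMask': mask}, safe=',')}"
    body = {
        "config": {
            "workloadsConfig": {
                "worker": {"minCount": min_count, "maxCount": max_count}
            }
        }
    }
    return url, json.dumps(body, indent=2)


url, body = build_worker_limits_patch(
    "example-project", "us-central1", "example-environment", 2, 6
)
print("PATCH", url)
print(body)
```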

Terraform

The min_count and max_count fields in the workloads_config.worker block specify the minimum and maximum number of workers in your environment:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    workloads_config {
      worker {
        min_count = WORKERS_MIN
        max_count = WORKERS_MAX
      }
    }

  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the Compute Engine region where the environment is located.
  • WORKERS_MIN with the minimum number of Airflow workers that your environment must always run. The number of workers in your environment does not go below this number, even if a lower number of workers can handle the load.
  • WORKERS_MAX with the maximum number of Airflow workers that your environment can run. The number of workers in your environment does not go above this number, even if a higher number of workers is required to handle the load.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    workloads_config {
      worker {
        min_count = 2
        max_count = 6
      }
    }

  }
}

Adjust the number of schedulers

Your environment can run more than one Airflow scheduler at the same time. Use multiple schedulers to distribute load between several scheduler instances for better performance and reliability. You can specify a number of schedulers up to the number of nodes in your environment.

Increasing the number of schedulers does not always improve Airflow performance. For example, having only one scheduler might provide better performance than having two. This might happen when the extra scheduler is not utilized, and thus consumes resources of your environment without contributing to overall performance. The actual scheduler performance depends on the number of Airflow workers, the number of DAGs and tasks that run in your environment, and the configuration of both Airflow and the environment.

We recommend starting with two schedulers and then monitoring the performance of your environment. If you change the number of schedulers, you can always scale your environment back to the original number of schedulers.

For more information about configuring multiple schedulers, see the Airflow documentation.

To change the number of schedulers for your environment:

Console

  1. Go to the Environments page in the Google Cloud Console.

  2. Select your environment.

  3. Go to the Environment configuration tab.

  4. In the Resources > Workloads configuration item, click Edit.

  5. In the Workloads configuration dialog, in the Number of schedulers field, set the number of schedulers for your environment.

  6. Click Save.

gcloud

Run the following gcloud composer command:

gcloud composer environments update ENVIRONMENT_NAME \
  --location LOCATION \
  --scheduler-count SCHEDULER_COUNT

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the Compute Engine region where the environment is located.
  • SCHEDULER_COUNT with the number of schedulers.

Example:

gcloud composer environments update example-environment \
  --location us-central1 \
  --scheduler-count 2

API

  1. Create an environments.patch API request.

  2. In this request:

    1. In the updateMask parameter, specify the config.workloadsConfig.scheduler mask.

    2. In the request body, in the count field, specify the number of schedulers.

"config": {
  "workloadsConfig": {
    "scheduler": {
      "count": SCHEDULER_COUNT
    }
  }
}

Replace:

  • SCHEDULER_COUNT with the number of schedulers.

Example:

// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.workloadsConfig.scheduler

"config": {
  "workloadsConfig": {
    "scheduler": {
      "count": 2
    }
  }
}

Terraform

The count field in the workloads_config.scheduler block specifies the number of schedulers in your environment:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    workloads_config {
      scheduler {
        count = SCHEDULER_COUNT
      }
    }

  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the Compute Engine region where the environment is located.
  • SCHEDULER_COUNT with the number of schedulers.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    workloads_config {

      scheduler {
        count = 2
      }

    }
  }
}

Adjust worker, scheduler, and web server scale and performance parameters

You can specify the number of CPUs and the amount of memory and disk space used by the Airflow schedulers, web server, and workers in your environment. In this way, you can increase the performance of your environment through vertical scaling, in addition to the horizontal scaling provided by using multiple workers and schedulers.

Console

  1. Go to the Environments page in the Google Cloud Console.

  2. Select your environment.

  3. Go to the Environment configuration tab.

  4. In the Resources > Workloads configuration item, click Edit.

  5. In the Workloads configuration dialog, in the CPU, Memory, and Storage fields, specify the number of CPUs, memory, and storage for Airflow schedulers, web server, and workers.

  6. Click Save.

gcloud

The following arguments control the CPU, memory, and disk space parameters of Airflow schedulers, web server, and workers. Each scheduler and worker uses the specified amount of resources.

  • --scheduler-cpu specifies the number of CPUs for an Airflow scheduler.
  • --scheduler-memory specifies the amount of memory for an Airflow scheduler.
  • --scheduler-storage specifies the amount of disk space for an Airflow scheduler.
  • --web-server-cpu specifies the number of CPUs for the Airflow web server.
  • --web-server-memory specifies the amount of memory for the Airflow web server.
  • --web-server-storage specifies the amount of disk space for the Airflow web server.
  • --worker-cpu specifies the number of CPUs for an Airflow worker.
  • --worker-memory specifies the amount of memory for an Airflow worker.
  • --worker-storage specifies the amount of disk space for an Airflow worker.

gcloud beta composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --scheduler-cpu SCHEDULER_CPU \
    --scheduler-memory SCHEDULER_MEMORY \
    --scheduler-storage SCHEDULER_STORAGE \
    --web-server-cpu WEB_SERVER_CPU \
    --web-server-memory WEB_SERVER_MEMORY \
    --web-server-storage WEB_SERVER_STORAGE \
    --worker-cpu WORKER_CPU \
    --worker-memory WORKER_MEMORY \
    --worker-storage WORKER_STORAGE

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the Compute Engine region where the environment is located.
  • SCHEDULER_CPU with the number of CPUs for a scheduler, in vCPU units.
  • SCHEDULER_MEMORY with the amount of memory for a scheduler.
  • SCHEDULER_STORAGE with the disk size for a scheduler.
  • WEB_SERVER_CPU with the number of CPUs for the web server, in vCPU units.
  • WEB_SERVER_MEMORY with the amount of memory for the web server.
  • WEB_SERVER_STORAGE with the disk size for the web server.
  • WORKER_CPU with the number of CPUs for a worker, in vCPU units.
  • WORKER_MEMORY with the amount of memory for a worker.
  • WORKER_STORAGE with the disk size for a worker.

Example:

gcloud beta composer environments update example-environment \
    --location us-central1 \
    --scheduler-cpu 0.5 \
    --scheduler-memory 2.5 \
    --scheduler-storage 2 \
    --web-server-cpu 1 \
    --web-server-memory 2.5 \
    --web-server-storage 2 \
    --worker-cpu 1 \
    --worker-memory 2 \
    --worker-storage 2
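
When only some of these parameters change, the command line can be generated instead of typed by hand. The following is a minimal sketch, assuming that the command accepts any subset of the flags listed above. The helper `build_update_command` is hypothetical; it only assembles the argument list and does not run gcloud.

```python
import shlex

# Component and resource names that combine into the flags listed on this page,
# for example ("web-server", "memory") -> --web-server-memory.
COMPONENTS = ("scheduler", "web-server", "worker")
RESOURCES = ("cpu", "memory", "storage")


def build_update_command(environment, location, resources):
    """Assemble the gcloud argument list.

    `resources` maps (component, resource) pairs to values.
    """
    known = {(c, r) for c in COMPONENTS for r in RESOURCES}
    unknown = set(resources) - known
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    args = ["gcloud", "beta", "composer", "environments", "update", environment,
            "--location", location]
    # Emit flags in a fixed order so that the generated command is reproducible.
    for component in COMPONENTS:
        for resource in RESOURCES:
            value = resources.get((component, resource))
            if value is not None:
                args += [f"--{component}-{resource}", str(value)]
    return args


command = build_update_command(
    "example-environment", "us-central1",
    {("worker", "cpu"): 1, ("worker", "memory"): 2},
)
print(shlex.join(command))
```

Passing the resulting list to a process runner, rather than a joined string, avoids shell quoting issues.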

API

  1. Create an environments.patch beta API request.

  2. In this request:

    1. In the updateMask parameter, specify the fields that you want to update. For example, to update all parameters for schedulers, specify the config.workloadsConfig.scheduler.cpu,config.workloadsConfig.scheduler.memoryGb,config.workloadsConfig.scheduler.storageGb mask.

    2. In the request body, specify the scale and performance parameters.

  "config": {
    "workloadsConfig": {
      "scheduler": {
        "cpu": SCHEDULER_CPU,
        "memoryGb": SCHEDULER_MEMORY,
        "storageGb": SCHEDULER_STORAGE
      },
      "webServer": {
        "cpu": WEB_SERVER_CPU,
        "memoryGb": WEB_SERVER_MEMORY,
        "storageGb": WEB_SERVER_STORAGE
      },
      "worker": {
        "cpu": WORKER_CPU,
        "memoryGb": WORKER_MEMORY,
        "storageGb": WORKER_STORAGE
      }
    }
  }

Replace:

  • SCHEDULER_CPU with the number of CPUs for a scheduler, in vCPU units.
  • SCHEDULER_MEMORY with the amount of memory for a scheduler, in GB.
  • SCHEDULER_STORAGE with the disk size for a scheduler, in GB.
  • WEB_SERVER_CPU with the number of CPUs for the web server, in vCPU units.
  • WEB_SERVER_MEMORY with the amount of memory for the web server, in GB.
  • WEB_SERVER_STORAGE with the disk size for the web server, in GB.
  • WORKER_CPU with the number of CPUs for a worker, in vCPU units.
  • WORKER_MEMORY with the amount of memory for a worker, in GB.
  • WORKER_STORAGE with the disk size for a worker, in GB.

Example:

// PATCH https://composer.googleapis.com/v1beta1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.workloadsConfig.scheduler.cpu,
// config.workloadsConfig.scheduler.memoryGb,
// config.workloadsConfig.scheduler.storageGb,
// config.workloadsConfig.webServer.cpu,
// config.workloadsConfig.webServer.memoryGb,
// config.workloadsConfig.webServer.storageGb,
// config.workloadsConfig.worker.cpu,
// config.workloadsConfig.worker.memoryGb,
// config.workloadsConfig.worker.storageGb

"config": {
  "workloadsConfig": {
    "scheduler": {
      "cpu": 0.5,
      "memoryGb": 2.5,
      "storageGb": 2
    },
    "webServer": {
      "cpu": 0.5,
      "memoryGb": 2.5,
      "storageGb": 2
    },
    "worker": {
      "cpu": 1,
      "memoryGb": 2,
      "storageGb": 2
    }
  }
}
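
The updateMask and the request body name the same fields twice, so the two can drift apart when one of them is edited. The following is a minimal sketch that derives the mask from the body instead; the helper `field_mask` is hypothetical and assumes that every leaf value in the body is a field to update.

```python
def field_mask(body, prefix=""):
    """Return the dotted path of every leaf value in a nested dict."""
    paths = []
    for key, value in body.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects such as "scheduler" or "worker".
            paths.extend(field_mask(value, path))
        else:
            paths.append(path)
    return paths


body = {
    "config": {
        "workloadsConfig": {
            "scheduler": {"cpu": 0.5, "memoryGb": 2.5, "storageGb": 2}
        }
    }
}
print(",".join(field_mask(body)))
```

For this body, the result is the three scheduler paths joined with commas, in the same order as the fields appear in the body.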

Terraform

The following blocks in the workloads_config block control the CPU, memory, and disk space parameters of Airflow schedulers, web server, and workers. Each scheduler and worker uses the specified amount of resources.

  • The scheduler.cpu field specifies the number of CPUs for an Airflow scheduler.
  • The scheduler.memory_gb field specifies the amount of memory for an Airflow scheduler.
  • The scheduler.storage_gb field specifies the amount of disk space for an Airflow scheduler.
  • The web_server.cpu field specifies the number of CPUs for the Airflow web server.
  • The web_server.memory_gb field specifies the amount of memory for the Airflow web server.
  • The web_server.storage_gb field specifies the amount of disk space for the Airflow web server.
  • The worker.cpu field specifies the number of CPUs for an Airflow worker.
  • The worker.memory_gb field specifies the amount of memory for an Airflow worker.
  • The worker.storage_gb field specifies the amount of disk space for an Airflow worker.

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    workloads_config {

      scheduler {
        cpu = SCHEDULER_CPU
        memory_gb = SCHEDULER_MEMORY
        storage_gb = SCHEDULER_STORAGE
      }
      web_server {
        cpu = WEB_SERVER_CPU
        memory_gb = WEB_SERVER_MEMORY
        storage_gb = WEB_SERVER_STORAGE
      }
      worker {
        cpu = WORKER_CPU
        memory_gb = WORKER_MEMORY
        storage_gb = WORKER_STORAGE
      }
    }

  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the Compute Engine region where the environment is located.
  • SCHEDULER_CPU with the number of CPUs for a scheduler, in vCPU units.
  • SCHEDULER_MEMORY with the amount of memory for a scheduler, in GB.
  • SCHEDULER_STORAGE with the disk size for a scheduler, in GB.
  • WEB_SERVER_CPU with the number of CPUs for the web server, in vCPU units.
  • WEB_SERVER_MEMORY with the amount of memory for the web server, in GB.
  • WEB_SERVER_STORAGE with the disk size for the web server, in GB.
  • WORKER_CPU with the number of CPUs for a worker, in vCPU units.
  • WORKER_MEMORY with the amount of memory for a worker, in GB.
  • WORKER_STORAGE with the disk size for a worker, in GB.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    workloads_config {

      scheduler {
        cpu = 0.5
        memory_gb = 1.875
        storage_gb = 1
      }
      web_server {
        cpu = 0.5
        memory_gb = 1.875
        storage_gb = 1
      }
      worker {
        cpu = 0.5
        memory_gb = 1.875
        storage_gb = 1
      }
    }

  }
}

Adjust the environment size

The environment size controls the performance parameters of the managed Cloud Composer infrastructure, which includes the Airflow database. Consider selecting a larger environment size if you want to run a large number of DAGs and tasks.

Console

  1. Go to the Environments page in the Google Cloud Console.

  2. Select your environment.

  3. Go to the Environment configuration tab.

  4. In the Resources > Core infrastructure item, click Edit.

  5. In the Core infrastructure dialog, in the Environment size field, specify the environment size.

  6. Click Save.

gcloud

The --environment-size argument controls the environment size:

gcloud beta composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --environment-size ENVIRONMENT_SIZE

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the Compute Engine region where the environment is located.
  • ENVIRONMENT_SIZE with small, medium, or large.

Example:

gcloud beta composer environments update example-environment \
    --location us-central1 \
    --environment-size medium

API

  1. Create an environments.patch beta API request.

  2. In this request:

    1. In the updateMask parameter, specify the config.environmentSize mask.

    2. In the request body, specify the environment size.

  "config": {
    "environmentSize": "ENVIRONMENT_SIZE"
  }

Replace:

  • ENVIRONMENT_SIZE with the environment size, ENVIRONMENT_SIZE_SMALL, ENVIRONMENT_SIZE_MEDIUM, or ENVIRONMENT_SIZE_LARGE.

Example:

// PATCH https://composer.googleapis.com/v1beta1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.environmentSize

"config": {
  "environmentSize": "ENVIRONMENT_SIZE_MEDIUM"
}
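
The gcloud tab on this page uses the values small, medium, and large, while the API and Terraform use the ENVIRONMENT_SIZE_* names. The following is a minimal sketch of a lookup between the two spellings; the helper `api_environment_size` is hypothetical and covers only the three sizes named on this page.

```python
# Values accepted by --environment-size, mapped to the names used in the API.
GCLOUD_TO_API_SIZE = {
    "small": "ENVIRONMENT_SIZE_SMALL",
    "medium": "ENVIRONMENT_SIZE_MEDIUM",
    "large": "ENVIRONMENT_SIZE_LARGE",
}


def api_environment_size(gcloud_value):
    """Translate a gcloud size value into the corresponding API value."""
    try:
        return GCLOUD_TO_API_SIZE[gcloud_value.lower()]
    except KeyError:
        raise ValueError(f"unknown environment size: {gcloud_value!r}") from None


print(api_environment_size("medium"))  # ENVIRONMENT_SIZE_MEDIUM
```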

Terraform

The environment_size field in the config block controls the environment size:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    environment_size = "ENVIRONMENT_SIZE"

  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the Compute Engine region where the environment is located.
  • ENVIRONMENT_SIZE with the environment size, ENVIRONMENT_SIZE_SMALL, ENVIRONMENT_SIZE_MEDIUM, or ENVIRONMENT_SIZE_LARGE.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    environment_size = "ENVIRONMENT_SIZE_SMALL"

  }
}

What's next