Migrate environments to Cloud Composer 2 (from Airflow 1)

Cloud Composer 1 | Cloud Composer 2

This page explains how to transfer DAGs, data, and configuration from your existing Cloud Composer 1, Airflow 1 environments to Cloud Composer 2, Airflow 2.

Other migration guides

From To Method Guide
Cloud Composer 1, Airflow 2 Cloud Composer 2, Airflow 2 Side-by-side, using snapshots Migration guide (snapshots)
Cloud Composer 1, Airflow 1 Cloud Composer 2, Airflow 2 Side-by-side, using snapshots Migration guide (snapshots)
Cloud Composer 1, Airflow 2 Cloud Composer 2, Airflow 2 Side-by-side, manual transfer Manual migration guide
Cloud Composer 1, Airflow 1 Cloud Composer 2, Airflow 2 Side-by-side, manual transfer This guide (manual migration)
Airflow 1 Airflow 2 Side-by-side, manual transfer Manual migration guide

Before you begin

  • Because Cloud Composer 2 uses Airflow 2, the migration includes switching your DAGs and environment configuration to Airflow 2. Check the migration guide from Airflow 1 to Airflow 2 for information about the breaking changes between Airflow 1 and Airflow 2 in Cloud Composer.

  • In this guide, you combine migration to Airflow 2 and migration to Cloud Composer 2 in one migration procedure. In this way, you do not need to migrate to a Cloud Composer 1 environment with Airflow 2 before migrating to Cloud Composer 2.

Step 1: Upgrade to Airflow 1.10.15

If your environment uses an Airflow version earlier than 1.10.15, upgrade your environment to a Cloud Composer version that uses Airflow 1.10.15.

Step 2: Check compatibility with Airflow 2

To check for potential conflicts with Airflow 2, use upgrade check scripts provided by Airflow in your existing Airflow 1.10.15 environment.

gcloud

  1. Run upgrade checks through the gcloud composer environments run command. Some upgrade checks that are relevant for standalone Airflow 1.10.15 are not relevant for Cloud Composer. The following command excludes these checks.

    gcloud composer environments run \
        COMPOSER_1_ENV  \
        --location=COMPOSER_1_LOCATION \
        upgrade_check \
        -- --ignore VersionCheckRule --ignore LoggingConfigurationRule \
        --ignore PodTemplateFileRule --ignore SendGridEmailerMovedRule
    

    Replace:

    • COMPOSER_1_ENV with the name of your Airflow 1.10.15 environment.
    • COMPOSER_1_LOCATION with the region where the environment is located.
  2. Check the output of the command. Upgrade check scripts report potential compatibility issues in existing environments.

  3. Implement other changes to DAGs, as described in the Upgrading to Airflow 2.0+ guide, in the section about upgrading DAGs.

Step 3: Get the list of configuration overrides, custom PyPI packages, and environment variables

Console

Get the list of your Cloud Composer 1 environment's configuration overrides, custom PyPI packages, and environment variables:

  1. Go to the Environments page in the Google Cloud console:

    Go to Environments

  2. Select your Cloud Composer 1 environment.

  3. View environment variables on the Environment variables tab.

  4. View configuration overrides on the Airflow configuration overrides tab.

  5. View custom PyPI packages on the PyPI packages tab.

gcloud

To get the list of environment variables, run:

gcloud composer environments describe \
    COMPOSER_1_ENV \
    --location COMPOSER_1_LOCATION \
    --format="value(config.softwareConfig.envVariables)"

To get the list of environment's Airflow configuration overrides, run:

gcloud composer environments describe \
    COMPOSER_1_ENV \
    --location COMPOSER_1_LOCATION \
    --format="value(config.softwareConfig.airflowConfigOverrides)"

To get the list of custom PyPI packages, run:

gcloud composer environments describe \
    COMPOSER_1_ENV \
    --location COMPOSER_1_LOCATION \
    --format="value(config.softwareConfig.pypiPackages)"

Replace:

  • COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
  • COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
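If you are scripting this step, gcloud's `value()` formatter typically prints map fields as semicolon-separated `key=value` pairs. A minimal Python sketch for turning such output into a dictionary (the exact output format is an assumption; verify it against your own gcloud output):

```python
def parse_gcloud_map(output):
    """Parse semicolon-separated key=value pairs, as printed by
    gcloud's value() formatter for map fields (assumed format)."""
    result = {}
    for pair in output.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            result[key] = value
    return result

# Hypothetical output of the environment variables command above:
print(parse_gcloud_map("EXAMPLE_VAR=1;ANOTHER_VAR=2"))
```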

Terraform

Skip this step. Your Cloud Composer 1 environment's configuration already lists configuration overrides, custom PyPI packages, and environment variables for your environment.

Step 4: Create a Cloud Composer 2 environment

In this step, create a Cloud Composer 2 environment. You can start with an environment preset that matches your expected resource demands, and later scale and optimize your environment further.

Console

Create a Cloud Composer 2 environment and specify configuration overrides and environment variables.

As an alternative, you can override Airflow configurations and environment variables after you create an environment.

Some configuration options from Airflow 1 use a different name and section in Airflow 2. For more information, see Configuration changes.

gcloud

Create a Cloud Composer 2 environment and specify configuration overrides and environment variables.

As an alternative, you can override Airflow configurations and environment variables after you create an environment.

Some configuration options from Airflow 1 use a different name and section in Airflow 2. For more information, see Configuration changes.

Terraform

Create a Cloud Composer 2 environment based on the configuration of the Cloud Composer 1 environment:

  1. Copy your Cloud Composer 1 environment's configuration.
  2. Change the name of your environment.
  3. Use the google-beta provider:

    resource "google_composer_environment" "example_environment_composer_2" {
      provider = google-beta
      # ...
    }
    
  4. Specify a Cloud Composer 2 image in the config.software_config block:

    software_config {
      image_version = "composer-2.6.6-airflow-2.6.3"
      # ...
    }
    
  5. If you have not done so already, specify configuration overrides and environment variables.

  6. Specify custom PyPI packages in the config.software_config.pypi_packages block:

    software_config {
    
      # ...
    
      pypi_packages = {
        numpy = ""
        scipy = ">=1.1.0"
      }
    
    }
    

Step 5: Install PyPI packages to the Cloud Composer 2 environment

After your Cloud Composer 2 environment is created, install custom PyPI packages to it.

Console

  1. Go to the Environments page in the Google Cloud console:

    Go to Environments

  2. Select your Cloud Composer 2 environment.

  3. Go to the PyPI packages tab and click Edit.

  4. Copy PyPI package requirements from your Cloud Composer 1 environment. Click Save and wait until the environment updates.

gcloud

  1. Create a requirements.txt file with the list of custom PyPI packages:

      numpy
      scipy>=1.1.0
    
  2. Update your environment and pass the requirements.txt file to the --update-pypi-packages-from-file flag:

    gcloud composer environments update COMPOSER_2_ENV \
      --location COMPOSER_2_LOCATION  \
      --update-pypi-packages-from-file requirements.txt
    

    Replace:

    • COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
    • COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
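If you captured the package list programmatically in step 3, a small sketch (Python, hypothetical helper) of rendering that mapping into requirements.txt lines:

```python
def to_requirements_lines(packages):
    """Render a {package: version_specifier} mapping (as listed for the
    environment's PyPI packages) into requirements.txt lines.
    An empty specifier means 'any version'."""
    return [name + spec for name, spec in sorted(packages.items())]

# Example values from this guide:
packages = {"numpy": "", "scipy": ">=1.1.0"}
print("\n".join(to_requirements_lines(packages)))
```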

Terraform

Skip this step. You already installed custom PyPI packages when you created the environment.

Step 6: Transfer variables and pools

Airflow supports exporting variables and pools to JSON files. You can then import these files to your Cloud Composer 2 environment.

Airflow CLI commands used in this step operate on local files in Airflow workers. To upload or download the files, use the /data folder in the Cloud Storage bucket of your environment. This folder syncs to the /home/airflow/gcs/data/ directory in Airflow workers. In the Airflow CLI commands, specify /home/airflow/gcs/data/ in the FILEPATH parameter.
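The bucket-to-worker path mapping described above is mechanical; a small Python helper (hypothetical, useful when scripting the export and import commands) makes it explicit:

```python
def worker_filepath(name):
    """Map a file in the environment bucket's /data folder to the path
    where Airflow workers see it (/data syncs to /home/airflow/gcs/data/,
    as described in this guide)."""
    return "/home/airflow/gcs/data/" + name.lstrip("/")

print(worker_filepath("variables.json"))
```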

gcloud

  1. Export variables from your Cloud Composer 1 environment:

    gcloud composer environments run \
        COMPOSER_1_ENV \
        --location COMPOSER_1_LOCATION \
         variables -- -e /home/airflow/gcs/data/variables.json
    

    Replace:

    • COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
    • COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
  2. Export pools from your Cloud Composer 1 environment:

    gcloud composer environments run COMPOSER_1_ENV \
        --location COMPOSER_1_LOCATION \
         pool -- -e /home/airflow/gcs/data/pools.json
    

    Replace:

    • COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
    • COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
  3. Get your Cloud Composer 2 environment's bucket URI.

    1. Run the following command:

      gcloud composer environments describe COMPOSER_2_ENV \
          --location COMPOSER_2_LOCATION \
           --format="value(config.dagGcsPrefix)"
      

      Replace:

      • COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
      • COMPOSER_2_LOCATION with the region where the environment is located.
    2. In the output, remove the /dags folder. The result is the URI of your Cloud Composer 2 environment's bucket.

      For example, change gs://us-central1-example-916807e1-bucket/dags to gs://us-central1-example-916807e1-bucket.

  4. Transfer JSON files with variables and pools to your Cloud Composer 2 environment:

    gcloud composer environments storage data export \
        --destination=COMPOSER_2_BUCKET/data \
        --environment=COMPOSER_1_ENV \
        --location=COMPOSER_1_LOCATION \
        --source=variables.json
    
    gcloud composer environments storage data export \
        --destination=COMPOSER_2_BUCKET/data \
        --environment=COMPOSER_1_ENV \
        --location=COMPOSER_1_LOCATION \
        --source=pools.json
    

    Replace:

    • COMPOSER_2_BUCKET with the URI of your Cloud Composer 2 environment bucket, obtained in the previous step.
    • COMPOSER_1_ENV with the name of your Cloud Composer 1 environment.
    • COMPOSER_1_LOCATION with the region where the Cloud Composer 1 environment is located.
  5. Import variables and pools to Cloud Composer 2:

    gcloud composer environments run \
        COMPOSER_2_ENV \
        --location COMPOSER_2_LOCATION \
        variables import \
        -- /home/airflow/gcs/data/variables.json
    
    gcloud composer environments run \
        COMPOSER_2_ENV \
        --location COMPOSER_2_LOCATION \
        pools import \
        -- /home/airflow/gcs/data/pools.json
    
  6. Check that variables and pools are imported:

    gcloud composer environments run \
        COMPOSER_2_ENV \
        --location COMPOSER_2_LOCATION \
        variables list
    
    gcloud composer environments run \
        COMPOSER_2_ENV \
        --location COMPOSER_2_LOCATION \
        pools list
    
  7. Remove JSON files from the buckets:

    gcloud composer environments storage data delete \
        variables.json \
        --environment=COMPOSER_2_ENV \
        --location=COMPOSER_2_LOCATION
    
    gcloud composer environments storage data delete \
        pools.json \
        --environment=COMPOSER_2_ENV \
        --location=COMPOSER_2_LOCATION
    
    gcloud composer environments storage data delete \
        variables.json \
        --environment=COMPOSER_1_ENV \
        --location=COMPOSER_1_LOCATION
    
    gcloud composer environments storage data delete \
        pools.json \
        --environment=COMPOSER_1_ENV \
        --location=COMPOSER_1_LOCATION
    

Step 7: Transfer other data from your Cloud Composer 1 environment's bucket

Transfer plugins and other data from your Cloud Composer 1 environment's bucket.

gcloud

  1. Transfer plugins to your Cloud Composer 2 environment. To do so, export plugins from your Cloud Composer 1 environment's bucket to the /plugins folder in your Cloud Composer 2 environment's bucket:

    gcloud composer environments storage plugins export \
        --destination=COMPOSER_2_BUCKET/plugins \
        --environment=COMPOSER_1_ENV \
        --location=COMPOSER_1_LOCATION
    
  2. Check that the /plugins folder is successfully imported:

    gcloud composer environments storage plugins list \
        --environment=COMPOSER_2_ENV \
        --location=COMPOSER_2_LOCATION
    
  3. Export the /data folder from your Cloud Composer 1 environment to your Cloud Composer 2 environment:

    gcloud composer environments storage data export \
        --destination=COMPOSER_2_BUCKET/data \
        --environment=COMPOSER_1_ENV \
        --location=COMPOSER_1_LOCATION
    
  4. Check that the /data folder is successfully imported:

    gcloud composer environments storage data list \
        --environment=COMPOSER_2_ENV \
        --location=COMPOSER_2_LOCATION
    

Step 8: Transfer connections

Airflow 1.10.15 does not support exporting connections. To transfer connections, manually re-create them in your Cloud Composer 2 environment, using the connection details from your Cloud Composer 1 environment.

gcloud

  1. To get a list of connections in your Cloud Composer 1 environment, run:

    gcloud composer environments run COMPOSER_1_ENV \
        --location COMPOSER_1_LOCATION \
         connections -- --list
    
  2. To create a new connection in your Cloud Composer 2 environment, run the connections Airflow CLI command through gcloud. For example:

    gcloud composer environments run \
        COMPOSER_2_ENV \
        --location COMPOSER_2_LOCATION \
        connections add \
        -- --conn-host postgres.example.com \
        --conn-port 5432 \
        --conn-type postgres \
        --conn-login example_user \
        --conn-password example_password \
        --conn-description "Example connection" \
        example_connection
    

Step 9: Transfer user accounts

This step explains how to transfer users by creating them manually.

Airflow 1.10.15 does not support exporting users. To transfer users, manually create new user accounts in your Airflow 2 environment, using the account details from your Cloud Composer 1 environment.

Airflow UI

  1. To view a list of users in your Cloud Composer 1 environment:

    1. Open the Airflow web interface for your Cloud Composer 1 environment.

    2. Go to Admin > Users.

  2. To create a user in your Cloud Composer 2 environment:

    1. Open the Airflow web interface for your Cloud Composer 2 environment.

    2. Go to Security > List Users.

    3. Click Add a new record.

gcloud

  1. It is not possible to view a list of users through gcloud in Airflow 1. Use the Airflow web interface instead.

  2. To create a new user account in your Cloud Composer 2 environment, run the users create Airflow CLI command through gcloud. For example:

    gcloud composer environments run \
        COMPOSER_2_ENV \
        --location COMPOSER_2_LOCATION \
        users create \
        -- --username example_username \
        --firstname Example-Name \
        --lastname Example-Surname \
        --email example-user@example.com \
        --use-random-password \
        --role Op
    

    Replace:

    • COMPOSER_2_ENV with the name of your Cloud Composer 2 environment.
    • COMPOSER_2_LOCATION with the region where the Cloud Composer 2 environment is located.
    • All user configuration parameters with their values from your Cloud Composer 1 environment, including user's role.

Step 10: Make sure that your DAGs are ready for Airflow 2

Before transferring DAGs to your Cloud Composer 2 environment, make sure that:

  1. Upgrade check scripts for your DAGs run successfully and there are no remaining compatibility issues.

  2. Your DAGs use correct import statements.

    For example, the new import statement for BigQueryCreateDataTransferOperator can look like this:

    from airflow.providers.google.cloud.operators.bigquery_dts \
        import BigQueryCreateDataTransferOperator
    
  3. Your DAGs are upgraded for Airflow 2. This change is compatible with Airflow 1.10.14 and later versions.
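As a quick local check before uploading, you can scan DAG source for legacy Airflow 1 import paths such as airflow.contrib; a minimal sketch (the pattern is illustrative and does not cover every moved module):

```python
import re

# Matches 'from airflow.contrib...' imports, a common Airflow 1 pattern
# that was replaced by provider packages in Airflow 2.
LEGACY_IMPORT = re.compile(r"^\s*from\s+(airflow\.contrib[\w.]*)\s+import", re.MULTILINE)

def legacy_imports(dag_source):
    """Return legacy airflow.contrib module paths imported in DAG source."""
    return LEGACY_IMPORT.findall(dag_source)

sample = "from airflow.contrib.operators.bigquery_operator import BigQueryOperator\n"
print(legacy_imports(sample))
```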

Step 11: Transfer DAGs to the Cloud Composer 2 environment

The following potential problems might happen when you transfer DAGs between environments:

  • If a DAG is enabled (not paused) in both environments, each environment runs its own copy of the DAG, as scheduled. This might lead to duplicate DAG runs for the same data and execution time.

  • Because of DAG catchup, Airflow schedules extra DAG runs, beginning from the start date specified in your DAGs. This happens because the new Airflow instance does not take into account the history of DAG runs from the Cloud Composer 1 environment. This might lead to a large number of DAG runs scheduled starting from the specified start date.

Prevent duplicate DAG runs

In your Cloud Composer 2 environment, add an Airflow configuration option override for the dags_are_paused_at_creation option. After you make this change, all new DAGs are paused by default.

Section Key Value
core dags_are_paused_at_creation True

Prevent extra or missing DAG runs

To avoid gaps and overlaps in execution dates, disable catchup in your Cloud Composer 2 environment. In this way, after you upload DAGs to your Cloud Composer 2 environment, Airflow does not schedule DAG runs that were already run in the Cloud Composer 1 environment. Add an Airflow configuration option override for the catchup_by_default option:

Section Key Value
scheduler catchup_by_default False

Transfer your DAGs to the Cloud Composer 2 environment

To transfer your DAGs to the Cloud Composer 2 environment:

  1. Upload the DAGs from the Cloud Composer 1 environment to the Cloud Composer 2 environment. Skip the airflow_monitoring.py DAG.

  2. The DAGs are paused in the Cloud Composer 2 environment because of the configuration override, so no DAG runs are scheduled.

  3. In the Airflow web interface, go to DAGs and check for reported DAG syntax errors.

  4. At the time when you plan to transfer the DAG:

    1. Pause the DAGs in your Cloud Composer 1 environment.

    2. Un-pause the DAGs in your Cloud Composer 2 environment.

    3. Check that the new DAG runs are scheduled at the correct time.

    4. Wait for the DAG runs to happen in the Cloud Composer 2 environment and check if they were successful. If a DAG run was successful, do not unpause it in the Cloud Composer 1 environment; if you do so, a DAG run for the same time and date happens in your Cloud Composer 1 environment.

  5. If a specific DAG run fails, troubleshoot the DAG until it runs successfully in Cloud Composer 2.

    If required, you can always fall back to the Cloud Composer 1 version of the DAG and execute DAG runs that failed in Cloud Composer 2 from your Cloud Composer 1 environment:

    1. Pause the DAG in your Cloud Composer 2 environment.

    2. Un-pause the DAG in your Cloud Composer 1 environment. This schedules catch-up DAG runs for the time when the DAG was paused in the Cloud Composer 1 environment.

Step 12: Monitor your Cloud Composer 2 environment

After you transfer all DAGs and configuration to the Cloud Composer 2 environment, monitor it for potential issues, failed DAG runs, and overall environment health. If the Cloud Composer 2 environment runs without problems for a sufficient period of time, consider deleting the Cloud Composer 1 environment.

What's next