Testing DAGs (workflows)

Before deploying DAGs to production, you can execute Airflow CLI sub-commands to parse DAG code in the same context under which the DAG is executed.

Testing during DAG creation

You can run a single task instance locally and view the log output. Viewing the output enables you to check for syntax and task errors. Testing locally does not check dependencies or communicate status to the database.

We recommend that you put the DAGs in a data/test folder in your test environment.
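
The examples in this guide test a hello_world DAG that contains a single print_date task. That DAG is not shown in this guide; the following is a minimal sketch of what it might look like (the DAG ID, task ID, schedule, and start date are illustrative assumptions):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # import path also works in Airflow 2

with DAG(
    dag_id="hello_world",
    start_date=datetime(2021, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Prints the current date to the task log, which is enough to verify
    # that a task instance runs and produces output.
    print_date = BashOperator(
        task_id="print_date",
        bash_command="date",
    )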

Creating a test directory

  1. In the Cloud Storage bucket for your environment, create a test directory and copy your DAGs to it.

    gsutil

    gsutil cp -r gs://BUCKET_NAME/dags gs://BUCKET_NAME/data/test
    

    Where:

    • BUCKET_NAME is the name of the Cloud Storage bucket associated with your Cloud Composer environment.

    For example:

    gsutil

    gsutil cp -r gs://us-central1-test-environment-a12bc345-bucket/dags gs://us-central1-test-environment-a12bc345-bucket/data/test
    

Checking for syntax errors

  1. If you haven't already, create a test directory in the Cloud Storage bucket for your environment and copy your DAGs to it.

  2. To check for syntax errors, enter the following gcloud command:

    Airflow 1.10 CLI

    gcloud composer environments run \
      ENVIRONMENT_NAME \
      --location ENVIRONMENT_LOCATION \
      list_dags -- -sd /home/airflow/gcs/data/test
    

    Airflow 2.0 CLI

    gcloud beta composer environments run \
      ENVIRONMENT_NAME \
      --location ENVIRONMENT_LOCATION \
      dags list -- --subdir /home/airflow/gcs/data/test
    

    Where:

    • ENVIRONMENT_NAME is the name of the environment.
    • ENVIRONMENT_LOCATION is the Compute Engine region where the environment is located.

    For example:

    Airflow 1.10 CLI

    gcloud composer environments run \
    test-environment --location us-central1 \
    list_dags -- -sd /home/airflow/gcs/data/test
    

    Airflow 2.0 CLI

    gcloud beta composer environments run \
    test-environment --location us-central1 \
    dags list -- --subdir /home/airflow/gcs/data/test
    

Checking for task errors

  1. If you haven't already, create a test directory in the Cloud Storage bucket for your environment and copy your DAGs to it.

  2. To check for task-specific errors, enter the following gcloud command:

Airflow 1.10 CLI

gcloud composer environments run \
  ENVIRONMENT_NAME \
  --location ENVIRONMENT_LOCATION \
  test -- -sd /home/airflow/gcs/data/test DAG_ID \
  TASK_ID DAG_EXECUTION_DATE

Airflow 2.0 CLI

gcloud beta composer environments run \
  ENVIRONMENT_NAME \
  --location ENVIRONMENT_LOCATION \
  tasks test -- --subdir /home/airflow/gcs/data/test \
  DAG_ID TASK_ID \
  DAG_EXECUTION_DATE

Where:

  • ENVIRONMENT_NAME is the name of the environment.
  • ENVIRONMENT_LOCATION is the Compute Engine region where the environment is located.
  • DAG_ID is the ID of the DAG.
  • TASK_ID is the ID of the task.
  • DAG_EXECUTION_DATE is the execution date of the DAG. This date is used for templating purposes; see the sketch after the following examples. Regardless of the date that you specify, the task runs immediately.

For example:

Airflow 1.10 CLI

gcloud composer environments run test-environment \
  --location us-central1 \
  test -- -sd /home/airflow/gcs/data/test \
  hello_world print_date 2021-04-22

Airflow 2.0 CLI

gcloud beta composer environments run \
  test-environment \
  --location us-central1 \
  tasks test -- --subdir /home/airflow/gcs/data/test \
  hello_world print_date 2021-04-22
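
The execution date that you pass matters when a task uses templated values. As a minimal sketch, assuming a variant of the hello_world DAG above in which print_date echoes Airflow's templated {{ ds }} value instead of calling date:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

with DAG(
    dag_id="hello_world",
    start_date=datetime(2021, 4, 1),
    schedule_interval="@daily",
) as dag:
    # {{ ds }} renders as the execution date, so running
    # `tasks test ... hello_world print_date 2021-04-22` prints 2021-04-22.
    print_date = BashOperator(
        task_id="print_date",
        bash_command="echo 'execution date: {{ ds }}'",
    )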

Updating and testing a deployed DAG

To test updates to your DAGs in your test environment:

  1. Copy the deployed DAG that you want to update to data/test.
  2. Update the DAG.
  3. Test the DAG.
    1. Check for syntax errors.
    2. Check for task-specific errors.
  4. Make sure the DAG runs successfully.
  5. Turn off the DAG in your test environment.
    1. Go to the Airflow UI > DAGs page.
    2. If the DAG you're modifying runs constantly, turn off the DAG.
    3. To expedite outstanding tasks, click the task and then click Mark Success.
  6. Deploy the DAG to your production environment.
    1. Turn off the DAG in your production environment.
    2. Upload the updated DAG to the dags/ folder in your production environment.

FAQs for testing DAGs

How do I isolate DAG runs in my production and test environments?

Airflow keeps a global repository of source code in the dags/ folder that all DAG runs share. You might want to update source code in production or test without interfering with running DAGs.

Airflow does not provide strong DAG isolation. We recommend that you maintain separate production and test Cloud Composer environments to prevent your test DAGs from interfering with your production DAGs.

How do I avoid DAG interference when I run integration tests from different GitHub branches?

Use unique task names to prevent interference. For example, you can prefix your task IDs with the branch name.
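
As a minimal sketch, assuming that your CI system exports the branch name in a BRANCH_NAME environment variable (the variable name, DAG ID, and task ID are illustrative assumptions):

import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Branch name injected by your CI system. This sketch assumes the branch
# name contains only characters that are valid in Airflow IDs.
branch = os.environ.get("BRANCH_NAME", "main")

with DAG(
    dag_id="integration_test_{}".format(branch),
    start_date=datetime(2021, 4, 1),
    schedule_interval=None,
) as dag:
    # The branch prefix keeps task instances from different branches
    # from colliding with each other.
    run_checks = BashOperator(
        task_id="{}_run_checks".format(branch),
        bash_command="echo 'running checks for {}'".format(branch),
    )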

What is a best practice for integration testing with Airflow?

We recommend that you use a dedicated environment for integration testing with Airflow. One way to signal the DAG run success is to write into a file in a Cloud Storage folder and then check the content in your own integration test cases.
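
As a minimal sketch, assuming a test bucket that you control (YOUR_TEST_BUCKET is a placeholder) and the google-cloud-storage client library, which is preinstalled in Cloud Composer environments:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from google.cloud import storage


def write_success_marker():
    # The presence of this object signals to external integration tests
    # that the DAG run completed successfully.
    client = storage.Client()
    bucket = client.bucket("YOUR_TEST_BUCKET")  # placeholder bucket name
    bucket.blob("integration/hello_world/_SUCCESS").upload_from_string("ok")


with DAG(
    dag_id="integration_signal",
    start_date=datetime(2021, 4, 1),
    schedule_interval=None,
) as dag:
    # Make this the terminal task so that the marker is written only after
    # all upstream tasks succeed.
    PythonOperator(
        task_id="write_success_marker",
        python_callable=write_success_marker,
    )

Your integration test cases can then check for the marker object to confirm that the run succeeded.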

How do I collaborate efficiently with other DAG contributors?

Each contributor can have a subdirectory in the data/ folder for development.

DAGs added to the data/ folder are not picked up automatically by the Airflow scheduler or web server.

DAG contributors can create manual DAG runs by using the gcloud composer environments run command and the test sub-command with the --subdir flag to specify the contributor's development directory.

For example:

gcloud composer environments run test-environment-name \
    --location us-central1 \
    test -- DAG_ID TASK_ID DAG_EXECUTION_DATE \
    --subdir /home/airflow/gcs/data/alice_dev

In Airflow 2, use the tasks test command:

gcloud beta composer environments run test-environment-name \
    --location us-central1 \
    tasks test -- DAG_ID TASK_ID DAG_EXECUTION_DATE \
    --subdir /home/airflow/gcs/data/alice_dev

How do I keep my development and production environments in sync?

To deploy from development to production:

  • Ensure consistent configuration, such as environment variables and PyPI packages.

  • Ensure consistent DAG arguments. To avoid hard-coding, we recommend that you use Airflow macros and variables; a sketch that reads such a variable from a DAG follows these examples.

    For example:

    gcloud composer environments run test-environment-name \
        --location us-central1 \
        variables -- --set DATA_ENDPOINT_KEY DATA_ENDPOINT_VALUE
    

    In the Airflow 2.0 CLI, use the variables set command:

    gcloud beta composer environments run test-environment-name \
        --location us-central1 \
        variables set -- DATA_ENDPOINT_KEY DATA_ENDPOINT_VALUE
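
    A DAG can then read the variable at run time through Airflow's templating, so the endpoint is not hard-coded. A minimal sketch (the DAG ID, task ID, and curl usage are illustrative assumptions):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    with DAG(
        dag_id="endpoint_consumer",
        start_date=datetime(2021, 4, 1),
        schedule_interval="@daily",
    ) as dag:
        # {{ var.value.DATA_ENDPOINT_KEY }} is resolved by Airflow's Jinja
        # templating at run time, so the same DAG file can point at different
        # endpoints in your test and production environments.
        fetch_data = BashOperator(
            task_id="fetch_data",
            bash_command="curl -s {{ var.value.DATA_ENDPOINT_KEY }}",
        )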
    
