Composer-Airflow Repository

About the Repository

The Composer-Airflow repository is a read-only repository that contains the patched Apache Airflow code running in Cloud Composer. It can be used both as a reference and for local testing and development.

A particular version of Apache Airflow found in Cloud Composer is not always an exact match of the corresponding version in upstream Airflow because Cloud Composer uses a patched version of Airflow. This repository holds the code for every patched version of Airflow used in Cloud Composer. For information about which versions of Airflow are found in Cloud Composer, see Cloud Composer Versions list.

Contributing

This code is not a fork of Apache Airflow - the code found in this repository comes directly from the Apache Airflow repository, but on a different timeline than normal Apache Airflow releases. If you would like to contribute to this codebase, please do not do so here - Pull Requests are not accepted in this repository and should instead be contributed to Airflow directly.

Issues + Support

If you have an issue with the code found in this repository, please follow Airflow bug reporting instructions. If you have an issue with Cloud Composer, please utilize Cloud Composer support channels.

The repository has one branch for each version of Airflow available in Cloud Composer. Not all versions of Airflow are supported in Cloud Composer. For information about version support, see Cloud Composer Versioning.

Use Cases

Is this commit from the Airflow repository in my version of Cloud Composer?

Commit SHA1s in the composer-airflow repo do not correspond to commit SHA1s in the upstream Airflow repo, which means the easiest way to search for a specific commit is by searching for the corresponding commit message.

Currently, it is not possible to use the GitHub UI to search through commit messages in branches other than the default branch. However, it is possible to do so using the git CLI. To search for a particular commit in this repository, you need to have git installed.

  1. Clone the repository and change to the repository directory using the following command:
    git clone git@github.com:GoogleCloudPlatform/composer-airflow.git && cd composer-airflow
  2. Search for the commit message:

    git log --source --grep="COMMIT_MESSAGE" --all

    where:

    • --source shows the branch where the commit is found
    • --grep tells git what message to search the log for
    • --all tells git to search all branches
  3. The branch name appears next to the commit hash on the first line of each result. If a result lists the branch for your environment's Airflow version, the commit is in your version of Cloud Composer. Additionally, if the version of Airflow in your environment is later than the branch version, the commit is also included in your environment's Airflow version.

For example, if you want to search for the commit message "Force explicit choice on GPL dependency", your command would be:

    git log --source --grep="Force explicit choice on GPL dependency" --all

If there is a matching commit, the results look something like this (there may be more than one result):

commit 64ff1089e30e80b08bf5155edd9e49f5293ebbe4 refs/heads/1.10.2
Author: example_airflow_committer <example_airflow_committer@users.noreply.github.com>
Date:   Wed Aug 1 11:25:31 2018 +0200

    [AIRFLOW-2817] Force explicit choice on GPL dependency (#3660)

    By default one of Apache Airflow's dependencies pulls in a GPL
    library. Airflow should not install (and upgrade) without an explicit choice.

    This is part of the Apache requirements as we cannot depend on Category X
    software.

    (cherry picked from commit c37fc0b6ba19e3fe5656ae37cef9b59cef3c29e8)
    Signed-off-by: Example Airflow Committer  <example_airflow_committer@users.noreply.github.com>
    (cherry picked from commit b39e4532d9d1086c60b31553d08972bcc68df641)
    Signed-off-by: Example Airflow Committer  <example_airflow_committer@users.noreply.github.com>
    GitOrigin-RevId: cefcf4c61f64be3792cbfed509b82a9eb4cc47be
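
The search above can be condensed into a small shell helper. The following is a minimal sketch, not part of the repository: the function name find_commit_branches is hypothetical, and it assumes a git version recent enough to support the %S format placeholder, which prints the same ref that --source shows.

```shell
# Hypothetical helper: print the ref through which each matching commit
# was reached, one ref per line, with duplicates removed.
# Run it from inside a clone of the repository.
find_commit_branches() {
    # $1 = commit message text to search for
    # --all            search every ref, not only the current branch
    # --fixed-strings  treat the search text as a literal string, not a regex
    # --format='%S'    print only the source ref (what --source would show)
    git log --all --fixed-strings --grep="$1" --format='%S' | sort -u
}
```

Calling find_commit_branches "Force explicit choice on GPL dependency" from inside the clone should print refs such as the refs/heads/1.10.2 entry shown in the example output. Empty output means no commit message matched.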

What does this Operator look like in my version of Composer?

If you're not using the providers packages

If you know the filepath for a particular operator, for example, the GoogleCloudStorageCreateBucketOperator, navigate to it using the GitHub UI or the CLI.

If you do not know its filepath, you can search for it with the following command:

    git grep GoogleCloudStorageCreateBucketOperator

The output is a list of files where the string (in this case, the operator name) can be found. From that list, navigate to the appropriate file and examine its contents further.

airflow/contrib/operators/gcs_operator.py:class GoogleCloudStorageCreateBucketOperator(BaseOperator):
airflow/contrib/operators/gcs_operator.py:            CreateBucket = GoogleCloudStorageCreateBucketOperator(
airflow/contrib/operators/gcs_operator.py:        super(GoogleCloudStorageCreateBucketOperator, self).__init__(*args, **kwargs)
docs/code.rst:.. autoclass:: airflow.contrib.operators.gcs_operator.GoogleCloudStorageCreateBucketOperator
docs/integration.rst:- :ref:`GoogleCloudStorageCreateBucketOperator` : Creates a new cloud storage bucket.
docs/integration.rst:.. _GoogleCloudStorageCreateBucketOperator:
docs/integration.rst:GoogleCloudStorageCreateBucketOperator
docs/integration.rst:.. autoclass:: airflow.contrib.operators.gcs_operator.GoogleCloudStorageCreateBucketOperator
tests/contrib/operators/test_gcs_operator.py:from airflow.contrib.operators.gcs_operator import GoogleCloudStorageCreateBucketOperator
tests/contrib/operators/test_gcs_operator.py:        operator = GoogleCloudStorageCreateBucketOperator(
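
To narrow the search to the file that defines the operator on one particular branch, git grep can be given a branch name and a path pattern. This is a minimal sketch under stated assumptions: the helper name find_operator_file is hypothetical, and it assumes the operator is declared as a class with a base class on a single line, as in the output above.

```shell
# Hypothetical helper: list the Python files on a given branch (or tag)
# that define the named operator class, without checking the branch out.
find_operator_file() {
    # $1 = branch or tag name, $2 = operator class name
    # -l  print only the names of matching files, prefixed with "<ref>:"
    # -F  treat the pattern as a literal string
    # -e  mark the next argument as the pattern
    git grep -l -F -e "class $2(" "$1" -- '*.py'
}
```

Each line of output has the form REF:PATH, which can be passed directly to git show to print that file as it exists on that branch.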

If you are using the providers packages

Starting with certain versions of Airflow 1.10.x, some operators and their accompanying code are packaged and released separately from core Airflow in PyPI packages called "backport provider packages" (Airflow 1.10.x) or "provider packages" (Airflow 2.0 and above). Refer to the Cloud Composer backport provider documentation for more information about backport providers in Cloud Composer.

Certain versions of these packages are installed by default in Cloud Composer. To find out which version is installed in your environment, check the "PyPI packages for Python 3" column of the version list.

To look at the code for an operator in a particular release:

GitHub UI

  1. Go to the upstream Airflow repo.
  2. Enter the name of the particular operator you are searching for in the GitHub search bar at the top of the page.
  3. If more than one file is returned, click on the code file with a path beginning in airflow/providers. For example, if you search for the GoogleCloudStorageCreateBucketOperator, choose airflow/providers/google/cloud/operators/gcs.py.
  4. Click on the branch selector, which opens up the "Switch branches/tags" drop-down list.
  5. Click on the "Tags" tab.
  6. For providers packages, search for the name of your provider and the version by typing
    providers-PROVIDER_NAME/PROVIDER_VERSION
    in the drop-down list's search bar, where PROVIDER_NAME is the name of the provider and PROVIDER_VERSION is the version you are looking for. For example, if you want to see version 4.0.0 of the apache-airflow-providers-google package, you would search for providers-google/4.0.0.
  7. For backport-providers packages, search for
    backport-providers-PROVIDER_VERSION
    in the drop-down list's search bar. For example, if you want to see version 2021.3.3 of the apache-airflow-backport-providers-google package, you would search for backport-providers-2021.3.3.
  8. Click on the result that matches your query.
  9. The code on the screen is exactly what is running in your version of that operator. You can also click History to see the commit history up until this point.

git CLI

  1. Clone the upstream Airflow repo.
  2. For providers packages, run
    git checkout providers-PROVIDER_NAME/PROVIDER_VERSION
    where PROVIDER_NAME is the name of the provider and PROVIDER_VERSION is the version you are looking for. For example, if you want to see version 4.0.0 of the apache-airflow-providers-google package, you would run git checkout providers-google/4.0.0.
  3. For backport-providers packages, run
    git checkout backport-providers-PROVIDER_VERSION
    For example, if you want to see version 2021.3.3 of the apache-airflow-backport-providers-google package, you would run git checkout backport-providers-2021.3.3.

Then follow the instructions in "If you're not using the providers packages".
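
The tag names used in the checkout steps above follow two fixed patterns, which can be captured in a pair of small shell functions. This is only an illustrative sketch: the function names provider_tag and backport_tag are hypothetical, and the patterns are taken from the examples in this section.

```shell
# Hypothetical helpers that build the upstream Airflow tag name for a
# provider package release, following the patterns described above.

# $1 = provider name (for example: google), $2 = provider version
provider_tag() {
    printf 'providers-%s/%s\n' "$1" "$2"
}

# $1 = backport provider version (for example: 2021.3.3)
backport_tag() {
    printf 'backport-providers-%s\n' "$1"
}

# Example use inside a clone of the upstream Airflow repo:
#   git checkout "$(provider_tag google 4.0.0)"
#   git checkout "$(backport_tag 2021.3.3)"
```

If a checkout fails because the tag is not found, running git tag --list 'providers-google/*' in the clone shows which tags exist for that provider.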