Install Python dependencies

Cloud Composer 1 | Cloud Composer 2

This page describes how to install Python packages in your environment.

About preinstalled and custom PyPI packages

Preinstalled PyPI packages are packages that are included in the Cloud Composer image of your environment. Each Cloud Composer image contains PyPI packages that are specific to your version of Cloud Composer and Airflow.

Custom PyPI packages are packages that you can install in your environment in addition to preinstalled packages.

Options for managing Python packages

Use the option that matches your package:

  • Install from PyPI: use if the package has no external dependencies and does not conflict with Cloud Composer dependencies.
  • Install from a private repository: use if the package is hosted in a private package repository available on the internet.
  • Install as a local Python library: use if the package cannot be found in the Python Package Index, and the library does not have any external dependencies, such as dist-packages.
  • Install a plugin: use if the package provides plugin-specific functionality, such as modifying the Airflow web interface.
  • PythonVirtualenvOperator: use if you do not want the package to be installed for all workers, or if the dependency conflicts with preinstalled packages. The package must be available in the Python Package Index and have no external dependencies.
  • KubernetesPodOperator: use if you require external dependencies that cannot be installed from pip, such as dist-packages, or that are on an internal pip server. This option requires more setup and maintenance and should generally be considered only if other options do not work.

Before you begin

  • You must have a role that can trigger environment update operations. In addition, the service account of the environment must have a role that has enough permissions to perform update operations. For more information, see Access control.
  • If your environment is protected by a VPC Service Controls perimeter, then before installing PyPI dependencies you must grant additional user identities access to services that the service perimeter protects, and enable support for a private PyPI repository.
  • Requirements must follow the format specified in PEP 508, where each requirement is specified in lowercase and consists of the package name with optional extras and version specifiers.
  • PyPI dependency updates generate Docker images in Artifact Registry. Do not modify or delete the images.
  • If a dependency conflict causes the update to fail, your environment continues running with its existing dependencies. If the operation succeeds, you can begin using the newly installed Python dependencies in your DAGs.
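The requirement format described above can be checked locally before you start an update. The following Python sketch accepts only a simplified subset of PEP 508 (a lowercase name, optional extras in brackets, and optional comma-separated version specifiers without spaces). It is an assumption for illustration, not the validation that Cloud Composer performs, and it rejects environment markers and URL requirements; check_requirement is a hypothetical helper.

```python
import re

# Simplified subset of PEP 508 (assumption): lowercase name, optional
# [extras], optional comma-separated version specifiers, no whitespace.
# Environment markers and URL requirements are not handled.
_OP = r"(?:==|!=|<=|>=|~=|<|>)"
_VERSION = r"[A-Za-z0-9.*+!-]+"
_REQUIREMENT = re.compile(
    r"^[a-z0-9](?:[a-z0-9._-]*[a-z0-9])?"          # package name
    r"(?:\[[a-z0-9._-]+(?:,[a-z0-9._-]+)*\])?"     # optional extras
    rf"(?:{_OP}{_VERSION}(?:,{_OP}{_VERSION})*)?$"  # optional versions
)


def check_requirement(line: str) -> bool:
    """Return True if line matches the simplified requirement format."""
    return _REQUIREMENT.match(line.strip()) is not None


if __name__ == "__main__":
    for candidate in ["scipy>=0.13.3", "nltk[machine_learning]", "SciPy"]:
        print(candidate, check_requirement(candidate))
```

A regular expression keeps the sketch dependency-free; a full parser would be needed to cover the rest of PEP 508.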

View the list of packages

You can get the list of packages for your environment in several formats.

Preinstalled packages

To view the list of preinstalled packages for your environment, see the list of packages for the Cloud Composer image of your environment.

All packages

To view all packages (both preinstalled and custom) in your environment:

gcloud

The following gcloud CLI command returns the result of the python -m pip list command for an Airflow worker in your environment.

As an alternative, you can use the --tree argument to get the result of the python -m pipdeptree --warn command.

gcloud beta composer environments list-packages \
    ENVIRONMENT_NAME \
    --location LOCATION

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
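To compare the output of this command with a local development environment, the following sketch lists installed distributions with the standard library's importlib.metadata module. The function name installed_packages is hypothetical, and the result only approximates what python -m pip list prints.

```python
from importlib import metadata


def installed_packages() -> dict[str, str]:
    """Map each installed distribution name (lowercased) to its version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with incomplete metadata
            packages[name.lower()] = dist.version
    return packages


if __name__ == "__main__":
    # Prints name==version lines, roughly comparable to pip list output.
    for name, version in sorted(installed_packages().items()):
        print(f"{name}=={version}")
```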

Custom PyPI packages

Console

  1. In the Google Cloud console, go to the Environments page.

  2. In the list of environments, click the name of your environment. The Environment details page opens.

  3. Go to the PyPI packages tab.

gcloud

gcloud composer environments describe ENVIRONMENT_NAME \
  --location LOCATION \
  --format="value(config.softwareConfig.pypiPackages)"

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
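If you process the list in a script, you can request JSON output with --format=json instead of the value() projection. The following sketch assumes that JSON is passed in as a string and reads the config.softwareConfig.pypiPackages map from it; custom_pypi_packages is a hypothetical helper, and the sample data is made up for illustration.

```python
import json


def custom_pypi_packages(describe_json: str) -> dict[str, str]:
    """Return config.softwareConfig.pypiPackages from describe --format=json.

    An environment without custom packages yields an empty dict.
    """
    environment = json.loads(describe_json)
    software_config = environment.get("config", {}).get("softwareConfig", {})
    return dict(software_config.get("pypiPackages", {}))


if __name__ == "__main__":
    # Hypothetical, trimmed-down describe output for illustration.
    sample = json.dumps({
        "name": "projects/example-project/locations/us-central1/"
                "environments/example-environment",
        "config": {"softwareConfig": {"pypiPackages": {
            "scipy": ">=0.13.3",
            "nltk": "[machine_learning]",
        }}},
    })
    for name, spec in sorted(custom_pypi_packages(sample).items()):
        print(name, spec or "(any version)")
```

For example, save the output of gcloud composer environments describe ENVIRONMENT_NAME --location LOCATION --format=json to a file and pass its contents to the function.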

Install a package from PyPI

A package can be installed from the Python Package Index if it has no external dependencies and does not conflict with preinstalled packages.

To add, update, or delete the Python dependencies for your environment:

Console

  1. In the Google Cloud console, go to the Environments page.

  2. In the list of environments, click the name of your environment. The Environment details page opens.

  3. Go to the Environment configuration tab.

  4. Click Edit.

  5. In the PyPI packages section, specify package names, with optional version specifiers and extras.

    For example:

    • scikit-learn
    • scipy, >=0.13.3
    • nltk, [machine_learning]

gcloud

The gcloud CLI has several arguments for working with custom PyPI packages:

  • --update-pypi-packages-from-file replaces all existing custom PyPI packages with the specified packages. Packages that you do not specify are removed.
  • --update-pypi-package updates or installs one package.
  • --remove-pypi-packages removes the specified packages.
  • --clear-pypi-packages removes all custom PyPI packages.
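The difference between these arguments matters most for --update-pypi-packages-from-file, which drops every custom package you do not list. The following sketch models each argument as an operation on a mapping of package names to extras and version specifiers. It is a hypothetical illustration of the behavior described above and does not call gcloud.

```python
# Hypothetical model: custom packages as {name: extras_and_version}.
# No function mutates its input; each returns the resulting mapping.


def update_from_file(current: dict[str, str],
                     requested: dict[str, str]) -> dict[str, str]:
    # Replace-all semantics: anything missing from `requested` is removed.
    return dict(requested)


def update_one(current: dict[str, str], name: str,
               spec: str) -> dict[str, str]:
    # Adds or updates a single package; other packages are untouched.
    result = dict(current)
    result[name] = spec
    return result


def remove_packages(current: dict[str, str],
                    names: list[str]) -> dict[str, str]:
    # Drops only the listed packages.
    return {n: s for n, s in current.items() if n not in names}


def clear_packages(current: dict[str, str]) -> dict[str, str]:
    # Removes all custom packages (preinstalled ones are not modelled here).
    return {}


if __name__ == "__main__":
    current = {"scipy": ">=0.13.3", "nltk": "[machine_learning]"}
    print(update_from_file(current, {"scikit-learn": ""}))
    print(update_one(current, "scikit-learn", ""))
    print(remove_packages(current, ["scipy"]))
    print(clear_packages(current))
```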

Installing requirements from a file

The requirements.txt file must have each requirement specifier on a separate line.

For example:

scipy>=0.13.3
scikit-learn
nltk[machine_learning]

Update your environment, and specify the requirements.txt file in the --update-pypi-packages-from-file argument:

gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --update-pypi-packages-from-file requirements.txt

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
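The console and the API store each package as a name plus a separate extras and version value. The following sketch splits a requirements.txt file in the format above into that mapping. It assumes lowercase requirement lines, does not handle environment markers or URL requirements, and requirements_to_pypi_packages is a hypothetical helper.

```python
import re

# A lowercase package name at the start of the line (assumption: the file
# follows the lowercase requirement format described earlier on this page).
_NAME = re.compile(r"[a-z0-9][a-z0-9._-]*")


def requirements_to_pypi_packages(text: str) -> dict[str, str]:
    """Map each package name to the rest of its line (extras and version)."""
    packages = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # this sketch tolerates blank lines
        match = _NAME.match(line)
        if match is None:
            raise ValueError(f"not a lowercase requirement: {line!r}")
        packages[match.group(0)] = line[match.end():]
    return packages


if __name__ == "__main__":
    requirements = "scipy>=0.13.3\nscikit-learn\nnltk[machine_learning]\n"
    print(requirements_to_pypi_packages(requirements))
```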

Installing one package

Update your environment, and specify the package, version, and extras in the --update-pypi-package argument:

gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --update-pypi-package "PACKAGE_NAMEEXTRAS_AND_VERSION"

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • PACKAGE_NAME with the name of the package.
  • EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value. Keep the quotes so that the shell does not interpret characters such as > and < in the specifier.

Example:

gcloud composer environments update example-environment \
    --location us-central1 \
    --update-pypi-package "scipy>=0.13.3"
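In a shell, an unquoted > starts an output redirect, so specifiers such as scipy>=0.13.3 must be quoted. If you script the update from Python, passing the arguments as a list to subprocess.run avoids the shell entirely. This sketch assumes the gcloud CLI is installed and authenticated; build_update_command and run_update are hypothetical helpers.

```python
import shlex
import subprocess


def build_update_command(environment: str, location: str,
                         requirement: str) -> list[str]:
    """Build argv for installing one package; no shell is involved."""
    return [
        "gcloud", "composer", "environments", "update", environment,
        "--location", location,
        "--update-pypi-package", requirement,
    ]


def run_update(environment: str, location: str, requirement: str) -> None:
    # A list with shell=False (the default) means ">" is never a redirect.
    subprocess.run(build_update_command(environment, location, requirement),
                   check=True)


if __name__ == "__main__":
    cmd = build_update_command("example-environment", "us-central1",
                               "scipy>=0.13.3")
    # shlex.join prints a copy-pasteable, correctly quoted shell command.
    print(shlex.join(cmd))
```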

Removing packages

Update your environment, and specify the packages that you want to delete in the --remove-pypi-packages argument:

gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --remove-pypi-packages PACKAGE_NAMES

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • PACKAGE_NAMES with a comma-separated list of packages.

Example:

gcloud composer environments update example-environment \
    --location us-central1 \
    --remove-pypi-packages scipy,scikit-learn
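If the list of packages to remove comes from a script, the following sketch joins it into the comma-separated value that --remove-pypi-packages expects. The helper name and its rejection of names that contain commas or spaces are assumptions for illustration.

```python
def remove_packages_arg(names: list[str]) -> str:
    """Join names into the comma-separated value for --remove-pypi-packages."""
    unique = []
    for raw_name in names:
        name = raw_name.strip()
        # A comma or space inside a name would corrupt the joined list.
        if not name or "," in name or " " in name:
            raise ValueError(f"invalid package name: {raw_name!r}")
        if name not in unique:
            unique.append(name)  # keep first occurrence, drop duplicates
    return ",".join(unique)


if __name__ == "__main__":
    print(remove_packages_arg(["scipy", "scikit-learn", "scipy"]))
```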

API

Construct an environments.patch API request.

In this request:

  1. In the updateMask parameter, specify the mask:

    • Use the config.softwareConfig.pypiPackages mask to replace all existing packages with the specified packages. Packages that you do not specify are deleted.
    • Use config.softwareConfig.pypiPackages.PACKAGE_NAME to add or update a specific package. To add or update several packages, specify several masks, separated by commas.
  2. In the request body, specify packages and values for versions and extras:

    {
      "config": {
        "softwareConfig": {
          "pypiPackages": {
            "PACKAGE_NAME": "EXTRAS_AND_VERSION"
          }
        }
      }
    }
    

    Replace:

    • PACKAGE_NAME with the name of the package.
    • EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
    • To add more than one package, add extra entries for packages to pypiPackages.

Example:

// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.softwareConfig.pypiPackages.EXAMPLE_PACKAGE,
// config.softwareConfig.pypiPackages.ANOTHER_PACKAGE
{
  "config": {
    "softwareConfig": {
      "pypiPackages": {
        "EXAMPLE_PACKAGE": "",
        "ANOTHER_PACKAGE": ">=1.10.3"
      }
    }
  }
}
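The following sketch builds the request URL and body for the per-package form of the update mask, so that packages you do not list are left unchanged. It only constructs the request; sending it requires an OAuth 2.0 access token (for example, from gcloud auth print-access-token) in an Authorization: Bearer header. The helper name and the choice to percent-encode the mask with urlencode are assumptions.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://composer.googleapis.com/v1"


def build_patch_request(project: str, location: str, environment: str,
                        packages: dict[str, str]) -> tuple[str, str]:
    """Return (url, json_body) for an environments.patch request."""
    # One mask entry per package: packages not listed here stay unchanged.
    mask = ",".join(
        f"config.softwareConfig.pypiPackages.{name}" for name in packages
    )
    resource = (f"projects/{project}/locations/{location}"
                f"/environments/{environment}")
    url = f"{API_ROOT}/{resource}?{urlencode({'updateMask': mask})}"
    body = {"config": {"softwareConfig": {"pypiPackages": dict(packages)}}}
    return url, json.dumps(body, indent=2)


if __name__ == "__main__":
    url, body = build_patch_request(
        "example-project", "us-central1", "example-environment",
        {"EXAMPLE_PACKAGE": "", "ANOTHER_PACKAGE": ">=1.10.3"},
    )
    print("PATCH", url)
    print(body)
```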

Terraform

The pypi_packages argument in the software_config block specifies custom PyPI packages.

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    software_config {

      pypi_packages = {
        PACKAGE_NAME = "EXTRAS_AND_VERSION"
      }

    }
  }
}

Replace:

  • ENVIRONMENT_NAME with the name of the environment.
  • LOCATION with the region where the environment is located.
  • PACKAGE_NAME with the name of the package.
  • EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
  • To add more than one package, add extra entries for packages to pypi_packages.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    software_config {

      pypi_packages = {
        scipy = ">=0.13.3"
        nltk = "[machine_learning]"
      }

    }
  }
}

Install from a private repository

You can install packages hosted in package repositories available on the public internet.

The packages must be properly configured so that the default pip tool can install them.

To install from a private package repository that has a public address:

  1. Create a pip.conf file and include the following information in the file, if applicable:

    • URL of the repository
    • Access credentials for the repository
    • Non-default pip installation options

    Example:

    [global]
    extra-index-url=https://example.com/
    
  2. Upload this pip.conf file to the /config/pip/ folder in your environment's bucket. For example: gs://us-central1-example-bucket/config/pip/pip.conf
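The following sketch renders a pip.conf with the standard library's configparser, writing keys in the same key=value form as the example above, and computes the gs:// path in the environment's bucket. The helper names are hypothetical, and any extra pip options you pass are your own choice.

```python
import configparser
import io
from typing import Optional


def render_pip_conf(extra_index_url: str,
                    options: Optional[dict[str, str]] = None) -> str:
    """Render a pip.conf whose [global] section adds a private index."""
    # interpolation=None keeps "%" characters (for example, in
    # percent-encoded credentials) from being treated as placeholders.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"extra-index-url": extra_index_url, **(options or {})}
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


def pip_conf_object_path(bucket: str) -> str:
    """Return the gs:// path of pip.conf in the environment's bucket."""
    return f"gs://{bucket}/config/pip/pip.conf"


if __name__ == "__main__":
    print(render_pip_conf("https://example.com/"))
    print("Upload to:", pip_conf_object_path("us-central1-example-bucket"))
```

After saving the rendered text as pip.conf, you can upload it with a command such as gcloud storage cp pip.conf gs://us-central1-example-bucket/config/pip/pip.conf.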

Install to a private IP environment

A private IP environment restricts access to the public internet, so installing Python dependencies may require additional steps.

When installing dependencies from a PyPI repository, no special configuration is required. Follow the procedure described in Install a package from PyPI. You can also request packages from a private repository with a public address.

As an alternative, you can host a private PyPI repository in your project's network. When installing dependencies, Cloud Composer runs the operation within the cluster that hosts your environment, without accessing any public IP address through Cloud Build.

To install packages from a private repository hosted in your project's network:

  1. The service account for your Cloud Composer environment must have the iam.serviceAccountUser role.

  2. Create a pip.conf file and include the following information in the file, if applicable:

    • IP address of the repository in your project's network
    • Access credentials for the repository
    • Non-default pip installation options

    Example:

    [global]
    extra-index-url=https://192.0.2.10/
    
  3. Upload this pip.conf file to the /config/pip/ folder in your environment's bucket. For example: gs://us-central1-example-bucket/config/pip/pip.conf
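Because the repository is reached through your project's network, a quick local check can catch a pip.conf that still points at a public address. The following sketch, an assumption rather than a Cloud Composer feature, reads index-url and extra-index-url from the [global] section and reports whether every host is a literal private IP address. Hostnames count as failures because the sketch does not resolve DNS.

```python
import configparser
import ipaddress
from urllib.parse import urlparse


def index_hosts(pip_conf_text: str) -> list[str]:
    """Return hostnames from index-url/extra-index-url in [global]."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(pip_conf_text)
    hosts = []
    for key in ("index-url", "extra-index-url"):
        # pip accepts several whitespace-separated URLs in one value.
        for url in config.get("global", key, fallback="").split():
            host = urlparse(url).hostname
            if host:
                hosts.append(host)
    return hosts


def all_hosts_private(pip_conf_text: str) -> bool:
    """True if every index host is a literal private IP address."""
    hosts = index_hosts(pip_conf_text)
    if not hosts:
        return False
    for host in hosts:
        try:
            if not ipaddress.ip_address(host).is_private:
                return False
        except ValueError:
            return False  # a DNS name; this sketch does not resolve it
    return True


if __name__ == "__main__":
    print(all_hosts_private("[global]\nextra-index-url=https://10.128.0.5/\n"))
```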

Install to a private IP environment under resource location restrictions

Keeping your project in line with Resource Location Restriction requirements prohibits the use of some tools. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.

To install Python dependencies in such an environment, follow the guidance for private IP environments with a VPC Service Controls perimeter.

Installing a Python dependency to a private IP environment in a VPC Service Controls perimeter

Protecting your project with a VPC Service Controls perimeter results in further security restrictions. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.

To install Python dependencies for a private IP environment inside a perimeter, you can:

  • Use a private PyPI repository hosted in your VPC network.
  • Use a proxy server VM in your VPC network to connect to a PyPI repository on the public internet. Specify the proxy address in the /config/pip/pip.conf file in your environment's bucket.
  • If your security policy permits your VPC network to access external IP addresses on the public internet, you can enable this access by configuring Cloud NAT.
  • Put the Python dependencies into the dags folder in the Cloud Storage bucket to install them as local libraries. This may not be a good option if the dependency tree is large.

Install a local Python library

To install an in-house or local Python library:

  1. Place the dependencies within a subdirectory in the dags/ folder in your environment's bucket. To import a module from a subdirectory, each subdirectory in the module's path must contain an __init__.py package marker file.

    In the following example, the dependency is coin_module.py:

    dags/
      use_local_deps.py  # A DAG file.
      dependencies/
        __init__.py
        coin_module.py
    
  2. Import the dependency from the DAG definition file.

    For example:

    from dependencies import coin_module
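To try the layout above outside Airflow, the following sketch recreates it in a temporary directory, adds that directory to sys.path (Airflow does the same for the DAGs folder), and imports the module. The contents of coin_module.py and its flip function are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of dependencies/coin_module.py.
COIN_MODULE_SOURCE = '''\
import random


def flip():
    """Return "heads" or "tails" at random."""
    return random.choice(["heads", "tails"])
'''


def build_dags_folder(root: Path) -> None:
    """Recreate the dags/dependencies/ layout shown above under root."""
    package = root / "dependencies"
    package.mkdir(parents=True, exist_ok=True)
    (package / "__init__.py").write_text("")  # package marker file
    (package / "coin_module.py").write_text(COIN_MODULE_SOURCE)
    importlib.invalidate_caches()  # make the new files visible to imports


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_dags_folder(Path(tmp))
        # Airflow adds the DAGs folder to sys.path; this mimics that step.
        sys.path.insert(0, tmp)
        coin_module = importlib.import_module("dependencies.coin_module")
        print(coin_module.flip())
```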

Use packages that depend on shared object libraries

Certain PyPI packages depend on system-level libraries. While Cloud Composer does not support system libraries, you can use the following options:

  • Use the KubernetesPodOperator. Set the Operator image to a custom-built image. If packages fail during installation because of an unmet system dependency, use this option.

  • Upload the shared object libraries to your environment's bucket. If your PyPI packages have installed successfully but fail at runtime, use this option.

    1. Manually find the shared object libraries for the PyPI dependency (an .so file).
    2. Upload the shared object libraries to the /plugins folder in your environment's bucket.
    3. Set the following environment variable: LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/airflow/gcs/plugins
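The following sketch helps with the first and last steps: it finds shared object files (names such as libfoo.so or libfoo.so.1) under a directory, for example a local copy of the package's installation, and computes an LD_LIBRARY_PATH value with the plugins folder appended. The helper names are hypothetical. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so set the value as an environment variable of your environment rather than changing os.environ from inside a running task.

```python
import os
from pathlib import Path

PLUGINS_DIR = "/home/airflow/gcs/plugins"


def find_shared_objects(root: Path) -> list[Path]:
    """Find files such as libfoo.so or libfoo.so.1 anywhere under root."""
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and (path.suffix == ".so" or ".so." in path.name)
    )


def extended_ld_library_path(current: str,
                             extra_dir: str = PLUGINS_DIR) -> str:
    """Append extra_dir to a colon-separated LD_LIBRARY_PATH value once."""
    parts = [part for part in current.split(":") if part]
    if extra_dir not in parts:
        parts.append(extra_dir)
    return ":".join(parts)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH=" + extended_ld_library_path(current))
```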

What's next