This page describes how to install Python packages to your environment.
About preinstalled and custom PyPI packages
Preinstalled PyPI packages are packages that are included in the Cloud Composer image of your environment. Each Cloud Composer image contains PyPI packages that are specific for your version of Cloud Composer and Airflow.
Custom PyPI packages are packages that you can install in your environment in addition to preinstalled packages.
Options for managing Python packages
Option | Use if
---|---
Install from PyPI | The package has no external dependencies and does not conflict with Cloud Composer dependencies.
Install from a private repository | The package is hosted in a private package repository available on the internet.
Install as a local Python library | The package cannot be found in the Python Package Index, and the library does not have any external dependencies, such as dist-packages.
Install a plugin | The package provides plugin-specific functionality, such as modifying the Airflow web interface.
PythonVirtualenvOperator | You do not want the package to be installed for all workers, or the dependency conflicts with preinstalled packages. The package can be found in the Python Package Index and has no external dependencies. See the sketch after this table.
KubernetesPodOperator | You require external dependencies that cannot be installed from pip, such as dist-packages, or are on an internal pip server. This option requires more setup and maintenance and should generally be considered only if other options do not work.
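For the PythonVirtualenvOperator option, the package is installed into a temporary virtualenv each time the task runs. The following is a minimal sketch rather than an example from this page; the DAG ID, task ID, and the scikit-learn pin are placeholder values:
import datetime

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator

def print_sklearn_version():
    # Import inside the callable: the function runs in the task's own virtualenv.
    import sklearn
    print(sklearn.__version__)

with DAG(
    dag_id="virtualenv_package_example",  # placeholder DAG ID
    start_date=datetime.datetime(2024, 1, 1),
    schedule_interval=None,
) as dag:
    PythonVirtualenvOperator(
        task_id="print_sklearn_version",
        python_callable=print_sklearn_version,
        requirements=["scikit-learn==1.3.2"],  # placeholder pin
        system_site_packages=False,
    )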
Before you begin
- You must have a role that can trigger environment update operations. In addition, the service account of the environment must have a role that has enough permissions to perform update operations. For more information, see Access control.
- If your environment is protected by a VPC Service Controls perimeter, then before installing PyPI dependencies you must grant additional user identities access to the services that the service perimeter protects, and enable support for a private PyPI repository.
- Requirements must follow the format specified in PEP-508 where each requirement is specified in lowercase and consists of the package name with optional extras and version specifiers.
- PyPI dependency updates generate Docker images in Artifact Registry. Do not modify or delete the images.
- If a dependency conflict causes the update to fail, your environment continues running with its existing dependencies. If the operation succeeds, you can begin using the newly installed Python dependencies in your DAGs.
View the list of packages
You can get the list of packages for your environment in several formats.
Preinstalled packages
To view the list of preinstalled packages for your environment, see the list of packages for the Cloud Composer image of your environment.
All packages
To view all packages (both preinstalled and custom) in your environment:
gcloud
The following gcloud CLI command returns the result of the python -m pip list command for an Airflow worker in your environment. As an alternative, you can use the --tree argument to get the result of the python -m pipdeptree --warn command.
gcloud beta composer environments list-packages \
ENVIRONMENT_NAME \
--location LOCATION
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
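Example (using the sample environment name and region from the other examples on this page):
gcloud beta composer environments list-packages \
    example-environment \
    --location us-central1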
Custom PyPI packages
Console
In Google Cloud console, go to the Environments page.
In the list of environments, click the name of your environment. The Environment details page opens.
Go to the PYPI Packages tab.
gcloud
gcloud composer environments describe ENVIRONMENT_NAME \
--location LOCATION \
--format="value(config.softwareConfig.pypiPackages)"
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
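Example (with the same sample environment):
gcloud composer environments describe example-environment \
    --location us-central1 \
    --format="value(config.softwareConfig.pypiPackages)"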
Install a package from PyPI
A package can be installed from the Python Package Index if it has no external dependencies and does not conflict with preinstalled packages.
To add, update, or delete the Python dependencies for your environment:
Console
In Google Cloud console, go to the Environments page.
In the list of environments, click the name of your environment. The Environment details page opens.
Go to the Environment configuration tab.
Click Edit.
In the PyPI packages section, specify package names, with optional version specifiers and extras.
For example:
Name | Extras and version
---|---
scikit-learn |
scipy | >=0.13.3
nltk | [machine_learning]
gcloud
The gcloud CLI has several arguments for working with custom PyPI packages:
- --update-pypi-packages-from-file replaces all existing custom PyPI packages with the specified packages. Packages that you do not specify are removed.
- --update-pypi-package updates or installs one package.
- --remove-pypi-packages removes specified packages.
- --clear-pypi-packages removes all packages.
Installing requirements from a file
The requirements.txt file must have each requirement specifier on a separate line.
For example:
scipy>=0.13.3
scikit-learn
nltk[machine_learning]
Update your environment, and specify the requirements.txt
file in
the --update-pypi-packages-from-file
argument:
gcloud composer environments update ENVIRONMENT_NAME \
--location LOCATION \
--update-pypi-packages-from-file requirements.txt
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
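Example (with the same sample environment and a requirements.txt file in the current directory):
gcloud composer environments update example-environment \
    --location us-central1 \
    --update-pypi-packages-from-file requirements.txt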
Installing one package
Update your environment, and specify the package, version, and extras in
the --update-pypi-package
argument:
gcloud composer environments update ENVIRONMENT_NAME \
--location LOCATION \
--update-pypi-package PACKAGE_NAMEEXTRAS_AND_VERSION
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
- PACKAGE_NAME with the name of the package.
- EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
Example:
gcloud composer environments update example-environment \
--location us-central1 \
--update-pypi-package "scipy>=0.13.3"
Removing packages
Update your environment, and specify the packages that you want to delete in the --remove-pypi-packages
argument:
gcloud composer environments update ENVIRONMENT_NAME \
--location LOCATION \
--remove-pypi-packages PACKAGE_NAMES
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
- PACKAGE_NAMES with a comma-separated list of packages.
Example:
gcloud composer environments update example-environment \
--location us-central1 \
--remove-pypi-packages scipy,scikit-learn
API
Construct an environments.patch API request.
In this request:
In the updateMask parameter, specify the mask:
- Use the config.softwareConfig.pypiPackages mask to replace all existing packages with the specified packages. Packages that you do not specify are deleted.
- Use config.softwareConfig.pypiPackages.PACKAGE_NAME to add or update a specific package. To add or update several packages, specify several masks, separated with commas.
In the request body, specify packages and values for versions and extras:
{ "config": { "softwareConfig": { "pypiPackages": { "PACKAGE_NAME": "EXTRAS_AND_VERSION" } } } }
Replace:
- PACKAGE_NAME with the name of the package.
- EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
- To add more than one package, add extra entries for packages to pypiPackages.
Example:
// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.softwareConfig.pypiPackages.EXAMPLE_PACKAGE,
// config.softwareConfig.pypiPackages.ANOTHER_PACKAGE
{
"config": {
"softwareConfig": {
"pypiPackages": {
"EXAMPLE_PACKAGE": "",
"ANOTHER_PACKAGE": ">=1.10.3"
}
}
}
}
Terraform
The pypi_packages block in the software_config block specifies custom PyPI packages for your environment.
resource "google_composer_environment" "example" {
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"
  config {
    software_config {
      pypi_packages = {
        PACKAGE_NAME = "EXTRAS_AND_VERSION"
      }
    }
  }
}
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
- PACKAGE_NAME with the name of the package.
- EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
- To add more than one package, add extra entries for packages to pypi_packages.
Example:
resource "google_composer_environment" "example" {
  name = "example-environment"
  region = "us-central1"
  config {
    software_config {
      pypi_packages = {
        "scipy"        = ">=0.13.3"
        "scikit-learn" = ""
      }
    }
  }
}
Install from a private repository
You can install packages hosted in private package repositories available on the public internet. The packages must be properly configured so that the default pip tool can install them.
To install from a private package repository that has a public address:
Create a pip.conf file and include the following information in the file, if applicable:
- URL of the repository
- Access credentials for the repository
- Non-default pip installation options
Example:
[global]
extra-index-url=https://example.com/
Upload this pip.conf file to the /config/pip/ folder in your environment's bucket. For example:
gs://us-central1-example-bucket/config/pip/pip.conf
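For example, you can copy a local pip.conf file to that location with gsutil (the bucket name here is the sample bucket from above):
gsutil cp pip.conf gs://us-central1-example-bucket/config/pip/pip.conf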
Install to a private IP environment
A private IP environment restricts access to the public internet, so installing Python dependencies may require additional steps.
When installing dependencies from a PyPI repository, no special configuration is required. Follow the procedure described in Install a package from PyPI. You can also request packages from a private repository with a public address.
As an alternative, you can host a private PyPI repository in your project's network. When installing dependencies, Cloud Composer runs the operation within the cluster that hosts your environment, without accessing any public IP address through Cloud Build.
To install packages from a private repository hosted in your project's network:
The service account for your Cloud Composer environment must have the iam.serviceAccountUser role.
Create a pip.conf file and include the following information in the file, if applicable:
- IP address of the repository in your project's network
- Access credentials for the repository
- Non-default pip installation options
Example:
[global]
extra-index-url=https://192.0.2.10/
Upload this pip.conf file to the /config/pip/ folder in your environment's bucket. For example:
gs://us-central1-example-bucket/config/pip/pip.conf
Install to a private IP environment under resource location restrictions
Keeping your project in line with Resource Location Restriction requirements prohibits the use of some tools. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.
To install Python dependencies in such an environment, follow the guidance for private IP environments with a VPC Service Controls perimeter.
Installing a Python dependency to a private IP environment in a VPC Service Controls perimeter
Protecting your project with a VPC Service Controls perimeter results in further security restrictions. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.
To install Python dependencies for a private IP environment inside a perimeter, you can:
- Use a private PyPI repository hosted in your VPC network.
- Use a proxy server VM in your VPC network to connect to a PyPI repository on the public internet. Specify the proxy address in the /config/pip/pip.conf file in your environment's bucket (a sample pip.conf sketch follows this list).
- If your security policy permits access to your VPC network from external IP addresses, you can enable this by configuring Cloud NAT.
- Put the Python dependencies into the dags folder in the Cloud Storage bucket to install them as local libraries. This may not be a good option if the dependency tree is large.
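For the proxy option, a minimal pip.conf sketch; the proxy host and port are placeholders for your own proxy VM, not values defined on this page:
[global]
proxy = http://PROXY_VM_IP:3128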
Install a local Python library
To install an in-house or local Python library:
Place the dependencies within a subdirectory in the dags/ folder in your environment's bucket. To import a module from a subdirectory, each subdirectory in the module's path must contain an __init__.py package marker file.
In the following example, the dependency is coin_module.py:
dags/
  use_local_deps.py  # A DAG file.
  dependencies/
    __init__.py
    coin_module.py
Import the dependency from the DAG definition file.
For example:
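A minimal sketch of use_local_deps.py, assuming the layout above; the flip_coin function is a hypothetical member of coin_module:
import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from dependencies import coin_module  # Local library under dags/dependencies/.

def print_coin_flip():
    # Calls a hypothetical helper defined in coin_module.py.
    print(coin_module.flip_coin())

with DAG(
    dag_id="use_local_deps",
    start_date=datetime.datetime(2024, 1, 1),
    schedule_interval=None,
) as dag:
    PythonOperator(task_id="print_coin_flip", python_callable=print_coin_flip)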
Use packages that depend on shared object libraries
Certain PyPI packages depend on system-level libraries. While Cloud Composer does not support system libraries, you can use the following options:
Use the KubernetesPodOperator. Set the operator's image to a custom-built image. If you experience packages that fail during installation due to an unmet system dependency, use this option.
Upload the shared object libraries to your environment's bucket. If your PyPI packages have installed successfully but fail at runtime, use this option.
- Manually find the shared object libraries for the PyPI dependency (an .so file).
- Upload the shared object libraries to the /plugins folder in your environment's bucket.
- Set the following environment variable (one way to do this with gcloud is sketched after this list):
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/airflow/gcs/plugins
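One possible way to set this variable with gcloud, as a sketch that reuses the sample environment name and region from this page; it sets only the plugins path, so append any existing LD_LIBRARY_PATH value your environment already relies on:
gcloud composer environments update example-environment \
    --location us-central1 \
    --update-env-variables LD_LIBRARY_PATH=/home/airflow/gcs/plugins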