This page describes how to install Python packages and connect to your Cloud Composer environment from a few common applications.
Dependencies are installed with the existing Python dependencies that are included in the base environment.
If your environment requires a specific package, we recommend that you explicitly install the package to avoid issues due to package changes across Cloud Composer image versions. Do not rely on the pre-installed packages in the Cloud Composer version that is running in your environment.
Options for managing dependencies
If your Python dependency has no external dependencies and does not conflict with Cloud Composer's dependencies, you can install Python dependencies from the Python Package Index. You can also install a Python dependency from a private package repository.
For other requirements, here are a few options.
Option | Use if ... |
---|---|
Local Python library | Your Python dependency can't be found on the Python Package Index, and the library does not have any external dependencies, such as dist-packages.
Plugins feature | You want to use plugin-specific functionality, such as modifying the Airflow web interface. |
PythonVirtualenvOperator | Your Python dependency can be found on the Python Package Index and has no external dependencies. However, you don't want your Python dependency to be installed for all workers, or the dependency conflicts with dependencies required for Cloud Composer. |
KubernetesPodOperator | You require external dependencies that can't be installed from pip, such as dist-packages, or are on an internal pip server. This option requires more setup and maintenance and should generally be considered if the other options do not work. |
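For illustration, here is a minimal sketch of a PythonVirtualenvOperator task that keeps a dependency isolated from the rest of the environment. The DAG id, the callable, and the pandas requirement are hypothetical, not part of any Cloud Composer default:

```
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonVirtualenvOperator


def callable_in_virtualenv():
    # Import inside the callable so the package is resolved inside the
    # dedicated virtualenv rather than in the worker's system environment.
    import pandas
    print(pandas.__version__)


with DAG('virtualenv_example',  # hypothetical DAG id
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    isolated_task = PythonVirtualenvOperator(
        task_id='isolated_task',
        python_callable=callable_in_virtualenv,
        requirements=['pandas'],       # installed only in the task's virtualenv
        system_site_packages=False,    # keep the environment isolated
    )
```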
Before you begin
- The following permission is required to install Python packages in the Cloud Composer environment: `composer.environments.update`. For more information, see Cloud Composer Access Control.
- If your environment is protected by a VPC Service Controls perimeter, before installing PyPI dependencies you must grant additional user identities access to the services that the service perimeter protects, and enable support for a private PyPI repository.
- Requirements must follow the format specified in PEP-508 where each requirement is specified in lowercase and consists of the package name with optional extras and version specifiers.
- When you install custom Python dependencies by using the API, all Cloud Composer processes run with newly-installed PyPI dependencies.
- Custom PyPI dependencies might cause conflicts with dependencies that Airflow requires, causing instability.
- Before deploying to production, we recommend that you test your PyPI packages locally in an Airflow worker container.
Viewing installed Python packages
To see the installed Python packages in your environment:
1. Determine the Cloud Composer environment's GKE cluster and zone.

   a. Use the `gcloud composer` command to show the properties of a Cloud Composer environment:

      ```
      gcloud composer environments describe ENVIRONMENT_NAME \
          --location ENVIRONMENT_LOCATION
      ```

      In this command:

      - `ENVIRONMENT_NAME` is the name of the environment.
      - `ENVIRONMENT_LOCATION` is the Compute Engine region where the environment is located.

   b. In the output, the cluster is listed as the `gkeCluster` property.

   c. The zone where the cluster is deployed is the last part of the `location` property (`config` > `nodeConfig` > `location`). For example, `us-central1-b`.

2. Use the `gcloud container` command to connect the `kubectl` command to the cluster:

   ```
   gcloud container clusters get-credentials GKE_CLUSTER --zone GKE_LOCATION
   ```

   In this command:

   - `GKE_CLUSTER` is the cluster.
   - `GKE_LOCATION` is the zone where the cluster is deployed.

3. View and choose an Airflow worker pod:

   ```
   kubectl get pods --all-namespaces
   ```

   Look for a pod with a name like `airflow-worker-1a2b3c-x0yz`.

4. Connect to a remote shell in an Airflow worker container:

   ```
   kubectl -n composer-1-14-4-airflow-example-namespace \
       exec -it airflow-worker-1a2b3c-x0yz -c airflow-worker -- /bin/bash
   ```

   While connected to the remote shell, your command prompt shows the name of the Airflow worker pod, such as `airflow-worker-1a2b3c-x0yz`.

5. Run `pip freeze`. For example:

   ```
   pip freeze

   absl-py==0.7.1
   adal==1.2.0
   asn1crypto==0.24.0
   astor==0.8.0
   attrs==19.1.0
   autopep8==1.4.4
   ...
   ```
Installing a Python dependency from PyPI
To install Python dependencies from the Python Package Index, your Python dependency must not have external dependencies and must not conflict with Cloud Composer's dependencies.
To add, update, or delete the Python dependencies for your environment:
Console
Specify the package name and version specifiers as shown:
"pi-python-client", "==1.1.post1"
"go-api-python-client", "==1.0.0.dev187"
For a package without the version specifier, use an empty string for the value, such as "glob2", " ".
To access an environment's Python dependencies, navigate to the PyPI dependencies page using the following steps:
Open the Environments page in the Google Cloud Platform Console.
Click the Name of the environment you want to install, update, or delete Python dependencies for.
Select the PyPI dependencies tab.
Click the Edit button.
To add a new dependency:
Click the Add dependency button.
Enter the name and version of your library in the Name and Version fields.
To update an existing dependency:
Select the Name and/or Version field of the library you want to update.
Enter a new value.
To delete a dependency:
Hover over the name of the dependency to delete.
Click the trash can icon that appears.
gcloud
Pass a `requirements.txt` file to the `gcloud` command-line tool. Format the file with each requirement specifier on a separate line.

Sample `requirements.txt` file:

```
scipy>=0.13.3
scikit-learn
nltk[machine_learning]
```

Pass the `requirements.txt` file to the `gcloud composer environments update` command to set your installation dependencies:

```
gcloud composer environments update ENVIRONMENT-NAME \
    --update-pypi-packages-from-file requirements.txt \
    --location LOCATION
```

The command terminates when the operation is finished. To avoid waiting, use the `--async` flag.
If a dependency conflict causes the update to fail, your environment continues running with its existing dependencies. If the operation succeeds, you can begin using the newly installed Python dependencies in your DAGs.
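Once the update succeeds, a DAG can import the package directly. A minimal sketch, assuming the scipy entry from the sample requirements.txt above was installed; the DAG and task names are illustrative:

```
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def print_scipy_version():
    # Import the newly installed PyPI dependency inside the callable.
    import scipy
    print(scipy.__version__)


with DAG('use_pypi_dependency',  # hypothetical DAG id
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    PythonOperator(task_id='print_scipy_version',
                   python_callable=print_scipy_version)
```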
rest
Use the `projects.locations.environments.patch` method, specifying `config.softwareConfig.pypiPackages` as the prefix for the `updateMask` query parameter.
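As an illustration (not the only way to call the API), here is a rough Python sketch of the patch request using Application Default Credentials. The project, location, environment name, and the scipy entry are placeholders:

```
import google.auth
from google.auth.transport.requests import AuthorizedSession

# Obtain Application Default Credentials with the cloud-platform scope.
credentials, _ = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
session = AuthorizedSession(credentials)

environment = ('https://composer.googleapis.com/v1/projects/PROJECT_ID/'
               'locations/LOCATION/environments/ENVIRONMENT_NAME')

# The updateMask entry uses config.softwareConfig.pypiPackages as its prefix.
response = session.patch(
    environment,
    params={'updateMask': 'config.softwareConfig.pypiPackages.scipy'},
    json={'config': {'softwareConfig': {'pypiPackages': {'scipy': '>=0.13.3'}}}},
)
print(response.status_code)
```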
Installing a Python dependency from a private repository
You can install packages hosted in private package repositories available on the public internet. The packages must be properly configured packages that the default pip tool can install.
To install from a private package repository with a public address:
1. Create a pip.conf file and include the following information in the file if applicable:

   - Access credentials for the repository
   - Non-default pip installation options

   Example:

   ```
   [global]
   extra-index-url=https://my-example-private-repo.com/
   ```

2. Upload the pip.conf file to your environment's Cloud Storage bucket and place it in the `/config/pip/` folder, for example: `gs://us-central1-b1-6efannnn-bucket/config/pip/pip.conf`
Installing a Python dependency to a private IP environment
A private IP environment restricts access to the public internet, so installing Python dependencies may require additional steps.
When installing dependencies from a public PyPI repository, no special configuration is required. You can follow the normal process described above. You can also request packages from a private repository with a public address.
Alternatively, you can host a private PyPI repository in your VPC network. When installing dependencies, Cloud Composer will run the operation within the private IP GKE cluster hosting your environment, without accessing any public IP address through Cloud Build.
To install packages from a private repository hosted in your VPC network:
1. If the service account for your Cloud Composer environment does not have the `project.editor` role, grant it the `iam.serviceAccountUser` role.

2. Specify the private IP address of the repository in the `pip.conf` file uploaded to the `/config/pip/` folder in the Cloud Storage bucket.
Installing a Python dependency to a private IP environment under resource location restrictions
Keeping your project in line with Resource Location Restriction requirements prohibits the use of some tools. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.
To install Python dependencies in such an environment, follow one of the approaches outlined below for private IP environments with a VPC-SC perimeter.
Installing a Python dependency to a private IP environment in a VPC Service Controls perimeter
Protecting your project with a VPC Service Controls perimeter results in further security restrictions. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.
To install Python dependencies for a private IP Composer environment inside a perimeter, you have some options:
- Use a private PyPI repository hosted in your VPC network (as described in the section above).
- Use a proxy server in your VPC network to connect to a PyPI repository on the public internet. Specify the proxy address in the `/config/pip/pip.conf` file in the Cloud Storage bucket.
- If your security policy permits access to your VPC network from external IP addresses, you can enable this by configuring Cloud NAT.
- Vendor the Python dependencies into the `dags` folder in the Cloud Storage bucket to install them as local libraries. This may not be a good option if the dependency tree is large.
Installing a local Python library
To install an in-house or local Python library:
1. Place the dependencies within a subdirectory in the `dags/` folder. To import a module from a subdirectory, each subdirectory in the module's path must contain a `__init__.py` package marker file.

   In this example, the dependency is `coin_module.py`:

   ```
   dags/
     use_local_deps.py  # A DAG file.
     dependencies/
       __init__.py
       coin_module.py
   ```

2. Import the dependency from the DAG definition file.

   For example:
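A minimal sketch of `use_local_deps.py`; everything other than the import of `coin_module` is illustrative:

```
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Import the local dependency from the dags/dependencies/ subdirectory.
from dependencies import coin_module


def use_coin_module():
    # Hypothetical use of the local module.
    print(coin_module.__name__)


with DAG('use_local_deps',
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    PythonOperator(task_id='use_coin_module', python_callable=use_coin_module)
```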
Using Python packages that depend on shared object libraries
Certain PyPI packages depend on system-level libraries. While Cloud Composer does not support system libraries, you can use the following options:
- Use the KubernetesPodOperator. Set the operator's image to a custom build image. Use this option if your packages fail during installation due to an unmet system dependency (see the sketch after this list).

- Upload the shared object libraries to your environment's Cloud Storage bucket:

  1. Manually find the shared object libraries for the PyPI dependency (an `.so` file).
  2. Upload the shared object libraries to `/home/airflow/gcs/plugins`.
  3. Set the following Cloud Composer environment variable: `LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/airflow/gcs/plugins`

  Use this option if your PyPI packages have installed successfully but fail at runtime.
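As a sketch of the first option above, a KubernetesPodOperator task can point at a custom image that already bundles the required system libraries. The DAG id, image path, and command are hypothetical:

```
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

with DAG('system_deps_example',  # hypothetical DAG id
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    # Hypothetical custom image that bundles the needed shared object
    # libraries alongside the PyPI package.
    run_with_system_deps = KubernetesPodOperator(
        task_id='run-with-system-deps',
        name='run-with-system-deps',
        namespace='default',
        image='gcr.io/PROJECT_ID/image-with-system-libs:latest',
        cmds=['python', '-c', 'import my_pypi_package'],
    )
```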
Connecting to the Flower web interface
Flower is a web-based tool for working with Celery clusters. Flower is pre-installed in your environment. You can use its web UI to monitor the Apache Airflow workers for your environment.
To access Flower:
1. To determine the Kubernetes Engine cluster, view your environment:

   ```
   gcloud composer environments describe ENVIRONMENT-NAME \
       --location LOCATION
   ```

   The cluster is listed as the `gkeCluster` property. The zone where the cluster is deployed is listed as the `location`.

   For example:

   ```
   gcloud composer environments describe environment-name --location us-central1
   config:
     airflowUri: https://uNNNNe0aNNbcd3fff-tp.appspot.com
     dagGcsPrefix: gs://us-central1-may18-test-00a47695-bucket/dags
     gkeCluster: projects/example-project/zones/us-central1-a/clusters/us-central1-environment-name-00a47695-gke
     nodeConfig:
       diskSizeGb: 100
       location: projects/example-project/zones/us-central1-a
   ```

   In the example, the cluster is `us-central1-environment-name-00a47695-gke`, and the zone is `us-central1-a`. This information is also available on the Environment details page in the Cloud Console.

2. Connect to the Kubernetes Engine cluster:

   ```
   gcloud container clusters get-credentials CLUSTER_NAME \
       --zone CLUSTER_ZONE
   ```

   For example:

   ```
   gcloud container clusters get-credentials us-central1-environment-name-00a47695-gke --zone us-central1-a

   Fetching cluster endpoint and auth data.
   kubeconfig entry generated for us-central1-environment-name-00a47695-gke.
   ```

3. View the worker pods and select the pod to run Flower on:

   ```
   kubectl get pods --all-namespaces | grep worker
   ```

   For example:

   ```
   kubectl get pods --all-namespaces | grep worker

   airflow-worker-89696c45f-49rkb     2/2     Running   1   29d
   airflow-worker-89696c45f-gccmm     2/2     Running   1   29d
   airflow-worker-89696c45f-llnnx     2/2     Running   0   29d
   ```

   The pod names match the regex `"airflow-(worker|scheduler)-[-a-f0-9]+"`.

4. Run Flower on the worker pod:

   ```
   kubectl exec -n NAMESPACE -it POD_NAME -c airflow-worker -- airflow flower
   ```

   For example:

   ```
   kubectl exec -n composer-1-6-0-airflow-1-10-1-9670c487 -it airflow-worker-89696c45f-llnnx \
       -c airflow-worker -- airflow flower

   [I 180601 20:35:55 command:139] Visit me at http://0.0.0.0:5555
   [I 180601 20:35:55 command:144] Broker: redis://airflow-redis-service.default.svc.cluster.local:6379/0
   ```

5. In a separate terminal session, use `kubectl` to forward a port on your local machine to the pod running the Flower UI:

   ```
   kubectl -n NAMESPACE port-forward POD_NAME 5555
   ```

   For example:

   ```
   kubectl -n composer-1-6-0-airflow-1-10-1-9670c487 port-forward airflow-worker-c5c4b58c7-zhmkv 5555

   Forwarding from 127.0.0.1:5555 -> 5555
   ```

6. To access the web UI, go to `http://localhost:5555` in your local browser.
Installing SQLAlchemy to access the Airflow database
SQLAlchemy is a Python SQL toolkit and
Object Relational Mapper (ORM). You can install SQLAlchemy and use it to access
the Cloud SQL instance for Cloud Composer. During installation, Cloud Composer
configures the Airflow environment variable AIRFLOW__CORE__SQL_ALCHEMY_CONN
.
To install SQLAlchemy:
1. Install `sqlalchemy` in your environment:

   ```
   gcloud composer environments update ENVIRONMENT-NAME \
       --location LOCATION \
       --update-pypi-package "sqlalchemy"
   ```

2. To determine the Kubernetes Engine cluster, view your environment:

   ```
   gcloud composer environments describe ENVIRONMENT-NAME \
       --location LOCATION
   ```

3. Connect to the Kubernetes Engine cluster:

   ```
   gcloud container clusters get-credentials CLUSTER_NAME \
       --zone CLUSTER_LOCATION
   ```

4. View the worker pods and select the pod to connect to:

   ```
   kubectl get pods --all-namespaces | grep worker
   ```

5. SSH to the worker pod:

   ```
   kubectl -n NAMESPACE exec -it POD_NAME -c airflow-worker -- /bin/bash
   ```

   For example:

   ```
   kubectl -n composer-1-6-0-airflow-1-10-1-9670c487 \
       exec -it airflow-worker-54c6b57789-66pnr -c airflow-worker -- /bin/bash

   airflow@airflow-worker-54c6b57789-66pnr:~$
   ```

6. Use the `sqlalchemy` library to interact with the Airflow database:

   ```
   python

   import airflow.configuration as config
   config.conf.get('core', 'sql_alchemy_conn')
   ```
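Building on that, a short sketch of querying the metadata database with SQLAlchemy; the query against the Airflow `dag` table and the row limit are only illustrative:

```
import airflow.configuration as config
from sqlalchemy import create_engine

# Build an engine from the connection string Cloud Composer configured.
engine = create_engine(config.conf.get('core', 'sql_alchemy_conn'))

with engine.connect() as connection:
    # List a few DAG IDs from the Airflow metadata database.
    for row in connection.execute('SELECT dag_id FROM dag LIMIT 5'):
        print(row.dag_id)
```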