Installing Python Dependencies

This page describes how to install Python packages and connect to your Cloud Composer environment from a few common applications.

Options

Custom dependencies are installed alongside the Python dependencies that are already included in the base environment.

If your Python dependency has no external dependencies and does not conflict with Cloud Composer's dependencies, you can install Python dependencies from the Python Package Index by using the GCP Console, Cloud SDK, or Cloud Composer API.

If you have other requirements, here are a few options.

  • Local Python library: use if your Python dependency can't be found in the Python Package Index and the library does not have any external dependencies, such as dist-packages.
  • Plugins feature: use if you want plugin-specific functionality, such as modifying the Airflow web interface.
  • PythonVirtualenvOperator: use if your Python dependency can be found on the Python Package Index and has no external dependencies, but you don't want the dependency installed for all workers, or the dependency conflicts with dependencies required for Cloud Composer. (A minimal sketch follows this list.)
  • KubernetesPodOperator: use if you require external dependencies that can't be installed from pip, such as dist-packages. This option requires more setup and maintenance, and you should generally consider it only if the other options do not work.
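For illustration, here is a minimal sketch of a DAG that uses PythonVirtualenvOperator to keep a PyPI dependency out of the shared worker environment. The DAG id, task id, and package are placeholders, not part of Cloud Composer:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonVirtualenvOperator


def callable_with_dep():
    # Import inside the callable: the package exists only in the
    # task's throwaway virtualenv, not in the base environment.
    import scipy
    print(scipy.__version__)


with DAG('example_virtualenv_usage',  # placeholder DAG id
         start_date=datetime(2018, 1, 1),
         schedule_interval=None) as dag:
    PythonVirtualenvOperator(
        task_id='virtualenv_task',
        python_callable=callable_with_dep,
        requirements=['scipy>=0.13.3'],  # installed into the virtualenv only
    )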

Before you begin

  • The following permission is required to install Python packages in the Cloud Composer environment: composer.environments.update. For more information, see Cloud Composer Access Control.
  • Requirements must follow the format specified in PEP 508, where each requirement is specified in lowercase and consists of the package name with optional extras and version specifiers.
  • When you install custom Python dependencies by using the API, all Cloud Composer processes run with newly-installed PyPI dependencies.
  • Custom PyPI dependencies might cause conflicts with dependencies that Airflow requires, causing instability.

Installing a Python dependency from PyPI

To add, update, or delete the Python dependencies for your environment:

Console

Specify the package name and version specifiers as shown:

  • "pi-python-client", "==1.1.post1"
  • "go-api-python-client", "==1.0.0.dev187"

For a package without the version specifier, use an empty string for the value, such as "glob2", " ".

To access an environment's Python dependencies, navigate to the PyPI dependencies page using the following steps:

  1. Open the Environments page in the Google Cloud Platform Console.

  2. Click the Name of the environment you want to install, update, or delete Python dependencies for.

  3. Select the PyPI dependencies tab.

  4. Click the Edit button.

  5. To add a new dependency:

    1. Click the Add dependency button.

    2. Enter the name and version of your library in the Name and Version fields.

  6. To update an existing dependency:

    1. Select the Name and/or Version field of the library you want to update.

    2. Enter a new value.

  7. To delete a dependency:

    1. Hover over the name of the dependency to delete.

    2. Click the trash can icon that appears.

gcloud

Pass a requirements.txt file to the gcloud command-line tool. Format the file with each requirement specifier on a separate line.

Sample requirements.txt file:

scipy>=0.13.3
scikit-learn
nltk[machine_learning]

Pass the requirements.txt file to the gcloud composer environments update command to set your installation dependencies.

gcloud composer environments update ENVIRONMENT-NAME \
--update-pypi-packages-from-file requirements.txt \
--location LOCATION

The command terminates when the operation is finished. To avoid waiting, use the --async flag.

If a dependency conflict causes the update to fail, your environment continues running with its existing dependencies. If the operation succeeds, you can begin using the newly installed Python dependencies in your DAGs.
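For example, after the sample requirements.txt above is installed, any DAG file in the environment can import the packages directly. A quick sanity check, assuming those three packages:

import nltk
import scipy
from sklearn import datasets  # the scikit-learn package installs as sklearn

print(scipy.__version__)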

REST

Use the projects.locations.environments.patch method, specifying config.softwareConfig.pypiPackages as the prefix for the updateMask query parameter.
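Here is a rough sketch of calling the patch method over HTTP from Python with the requests and google-auth libraries; the project, location, environment name, and package list are placeholders. With the prefix form of the mask, the supplied map replaces the environment's full set of custom PyPI packages:

import google.auth
import google.auth.transport.requests
import requests

# Application Default Credentials with the cloud-platform scope.
credentials, _ = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
credentials.refresh(google.auth.transport.requests.Request())

# Placeholder resource name; fill in your own project, location,
# and environment.
url = ('https://composer.googleapis.com/v1'
       '/projects/PROJECT_ID/locations/LOCATION'
       '/environments/ENVIRONMENT-NAME')

response = requests.patch(
    url,
    params={'updateMask': 'config.softwareConfig.pypiPackages'},
    headers={'Authorization': 'Bearer ' + credentials.token},
    json={'config': {'softwareConfig': {'pypiPackages': {
        'scipy': '>=0.13.3',   # package name maps to its version specifier
        'scikit-learn': '',    # empty string means no version pin
    }}}},
)
response.raise_for_status()
print(response.json()['name'])  # name of the long-running operation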

Installing a local Python library

To install an in-house or local Python library:

  1. Place the dependencies within a subdirectory in the dags/ folder. To import a module from a subdirectory, each subdirectory in the module's path must contain a __init__.py package marker file.

    In this example, the dependency is coin_module.py:

    dags/
      use_local_deps.py  # A DAG file.
      dependencies/
        __init__.py
        coin_module.py
    
  2. Import the dependency from the DAG definition file.

    For example:

    from dependencies import coin_module
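
    A fuller sketch of what use_local_deps.py could contain; flip_coin is a hypothetical function inside coin_module.py:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    from dependencies import coin_module


    def use_local_dependency():
        # Call into the locally bundled module.
        coin_module.flip_coin()  # hypothetical helper in coin_module.py


    with DAG('use_local_deps',
             start_date=datetime(2018, 1, 1),
             schedule_interval=None) as dag:
        PythonOperator(task_id='use_local_deps_task',
                       python_callable=use_local_dependency)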

Connecting to the Flower web interface

Flower is a web-based tool for working with Celery clusters. Flower is pre-installed in your environment. You can use its web UI to monitor the Apache Airflow workers for your environment.

To access Flower:

  1. To determine the Kubernetes Engine cluster, view your environment:

    gcloud composer environments describe ENVIRONMENT-NAME \
        --location LOCATION

    The cluster is listed as the gkeCluster value. The zone where the cluster is deployed is listed under nodeConfig as the location value.

    For example:

          gcloud composer environments describe environment-name --location us-central1
          config:
            airflowUri: https://uNNNNe0aNNbcd3fff-tp.appspot.com
            dagGcsPrefix: gs://us-central1-may18-test-00a47695-bucket/dags
            gkeCluster: projects/example-project/zones/us-central1-a/clusters/us-central1-environment-name-00a47695-gke
            nodeConfig:
              diskSizeGb: 100
              location: projects/example-project/zones/us-central1-a

    In the example, the cluster is us-central1-environment-name-00a47695-gke, and the zone is us-central1-a. This information is also available on the Environment details page in the GCP Console.

  2. Connect to the Kubernetes Engine cluster:

    gcloud container clusters get-credentials CLUSTER_NAME \
    --zone CLUSTER_ZONE

    For example:

    gcloud container clusters get-credentials us-central1-environment-name-00a47695-gke --zone us-central1-a
    Fetching cluster endpoint and auth data.
    kubeconfig entry generated for us-central1-environment-name-00a47695-gke.

  3. View the worker pods and select the pod to run Flower on:

    kubectl get pods

    For example:

    kubectl get pods
    NAME                                 READY     STATUS    RESTARTS   AGE
    airflow-redis-67f555bdb8-n6m9k       1/1       Running   0          13d
    airflow-scheduler-6cdf4f4ff7-dm4dm   2/2       Running   0          1h
    airflow-sqlproxy-54497bd557-nlqtg    1/1       Running   0          13d
    airflow-worker-c5c4b58c7-bl5bf       2/2       Running   0          1h
    airflow-worker-c5c4b58c7-szqhm       2/2       Running   0          1h
    airflow-worker-c5c4b58c7-zhmkv       2/2       Running   0          1h

    The pod names match the regex "airflow-(worker|scheduler)-[-a-f0-9]+".

  4. Run Flower on the worker pod:

    kubectl exec -it POD_NAME -c airflow-worker -- celery flower \
        --broker=redis://airflow-redis-service:6379/0 --port=5555

    For example:

    kubectl exec -it airflow-worker-c5c4b58c7-zhmkv -c airflow-worker -- celery flower \
        --broker=redis://airflow-redis-service:6379/0 --port=5555
    [I 180601 20:35:55 command:139] Visit me at http://localhost:5555
    [I 180601 20:35:55 command:144] Broker: redis://airflow-redis-service:6379/0

  5. In a separate terminal session, forward the local port to Flower:

    kubectl port-forward POD_NAME 5555

    For example:

    kubectl port-forward airflow-worker-c5c4b58c7-zhmkv 5555
    Forwarding from 127.0.0.1:5555 -> 5555

  6. To access the web UI, go to http://localhost:5555 in your local browser.

Installing SQLAlchemy to access the Airflow database

SQLAlchemy is a Python SQL toolkit and Object Relational Mapper (ORM). You can install SQLAlchemy and use it to access the Cloud SQL instance for Cloud Composer. When your environment is created, Cloud Composer configures the Airflow environment variable AIRFLOW__CORE__SQL_ALCHEMY_CONN with the database connection string.

To install SQLAlchemy:

  1. Install sqlalchemy in your environment.

    gcloud composer environments update ENVIRONMENT-NAME \
        --location LOCATION \
        --update-pypi-package "sqlalchemy"

  2. To determine the Kubernetes Engine cluster, view your environment:

    gcloud composer environments describe ENVIRONMENT-NAME \
        --location LOCATION

  3. Connect to the Kubernetes Engine cluster:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --zone CLUSTER_ZONE

  4. View the worker pods and select the pod to connect to:

    kubectl get pods

  5. SSH to the worker pod:

    kubectl exec -it POD_NAME -- /bin/bash

    For example:

    kubectl exec -it airflow-worker-54c6b57789-66pnr -- /bin/bash
    Defaulting container name to airflow-worker.
    Use 'kubectl describe pod/airflow-worker-54c6b57789-66pnr' to see all of the containers in this pod.
    airflow@airflow-worker-54c6b57789-66pnr:/$

  6. Use the sqlalchemy library to interact with the Airflow database. Start a Python interpreter on the pod and read the connection string that Cloud Composer configured:

    python
    import airflow.configuration as config
    config.conf.get('core', 'sql_alchemy_conn')
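
    From there, a minimal sketch of using sqlalchemy itself, assuming the sqlalchemy package installed in step 1:

    import airflow.configuration as config
    from sqlalchemy import create_engine, inspect

    # Build an engine from the connection string Cloud Composer set.
    engine = create_engine(config.conf.get('core', 'sql_alchemy_conn'))

    # Quick connectivity check: list the Airflow tables.
    print(inspect(engine).get_table_names())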
