Installing Python dependencies

This page describes how to install Python packages and connect to your Cloud Composer environment from a few common applications.

Dependencies are installed with the existing Python dependencies that are included in the base environment.

If your environment requires a specific package, we recommend that you explicitly install the package to avoid issues due to package changes across Cloud Composer image versions. Do not rely on the pre-installed packages in the Cloud Composer version that is running in your environment.

Options for managing dependencies

If your Python dependency has no external dependencies and does not conflict with Cloud Composer's dependencies, you can install Python dependencies from the Python Package Index. You can also install a Python dependency from private package repository.

For other requirements, here are a few options.

Option Use if ...
Local Python library Your Python dependency can't be found on the Python Package Index, and the library does not have any external dependencies, such as dist-packages.
Plugins feature You want to use plugin-specific functionality, such as modifying the Airflow web interface.
PythonVirtualenvOperator Your Python dependency can be found on the Python Package Index and has no external dependencies. However, you don't want your Python dependency to be installed for all workers, or the dependency conflicts with dependencies required for Cloud Composer. See the sketch after this table.
KubernetesPodOperator You require external dependencies that can't be installed from pip, such as dist-packages, or that are on an internal pip server. This option requires more setup and maintenance; consider it only if the other options do not work.
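
To illustrate the PythonVirtualenvOperator row, here is a minimal sketch, assuming the Airflow 1.10 import path that Cloud Composer uses; the DAG id, task id, callable, and requirement pin are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonVirtualenvOperator


def use_scipy():
    # Import inside the callable so it resolves in the task's own virtualenv.
    import scipy
    print(scipy.__version__)


with DAG(dag_id="virtualenv_example",
         start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    # The requirements listed here are installed into a throwaway virtualenv
    # for this task only, not into the Cloud Composer environment.
    scipy_task = PythonVirtualenvOperator(
        task_id="use_scipy",
        python_callable=use_scipy,
        requirements=["scipy>=0.13.3"],
        system_site_packages=False,
    )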

Before you begin

  • The following permission is required to install Python packages in the Cloud Composer environment: composer.environments.update. For more information, see Cloud Composer Access Control.
  • Currently, VPC Service Controls does not support Cloud Composer. To install PyPI dependencies, you need to grant additional user identities access to the services that the service perimeter protects and enable support for a private PyPI repository.
  • Requirements must follow the format specified in PEP-508 where each requirement is specified in lowercase and consists of the package name with optional extras and version specifiers.
  • When you install custom Python dependencies by using the API, all Cloud Composer processes run with newly-installed PyPI dependencies.
  • Custom PyPI dependencies might cause conflicts with dependencies that Airflow requires, causing instability.
  • Before deploying to production, we recommend that you test your PyPI packages locally in an Airflow worker container.

Viewing installed Python packages

To see the installed Python packages in your environment:

  1. Connect to the GKE cluster for your environment.
  2. Connect to a pod. To access pods in the GKE cluster, use namespace-aware kubectl commands. To view all namespaces, use kubectl get pods --all-namespaces.
  3. Run pip freeze.

For example:

gcloud container clusters get-credentials projects/composer-test-1/zones/us-central1-f/clusters/us-central1-1223232-gke --zone us-central1-f
Fetching cluster endpoint and auth data.
kubeconfig entry generated for us-central1-quickstart-f5da909c-gke.
~ (composer-test-1)$ kubectl exec -itn composer-1-7-2-airflow-1-9-0-0a9f265b airflow-worker-7858d4fb79-2dr9j -- /bin/bash
...
~ (composer-test-1)$ pip freeze

absl-py==0.7.1
adal==1.2.0
asn1crypto==0.24.0
astor==0.8.0
attrs==19.1.0
autopep8==1.4.4
...

Installing a Python dependency from PyPI

To install Python dependencies from the Python Package Index, your Python dependency must not have external dependencies and must not conflict with Cloud Composer's dependencies.

To add, update, or delete the Python dependencies for your environment:

Console

Specify the package name and version specifiers as shown:

  • "pi-python-client", "==1.1.post1"
  • "go-api-python-client", "==1.0.0.dev187"

For a package without the version specifier, use an empty string for the value, such as "glob2", " ".

To access an environment's Python dependencies, navigate to the PyPI dependencies page using the following steps:

  1. Open the Environments page in the Google Cloud Platform Console.

  2. Click the Name of the environment you want to install, update, or delete Python dependencies for.

  3. Select the PyPI dependencies tab.

  4. Click the Edit button.

  5. To add a new dependency:

    1. Click the Add dependency button.

    2. Enter the name and version of your library in the Name and Version fields.

  6. To update an existing dependency:

    1. Select the Name and/or Version field of the library you want to update.

    2. Enter a new value.

  7. To delete a dependency:

    1. Hover over the name of the dependency to delete.

    2. Click the trash can icon that appears.

gcloud

Pass a requirements.txt file to the gcloud command-line tool. Format the file with each requirement specifier on a separate line.

Sample requirements.txt file:

scipy>=0.13.3
scikit-learn
nltk[machine_learning]

Pass the requirements.txt file to the gcloud composer environments update command with the --update-pypi-packages-from-file flag to set your installation dependencies.

gcloud composer environments update ENVIRONMENT-NAME \
--update-pypi-packages-from-file requirements.txt \
--location LOCATION

The command terminates when the operation is finished. To avoid waiting, use the --async flag.

If a dependency conflict causes the update to fail, your environment continues running with its existing dependencies. If the operation succeeds, you can begin using the newly installed Python dependencies in your DAGs.

REST

Use the projects.locations.environments.patch method, specifying config.softwareConfig.pypiPackages as the prefix for the updateMask query parameter.
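
For reference, the following is a hedged sketch of calling this method from Python with the google-auth library; the project, location, environment name, and package pin are placeholder values, and Application Default Credentials are assumed to be available:

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Obtain Application Default Credentials scoped for Google Cloud APIs.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)

# Placeholder resource name; substitute your own project, location, and environment.
environment = ("projects/PROJECT_ID/locations/LOCATION/"
               "environments/ENVIRONMENT_NAME")
url = ("https://composer.googleapis.com/v1/" + environment +
       "?updateMask=config.softwareConfig.pypiPackages")

# Keys are package names; values are extras and version specifiers ("" for none).
body = {"config": {"softwareConfig": {"pypiPackages": {"scipy": ">=0.13.3"}}}}

response = session.patch(url, json=body)
print(response.json())  # The patch returns a long-running operation.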

Installing a Python dependency from a private repository

You can install packages hosted in private package repositories. The packages must be properly configured so that the default pip tool can install them.

To install from a private package repository:

  1. If the service account for your Cloud Composer environment does not have a project.editor role, grant the iam.serviceAccountUser role to the service account.

  2. Create a pip.conf file and include the following information in the file if applicable:

    • Access credentials for the repository
    • Non-default pip installation options

      Example:

      [global]
      extra-index-url=https://my-example-private-repo.com/
      

  3. Upload the pip.conf file to your environment's Cloud Storage bucket and place it in the /config/pip/ directory, for example: gs://us-central1-b1-6efannnn-bucket/config/pip/pip.conf

Installing a local Python library

To install an in-house or local Python library:

  1. Place the dependencies within a subdirectory in the dags/ folder. To import a module from a subdirectory, each subdirectory in the module's path must contain a __init__.py package marker file.

    In this example, the dependency is coin_module.py:

    dags/
      use_local_deps.py  # A DAG file.
      dependencies/
        __init__.py
        coin_module.py
    
  2. Import the dependency from the DAG definition file.

    For example:

    from dependencies import coin_module
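
    A complete use_local_deps.py might then look like the following sketch; the DAG id, task, and the coin_module.print_coin function are illustrative assumptions, not part of the example above:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    # Works because dags/dependencies/ contains an __init__.py file.
    from dependencies import coin_module

    with DAG(dag_id="use_local_deps",
             start_date=datetime(2019, 1, 1),
             schedule_interval=None) as dag:
        # coin_module.print_coin is a hypothetical function, used only to show
        # that the locally packaged module can be called from a task.
        use_local_deps_task = PythonOperator(
            task_id="use_local_deps",
            python_callable=coin_module.print_coin,
        )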

Using Python packages that depend on shared object libraries

Certain PyPI packages depend on system-level libraries. While Cloud Composer does not support system libraries, you can use the following options:

  1. Use the KubernetesPodOperator. Set the operator's image to a custom-built image. If packages fail during installation because of an unmet system dependency, use this option (see the sketch after this list).

  2. Upload the shared object libraries to your environment's Cloud Storage bucket.

    1. Manually find the shared object libraries for the PyPI dependency (an .so file).
    2. Upload the shared object libraries to /home/airflow/gcs/plugins.
    3. Set the following Cloud Composer environment variable: LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/airflow/gcs/plugins

    If your PyPI packages install successfully but fail at runtime, this is an option.
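
As an illustration of the first option, the following minimal sketch uses the Airflow 1.10 contrib import path available in Cloud Composer; the image name, namespace, DAG id, task id, and the command it runs are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

with DAG(dag_id="custom_image_example",
         start_date=datetime(2019, 1, 1),
         schedule_interval=None) as dag:
    # gcr.io/PROJECT_ID/my-image-with-system-deps is a placeholder for an
    # image you build that bundles the required system libraries.
    run_in_custom_image = KubernetesPodOperator(
        task_id="run-in-custom-image",
        name="run-in-custom-image",
        namespace="default",
        image="gcr.io/PROJECT_ID/my-image-with-system-deps",
        cmds=["python", "-c", "import my_package_with_native_deps"],
    )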

Connecting to the Flower web interface

Flower is a web-based tool for working with Celery clusters. Flower is pre-installed in your environment. You can use its web UI to monitor the Apache Airflow workers for your environment.

To access Flower:

  1. To determine the Kubernetes Engine cluster, view your environment:

    gcloud composer environments describe ENVIRONMENT-NAME \
        --location LOCATION

    The cluster is listed as the gkeCluster. The zone where the cluster is deployed is listed as the location.

    For example:

          gcloud composer environments describe environment-name --location us-central1
          config:
            airflowUri: https://uNNNNe0aNNbcd3fff-tp.appspot.com
            dagGcsPrefix: gs://us-central1-may18-test-00a47695-bucket/dags
            gkeCluster: projects/example-project/zones/us-central1-a/clusters/us-central1-environment-name-00a47695-gke
            nodeConfig:
              diskSizeGb: 100
              location: projects/example-project/zones/us-central1-a

    In the example, the cluster is us-central1-environment-name-00a47695-gke, and the zone is us-central1-a. This information is also available on the Environment details page in the GCP Console.

  2. Connect to the Kubernetes Engine cluster:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --zone CLUSTER_ZONE

    For example:

    gcloud container clusters get-credentials us-central1-environment-name-00a47695-gke --zone us-central1-a
    
    Fetching cluster endpoint and auth data.
    kubeconfig entry generated for us-central1-environment-name-00a47695-gke.
  3. View the worker pods and select the pod to run Flower on:

    kubectl get pods --all-namespaces | grep worker

    For example:

    kubectl get pods --all-namespaces | grep worker
    
    airflow-worker-89696c45f-49rkb      2/2       Running   1          29d
    airflow-worker-89696c45f-gccmm      2/2       Running   1          29d
    airflow-worker-89696c45f-llnnx      2/2       Running   0          29d

    The pod names match the regex "airflow-(worker|scheduler)-[-a-f0-9]+".

  4. Run Flower on the worker pod:

    kubectl exec -n NAMESPACE -it POD_NAME -c airflow-worker -- celery flower \
        --broker=redis://airflow-redis-service:6379/0 --port=5555

    For example:

    kubectl exec -n composer-1-6-0-airflow-1-10-1-9670c487 -it airflow-worker-89696c45f-llnnx \
        -c airflow-worker -- celery flower --broker=redis://airflow-redis-service:6379/0 --port=5555
    
    [I 180601 20:35:55 command:139] Visit me at http://localhost:5555
    [I 180601 20:35:55 command:144] Broker: redis://airflow-redis-service:6379/0
  5. In a separate terminal session, forward the local port to Flower:

    kubectl -n NAMESPACE port-forward POD_NAME 5555

    For example:

    kubectl -n composer-1-6-0-airflow-1-10-1-9670c487 port-forward airflow-worker-c5c4b58c7-zhmkv 5555
    
    Forwarding from 127.0.0.1:5555 -> 5555
  6. To access the web UI, go to http://localhost:5555 in your local browser.

Installing SQLAlchemy to access the Airflow database

SQLAlchemy is a Python SQL toolkit and Object Relational Mapper (ORM). You can install SQLAlchemy and use it to access the Cloud SQL instance for Cloud Composer. During installation, Cloud Composer configures the Airflow environment variable AIRFLOW__CORE__SQL_ALCHEMY_CONN.

To install SQLAlchemy:

  1. Install sqlalchemy in your environment.

    gcloud composer environments update ENVIRONMENT-NAME \
        --location LOCATION \
        --update-pypi-package "sqlalchemy"
    
  2. To determine the Kubernetes Engine cluster, view your environment:

    gcloud composer environments describe ENVIRONMENT-NAME \
        --location LOCATION
  3. Connect to the Kubernetes Engine cluster:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --zone CLUSTER_LOCATION
  4. View the worker pods and select the pod to connect to:

    kubectl get pods --all-namespaces | grep worker
  5. SSH to the worker pod:

    kubectl -n NAMESPACE exec -it POD_NAME -c airflow-worker -- /bin/bash

    For example:

    kubectl -n composer-1-6-0-airflow-1-10-1-9670c487 \
        exec -it airflow-worker-54c6b57789-66pnr -c airflow-worker -- /bin/bash
    airflow@airflow-worker-54c6b57789-66pnr:~$

  6. Use the sqlalchemy library to interact with the Airflow database:

    python
    import airflow.configuration as config
    config.conf.get('core', 'sql_alchemy_conn')
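
    For example, the following sketch builds a SQLAlchemy engine from that connection string and lists the DAG IDs stored in the Airflow metadata database. It assumes SQLAlchemy 1.x string execution and the standard Airflow dag table:

    import airflow.configuration as config
    from sqlalchemy import create_engine

    # Cloud Composer sets sql_alchemy_conn to point at the environment's
    # Cloud SQL instance, so no extra connection setup is needed here.
    engine = create_engine(config.conf.get('core', 'sql_alchemy_conn'))

    with engine.connect() as connection:
        # "dag" is a table in the standard Airflow metadata schema.
        for row in connection.execute('SELECT dag_id FROM dag'):
            print(row.dag_id)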