This page describes how to deploy the Airflow web server to a Cloud Composer environment's Kubernetes cluster. Use this guide if you:
- Require control over where the Airflow web server is deployed.
- Have a DAG that must be imported from a consistent set of IP addresses, such as for authentication with on-premises systems.
Before you begin
- Be familiar with how to deploy a workload to Google Kubernetes Engine.
- Install the Cloud SDK.
- Create a Cloud Composer environment.
- You'll be deploying the Airflow web server to a worker machine type, which has fewer vCPUs than the default web server and is shared with the Airflow workers. Depending on load size, you might need to increase the number of worker nodes.
- Cloud Composer garbage collection can remove older images. As a best practice, each time you install a package or upgrade versions, synchronize the web server image path and tag with the image that the scheduler and workers are running. To do so, retrieve the image name from the scheduler pod configuration and use that value to update your self-managed web server.
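The image lookup mentioned in the last item can be sketched as follows, once kubectl is connected to the cluster (covered later on this page). This is a minimal sketch rather than an official procedure: the helper names are hypothetical, and the container name airflow-scheduler is an assumption that you should check against your own pod configuration.

```shell
# Print "container-name<TAB>image" for every container in a pod.
# $1 = NAMESPACE, $2 = scheduler pod name (both supplied by the caller).
list_container_images() {
  kubectl get pod -n "$1" "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

# Hypothetical helper: read "name<TAB>image" lines on stdin and print
# the image of the container whose name equals $1.
image_for_container() {
  awk -F '\t' -v name="$1" '$1 == name { print $2 }'
}

# Example (assumes the container is named "airflow-scheduler"):
# list_container_images NAMESPACE airflow-scheduler-1a2b3c-x0yz \
#   | image_for_container airflow-scheduler
```

If the printed image differs from the image in your web server deployment, update the image field in airflow-webserver.yaml and apply the file again.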
Determine the Cloud Composer environment's GKE cluster
Use the gcloud composer environments describe command to show the properties of a Cloud Composer environment, including the GKE cluster. The cluster is listed as the gkeCluster property.
Also take note of the zone where the cluster is deployed, for example us-central1-b, by looking at the last part of the location property (config > nodeConfig > location).
gcloud composer environments describe ENVIRONMENT_NAME \
    --location LOCATION
where:
- ENVIRONMENT_NAME is the name of the environment.
- LOCATION is the Compute Engine region where the environment is located.
From this point on, this document refers to the cluster as ${GKE_CLUSTER} and the zone as ${GKE_LOCATION}.
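As a convenience, both values can be captured in shell variables instead of being copied by hand. The sketch below assumes that gkeCluster and location are slash-separated resource paths whose last segment is the short name (as described above for location); the helper names are hypothetical.

```shell
# Hypothetical helper: print the last "/"-separated segment of a resource path.
last_path_segment() {
  printf '%s\n' "${1##*/}"
}

# Print a single field of the environment description.
# $1 = ENVIRONMENT_NAME, $2 = LOCATION, $3 = field such as config.gkeCluster
describe_field() {
  gcloud composer environments describe "$1" --location "$2" --format="value($3)"
}

# Example:
# GKE_CLUSTER="$(last_path_segment "$(describe_field ENVIRONMENT_NAME LOCATION config.gkeCluster)")"
# GKE_LOCATION="$(last_path_segment "$(describe_field ENVIRONMENT_NAME LOCATION config.nodeConfig.location)")"
```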
Connect to the GKE cluster
Use gcloud to connect the kubectl command to the cluster.
gcloud container clusters get-credentials ${GKE_CLUSTER} --zone ${GKE_LOCATION}
Get the pod configuration for the scheduler
The Airflow web server uses the same Docker image as the Airflow scheduler, so get the configuration of the scheduler pod to use as a starting point.
kubectl get pods --all-namespaces
Look for a pod with a name like airflow-scheduler-1a2b3c-x0yz. Get the configuration for the scheduler pod and write it to airflow-webserver.yaml.
kubectl get pod -n NAMESPACE airflow-scheduler-1a2b3c-x0yz -o yaml > airflow-webserver.yaml
where NAMESPACE is the namespace in which the scheduler pod runs, such as composer-1-7-2-airflow-1-9-0-4d5e6f.
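Rather than scanning the pod list by eye, you could filter it. The following is a sketch that assumes the scheduler pod name starts with airflow-scheduler-, as in the example above; find_scheduler_pod is a hypothetical helper name.

```shell
# Hypothetical helper: read the output of
#   kubectl get pods --all-namespaces --no-headers
# on stdin (column 1 = namespace, column 2 = pod name) and print
# "NAMESPACE POD" for the first pod whose name starts with "airflow-scheduler-".
find_scheduler_pod() {
  awk '$2 ~ /^airflow-scheduler-/ { print $1, $2; exit }'
}

# Example:
# set -- $(kubectl get pods --all-namespaces --no-headers | find_scheduler_pod)
# kubectl get pod -n "$1" "$2" -o yaml > airflow-webserver.yaml
```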
Create the web server deployment configuration
Modify airflow-webserver.yaml in a plain text editor to create a web server deployment configuration.
- Replace the apiVersion, kind, and metadata sections with the following deployment configuration. Do not delete the original spec section. You use it at a later step.
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: airflow-webserver
    labels:
      run: airflow-webserver
  spec:
    replicas: 1
    selector:
      matchLabels:
        run: airflow-webserver
    strategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 1
      type: RollingUpdate
    template:
      metadata:
        labels:
          run: airflow-webserver
- Replace airflow-scheduler with airflow-webserver in labels and names. Note: the web server container image does not change. The same image is used for workers, the scheduler, and the web server.
- Delete the status section and all sections that are nested inside it.
- Indent the original spec section so that spec is a key for the template section.
- Replace - scheduler with - webserver in the - args: section.
- Replace the livenessProbe section with one that polls the health endpoint.
  livenessProbe:
    exec:
      command:
      - curl
      - localhost:8080/_ah/health
Create the web server service configuration
Create a service configuration file called airflow-webserver-service.yaml.
apiVersion: v1
kind: Service
metadata:
  name: airflow-webserver-service
  labels:
    run: airflow-webserver
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    run: airflow-webserver
  sessionAffinity: None
  type: ClusterIP
Deploy the web server
Create the web server deployment, which starts the web server pod.
kubectl create -n NAMESPACE -f airflow-webserver.yaml
Deploy the web server service.
kubectl create -n NAMESPACE -f airflow-webserver-service.yaml
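After both resources are created, it can help to confirm that the rollout has finished before you try to connect. The following is a sketch that assumes the deployment and service names from the configurations above; the wrapper function name is hypothetical.

```shell
# Hypothetical wrapper: wait for the web server deployment to finish rolling
# out, then show the service. $1 = NAMESPACE
check_webserver() {
  kubectl -n "$1" rollout status deployment/airflow-webserver &&
    kubectl -n "$1" get service airflow-webserver-service
}

# Example:
# check_webserver NAMESPACE
```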
Connect to the web server
Because the service uses the ClusterIP type, the web server is not accessible from outside the Kubernetes cluster without using a proxy.
Find the web server pod.
kubectl get pods --all-namespaces
Forward the web server port to your local machine.
kubectl -n NAMESPACE port-forward airflow-webserver-1a2b3cd-0x9yz 8080:8080
Open the Airflow web server in your web browser at http://localhost:8080/admin/.
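As an alternative to looking up the generated pod name, the pod can be selected by the run: airflow-webserver label from the deployment configuration. This is a sketch that assumes that label is unchanged and that one web server pod is running; the function names are hypothetical.

```shell
# Print the name of the first pod labeled run=airflow-webserver.
# $1 = NAMESPACE
webserver_pod() {
  kubectl -n "$1" get pods -l run=airflow-webserver \
    -o jsonpath='{.items[0].metadata.name}'
}

# Forward local port 8080 to port 8080 of the web server pod.
# $1 = NAMESPACE
forward_webserver() {
  kubectl -n "$1" port-forward "$(webserver_pod "$1")" 8080:8080
}

# Example:
# forward_webserver NAMESPACE
```

While the port is forwarded, a request to localhost:8080/_ah/health from another terminal should reach the same health endpoint that the liveness probe polls.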