Apache Airflow includes a web interface that you can use to manage workflows (DAGs), manage the Airflow environment, and perform administrative actions. For example, you can use the web interface to review the progress of a DAG, set up a new data connection, or review logs from previous DAG runs.
Airflow web server
Each Cloud Composer environment has a web server that runs the Airflow web interface. The web server is separate from your environment's GKE cluster and runs on an App Engine instance with a fixed machine type.
The web server parses the DAG definition files in the
dags/ folder and must
be able to access a DAG's data and resources to load the DAG and serve HTTP requests.
The web server refreshes the DAGs every 60 seconds, which is the default
in Cloud Composer. A web server error can occur if the web server cannot
parse all the DAGs within the refresh interval.
Loading DAGs can take longer than 60 seconds if there are many DAG files or
if loading them involves a non-trivial workload. To ensure that the web server
remains accessible regardless of DAG load time, you can
configure asynchronous DAG loading to parse and load DAGs
in the background at a pre-configured interval (available in
composer-1.7.1-airflow-1.10.2 and later versions).
This configuration can also reduce DAG refresh time.
Other than exceeding the worker refresh interval, the web server can gracefully handle DAG loading failures in most cases. DAGs that cause the web server to crash or exit might cause errors to be returned in the browser. For more information, see Troubleshooting DAGs.
If you continue to experience web server issues due to DAG parsing, we recommend that you use asynchronous DAG loading.
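As a rough local check against the 60-second refresh interval described above, the following sketch times how long each Python file in a DAG folder takes to import. It is an approximation under stated assumptions: a plain import is not the same as the web server's own DAG parsing, and the names `time_dag_imports` and `exceeds_refresh_interval` are hypothetical helpers, not part of Airflow or Cloud Composer.

```python
import importlib.util
import time
from pathlib import Path

# Default web server refresh interval cited in this section.
REFRESH_INTERVAL_SECONDS = 60


def time_dag_imports(dags_folder):
    """Import every .py file in dags_folder and return {filename: seconds}.

    Approximation only: a plain import is not identical to the web
    server's DAG parsing, so treat the numbers as a rough estimate.
    """
    timings = {}
    for path in sorted(Path(dags_folder).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        start = time.perf_counter()
        try:
            spec.loader.exec_module(module)
        except Exception as exc:  # a broken file should not stop the scan
            print(f"{path.name}: failed to import ({exc!r})")
        timings[path.name] = time.perf_counter() - start
    return timings


def exceeds_refresh_interval(timings, limit=REFRESH_INTERVAL_SECONDS):
    """Return True if the combined import time is over the limit."""
    return sum(timings.values()) > limit
```

For example, `exceeds_refresh_interval(time_dag_imports("dags/"))` gives a quick signal of whether the folder is close to the limit. Sorting the returned dictionary by value shows which files dominate the load time.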
Restarting the web server (Preview)
When debugging or troubleshooting Cloud Composer environments, some issues
may be resolved by restarting the Airflow web server. You can restart the web
server using the restartWebServer API or the
restart-web-server gcloud command:
gcloud beta composer environments restart-web-server ENVIRONMENT_NAME --location=LOCATION
Before you begin
The following permission is required to access the Airflow web server in the Cloud Composer environment:
composer.environments.get. For more information, see Cloud Composer Access Control.
During environment creation, Cloud Composer configures the URL for the web server that runs the Airflow web interface. The URL is non-customizable.
The Role-Based Access Control (RBAC) feature for the Airflow web interface is supported for Cloud Composer environments running Composer version 1.13.4 or newer, Airflow version 1.10.10 or newer, and Python 3.
Accessing the web interface
The Airflow web server service is deployed to the
appspot.com domain and provides access to the Airflow web interface. Identity-Aware Proxy
protects the interface, guarding access based on user identities.
After creating a new Cloud Composer environment, it takes up to 25 minutes for the web interface to finish hosting and become accessible.
Accessing the web interface via the Google Cloud Console
To access the Airflow web interface from the Google Cloud Console:
- To view your existing Cloud Composer environments, open the Environments page.
- In the Airflow webserver column, click the new window icon for the environment whose Airflow web interface you want to view.
- Log in with the Google account that has the appropriate permissions.
Limiting access to the Airflow web server
Composer environments let you limit access to the Airflow web server.
You can block all access, or allow access from specific IPv4 or IPv6 external IP ranges.
Currently you cannot configure the allowed IP ranges using private IP addresses.
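Before submitting a list of allowed ranges, it can help to screen out entries that are malformed or private. The sketch below uses Python's standard `ipaddress` module for this. It rests on an assumption: Python's `is_private` is used as a stand-in for "private IP address", and its definition may not match the rules that Cloud Composer applies. The function name `classify_ip_ranges` is hypothetical.

```python
import ipaddress


def classify_ip_ranges(ranges):
    """Split CIDR strings into (usable, rejected) lists.

    Assumption: "private" is approximated with ipaddress's is_private,
    which may differ from Cloud Composer's own validation.
    """
    usable, rejected = [], []
    for cidr in ranges:
        try:
            network = ipaddress.ip_network(cidr, strict=False)
        except ValueError:
            rejected.append(cidr)  # not a valid IPv4 or IPv6 range
            continue
        if network.is_private:
            rejected.append(cidr)  # private ranges cannot be allowed
        else:
            usable.append(cidr)
    return usable, rejected
```

Calling `classify_ip_ranges(["8.8.8.0/24", "10.0.0.0/8"])` places the first range in the usable list and the second in the rejected list.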
Retrieving the web interface URL via the gcloud command-line tool
You can access the Airflow web interface from any web browser. To get the URL for the
web interface, enter the following command:
gcloud composer environments describe ENVIRONMENT_NAME \
    --location LOCATION
where:
- ENVIRONMENT_NAME is the name of the environment.
- LOCATION is the Compute Engine region where the environment is located.
The gcloud command shows the properties of a Cloud Composer
environment, including the URL for the web interface. The URL is
listed as airflowUri:
airflowUri: https://uexamplebcd3fff-tp.appspot.com/
dagGcsPrefix: gs://us-central1-example-environment-00a47695-bucket/dags
gkeCluster: projects/example-project/zones/us-central1-a/clusters/us-central1-example-environment-00a47695-gke
nodeConfig:
  diskSizeGb: 100
  location: projects/example-project/zones/us-central1-a
  machineType: projects/example-project/zones/us-central1-a/machineTypes/n1-standard-1
  network: projects/example-project/global/networks/default
  oauthScopes:
  - https://www.googleapis.com/auth/cloud-platform
  serviceAccount: N13597NNN465firstname.lastname@example.org
nodeCount: 3
softwareConfig:
  imageVersion: composer-0.5.1-airflow-1.9.0
createTime: '2018-05-19T02:13:36.749Z'
name: projects/example-project/locations/us-central1/environments/example-environment
state: RUNNING
updateTime: '2018-05-19T02:30:21.387Z'
uuid: 66bd6a28-5b48-4da3-a0aa-898199b569da
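If you capture the output of the describe command in a script, the web interface URL can be pulled out with a simple line scan. The following is a minimal sketch, assuming the output contains a line of the form `airflowUri: <url>` as in the sample above. It is not a YAML parser, and `extract_airflow_uri` is a hypothetical helper name.

```python
def extract_airflow_uri(describe_output):
    """Return the airflowUri value from describe output, or None if absent."""
    for line in describe_output.splitlines():
        # Split at the first colon only, so the colon in "https://" survives.
        key, sep, value = line.strip().partition(":")
        if sep and key == "airflowUri":
            return value.strip()
    return None
```

The function returns `None` when no `airflowUri` line is present, which lets the caller distinguish a missing URL from an empty one.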
Configuring asynchronous DAG loading
With asynchronous DAG loading (
webserver-async_dagbag_loader), the web server
creates a new process. The process loads DAGs in the background,
sends newly loaded DAGs at the interval set by
dagbag_sync_interval, and then sleeps.
The process wakes up periodically to reload DAGs, at the interval set by
collect_dags_interval. Asynchronous DAG loading is available in
composer-1.7.1-airflow-1.10.2 or later.
To configure asynchronous DAG loading, override the following Airflow configurations:
|Section and configuration|Notes|
|webserver-async_dagbag_loader = True|The default is False.|
|webserver-collect_dags_interval = 30|The default is 30. Use a smaller value for faster refreshes.|
|webserver-dagbag_sync_interval = 10|The default is 10.|
|webserver-worker_refresh_interval = 3600|The default is 60. With asynchronous DAG loading, you can use a larger refresh interval.|
Note: The DAG serialization feature must be disabled when you use asynchronous DAG loading.
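The overrides in the table can be collected into a single gcloud invocation. The sketch below only builds the argument list and does not run anything. It assumes that the `gcloud composer environments update` command accepts overrides through an `--update-airflow-configs` flag as comma-separated `section-key=value` pairs, which this page does not show, so check the flag against the gcloud reference for your version. The function name and the constant are hypothetical.

```python
# Values taken from the table above, keyed as section-key.
ASYNC_DAG_LOADING_OVERRIDES = {
    "webserver-async_dagbag_loader": "True",
    "webserver-collect_dags_interval": "30",
    "webserver-dagbag_sync_interval": "10",
    "webserver-worker_refresh_interval": "3600",
}


def build_update_command(environment_name, location, overrides):
    """Return the argv list for a gcloud update that applies the overrides.

    Assumption: overrides are passed via --update-airflow-configs as
    comma-separated KEY=VALUE pairs.
    """
    pairs = ",".join(f"{key}={value}" for key, value in overrides.items())
    return [
        "gcloud", "composer", "environments", "update", environment_name,
        "--location", location,
        f"--update-airflow-configs={pairs}",
    ]
```

The returned list can be passed to `subprocess.run(command, check=True)` once the flag has been confirmed. Building a list rather than a shell string avoids quoting problems in environment names.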