Apache Airflow includes a web user interface (UI) that you can use to manage workflows (DAGs), manage the Airflow environment, and perform administrative actions. For example, you can use the web interface to review the progress of a DAG, set up a new data connection, or review logs from previous DAG runs.
Airflow web server
Each Cloud Composer environment has a web server that runs the Airflow web interface. The web server is a part of Cloud Composer environment architecture.
The web server parses the DAG definition files in the dags/ folder and must be able to access a DAG's data and resources to load the DAG and serve HTTP requests.
The web server refreshes the DAGs every 60 seconds, which is the default worker_refresh_interval in Cloud Composer. A web server error can occur if the web server cannot parse all the DAGs within the refresh interval.
Loading DAGs can take longer than 60 seconds if there are many DAG files or if loading them involves a non-trivial workload. To ensure that the web server remains accessible regardless of DAG load time, you can configure asynchronous DAG loading to parse and load DAGs in the background at a pre-configured interval (available in composer-1.7.1-airflow-1.10.2 and later versions). This configuration can also reduce DAG refresh time.
Other than exceeding the worker refresh interval, the web server can gracefully handle DAG loading failures in most cases. DAGs that cause the web server to crash or exit might cause errors to be returned in the browser. For more information, see Troubleshooting DAGs.
If you continue to experience web server issues due to DAG parsing, we recommend that you use asynchronous DAG loading.
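To get a rough sense of whether DAG loading time is approaching the refresh interval, you can time how long each DAG file takes to import on a local machine. The following is a minimal sketch, not how the web server itself measures loading: it assumes that importing a file's top-level code is a reasonable proxy for parse time and that the DAG files' dependencies are installed locally. The function names and the `REFRESH_INTERVAL_SECONDS` constant are hypothetical.

```python
import importlib.util
import time
from pathlib import Path

# Assumption: 60 seconds, the default worker_refresh_interval named in the text.
REFRESH_INTERVAL_SECONDS = 60


def time_dag_file_imports(dags_dir):
    """Import every .py file in dags_dir and return {file name: seconds taken}."""
    timings = {}
    for path in sorted(Path(dags_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        start = time.perf_counter()
        try:
            spec.loader.exec_module(module)  # runs the file's top-level code
        except Exception as exc:  # a broken file should not stop the survey
            print(f"{path.name}: failed to import ({exc!r})")
        timings[path.name] = time.perf_counter() - start
    return timings


def exceeds_refresh_interval(timings, interval=REFRESH_INTERVAL_SECONDS):
    """Return True if the summed import time is longer than the interval."""
    return sum(timings.values()) > interval
```

Sorting the resulting dictionary by value shows which files dominate the load time. Local timings can differ considerably from those on the web server, so treat the result as an indication only.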
Before you begin
- You must have a role that can view Cloud Composer environments. For more information, see Access control.
- During the environment creation, Cloud Composer configures the URL for the web server that runs the Airflow web interface. The URL is non-customizable.
- The Airflow UI Access Control (Airflow Role-Based Access Control) feature for the Airflow web interface is supported for Cloud Composer environments running Composer version 1.13.4 or later, Airflow version 1.10.10 or later, and Python 3.
Accessing the Airflow web interface
The Airflow web server service is deployed to the appspot.com domain and provides access to the Airflow web interface. Cloud Composer 1 provides access to the interface based on user identities and IAM policy bindings defined for users. Cloud Composer 1 uses Identity-Aware Proxy for this purpose.
After you create a new Cloud Composer environment, it can take up to 25 minutes for the web interface to finish hosting and become accessible.
Accessing the web interface from the Google Cloud console
To access the Airflow web interface from the Google Cloud console:
In the Google Cloud console, go to the Environments page.
In the Airflow webserver column, follow the Airflow link for your environment.
Log in with the Google account that has the appropriate permissions.
Limiting access to the Airflow web server
Composer environments let you limit access to the Airflow web server:
- You can block all access, or allow access from specific IPv4 or IPv6 external IP ranges.
- It's not possible to configure the allowed IP ranges using private IP addresses.
Retrieving the web interface URL via the gcloud command-line tool
You can access the Airflow web interface from any web browser. To get the URL for the web interface, enter the following gcloud command:
gcloud composer environments describe ENVIRONMENT_NAME \
--location LOCATION
Replace the following:
- ENVIRONMENT_NAME: the name of your environment.
- LOCATION: the region where the environment is located.
The gcloud command shows the properties of a Cloud Composer environment, including the URL for the web interface. The URL is listed as airflowUri.
config:
  airflowUri: https://example-tp.appspot.com
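If you capture the command output in a script, the airflowUri value can be pulled out of the text. The following sketch assumes the output has the unquoted `key: value` shape shown above; the `extract_airflow_uri` helper is hypothetical and is not part of any Google Cloud library.

```python
import re


def extract_airflow_uri(describe_output):
    """Return the airflowUri value from the describe output text, or None."""
    match = re.search(r"^\s*airflowUri:\s*(\S+)\s*$", describe_output, re.MULTILINE)
    return match.group(1) if match else None


# Sample text in the shape shown above, not real command output.
sample = "config:\n  airflowUri: https://example-tp.appspot.com\n"
print(extract_airflow_uri(sample))  # prints https://example-tp.appspot.com
```

Returning None when the key is absent lets the caller decide how to handle an environment whose web server is not yet available.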
Configuring asynchronous DAG loading
When asynchronous DAG loading is enabled, the Airflow web server creates a new process. This process loads DAGs in the background, sends newly loaded DAGs at intervals defined by the dagbag_sync_interval option, and then sleeps. The process wakes up periodically to reload DAGs; the interval is defined by the collect_dags_interval option.
To enable asynchronous DAG loading:
- Disable DAG serialization. Asynchronous DAG loading cannot be used with DAG serialization. Using async_dagbag_loader and store_serialized_dags Airflow configuration options produces HTTP 503 errors and breaks your environment.
- Override the following Airflow configuration options:

| Section | Key | Value | Notes |
| --- | --- | --- | --- |
| webserver | async_dagbag_loader | True | The default is False. |
| webserver | collect_dags_interval | 30 | The default is 30. Use a smaller value for faster refreshes. |
| webserver | dagbag_sync_interval | 10 | The default is 10. |
| webserver | worker_refresh_interval | 3600 | The default is 60. With asynchronous DAG loading, you can use a longer refresh interval. |
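The overrides in the table can be kept in a script and checked for the serialization conflict before you apply them. The sketch below is illustrative only. It assumes that store_serialized_dags lives in the core section and that overrides are passed as comma-separated `section-key=value` pairs, which is believed to be the shape that the gcloud `--update-airflow-configs` flag accepts; verify both against the gcloud and Airflow references. All function names are hypothetical.

```python
# Values from the table above, keyed by (section, key).
ASYNC_LOADING_OVERRIDES = {
    ("webserver", "async_dagbag_loader"): "True",
    ("webserver", "collect_dags_interval"): "30",
    ("webserver", "dagbag_sync_interval"): "10",
    ("webserver", "worker_refresh_interval"): "3600",
}


def check_async_loading(overrides):
    """Raise ValueError if async loading and DAG serialization are both enabled."""
    async_on = overrides.get(("webserver", "async_dagbag_loader"), "False") == "True"
    # Assumption: store_serialized_dags is looked up in the core section.
    serialized_on = overrides.get(("core", "store_serialized_dags"), "False") == "True"
    if async_on and serialized_on:
        raise ValueError(
            "async_dagbag_loader cannot be combined with store_serialized_dags"
        )


def to_override_argument(overrides):
    """Join overrides as section-key=value pairs separated by commas."""
    return ",".join(
        f"{section}-{key}={value}" for (section, key), value in overrides.items()
    )


check_async_loading(ASYNC_LOADING_OVERRIDES)
print(to_override_argument(ASYNC_LOADING_OVERRIDES))
```

Running the check first means a conflicting combination is rejected locally instead of surfacing later as HTTP 503 errors from the web server.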
Restarting the web server
When debugging or troubleshooting Cloud Composer environments, some issues may be resolved by restarting the Airflow web server. You can restart the web server using the restartWebServer API or the restart-web-server gcloud command:
gcloud beta composer environments restart-web-server ENVIRONMENT_NAME \
--location=LOCATION
Configuring web server network access
The Airflow web server access parameters don't depend on the type of your environment. Instead, you configure web server access separately. For example, a Private IP environment can still have the Airflow UI accessible from the internet.
It is not possible to configure the allowed IP ranges using private IP addresses.
Console
In the Google Cloud console, go to the Environments page.
In the list of environments, click the name of your environment. The Environment details page opens.
Go to the Environment configuration tab.
In the Network configuration section, find the Web server access control item and click Edit.
In the Web server network access control dialog:
To provide access to the Airflow web server from all IP addresses, select Allow access from all IP addresses.
To restrict access only to specific IP ranges, select Allow access only from specific IP addresses. In the IP range field, specify an IP range in the CIDR notation. In the Description field, specify an optional description for this range. If you want to specify more than one range, click Add IP range.
To forbid access for all IP addresses, select Allow access only from specific IP addresses and click Delete item next to the empty range entry.
gcloud
When you update an environment, the following arguments control web server access parameters:
- --web-server-allow-all provides access to Airflow from all IP addresses. This is the default option.
- --web-server-allow-ip restricts access only to specific source IP ranges. To specify several IP ranges, use this argument multiple times.
- --web-server-deny-all forbids access for all IP addresses.
gcloud composer environments update ENVIRONMENT_NAME \
--location LOCATION \
--web-server-allow-ip ip_range=WS_IP_RANGE,description=WS_RANGE_DESCRIPTION
Replace the following:
- ENVIRONMENT_NAME: the name of your environment.
- LOCATION: the region where the environment is located.
- WS_IP_RANGE: the IP range, in the CIDR notation, that can access the Airflow UI.
- WS_RANGE_DESCRIPTION: the description of the IP range.
Example:
gcloud composer environments update example-environment \
--location us-central1 \
--web-server-allow-ip ip_range=192.0.2.0/24,description="office net 1" \
--web-server-allow-ip ip_range=192.0.4.0/24,description="office net 3"
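Before you pass ranges to the command, it can help to check that each one is valid CIDR notation and is not a private range. The following sketch uses Python's standard ipaddress module. It assumes that "private" means the RFC 1918 IPv4 blocks only, which may be narrower than what Cloud Composer actually rejects, and the helper names are hypothetical.

```python
import ipaddress

# Assumption: "private" is read here as the RFC 1918 IPv4 blocks only.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def validate_allowed_range(cidr):
    """Return the parsed network, or raise ValueError for bad or private ranges."""
    network = ipaddress.ip_network(cidr)  # strict mode: host bits must be zero
    if network.version == 4 and any(
        network.overlaps(block) for block in RFC1918_BLOCKS
    ):
        raise ValueError(f"{cidr} overlaps a private range")
    return network


def allow_ip_arguments(ranges):
    """Build one --web-server-allow-ip argument pair per (cidr, description)."""
    args = []
    for cidr, description in ranges:
        network = validate_allowed_range(cidr)
        args += [
            "--web-server-allow-ip",
            f"ip_range={network},description={description}",
        ]
    return args


print(allow_ip_arguments([("192.0.2.0/24", "office net 1")]))
```

The result is an argument list rather than a single string, so it can be appended to a subprocess call without shell quoting around descriptions that contain spaces.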
API
Construct an environments.patch API request.
In this request:
- In the updateMask parameter, specify the config.webServerNetworkAccessControl mask.
- In the request body, specify how access to the web server is controlled:
  - To provide access to Airflow from all IP addresses, specify an empty config element (the webServerNetworkAccessControl element must not be present).
  - To restrict access only to specific IP ranges, specify one or more ranges in allowedIpRanges.
  - To forbid access for all IP addresses, specify an empty webServerNetworkAccessControl element. The webServerNetworkAccessControl element must be present, but must not contain an allowedIpRanges element.
{
  "config": {
    "webServerNetworkAccessControl": {
      "allowedIpRanges": [
        {
          "value": "WS_IP_RANGE",
          "description": "WS_RANGE_DESCRIPTION"
        }
      ]
    }
  }
}
Replace the following:
- WS_IP_RANGE: the IP range, in the CIDR notation, that can access the Airflow UI.
- WS_RANGE_DESCRIPTION: the description of the IP range.
Example:
// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.webServerNetworkAccessControl
{
  "config": {
    "webServerNetworkAccessControl": {
      "allowedIpRanges": [
        {
          "value": "192.0.2.0/24",
          "description": "office net 1"
        },
        {
          "value": "192.0.4.0/24",
          "description": "office net 3"
        }
      ]
    }
  }
}
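The three request body variants described for the API (allow all, allow specific ranges, deny all) can be generated from one helper. This is a sketch under stated assumptions: the `build_access_control_body` function is hypothetical, and its convention of using None for "allow all" and an empty list for "deny all" is a design choice made here, not part of the API.

```python
import json


def build_access_control_body(allowed_ranges=None):
    """Build an environments.patch request body for web server access control.

    None            -> allow all: empty config element
    empty list      -> deny all: empty webServerNetworkAccessControl element
    non-empty list  -> allow only the given (value, description) ranges
    """
    if allowed_ranges is None:
        return {"config": {}}
    if not allowed_ranges:
        return {"config": {"webServerNetworkAccessControl": {}}}
    return {
        "config": {
            "webServerNetworkAccessControl": {
                "allowedIpRanges": [
                    {"value": value, "description": description}
                    for value, description in allowed_ranges
                ]
            }
        }
    }


body = build_access_control_body([("192.0.2.0/24", "office net 1")])
print(json.dumps(body, indent=2))
```

Keeping "allow all" and "deny all" as distinct inputs matters because the two bodies differ only in whether the webServerNetworkAccessControl element is present.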
Terraform
In the web_server_network_access_control block, use allowed_ip_range blocks to specify the IP ranges that can access the web server.
resource "google_composer_environment" "example" {
  provider = google-beta
  name     = "ENVIRONMENT_NAME"
  region   = "LOCATION"
  config {
    web_server_network_access_control {
      allowed_ip_range {
        value       = "WS_IP_RANGE"
        description = "WS_RANGE_DESCRIPTION"
      }
    }
  }
}
Replace:
- WS_IP_RANGE with the IP range, in the CIDR notation, that can access the Airflow UI.
- WS_RANGE_DESCRIPTION with the description of the IP range.
Example:
resource "google_composer_environment" "example" {
provider = google-beta
name = "example-environment"
region = "us-central1"
config {
web_server_network_access_control {
allowed_ip_range {
value = "192.0.2.0/24"
description = "office net 1"
}
allowed_ip_range {
value = "192.0.4.0/24"
description = "office net 3"
}
}
}