This page describes how to troubleshoot issues with tasks run by KubernetesExecutor and provides solutions for common issues.
General approach to troubleshooting KubernetesExecutor
To troubleshoot issues with a task executed with KubernetesExecutor, do the following actions in the listed order:
Check the task logs in the DAG UI or Airflow UI.
Check scheduler logs in Google Cloud console:
In Google Cloud console, go to the Environments page.
In the list of environments, click the name of your environment. The Environment details page opens.
Go to the Logs tab and check the Airflow logs > Scheduler section.
For a given time range, inspect the KubernetesExecutor worker pod that was running the task. If the pod no longer exists, skip this step. The pod's name has the `airflow-k8s-worker` prefix and contains the DAG or task name. Look for any reported issues, such as a failed task or the task being unschedulable (see the sketch after this list).
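If you have access to the environment's GKE cluster, you can inspect the pod directly. A minimal sketch, assuming `kubectl` access; `CLUSTER_NAME`, `REGION`, `NAMESPACE`, and `POD_NAME` are placeholders, and the namespace that runs task pods depends on your Composer version:

```bash
# Fetch credentials for the environment's GKE cluster.
gcloud container clusters get-credentials CLUSTER_NAME --region REGION

# List KubernetesExecutor worker pods in the namespace.
kubectl get pods -n NAMESPACE | grep airflow-k8s-worker

# Check events for a pod that is Pending or Failed, for example
# unschedulable pods or image pull errors.
kubectl describe pod POD_NAME -n NAMESPACE
```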
Common troubleshooting scenarios for KubernetesExecutor
This section lists common troubleshooting scenarios that you might encounter with KubernetesExecutor.
The task gets to the `Running` state, then fails during execution
Symptoms:
- There are logs for the task in Airflow UI and on the Logs tab in the Workers section.
Solution: The task logs indicate the problem.
The task instance gets to the `queued` state, then is marked as `UP_FOR_RETRY` or `FAILED` after some time
Symptoms:
- There are no logs for the task in Airflow UI and on the Logs tab in the Workers section.
- There are logs on the Logs tab in the Scheduler section with a message that the task is marked as `UP_FOR_RETRY` or `FAILED`.
- The `airflow-k8s-worker-*.*` pod with the DAG or task name in its name is in the `Failed` or `Pending` state, or is absent.
Solution:
- Inspect scheduler logs for any details of the issue.
Possible causes:
If the scheduler logs contain the `Adopted tasks were still pending after...` message followed by the printed task instance, check that CeleryKubernetesExecutor is enabled in your environment (a sketch for checking the configured executor follows).
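One way to verify which executor is configured is to print the `[core] executor` Airflow configuration option. A minimal sketch using the gcloud CLI, assuming an Airflow 2 environment; `ENVIRONMENT_NAME` and `LOCATION` are placeholders:

```bash
# Prints the configured executor; expect CeleryKubernetesExecutor when
# hybrid Celery/Kubernetes execution is enabled.
gcloud composer environments run ENVIRONMENT_NAME \
    --location LOCATION \
    config get-value -- core executor
```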
The task instance gets to the `Queued` state and is immediately marked as `UP_FOR_RETRY` or `FAILED`
Symptoms:
- There are no logs for the task in Airflow UI and on the Logs tab in the Workers section.
- The scheduler logs on the Logs tab in the Scheduler section contain the `Pod creation failed with reason ... Failing task` message, and the message that the task is marked as `UP_FOR_RETRY` or `FAILED`.
Solution:
- Check the scheduler logs for the exact response and failure reason (see the sketch below).
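A minimal sketch for finding that message with the gcloud CLI; the filter values are assumptions, and `ENVIRONMENT_NAME` is a placeholder:

```bash
# Search recent scheduler output for the pod-creation failure; adjust
# the filter and limit as needed.
gcloud logging read \
  'resource.type="cloud_composer_environment"
   resource.labels.environment_name="ENVIRONMENT_NAME"
   textPayload:"Pod creation failed"' \
  --limit=20
```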
Possible reason:
If the error message is `quantities must match the regular expression ...`, the issue is most likely caused by custom values set for Kubernetes resource requests and limits of task worker pods.
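For illustration, resource quantities in a pod override must use the Kubernetes quantity format, such as `500m` for CPU or `512Mi` for memory. A minimal sketch of a task with valid requests and limits, assuming the standard Airflow `pod_override` mechanism for KubernetesExecutor; the task name is hypothetical:

```python
from airflow.decorators import task
from kubernetes.client import models as k8s

@task(
    executor_config={
        "pod_override": k8s.V1Pod(
            spec=k8s.V1PodSpec(
                containers=[
                    k8s.V1Container(
                        name="base",  # the main task container
                        resources=k8s.V1ResourceRequirements(
                            # Quantities must match the Kubernetes format,
                            # for example "500m" CPU or "512Mi" memory.
                            # A value like "0.5 cores" fails the regular
                            # expression check.
                            requests={"cpu": "500m", "memory": "512Mi"},
                            limits={"cpu": "1", "memory": "1Gi"},
                        ),
                    )
                ]
            )
        )
    }
)
def example_task():
    # Hypothetical task body for illustration.
    pass
```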