Troubleshooting KubernetesExecutor tasks

This page describes how to troubleshoot issues with tasks run by KubernetesExecutor and provides solutions for common issues.

General approach to troubleshooting KubernetesExecutor

To troubleshoot issues with a task executed by KubernetesExecutor, perform the following steps in the order listed:

  1. Check the task logs in the DAG UI or Airflow UI.

  2. Check the scheduler logs in the Google Cloud console:

    1. In the Google Cloud console, go to the Environments page.

    2. In the list of environments, click the name of your environment. The Environment details page opens.

    3. Go to the Logs tab and check the Airflow logs > Scheduler section.

    4. For a given time range, inspect the KubernetesExecutor worker pod that was running the task. If the pod no longer exists, skip this step. The pod's name has the airflow-k8s-worker prefix and contains the DAG or task name. Look for any reported issues, such as a failed task or the task being unschedulable. You can also search these logs programmatically, as shown in the sketch after this list.
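
If you prefer to search the scheduler logs programmatically, the following is a minimal sketch that uses the Cloud Logging Python client to find entries that mention a KubernetesExecutor worker pod. The project ID, environment name, and time window are placeholders, and the resource type and log name in the filter are assumptions about how Cloud Composer labels scheduler logs; adjust them for your environment.

```python
# Minimal sketch: search recent Cloud Composer scheduler logs for entries
# that mention a KubernetesExecutor worker pod. Project, environment name,
# and time window are placeholders; the filter values are assumptions.
from datetime import datetime, timedelta, timezone

from google.cloud import logging as cloud_logging

PROJECT_ID = "my-project"            # placeholder
ENVIRONMENT_NAME = "my-environment"  # placeholder
POD_NAME_FRAGMENT = "airflow-k8s-worker"

client = cloud_logging.Client(project=PROJECT_ID)
since = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()

# Assumed filter: Composer environment logs, scheduler log stream, entries
# mentioning a KubernetesExecutor worker pod within the last hour.
log_filter = (
    'resource.type="cloud_composer_environment" '
    f'resource.labels.environment_name="{ENVIRONMENT_NAME}" '
    'logName:"airflow-scheduler" '
    f'textPayload:"{POD_NAME_FRAGMENT}" '
    f'timestamp>="{since}"'
)

for entry in client.list_entries(filter_=log_filter, order_by=cloud_logging.DESCENDING):
    print(entry.timestamp, entry.payload)
```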

Common troubleshooting scenarios for KubernetesExecutor

This section lists common troubleshooting scenarios that you might encounter with KubernetesExecutor.

The task gets to the Running state, then fails during execution.

Symptoms:

  • There are logs for the task in the Airflow UI and on the Logs tab in the Workers section.

Solution: The task logs indicate the problem.
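
If you want to retrieve those task logs outside the UI, one option is the Airflow stable REST API. The following is a minimal sketch; the web server URL, DAG, DAG run, and task identifiers are placeholders, and in Cloud Composer requests to the Airflow REST API must also be authenticated, which is not shown here.

```python
# Minimal sketch: fetch task logs through the Airflow stable REST API.
# All values below are placeholders; authentication is environment-specific
# and only represented by a placeholder header.
import requests

AIRFLOW_WEB_SERVER = "https://example-airflow-web-server"   # placeholder
DAG_ID = "my_dag"                                           # placeholder
DAG_RUN_ID = "scheduled__2024-01-01T00:00:00+00:00"         # placeholder
TASK_ID = "my_task"                                         # placeholder
TRY_NUMBER = 1

url = (
    f"{AIRFLOW_WEB_SERVER}/api/v1/dags/{DAG_ID}/dagRuns/{DAG_RUN_ID}"
    f"/taskInstances/{TASK_ID}/logs/{TRY_NUMBER}"
)

response = requests.get(
    url,
    headers={
        "Accept": "text/plain",                    # return raw log text
        "Authorization": "Bearer <access token>",  # placeholder auth
    },
)
response.raise_for_status()
print(response.text)
```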

The task instance gets to the Queued state, then is marked as UP_FOR_RETRY or FAILED after some time.

Symptoms:

  • There are no logs for the task in the Airflow UI or on the Logs tab in the Workers section.
  • There are logs on the Logs tab in the Scheduler section with a message that the task is marked as UP_FOR_RETRY or FAILED.
  • The airflow-k8s-worker-*.* pod whose name contains the DAG or task name is in the Failed or Pending state, or is absent.

Solution:

  1. Inspect the scheduler logs for details about the issue.

Possible causes:

  • If the scheduler logs contain the Adopted tasks were still pending after... message followed by the printed task instance, check that CeleryKubernetesExecutor is enabled in your environment (see the example after this list).
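
For context, when CeleryKubernetesExecutor is enabled, Airflow sends a task to KubernetesExecutor only if the task's queue matches the configured Kubernetes queue (kubernetes by default, set by [celery_kubernetes_executor] kubernetes_queue). The following is a minimal sketch of a DAG that routes one task to KubernetesExecutor this way; the DAG and task names are placeholders.

```python
# Minimal sketch: route a single task to KubernetesExecutor when the
# environment runs CeleryKubernetesExecutor. Tasks without this queue
# keep running on Celery workers. DAG and task names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="kubernetes_executor_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    # Runs on a Celery worker (default queue).
    on_celery = BashOperator(
        task_id="on_celery_worker",
        bash_command="echo 'running on a Celery worker'",
    )

    # Runs in a dedicated KubernetesExecutor worker pod because the task's
    # queue matches the configured Kubernetes queue.
    on_kubernetes = BashOperator(
        task_id="on_kubernetes_executor",
        bash_command="echo 'running in a KubernetesExecutor worker pod'",
        queue="kubernetes",
    )

    on_celery >> on_kubernetes
```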

The task instance gets to the Queued state and is immediately marked as UP_FOR_RETRY or FAILED

Symptoms:

  • There are no logs for the task in the Airflow UI or on the Logs tab in the Workers section.
  • The scheduler logs on the Logs tab in the Scheduler section contain the Pod creation failed with reason ... Failing task message, and a message that the task is marked as UP_FOR_RETRY or FAILED.

Solution:

  • Check the scheduler logs for the exact response and the failure reason.

Possible reason:

If the error message is quantities must match the regular expression ..., then the issue is most likely caused by custom values set for the Kubernetes resource requests and limits of task worker pods.
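
Kubernetes accepts resource quantities only in its canonical format, for example 500m for CPU or 512Mi for memory. Below is a minimal sketch of a task that overrides its worker pod's requests and limits with valid quantities through executor_config; the DAG and task names and the specific values are placeholders, and the pod_override structure assumes the base container name used by Airflow worker pods.

```python
# Minimal sketch: override the KubernetesExecutor worker pod resources with
# values in the canonical Kubernetes quantity format (for example "500m" CPU,
# "512Mi" memory). Strings in other formats cause the
# "quantities must match the regular expression ..." error.
from datetime import datetime

from kubernetes.client import models as k8s

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="kubernetes_executor_resources_example",  # placeholder DAG
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    BashOperator(
        task_id="resource_intensive_task",  # placeholder task
        bash_command="echo 'running with custom resources'",
        queue="kubernetes",  # route the task to KubernetesExecutor
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",  # assumed Airflow worker container name
                            resources=k8s.V1ResourceRequirements(
                                requests={"cpu": "500m", "memory": "512Mi"},
                                limits={"cpu": "1", "memory": "1Gi"},
                            ),
                        )
                    ]
                )
            )
        },
    )
```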

What's next