For a given time range, inspect the KubernetesExecutor worker pod that was running the task. If the pod no longer exists, skip this step. The pod has the airflow-k8s-worker prefix and a DAG or a task name in its name.
Look for any reported issues, such as a failed task or the task being unschedulable (a quick way to list these worker pods is sketched after this list).
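For illustration, a minimal sketch that lists such worker pods with the official kubernetes Python client; it assumes your kubeconfig already points at the environment's cluster, and the namespace below is a placeholder that you need to replace with the namespace where your worker pods run:

from kubernetes import client, config

# Assumption: replace with the namespace where your KubernetesExecutor worker
# pods run in your environment's cluster.
NAMESPACE = "composer-user-workloads"

# Load credentials from the local kubeconfig (use load_incluster_config() when
# running inside the cluster instead).
config.load_kube_config()
v1 = client.CoreV1Api()

# Worker pods created by KubernetesExecutor carry the airflow-k8s-worker prefix
# and include the DAG or task name, so filter on the prefix and print the phase.
for pod in v1.list_namespaced_pod(NAMESPACE).items:
    if pod.metadata.name.startswith("airflow-k8s-worker"):
        print(pod.metadata.name, pod.status.phase)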
Common troubleshooting scenarios for KubernetesExecutor
This section lists common troubleshooting scenarios that you might encounter with KubernetesExecutor.
The task gets to the Running state, then fails during execution.
Symptoms:
There are logs for the task in the Airflow UI and on the Logs tab in the Workers section.
Solution: The task logs indicate the problem.
The task instance gets to the queued state, then it is marked as UP_FOR_RETRY or FAILED after some time.
Symptoms:
There are no logs for the task in the Airflow UI and on the Logs tab in the Workers section.
There are logs on the Logs tab in the Scheduler section with a message that the task is marked as UP_FOR_RETRY or FAILED.
Solution:
Inspect the scheduler logs for details of the issue.
Possible causes:
If the scheduler logs contain the Adopted tasks were still pending after... message followed by the printed task instance, check that CeleryKubernetesExecutor is enabled in your environment.
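For reference, under CeleryKubernetesExecutor a task is handed to KubernetesExecutor only when its queue matches the configured kubernetes queue ([celery_kubernetes_executor] kubernetes_queue, which defaults to kubernetes). A minimal sketch, with placeholder DAG and task ids:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="kubernetes_executor_routing_example",  # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    BashOperator(
        task_id="run_on_kubernetes_executor",  # placeholder task id
        bash_command="echo 'runs in an airflow-k8s-worker pod'",
        # Matches [celery_kubernetes_executor] kubernetes_queue, so the task is
        # picked up by KubernetesExecutor instead of Celery workers.
        queue="kubernetes",
    )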
The task instance gets to the Queued state and is immediately marked as UP_FOR_RETRY or FAILED.
Symptoms:
There are no logs for the task in the Airflow UI and on the Logs tab in the Workers section.
The scheduler logs on the Logs tab in the Scheduler section contain the Pod creation failed with reason ... Failing task message and a message that the task is marked as UP_FOR_RETRY or FAILED.
Solution:
Check the scheduler logs for the exact response and failure reason.
Possible cause:
If the error message is quantities must match the regular expression ..., the issue is most likely caused by custom values set for the Kubernetes resources (requests and limits) of task worker pods.
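For illustration, a hedged sketch of an executor_config pod override that uses valid Kubernetes quantity strings; the DAG and task ids are placeholders. Values such as 500m or 512Mi pass quantity validation, while strings such as 0.5 cores or 512 MB do not and produce the error above:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from kubernetes.client import models as k8s

# Valid quantity strings ("500m", "512Mi", "1Gi") pass Kubernetes validation;
# values like "0.5 cores" or "512 MB" fail it and the pod is never created.
custom_resources = {
    "pod_override": k8s.V1Pod(
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",  # the main Airflow container in the worker pod
                    resources=k8s.V1ResourceRequirements(
                        requests={"cpu": "500m", "memory": "512Mi"},
                        limits={"cpu": "1", "memory": "1Gi"},
                    ),
                )
            ]
        )
    )
}

with DAG(
    dag_id="resource_override_example",  # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    PythonOperator(
        task_id="task_with_custom_resources",  # placeholder task id
        python_callable=lambda: print("ok"),
        queue="kubernetes",  # route the task to KubernetesExecutor
        executor_config=custom_resources,
    )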
KubernetesExecutor tasks fail without logs when a large number of tasks is executed
When your environment executes a large number of tasks with KubernetesExecutor or KubernetesPodOperator at the same time, Cloud Composer 3 doesn't accept new tasks until some of the existing tasks are finished. Extra tasks are marked as failed, and Airflow retries them later if you define retries for the tasks (Airflow does this by default).
Symptom: Tasks executed with KubernetesExecutor or KubernetesPodOperator fail without task logs in the Airflow UI or DAG UI. In the scheduler's logs, you can see error messages similar to the following:

    pods "airflow-k8s-worker-*" is forbidden: exceeded quota: k8s-resources-quota, requested: pods=1, used: pods=*, limited: pods=*","reason":"Forbidden"

Possible solutions:
Adjust the DAG run schedule so that tasks are distributed more evenly over time.
Reduce the number of tasks by consolidating small tasks.
Workaround:
If you prefer tasks to stay in the scheduled state until your environment can execute them, you can define an Airflow pool with a limited number of slots in the Airflow UI and then associate all container-based tasks with this pool. We recommend setting the number of slots in the pool to 50 or less. Extra tasks stay in the scheduled state until the Airflow pool has a free slot to execute them. If you use this workaround without applying the possible solutions, you can still experience a large queue of tasks in the Airflow pool.
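For illustration, a minimal sketch of that workaround, assuming Airflow 2 on Cloud Composer 3; the pool name, DAG id, and task ids are placeholders, and the pool itself must already exist (created in the Airflow UI under Admin > Pools, or with the Airflow CLI):

# The pool is created once, in the Airflow UI (Admin > Pools) or with the CLI:
#   airflow pools set container_task_pool 50 "Cap for container-based tasks"
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="pooled_container_tasks_example",  # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    # Only as many of these tasks run concurrently as the pool has slots (50);
    # the rest stay in the scheduled state until a slot frees up.
    for i in range(100):
        BashOperator(
            task_id=f"container_task_{i}",
            bash_command="echo task",
            queue="kubernetes",            # run the task with KubernetesExecutor
            pool="container_task_pool",    # placeholder pool name
        )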
[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-08-29(UTC)"],[[["\u003cp\u003eThis page provides troubleshooting guidance for tasks run by KubernetesExecutor in Cloud Composer 3, outlining a step-by-step approach to identify and resolve issues.\u003c/p\u003e\n"],["\u003cp\u003eA common issue addressed is when tasks get stuck in the \u003ccode\u003equeued\u003c/code\u003e state and are then marked as \u003ccode\u003eUP_FOR_RETRY\u003c/code\u003e or \u003ccode\u003eFAILED\u003c/code\u003e, often with no logs, and the solution involves inspecting scheduler logs.\u003c/p\u003e\n"],["\u003cp\u003eAnother issue covered is when tasks fail immediately after entering the \u003ccode\u003eQueued\u003c/code\u003e state, in which case checking scheduler logs for error messages is key to discovering the solution.\u003c/p\u003e\n"],["\u003cp\u003eThe document covers issues that might occur when a large amount of tasks are executed concurrently, leading to tasks failing without logs, with solutions such as adjusting DAG schedules and using Airflow pools.\u003c/p\u003e\n"],["\u003cp\u003eThe page indicates that when tasks get to the running state, and then fail, the solution is found in the task logs, either in the Airflow UI, or in the "Workers" section of the logs tab.\u003c/p\u003e\n"]]],[],null,["\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\n**Cloud Composer 3** \\| Cloud Composer 2 \\| Cloud Composer 1\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\n\u003cbr /\u003e\n\nThis page describes how to troubleshoot issues with\n[tasks run by KubernetesExecutor](/composer/docs/composer-3/use-celery-kubernetes-executor) and provides solutions for common\nissues.\n\nGeneral approach to troubleshooting KubernetesExecutor\n\nTo troubleshoot issues with a task executed with KubernetesExecutor, do\nthe following actions in the listed order:\n\n1. Check logs of the task in the [DAG UI](/composer/docs/composer-3/view-dags#runs-history) or\n [Airflow UI](/composer/docs/composer-3/access-airflow-web-interface).\n\n2. Check scheduler logs in Google Cloud console:\n\n 1. In Google Cloud console, go to the **Environments** page.\n\n [Go to Environments](https://console.cloud.google.com/composer/environments)\n 2. In the list of environments, click the name of your environment.\n The **Environment details** page opens.\n\n 3. Go to the **Logs** tab and check the **Airflow logs** \\\u003e\n **Scheduler** section.\n\n 4. For a given time range, inspect the KubernetesExecutor worker pod that was\n running the task. If the pod no longer exists, skip this step. 
The pod\n has the `airflow-k8s-worker` prefix and a DAG or a task name in its name.\n Look for any reported issues such as a failed task or the task being\n unschedulable.\n\nCommon troubleshooting scenarios for KubernetesExecutor\n\nThis section lists common troublehooting scenarions that you might encounter with KubernetesExecutor.\n\nThe task gets to the `Running` state, then fails during the execution.\n\nSymptoms:\n\n- There are logs for the task in Airflow UI and on the **Logs** tab in the **Workers** section.\n\nSolution: The task logs indicate the problem.\n\nTask instance gets to the `queued` state, then it is marked as `UP_FOR_RETRY` or `FAILED` after some time.\n\nSymptoms:\n\n- There are no logs for task in Airflow UI and on the **Logs** tab in the **Workers** section.\n- There are logs on the **Logs** tab in the **Scheduler** section with a message that the task is marked as `UP_FOR_RETRY` or `FAILED`.\n\nSolution:\n\n- Inspect scheduler logs for any details of the issue.\n\nPossible causes:\n\n- If the scheduler logs contain the `Adopted tasks were still pending after...` message followed by the printed task instance, check that CeleryKubernetesExecutor is enabled in your environment.\n\nThe task instance gets to the `Queued` state and is immediately marked as `UP_FOR_RETRY` or `FAILED`\n\nSymptoms:\n\n- There are no logs for the task in Airflow UI and on the **Logs** tab in the **Workers** section.\n- The scheduler logs on the **Logs** tab in the **Scheduler** section has the `Pod creation failed with reason ... Failing task` message, and the message that the task is marked as `UP_FOR_RETRY` or `FAILED`.\n\nSolution:\n\n- Check scheduler logs for the exact response and failure reason.\n\nPossible reason:\n\nIf the error message is `quantities must match the regular expression ...`,\nthen the issue is most-likely caused by a custom values set for k8s\nresources (requests/limits) of task worker pods.\n\nKubernetesExecutor tasks fail without logs when a large number of tasks is executed\n\nWhen your environment executes a large number of tasks\n[with KubernetesExecutor](/composer/docs/composer-3/use-celery-kubernetes-executor) or [KubernetesPodOperator](/composer/docs/composer-3/use-kubernetes-pod-operator) at the same\ntime, Cloud Composer 3 doesn't accept new tasks until some of the\nexisting tasks are finished. Extra tasks are marked as failed, and Airflow\nretries them later, if you define retries for the tasks (Airflow does this by\ndefault).\n\n**Symptom:** Tasks executed with KubernetesExecutor or KubernetesPodOperator\nfail without task logs in Airflow UI or DAG UI. In the\n[scheduler's logs](/composer/docs/composer-3/view-logs#streaming), you can see error messages similar\nto the following: \n\n pods \\\"airflow-k8s-worker-*\\\" is forbidden: exceeded quota: k8s-resources-quota,\n requested: pods=1, used: pods=*, limited: pods=*\",\"reason\":\"Forbidden\"\n\n**Possible solutions:**\n\n- Adjust the DAG run schedule so that tasks are distributed more evenly over time.\n- Reduce the number of tasks by consolidating small tasks.\n\n**Workaround:**\n\nIf you prefer tasks to stay in the scheduled state until your environment can\nexecute them, you can define an [Airflow pool](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/pools.html) with the\nlimited number of slots in the Airflow UI and then associate all\ncontainer-based tasks with this pool. We recommend to set the number of slots\nin the pool to 50 or less. 
Extra tasks will stay in the scheduled state until\nthe Airflow pool has a free slot to execute them. If you use this workaround\nwithout applying possible solutions, you can still experience a large queue of\ntasks in the Airflow pool.\n\nWhat's next\n\n- [Use CeleryKubernetesExecutor](/composer/docs/composer-3/use-celery-kubernetes-executor)\n- [Use KubernetesPodOperator](/composer/docs/composer-3/use-kubernetes-pod-operator)\n- [Troubleshooting scheduling](/composer/docs/composer-3/troubleshooting-scheduling)\n- [Troubleshooting DAGs](/composer/docs/composer-3/troubleshooting-dags)"]]