[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-18。"],[[["\u003cp\u003eThis guide focuses on troubleshooting Dataflow issues specifically related to custom containers that fail to start or prevent workers from functioning correctly.\u003c/p\u003e\n"],["\u003cp\u003eBefore contacting support, it is crucial to test your container image locally, check job and worker logs for errors, and ensure that the Apache Beam SDK and language versions match between the pipeline launch environment and the custom container image.\u003c/p\u003e\n"],["\u003cp\u003eWorker logs can be accessed using the Logs Explorer, specifically checking \u003ccode\u003edataflow.googleapis.com/kubelet\u003c/code\u003e, \u003ccode\u003edataflow.googleapis.com/docker\u003c/code\u003e, \u003ccode\u003edataflow.googleapis.com/worker-startup\u003c/code\u003e, or \u003ccode\u003edataflow.googleapis.com/harness-startup\u003c/code\u003e for container startup errors.\u003c/p\u003e\n"],["\u003cp\u003eCommon issues include problems with image access, dependency conflicts, mismatched SDK versions, incompatible CPU architectures, and errors in custom command arguments or Dockerfile \u003ccode\u003eENTRYPOINT\u003c/code\u003e configurations.\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eError Syncing pod...\u003c/code\u003e log messages usually point to a known issue, which further guidance can be found in the provided common error page.\u003c/p\u003e\n"]]],[],null,["# Troubleshoot custom containers in Dataflow\n\nThis document provides instructions for troubleshooting issues that might occur when using\ncustom containers with Dataflow. It focuses on issues with\ncontainers or workers not starting. If your workers are able to start and work\nis progressing, follow the general guidance for [Troubleshooting your pipeline](/dataflow/docs/guides/troubleshooting-your-pipeline).\n\nBefore contacting support, ensure that you have ruled out problems related\nto your container image:\n\n- Follow the steps to [test your container image locally](/dataflow/docs/guides/run-custom-container#testing-locally).\n- Search for errors in the [Job logs](/dataflow/docs/guides/troubleshooting-your-pipeline#check_job_error_messages) or in [Worker logs](#worker-logs), and compare any errors found with the [common error](/dataflow/docs/guides/common-errors) guidance.\n- Make sure that the Apache Beam SDK version and language version that you're using to launch the pipeline match the SDK version on your custom container image.\n- If using Java, make sure that the Java major version you use to launch the pipeline matches the version installed in your container image.\n- If using Python, make sure that the Python major-minor version you use to launch the pipeline matches the version installed in your container image, and that the image does not have conflicting dependencies. You can run [`pip check`](https://pip.pypa.io/en/stable/cli/pip_check/) to confirm.\n\nFind worker logs related to custom containers\n---------------------------------------------\n\nFine the Dataflow worker logs for container-related error messages can\nby using [Logs Explorer](https://console.cloud.google.com/logs/query):\n\n1. Select log names. 
Find worker logs related to custom containers
---------------------------------------------

You can find Dataflow worker logs for container-related error messages
by using [Logs Explorer](https://console.cloud.google.com/logs/query):

1. Select log names. Custom container startup errors are most likely to be in
   one of the following:

   - `dataflow.googleapis.com/kubelet`
   - `dataflow.googleapis.com/docker`
   - `dataflow.googleapis.com/worker-startup`
   - `dataflow.googleapis.com/harness-startup`

2. Select the `Dataflow Step` resource and specify the `job_id`.

| **Note:** You can also find the worker logs directly from the **Job** page. Select **Logs > Worker Logs > Go to Logs Explorer**.

If you're seeing `Error Syncing pod...` log messages,
follow the common [error guidance](/dataflow/docs/guides/common-errors#error-syncing-pod).
You can query for these log messages in Dataflow worker logs by using
[Logs Explorer](https://console.cloud.google.com/logs/query) with the following query:

    resource.type="dataflow_step" AND jsonPayload.message:("IMAGE_URI") AND severity="ERROR"

Common issues
-------------

The following are some common issues when using custom containers.

### Job has errors or failed because container image cannot be pulled

Dataflow workers must be able to access custom container images.
If the worker is unable to pull the image due to invalid URLs,
misconfigured credentials, or missing network access, the worker fails to
start.

For batch jobs where no work has started and several workers are unable to start
sequentially, Dataflow fails the job. Otherwise,
Dataflow logs errors but does not take further action, to avoid
destroying long-running job state.

For information about how to fix this issue, see
[Image pull request failed with error](/dataflow/docs/guides/common-errors#error-pulling-container-image)
in the Troubleshoot Dataflow errors page.

### Workers are not starting or work is not progressing

Sometimes, if the SDK container fails to start due to an error,
Dataflow is unable to determine whether the error is permanent or
fatal. Dataflow then continuously attempts to restart the worker.

If there are no obvious errors but you see `[topologymanager] RemoveContainer`
`INFO`-level logs in `dataflow.googleapis.com/kubelet`, these logs indicate that the
custom container image is exiting early and did not start the long-running
worker SDK process.

If workers have started successfully but no work is happening, an error might
be preventing the SDK container from starting. In this case, the following
error appears in the diagnostic recommendations:

    Failed to start container

In addition, the worker logs don't contain lines such as the following:

    Executing: python -m apache_beam.runners.worker.sdk_worker_main

or

    Executing: java ... FnHarness

Find specific errors in [Worker logs](#worker-logs) and check the
[common error guidance](/dataflow/docs/guides/common-errors).
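If you prefer the command line over Logs Explorer, a sketch like the following reads the same worker logs with the gcloud CLI. `JOB_ID` and `PROJECT_ID` are placeholders, and the `logName` filter here targets the `dataflow.googleapis.com/worker-startup` log from the list earlier on this page; adjust it to query the other logs.

    # Placeholders: substitute your own project and job IDs.
    # Reads recent worker-startup log entries for a single Dataflow job.
    gcloud logging read \
        'resource.type="dataflow_step"
         resource.labels.job_id="JOB_ID"
         logName:"dataflow.googleapis.com%2Fworker-startup"' \
        --project=PROJECT_ID \
        --limit=50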
Common causes for these issues include the following:

- Problems with package installation, such as `pip` installation errors due to dependency issues. See [Error syncing pod ... failed to "StartContainer"](/dataflow/docs/guides/common-errors#error-syncing-pod).
- If the container used is not compatible with the worker VM's CPU architecture, you might see errors like `exec format error`. For more information, see [Error syncing pod ... failed to "StartContainer"](/dataflow/docs/guides/common-errors#error-syncing-pod).
- Errors with the custom command arguments or with the `ENTRYPOINT` set in the Dockerfile. For example, a custom `ENTRYPOINT` does not start the default boot script `/opt/apache/beam/boot` or does not pass arguments appropriately to this script. For more information, see [Modifying the container entrypoint](/dataflow/docs/guides/build-container-image#custom-entrypoint) and the wrapper sketch at the end of this page.
- Errors when the Apache Beam SDK version is mismatched between the launch environment and the runtime environment. In one failure mode, the default values that are set in the Apache Beam SDK pipeline options might become unrecognized. For example, you might see errors such as `sdk_worker_main.py: error: argument --flink_version: invalid choice: '1.16' (choose from '1.12', '1.13', '1.14', '1.15')` in the worker logs. To remediate, install the same version of the Apache Beam SDK in the container image as you use to launch the pipeline. For more information, see [Make the launch environment compatible with the runtime environment](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies#make-the-launch-environment-compatible-with-the-runtime-environment).

### The container cannot be configured to execute as a custom user

The user for container execution is selected by the Dataflow
service. For more information, see [Runtime environment](/dataflow/docs/concepts/security-and-permissions#runtime_environment).
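Returning to the entrypoint issue in the list of common causes above, the following is a minimal sketch of a wrapper entrypoint that preserves the default boot script. The script name and setup step are hypothetical; the key detail is that the wrapper ends by handing off to `/opt/apache/beam/boot` with all worker arguments forwarded.

    #!/bin/bash
    # entrypoint.sh (hypothetical name): run custom setup, then hand off to
    # the default Apache Beam boot script, forwarding all worker arguments.
    set -e
    echo "Running custom setup steps..."
    exec /opt/apache/beam/boot "$@"

A Dockerfile fragment along these lines installs the wrapper:

    # Dockerfile fragment: install the wrapper and set it as the entrypoint.
    COPY entrypoint.sh /opt/entrypoint.sh
    RUN chmod +x /opt/entrypoint.sh
    ENTRYPOINT ["/opt/entrypoint.sh"]

If the wrapper exits without executing the boot script, or drops its arguments, workers can start while the SDK harness never does, which matches the symptoms described earlier on this page.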