Cloud Dataflow job stuck in the 'WriteFiles' step while writing to Cloud Storage
Problem
A Dataflow job gets stuck while writing data to Cloud Storage, and the following messages appear in the worker logs:
Processing stuck in step Write File(s)...
Operation ongoing in step Write File(s)...
Environment
Solution
- Set withNumShards() on the file-based write I/O transform to the number of worker machines to increase write parallelism.
Cause
This issue occurs when too few shards are writing to the sink, or when the available workers cannot keep up with the incoming load. In such cases, increase the number of shards to match the maximum worker pool size so that write parallelism increases. More information can be found here.
Example:
FileIO.<String>write().withNumShards(5)
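The sizing rule above can be sketched in plain Java without any Beam dependency. This is a simplified model, not Dataflow's actual scheduling logic: it assumes each shard is written by at most one worker at a time, so the number of workers writing concurrently is bounded by the shard count. The class name ShardPlan and both helper methods are hypothetical and exist only for illustration. In a real pipeline, the resulting value would be passed to withNumShards() on the write transform, and the maximum worker count would come from the job's maxNumWorkers setting.

```java
// Hypothetical helper class for illustration; it is not part of Apache Beam or Dataflow.
final class ShardPlan {
    private ShardPlan() {}

    // Rule from this article: use one shard per worker at the maximum pool size.
    static int numShardsFor(int maxNumWorkers) {
        if (maxNumWorkers < 1) {
            throw new IllegalArgumentException("maxNumWorkers must be at least 1");
        }
        return maxNumWorkers;
    }

    // Simplified assumption: one shard is written by at most one worker at a time,
    // so concurrent writers cannot exceed the number of shards.
    static int activeWriters(int numShards, int numWorkers) {
        return Math.min(numShards, numWorkers);
    }

    public static void main(String[] args) {
        int maxNumWorkers = 5; // assumed maximum worker pool size for the job

        // With too few shards, some workers have nothing to write.
        System.out.println("2 shards: " + activeWriters(2, maxNumWorkers)
                + " of " + maxNumWorkers + " workers writing");

        // With shards matched to the pool size, every worker can write.
        int numShards = numShardsFor(maxNumWorkers);
        System.out.println(numShards + " shards: " + activeWriters(numShards, maxNumWorkers)
                + " of " + maxNumWorkers + " workers writing");
    }
}
```

Under this model, raising the shard count beyond the maximum worker pool size does not add further concurrent writers, which is why the article ties the two values together.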
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-12-12 UTC.