[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[[["\u003cp\u003eThe DataprocFileOutputCommitter is an enhanced version of FileOutputCommitter, designed to enable concurrent writes by Apache Spark jobs to an output location.\u003c/p\u003e\n"],["\u003cp\u003eThis feature is available for Dataproc Compute Engine clusters running image versions 2.1.10 and higher, or 2.0.62 and higher.\u003c/p\u003e\n"],["\u003cp\u003eTo utilize DataprocFileOutputCommitter, set \u003ccode\u003espark.hadoop.mapreduce.outputcommitter.factory.class\u003c/code\u003e to \u003ccode\u003eorg.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory\u003c/code\u003e and \u003ccode\u003espark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs\u003c/code\u003e to \u003ccode\u003efalse\u003c/code\u003e when submitting a Spark job.\u003c/p\u003e\n"],["\u003cp\u003eWhen using the Dataproc file output committer, it is required that \u003ccode\u003espark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs\u003c/code\u003e is set to false in order to prevent conflicts with the created success marker files.\u003c/p\u003e\n"]]],[],null,["The **DataprocFileOutputCommitter** feature is an enhanced\nversion of the open source `FileOutputCommitter`. It\nenables concurrent writes by Apache Spark jobs to an output location.\n\nLimitations\n\nThe `DataprocFileOutputCommitter` feature supports Spark jobs run on\nDataproc Compute Engine clusters created with\nthe following image versions:\n\n- 2.1 image versions 2.1.10 and higher\n\n- 2.0 image versions 2.0.62 and higher\n\nUse `DataprocFileOutputCommitter`\n\nTo use this feature:\n\n1. [Create a Dataproc on Compute Engine cluster](/dataproc/docs/guides/create-cluster#creating_a_cloud_dataproc_cluster)\n using image versions `2.1.10` or `2.0.62` or higher.\n\n2. Set `spark.hadoop.mapreduce.outputcommitter.factory.class=org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory` and `spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false`\n as a job property when you [submit a Spark job](/dataproc/docs/guides/submit-job#how_to_submit_a_job)\n to the cluster.\n\n - Google Cloud CLI example:\n\n ```\n gcloud dataproc jobs submit spark \\\n --properties=spark.hadoop.mapreduce.outputcommitter.factory.class=org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory,spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false \\\n --region=REGION \\\n other args ...\n ```\n - Code example:\n\n ```\n sc.hadoopConfiguration.set(\"spark.hadoop.mapreduce.outputcommitter.factory.class\",\"org.apache.hadoop.mapreduce.lib.output.DataprocFileOutputCommitterFactory\")\n sc.hadoopConfiguration.set(\"spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs\",\"false\")\n ```\n | The Dataproc file output committer must set `spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false` to avoid conflicts between success marker files created during concurrent writes. You can also set this property in `spark-defaults.conf`."]]