# PySparkJob

A Dataproc job for running [Apache PySpark](https://spark.apache.org/docs/latest/api/python/index.html#pyspark-overview) applications on YARN.

mainPythonFileUri
string
Required. The HCFS URI of the main Python file to use as the driver. Must be a .py file.
args[]
string
Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
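For example, a minimal sketch of a `pyspark_job` payload (bucket paths and property values are hypothetical) that keeps application arguments in `args` and Spark settings in `properties` rather than passing them as `--conf` flags:

```json
{
  "mainPythonFileUri": "gs://my-bucket/driver.py",
  "args": ["--input", "gs://my-bucket/input/", "--iterations", "10"],
  "properties": {
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2"
  }
}
```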
pythonFileUris[]
string
Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
jarFileUris[]
string
Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Python driver and tasks.
fileUris[]
string
Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
archiveUris[]
string
Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
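As an illustration (all URIs hypothetical), a job that distributes a zipped Python library to the PySpark framework, places a data file in each executor's working directory, and extracts an archive there:

```json
{
  "mainPythonFileUri": "gs://my-bucket/driver.py",
  "pythonFileUris": ["gs://my-bucket/deps/helpers.zip"],
  "fileUris": ["gs://my-bucket/config/lookup.csv"],
  "archiveUris": ["gs://my-bucket/env/pyenv.tar.gz"]
}
```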
properties
map (key: string, value: string)
Optional. A mapping of property names to values, used to configure PySpark. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/spark/conf/spark-defaults.conf and classes in user code.
An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
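loggingConfig
object (LoggingConfig)
Optional. The runtime log config for job execution.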
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-06-20 UTC."],[[["\u003cp\u003eThis document describes a Dataproc job for running Apache PySpark applications on YARN, detailing the configuration options available.\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003emainPythonFileUri\u003c/code\u003e field is required and specifies the HCFS URI of the main Python driver file, which must be a .py file.\u003c/p\u003e\n"],["\u003cp\u003eSeveral optional fields allow you to include additional resources such as Python files (\u003ccode\u003epythonFileUris\u003c/code\u003e), JAR files (\u003ccode\u003ejarFileUris\u003c/code\u003e), regular files (\u003ccode\u003efileUris\u003c/code\u003e), and archives (\u003ccode\u003earchiveUris\u003c/code\u003e).\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eproperties\u003c/code\u003e field allows you to define key-value pairs to configure PySpark, noting that conflicts with the Dataproc API may result in overwrites.\u003c/p\u003e\n"],["\u003cp\u003eAn optional \u003ccode\u003eloggingConfig\u003c/code\u003e object allows you to define the runtime log configuration for job execution.\u003c/p\u003e\n"]]],[],null,["# PySparkJob\n\n- [JSON representation](#SCHEMA_REPRESENTATION)\n\nA Dataproc job for running [Apache PySpark](https://spark.apache.org/docs/latest/api/python/index.html#pyspark-overview) applications on YARN."]]