# PySparkJob

A Dataproc job for running [Apache PySpark](https://spark.apache.org/docs/latest/api/python/index.html#pyspark-overview) applications on YARN.

mainPythonFileUri
string
Required. The HCFS URI of the main Python file to use as the driver. Must be a .py file.
args[]
string
Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
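For example, a minimal sketch of a `pyspark_job` payload (bucket paths and property values are hypothetical) that keeps application arguments in `args` and Spark settings in `properties` rather than passing them as `--conf` flags:

```json
{
  "mainPythonFileUri": "gs://my-bucket/driver.py",
  "args": ["--input", "gs://my-bucket/input/", "--iterations", "10"],
  "properties": {
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2"
  }
}
```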
pythonFileUris[]
string
Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
jarFileUris[]
string
Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Python driver and tasks.
fileUris[]
string
Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
archiveUris[]
string
Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
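As an illustration (all URIs hypothetical), a job that distributes a zipped Python library to the PySpark framework, places a data file in each executor's working directory, and extracts an archive there:

```json
{
  "mainPythonFileUri": "gs://my-bucket/driver.py",
  "pythonFileUris": ["gs://my-bucket/deps/helpers.zip"],
  "fileUris": ["gs://my-bucket/config/lookup.csv"],
  "archiveUris": ["gs://my-bucket/env/pyenv.tar.gz"]
}
```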
properties
map (key: string, value: string)
Optional. A mapping of property names to values, used to configure PySpark. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/spark/conf/spark-defaults.conf and classes in user code.
An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
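loggingConfig
object (LoggingConfig)
Optional. The runtime log config for job execution.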
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-06-20 UTC."],[[["\u003cp\u003eThis document describes a Dataproc job for running Apache PySpark applications on YARN, detailing the configuration options available.\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003emainPythonFileUri\u003c/code\u003e field is required and specifies the HCFS URI of the main Python driver file, which must be a .py file.\u003c/p\u003e\n"],["\u003cp\u003eSeveral optional fields allow you to include additional resources such as Python files (\u003ccode\u003epythonFileUris\u003c/code\u003e), JAR files (\u003ccode\u003ejarFileUris\u003c/code\u003e), regular files (\u003ccode\u003efileUris\u003c/code\u003e), and archives (\u003ccode\u003earchiveUris\u003c/code\u003e).\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eproperties\u003c/code\u003e field allows you to define key-value pairs to configure PySpark, noting that conflicts with the Dataproc API may result in overwrites.\u003c/p\u003e\n"],["\u003cp\u003eAn optional \u003ccode\u003eloggingConfig\u003c/code\u003e object allows you to define the runtime log configuration for job execution.\u003c/p\u003e\n"]]],[],null,["# PySparkJob\n\n- [JSON representation](#SCHEMA_REPRESENTATION)\n\nA Dataproc job for running [Apache PySpark](https://spark.apache.org/docs/latest/api/python/index.html#pyspark-overview) applications on YARN."]]