Package google.cloud.dataproc.v1

BatchController

The BatchController provides methods to manage batch workloads.

CreateBatch

rpc CreateBatch(CreateBatchRequest) returns (Operation)

Creates a batch workload that executes asynchronously.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
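
For illustration, here is a minimal Python sketch using the google-cloud-dataproc client library. The project, region, bucket, and batch ID are placeholders, and the regional api_endpoint form shown reflects how Dataproc serves batch requests; adapt all of them to your environment.

    from google.cloud import dataproc_v1

    project, region = "my-project", "us-central1"  # placeholders

    # Batches are served from regional endpoints.
    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # CreateBatch returns a long-running Operation. result() blocks until
    # the batch reaches a terminal state and yields the Batch resource.
    operation = client.create_batch(
        parent=f"projects/{project}/locations/{region}",
        batch=dataproc_v1.Batch(
            pyspark_batch=dataproc_v1.PySparkBatch(
                main_python_file_uri="gs://my-bucket/job.py"
            )
        ),
        batch_id="example-batch-0001",
    )
    batch = operation.result()
    print(batch.state, batch.runtime_info.output_uri)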

DeleteBatch

rpc DeleteBatch(DeleteBatchRequest) returns (Empty)

Deletes the batch workload resource. If the batch is not in a terminal state, the delete fails and the response returns FAILED_PRECONDITION.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
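
A hedged sketch of the corresponding Python call; FailedPrecondition is the client-library exception that surfaces the FAILED_PRECONDITION status described above (the resource name is a placeholder):

    from google.api_core.exceptions import FailedPrecondition
    from google.cloud import dataproc_v1

    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )

    name = "projects/my-project/locations/us-central1/batches/example-batch-0001"
    try:
        client.delete_batch(name=name)  # no return value on success
    except FailedPrecondition:
        # The batch is still PENDING, RUNNING, or CANCELLING.
        print("batch is not in a terminal state; wait for it to finish, then retry")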

GetBatch

rpc GetBatch(GetBatchRequest) returns (Batch)

Gets the batch workload resource representation.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
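
For example, a minimal Python sketch (the resource name is a placeholder):

    from google.cloud import dataproc_v1

    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )

    batch = client.get_batch(
        name="projects/my-project/locations/us-central1/batches/example-batch-0001"
    )
    print(batch.state, batch.state_time, batch.creator)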

ListBatches

rpc ListBatches(ListBatchesRequest) returns (ListBatchesResponse)

Lists batch workloads.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.
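
A sketch using the Python client's pager, which follows page tokens transparently (the parent is a placeholder):

    from google.cloud import dataproc_v1

    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )

    # The returned pager lazily fetches further pages as you iterate.
    for batch in client.list_batches(
        parent="projects/my-project/locations/us-central1"
    ):
        print(batch.name, batch.state)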

Batch

A representation of a batch workload in the service.

Fields
name

string

Output only. The resource name of the batch.

uuid

string

Output only. A batch UUID (Universally Unique Identifier). The service generates this value when it creates the batch.

create_time

Timestamp

Output only. The time when the batch was created.

runtime_info

RuntimeInfo

Output only. Runtime information about batch execution.

state

State

Output only. The state of the batch.

state_message

string

Output only. Batch state details, such as a failure description if the state is FAILED.

state_time

Timestamp

Output only. The time when the batch entered its current state.

creator

string

Output only. The email address of the user who created the batch.

labels

map<string, string>

Optional. The labels to associate with this batch. Label keys must contain 1 to 63 characters and must conform to RFC 1035. Label values may be empty, but, if present, must contain 1 to 63 characters and must conform to RFC 1035. No more than 32 labels can be associated with a batch. A client-side validation sketch appears after this fields list.

runtime_config

RuntimeConfig

Optional. Runtime configuration for the batch execution.

environment_config

EnvironmentConfig

Optional. Environment configuration for the batch execution.

operation

string

Output only. The resource name of the operation associated with this batch.

state_history[]

StateHistory

Output only. Historical state information for the batch.

Union field batch_config. The application/framework-specific portion of the batch configuration. batch_config can be only one of the following:
pyspark_batch

PySparkBatch

Optional. PySpark batch config.

spark_batch

SparkBatch

Optional. Spark batch config.

spark_r_batch

SparkRBatch

Optional. SparkR batch config.

spark_sql_batch

SparkSqlBatch

Optional. Spark SQL batch config.
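
As noted under the labels field above, here is a client-side validation sketch. The regular expression is one reading of the RFC 1035 label grammar (lowercase letter first, then letters, digits, or hyphens, ending with a letter or digit); treat it as an assumption, since the service performs the authoritative validation.

    import re

    # One reading of the RFC 1035 label grammar, 1-63 characters.
    _RFC1035 = re.compile(r"^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$")

    def validate_labels(labels: dict) -> None:
        """Best-effort pre-flight check mirroring the documented limits."""
        if len(labels) > 32:
            raise ValueError("no more than 32 labels per batch")
        for key, value in labels.items():
            if not _RFC1035.match(key):
                raise ValueError(f"invalid label key: {key!r}")
            if value and not _RFC1035.match(value):
                raise ValueError(f"invalid label value: {value!r}")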

State

The batch state.

Enums
STATE_UNSPECIFIED The batch state is unknown.
PENDING The batch has been created and is awaiting execution.
RUNNING The batch is running.
CANCELLING The batch is cancelling.
CANCELLED The batch cancellation was successful.
SUCCEEDED The batch completed successfully.
FAILED The batch is no longer running due to an error.
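
CANCELLED, SUCCEEDED, and FAILED are the terminal states. A hedged polling sketch built on GetBatch follows; the poll interval is an arbitrary choice, not a service requirement.

    import time

    from google.cloud import dataproc_v1

    TERMINAL = {
        dataproc_v1.Batch.State.CANCELLED,
        dataproc_v1.Batch.State.SUCCEEDED,
        dataproc_v1.Batch.State.FAILED,
    }

    def wait_for_terminal_state(client, name, poll_seconds=30):
        """Polls GetBatch until the batch leaves PENDING/RUNNING/CANCELLING."""
        while True:
            batch = client.get_batch(name=name)
            if batch.state in TERMINAL:
                return batch
            time.sleep(poll_seconds)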

StateHistory

Historical state information.

Fields
state

State

Output only. The state of the batch at this point in history.

state_message

string

Output only. Details about the state at this point in history.

state_start_time

Timestamp

Output only. The time when the batch entered the historical state.

BatchOperationMetadata

Metadata describing the Batch operation.

Fields
batch

string

Name of the batch for the operation.

batch_uuid

string

Batch UUID for the operation.

create_time

Timestamp

The time when the operation was created.

done_time

Timestamp

The time when the operation finished.

operation_type

BatchOperationType

The operation type.

description

string

Short description of the operation.

labels

map<string, string>

Labels associated with the operation.

warnings[]

string

Warnings encountered during operation execution.
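
This metadata rides on the long-running Operation returned by CreateBatch. A sketch of reading it from the Python operation handle (placeholder names as before):

    from google.cloud import dataproc_v1

    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )
    operation = client.create_batch(
        parent="projects/my-project/locations/us-central1",
        batch=dataproc_v1.Batch(
            pyspark_batch=dataproc_v1.PySparkBatch(
                main_python_file_uri="gs://my-bucket/job.py"
            )
        ),
    )

    # operation.metadata deserializes to a BatchOperationMetadata message.
    md = operation.metadata
    print(md.batch, md.batch_uuid, md.operation_type)
    for warning in md.warnings:
        print("warning:", warning)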

BatchOperationType

Operation type for Batch resources.

Enums
BATCH_OPERATION_TYPE_UNSPECIFIED Batch operation type is unknown.
BATCH Batch operation type.

CreateBatchRequest

A request to create a batch workload.

Fields
parent

string

Required. The parent resource where this batch will be created.

Authorization requires the following IAM permission on the specified resource parent:

  • dataproc.batches.create
batch

Batch

Required. The batch to create.

batch_id

string

Optional. The ID to use for the batch, which will become the final component of the batch's resource name.

This value must be 4-63 characters. Valid characters are lowercase letters, digits, and hyphens (/[a-z][0-9]-/).

request_id

string

Optional. A unique ID used to identify the request. If the service receives two CreateBatchRequests with the same request_id, the second request is ignored and the Operation that corresponds to the first Batch created and stored in the backend is returned.

Recommendation: Set this value to a UUID.

The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.
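
A sketch of an idempotent create, assuming a caller-generated UUID as the recommendation suggests. Retrying with the same request_id returns the operation for the first accepted request instead of creating a duplicate batch. The sketch passes a full CreateBatchRequest so that request_id can be set explicitly:

    import uuid

    from google.cloud import dataproc_v1

    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )

    request = dataproc_v1.CreateBatchRequest(
        parent="projects/my-project/locations/us-central1",
        batch=dataproc_v1.Batch(
            spark_batch=dataproc_v1.SparkBatch(
                main_class="org.example.Main",
                jar_file_uris=["gs://my-bucket/app.jar"],
            )
        ),
        batch_id="example-batch-0002",
        # A UUID string is 36 characters, within the 40-character limit.
        request_id=str(uuid.uuid4()),
    )
    operation = client.create_batch(request=request)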

DeleteBatchRequest

A request to delete a batch workload.

Fields
name

string

Required. The name of the batch resource to delete.

Authorization requires the following IAM permission on the specified resource name:

  • dataproc.batches.delete

DiagnoseClusterResults

The location of diagnostic output.

Fields
output_uri

string

Output only. The Cloud Storage URI of the diagnostic output. The output report is a plain text file with a summary of collected diagnostics.

EnvironmentConfig

Environment configuration for a workload.

Fields
execution_config

ExecutionConfig

Optional. Execution configuration for a workload.

peripherals_config

PeripheralsConfig

Optional. Peripherals configuration that the workload has access to.

ExecutionConfig

Execution configuration for a workload.

Fields
service_account

string

Optional. Service account used to execute the workload.

network_tags[]

string

Optional. Tags used for network traffic control.

kms_key

string

Optional. The Cloud KMS key to use for encryption.

Union field network. Network configuration for workload execution. network can be only one of the following:
network_uri

string

Optional. Network URI to connect workload to.

subnetwork_uri

string

Optional. Subnetwork URI to connect workload to.
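
A sketch assembling an ExecutionConfig; note the network union, where you set either network_uri or subnetwork_uri but not both. All resource names are placeholders.

    from google.cloud import dataproc_v1

    exec_config = dataproc_v1.ExecutionConfig(
        service_account="sa-batch@my-project.iam.gserviceaccount.com",
        network_tags=["dataproc-batch"],
        kms_key="projects/my-project/locations/us-central1/keyRings/kr/cryptoKeys/ck",
        # network is a union field: choose network_uri OR subnetwork_uri.
        subnetwork_uri="projects/my-project/regions/us-central1/subnetworks/default",
    )

    env_config = dataproc_v1.EnvironmentConfig(execution_config=exec_config)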

GetBatchRequest

A request to get the resource representation for a batch workload.

Fields
name

string

Required. The name of the batch to retrieve.

Authorization requires the following IAM permission on the specified resource name:

  • dataproc.batches.get

ListBatchesRequest

A request to list batch workloads in a project.

Fields
parent

string

Required. The parent, which owns this collection of batches.

Authorization requires the following IAM permission on the specified resource parent:

  • dataproc.batches.list
page_size

int32

Optional. The maximum number of batches to return in each response. The service may return fewer than this value. The default page size is 20; the maximum page size is 1000.

page_token

string

Optional. A page token received from a previous ListBatches call. Provide this token to retrieve the subsequent page.

ListBatchesResponse

A list of batch workloads.

Fields
batches[]

Batch

The batches from the specified collection.

next_page_token

string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.
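
The pager shown under ListBatches hides this token handshake; the sketch below surfaces it through the pager's pages iterator, where each page is a ListBatchesResponse:

    from google.cloud import dataproc_v1

    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )

    pager = client.list_batches(
        request={
            "parent": "projects/my-project/locations/us-central1",
            "page_size": 20,
        }
    )
    for page in pager.pages:
        for batch in page.batches:
            print(batch.name)
        # An empty next_page_token means there are no subsequent pages.
        if not page.next_page_token:
            print("last page")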

PeripheralsConfig

Auxiliary services configuration for a workload.

Fields
metastore_service

string

Optional. Resource name of an existing Dataproc Metastore service.

Example:

  • projects/[project_id]/locations/[region]/services/[service_id]
spark_history_server_config

SparkHistoryServerConfig

Optional. The Spark History Server configuration for the workload.
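
A sketch wiring both peripherals into an EnvironmentConfig (the Metastore service and History Server cluster names are placeholders):

    from google.cloud import dataproc_v1

    peripherals = dataproc_v1.PeripheralsConfig(
        metastore_service=(
            "projects/my-project/locations/us-central1/services/my-metastore"
        ),
        spark_history_server_config=dataproc_v1.SparkHistoryServerConfig(
            dataproc_cluster=(
                "projects/my-project/regions/us-central1/clusters/history-server"
            )
        ),
    )

    env_config = dataproc_v1.EnvironmentConfig(peripherals_config=peripherals)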

PySparkBatch

A configuration for running an Apache PySpark batch workload.

Fields
main_python_file_uri

string

Required. The HCFS URI of the main Python file to use as the Spark driver. Must be a .py file.

args[]

string

Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.

python_file_uris[]

string

Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.

jar_file_uris[]

string

Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.

file_uris[]

string

Optional. HCFS URIs of files to be placed in the working directory of each executor.

archive_uris[]

string

Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
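
A sketch exercising the optional fields above; all URIs are placeholders, and the args deliberately avoid --conf per the warning on the args field.

    from google.cloud import dataproc_v1

    pyspark = dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/job.py",
        args=["--input", "gs://my-bucket/data/"],
        python_file_uris=["gs://my-bucket/deps.zip"],
        jar_file_uris=["gs://my-bucket/connector.jar"],
        file_uris=["gs://my-bucket/config.json"],
        archive_uris=["gs://my-bucket/env.tar.gz"],
    )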

RuntimeConfig

Runtime configuration for a workload.

Fields
version

string

Optional. Version of the batch runtime.

container_image

string

Optional. Custom container image for the job runtime environment. If not specified, a default container image is used.

properties

map<string, string>

Optional. A mapping of property names to values, which are used to configure workload execution.
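
A sketch; the Spark property names are common examples and an assumption here, since the service governs which properties are settable for a batch runtime.

    from google.cloud import dataproc_v1

    runtime = dataproc_v1.RuntimeConfig(
        container_image="us-docker.pkg.dev/my-project/my-repo/spark:latest",
        properties={
            # Illustrative Spark properties, not an authoritative list.
            "spark.executor.memory": "8g",
            "spark.sql.shuffle.partitions": "200",
        },
    )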

RuntimeInfo

Runtime information about workload execution.

Fields
endpoints

map<string, string>

Output only. Map of remote access endpoints (such as web interfaces and APIs) to their URIs.

output_uri

string

Output only. A URI pointing to the location of the stdout and stderr of the workload.

diagnostic_output_uri

string

Output only. A URI pointing to the location of the diagnostics tarball.
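
A sketch reading these output-only fields from a fetched batch; the endpoint keys vary by workload, so the loop makes no assumption about specific keys:

    from google.cloud import dataproc_v1

    client = dataproc_v1.BatchControllerClient(
        client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
    )
    batch = client.get_batch(
        name="projects/my-project/locations/us-central1/batches/example-batch-0001"
    )

    print("driver output:", batch.runtime_info.output_uri)
    print("diagnostics:", batch.runtime_info.diagnostic_output_uri)
    for label, uri in batch.runtime_info.endpoints.items():
        print(label, uri)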

SparkBatch

A configuration for running an Apache Spark batch workload.

Fields
args[]

string

Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.

jar_file_uris[]

string

Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.

file_uris[]

string

Optional. HCFS URIs of files to be placed in the working directory of each executor.

archive_uris[]

string

Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

Union field driver. The specification of the main method to call to drive the Spark workload. Specify either the jar file that contains the main class or the main class name. To pass both a main jar and a main class in that jar, add the jar to jar_file_uris, and then specify the main class name in main_class. driver can be only one of the following:
main_jar_file_uri

string

Optional. The HCFS URI of the jar file that contains the main class.

main_class

string

Optional. The name of the driver main class. The jar file that contains the class must be in the classpath or specified in jar_file_uris.
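
Two hedged ways to fill the driver union, mirroring the rule above (jar URIs and class names are placeholders):

    from google.cloud import dataproc_v1

    # Option 1: the jar's manifest names the main class.
    by_jar = dataproc_v1.SparkBatch(
        main_jar_file_uri="gs://my-bucket/app.jar"
    )

    # Option 2: name the main class and put its jar on the classpath.
    by_class = dataproc_v1.SparkBatch(
        main_class="org.example.Main",
        jar_file_uris=["gs://my-bucket/app.jar"],
    )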

SparkHistoryServerConfig

Spark History Server configuration for the workload.

Fields
dataproc_cluster

string

Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload.

Example:

  • projects/[project_id]/regions/[region]/clusters/[cluster_name]

SparkRBatch

A configuration for running an Apache SparkR batch workload.

Fields
main_r_file_uri

string

Required. The HCFS URI of the main R file to use as the driver. Must be a .R or .r file.

args[]

string

Optional. The arguments to pass to the Spark driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.

file_uris[]

string

Optional. HCFS URIs of files to be placed in the working directory of each executor.

archive_uris[]

string

Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

SparkSqlBatch

A configuration for running Apache Spark SQL queries as a batch workload.

Fields
query_file_uri

string

Required. The HCFS URI of the script that contains Spark SQL queries to execute.

query_variables

map<string, string>

Optional. Mapping of query variable names to values (equivalent to the Spark SQL command: SET name="value";).

jar_file_uris[]

string

Optional. HCFS URIs of jar files to be added to the Spark CLASSPATH.
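
A sketch; the query variable is assumed to be referenced in the script via Spark SQL's ${name} substitution, matching the SET semantics described above. All URIs are placeholders.

    from google.cloud import dataproc_v1

    spark_sql = dataproc_v1.SparkSqlBatch(
        query_file_uri="gs://my-bucket/etl.sql",
        # Referenced in etl.sql as ${run_date} after SET substitution.
        query_variables={"run_date": "2024-01-01"},
        jar_file_uris=["gs://my-bucket/udfs.jar"],
    )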