Package types (2.4.0)

API documentation for the dataproc_v1.types package.

Classes

AcceleratorConfig

Specifies the type and number of accelerator cards attached to the instances of an instance group. See `GPUs on Compute Engine <https://cloud.google.com/compute/docs/gpus/>`__.

AutoscalingConfig

Autoscaling Policy config associated with the cluster.

.. attribute:: policy_uri

Optional. The autoscaling policy used by the cluster.

Only resource names that include the project ID and location (region) are valid. Examples:

  • https://www.googleapis.com/compute/v1/projects/[project_id]/locations/[dataproc_region]/autoscalingPolicies/[policy_id]
  • projects/[project_id]/locations/[dataproc_region]/autoscalingPolicies/[policy_id]

The policy must be in the same project and Dataproc region as the cluster.

:type: str
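A sketch of building the relative form of ``policy_uri`` and checking the same-project, same-region rule. ``autoscaling_policy_uri`` and ``policy_matches_cluster`` are hypothetical helpers, not client library functions:

```python
import re

# Accepts both documented forms: the full googleapis.com URL and the relative name
# projects/[project_id]/locations/[dataproc_region]/autoscalingPolicies/[policy_id]
_POLICY_RE = re.compile(
    r"^(?:https://www\.googleapis\.com/compute/v1/)?"
    r"projects/(?P<project>[^/]+)/locations/(?P<region>[^/]+)"
    r"/autoscalingPolicies/(?P<policy>[^/]+)$"
)


def autoscaling_policy_uri(project_id: str, region: str, policy_id: str) -> str:
    """Build the relative resource name for an autoscaling policy (hypothetical helper)."""
    return f"projects/{project_id}/locations/{region}/autoscalingPolicies/{policy_id}"


def policy_matches_cluster(policy_uri: str, cluster_project: str, cluster_region: str) -> bool:
    """Return True if the policy is in the same project and Dataproc region as the cluster."""
    m = _POLICY_RE.match(policy_uri)
    if m is None:
        # Bare IDs or names without project and region are not valid policy URIs.
        return False
    return m["project"] == cluster_project and m["region"] == cluster_region


uri = autoscaling_policy_uri("my-project", "us-central1", "my-policy")
print(uri)
print(policy_matches_cluster(uri, "my-project", "us-central1"))
```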

AutoscalingPolicy

Describes an autoscaling policy for the Dataproc cluster autoscaler.

BasicAutoscalingAlgorithm

Basic algorithm for autoscaling.

.. attribute:: yarn_config

Required. YARN autoscaling configuration.

:type: google.cloud.dataproc_v1.types.BasicYarnAutoscalingConfig

BasicYarnAutoscalingConfig

Basic autoscaling configurations for YARN.

.. attribute:: graceful_decommission_timeout

Required. Timeout for YARN graceful decommissioning of Node Managers. Specifies the duration to wait for jobs to complete before forcefully removing workers (and potentially interrupting jobs). Only applicable to downscaling operations.

Bounds: [0s, 1d].

:type: google.protobuf.duration_pb2.Duration
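A client-side sketch of the documented [0s, 1d] bound. It uses ``datetime.timedelta`` as a stand-in for the ``Duration`` value and a hypothetical ``check_graceful_decommission_timeout`` helper; the service remains the authority on what it accepts, so this only catches obvious mistakes before a request is sent:

```python
from datetime import timedelta

# Documented bounds for graceful_decommission_timeout: [0s, 1d], both ends inclusive.
MIN_TIMEOUT = timedelta(0)
MAX_TIMEOUT = timedelta(days=1)


def check_graceful_decommission_timeout(timeout: timedelta) -> timedelta:
    """Raise ValueError if the timeout is outside [0s, 1d]; otherwise return it unchanged."""
    if not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
        raise ValueError(f"graceful_decommission_timeout {timeout} is outside [0s, 1d]")
    return timeout


# One hour lets running jobs finish before workers are removed during downscaling.
timeout = check_graceful_decommission_timeout(timedelta(hours=1))
```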

CancelJobRequest

A request to cancel a job.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the job belongs to.

:type: str

Cluster

Describes the identifying information, config, and status of a cluster of Compute Engine instances.

ClusterConfig

The cluster config.

.. attribute:: config_bucket

Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc determines a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then creates and manages this project-level, per-location bucket (see `Dataproc staging bucket <https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/staging-bucket>`__). This field requires a Cloud Storage bucket name, not a URI to a Cloud Storage bucket.

:type: str
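Because ``config_bucket`` takes a bucket name rather than a ``gs://`` URI, a caller that holds a URI has to reduce it to the name first. A minimal sketch, using a hypothetical helper ``to_bucket_name`` that is not part of the client library:

```python
def to_bucket_name(value: str) -> str:
    """Strip a gs:// scheme and any object path so only the bucket name remains."""
    if value.startswith("gs://"):
        value = value[len("gs://"):]
    # The bucket name is the first path segment; anything after it is an object prefix.
    return value.split("/", 1)[0]


bucket = to_bucket_name("gs://my-staging-bucket/some/prefix")
print(bucket)
```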

ClusterMetrics

Contains cluster daemon metrics, such as HDFS and YARN stats.

Beta Feature: This report is available for testing purposes only. It may be changed before final release.

ClusterOperation

The cluster operation triggered by a workflow.

.. attribute:: operation_id

Output only. The ID of the cluster operation.

:type: str

ClusterOperationMetadata

Metadata describing the operation.

.. attribute:: cluster_name

Output only. Name of the cluster for the operation.

:type: str

ClusterOperationStatus

The status of the operation.

.. attribute:: state

Output only. A message containing the operation state.

:type: google.cloud.dataproc_v1.types.ClusterOperationStatus.State

ClusterSelector

A selector that chooses the target cluster for jobs based on metadata.

ClusterStatus

The status of a cluster and its instances.

.. attribute:: state

Output only. The cluster's state.

:type: google.cloud.dataproc_v1.types.ClusterStatus.State

CreateAutoscalingPolicyRequest

A request to create an autoscaling policy.

.. attribute:: parent

Required. The "resource name" of the region or location, as described in https://cloud.google.com/apis/design/resource_names.

  • For projects.regions.autoscalingPolicies.create, the resource name of the region has the following format: projects/{project_id}/regions/{region}

  • For projects.locations.autoscalingPolicies.create, the resource name of the location has the following format: projects/{project_id}/locations/{location}

:type: str
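A sketch of the two parent formats, using hypothetical helpers ``region_parent`` and ``location_parent``. ListAutoscalingPoliciesRequest documents the same two formats for its ``parent`` field:

```python
def region_parent(project_id: str, region: str) -> str:
    """Parent for the projects.regions.autoscalingPolicies.* methods."""
    return f"projects/{project_id}/regions/{region}"


def location_parent(project_id: str, location: str) -> str:
    """Parent for the projects.locations.autoscalingPolicies.* methods."""
    return f"projects/{project_id}/locations/{location}"


print(region_parent("my-project", "us-central1"))
print(location_parent("my-project", "us-central1"))
```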

CreateClusterRequest

A request to create a cluster.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the cluster belongs to.

:type: str

CreateWorkflowTemplateRequest

A request to create a workflow template.

.. attribute:: parent

Required. The resource name of the region or location, as described in https://cloud.google.com/apis/design/resource_names.

  • For projects.regions.workflowTemplates.create, the resource name of the region has the following format: projects/{project_id}/regions/{region}

  • For projects.locations.workflowTemplates.create, the resource name of the location has the following format: projects/{project_id}/locations/{location}

:type: str

DeleteAutoscalingPolicyRequest

A request to delete an autoscaling policy. Autoscaling policies in use by one or more clusters will not be deleted.

DeleteClusterRequest

A request to delete a cluster.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the cluster belongs to.

:type: str

DeleteJobRequest

A request to delete a job.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the job belongs to.

:type: str

DeleteWorkflowTemplateRequest

A request to delete a workflow template. Workflows that have already started continue to run.

DiagnoseClusterRequest

A request to collect cluster diagnostic information.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the cluster belongs to.

:type: str

DiagnoseClusterResults

The location of diagnostic output.

.. attribute:: output_uri

Output only. The Cloud Storage URI of the diagnostic output. The output report is a plain text file with a summary of collected diagnostics.

:type: str

DiskConfig

Specifies the config of disk options for a group of VM instances.

EncryptionConfig

Encryption settings for the cluster.

.. attribute:: gce_pd_kms_key_name

Optional. The Cloud KMS key name to use for PD disk encryption for all instances in the cluster.

:type: str

EndpointConfig

Endpoint config for this cluster.

.. attribute:: http_ports

Output only. The map of port descriptions to URLs. Populated only if ``enable_http_port_access`` is true.

:type: Sequence[google.cloud.dataproc_v1.types.EndpointConfig.HttpPortsEntry]

GceClusterConfig

Common config settings for resources of Compute Engine cluster instances, applicable to all instances in the cluster.

GetAutoscalingPolicyRequest

A request to fetch an autoscaling policy.

.. attribute:: name

Required. The "resource name" of the autoscaling policy, as described in https://cloud.google.com/apis/design/resource_names.

  • For projects.regions.autoscalingPolicies.get, the resource name of the policy has the following format: projects/{project_id}/regions/{region}/autoscalingPolicies/{policy_id}

  • For projects.locations.autoscalingPolicies.get, the resource name of the policy has the following format: projects/{project_id}/locations/{location}/autoscalingPolicies/{policy_id}

:type: str

GetClusterRequest

Request to get the resource representation for a cluster in a project.

GetJobRequest

A request to get the resource representation for a job in a project.

GetWorkflowTemplateRequest

A request to fetch a workflow template.

.. attribute:: name

Required. The resource name of the workflow template, as described in https://cloud.google.com/apis/design/resource_names.

  • For projects.regions.workflowTemplates.get, the resource name of the template has the following format: projects/{project_id}/regions/{region}/workflowTemplates/{template_id}

  • For projects.locations.workflowTemplates.get, the resource name of the template has the following format: projects/{project_id}/locations/{location}/workflowTemplates/{template_id}

:type: str
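Both name forms above share the same shape, so one regular expression can split either into its parts. ``parse_template_name`` is a hypothetical helper; ``[^/]+`` deliberately accepts any non-empty segment rather than enforcing the service's ID rules:

```python
import re

# Matches projects/{project_id}/(regions|locations)/{location}/workflowTemplates/{template_id}
_TEMPLATE_RE = re.compile(
    r"^projects/(?P<project_id>[^/]+)/(?P<collection>regions|locations)/(?P<location>[^/]+)"
    r"/workflowTemplates/(?P<template_id>[^/]+)$"
)


def parse_template_name(name: str) -> dict:
    """Split a workflow template resource name into its components."""
    m = _TEMPLATE_RE.match(name)
    if m is None:
        raise ValueError(f"not a workflow template resource name: {name!r}")
    return m.groupdict()


parts = parse_template_name("projects/my-project/regions/us-central1/workflowTemplates/my-template")
print(parts)
```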

GkeClusterConfig

The GKE config for this cluster.

.. attribute:: namespaced_gke_deployment_target

Optional. A target for the deployment.

:type: google.cloud.dataproc_v1.types.GkeClusterConfig.NamespacedGkeDeploymentTarget

HadoopJob

A Dataproc job for running `Apache Hadoop MapReduce <https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html>`__ jobs on `Apache Hadoop YARN <https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YARN.html>`__.

HiveJob

A Dataproc job for running `Apache Hive <https://hive.apache.org/>`__ queries on YARN.

IdentityConfig

Identity-related configuration, including service-account-based secure multi-tenancy user mappings.

InstanceGroupAutoscalingPolicyConfig

Configuration for the size bounds of an instance group, including its proportional size to other groups.

InstanceGroupConfig

The config settings for Compute Engine resources in an instance group, such as a master or worker group.

InstantiateInlineWorkflowTemplateRequest

A request to instantiate an inline workflow template.

.. attribute:: parent

Required. The resource name of the region or location, as described in https://cloud.google.com/apis/design/resource_names.

  • For projects.regions.workflowTemplates.instantiateinline, the resource name of the region has the following format: projects/{project_id}/regions/{region}

  • For projects.locations.workflowTemplates.instantiateinline, the resource name of the location has the following format: projects/{project_id}/locations/{location}

:type: str

InstantiateWorkflowTemplateRequest

A request to instantiate a workflow template.

.. attribute:: name

Required. The resource name of the workflow template, as described in https://cloud.google.com/apis/design/resource_names.

  • For projects.regions.workflowTemplates.instantiate, the resource name of the template has the following format: projects/{project_id}/regions/{region}/workflowTemplates/{template_id}

  • For projects.locations.workflowTemplates.instantiate, the resource name of the template has the following format: projects/{project_id}/locations/{location}/workflowTemplates/{template_id}

:type: str

Job

A Dataproc job resource.

.. attribute:: reference

Optional. The fully qualified reference to the job, which can be used to obtain the equivalent REST path of the job resource. If this property is not specified when a job is created, the server generates a job_id.

:type: google.cloud.dataproc_v1.types.JobReference

JobMetadata

Job operation metadata.

.. attribute:: job_id

Output only. The job ID.

:type: str

JobPlacement

Dataproc job config.

.. attribute:: cluster_name

Required. The name of the cluster where the job will be submitted.

:type: str

JobReference

Encapsulates the full scoping used to reference a job.

.. attribute:: project_id

Optional. The ID of the Google Cloud Platform project that the job belongs to. If specified, must match the request project ID.

:type: str

JobScheduling

Job scheduling options.

.. attribute:: max_failures_per_hour

Optional. Maximum number of times per hour a driver may be restarted as a result of the driver exiting with a non-zero code before the job is reported as failed.

A job may be reported as thrashing if the driver exits with a non-zero code 4 times within a 10-minute window.

Maximum value is 10.

:type: int
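The thrashing rule above can be modeled as a sliding window over driver failure times. This is a sketch of the documented rule only (4 non-zero exits within 10 minutes, with the window treated as half-open), not the service's actual detection logic; ``is_thrashing`` is a hypothetical name:

```python
from collections import deque
from typing import Sequence

# Documented rule: a job may be reported as thrashing if the driver exits with a
# non-zero code 4 times within a 10-minute window.
WINDOW_SECONDS = 10 * 60
THRASH_THRESHOLD = 4


def is_thrashing(failure_times: Sequence[float]) -> bool:
    """Return True if some 10-minute window holds at least 4 non-zero driver exits.

    failure_times are timestamps in seconds, sorted in ascending order.
    """
    window = deque()
    for t in failure_times:
        window.append(t)
        # Evict failures 10 minutes or more older than the newest one, so the
        # window covers the half-open interval (t - 600s, t].
        while t - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) >= THRASH_THRESHOLD:
            return True
    return False


print(is_thrashing([0, 60, 120, 180]))   # four exits within three minutes
print(is_thrashing([0, 200, 400, 700]))  # never four exits inside one window
```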

JobStatus

Dataproc job status.

.. attribute:: state

Output only. A state message specifying the overall job state.

:type: google.cloud.dataproc_v1.types.JobStatus.State

KerberosConfig

Specifies Kerberos-related configuration.

.. attribute:: enable_kerberos

Optional. Flag to indicate whether to Kerberize the cluster (default: false). Set this field to true to enable Kerberos on a cluster.

:type: bool

LifecycleConfig

Specifies the cluster auto-delete schedule configuration.

.. attribute:: idle_delete_ttl

Optional. The duration to keep the cluster alive while idling (when no jobs are running). Once this threshold is passed, the cluster is deleted. Minimum value is 5 minutes; maximum value is 14 days (see `JSON representation of Duration <https://developers.google.com/protocol-buffers/docs/proto3#json>`__).

:type: google.protobuf.duration_pb2.Duration
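The bounds and the JSON form of ``idle_delete_ttl`` can be sketched as follows. In the proto3 JSON mapping a ``Duration`` is written as decimal seconds with an ``s`` suffix (for example ``"600s"``). The helper ``idle_delete_ttl_json`` is hypothetical and emits six fractional digits when the value is not a whole number of seconds:

```python
from datetime import timedelta

# Documented bounds for idle_delete_ttl.
MIN_IDLE_TTL = timedelta(minutes=5)
MAX_IDLE_TTL = timedelta(days=14)


def idle_delete_ttl_json(ttl: timedelta) -> str:
    """Validate idle_delete_ttl bounds and return its proto3 JSON Duration string."""
    if not MIN_IDLE_TTL <= ttl <= MAX_IDLE_TTL:
        raise ValueError(f"idle_delete_ttl {ttl} is outside [5 minutes, 14 days]")
    micros = ttl // timedelta(microseconds=1)  # whole microseconds as an int
    seconds, frac = divmod(micros, 1_000_000)
    # proto3 JSON encodes a Duration as decimal seconds followed by "s".
    return f"{seconds}.{frac:06d}s" if frac else f"{seconds}s"


print(idle_delete_ttl_json(timedelta(minutes=10)))
```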

ListAutoscalingPoliciesRequest

A request to list autoscaling policies in a project.

.. attribute:: parent

Required. The "resource name" of the region or location, as described in https://cloud.google.com/apis/design/resource_names.

  • For projects.regions.autoscalingPolicies.list, the resource name of the region has the following format: projects/{project_id}/regions/{region}

  • For projects.locations.autoscalingPolicies.list, the resource name of the location has the following format: projects/{project_id}/locations/{location}

:type: str

ListAutoscalingPoliciesResponse

A response to a request to list autoscaling policies in a project.

ListClustersRequest

A request to list the clusters in a project.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the cluster belongs to.

:type: str

ListClustersResponse

The list of all clusters in a project.

.. attribute:: clusters

Output only. The clusters in the project.

:type: Sequence[google.cloud.dataproc_v1.types.Cluster]

ListJobsRequest

A request to list jobs in a project.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the job belongs to.

:type: str

ListJobsResponse

A list of jobs in a project.

.. attribute:: jobs

Output only. Jobs list.

:type: Sequence[google.cloud.dataproc_v1.types.Job]

ListWorkflowTemplatesRequest

A request to list workflow templates in a project.

.. attribute:: parent

Required. The resource name of the region or location, as described in https://cloud.google.com/apis/design/resource_names.

  • For projects.regions.workflowTemplates.list, the resource name of the region has the following format: projects/{project_id}/regions/{region}

  • For projects.locations.workflowTemplates.list, the resource name of the location has the following format: projects/{project_id}/locations/{location}

:type: str

ListWorkflowTemplatesResponse

A response to a request to list workflow templates in a project.

LoggingConfig

The runtime logging config of the job.

.. attribute:: driver_log_levels

The per-package log levels for the driver. This can include the "root" package name to configure the rootLogger. Examples: 'com.google = FATAL', 'root = INFO', 'org.apache = DEBUG'.

:type: Sequence[google.cloud.dataproc_v1.types.LoggingConfig.DriverLogLevelsEntry]
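A sketch of turning the ``'package = LEVEL'`` strings shown above into the package-to-level mapping the field holds. The set of level names is assumed from the ``LoggingConfig.Level`` enum and should be checked against your client version; ``parse_driver_log_levels`` is a hypothetical helper:

```python
# Level names assumed from the LoggingConfig.Level enum (verify for your client version).
LEVELS = {"ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"}


def parse_driver_log_levels(entries):
    """Turn strings like 'com.google = FATAL' into a {package: level} dict."""
    levels = {}
    for entry in entries:
        package, sep, level = entry.partition("=")
        package, level = package.strip(), level.strip().upper()
        if not sep or not package or level not in LEVELS:
            raise ValueError(f"bad log level entry: {entry!r}")
        # "root" is kept as-is; it configures the rootLogger.
        levels[package] = level
    return levels


levels = parse_driver_log_levels(["com.google = FATAL", "root = INFO", "org.apache = DEBUG"])
print(levels)
```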

ManagedCluster

Cluster that is managed by the workflow.

.. attribute:: cluster_name

Required. The cluster name prefix. A unique cluster name is formed by appending a random suffix. The name must contain only lower-case letters (a-z), numbers (0-9), and hyphens (-). It must begin with a letter, cannot end with a hyphen, and must be between 2 and 35 characters long.

:type: str
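The naming rules for the prefix translate into a single regular expression. A minimal client-side check, with the hypothetical helper ``is_valid_cluster_name_prefix``:

```python
import re

# First character: a lower-case letter. Middle: up to 33 of [a-z0-9-].
# Last character: a letter or digit (so no trailing hyphen). Total length: 2 to 35.
_CLUSTER_PREFIX_RE = re.compile(r"[a-z][a-z0-9-]{0,33}[a-z0-9]")


def is_valid_cluster_name_prefix(prefix: str) -> bool:
    """Return True if the prefix satisfies the documented ManagedCluster naming rules."""
    return _CLUSTER_PREFIX_RE.fullmatch(prefix) is not None


print(is_valid_cluster_name_prefix("etl-cluster"))
print(is_valid_cluster_name_prefix("etl-"))
```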

ManagedGroupConfig

Specifies the resources used to actively manage an instance group.

MetastoreConfig

Specifies a Metastore configuration.

.. attribute:: dataproc_metastore_service

Required. Resource name of an existing Dataproc Metastore service.

Example:

  • projects/[project_id]/locations/[dataproc_region]/services/[service-name]

:type: str

NodeGroupAffinity

Node Group Affinity for clusters using sole-tenant node groups.

NodeInitializationAction

Specifies an executable to run on a fully configured node and a timeout period for executable completion.

OrderedJob

A job executed by the workflow.

.. attribute:: step_id

Required. The step ID. The ID must be unique among all jobs within the template.

The step ID is used as a prefix for the job ID, as the value of the job's goog-dataproc-workflow-step-id label, and in the prerequisiteStepIds field of other steps.

The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). It cannot begin or end with an underscore or hyphen and must be between 3 and 50 characters long.

:type: str
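A sketch that checks both the format rule and the uniqueness rule for step IDs across a template; ``validate_step_ids`` is a hypothetical helper, and the service remains the authority on what it accepts:

```python
import re

# First and last characters: letter or digit. Middle: 1 to 48 of [A-Za-z0-9_-].
# Total length: 3 to 50 characters.
_STEP_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{1,48}[A-Za-z0-9]")


def validate_step_ids(step_ids):
    """Raise ValueError unless every step ID is well formed and unique within the template."""
    seen = set()
    for step_id in step_ids:
        if _STEP_ID_RE.fullmatch(step_id) is None:
            raise ValueError(f"invalid step id: {step_id!r}")
        if step_id in seen:
            raise ValueError(f"duplicate step id: {step_id!r}")
        seen.add(step_id)


validate_step_ids(["prepare_data", "train-model", "export"])  # passes silently
```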

ParameterValidation

Configuration for parameter validation.

.. attribute:: regex

Validation based on regular expressions.

:type: google.cloud.dataproc_v1.types.RegexValidation

PigJob

A Dataproc job for running `Apache Pig <https://pig.apache.org/>`__ queries on YARN.

PrestoJob

A Dataproc job for running `Presto <https://prestosql.io/>`__ queries. IMPORTANT: To submit a Presto job to a cluster, the `Dataproc Presto Optional Component <https://cloud.google.com/dataproc/docs/concepts/components/presto>`__ must be enabled when the cluster is created.

PySparkJob

A Dataproc job for running `Apache PySpark <https://spark.apache.org/docs/0.9.0/python-programming-guide.html>`__ applications on YARN.

QueryList

A list of queries to run on a cluster.

.. attribute:: queries

Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob:

::

   "hiveJob": {
     "queryList": {
       "queries": [
         "query1",
         "query2",
         "query3;query4"
       ]
     }
   }

:type: Sequence[str]
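For comparison, here is a ``hiveJob`` payload of the same shape built as a Python dictionary and serialized to JSON. It mirrors the REST field names from the snippet above; with the Python client you would instead set ``query_list`` on a ``HiveJob`` message. The third string carries two queries separated by a semicolon:

```python
import json

hive_job = {
    "hiveJob": {
        "queryList": {
            # Each list element is one query string; "query3;query4" holds two queries.
            "queries": ["query1", "query2", "query3;query4"],
        }
    }
}

payload = json.dumps(hive_job, indent=2)
print(payload)
```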

RegexValidation

Validation based on regular expressions.

.. attribute:: regexes

Required. RE2 regular expressions used to validate the parameter's value. The value must match the regex in its entirety (substring matches are not sufficient).

:type: Sequence[str]
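The difference between a substring match and the full match that ``regexes`` requires can be shown with Python's ``re`` module. Note that ``re`` is not RE2; the two agree on simple patterns like these, but RE2 rejects some constructs ``re`` accepts (such as backreferences). The helper ``value_matches`` is hypothetical and assumes a value is accepted if it fully matches any one of the listed expressions, which the description above does not state explicitly:

```python
import re


def value_matches(value: str, regexes) -> bool:
    """Return True if the value matches at least one pattern in its entirety."""
    return any(re.fullmatch(pattern, value) for pattern in regexes)


print(bool(re.search(r"[a-z]+", "abc123")))  # a substring match is found
print(value_matches("abc123", [r"[a-z]+"]))  # but the whole value does not match
```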

ReservationAffinity

Reservation Affinity for consuming a zonal reservation.

.. attribute:: consume_reservation_type

Optional. Type of reservation to consume.

:type: google.cloud.dataproc_v1.types.ReservationAffinity.Type

SecurityConfig

Security-related configuration, including encryption, Kerberos, etc.

ShieldedInstanceConfig

Shielded Instance Config for clusters using `Compute Engine Shielded VMs <https://cloud.google.com/security/shielded-cloud/shielded-vm>`__.

SoftwareConfig

Specifies the selection and config of software inside the cluster.

SparkJob

A Dataproc job for running `Apache Spark <http://spark.apache.org/>`__ applications on YARN.

SparkRJob

A Dataproc job for running `Apache SparkR <https://spark.apache.org/docs/latest/sparkr.html>`__ applications on YARN.

SparkSqlJob

A Dataproc job for running `Apache Spark SQL <http://spark.apache.org/sql/>`__ queries.

StartClusterRequest

A request to start a cluster.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the cluster belongs to.

:type: str

StopClusterRequest

A request to stop a cluster.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the cluster belongs to.

:type: str

SubmitJobRequest

A request to submit a job.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the job belongs to.

:type: str

TemplateParameter

A configurable parameter that replaces one or more fields in the template. Parameterizable fields:

  • Labels
  • File URIs
  • Job properties
  • Job arguments
  • Script variables
  • Main class (in HadoopJob and SparkJob)
  • Zone (in ClusterSelector)

UpdateAutoscalingPolicyRequest

A request to update an autoscaling policy.

.. attribute:: policy

Required. The updated autoscaling policy.

:type: google.cloud.dataproc_v1.types.AutoscalingPolicy

UpdateClusterRequest

A request to update a cluster.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the cluster belongs to.

:type: str

UpdateJobRequest

A request to update a job.

.. attribute:: project_id

Required. The ID of the Google Cloud Platform project that the job belongs to.

:type: str

UpdateWorkflowTemplateRequest

A request to update a workflow template.

.. attribute:: template

Required. The updated workflow template.

The template.version field must match the current version.

:type: google.cloud.dataproc_v1.types.WorkflowTemplate
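The version requirement is an optimistic-concurrency check: fetch the template, modify it, and send it back with the version you read. The in-memory ``TemplateStore`` below is a hypothetical stand-in for the service, and its rule of incrementing the version on every successful update is an assumption of the sketch:

```python
from dataclasses import dataclass


@dataclass
class Template:
    id: str
    version: int
    body: dict


class TemplateStore:
    """Hypothetical in-memory stand-in for the service that enforces the version check."""

    def __init__(self):
        self._templates = {}

    def create(self, template_id: str, body: dict) -> Template:
        template = Template(template_id, 1, body)
        self._templates[template_id] = template
        return template

    def get(self, template_id: str) -> Template:
        return self._templates[template_id]

    def update(self, template: Template) -> Template:
        current = self._templates[template.id]
        # Reject writes based on a stale read, as the template.version rule requires.
        if template.version != current.version:
            raise ValueError(f"version mismatch: sent {template.version}, current {current.version}")
        updated = Template(template.id, current.version + 1, template.body)
        self._templates[template.id] = updated
        return updated


store = TemplateStore()
store.create("etl-template", {"jobs": []})
read = store.get("etl-template")
store.update(Template(read.id, read.version, {"jobs": ["prepare"]}))  # succeeds
```

A second update that still carries the old ``read.version`` would now fail, which is the point of the check: the caller must fetch the template again before retrying.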

ValueValidation

Validation based on a list of allowed values.

.. attribute:: values

Required. List of allowed values for the parameter.

:type: Sequence[str]

WorkflowGraph

The workflow graph.

.. attribute:: nodes

Output only. The workflow nodes.

:type: Sequence[google.cloud.dataproc_v1.types.WorkflowNode]

WorkflowMetadata

Metadata describing a workflow instantiated from a Dataproc workflow template.

.. attribute:: template

Output only. The resource name of the workflow template as described in https://cloud.google.com/apis/design/resource_names.

  • For projects.regions.workflowTemplates, the resource name of the template has the following format: projects/{project_id}/regions/{region}/workflowTemplates/{template_id}

  • For projects.locations.workflowTemplates, the resource name of the template has the following format: projects/{project_id}/locations/{location}/workflowTemplates/{template_id}

:type: str

WorkflowNode

The workflow node.

.. attribute:: step_id

Output only. The name of the node.

:type: str

WorkflowTemplate

A Dataproc workflow template resource.

.. attribute:: id

:type: str

WorkflowTemplatePlacement

Specifies workflow execution target.

Either managed_cluster or cluster_selector is required.
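Since exactly one placement option applies, a client-side check can be sketched over plain dictionaries. The dictionary shape and the ``check_placement`` helper are hypothetical; in the proto definition the two fields form a ``oneof``:

```python
def check_placement(placement: dict) -> str:
    """Return which placement option is set; require exactly one of the two."""
    chosen = [
        key for key in ("managed_cluster", "cluster_selector")
        if placement.get(key) is not None
    ]
    if len(chosen) != 1:
        raise ValueError("exactly one of managed_cluster or cluster_selector must be set")
    return chosen[0]


print(check_placement({"cluster_selector": {"cluster_labels": {"env": "staging"}}}))
```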

YarnApplication

A YARN application created by a job. Application information is a subset of org.apache.hadoop.yarn.proto.YarnProtos.ApplicationReportProto.

Beta Feature: This report is available for testing purposes only. It may be changed before final release.