Class WorkerPool (0.8.6rc0)

WorkerPool(mapping=None, *, ignore_unknown_fields=False, **kwargs)

Describes one particular pool of Cloud Dataflow workers to be instantiated by the Cloud Dataflow service in order to perform the computations required by a job. A workflow job may use multiple pools to match the differing computational requirements of its stages.

Attributes

Name Description
kind str
The kind of the worker pool; currently only harness and shuffle are supported.
num_workers int
Number of Google Compute Engine workers in this pool needed to execute the job. If zero or unspecified, the service will attempt to choose a reasonable default.
packages MutableSequence[google.cloud.dataflow_v1beta3.types.Package]
Packages to be installed on workers.
default_package_set google.cloud.dataflow_v1beta3.types.DefaultPackageSet
The default package set to install. This allows the service to select a default set of packages which are useful to worker harnesses written in a particular language.
machine_type str
Machine type (e.g. "n1-standard-1"). If empty or unspecified, the service will attempt to choose a reasonable default.
teardown_policy google.cloud.dataflow_v1beta3.types.TeardownPolicy
Sets the policy for determining when to tear down the worker pool. Allowed values are: TEARDOWN_ALWAYS, TEARDOWN_ON_SUCCESS, and TEARDOWN_NEVER. TEARDOWN_ALWAYS means workers are always torn down regardless of whether the job succeeds. TEARDOWN_ON_SUCCESS means workers are torn down only if the job succeeds. TEARDOWN_NEVER means the workers are never torn down. If the workers are not torn down by the service, they will continue to run and use Google Compute Engine VM resources in the user's project until they are explicitly terminated by the user. Because of this, Google recommends using the TEARDOWN_ALWAYS policy except for small, manually supervised test jobs. If unknown or unspecified, the service will attempt to choose a reasonable default.
disk_size_gb int
Size of root disk for VMs, in GB. If zero or unspecified, the service will attempt to choose a reasonable default.
disk_type str
Type of root disk for VMs. If empty or unspecified, the service will attempt to choose a reasonable default.
disk_source_image str
Fully qualified source image for disks.
zone str
Zone to run the worker pools in. If empty or unspecified, the service will attempt to choose a reasonable default.
taskrunner_settings google.cloud.dataflow_v1beta3.types.TaskRunnerSettings
Settings passed through to Google Compute Engine workers when using the standard Dataflow task runner. Users should ignore this field.
on_host_maintenance str
The action to take on host maintenance, as defined by the Google Compute Engine API.
data_disks MutableSequence[google.cloud.dataflow_v1beta3.types.Disk]
Data disks that are used by a VM in this workflow.
metadata MutableMapping[str, str]
Metadata to set on the Google Compute Engine VMs.
autoscaling_settings google.cloud.dataflow_v1beta3.types.AutoscalingSettings
Settings for autoscaling of this WorkerPool.
pool_args google.protobuf.any_pb2.Any
Extra arguments for this worker pool.
network str
Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default".
subnetwork str
Subnetwork to which VMs will be assigned, if desired. Expected to be of the form "regions/REGION/subnetworks/SUBNETWORK".
worker_harness_container_image str
Required. Docker container image that executes the Cloud Dataflow worker harness, residing in Google Container Registry. Deprecated for the Fn API path. Use sdk_harness_container_images instead.
num_threads_per_worker int
The number of threads per worker harness. If empty or unspecified, the service will choose a number of threads (according to the number of cores on the selected machine type for batch, or 1 by convention for streaming).
ip_configuration google.cloud.dataflow_v1beta3.types.WorkerIPAddressConfiguration
Configuration for VM IPs.
sdk_harness_container_images MutableSequence[google.cloud.dataflow_v1beta3.types.SdkHarnessContainerImage]
Set of SDK harness containers needed to execute this pipeline. This will only be set in the Fn API path. For non-cross-language pipelines this should have only one entry. Cross-language pipelines will have two or more entries.
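
The following is a minimal sketch, not part of the library, of how a few of the attributes above might be assembled and checked before being passed to the constructor. The names Teardown, workers_torn_down, build_worker_pool_fields, and to_worker_pool are hypothetical helpers introduced for illustration, Teardown is a local stand-in for google.cloud.dataflow_v1beta3.types.TeardownPolicy, and the regular expression is an assumed reading of the "regions/REGION/subnetworks/SUBNETWORK" form. The last helper assumes the google-cloud-dataflow-client package is installed.

```python
import re
from enum import Enum


class Teardown(Enum):
    """Local stand-in mirroring the three documented policy names."""

    TEARDOWN_ALWAYS = "TEARDOWN_ALWAYS"
    TEARDOWN_ON_SUCCESS = "TEARDOWN_ON_SUCCESS"
    TEARDOWN_NEVER = "TEARDOWN_NEVER"


def workers_torn_down(policy, job_succeeded):
    """Return True if workers are torn down under `policy`, per the text above."""
    if policy is Teardown.TEARDOWN_ALWAYS:
        return True
    if policy is Teardown.TEARDOWN_ON_SUCCESS:
        return job_succeeded
    return False  # TEARDOWN_NEVER: workers keep running until terminated by the user


# Assumed pattern for the documented "regions/REGION/subnetworks/SUBNETWORK" form.
_SUBNETWORK_RE = re.compile(r"^regions/[^/]+/subnetworks/[^/]+$")


def build_worker_pool_fields(kind="harness", num_workers=0, machine_type="",
                             network="", subnetwork="", metadata=None):
    """Build a plain dict of WorkerPool field values (hypothetical helper)."""
    if subnetwork and not _SUBNETWORK_RE.match(subnetwork):
        raise ValueError(
            "subnetwork should look like 'regions/REGION/subnetworks/SUBNETWORK'"
        )
    candidate = {
        "kind": kind,
        "num_workers": num_workers,
        "machine_type": machine_type,
        "network": network,
        "subnetwork": subnetwork,
        # metadata is documented as MutableMapping[str, str], so coerce to str.
        "metadata": {str(k): str(v) for k, v in (metadata or {}).items()},
    }
    # Zero or empty values are left out so the service can choose its defaults.
    return {name: value for name, value in candidate.items() if value}


def to_worker_pool(fields):
    """Turn the dict into a message; needs google-cloud-dataflow-client installed."""
    from google.cloud.dataflow_v1beta3.types import WorkerPool

    return WorkerPool(mapping=fields)
```

For instance, build_worker_pool_fields(machine_type="n1-standard-1", num_workers=3) yields a dict with only kind, num_workers, and machine_type set. Every other field stays unspecified, which per the descriptions above lets the service pick a reasonable default.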

Classes

MetadataEntry

MetadataEntry(mapping=None, *, ignore_unknown_fields=False, **kwargs)

The abstract base class for a message.

Parameters
Name Description
kwargs dict
Keys and values corresponding to the fields of the message.
mapping Union[dict, .Message]
A dictionary or message to be used to determine the values for this message.
ignore_unknown_fields Optional[bool]
If True, do not raise errors for unknown fields. Only applied if mapping is a mapping type or there are keyword parameters.