API documentation for the ``dataflow_v1beta3.types`` package.
Classes
AutoscalingAlgorithm
Specifies the algorithm used to determine the number of worker processes to run at any given point in time, based on the amount of data left to process, the number of workers, and how quickly existing workers are processing data.
Values:
    AUTOSCALING_ALGORITHM_UNKNOWN (0):
        The algorithm is unknown, or unspecified.
    AUTOSCALING_ALGORITHM_NONE (1):
        Disable autoscaling.
    AUTOSCALING_ALGORITHM_BASIC (2):
        Increase worker count over time to reduce job execution time.
AutoscalingEvent
A structured message reporting an autoscaling decision made by the Dataflow service.
AutoscalingSettings
Settings for WorkerPool autoscaling.
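As an illustrative sketch (not part of the generated reference), the message below configures basic autoscaling with an upper bound on workers; the ``algorithm`` and ``max_num_workers`` field names are assumptions about the generated message, not confirmed by this listing.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    # Assumed field names: `algorithm` and `max_num_workers`.
    settings = dataflow_v1beta3.AutoscalingSettings(
        algorithm=dataflow_v1beta3.AutoscalingAlgorithm.AUTOSCALING_ALGORITHM_BASIC,
        max_num_workers=10,
    )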
BigQueryIODetails
Metadata for a BigQuery connector used by the job.
BigTableIODetails
Metadata for a Cloud Bigtable connector used by the job.
CheckActiveJobsRequest
Request to check whether active jobs exist for a project.
CheckActiveJobsResponse
Response for CheckActiveJobsRequest.
ComputationTopology
All configuration data for a particular Computation.
ContainerSpec
Container Spec.
CreateJobFromTemplateRequest
A request to create a Cloud Dataflow job from a template.
This message has `oneof`_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
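A minimal sketch of building this request and submitting it, assuming the generated ``TemplatesServiceClient`` and the field names shown (``gcs_path``, ``parameters``, ``environment``); the bucket and template paths are placeholders.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    request = dataflow_v1beta3.CreateJobFromTemplateRequest(
        project_id="my-project",
        location="us-central1",
        job_name="wordcount-from-template",
        gcs_path="gs://dataflow-templates/latest/Word_Count",
        parameters={
            "inputFile": "gs://my-bucket/input.txt",
            "output": "gs://my-bucket/output",
        },
        environment=dataflow_v1beta3.RuntimeEnvironment(
            temp_location="gs://my-bucket/temp",  # assumed field names
            max_workers=5,
        ),
    )

    client = dataflow_v1beta3.TemplatesServiceClient()
    job = client.create_job_from_template(request=request)
    print(job.id, job.current_state)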
CreateJobRequest
Request to create a Cloud Dataflow job.
CustomSourceLocation
Identifies the location of a custom source.
DataDiskAssignment
Data disk assignment for a given VM instance.
DatastoreIODetails
Metadata for a Datastore connector used by the job.
DebugOptions
Describes any options that have an effect on the debugging of pipelines.
DefaultPackageSet
The default set of packages to be staged on a pool of workers.
Values:
    DEFAULT_PACKAGE_SET_UNKNOWN (0):
        The default set of packages to stage is unknown, or unspecified.
    DEFAULT_PACKAGE_SET_NONE (1):
        Indicates that no packages should be staged at the worker unless explicitly specified by the job.
    DEFAULT_PACKAGE_SET_JAVA (2):
        Stage packages typically useful to workers written in Java.
    DEFAULT_PACKAGE_SET_PYTHON (3):
        Stage packages typically useful to workers written in Python.
DeleteSnapshotRequest
Request to delete a snapshot.
DeleteSnapshotResponse
Response from deleting a snapshot.
Disk
Describes the data disk used by a workflow job.
DisplayData
Data provided with a pipeline or transform to provide descriptive info.
This message has `oneof`_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
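A short sketch of the oneof behaviour described above, using ``DisplayData``; the ``str_value`` and ``int64_value`` member names are assumptions about the value oneof.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    # `str_value` and `int64_value` are assumed members of the value oneof.
    data = dataflow_v1beta3.DisplayData(key="inputFile", str_value="gs://my-bucket/in.txt")

    # Setting another member of the same oneof clears the one set above.
    data.int64_value = 42
    assert not data.str_value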
DynamicTemplateLaunchParams
Params which should be passed when launching a dynamic template.
Environment
Describes the environment in which a Dataflow Job runs.
ExecutionStageState
A message describing the state of a particular execution stage.
ExecutionStageSummary
Description of the composing transforms, names/ids, and input/outputs of a stage of execution. Some composing transforms and sources may have been generated by the Dataflow service during execution planning.
ExecutionState
The state of some component of job execution.
Values:
    EXECUTION_STATE_UNKNOWN (0):
        The component state is unknown or unspecified.
    EXECUTION_STATE_NOT_STARTED (1):
        The component is not yet running.
    EXECUTION_STATE_RUNNING (2):
        The component is currently running.
    EXECUTION_STATE_SUCCEEDED (3):
        The component succeeded.
    EXECUTION_STATE_FAILED (4):
        The component failed.
    EXECUTION_STATE_CANCELLED (5):
        Execution of the component was cancelled.
FailedLocation
Indicates which regional endpoint failed to respond to a request for data.
FileIODetails
Metadata for a File connector used by the job.
FlexResourceSchedulingGoal
Specifies the resource to optimize for in Flexible Resource Scheduling.
Values:
    FLEXRS_UNSPECIFIED (0):
        Run in the default mode.
    FLEXRS_SPEED_OPTIMIZED (1):
        Optimize for lower execution time.
    FLEXRS_COST_OPTIMIZED (2):
        Optimize for lower cost.
FlexTemplateRuntimeEnvironment
The environment values to be set at runtime for a Flex Template.
GetJobExecutionDetailsRequest
Request to get job execution details.
GetJobMetricsRequest
Request to get job metrics.
GetJobRequest
Request to get the state of a Cloud Dataflow job.
GetSnapshotRequest
Request to get information about a snapshot.
GetStageExecutionDetailsRequest
Request to get information about a particular execution stage of a job. Currently only tracked for Batch jobs.
GetTemplateRequest
A request to retrieve a Cloud Dataflow job template.
This message has `oneof`_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
GetTemplateResponse
The response to a GetTemplate request.
InvalidTemplateParameters
Used in the error_details field of a google.rpc.Status message, this indicates problems with the template parameter.
Job
Defines a job to be run by the Cloud Dataflow service.
JobExecutionDetails
Information about the execution of a job.
JobExecutionInfo
Additional information about how a Cloud Dataflow job will be executed that isn't contained in the submitted job.
JobExecutionStageInfo
Contains information about how a particular [google.dataflow.v1beta3.Step][google.dataflow.v1beta3.Step] will be executed.
JobMessage
A particular message pertaining to a Dataflow job.
JobMessageImportance
Indicates the importance of the message.
Values:
    JOB_MESSAGE_IMPORTANCE_UNKNOWN (0):
        The message importance isn't specified, or is unknown.
    JOB_MESSAGE_DEBUG (1):
        The message is at the 'debug' level: typically only useful for software engineers working on the code the job is running. Typically, Dataflow pipeline runners do not display log messages at this level by default.
    JOB_MESSAGE_DETAILED (2):
        The message is at the 'detailed' level: somewhat verbose, but potentially useful to users. Typically, Dataflow pipeline runners do not display log messages at this level by default. These messages are displayed by default in the Dataflow monitoring UI.
    JOB_MESSAGE_BASIC (5):
        The message is at the 'basic' level: useful for keeping track of the execution of a Dataflow pipeline. Typically, Dataflow pipeline runners display log messages at this level by default, and these messages are displayed by default in the Dataflow monitoring UI.
    JOB_MESSAGE_WARNING (3):
        The message is at the 'warning' level: indicating a condition pertaining to a job which may require human intervention. Typically, Dataflow pipeline runners display log messages at this level by default, and these messages are displayed by default in the Dataflow monitoring UI.
    JOB_MESSAGE_ERROR (4):
        The message is at the 'error' level: indicating a condition preventing a job from succeeding. Typically, Dataflow pipeline runners display log messages at this level by default, and these messages are displayed by default in the Dataflow monitoring UI.
JobMetadata
Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.
JobMetrics
JobMetrics contains a collection of metrics describing the detailed progress of a Dataflow job. Metrics correspond to user-defined and system-defined metrics in the job.
This resource captures only the most recent values of each metric; time-series data can be queried for them (under the same metric names) from Cloud Monitoring.
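A hedged sketch of fetching these metrics, assuming a generated ``MetricsV1Beta3Client`` with a ``get_job_metrics`` method and the field names shown; the job ID is a placeholder.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    client = dataflow_v1beta3.MetricsV1Beta3Client()
    request = dataflow_v1beta3.GetJobMetricsRequest(
        project_id="my-project",
        location="us-central1",
        job_id="2023-01-01_00_00_00-1234567890",  # hypothetical job ID
    )
    metrics = client.get_job_metrics(request=request)
    for update in metrics.metrics:  # each entry is assumed to be a MetricUpdate
        print(update.name.name, update.scalar)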
JobState
Describes the overall state of a [google.dataflow.v1beta3.Job][google.dataflow.v1beta3.Job].
Values:
    JOB_STATE_UNKNOWN (0):
        The job's run state isn't specified.
    JOB_STATE_STOPPED (1):
        JOB_STATE_STOPPED indicates that the job has not yet started to run.
    JOB_STATE_RUNNING (2):
        JOB_STATE_RUNNING indicates that the job is currently running.
    JOB_STATE_DONE (3):
        JOB_STATE_DONE indicates that the job has successfully completed. This is a terminal job state. This state may be set by the Cloud Dataflow service, as a transition from JOB_STATE_RUNNING. It may also be set via a Cloud Dataflow UpdateJob call, if the job has not yet reached a terminal state.
    JOB_STATE_FAILED (4):
        JOB_STATE_FAILED indicates that the job has failed. This is a terminal job state. This state may only be set by the Cloud Dataflow service, and only as a transition from JOB_STATE_RUNNING.
    JOB_STATE_CANCELLED (5):
        JOB_STATE_CANCELLED indicates that the job has been explicitly cancelled. This is a terminal job state. This state may only be set via a Cloud Dataflow UpdateJob call, and only if the job has not yet reached another terminal state.
    JOB_STATE_UPDATED (6):
        JOB_STATE_UPDATED indicates that the job was successfully updated, meaning that this job was stopped and another job was started, inheriting state from this one. This is a terminal job state. This state may only be set by the Cloud Dataflow service, and only as a transition from JOB_STATE_RUNNING.
    JOB_STATE_DRAINING (7):
        JOB_STATE_DRAINING indicates that the job is in the process of draining. A draining job has stopped pulling from its input sources and is processing any data that remains in-flight. This state may be set via a Cloud Dataflow UpdateJob call, but only as a transition from JOB_STATE_RUNNING. Jobs that are draining may only transition to JOB_STATE_DRAINED, JOB_STATE_CANCELLED, or JOB_STATE_FAILED.
    JOB_STATE_DRAINED (8):
        JOB_STATE_DRAINED indicates that the job has been drained. A drained job terminated by stopping pulling from its input sources and processing any data that remained in-flight when draining was requested. This state is a terminal state, may only be set by the Cloud Dataflow service, and only as a transition from JOB_STATE_DRAINING.
    JOB_STATE_PENDING (9):
        JOB_STATE_PENDING indicates that the job has been created but is not yet running. Jobs that are pending may only transition to JOB_STATE_RUNNING, or JOB_STATE_FAILED.
    JOB_STATE_CANCELLING (10):
        JOB_STATE_CANCELLING indicates that the job has been explicitly cancelled and is in the process of stopping. Jobs that are cancelling may only transition to JOB_STATE_CANCELLED or JOB_STATE_FAILED.
    JOB_STATE_QUEUED (11):
        JOB_STATE_QUEUED indicates that the job has been created but is being delayed until launch. Jobs that are queued may only transition to JOB_STATE_PENDING or JOB_STATE_CANCELLED.
    JOB_STATE_RESOURCE_CLEANING_UP (12):
        JOB_STATE_RESOURCE_CLEANING_UP indicates that the batch job's associated resources are currently being cleaned up after a successful run. Currently, this is an opt-in feature; please reach out to the Cloud support team if you are interested.
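A sketch of polling a job until it reaches one of the terminal states listed above; the ``JobsV1Beta3Client`` client, its ``get_job`` method, and the ``current_state`` field are assumptions based on the generated client.

.. code-block:: python

    import time

    from google.cloud import dataflow_v1beta3

    # Terminal states as described above.
    TERMINAL_STATES = {
        dataflow_v1beta3.JobState.JOB_STATE_DONE,
        dataflow_v1beta3.JobState.JOB_STATE_FAILED,
        dataflow_v1beta3.JobState.JOB_STATE_CANCELLED,
        dataflow_v1beta3.JobState.JOB_STATE_UPDATED,
        dataflow_v1beta3.JobState.JOB_STATE_DRAINED,
    }

    def wait_for_terminal_state(client, project_id, location, job_id, poll_seconds=30):
        """Hypothetical helper: poll get_job until the job reaches a terminal state."""
        while True:
            job = client.get_job(request=dataflow_v1beta3.GetJobRequest(
                project_id=project_id, location=location, job_id=job_id))
            if job.current_state in TERMINAL_STATES:
                return job.current_state
            time.sleep(poll_seconds)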
JobType
Specifies the processing model used by a [google.dataflow.v1beta3.Job], which determines the way the Job is managed by the Cloud Dataflow service (how workers are scheduled, how inputs are sharded, etc).
Values:
    JOB_TYPE_UNKNOWN (0):
        The type of the job is unspecified, or unknown.
    JOB_TYPE_BATCH (1):
        A batch job with a well-defined end point: data is read, data is processed, data is written, and the job is done.
    JOB_TYPE_STREAMING (2):
        A continuously streaming job with no end: data is read, processed, and written continuously.
JobView
Selector for how much information is returned in Job responses.
Values:
    JOB_VIEW_UNKNOWN (0):
        The job view to return isn't specified, or is unknown. Responses will contain at least the JOB_VIEW_SUMMARY information, and may contain additional information.
    JOB_VIEW_SUMMARY (1):
        Request summary information only: Project ID, Job ID, job name, job type, job status, start/end time, and Cloud SDK version details.
    JOB_VIEW_ALL (2):
        Request all information available for this job.
    JOB_VIEW_DESCRIPTION (3):
        Request summary info and limited job description data for steps, labels and environment.
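A minimal sketch of requesting the full job view with a ``GetJobRequest``; the client class and field names are assumptions, and the job ID is a placeholder.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    client = dataflow_v1beta3.JobsV1Beta3Client()
    request = dataflow_v1beta3.GetJobRequest(
        project_id="my-project",
        location="us-central1",
        job_id="2023-01-01_00_00_00-1234567890",  # hypothetical job ID
        view=dataflow_v1beta3.JobView.JOB_VIEW_ALL,
    )
    job = client.get_job(request=request)
    print(job.name, job.current_state)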
KeyRangeDataDiskAssignment
Data disk assignment information for a specific key-range of a sharded computation. Currently we only support UTF-8 character splits to simplify encoding into JSON.
KeyRangeLocation
Location information for a specific key-range of a sharded computation. Currently we only support UTF-8 character splits to simplify encoding into JSON.
KindType
Type of transform or stage operation.
Values:
    UNKNOWN_KIND (0):
        Unrecognized transform type.
    PAR_DO_KIND (1):
        ParDo transform.
    GROUP_BY_KEY_KIND (2):
        Group By Key transform.
    FLATTEN_KIND (3):
        Flatten transform.
    READ_KIND (4):
        Read transform.
    WRITE_KIND (5):
        Write transform.
    CONSTANT_KIND (6):
        Constructs from a constant value, such as with Create.of.
    SINGLETON_KIND (7):
        Creates a Singleton view of a collection.
    SHUFFLE_KIND (8):
        Opening or closing a shuffle session, often as part of a GroupByKey.
LaunchFlexTemplateParameter
Launch FlexTemplate Parameter.
This message has `oneof`_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
LaunchFlexTemplateRequest
A request to launch a Cloud Dataflow job from a FlexTemplate.
LaunchFlexTemplateResponse
Response to the request to launch a job from Flex Template.
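A hedged sketch of launching a job from a Flex Template, assuming a generated ``FlexTemplatesServiceClient`` and the field names shown; the container spec path and parameters are placeholders.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    client = dataflow_v1beta3.FlexTemplatesServiceClient()
    request = dataflow_v1beta3.LaunchFlexTemplateRequest(
        project_id="my-project",
        location="us-central1",
        launch_parameter=dataflow_v1beta3.LaunchFlexTemplateParameter(
            job_name="my-flex-job",
            # `container_spec_gcs_path` is assumed to be one member of the
            # template oneof; an inline `container_spec` message is the other.
            container_spec_gcs_path="gs://my-bucket/templates/my_template.json",
            parameters={"input": "gs://my-bucket/input.txt"},
        ),
    )
    response = client.launch_flex_template(request=request)
    print(response.job.id)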
LaunchTemplateParameters
Parameters to provide to the template being launched.
LaunchTemplateRequest
A request to launch a template.
This message has `oneof`_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
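A minimal sketch of launching a classic template with this request, assuming a generated ``TemplatesServiceClient``; ``gcs_path`` is taken to be one member of the template oneof noted above, and the paths and parameters are placeholders.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    client = dataflow_v1beta3.TemplatesServiceClient()
    request = dataflow_v1beta3.LaunchTemplateRequest(
        project_id="my-project",
        location="us-central1",
        gcs_path="gs://dataflow-templates/latest/Word_Count",  # assumed oneof member
        launch_parameters=dataflow_v1beta3.LaunchTemplateParameters(
            job_name="wordcount-launch",
            parameters={
                "inputFile": "gs://my-bucket/input.txt",
                "output": "gs://my-bucket/output",
            },
        ),
    )
    response = client.launch_template(request=request)
    print(response.job.id)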
LaunchTemplateResponse
Response to the request to launch a template.
ListJobMessagesRequest
Request to list job messages. Up to max_results messages will be returned in the time range specified, starting with the oldest messages first. If no time range is specified, the results will start with the oldest message.
ListJobMessagesResponse
Response to a request to list job messages.
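A short sketch of listing messages at or above a minimum importance, assuming a generated ``MessagesV1Beta3Client`` whose ``list_job_messages`` method returns an iterable pager; field names and the job ID are assumptions or placeholders.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    client = dataflow_v1beta3.MessagesV1Beta3Client()
    request = dataflow_v1beta3.ListJobMessagesRequest(
        project_id="my-project",
        location="us-central1",
        job_id="2023-01-01_00_00_00-1234567890",  # hypothetical job ID
        minimum_importance=dataflow_v1beta3.JobMessageImportance.JOB_MESSAGE_WARNING,
    )
    for message in client.list_job_messages(request=request):
        print(message.time, message.message_text)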
ListJobsRequest
Request to list Cloud Dataflow jobs.
ListJobsResponse
Response to a request to list Cloud Dataflow jobs in a project. This might be a partial response, depending on the page size in the ListJobsRequest. However, if the project does not have any jobs, an instance of ListJobsResponse is not returned and the request's response body is empty {}.
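A minimal sketch of listing jobs with a page size, assuming the generated ``JobsV1Beta3Client`` returns a pager that follows page tokens automatically; field names are assumptions.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    client = dataflow_v1beta3.JobsV1Beta3Client()
    request = dataflow_v1beta3.ListJobsRequest(
        project_id="my-project",
        location="us-central1",
        page_size=50,
    )
    # The pager is assumed to fetch subsequent pages as you iterate.
    for job in client.list_jobs(request=request):
        print(job.id, job.name, job.current_state)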
ListSnapshotsRequest
Request to list snapshots.
ListSnapshotsResponse
List of snapshots.
MetricStructuredName
Identifies a metric, by describing the source which generated the metric.
MetricUpdate
Describes the state of a metric.
MountedDataDisk
Describes a mounted data disk.
Package
The packages that must be installed in order for a worker to run the steps of the Cloud Dataflow job that will be assigned to its worker pool.
This is the mechanism by which the Cloud Dataflow SDK causes code to be loaded onto the workers. For example, the Cloud Dataflow Java SDK might use this to install jars containing the user's code and all of the various dependencies (libraries, data files, etc.) required in order for that code to run.
ParameterMetadata
Metadata for a specific parameter.
ParameterType
ParameterType specifies what kind of input we need for this parameter.
Values:
    DEFAULT (0):
        Default input type.
    TEXT (1):
        The parameter specifies generic text input.
    GCS_READ_BUCKET (2):
        The parameter specifies a Cloud Storage Bucket to read from.
    GCS_WRITE_BUCKET (3):
        The parameter specifies a Cloud Storage Bucket to write to.
    GCS_READ_FILE (4):
        The parameter specifies a Cloud Storage file path to read from.
    GCS_WRITE_FILE (5):
        The parameter specifies a Cloud Storage file path to write to.
    GCS_READ_FOLDER (6):
        The parameter specifies a Cloud Storage folder path to read from.
    GCS_WRITE_FOLDER (7):
        The parameter specifies a Cloud Storage folder to write to.
    PUBSUB_TOPIC (8):
        The parameter specifies a Pub/Sub Topic.
    PUBSUB_SUBSCRIPTION (9):
        The parameter specifies a Pub/Sub Subscription.
PipelineDescription
A descriptive representation of submitted pipeline as well as the executed form. This data is provided by the Dataflow service for ease of visualizing the pipeline and interpreting Dataflow provided metrics.
ProgressTimeseries
Information about the progress of some component of job execution.
PubSubIODetails
Metadata for a Pub/Sub connector used by the job.
PubsubLocation
Identifies a pubsub location to use for transferring data into or out of a streaming Dataflow job.
PubsubSnapshotMetadata
Represents a Pubsub snapshot.
RuntimeEnvironment
The environment values to set at runtime.
RuntimeMetadata
RuntimeMetadata describing a runtime environment.
SDKInfo
SDK Information.
SdkHarnessContainerImage
Defines an SDK harness container for executing Dataflow pipelines.
SdkVersion
The version of the SDK used to run the job.
ShuffleMode
Specifies the shuffle mode used by a [google.dataflow.v1beta3.Job], which determines how data is shuffled during processing. More details in: https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-shuffle
Values:
    SHUFFLE_MODE_UNSPECIFIED (0):
        Shuffle mode information is not available.
    VM_BASED (1):
        Shuffle is done on the worker VMs.
    SERVICE_BASED (2):
        Shuffle is done on the service side.
Snapshot
Represents a snapshot of a job.
SnapshotJobRequest
Request to create a snapshot of a job.
SnapshotState
Snapshot state.
Values:
    UNKNOWN_SNAPSHOT_STATE (0):
        Unknown state.
    PENDING (1):
        Snapshot intent to create has been persisted, snapshotting of state has not yet started.
    RUNNING (2):
        Snapshotting is being performed.
    READY (3):
        Snapshot has been created and is ready to be used.
    FAILED (4):
        Snapshot failed to be created.
    DELETED (5):
        Snapshot has been deleted.
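A hedged sketch of snapshotting a job and inspecting the snapshot state, assuming a ``snapshot_job`` method on the generated ``JobsV1Beta3Client``; field names and the job ID are assumptions or placeholders.

.. code-block:: python

    from google.cloud import dataflow_v1beta3

    jobs_client = dataflow_v1beta3.JobsV1Beta3Client()
    snapshot = jobs_client.snapshot_job(request=dataflow_v1beta3.SnapshotJobRequest(
        project_id="my-project",
        location="us-central1",
        job_id="2023-01-01_00_00_00-1234567890",  # hypothetical job ID
        description="nightly snapshot",
    ))
    # A fresh snapshot typically reports PENDING until snapshotting completes.
    print(snapshot.id, snapshot.state)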
SpannerIODetails
Metadata for a Spanner connector used by the job.
StageExecutionDetails
Information about the workers and work items within a stage.
StageSummary
Information about a particular execution stage of a job.
StateFamilyConfig
State family configuration.
Step
Defines a particular step within a Cloud Dataflow job.
A job consists of multiple steps, each of which performs some specific operation as part of the overall job. Data is typically passed from one step to another as part of the job.
Here's an example of a sequence of steps which together implement a Map-Reduce job:
Read a collection of data from some source, parsing the collection's elements.
Validate the elements.
Apply a user-defined function to map each element to some value and extract an element-specific key value.
Group elements with the same key into a single element with that key, transforming a multiply-keyed collection into a uniquely-keyed collection.
Write the elements out to some data sink.
Note that the Cloud Dataflow service may be used to run many different types of jobs, not just Map-Reduce.
StreamLocation
Describes a stream of data, either as input to be processed or as output of a streaming Dataflow job.
This message has `oneof`_ fields (mutually exclusive fields). For each oneof, at most one member field can be set at the same time. Setting any member of the oneof automatically clears all other members.
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
StreamingApplianceSnapshotConfig
Streaming appliance snapshot configuration.
StreamingComputationRanges
Describes full or partial data disk assignment information of the computation ranges.
StreamingSideInputLocation
Identifies the location of a streaming side input.
StreamingStageLocation
Identifies the location of a streaming computation stage, for stage-to-stage communication.
StructuredMessage
A rich message format, including a human readable string, a key for identifying the message, and structured data associated with the message for programmatic consumption.
TaskRunnerSettings
Taskrunner configuration settings.
TeardownPolicy
Specifies what happens to a resource when a Cloud Dataflow [google.dataflow.v1beta3.Job][google.dataflow.v1beta3.Job] has completed.
Values:
    TEARDOWN_POLICY_UNKNOWN (0):
        The teardown policy isn't specified, or is unknown.
    TEARDOWN_ALWAYS (1):
        Always teardown the resource.
    TEARDOWN_ON_SUCCESS (2):
        Teardown the resource on success. This is useful for debugging failures.
    TEARDOWN_NEVER (3):
        Never teardown the resource. This is useful for debugging and development.
TemplateMetadata
Metadata describing a template.
TopologyConfig
Global topology of the streaming Dataflow job, including all computations and their sharded locations.
TransformSummary
Description of the type, names/ids, and input/outputs for a transform.
UpdateJobRequest
Request to update a Cloud Dataflow job.
WorkItemDetails
Information about an individual work item execution.
WorkerDetails
Information about a worker.
WorkerIPAddressConfiguration
Specifies how IP addresses should be allocated to the worker machines.
Values:
    WORKER_IP_UNSPECIFIED (0):
        The configuration is unknown, or unspecified.
    WORKER_IP_PUBLIC (1):
        Workers should have public IP addresses.
    WORKER_IP_PRIVATE (2):
        Workers should have private IP addresses.
WorkerPool
Describes one particular pool of Cloud Dataflow workers to be instantiated by the Cloud Dataflow service in order to perform the computations required by a job. Note that a workflow job may use multiple pools, in order to match the various computational requirements of the various stages of the job.
WorkerSettings
Provides data to pass through to the worker harness.