Package types (0.1.5)

API documentation for dataflow_v1beta3.types package.

Classes

AutoscalingAlgorithm

Specifies the algorithm used to determine the number of worker processes to run at any given point in time, based on the amount of data left to process, the number of workers, and how quickly existing workers are processing data.

AutoscalingEvent

A structured message reporting an autoscaling decision made by the Dataflow service.

AutoscalingSettings

Settings for WorkerPool autoscaling.

.. attribute:: algorithm

The algorithm to use for autoscaling.

:type: google.cloud.dataflow_v1beta3.types.AutoscalingAlgorithm

BigQueryIODetails

Metadata for a BigQuery connector used by the job.

.. attribute:: table

Table accessed in the connection.

:type: str

BigTableIODetails

Metadata for a Cloud Bigtable connector used by the job.

.. attribute:: project_id

ProjectId accessed in the connection.

:type: str

CheckActiveJobsRequest

Request to check whether active jobs exist for a project.

.. attribute:: project_id

The project which owns the jobs.

:type: str

CheckActiveJobsResponse

Response for CheckActiveJobsRequest.

.. attribute:: active_jobs_exist

If true, active jobs exist for the project; false otherwise.

:type: bool

ComputationTopology

All configuration data for a particular Computation.

.. attribute:: system_stage_name

The system stage name.

:type: str

ContainerSpec

Container Spec.

.. attribute:: image

Name of the docker container image. E.g., gcr.io/project/some-image

:type: str

CreateJobFromTemplateRequest

A request to create a Cloud Dataflow job from a template.

.. attribute:: project_id

Required. The ID of the Cloud Platform project that the job belongs to.

:type: str

CreateJobRequest

Request to create a Cloud Dataflow job.

.. attribute:: project_id

The ID of the Cloud Platform project that the job belongs to.

:type: str

CustomSourceLocation

Identifies the location of a custom source.

.. attribute:: stateful

Whether this source is stateful.

:type: bool

DataDiskAssignment

Data disk assignment for a given VM instance.

.. attribute:: vm_instance

VM instance name the data disks are mounted to, for example "myproject-1014-104817-4c2-harness-0".

:type: str

DatastoreIODetails

Metadata for a Datastore connector used by the job.

.. attribute:: namespace

Namespace used in the connection.

:type: str

DebugOptions

Describes any options that have an effect on the debugging of pipelines.

DefaultPackageSet

The default set of packages to be staged on a pool of workers.

DeleteSnapshotRequest

Request to delete a snapshot.

.. attribute:: project_id

The ID of the Cloud Platform project that the snapshot belongs to.

:type: str

DeleteSnapshotResponse

Response from deleting a snapshot.

Disk

Describes the data disk used by a workflow job.

.. attribute:: size_gb

Size of disk in GB. If zero or unspecified, the service will attempt to choose a reasonable default.

:type: int

DisplayData

Data provided with a pipeline or transform to provide descriptive info.

DynamicTemplateLaunchParams

Params which should be passed when launching a dynamic template.

Environment

Describes the environment in which a Dataflow Job runs.

.. attribute:: temp_storage_prefix

The prefix of the resources the system should use for temporary storage. The system will append the suffix "/temp-{JOBNAME}" to this resource prefix, where {JOBNAME} is the value of the job_name field. The resulting bucket and object prefix is used as the prefix of the resources used to store temporary data needed during the job execution. NOTE: This will override the value in taskrunner_settings. The supported resource type is:

Google Cloud Storage:

storage.googleapis.com/{bucket}/{object} bucket.storage.googleapis.com/{object}

:type: str
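The suffix rule above can be sketched in plain Python. This is an illustration of the documented "/temp-{JOBNAME}" convention, not the client library's API; the helper name, bucket, and job name are all hypothetical:

```python
# Sketch of how the service derives its temporary-storage location from
# temp_storage_prefix, per the rule above. Illustrative only; the actual
# derivation happens inside the Dataflow service.

def temp_storage_path(temp_storage_prefix: str, job_name: str) -> str:
    """Append the "/temp-{JOBNAME}" suffix to the configured prefix."""
    return f"{temp_storage_prefix}/temp-{job_name}"

# Hypothetical prefix in the storage.googleapis.com/{bucket}/{object} form:
prefix = "storage.googleapis.com/my-bucket/dataflow"
print(temp_storage_path(prefix, "wordcount"))
# storage.googleapis.com/my-bucket/dataflow/temp-wordcount
```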

ExecutionStageState

A message describing the state of a particular execution stage.

ExecutionStageSummary

Description of the composing transforms, names/ids, and input/outputs of a stage of execution. Some composing transforms and sources may have been generated by the Dataflow service during execution planning.

ExecutionState

The state of some component of job execution.

FailedLocation

Indicates which regional endpoint failed to respond to a request for data.

FileIODetails

Metadata for a File connector used by the job.

.. attribute:: file_pattern

File Pattern used to access files by the connector.

:type: str

FlexResourceSchedulingGoal

Specifies the resource to optimize for in Flexible Resource Scheduling.

FlexTemplateRuntimeEnvironment

The environment values to be set at runtime for a flex template.

GetJobExecutionDetailsRequest

Request to get job execution details.

.. attribute:: project_id

A project id.

:type: str

GetJobMetricsRequest

Request to get job metrics.

.. attribute:: project_id

A project id.

:type: str

GetJobRequest

Request to get the state of a Cloud Dataflow job.

.. attribute:: project_id

The ID of the Cloud Platform project that the job belongs to.

:type: str

GetSnapshotRequest

Request to get information about a snapshot.

.. attribute:: project_id

The ID of the Cloud Platform project that the snapshot belongs to.

:type: str

GetStageExecutionDetailsRequest

Request to get information about a particular execution stage of a job. Currently only tracked for Batch jobs.

GetTemplateRequest

A request to retrieve a Cloud Dataflow job template.

.. attribute:: project_id

Required. The ID of the Cloud Platform project that the job belongs to.

:type: str

GetTemplateResponse

The response to a GetTemplate request.

.. attribute:: status

The status of the get template request. Any problems with the request will be indicated in the error_details.

:type: google.rpc.status_pb2.Status

InvalidTemplateParameters

Used in the error_details field of a google.rpc.Status message, this indicates problems with the template parameter.

Job

Defines a job to be run by the Cloud Dataflow service.

JobExecutionDetails

Information about the execution of a job.

.. attribute:: stages

The stages of the job execution.

:type: Sequence[google.cloud.dataflow_v1beta3.types.StageSummary]

JobExecutionInfo

Additional information about how a Cloud Dataflow job will be executed that isn't contained in the submitted job.

JobExecutionStageInfo

Contains information about how a particular google.dataflow.v1beta3.Step will be executed.

JobMessage

A particular message pertaining to a Dataflow job.

.. attribute:: id

Deprecated.

:type: str

JobMessageImportance

Indicates the importance of the message.

JobMetadata

Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.

JobMetrics

JobMetrics contains a collection of metrics describing the detailed progress of a Dataflow job. Metrics correspond to user-defined and system-defined metrics in the job.

This resource captures only the most recent values of each metric; time-series data can be queried for them (under the same metric names) from Cloud Monitoring.

JobState

Describes the overall state of a google.dataflow.v1beta3.Job.

JobType

Specifies the processing model used by a google.dataflow.v1beta3.Job, which determines the way the Job is managed by the Cloud Dataflow service (how workers are scheduled, how inputs are sharded, etc.).

JobView

Selector for how much information is returned in Job responses.

KeyRangeDataDiskAssignment

Data disk assignment information for a specific key-range of a sharded computation. Currently we only support UTF-8 character splits to simplify encoding into JSON.

KeyRangeLocation

Location information for a specific key-range of a sharded computation. Currently we only support UTF-8 character splits to simplify encoding into JSON.

KindType

Type of transform or stage operation.

LaunchFlexTemplateParameter

Launch FlexTemplate Parameter.

.. attribute:: job_name

Required. The job name to use for the created job. For an update job request, the job name should match the existing running job.

:type: str

LaunchFlexTemplateRequest

A request to launch a Cloud Dataflow job from a FlexTemplate.

.. attribute:: project_id

Required. The ID of the Cloud Platform project that the job belongs to.

:type: str

LaunchFlexTemplateResponse

Response to the request to launch a job from Flex Template.

.. attribute:: job

The job that was launched, if the request was not a dry run and the job was successfully launched.

:type: google.cloud.dataflow_v1beta3.types.Job

LaunchTemplateParameters

Parameters to provide to the template being launched.

.. attribute:: job_name

Required. The job name to use for the created job.

:type: str

LaunchTemplateRequest

A request to launch a template.

.. attribute:: project_id

Required. The ID of the Cloud Platform project that the job belongs to.

:type: str

LaunchTemplateResponse

Response to the request to launch a template.

.. attribute:: job

The job that was launched, if the request was not a dry run and the job was successfully launched.

:type: google.cloud.dataflow_v1beta3.types.Job

ListJobMessagesRequest

Request to list job messages. Up to max_results messages will be returned in the specified time range, starting with the oldest messages first. If no time range is specified, the results will start with the oldest message.

ListJobMessagesResponse

Response to a request to list job messages.

.. attribute:: job_messages

Messages in ascending timestamp order.

:type: Sequence[google.cloud.dataflow_v1beta3.types.JobMessage]

ListJobsRequest

Request to list Cloud Dataflow jobs.

.. attribute:: filter

The kind of filter to use.

:type: google.cloud.dataflow_v1beta3.types.ListJobsRequest.Filter

ListJobsResponse

Response to a request to list Cloud Dataflow jobs in a project. This might be a partial response, depending on the page size in the ListJobsRequest. However, if the project does not have any jobs, an instance of ListJobsResponse is not returned and the request's response body is empty {}.

ListSnapshotsRequest

Request to list snapshots.

.. attribute:: project_id

The project ID to list snapshots for.

:type: str

ListSnapshotsResponse

List of snapshots.

.. attribute:: snapshots

Returned snapshots.

:type: Sequence[google.cloud.dataflow_v1beta3.types.Snapshot]

MetricStructuredName

Identifies a metric, by describing the source which generated the metric.

MetricUpdate

Describes the state of a metric.

.. attribute:: name

Name of the metric.

:type: google.cloud.dataflow_v1beta3.types.MetricStructuredName

MountedDataDisk

Describes a mounted data disk.

.. attribute:: data_disk

The name of the data disk. This name is local to the Google Cloud Platform project and uniquely identifies the disk within that project, for example "myproject-1014-104817-4c2-harness-0-disk-1".

:type: str

Package

The packages that must be installed in order for a worker to run the steps of the Cloud Dataflow job that will be assigned to its worker pool.

This is the mechanism by which the Cloud Dataflow SDK causes code to be loaded onto the workers. For example, the Cloud Dataflow Java SDK might use this to install jars containing the user's code and all of the various dependencies (libraries, data files, etc.) required in order for that code to run.

ParameterMetadata

Metadata for a specific parameter.

.. attribute:: name

Required. The name of the parameter.

:type: str

ParameterType

ParameterType specifies what kind of input we need for this parameter.

PipelineDescription

A descriptive representation of submitted pipeline as well as the executed form. This data is provided by the Dataflow service for ease of visualizing the pipeline and interpreting Dataflow provided metrics.

ProgressTimeseries

Information about the progress of some component of job execution.

PubSubIODetails

Metadata for a Pub/Sub connector used by the job.

.. attribute:: topic

Topic accessed in the connection.

:type: str

PubsubLocation

Identifies a pubsub location to use for transferring data into or out of a streaming Dataflow job.

PubsubSnapshotMetadata

Represents a Pubsub snapshot.

.. attribute:: topic_name

The name of the Pubsub topic.

:type: str

RuntimeEnvironment

The environment values to set at runtime.

.. attribute:: num_workers

The initial number of Google Compute Engine instances for the job.

:type: int

RuntimeMetadata

RuntimeMetadata describing a runtime environment.

.. attribute:: sdk_info

SDK Info for the template.

:type: google.cloud.dataflow_v1beta3.types.SDKInfo

SDKInfo

SDK Information.

.. attribute:: language

Required. The SDK Language.

:type: google.cloud.dataflow_v1beta3.types.SDKInfo.Language

SdkHarnessContainerImage

Defines a SDK harness container for executing Dataflow pipelines.

SdkVersion

The version of the SDK used to run the job.

.. attribute:: version

The version of the SDK used to run the job.

:type: str

ShuffleMode

Specifies the shuffle mode used by a google.dataflow.v1beta3.Job, which determines how data is shuffled during processing. More details at: https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-shuffle

Snapshot

Represents a snapshot of a job.

.. attribute:: id

The unique ID of this snapshot.

:type: str

SnapshotJobRequest

Request to create a snapshot of a job.

.. attribute:: project_id

The project which owns the job to be snapshotted.

:type: str

SnapshotState

Snapshot state.

SpannerIODetails

Metadata for a Spanner connector used by the job.

.. attribute:: project_id

ProjectId accessed in the connection.

:type: str

StageExecutionDetails

Information about the workers and work items within a stage.

.. attribute:: workers

Workers that have done work on the stage.

:type: Sequence[google.cloud.dataflow_v1beta3.types.WorkerDetails]

StageSummary

Information about a particular execution stage of a job.

.. attribute:: stage_id

ID of this stage.

:type: str

StateFamilyConfig

State family configuration.

.. attribute:: state_family

The state family value.

:type: str

Step

Defines a particular step within a Cloud Dataflow job.

A job consists of multiple steps, each of which performs some specific operation as part of the overall job. Data is typically passed from one step to another as part of the job.

Here's an example of a sequence of steps which together implement a Map-Reduce job:

  • Read a collection of data from some source, parsing the collection's elements.

  • Validate the elements.

  • Apply a user-defined function to map each element to some value and extract an element-specific key value.

  • Group elements with the same key into a single element with that key, transforming a multiply-keyed collection into a uniquely-keyed collection.

  • Write the elements out to some data sink.

Note that the Cloud Dataflow service may be used to run many different types of jobs, not just Map-Reduce.
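The sequence of steps above can be sketched in plain Python. This shows only the Map-Reduce shape the passage describes (read, validate, map, group, write); it is not the Dataflow SDK API, and the records and sink are stand-ins:

```python
from collections import defaultdict

# 1. Read: a collection of raw records (stand-in for a real source).
records = ["apple 1", "banana 2", "apple 3", "bad-record"]

# 2. Validate: keep only records that parse into a word and an integer.
def validate(rec: str) -> bool:
    parts = rec.split()
    return len(parts) == 2 and parts[1].isdigit()

valid = [r for r in records if validate(r)]

# 3. Map: extract an element-specific key and value from each record.
pairs = [(r.split()[0], int(r.split()[1])) for r in valid]

# 4. Group: merge elements sharing a key, turning a multiply-keyed
#    collection into a uniquely-keyed one.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# 5. Write: emit a summed value per key (stand-in for a data sink).
result = {key: sum(values) for key, values in grouped.items()}
print(result)  # {'apple': 4, 'banana': 2}
```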

StreamLocation

Describes a stream of data, either as input to be processed or as output of a streaming Dataflow job.

StreamingApplianceSnapshotConfig

Streaming appliance snapshot configuration.

.. attribute:: snapshot_id

If set, indicates the snapshot id for the snapshot being performed.

:type: str

StreamingComputationRanges

Describes full or partial data disk assignment information of the computation ranges.

StreamingSideInputLocation

Identifies the location of a streaming side input.

.. attribute:: tag

Identifies the particular side input within the streaming Dataflow job.

:type: str

StreamingStageLocation

Identifies the location of a streaming computation stage, for stage-to-stage communication.

StructuredMessage

A rich message format, including a human readable string, a key for identifying the message, and structured data associated with the message for programmatic consumption.

TaskRunnerSettings

Taskrunner configuration settings.

.. attribute:: task_user

The UNIX user ID on the worker VM to use for tasks launched by taskrunner; e.g. "root".

:type: str

TeardownPolicy

Specifies what happens to a resource when a Cloud Dataflow google.dataflow.v1beta3.Job has completed.

TemplateMetadata

Metadata describing a template.

.. attribute:: name

Required. The name of the template.

:type: str

TopologyConfig

Global topology of the streaming Dataflow job, including all computations and their sharded locations.

TransformSummary

Description of the type, names/ids, and input/outputs for a transform.

UpdateJobRequest

Request to update a Cloud Dataflow job.

.. attribute:: project_id

The ID of the Cloud Platform project that the job belongs to.

:type: str

WorkItemDetails

Information about an individual work item execution.

.. attribute:: task_id

Name of this work item.

:type: str

WorkerDetails

Information about a worker.

.. attribute:: worker_name

Name of this worker.

:type: str

WorkerIPAddressConfiguration

Specifies how IP addresses should be allocated to the worker machines.

WorkerPool

Describes one particular pool of Cloud Dataflow workers to be instantiated by the Cloud Dataflow service in order to perform the computations required by a job. Note that a workflow job may use multiple pools, in order to match the various computational requirements of the various stages of the job.

WorkerSettings

Provides data to pass through to the worker harness.

.. attribute:: base_url

The base URL for accessing Google Cloud APIs. When workers access Google Cloud APIs, they logically do so via relative URLs. If this field is specified, it supplies the base URL to use for resolving these relative URLs. The normative algorithm used is defined by RFC 1808, "Relative Uniform Resource Locators".

If not specified, the default value is "http://www.googleapis.com/".

:type: str
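The RFC 1808 resolution described above matches the behavior of Python's standard `urllib.parse.urljoin`; a quick sketch (the relative path is illustrative, not an actual worker request):

```python
from urllib.parse import urljoin

# Workers resolve relative API URLs against base_url per RFC 1808.
base_url = "http://www.googleapis.com/"  # the documented default

# A hypothetical relative URL a worker might resolve:
relative = "dataflow/v1b3/projects/my-project/jobs"

print(urljoin(base_url, relative))
# http://www.googleapis.com/dataflow/v1b3/projects/my-project/jobs
```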