Index
Pipelines (interface)
CreatePipelineRequest (message)
DataflowJobDetails (message)
DeletePipelineRequest (message)
FlexResourceSchedulingGoal (enum)
FlexTemplateRuntimeEnvironment (message)
GetPipelineRequest (message)
Job (message)
Job.State (enum)
LaunchFlexTemplateParameter (message)
LaunchFlexTemplateRequest (message)
LaunchTemplateParameters (message)
LaunchTemplateRequest (message)
ListJobsRequest (message)
ListJobsResponse (message)
ListPipelinesRequest (message)
ListPipelinesResponse (message)
Pipeline (message)
Pipeline.PipelineType (enum)
Pipeline.State (enum)
RunPipelineRequest (message)
RunPipelineResponse (message)
RuntimeEnvironment (message)
ScheduleSpec (message)
SdkVersion (message)
SdkVersion.SdkSupportStatus (enum)
StopPipelineRequest (message)
UpdatePipelineRequest (message)
WorkerIPAddressConfiguration (enum)
Workload (message)
Pipelines
Provides an interface for creating, updating, and managing recurring Data Analytics jobs. The service supports existing executors like Dataflow for processing, and it enables launching jobs automatically through an internal scheduler or directly through invocation by an external user.
CreatePipeline |
---|
Creates a pipeline. For a batch pipeline, you can pass scheduler information. Data Pipelines uses the scheduler information to create an internal scheduler that runs jobs periodically. If the internal scheduler is not configured, you can use
|
DeletePipeline |
---|
Deletes a pipeline. If a scheduler job is attached to the pipeline, it will be deleted.
|
GetPipeline |
---|
Looks up a single pipeline. Returns a "NOT_FOUND" error if no such pipeline exists. Returns a "FORBIDDEN" error if the caller doesn't have permission to access it.
|
ListJobs |
---|
Lists jobs for a given pipeline. Returns a "FORBIDDEN" error if the caller doesn't have permission to access it.
|
ListPipelines |
---|
Lists pipelines. Returns a "FORBIDDEN" error if the caller doesn't have permission to access it.
|
RunPipeline |
---|
Creates a job for the specified pipeline directly. You can use this method when the internal scheduler is not configured and you want to trigger the job directly or through an external system. Returns a "NOT_FOUND" error if the pipeline doesn't exist. Returns a "FORBIDDEN" error if the user doesn't have permission to access the pipeline or run jobs for the pipeline.
|
StopPipeline |
---|
Freezes pipeline execution permanently. If there's a corresponding scheduler entry, it's deleted, and the pipeline state is changed to "ARCHIVED". However, pipeline metadata is retained.
|
UpdatePipeline |
---|
Updates a pipeline. If successful, the updated Pipeline is returned. If UpdatePipeline does not return successfully, you can retry the UpdatePipeline request until you receive a successful response.
|
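The retry guidance for UpdatePipeline can be sketched as a bounded retry loop. This is a minimal sketch, not the service's client library: `update_fn` is a hypothetical callable standing in for whatever client call sends the UpdatePipeline request, and the attempt count and backoff delays are assumptions.

```python
import time


def update_with_retry(update_fn, pipeline, max_attempts=5, base_delay=0.5,
                      sleep=time.sleep):
    """Call update_fn(pipeline) until it succeeds or attempts run out.

    update_fn is a hypothetical stand-in for the real UpdatePipeline call.
    It is expected to return the updated pipeline or raise on failure.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return update_fn(pipeline)
        except Exception as err:  # a real client would catch narrower errors
            last_error = err
            if attempt < max_attempts - 1:
                # Exponential backoff between attempts (assumed policy).
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(
        f"UpdatePipeline failed after {max_attempts} attempts"
    ) from last_error
```

Bounding the number of attempts keeps a persistent failure (for example, a permission error) from looping forever; a production client would likely retry only on transient error codes.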
CreatePipelineRequest
Request message for CreatePipeline
Fields | |
---|---|
parent |
Required. The location name. For example: |
pipeline |
Required. The pipeline to add. |
DataflowJobDetails
Pipeline job details specific to the Dataflow API. This is encapsulated here to allow for more executors to store their specific details separately.
Fields | |
---|---|
sdk_version |
Output only. The SDK version used to run the job. |
current_workers |
Output only. The current number of workers used to run the jobs. Only set to a value if the job is still running. |
resource_info |
Cached version of all the metrics of interest for the job. This value gets stored here when the job is terminated. As long as the job is running, this field is populated from the Dataflow API. |
DeletePipelineRequest
Request message for deleting a pipeline using DeletePipeline.
Fields | |
---|---|
name |
Required. The pipeline name. For example: |
FlexResourceSchedulingGoal
Specifies the resource to optimize for in Flexible Resource Scheduling.
Enums | |
---|---|
FLEXRS_UNSPECIFIED |
Run in the default mode. |
FLEXRS_SPEED_OPTIMIZED |
Optimize for lower execution time. |
FLEXRS_COST_OPTIMIZED |
Optimize for lower cost. |
FlexTemplateRuntimeEnvironment
The environment values to be set at runtime for a Flex Template.
Fields | |
---|---|
num_workers |
The initial number of Compute Engine instances for the job. |
max_workers |
The maximum number of Compute Engine instances to be made available to your pipeline during execution, from 1 to 1000. |
zone |
The Compute Engine availability zone for launching worker instances to run your pipeline. In the future, worker_zone will take precedence. |
service_account_email |
The email address of the service account to run the job as. |
temp_location |
The Cloud Storage path to use for temporary files. Must be a valid Cloud Storage URL, beginning with |
machine_type |
The machine type to use for the job. Defaults to the value from the template if not specified. |
additional_experiments[] |
Additional experiment flags for the job. |
network |
Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default". |
subnetwork |
Subnetwork to which VMs will be assigned, if desired. You can specify a subnetwork using either a complete URL or an abbreviated path. Expected to be of the form "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in a Shared VPC network, you must use the complete URL. |
additional_user_labels |
Additional user labels to be specified for the job. Keys and values must follow the restrictions specified in the labeling restrictions. An object containing a list of key/value pairs. Example: |
kms_key_name |
Name for the Cloud KMS key for the job. Key format is: projects/ |
ip_configuration |
Configuration for VM IPs. |
worker_region |
The Compute Engine region (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1". Mutually exclusive with worker_zone. If neither worker_region nor worker_zone is specified, defaults to the control plane region. |
worker_zone |
The Compute Engine zone (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1-a". Mutually exclusive with worker_region. If neither worker_region nor worker_zone is specified, a zone in the control plane region is chosen based on available capacity. If both |
enable_streaming_engine |
Whether to enable Streaming Engine for the job. |
flexrs_goal |
Set FlexRS goal for the job. https://cloud.google.com/dataflow/docs/guides/flexrs |
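Two of the constraints in the table above (the `max_workers` range and the mutual exclusion of `worker_region` and `worker_zone`) can be checked before a request is sent. The following is a minimal sketch that treats the environment as a plain dictionary keyed by the field names above; the function name and the dictionary representation are assumptions, not part of the API.

```python
def check_flex_environment(env):
    """Return a list of problems found in a FlexTemplateRuntimeEnvironment-like dict."""
    problems = []

    # max_workers is documented as ranging from 1 to 1000.
    max_workers = env.get("max_workers")
    if max_workers is not None and not 1 <= max_workers <= 1000:
        problems.append("max_workers must be between 1 and 1000")

    # worker_region and worker_zone are documented as mutually exclusive.
    if env.get("worker_region") and env.get("worker_zone"):
        problems.append("worker_region and worker_zone are mutually exclusive")

    return problems
```

An empty list means neither documented constraint was violated; the service may still reject the request for other reasons.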
GetPipelineRequest
Request message for GetPipeline.
Fields | |
---|---|
name |
Required. The pipeline name. For example: |
Job
Definition of the job information maintained by the pipeline. Fields in this entity are retrieved from the executor API (e.g. Dataflow API).
Fields | |
---|---|
name |
Required. The fully qualified resource name for the job. |
id |
Output only. The internal ID for the job. |
create_time |
Output only. The time of job creation. |
end_time |
Output only. The time of job termination. This is absent if the job is still running. |
state |
The current state of the job. |
status |
Status capturing any error code or message related to job creation or execution. |
dataflow_job_details |
All the details that are specific to a Dataflow job. |
State
Enum listing all the job execution states.
Enums | |
---|---|
STATE_UNSPECIFIED |
The job state isn't specified. |
STATE_PENDING |
The job is waiting to start execution. |
STATE_RUNNING |
The job is executing. |
STATE_DONE |
The job has finished execution successfully. |
STATE_FAILED |
The job has finished execution with a failure. |
STATE_CANCELLED |
The job has been terminated upon user request. |
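When polling a job, it helps to know which states mean the job will make no further progress. The sketch below mirrors the enum names above in a Python `Enum` (the string values are an illustration choice, not wire values) and treats done, failed, and cancelled as finished, which is an inference from the descriptions rather than a documented guarantee.

```python
from enum import Enum


class JobState(Enum):
    # Values reuse the enum names; they are not the wire-format numbers.
    STATE_UNSPECIFIED = "STATE_UNSPECIFIED"
    STATE_PENDING = "STATE_PENDING"
    STATE_RUNNING = "STATE_RUNNING"
    STATE_DONE = "STATE_DONE"
    STATE_FAILED = "STATE_FAILED"
    STATE_CANCELLED = "STATE_CANCELLED"


# Assumption: these three states describe a job that has stopped executing.
FINISHED_STATES = frozenset(
    {JobState.STATE_DONE, JobState.STATE_FAILED, JobState.STATE_CANCELLED}
)


def is_finished(state):
    """Return True if the job is no longer pending or running."""
    return state in FINISHED_STATES
```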
LaunchFlexTemplateParameter
Launch Flex Template parameter.
Fields | |
---|---|
job_name |
Required. The job name to use for the created job. For an update job request, the job name should be the same as the existing running job. |
parameters |
The parameters for the Flex Template. Example: |
launch_options |
Launch options for this Flex Template job. This is a common set of options across languages and templates. This should not be used to pass job parameters. |
environment |
The runtime environment for the Flex Template job. |
update |
Set this to true if you are sending a request to update a running streaming job. When set, the job name should be the same as the running job. |
transform_name_mappings |
Use this to pass transform name mappings for streaming update jobs. Example: |
container_spec_gcs_path |
Cloud Storage path to a file with a JSON-serialized ContainerSpec as content. |
LaunchFlexTemplateRequest
A request to launch a Dataflow job from a Flex Template.
Fields | |
---|---|
project_id |
Required. The ID of the Cloud Platform project that the job belongs to. |
launch_parameter |
Required. Parameter to launch a job from a Flex Template. |
location |
Required. The regional endpoint to which to direct the request. For example, |
validate_only |
If true, the request is validated but not actually executed. Defaults to false. |
LaunchTemplateParameters
Parameters to provide to the template being launched.
Fields | |
---|---|
job_name |
Required. The job name to use for the created job. |
parameters |
The runtime parameters to pass to the job. |
environment |
The runtime environment for the job. |
update |
If set, replace the existing pipeline with the name specified by jobName with this pipeline, preserving state. |
transform_name_mapping |
Map of transform name prefixes of the job to be replaced to the corresponding name prefixes of the new job. Only applicable when updating a pipeline. |
LaunchTemplateRequest
A request to launch a template.
Fields | |
---|---|
project_id |
Required. The ID of the Cloud Platform project that the job belongs to. |
validate_only |
If true, the request is validated but not actually executed. Defaults to false. |
launch_parameters |
The parameters of the template to launch. This should be part of the body of the POST request. |
location |
The regional endpoint to which to direct the request. |
gcs_path |
A Cloud Storage path to the template from which to create the job. Must be a valid Cloud Storage URL, beginning with 'gs://'. |
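As an illustration of how the LaunchTemplateRequest fields fit together, the sketch below assembles a request as a plain dictionary and enforces the documented `gs://` prefix on `gcs_path`. The helper name and the dictionary shape are assumptions; a real client library would use its own generated message types.

```python
def build_launch_template_request(project_id, gcs_path, job_name,
                                  parameters=None, location=None,
                                  validate_only=False):
    """Assemble a LaunchTemplateRequest-like dict from the documented fields."""
    if not project_id:
        raise ValueError("project_id is required")
    if not job_name:
        raise ValueError("job_name is required")
    if not gcs_path.startswith("gs://"):
        raise ValueError("gcs_path must be a Cloud Storage URL beginning with gs://")

    request = {
        "project_id": project_id,
        "gcs_path": gcs_path,
        "validate_only": validate_only,
        "launch_parameters": {
            "job_name": job_name,
            "parameters": dict(parameters or {}),
        },
    }
    if location:
        request["location"] = location
    return request
```

Setting `validate_only=True` corresponds to the documented dry-run behavior, where the request is validated but not executed.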
ListJobsRequest
Request message for ListJobs.
Fields | |
---|---|
parent |
Required. The pipeline name. For example: |
page_size |
The maximum number of entities to return. The service may return fewer than this value, even if there are additional pages. If unspecified, the max limit will be determined by the backend implementation. |
page_token |
A page token, received from a previous When paginating, all other parameters provided to |
ListJobsResponse
Response message for ListJobs.
Fields | |
---|---|
jobs[] |
Results that were accessible to the caller. Results are always in descending order of job creation date. |
next_page_token |
A token, which can be sent as |
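The `page_token` and `next_page_token` fields follow the usual token-based pagination pattern. The sketch below walks every page; `list_jobs` is a hypothetical callable standing in for the real ListJobs call, responses are modeled as dictionaries, and an absent or empty `next_page_token` is assumed to mean there are no more pages.

```python
def iter_all_jobs(list_jobs, parent, page_size=None):
    """Yield every job for a pipeline by following next_page_token.

    list_jobs is a hypothetical stand-in for the ListJobs call. It takes
    parent, page_size, and page_token and returns a ListJobsResponse-like dict.
    """
    page_token = ""
    while True:
        response = list_jobs(parent=parent, page_size=page_size,
                             page_token=page_token)
        yield from response.get("jobs", [])
        page_token = response.get("next_page_token", "")
        if not page_token:
            break
```

The same `parent` and `page_size` are passed on every call, since other parameters are expected to stay constant while paginating.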
ListPipelinesRequest
Request message for ListPipelines
Fields | |
---|---|
parent |
Required. The location name. For example: |
filter |
An expression for filtering the results of the request. If unspecified, all pipelines will be returned. Multiple filters can be applied and must be comma separated. Fields eligible for filtering are:
For example, to limit results to active batch processing pipelines:
|
page_size |
The maximum number of entities to return. The service may return fewer than this value, even if there are additional pages. If unspecified, the max limit will be determined by the backend implementation. |
page_token |
A page token, received from a previous When paginating, all other parameters provided to |
ListPipelinesResponse
Response message for ListPipelines.
Fields | |
---|---|
pipelines[] |
Results that matched the filter criteria and were accessible to the caller. Results are always in descending order of pipeline creation date. |
next_page_token |
A token, which can be sent as |
Pipeline
The main pipeline entity and all the necessary metadata for launching and managing linked jobs.
Fields | |
---|---|
name |
The pipeline name. For example:
|
display_name |
Required. The display name of the pipeline. It can contain only letters ([A-Za-z]), numbers ([0-9]), hyphens (-), and underscores (_). |
type |
Required. The type of the pipeline. This field affects the scheduling of the pipeline and the type of metrics to show for the pipeline. |
state |
Required. The state of the pipeline. When the pipeline is created, the state is set to 'STATE_ACTIVE' by default. State changes can be requested by setting the state to stopping, paused, or resuming. State cannot be changed through UpdatePipeline requests. |
create_time |
Output only. Immutable. The timestamp when the pipeline was initially created. Set by the Data Pipelines service. |
last_update_time |
Output only. Immutable. The timestamp when the pipeline was last modified. Set by the Data Pipelines service. |
workload |
Workload information for creating new jobs. |
schedule_info |
Internal scheduling information for a pipeline. If this information is provided, periodic jobs will be created per the schedule. If not, users are responsible for creating jobs externally. |
job_count |
Output only. Number of jobs. |
scheduler_service_account_email |
Optional. A service account email to be used with the Cloud Scheduler job. If not specified, the default Compute Engine service account will be used. |
pipeline_sources |
Immutable. The sources of the pipeline (for example, Dataplex). The keys and values are set by the corresponding sources during pipeline creation. |
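The character rule for `display_name` translates directly into a regular expression. The following is a minimal sketch; the function name is an assumption, and rejecting the empty string is also an assumption (the field is marked Required, but no minimum length is stated).

```python
import re

# Letters, numbers, hyphens, and underscores only, per the field description.
_DISPLAY_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_display_name(name):
    """Return True if name uses only the characters allowed for display_name."""
    return bool(_DISPLAY_NAME.fullmatch(name))
```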
PipelineType
The type of a pipeline. For example, batch or streaming.
Enums | |
---|---|
PIPELINE_TYPE_UNSPECIFIED |
The pipeline type isn't specified. |
PIPELINE_TYPE_BATCH |
A batch pipeline. It runs jobs on a specific schedule, and each job will automatically terminate once execution is finished. |
PIPELINE_TYPE_STREAMING |
A streaming pipeline. The underlying job is continuously running until it is manually terminated by the user. This type of pipeline doesn't have a schedule to run on, and the linked job gets created when the pipeline is created. |
State
The current state of pipeline execution.
Enums | |
---|---|
STATE_UNSPECIFIED |
The pipeline state isn't specified. |
STATE_RESUMING |
The pipeline is getting started or resumed. When finished, the pipeline state will be 'STATE_ACTIVE'. |
STATE_ACTIVE |
The pipeline is actively running. |
STATE_STOPPING |
The pipeline is in the process of stopping. When finished, the pipeline state will be 'STATE_ARCHIVED'. |
STATE_ARCHIVED |
The pipeline has been stopped. This is a terminal state and cannot be undone. |
STATE_PAUSED |
The pipeline is paused. This is a non-terminal state. When the pipeline is paused, it will hold processing jobs, but can be resumed later. For a batch pipeline, this means pausing the scheduler job. For a streaming pipeline, creating a job snapshot to resume from will give the same effect. |
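The state descriptions above imply two transitional states that settle into a stable one, and one terminal state. The sketch below encodes only those relationships; the Python names are assumptions, and the enum values reuse the names rather than wire-format numbers.

```python
from enum import Enum


class PipelineState(Enum):
    STATE_UNSPECIFIED = "STATE_UNSPECIFIED"
    STATE_RESUMING = "STATE_RESUMING"
    STATE_ACTIVE = "STATE_ACTIVE"
    STATE_STOPPING = "STATE_STOPPING"
    STATE_ARCHIVED = "STATE_ARCHIVED"
    STATE_PAUSED = "STATE_PAUSED"


# Transitional states and the state each one ends in, per the descriptions.
SETTLES_TO = {
    PipelineState.STATE_RESUMING: PipelineState.STATE_ACTIVE,
    PipelineState.STATE_STOPPING: PipelineState.STATE_ARCHIVED,
}


def settled_state(state):
    """Return the state a pipeline ends in once any transition completes."""
    return SETTLES_TO.get(state, state)


def is_terminal(state):
    """Only the archived state is described as terminal."""
    return state is PipelineState.STATE_ARCHIVED
```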
RunPipelineRequest
Request message for RunPipeline
Fields | |
---|---|
name |
Required. The pipeline name. For example: |
RunPipelineResponse
Response message for RunPipeline
Fields | |
---|---|
job |
Job that was created as part of RunPipeline operation. |
RuntimeEnvironment
The environment values to set at runtime.
Fields | |
---|---|
num_workers |
The initial number of Compute Engine instances for the job. |
max_workers |
The maximum number of Compute Engine instances to be made available to your pipeline during execution, from 1 to 1000. |
zone |
The Compute Engine availability zone for launching worker instances to run your pipeline. In the future, worker_zone will take precedence. |
service_account_email |
The email address of the service account to run the job as. |
temp_location |
The Cloud Storage path to use for temporary files. Must be a valid Cloud Storage URL, beginning with |
bypass_temp_dir_validation |
Whether to bypass the safety checks for the job's temporary directory. Use with caution. |
machine_type |
The machine type to use for the job. Defaults to the value from the template if not specified. |
additional_experiments[] |
Additional experiment flags for the job. |
network |
Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default". |
subnetwork |
Subnetwork to which VMs will be assigned, if desired. You can specify a subnetwork using either a complete URL or an abbreviated path. Expected to be of the form "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in a Shared VPC network, you must use the complete URL. |
additional_user_labels |
Additional user labels to be specified for the job. Keys and values should follow the restrictions specified in the labeling restrictions page. An object containing a list of key/value pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }. |
kms_key_name |
Name for the Cloud KMS key for the job. The key format is: projects/ |
ip_configuration |
Configuration for VM IPs. |
worker_region |
The Compute Engine region (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1". Mutually exclusive with worker_zone. If neither worker_region nor worker_zone is specified, defaults to the control plane's region. |
worker_zone |
The Compute Engine zone (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1-a". Mutually exclusive with worker_region. If neither worker_region nor worker_zone is specified, a zone in the control plane's region is chosen based on available capacity. If both |
enable_streaming_engine |
Whether to enable Streaming Engine for the job. |
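The two accepted `subnetwork` forms, and the rule that a Shared VPC subnetwork needs the complete URL, can be checked with a pair of regular expressions. This is a minimal sketch: the function names and the `shared_vpc` flag are assumptions, and the patterns only check the overall shape, not whether the project, region, or subnetwork exists.

```python
import re

_FULL_URL = re.compile(
    r"https://www\.googleapis\.com/compute/v1/projects/[^/]+"
    r"/regions/[^/]+/subnetworks/[^/]+"
)
_SHORT_PATH = re.compile(r"regions/[^/]+/subnetworks/[^/]+")


def subnetwork_form(value):
    """Return 'url', 'path', or None depending on which documented form matches."""
    if _FULL_URL.fullmatch(value):
        return "url"
    if _SHORT_PATH.fullmatch(value):
        return "path"
    return None


def check_subnetwork(value, shared_vpc=False):
    """Return True if value is acceptable; Shared VPC requires the complete URL."""
    form = subnetwork_form(value)
    if form is None:
        return False
    return form == "url" or not shared_vpc
```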
ScheduleSpec
Details of the schedule the pipeline runs on.
Fields | |
---|---|
schedule |
Unix-cron format of the schedule. This information is retrieved from the linked Cloud Scheduler. |
time_zone |
Timezone ID. This matches the timezone IDs used by the Cloud Scheduler API. If empty, UTC time is assumed. |
next_job_time |
Output only. When the next Scheduler job is going to run. |
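A ScheduleSpec pairs a unix-cron expression with an optional time zone. The sketch below builds one as a dictionary, checks only that the expression has the five whitespace-separated fields of unix-cron (minute, hour, day of month, month, day of week), and applies the documented UTC fallback when `time_zone` is empty. The helper names are assumptions, and the individual cron fields are not validated.

```python
def build_schedule_spec(schedule, time_zone=""):
    """Return a ScheduleSpec-like dict after a basic unix-cron shape check."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected five unix-cron fields")
    return {"schedule": " ".join(fields), "time_zone": time_zone}


def effective_time_zone(spec):
    """Return the time zone the schedule runs in; empty means UTC."""
    return spec.get("time_zone") or "UTC"
```

For example, `build_schedule_spec("0 3 * * *")` describes a job that runs daily at 03:00, interpreted as UTC because no time zone is given.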
SdkVersion
The version of the SDK used to run the job.
Fields | |
---|---|
version |
The version of the SDK used to run the job. |
version_display_name |
A readable string describing the version of the SDK. |
sdk_support_status |
The support status for this SDK version. |
SdkSupportStatus
The support status of the SDK used to run the job.
Enums | |
---|---|
UNKNOWN |
Dataflow is unaware of this version. |
SUPPORTED |
This is a known version of an SDK, and is supported. |
STALE |
A newer version of the SDK exists, and an update is recommended. |
DEPRECATED |
This version of the SDK is deprecated and will eventually be unsupported. |
UNSUPPORTED |
Support for this SDK version has ended and it should no longer be used. |
StopPipelineRequest
Request message for StopPipeline.
Fields | |
---|---|
name |
Required. The pipeline name. For example: |
UpdatePipelineRequest
Request message for UpdatePipeline.
Fields | |
---|---|
pipeline |
Required. The pipeline to update. For example: |
update_mask |
The list of fields to be updated. |
WorkerIPAddressConfiguration
Specifies how IP addresses should be allocated to the worker machines.
Enums | |
---|---|
WORKER_IP_UNSPECIFIED |
The configuration is unknown, or unspecified. |
WORKER_IP_PUBLIC |
Workers should have public IP addresses. |
WORKER_IP_PRIVATE |
Workers should have private IP addresses. |
Workload
Workload details for creating the pipeline jobs.
Fields | |
---|---|
Union field
|
|
dataflow_launch_template_request |
Template information and additional parameters needed to launch a Dataflow job using the standard launch API. |
dataflow_flex_template_request |
Template information and additional parameters needed to launch a Dataflow job using the flex launch API. |
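Because the two request fields belong to a union field, a Workload is expected to carry only one of them. The sketch below picks out which one is set on a dictionary-shaped workload; the helper name is an assumption, and requiring exactly one (rather than at most one) is also an assumption.

```python
WORKLOAD_OPTIONS = (
    "dataflow_launch_template_request",
    "dataflow_flex_template_request",
)


def selected_workload(workload):
    """Return the name of the single union member set on a Workload-like dict."""
    present = [key for key in WORKLOAD_OPTIONS if workload.get(key) is not None]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {WORKLOAD_OPTIONS}, found {len(present)}"
        )
    return present[0]
```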