DataflowJob
Property | Value |
---|---|
Google Cloud Service Name | Cloud Dataflow |
Google Cloud Service Documentation | /dataflow/docs/ |
Google Cloud REST Resource Name | v1b3.projects.jobs |
Google Cloud REST Resource Documentation | /dataflow/docs/reference/rest/v1b3/projects.jobs |
Config Connector Resource Short Names | gcpdataflowjob, gcpdataflowjobs, dataflowjob |
Config Connector Service Name | dataflow.googleapis.com |
Config Connector Resource Fully Qualified Name | dataflowjobs.dataflow.cnrm.cloud.google.com |
Can Be Referenced by IAMPolicy/IAMPolicyMember | No |
Config Connector Default Average Reconcile Interval In Seconds | 600 |
Custom Resource Definition Properties
Annotations
Fields | |
---|---|
cnrm.cloud.google.com/on-delete | |
cnrm.cloud.google.com/project-id | |
cnrm.cloud.google.com/skip-wait-on-job-termination | |
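All three annotations are set under `metadata.annotations` on the DataflowJob resource. A minimal sketch follows: the `on-delete` value mirrors the samples later on this page, while the `project-id` and `skip-wait-on-job-termination` values are illustrative assumptions, so verify the accepted values against the Config Connector annotation reference.

apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  name: dataflowjob-annotations-example
  annotations:
    # What happens to the running job when this resource is deleted
    # ("cancel" is the value used by the samples below).
    cnrm.cloud.google.com/on-delete: "cancel"
    # Create the job in this project rather than the namespace default
    # (assumed example project ID).
    cnrm.cloud.google.com/project-id: "my-project-id"
    # Assumed flag value: do not block deletion on job termination.
    cnrm.cloud.google.com/skip-wait-on-job-termination: "true"
spec:
  # Required fields with placeholder values (assumed bucket name;
  # Word_Count is a public Google-maintained template).
  tempGcsLocation: gs://my-staging-bucket/tmp
  templateGcsPath: gs://dataflow-templates/latest/Word_Count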
Spec
Schema
additionalExperiments:
- string
enableStreamingEngine: boolean
ipConfiguration: string
kmsKeyRef:
  external: string
  name: string
  namespace: string
machineType: string
maxWorkers: integer
networkRef:
  external: string
  name: string
  namespace: string
parameters: {}
region: string
resourceID: string
serviceAccountRef:
  external: string
  name: string
  namespace: string
subnetworkRef:
  external: string
  name: string
  namespace: string
tempGcsLocation: string
templateGcsPath: string
transformNameMapping: {}
zone: string
Fields | |
---|---|
additionalExperiments | Optional. List of experiments that should be used by the job. An example value is `["enable_stackdriver_agent_metrics"]`. |
additionalExperiments[] | Optional. |
enableStreamingEngine | Optional. Indicates if the job should use the streaming engine feature. |
ipConfiguration | Optional. The configuration for VM IPs. Options are `"WORKER_IP_PUBLIC"` or `"WORKER_IP_PRIVATE"`. |
kmsKeyRef | Optional. The name for the Cloud KMS key for the job. |
kmsKeyRef.external | Optional. Allowed value: The `selfLink` field of a `KMSCryptoKey` resource. |
kmsKeyRef.name | Optional. Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names |
kmsKeyRef.namespace | Optional. Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ |
machineType | Optional. The machine type to use for the job. |
maxWorkers | Optional. Immutable. The number of workers permitted to work on the job. More workers may improve processing speed at additional cost. |
networkRef | Optional. |
networkRef.external | Optional. Allowed value: The `selfLink` field of a `ComputeNetwork` resource. |
networkRef.name | Optional. Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names |
networkRef.namespace | Optional. Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ |
parameters | Optional. Key/value pairs to be passed to the Dataflow job (as used in the template). |
region | Optional. Immutable. The region in which the created job should run. |
resourceID | Optional. Immutable. The name of the resource. Used for creation and acquisition. When unset, the value of `metadata.name` is used as the default. |
serviceAccountRef | Optional. |
serviceAccountRef.external | Optional. Allowed value: The `email` field of an `IAMServiceAccount` resource. |
serviceAccountRef.name | Optional. Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names |
serviceAccountRef.namespace | Optional. Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ |
subnetworkRef | Optional. |
subnetworkRef.external | Optional. Allowed value: The `selfLink` field of a `ComputeSubnetwork` resource. |
subnetworkRef.name | Optional. Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names |
subnetworkRef.namespace | Optional. Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ |
tempGcsLocation | Required. A writeable location on Google Cloud Storage for the Dataflow job to dump its temporary data. |
templateGcsPath | Required. The Google Cloud Storage path to the Dataflow job template. |
transformNameMapping | Optional. Only applicable when updating a pipeline. Map of transform name prefixes of the job to be replaced with the corresponding name prefixes of the new job. |
zone | Optional. Immutable. The zone in which the created job should run. If it is not provided, the provider zone is used. |
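Only `tempGcsLocation` and `templateGcsPath` are required; the remaining fields tune worker placement, networking, and encryption. The sketch below shows the two ways a reference field can be populated: `name` (optionally with `namespace`) points at a resource managed by Config Connector, while `external` supplies the raw allowed value directly. The bucket, network, and service account names here are assumptions for illustration only.

apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  name: dataflowjob-references-example
spec:
  # Required fields (assumed bucket; Word_Count is a public Google template).
  tempGcsLocation: gs://my-staging-bucket/tmp
  templateGcsPath: gs://dataflow-templates/latest/Word_Count
  # Reference a ComputeNetwork managed by Config Connector in this namespace.
  networkRef:
    name: my-network
  # Reference an existing service account by its email (assumed address).
  serviceAccountRef:
    external: dataflow-worker@my-project.iam.gserviceaccount.com
  region: us-central1
  maxWorkers: 2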
Status
Schema
conditions:
- lastTransitionTime: string
  message: string
  reason: string
  status: string
  type: string
jobId: string
observedGeneration: integer
state: string
type: string
Fields | |
---|---|
conditions | Conditions represent the latest available observations of the resource's current state. |
conditions[] | |
conditions[].lastTransitionTime | Last time the condition transitioned from one status to another. |
conditions[].message | Human-readable message indicating details about the last transition. |
conditions[].reason | Unique, one-word, CamelCase reason for the condition's last transition. |
conditions[].status | Status is the status of the condition. Can be True, False, or Unknown. |
conditions[].type | Type is the type of the condition. |
jobId | The unique ID of this job. |
observedGeneration | ObservedGeneration is the generation of the resource that was most recently observed by the Config Connector controller. If this is equal to `metadata.generation`, the reported status reflects the most recent desired state of the resource. |
state | The current state of the resource, selected from the JobState enum. |
type | The type of this job, selected from the JobType enum. |
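After Config Connector reconciles the job, you can inspect this status with `kubectl get dataflowjob <name> -o yaml`. A sketch of what a healthy resource might report follows; the condition reason and the `state`/`type` enum values shown are assumptions based on typical Config Connector and Dataflow output, not guaranteed literals.

status:
  conditions:
  - lastTransitionTime: "2021-01-01T00:00:00Z"  # assumed timestamp
    message: The resource is up to date
    reason: UpToDate                            # assumed reason string
    status: "True"
    type: Ready
  jobId: 2021-01-01_00_00_00-1234567890123456   # assumed job ID format
  observedGeneration: 1
  state: JOB_STATE_RUNNING                      # assumed JobState value
  type: JOB_TYPE_BATCH                          # assumed JobType value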
Sample YAML(s)
Batch Dataflow Job
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  annotations:
    cnrm.cloud.google.com/on-delete: "cancel"
  labels:
    label-one: "value-one"
  name: dataflowjob-sample-batch
spec:
  tempGcsLocation: gs://${PROJECT_ID?}-dataflowjob-dep-batch/tmp
  # This is a public, Google-maintained Dataflow Job template of a batch job
  templateGcsPath: gs://dataflow-templates/2020-02-03-01_RC00/Word_Count
  parameters:
    # This is a public, Google-maintained text file
    inputFile: gs://dataflow-samples/shakespeare/various.txt
    output: gs://${PROJECT_ID?}-dataflowjob-dep-batch/output
  zone: us-central1-a
  machineType: "n1-standard-1"
  maxWorkers: 3
  ipConfiguration: "WORKER_IP_PUBLIC"
---
apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  annotations:
    cnrm.cloud.google.com/force-destroy: "true"
  # StorageBucket names must be globally unique. Replace ${PROJECT_ID?} with your project ID.
  name: ${PROJECT_ID?}-dataflowjob-dep-batch
Streaming Dataflow Job
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  annotations:
    cnrm.cloud.google.com/on-delete: "cancel"
  labels:
    label-one: "value-one"
  name: dataflowjob-sample-streaming
spec:
  tempGcsLocation: gs://${PROJECT_ID?}-dataflowjob-dep-streaming/tmp
  # This is a public, Google-maintained Dataflow Job template of a streaming job
  templateGcsPath: gs://dataflow-templates/2020-02-03-01_RC00/PubSub_to_BigQuery
  parameters:
    # Replace ${PROJECT_ID?} with your project ID.
    inputTopic: projects/${PROJECT_ID?}/topics/dataflowjob-dep-streaming
    outputTableSpec: ${PROJECT_ID?}:dataflowjobdepstreaming.dataflowjobdepstreaming
  zone: us-central1-a
  machineType: "n1-standard-1"
  maxWorkers: 3
  ipConfiguration: "WORKER_IP_PUBLIC"
---
apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryDataset
metadata:
  name: dataflowjobdepstreaming
---
apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryTable
metadata:
  name: dataflowjobdepstreaming
spec:
  datasetRef:
    name: dataflowjobdepstreaming
---
apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubTopic
metadata:
  name: dataflowjob-dep-streaming
---
apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  annotations:
    cnrm.cloud.google.com/force-destroy: "true"
  # StorageBucket names must be globally unique. Replace ${PROJECT_ID?} with your project ID.
  name: ${PROJECT_ID?}-dataflowjob-dep-streaming