| Property | Value |
|---|---|
| Google Cloud Service Name | Cloud Dataflow |
| Google Cloud Service Documentation | /dataflow/docs/ |
| Google Cloud REST Resource Name | v1b3.projects.jobs |
| Google Cloud REST Resource Documentation | /dataflow/docs/reference/rest/v1b3/projects.jobs |
| Config Connector Resource Short Names | gcpdataflowjob<br>gcpdataflowjobs<br>dataflowjob |
| Config Connector Service Name | dataflow.googleapis.com |
| Config Connector Resource Fully Qualified Name | dataflowjobs.dataflow.cnrm.cloud.google.com |
| Can Be Referenced by IAMPolicy/IAMPolicyMember | No |
Custom Resource Definition Properties
Annotations
| Fields | |
|---|---|
| cnrm.cloud.google.com/on-delete | |
| cnrm.cloud.google.com/project-id | |
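Both annotations are set on the resource's metadata. The snippet below is only an illustrative sketch: the on-delete value "cancel" is taken from the samples further down, ${PROJECT_ID?} is a placeholder for your own project ID, and the comments describe the usual Config Connector behavior for these annotations rather than descriptions given on this page.

apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  annotations:
    # What to do with the underlying Dataflow job when this resource is deleted.
    cnrm.cloud.google.com/on-delete: "cancel"
    # Project in which to create the job, overriding the namespace's default project.
    cnrm.cloud.google.com/project-id: "${PROJECT_ID?}"
  name: dataflowjob-annotations-example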
Spec
Schema
additionalExperiments:
- string
ipConfiguration: string
machineType: string
maxWorkers: integer
networkRef:
  external: string
  name: string
  namespace: string
parameters: {}
region: string
serviceAccountRef:
  external: string
  name: string
  namespace: string
subnetworkRef:
  external: string
  name: string
  namespace: string
tempGcsLocation: string
templateGcsPath: string
transformNameMapping: {}
zone: string
| Fields | |
|---|---|
| additionalExperiments | Optional. List of experiments that should be used by the job. An example value is ["enable_stackdriver_agent_metrics"]. |
| additionalExperiments.[] | Optional. |
| ipConfiguration | Optional. The configuration for VM IPs. Options are "WORKER_IP_PUBLIC" or "WORKER_IP_PRIVATE". |
| machineType | Optional. The machine type to use for the job. |
| maxWorkers | Optional. Immutable. The number of workers permitted to work on the job. More workers may improve processing speed at additional cost. |
| networkRef | Optional. |
| networkRef.external | Optional. The selfLink of a ComputeNetwork. |
| networkRef.name | Optional. Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names |
| networkRef.namespace | Optional. Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ |
| parameters | Optional. Key/Value pairs to be passed to the Dataflow job (as used in the template). |
| region | Optional. Immutable. The region in which the created job should run. |
| serviceAccountRef | Optional. |
| serviceAccountRef.external | Optional. The email of an IAMServiceAccount. |
| serviceAccountRef.name | Optional. Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names |
| serviceAccountRef.namespace | Optional. Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ |
| subnetworkRef | Optional. |
| subnetworkRef.external | Optional. The selfLink of a ComputeSubnetwork. |
| subnetworkRef.name | Optional. Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names |
| subnetworkRef.namespace | Optional. Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/ |
| tempGcsLocation | Required. A writeable location on Google Cloud Storage for the Dataflow job to dump its temporary data. |
| templateGcsPath | Required. The Google Cloud Storage path to the Dataflow job template. |
| transformNameMapping | Optional. Only applicable when updating a pipeline. Map of transform name prefixes of the job to be replaced with the corresponding name prefixes of the new job. |
| zone | Optional. Immutable. The zone in which the created job should run. If it is not provided, the provider zone is used. |
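The reference fields above (networkRef, subnetworkRef, serviceAccountRef) accept either the name of another Config Connector resource or an external identifier. The following sketch shows one possible combination; the names my-network and my-subnet and the service account email are invented placeholders, and the template and bucket values follow the batch sample below.

apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  name: dataflowjob-ref-example
spec:
  tempGcsLocation: gs://${PROJECT_ID?}-dataflowjob-ref-example/tmp
  templateGcsPath: gs://dataflow-templates/2020-02-03-01_RC00/Word_Count
  region: us-central1
  # Reference ComputeNetwork/ComputeSubnetwork resources managed in the same namespace.
  networkRef:
    name: my-network
  subnetworkRef:
    name: my-subnet
  # Reference an existing service account by its email (serviceAccountRef.external).
  serviceAccountRef:
    external: dataflow-worker@${PROJECT_ID?}.iam.gserviceaccount.com
  # Keep worker VMs off public IPs when running on a private network.
  ipConfiguration: "WORKER_IP_PRIVATE"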
Status
Schema
conditions:
- lastTransitionTime: string
  message: string
  reason: string
  status: string
  type: string
jobId: string
state: string
type: string
| Fields | |
|---|---|
| conditions | Conditions represents the latest available observation of the resource's current state. |
| conditions.[] | |
| conditions.[].lastTransitionTime | Last time the condition transitioned from one status to another. |
| conditions.[].message | Human-readable message indicating details about last transition. |
| conditions.[].reason | Unique, one-word, CamelCase reason for the condition's last transition. |
| conditions.[].status | Status is the status of the condition. Can be True, False, Unknown. |
| conditions.[].type | Type is the type of the condition. |
| jobId | The unique ID of this job. |
| state | The current state of the resource, selected from the JobState enum. |
| type | The type of this job, selected from the JobType enum. |
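To make these fields concrete, a reconciled batch job might surface a status roughly like the sketch below. The timestamp, message, and job ID are invented placeholders; the Ready/UpToDate condition follows the common Config Connector convention; and JOB_STATE_RUNNING and JOB_TYPE_BATCH are example values from the Dataflow JobState and JobType enums.

status:
  # Placeholder values for illustration only.
  conditions:
  - lastTransitionTime: "2021-01-01T00:00:00Z"
    message: The resource is up to date
    reason: UpToDate
    status: "True"
    type: Ready
  jobId: 2021-01-01_00_00_00-1234567890123456789
  state: JOB_STATE_RUNNING
  type: JOB_TYPE_BATCH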
Sample YAML(s)
Batch Dataflow Job
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  annotations:
    cnrm.cloud.google.com/on-delete: "cancel"
  labels:
    label-one: "value-one"
  name: dataflowjob-sample-batch
spec:
  tempGcsLocation: gs://${PROJECT_ID?}-dataflowjob-dep-batch/tmp
  # This is a public, Google-maintained Dataflow Job template of a batch job
  templateGcsPath: gs://dataflow-templates/2020-02-03-01_RC00/Word_Count
  parameters:
    # This is a public, Google-maintained text file
    inputFile: gs://dataflow-samples/shakespeare/various.txt
    output: gs://${PROJECT_ID?}-dataflowjob-dep-batch/output
  zone: us-central1-a
  machineType: "n1-standard-1"
  maxWorkers: 3
  ipConfiguration: "WORKER_IP_PUBLIC"
---
apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  annotations:
    cnrm.cloud.google.com/force-destroy: "true"
  # StorageBucket names must be globally unique. Replace ${PROJECT_ID?} with your project ID.
  name: ${PROJECT_ID?}-dataflowjob-dep-batch
Streaming Dataflow Job
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowJob
metadata:
  annotations:
    cnrm.cloud.google.com/on-delete: "cancel"
  labels:
    label-one: "value-one"
  name: dataflowjob-sample-streaming
spec:
  tempGcsLocation: gs://${PROJECT_ID?}-dataflowjob-dep-streaming/tmp
  # This is a public, Google-maintained Dataflow Job template of a streaming job
  templateGcsPath: gs://dataflow-templates/2020-02-03-01_RC00/PubSub_to_BigQuery
  parameters:
    # Replace ${PROJECT_ID?} with your project ID
    inputTopic: projects/${PROJECT_ID?}/topics/dataflowjob-dep-streaming
    outputTableSpec: ${PROJECT_ID?}:dataflowjobdepstreaming.dataflowjobdepstreaming
  zone: us-central1-a
  machineType: "n1-standard-1"
  maxWorkers: 3
  ipConfiguration: "WORKER_IP_PUBLIC"
---
apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryDataset
metadata:
  name: dataflowjobdepstreaming
---
apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryTable
metadata:
  name: dataflowjobdepstreaming
spec:
  datasetRef:
    name: dataflowjobdepstreaming
---
apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubTopic
metadata:
  name: dataflowjob-dep-streaming
---
apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  annotations:
    cnrm.cloud.google.com/force-destroy: "true"
  # StorageBucket names must be globally unique. Replace ${PROJECT_ID?} with your project ID.
  name: ${PROJECT_ID?}-dataflowjob-dep-streaming