DataflowFlexTemplateJob

Property	Value
Google Cloud Service Name	Cloud Dataflow
Google Cloud Service Documentation	/dataflow/docs/
Google Cloud REST Resource Name	v1b3.projects.jobs
Google Cloud REST Resource Documentation	/dataflow/docs/reference/rest/v1b3/projects.jobs
Config Connector Resource Short Names	gcpdataflowflextemplatejob gcpdataflowflextemplatejobs dataflowflextemplatejob
Config Connector Service Name	dataflow.googleapis.com
Config Connector Resource Fully Qualified Name	dataflowflextemplatejobs.dataflow.cnrm.cloud.google.com
Can Be Referenced by IAMPolicy/IAMPolicyMember	No
Config Connector Default Average Reconcile Interval In Seconds	600

Custom Resource Definition Properties

Annotations

Fields
`cnrm.cloud.google.com/on-delete`
`cnrm.cloud.google.com/project-id`
`cnrm.cloud.google.com/skip-wait-on-job-termination`

Spec

Schema

additionalExperiments:
- string
autoscalingAlgorithm: string
containerSpecGcsPath: string
enableStreamingEngine: boolean
ipConfiguration: string
kmsKeyNameRef:
  external: string
  name: string
  namespace: string
launcherMachineType: string
machineType: string
maxWorkers: integer
networkRef:
  external: string
  name: string
  namespace: string
numWorkers: integer
parameters: {}
region: string
sdkContainerImage: string
serviceAccountEmailRef:
  external: string
  name: string
  namespace: string
stagingLocation: string
subnetworkRef:
  external: string
  name: string
  namespace: string
tempLocation: string
transformNameMapping: {}

Fields

Fields
`additionalExperiments` Optional	`list (string)` Additional experiment flags for the job.
`additionalExperiments[]` Optional	`string`
`autoscalingAlgorithm` Optional	`string` The algorithm to use for autoscaling
`containerSpecGcsPath` Required	`string` Cloud Storage path to a file with json serialized ContainerSpec as content.
`enableStreamingEngine` Optional	`boolean` Whether to enable Streaming Engine for the job.
`ipConfiguration` Optional	`string` Configuration for VM IPs.
`kmsKeyNameRef` Optional	`object` The Cloud KMS key for the job.
`kmsKeyNameRef.external` Optional	`string` A reference to an externally managed KMSCryptoKey. Should be in the format `projects/[kms_project_id]/locations/[region]/keyRings/[key_ring_id]/cryptoKeys/[key]`.
`kmsKeyNameRef.name` Optional	`string` The `name` of a `KMSCryptoKey` resource.
`kmsKeyNameRef.namespace` Optional	`string` The `namespace` of a `KMSCryptoKey` resource.
`launcherMachineType` Optional	`string` The machine type to use for launching the job. The default is n1-standard-1.
`machineType` Optional	`string` The machine type to use for the job. Defaults to the value from the template if not specified.
`maxWorkers` Optional	`integer` The maximum number of Google Compute Engine instances to be made available to your pipeline during execution, from 1 to 1000.
`networkRef` Optional	`object` Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default".
`networkRef.external` Optional	`string` A reference to an externally managed Compute Network resource. Should be in the format `projects/{{projectID}}/global/networks/{{network}}`.
`networkRef.name` Optional	`string` The `name` field of a `ComputeNetwork` resource.
`networkRef.namespace` Optional	`string` The `namespace` field of a `ComputeNetwork` resource.
`numWorkers` Optional	`integer` The initial number of Google Compute Engine instances for the job.
`parameters` Optional	`object` The parameters for FlexTemplate. Ex. {"num_workers":"5"}
`region` Optional	`string` Immutable. The region in which the created job should run.
`sdkContainerImage` Optional	`string` Docker registry location of container image to use for the 'worker harness. Default is the container for the version of the SDK. Note this field is only valid for portable pipelines.
`serviceAccountEmailRef` Optional	`object` The email address of the service account to run the job as.
`serviceAccountEmailRef.external` Optional	`string` The `email` field of an `IAMServiceAccount` resource.
`serviceAccountEmailRef.name` Optional	`string` Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
`serviceAccountEmailRef.namespace` Optional	`string` Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
`stagingLocation` Optional	`string` The Cloud Storage path for staging local files. Must be a valid Cloud Storage URL, beginning with `gs://`.
`subnetworkRef` Optional	`object` Subnetwork to which VMs will be assigned, if desired. You can specify a subnetwork using either a complete URL or an abbreviated path. Expected to be of the form "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in a Shared VPC network, you must use the complete URL.
`subnetworkRef.external` Optional	`string` The ComputeSubnetwork selflink of form "projects/{{project}}/regions/{{region}}/subnetworks/{{name}}", when not managed by Config Connector.
`subnetworkRef.name` Optional	`string` The `name` field of a `ComputeSubnetwork` resource.
`subnetworkRef.namespace` Optional	`string` The `namespace` field of a `ComputeSubnetwork` resource.
`tempLocation` Optional	`string` The Cloud Storage path to use for temporary files. Must be a valid Cloud Storage URL, beginning with `gs://`.
`transformNameMapping` Optional	`object` Map of transform name prefixes of the job to be replaced with the corresponding name prefixes of the new job. Only applicable when updating a pipeline.

additionalExperiments

Optional

list (string)

Additional experiment flags for the job.

additionalExperiments[]

Optional

string

autoscalingAlgorithm

Optional

string

The algorithm to use for autoscaling

containerSpecGcsPath

Required

string

Cloud Storage path to a file with json serialized ContainerSpec as content.

enableStreamingEngine

Optional

boolean

Whether to enable Streaming Engine for the job.

ipConfiguration

Optional

string

Configuration for VM IPs.

kmsKeyNameRef

Optional

object

The Cloud KMS key for the job.

kmsKeyNameRef.external

Optional

string

A reference to an externally managed KMSCryptoKey. Should be in the format `projects/[kms_project_id]/locations/[region]/keyRings/[key_ring_id]/cryptoKeys/[key]`.

kmsKeyNameRef.name

Optional

string

The `name` of a `KMSCryptoKey` resource.

kmsKeyNameRef.namespace

Optional

string

The `namespace` of a `KMSCryptoKey` resource.

launcherMachineType

Optional

string

The machine type to use for launching the job. The default is n1-standard-1.

machineType

Optional

string

The machine type to use for the job. Defaults to the value from the template if not specified.

maxWorkers

Optional

integer

The maximum number of Google Compute Engine instances to be made available to your pipeline during execution, from 1 to 1000.

networkRef

Optional

object

Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default".

networkRef.external

Optional

string

A reference to an externally managed Compute Network resource. Should be in the format `projects/{{projectID}}/global/networks/{{network}}`.

networkRef.name

Optional

string

The `name` field of a `ComputeNetwork` resource.

networkRef.namespace

Optional

string

The `namespace` field of a `ComputeNetwork` resource.

numWorkers

Optional

integer

The initial number of Google Compute Engine instances for the job.

parameters

Optional

object

The parameters for FlexTemplate. Ex. {"num_workers":"5"}

region

Optional

string

Immutable. The region in which the created job should run.

sdkContainerImage

Optional

string

Docker registry location of container image to use for the 'worker harness. Default is the container for the version of the SDK. Note this field is only valid for portable pipelines.

serviceAccountEmailRef

Optional

object

The email address of the service account to run the job as.

serviceAccountEmailRef.external

Optional

string

The `email` field of an `IAMServiceAccount` resource.

serviceAccountEmailRef.name

Optional

string

Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names

serviceAccountEmailRef.namespace

Optional

string

Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/

stagingLocation

Optional

string

The Cloud Storage path for staging local files. Must be a valid Cloud Storage URL, beginning with `gs://`.

subnetworkRef

Optional

object

Subnetwork to which VMs will be assigned, if desired. You can specify a subnetwork using either a complete URL or an abbreviated path. Expected to be of the form "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in a Shared VPC network, you must use the complete URL.

subnetworkRef.external

Optional

string

The ComputeSubnetwork selflink of form "projects/{{project}}/regions/{{region}}/subnetworks/{{name}}", when not managed by Config Connector.

subnetworkRef.name

Optional

string

The `name` field of a `ComputeSubnetwork` resource.

subnetworkRef.namespace

Optional

string

The `namespace` field of a `ComputeSubnetwork` resource.

tempLocation

Optional

string

The Cloud Storage path to use for temporary files. Must be a valid Cloud Storage URL, beginning with `gs://`.

transformNameMapping

Optional

object

Map of transform name prefixes of the job to be replaced with the corresponding name prefixes of the new job. Only applicable when updating a pipeline.

Status

Schema

conditions:
- lastTransitionTime: string
  message: string
  reason: string
  status: string
  type: string
jobId: string
observedGeneration: integer
state: string
type: string

Fields
`conditions`	`list (object)` Conditions represent the latest available observations of the object's current state.
`conditions[]`	`object`
`conditions[].lastTransitionTime`	`string` Last time the condition transitioned from one status to another.
`conditions[].message`	`string` Human-readable message indicating details about last transition.
`conditions[].reason`	`string` Unique, one-word, CamelCase reason for the condition's last transition.
`conditions[].status`	`string` Status is the status of the condition. Can be True, False, Unknown.
`conditions[].type`	`string` Type is the type of the condition.
`jobId`	`string`
`observedGeneration`	`integer` ObservedGeneration is the generation of the resource that was most recently observed by the Config Connector controller. If this is equal to metadata.generation, then that means that the current reported status reflects the most recent desired state of the resource.
`state`	`string` The current state of the job. Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise specified. A job in the `JOB_STATE_RUNNING` state may asynchronously enter a terminal state. After a job has reached a terminal state, no further state updates may be made. This field may be mutated by the Cloud Dataflow service; callers cannot mutate it.
`type`	`string` The type of Cloud Dataflow job.

Sample YAML(s)

Batch Dataflow Flex Template Job

# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowFlexTemplateJob
metadata:
  annotations:
    cnrm.cloud.google.com/on-delete: "cancel"
  name: dataflowflextemplatejob-sample-batch
spec:
  region: us-central1
  # This is a public, Google-maintained Dataflow Job flex template of a batch job
  containerSpecGcsPath: gs://dataflow-templates/2022-10-03-00_RC00/flex/File_Format_Conversion
  parameters:
    inputFileFormat: csv
    outputFileFormat: avro
    # This is a public, Google-maintained csv file expressly for this sample.
    inputFileSpec: gs://config-connector-samples/dataflowflextemplate/numbertest.csv
    # Replace ${PROJECT_ID?} with your project ID.
    outputBucket: gs://${PROJECT_ID?}-dataflowflextemplatejob-dep-batch
    # This is a public, Google-maintained Avro schema file expressly for this sample.
    schema: gs://config-connector-samples/dataflowflextemplate/numbers.avsc
---
apiVersion: storage.cnrm.cloud.google.com/v1beta1
kind: StorageBucket
metadata:
  # StorageBucket names must be globally unique. Replace ${PROJECT_ID?} with your project ID.
  name: ${PROJECT_ID?}-dataflowflextemplatejob-dep-batch

Streaming Dataflow Flex Template Job

# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: dataflow.cnrm.cloud.google.com/v1beta1
kind: DataflowFlexTemplateJob
metadata:
  annotations:
    cnrm.cloud.google.com/on-delete: "drain"
  name: dataflowflextemplatejob-sample-streaming
spec:
  region: us-central1
  # This is a public, Google-maintained Dataflow Job flex template of a streaming job
  containerSpecGcsPath: gs://dataflow-templates/2020-08-31-00_RC00/flex/PubSub_Avro_to_BigQuery
  parameters:
    # This is a public, Google-maintained Avro schema file expressly for this sample.
    schemaPath: gs://config-connector-samples/dataflowflextemplate/numbers.avsc
    # Replace ${PROJECT_ID?} with your project ID.
    inputSubscription: projects/${PROJECT_ID?}/subscriptions/dataflowflextemplatejob-dep-streaming
    outputTopic: projects/${PROJECT_ID?}/topics/dataflowflextemplatejob-dep1-streaming
    outputTableSpec: ${PROJECT_ID?}:dataflowflextemplatejobdepstreaming.dataflowflextemplatejobdepstreaming
    createDisposition: CREATE_NEVER
---
apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryDataset
metadata:
  name: dataflowflextemplatejobdepstreaming
spec:
  location: us-central1
---
apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryTable
metadata:
  name: dataflowflextemplatejobdepstreaming
spec:
  datasetRef:
    name: dataflowflextemplatejobdepstreaming
---
apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubSubscription
metadata:
  name: dataflowflextemplatejob-dep-streaming
spec:
  topicRef:
    name: dataflowflextemplatejob-dep0-streaming
---
apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubTopic
metadata:
  name: dataflowflextemplatejob-dep0-streaming
---
apiVersion: pubsub.cnrm.cloud.google.com/v1beta1
kind: PubSubTopic
metadata:
  name: dataflowflextemplatejob-dep1-streaming