[[["わかりやすい","easyToUnderstand","thumb-up"],["問題の解決に役立った","solvedMyProblem","thumb-up"],["その他","otherUp","thumb-up"]],[["わかりにくい","hardToUnderstand","thumb-down"],["情報またはサンプルコードが不正確","incorrectInformationOrSampleCode","thumb-down"],["必要な情報 / サンプルがない","missingTheInformationSamplesINeed","thumb-down"],["翻訳に関する問題","translationIssue","thumb-down"],["その他","otherDown","thumb-down"]],["最終更新日 2024-12-23 UTC。"],[],[],null,["# Dataflow components\n\nThe Dataflow components let you submit Apache Beam jobs to\nDataflow for execution. In Dataflow, a\n[`Job`](/dataflow/docs/reference/rest/v1b3/projects.jobs#Job)\nresource represents a Dataflow job.\n\nThe Google Cloud SDK includes the\nfollowing operators for creating `Job` resources and monitor their execution:\n\n\n- [`DataflowFlexTemplateJobOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataflow.html#preview.dataflow.DataflowFlexTemplateJobOp)\n- [`DataflowPythonJobOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataflow.html#v1.dataflow.DataflowPythonJobOp)\n\n\u003cbr /\u003e\n\nAdditionally, the Google Cloud SDK includes the\n[`WaitGcpResourcesOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/wait_gcp_resources.html#v1.wait_gcp_resources.WaitGcpResourcesOp)\ncomponent, which you can use to mitigate costs while running\nDataflow jobs.\n\n`DataflowFlexTemplateJobOp`\n---------------------------\n\nThe [`DataflowFlexTemplateJobOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataflow.html#v1.dataflow.DataflowFlexTemplateJobOp)\noperator lets you create a\nVertex AI Pipelines component to launch a\n[Dataflow Flex Template](/dataflow/docs/guides/templates/using-flex-templates).\n\nIn Dataflow, a [`LaunchFlexTemplateParameter`](/dataflow/docs/reference/rest/v1b3/projects.locations.flexTemplates/launch#LaunchFlexTemplateParameter)\nresource represents a Flex Template to launch. This component creates a\n`LaunchFlexTemplateParameter` resource and then requests Dataflow to\ncreate a job by launching the template. If the template is launched\nsuccessfully, Dataflow returns a [`Job`](/dataflow/docs/reference/rest/v1b3/projects.jobs#Job)\nresource.\n\nThe Dataflow Flex Template component terminates upon receiving a `Job`\nresource from Dataflow. The component outputs a `job_id` as a\n[serialized `gcp_resources` proto](https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md#usage). 
`DataflowPythonJobOp`
---------------------

The [`DataflowPythonJobOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataflow.html#v1.dataflow.DataflowPythonJobOp)
operator lets you create a Vertex AI Pipelines component that prepares
data by submitting a Python-based Apache Beam job to Dataflow for
execution.

The Python code of the Apache Beam job runs with the Dataflow Runner.
When you run your pipeline with the Dataflow service, the runner
takes the executable code from the location specified by the
`python_module_path` parameter, uploads dependencies to a Cloud Storage
bucket (specified by `temp_location`), and then creates a
Dataflow job that executes your Apache Beam pipeline on managed
resources in Google Cloud.

To learn more about the Dataflow Runner, see
[Using the Dataflow Runner](https://beam.apache.org/documentation/runners/dataflow/).

The Dataflow Python component accepts a list of arguments that are
passed through the Beam Runner to your Apache Beam code. You specify these
arguments with the `args` parameter. For example, you can use them to set
[`apache_beam.options.pipeline_options`](https://beam.apache.org/releases/pydoc/2.33.0/apache_beam.options.pipeline_options.html#apache_beam.options.pipeline_options.PipelineOptions)
that specify a network, a subnetwork, a customer-managed encryption key
(CMEK), and other options when you run Dataflow jobs.
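For example, the following sketch uses `args` to set a network, a subnetwork,
and a CMEK key. The bucket paths, network names, and key resource name are
placeholders; `--network`, `--subnetwork`, and `--dataflow_kms_key` are
standard Beam pipeline options. Assume this runs inside a pipeline definition
where `project_id` and `location` are pipeline parameters:

```python
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp

dataflow_python_op = DataflowPythonJobOp(
    project=project_id,
    location=location,
    # Placeholder Cloud Storage paths for the Beam code and temp files.
    python_module_path="gs://YOUR_BUCKET/src/my_beam_job.py",
    temp_location="gs://YOUR_BUCKET/temp",
    # Pipeline options forwarded to the Beam job.
    args=[
        "--network", "YOUR_NETWORK_NAME",
        "--subnetwork", "regions/YOUR_REGION/subnetworks/YOUR_SUBNETWORK_NAME",
        "--dataflow_kms_key",
        "projects/YOUR_PROJECT/locations/YOUR_REGION/keyRings/YOUR_KEY_RING/cryptoKeys/YOUR_KEY",
    ],
)
```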
`WaitGcpResourcesOp`
--------------------

Dataflow jobs can often take a long time to complete, and the cost of a
busy-wait container (a container that launches the Dataflow job and
waits for the result) can become significant.

After submitting the Dataflow job using the Beam runner,
the [`DataflowPythonJobOp`](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataflow.html#v1.dataflow.DataflowPythonJobOp)
component terminates immediately and returns a `job_id` output parameter as a
[serialized `gcp_resources` proto](https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md#usage).
You can pass this parameter to a `WaitGcpResourcesOp` component to wait for the
Dataflow job to complete.

```python
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp

# Submit the Beam job; this component returns as soon as the job is created.
dataflow_python_op = DataflowPythonJobOp(
    project=project_id,
    location=location,
    python_module_path=python_file_path,
    temp_location=staging_dir,
    requirements_file_path=requirements_file_path,
    args=["--output", OUTPUT_FILE],
)

# Wait until the Dataflow job reaches a terminal state.
dataflow_wait_op = WaitGcpResourcesOp(
    gcp_resources=dataflow_python_op.outputs["gcp_resources"]
)
```

Vertex AI Pipelines optimizes `WaitGcpResourcesOp` to execute in a
serverless fashion, so waiting incurs no cost.

If `DataflowPythonJobOp` and `DataflowFlexTemplateJobOp` don't meet your
requirements, you can also create your own component that outputs the
`gcp_resources` parameter and pass it to the `WaitGcpResourcesOp` component.

For more information about how to create the `gcp_resources` output parameter, see
[Write a component to show a Google Cloud console link](/vertex-ai/docs/pipelines/build-own-components#show-console-link).

API reference
-------------

- For component reference, see the
  [Google Cloud Pipeline Components SDK reference for Dataflow components](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/api/v1/dataflow.html).

- For Dataflow resource reference, see the following API reference pages:

  - [`LaunchFlexTemplateParameter`](/dataflow/docs/reference/rest/v1b3/projects.locations.flexTemplates/launch#LaunchFlexTemplateParameter) resource
  - [`Job`](/dataflow/docs/reference/rest/v1b3/projects.jobs#Job) resource

### Tutorials

- [Get started with the Dataflow Flex Template component](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/fe7d3e4b8edc137d90ec061789b879b7cc8d3854/notebooks/community/ml_ops/stage3/get_started_with_dataflow_flex_template_component.ipynb)
- [Get started with the Dataflow Python Job component](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/139d805c9fac45f3a663d4f4651fbca4bb0932b7/notebooks/community/ml_ops/stage3/get_started_with_dataflow_pipeline_components.ipynb)
- [Specify a network and subnetwork](/dataflow/docs/guides/specifying-networks#network_parameter)
- [Using customer-managed encryption keys (CMEK)](/dataflow/docs/guides/customer-managed-encryption-keys)

Version history and release notes
---------------------------------

To learn more about the version history and changes to the
Google Cloud Pipeline Components SDK, see the
[Google Cloud Pipeline Components SDK Release Notes](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.19.0/release.html).

Technical support contacts
--------------------------

If you have any questions, reach out to
[kubeflow-pipelines-components@google.com](mailto:kubeflow-pipelines-components@google.com).