[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-09-04 (世界標準時間)。"],[[["\u003cp\u003eMachine learning (ML) workflows often involve multiple steps that form a pipeline, including data pre- and post-processing.\u003c/p\u003e\n"],["\u003cp\u003eML pipelines can be constructed using orchestration frameworks like TensorFlow Extended (TFX) or Kubeflow Pipelines (KFP), or by building custom components.\u003c/p\u003e\n"],["\u003cp\u003eVertex AI Pipelines can be used to manage and orchestrate ML workflows, including those defined by TFX or KFP, while also tracking ML artifacts.\u003c/p\u003e\n"],["\u003cp\u003eKFP provides a way to create reusable, end-to-end ML workflows, with Dataflow integration that allows the use of \u003ccode\u003eDataflowPythonJobOP\u003c/code\u003e or \u003ccode\u003eDataflowFlexTemplateJobOp\u003c/code\u003e operators.\u003c/p\u003e\n"],["\u003cp\u003eTFX pipelines can use Apache Beam and Dataflow without additional configuration, as TFX data processing libraries already utilize Apache Beam directly.\u003c/p\u003e\n"]]],[],null,["# Dataflow ML in ML workflows\n\nTo orchestrate complex machine learning workflows, you can create frameworks that\ninclude data pre- and post-processing steps. You might need to pre-process data\nbefore you can use it to train your model or to post-process data to transform the\noutput of your model.\n\nML workflows often contain many steps that together form a pipeline.\nTo build your machine learning pipeline, you can use one of the following\nmethods.\n\n- Use an orchestration framework that has a built-in integration with Apache Beam and the Dataflow runner, such as TensorFlow Extended (TFX) or Kubeflow Pipelines (KFP). This option is the least complex.\n- Build a custom component in a [Dataflow template](/dataflow/docs/concepts/dataflow-templates) and then call the template from your ML pipeline. The call contains your Apache Beam code.\n- Build a custom component to use in your ML pipeline and put the Python code directly in the component. You define a custom Apache Beam pipeline and use the Dataflow runner within the custom component. This option is the most complex and requires you to manage pipeline dependencies.\n\nAfter you create your machine learning pipeline, you can use an orchestrator to\nchain together the components to create an end-to-end machine learning\nworkflow. To orchestrate the components, you can use a managed service, such as\n[Vertex AI Pipelines](/vertex-ai/docs/pipelines/introduction).\n\nWorkflow orchestration use cases are described in the following sections.\n\n- [I want to use Dataflow with Vertex AI Pipelines](#vertex)\n- [I want to use Dataflow with KFP](#kfp)\n- [I want to use Dataflow with TFX](#tfx)\n\nBoth\n[TFX](https://www.tensorflow.org/tfx)\nand\n[Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/introduction/)\n(KFP) use Apache Beam components.\n\nI want to use Dataflow with Vertex AI Pipelines\n-----------------------------------------------\n\nVertex AI Pipelines help you to automate, monitor, and govern your ML\nsystems by orchestrating your ML workflows in a serverless manner. 
If you want to build a fully custom component, see the
[Dataflow components](/vertex-ai/docs/pipelines/dataflow-component) page in the
Vertex AI documentation.

I want to use Dataflow with TFX
-------------------------------

TFX pipeline components are built on TFX libraries, and the data processing
libraries use Apache Beam directly. For example, TensorFlow Transform translates
the user's calls to Apache Beam. Therefore, you can use Apache Beam and
Dataflow with TFX pipelines without doing extra configuration work. To use TFX
with Dataflow, use the Dataflow runner when you build your TFX pipeline. For
more information, see the following resources; a minimal sketch follows the
list.

- [Apache Beam and TFX](https://www.tensorflow.org/tfx/guide/beam)
- [TensorFlow Extended (TFX): Using Apache Beam for large scale data processing](https://blog.tensorflow.org/2020/03/tensorflow-extended-tfx-using-apache-beam-large-scale-data-processing.html)
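The following sketch shows one way to route a TFX pipeline's Beam-based steps
to Dataflow by setting `beam_pipeline_args`. The project, region, and
Cloud Storage paths are placeholders, and a real pipeline would include more
components than the single `CsvExampleGen` shown here.

```python
# A minimal sketch, assuming placeholder project, region, and bucket values:
# it routes the Beam-based steps of a TFX pipeline to the Dataflow runner.
from tfx import v1 as tfx

# A single Beam-based component; a real pipeline would also include
# Transform, Trainer, and other components.
example_gen = tfx.components.CsvExampleGen(input_base="gs://my-bucket/data")

tfx_pipeline = tfx.dsl.Pipeline(
    pipeline_name="my-tfx-pipeline",
    pipeline_root="gs://my-bucket/pipeline-root",
    components=[example_gen],
    # These options send every Beam-based component in the pipeline to
    # Dataflow instead of the default local Beam runner.
    beam_pipeline_args=[
        "--runner=DataflowRunner",
        "--project=my-project",
        "--region=us-central1",
        "--temp_location=gs://my-bucket/tmp",
    ],
)
```

Because `beam_pipeline_args` is set once on the pipeline object, you don't need
to configure each component individually; any component that uses Apache Beam
picks up the Dataflow runner automatically.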