[[["容易理解","easyToUnderstand","thumb-up"],["確實解決了我的問題","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["難以理解","hardToUnderstand","thumb-down"],["資訊或程式碼範例有誤","incorrectInformationOrSampleCode","thumb-down"],["缺少我需要的資訊/範例","missingTheInformationSamplesINeed","thumb-down"],["翻譯問題","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["上次更新時間:2025-08-18 (世界標準時間)。"],[[["\u003cp\u003eDataflow is a fully managed Google Cloud service for unified stream and batch data processing at scale, allowing users to create data pipelines that read from sources, transform the data, and write to a destination.\u003c/p\u003e\n"],["\u003cp\u003eCommon use cases for Dataflow include data movement, ETL workflows for data warehousing, powering BI dashboards, applying machine learning to real-time streaming data, and processing sensor or log data at scale.\u003c/p\u003e\n"],["\u003cp\u003eDataflow uses the same programming model for both batch and streaming analytics, guarantees exactly-once processing by default, and can achieve very low latency with the ability to ingest and process fluctuating volumes of real-time data.\u003c/p\u003e\n"],["\u003cp\u003eBuilt on the open-source Apache Beam project, Dataflow offers portability by allowing users to write pipelines in multiple languages and execute them on different platforms without rewriting code, alongside offering scalability by automatically adjusting resources based on workload demands.\u003c/p\u003e\n"],["\u003cp\u003eDataflow offers multiple ways to create and run pipelines, such as using Apache Beam SDKs, deploying pre-built templates (including Google-provided ones), and utilizing JupyterLab notebooks for iterative development, along with a detailed monitoring interface to track progress and identify issues.\u003c/p\u003e\n"]]],[],null,["Dataflow is a Google Cloud service that provides unified\nstream and batch data processing at scale. Use Dataflow to\ncreate data pipelines that read from one or more sources, transform the data,\nand write the data to a destination.\n\nTypical use cases for Dataflow include the following:\n\n- Data movement: Ingesting data or replicating data across subsystems.\n- [ETL](/learn/what-is-etl) (extract-transform-load) workflows that ingest data into a data warehouse such as BigQuery.\n- Powering BI dashboards.\n- Applying ML in real time to streaming data.\n- Processing sensor data or log data at scale.\n\nDataflow uses the same programming model for both batch and\nstream analytics. Streaming pipelines can achieve very low latency. You can\ningest, process, and analyze fluctuating volumes of real-time data. By default,\nDataflow guarantees\n[exactly-once processing](/dataflow/docs/concepts/exactly-once) of every record.\nFor streaming pipelines that can tolerate duplicates, you can often reduce cost\nand improve latency by enabling\n[at-least-once mode](/dataflow/docs/guides/streaming-modes).\n\nAdvantages of Dataflow\n\nThis section describes some of the advantages of using Dataflow.\n\nManaged\n\nDataflow is a fully managed service. That means Google manages\nall of the resources needed to run Dataflow. When you run a\nDataflow job, the Dataflow service allocates a\npool of worker VMs to execute the pipeline. You don't need to provision or\nmanage these VMs. When the job completes or is cancelled,\nDataflow automatically deletes the VMs. You're billed for the\ncompute resources that your job uses. 
## Advantages of Dataflow

This section describes some of the advantages of using Dataflow.

### Managed

Dataflow is a fully managed service. That means Google manages all of the resources needed to run Dataflow. When you run a Dataflow job, the Dataflow service allocates a pool of worker VMs to execute the pipeline. You don't need to provision or manage these VMs. When the job completes or is cancelled, Dataflow automatically deletes the VMs. You're billed for the compute resources that your job uses. For more information about costs, see [Dataflow pricing](/dataflow/pricing).

### Scalable

Dataflow is designed to support batch and streaming pipelines at large scale. Data is processed in parallel, so the work is distributed across multiple VMs.

Dataflow can autoscale by provisioning extra worker VMs, or by shutting down some worker VMs if fewer are needed. It also optimizes the work, based on the characteristics of the pipeline. For example, Dataflow can [dynamically rebalance work](/dataflow/docs/dynamic-work-rebalancing) among the VMs, so that parallel work completes more efficiently.

### Portable

Dataflow is built on the open source [Apache Beam](https://beam.apache.org/) project. Apache Beam lets you write pipelines using a language-specific SDK. Apache Beam supports Java, Python, and Go SDKs, as well as [multi-language pipelines](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines).

Dataflow executes Apache Beam pipelines. If you decide later to run your pipeline on a different platform, such as Apache Flink or Apache Spark, you can do so without rewriting the pipeline code.

### Flexible

You can use Dataflow for relatively simple pipelines, such as moving data. However, it's also suitable for more advanced applications, such as real-time streaming analytics. A solution built on Dataflow can grow with your needs as you move from batch to streaming or encounter more advanced use cases.

Dataflow supports several different ways to create and execute pipelines, depending on your needs:

- Write code using the Apache Beam SDKs.
- Deploy a [Dataflow template](/dataflow/docs/concepts/dataflow-templates). Templates let you run predefined pipelines. For example, a developer can create a template, and then a data scientist can deploy it on demand.

  Google also provides a [library](/dataflow/docs/guides/templates/provided-templates) of templates for common scenarios. You can deploy these templates without knowing any Apache Beam programming concepts.
- Use [JupyterLab notebooks](/dataflow/docs/guides/interactive-pipeline-development) to develop and run pipelines iteratively.

### Observable

You can monitor the status of your Dataflow jobs through the [Dataflow monitoring interface](/dataflow/docs/guides/monitoring-overview) in the Google Cloud console. The monitoring interface includes a graphical representation of your pipeline, showing the progress and [execution details](/dataflow/docs/concepts/execution-details) of each pipeline stage. The monitoring interface makes it easier to spot problems such as bottlenecks or high latency. You can also [profile](/dataflow/docs/guides/profiling-a-pipeline) your Dataflow jobs to monitor CPU usage and memory allocation.

## How it works

Dataflow uses a data pipeline model, where data moves through a series of stages. Stages can include reading data from a source, transforming and aggregating the data, and writing the results to a destination.

Pipelines can range from very simple to more complex processing. For example, a pipeline might do the following:

- Move data as-is to a destination.
- Transform data to be more usable by the target system.
- Aggregate, process, and enrich data for analysis.
- Join data with other data.

A pipeline that is defined in Apache Beam does not specify *how* the pipeline is executed. Running the pipeline is the job of a [*runner*](https://beam.apache.org/documentation/basics/#runner). The purpose of a runner is to run an Apache Beam pipeline on a specific platform. Apache Beam supports multiple runners, including a [Dataflow runner](https://beam.apache.org/documentation/runners/dataflow/).

To use Dataflow with your Apache Beam pipelines, specify the Dataflow runner. The runner uploads your executable code and dependencies to a Cloud Storage bucket and creates a Dataflow *job*. Dataflow then allocates a pool of VMs to execute the pipeline.
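The following is a hedged sketch of how selecting the Dataflow runner might look with the Python SDK. The project ID, region, and bucket names are placeholder values, not real resources.

```python
# Sketch: run an Apache Beam pipeline on the Dataflow service by selecting
# the Dataflow runner. Project, region, and bucket values are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",                   # execute on Dataflow instead of locally
    project="my-project-id",                   # placeholder Google Cloud project ID
    region="us-central1",                      # placeholder Dataflow region
    temp_location="gs://example-bucket/temp",  # Cloud Storage location for temporary files
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://example-bucket/input/*.txt")
        | "Transform" >> beam.Map(str.strip)
        | "Write" >> beam.io.WriteToText("gs://example-bucket/output/results")
    )
```

The same options can also be supplied as command-line flags (for example, `--runner=DataflowRunner`) instead of being set in code.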
The following diagram shows a typical ETL and BI solution using Dataflow and other Google Cloud services:

This diagram shows the following stages:

1. Pub/Sub ingests data from an external system.
2. Dataflow reads the data from Pub/Sub and writes it to BigQuery. During this stage, Dataflow might transform or aggregate the data.
3. BigQuery acts as a data warehouse, allowing data analysts to run ad hoc queries on the data.
4. Looker provides real-time BI insights from the data stored in BigQuery.

For basic data movement scenarios, you might run a Google-provided template. Some templates support user-defined functions (UDFs) written in JavaScript. UDFs let you add custom processing logic to a template. For more complex pipelines, start with the Apache Beam SDK.

## What's next

- For more information about Apache Beam, see [Programming model for Apache Beam](/dataflow/docs/concepts/beam-programming-model).
- Create your first pipeline by following the [Job builder quickstart](/dataflow/docs/quickstarts/create-pipeline-job-builder) or [Dataflow template quickstart](/dataflow/docs/quickstarts/create-streaming-pipeline-template).
- Learn how to [use Apache Beam to build pipelines](/dataflow/docs/guides/use-beam).