[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-08-18。"],[[["\u003cp\u003eUnbounded collections in streaming pipelines, unlike batch pipelines, handle continuously updating data from sources like Pub/Sub.\u003c/p\u003e\n"],["\u003cp\u003eWindowing, watermarks, and triggers are crucial for managing and aggregating elements in unbounded collections, which cannot be grouped by keys alone due to the constant influx of data.\u003c/p\u003e\n"],["\u003cp\u003eStreaming mode in Dataflow is activated via the \u003ccode\u003e--streaming\u003c/code\u003e flag, enabling processing of continuously updated data, but it does not support batch sources, as those have an endpoint to their data.\u003c/p\u003e\n"],["\u003cp\u003eWindowing functions divide unbounded collections into logical, time-based segments, such as tumbling, hopping, and session windows, each serving different aggregation needs.\u003c/p\u003e\n"],["\u003cp\u003eWatermarks in Dataflow indicate the expected completeness of data in a window, with late data being processed separately, and triggers determine when aggregated results are emitted, based on time or data element count.\u003c/p\u003e\n"]]],[],null,["Unbounded [PCollections](/dataflow/docs/concepts/beam-programming-model#basic_concepts),\nor unbounded *collections* , represent data in streaming pipelines. An unbounded\ncollection contains data from a continuously updating data source such as\n[Pub/Sub](/pubsub/docs).\n\nYou cannot use only a key to group elements in an unbounded collection. There might\nbe infinitely many elements for a given key in streaming data because the data\nsource constantly adds new elements. You can use [windows](#windows),\n[watermarks](#watermarks), and [triggers](#triggers) to aggregate elements in\nunbounded collections.\n\nThe concept of windows also applies to bounded PCollections that represent data in batch pipelines.\nFor information on windowing in batch pipelines, see the Apache Beam documentation\nfor [Windowing with bounded PCollections](https://beam.apache.org/documentation/programming-guide/#windowing-bounded-collections).\n\nIf a Dataflow pipeline has a bounded data source, that is, a source\nthat does not contain continuously updating data, and the pipeline is switched to streaming\nmode using the `--streaming` flag, when the bounded source is fully consumed,\nthe pipeline stops running.\n\nUse streaming mode\n\nTo run a pipeline in streaming mode, set the `--streaming` flag in the\n[command line](/dataflow/pipelines/specifying-exec-params#setting-pipelineoptions-from-command-line-arguments)\nwhen you run your pipeline. You can also set the streaming mode\n[programmatically](/dataflow/pipelines/specifying-exec-params#configuring-pipelineoptions-for-execution-on-the-cloud-dataflow-service)\nwhen you construct your pipeline.\n\nBatch sources are not supported in streaming mode.\n\nWhen you update your pipeline with a larger pool of workers, your streaming job\nmight not upscale as expected. For streaming jobs that don't use\n[Streaming Engine](/dataflow/docs/streaming-engine),\nyou cannot scale beyond the original number of workers and Persistent Disk\nresources allocated at the start of your original job. When you\n[update](/dataflow/pipelines/updating-a-pipeline) a\nDataflow job and specify a larger number of workers in the new\njob, you can only specify a number of workers equal to the maximum number of\nworkers that you specified for your original job.\n\nSpecify the maximum number of workers by using the following flags: \n\nJava\n\n`--maxNumWorkers`\n\nPython\n\n`--max_num_workers`\n\nGo\n\n`--max_num_workers`\n\nWindows and windowing functions\n\n*Windowing functions* divide unbounded collections into logical components, or\n*windows*. Windowing functions group unbounded collections by the timestamps of\nthe individual elements. Each window contains a finite number of elements.\n\nYou set the following windows with the [Apache Beam SDK](https://beam.apache.org/documentation/programming-guide/#windowing):\n\n- [Tumbling windows](#tumbling-windows) (called [fixed windows](https://beam.apache.org/documentation/programming-guide/#fixed-time-windows) in Apache Beam)\n- [Hopping windows](#hopping-windows) (called [sliding windows](https://beam.apache.org/documentation/programming-guide/#sliding-time-windows) in Apache Beam)\n- [Session windows](#session-windows)\n\nTumbling windows\n\nA tumbling window represents a consistent, disjoint time interval in the data\nstream.\n\nFor example, if you set to a thirty-second tumbling window, the elements with\ntimestamp values \\[0:00:00-0:00:30) are in the first window. Elements with\ntimestamp values \\[0:00:30-0:01:00) are in the second window.\n\nThe following image illustrates how elements are divided into thirty-second tumbling\nwindows.\n\nHopping windows\n\nA hopping window represents a consistent time interval in the\ndata stream. Hopping windows can overlap, whereas tumbling windows are disjoint.\n\nFor example, a hopping window can start every thirty seconds and capture one\nminute of data. The frequency with which hopping windows begin is\ncalled the *period*. This example has a one-minute window and thirty-second period.\n\nThe following image illustrates how elements are divided into one-minute\nhopping windows with a thirty-second period.\n\nTo take running averages of data, use hopping windows. You can use one-minute\nhopping windows with a thirty-second period to compute a one-minute running\naverage every thirty seconds.\n\nSession windows\n\nA session window contains elements within a *gap duration* of another element.\nThe gap duration is an interval between new data in a data stream. If data\narrives after the gap duration, the data is assigned to a new window.\n\nFor example, session windows can divide a data stream representing user mouse\nactivity. This data stream might have long periods of idle time interspersed\nwith many clicks. A session window can contain the data generated by the clicks.\n\nSession windowing assigns different windows to each data key. Tumbling and\nhopping windows contain all elements in the specified time interval, regardless\nof data keys.\n\nThe following image visualizes how elements are divided into session windows.\n\nWatermarks\n\nA *watermark* is a threshold that indicates when Dataflow expects\nall of the data in a window to have arrived. If the watermark has progressed\npast the end of the window and new data arrives with a timestamp within the\nwindow, the data is considered *late data* . For more information, see\n[Watermarks and late data](https://beam.apache.org/documentation/programming-guide/#watermarks-and-late-data)\nin the Apache Beam documentation.\n\nDataflow tracks watermarks because of the following reasons:\n\n- Data is not guaranteed to arrive in time order or at predictable intervals.\n- Data events are not guaranteed to appear in pipelines in the same order that they were generated.\n\nThe data source determines the watermark. You can\n[allow late data](https://beam.apache.org/documentation/programming-guide/#managing-late-data)\nwith the Apache Beam SDK.\n\nTriggers\n\n*Triggers* determine when to emit aggregated results as data arrives. By\ndefault, results are emitted when the [watermark](#watermarks)\npasses the end of the window.\n\nYou can use the Apache Beam SDK to create or modify triggers for each\ncollection in a streaming pipeline.\n\nThe Apache Beam SDK can [set triggers](https://beam.apache.org/documentation/programming-guide/#triggers) that operate on any combination of the following conditions:\n\n- Event time, as indicated by the timestamp on each data element.\n- Processing time, which is the time that the data element is processed at any given stage in the pipeline.\n- The number of data elements in a collection.\n\nWhat's next\n\n- [Streaming 101: The world beyond batch](https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101) (blog)\n- [Streaming 102: The world beyond batch](https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102) (blog)"]]