Programming Model for Cloud Dataflow SDK 2.x

The Beam Programming Guide, available on the Beam website, walks you through the basic concepts of building pipelines with the Beam SDKs. These concepts include:

  • PCollections - the PCollection abstraction represents a potentially distributed, multi-element data set that acts as the pipeline's data. Beam transforms use PCollection objects as inputs and outputs.
  • Transforms - these are the operations in your pipeline. A transform takes one or more PCollections as input, performs an operation that you specify on each element in that collection, and produces a new output PCollection.
  • Pipeline I/O - Beam provides read and write transforms for a number of common data storage types, and also lets you create your own.

