Programming Model for Cloud Dataflow SDK 2.x

The Beam Programming Guide on the Apache Beam website walks you through the basic concepts of building pipelines with the Beam SDKs. These concepts include:

  • PCollections - the PCollection abstraction represents a potentially distributed, multi-element data set that acts as the pipeline's data. Beam transforms use PCollection objects as inputs and outputs.
  • Transforms - these are the operations in your pipeline. A transform takes one or more PCollections as input, performs an operation that you specify on each element, and produces a new output PCollection.
  • Pipeline I/O - Beam provides read and write transforms for a number of common data storage types and also lets you create your own.
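
The three concepts above can be illustrated with a small, in-memory toy model. This is only a sketch using the Python standard library and is not the Beam API: `ToyCollection`, `read_lines`, `flat_map`, `map_elements`, `write_lines`, and `run_toy_pipeline` are hypothetical names invented for this example. It assumes a single process and a list-backed source and sink, whereas a real PCollection may be distributed across workers.

```python
from typing import Any, Callable, Iterable, List


class ToyCollection:
    """Immutable in-memory stand-in for a PCollection (hypothetical, not Beam)."""

    def __init__(self, elements: Iterable[Any]) -> None:
        # Store a tuple so the data cannot be changed after creation.
        self._elements = tuple(elements)

    def apply(self, transform: Callable[["ToyCollection"], "ToyCollection"]) -> "ToyCollection":
        # A transform receives a collection and returns a new collection.
        return transform(self)

    def elements(self) -> List[Any]:
        return list(self._elements)


def read_lines(lines: Iterable[str]) -> ToyCollection:
    """Toy 'read' step: turns an external source into a collection."""
    return ToyCollection(lines)


def map_elements(fn: Callable[[Any], Any]):
    """Toy transform: applies fn to each element, one output per input."""
    def transform(pc: ToyCollection) -> ToyCollection:
        return ToyCollection(fn(e) for e in pc.elements())
    return transform


def flat_map(fn: Callable[[Any], Iterable[Any]]):
    """Toy transform: fn may return zero or more outputs per input element."""
    def transform(pc: ToyCollection) -> ToyCollection:
        return ToyCollection(out for e in pc.elements() for out in fn(e))
    return transform


def write_lines(sink: List[str]):
    """Toy 'write' step: appends each element, as text, to an external sink."""
    def transform(pc: ToyCollection) -> ToyCollection:
        sink.extend(str(e) for e in pc.elements())
        return pc
    return transform


def run_toy_pipeline(lines: Iterable[str]):
    """Read lines, split them into words, compute word lengths, write results."""
    sink: List[str] = []
    source = read_lines(lines)                   # I/O: read
    words = source.apply(flat_map(str.split))    # transform: line -> words
    lengths = words.apply(map_elements(len))     # transform: word -> length
    lengths.apply(write_lines(sink))             # I/O: write
    return source, words, lengths, sink
```

Each step returns a new collection and leaves its input untouched, which mirrors the way a transform produces a new output PCollection rather than modifying its input. In an actual pipeline these roles are filled by the Beam SDK's own classes and I/O transforms, as described in the Beam Programming Guide.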
