Programming Model for Cloud Dataflow SDK 2.x

On the Apache Beam website, you can find the Apache Beam Programming Guide, a complete guide that walks you through the basic concepts of building pipelines with the Apache Beam SDKs. These concepts include:

  • PCollections - the PCollection abstraction represents a potentially distributed, multi-element data set that acts as the pipeline's data. Beam transforms use PCollection objects as inputs and outputs.
  • Transforms - these are the operations in your pipeline. A transform takes a PCollection (or multiple PCollections) as input, performs an operation that you specify on each element in that collection, and produces a new output PCollection.
  • Pipeline I/O - Beam provides read and write transforms for a number of common data storage types, and also lets you create your own.