Dataflow is built on the open source Apache Beam project. You can use the Apache Beam SDK to build pipelines for Dataflow. This document lists some resources for getting started with Apache Beam programming.
- Install the Apache Beam SDK: Shows how to install the Apache Beam SDK so that you can run your pipelines on the Dataflow service.
- Apache Beam programming guide: Provides guidance for using the Apache Beam SDK classes to build and test your pipeline.
- Tour of Apache Beam: A learning guide you can use to familiarize yourself with Apache Beam. Learning units are accompanied by code examples that you can run and modify.
- Apache Beam playground: An interactive environment to try out Apache Beam transforms and examples without having to install Apache Beam in your environment.
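As a quick orientation for the installation step above, a typical setup looks like the following sketch. It assumes a Python environment with pip available; the `[gcp]` extra installs the dependencies needed to run pipelines on Dataflow.

```shell
# Install the Apache Beam Python SDK with the Google Cloud extras
# required for the Dataflow runner.
pip install 'apache-beam[gcp]'

# Verify the installation by printing the SDK version.
python -c "import apache_beam; print(apache_beam.__version__)"
```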
On the Apache Beam website, you can also find information about how to design, create, and test your pipeline:
- Design your pipeline: Shows how to determine your pipeline's structure, choose which transforms to apply to your data, and determine your input and output methods.
- Create your pipeline: Explains the mechanics of using the Apache Beam SDK classes and the steps needed to build a pipeline.
- Test your pipeline: Presents best practices for testing your pipelines.
You can use the following examples from the Apache Beam GitHub repository to start building a streaming pipeline:
- Streaming word extraction (Java)
- Streaming word count (Python)
- streaming_wordcap (Go)