By Frances Perry, Software Engineering Lead at Google and Apache Beam PMC Member
Apache Beam is now a Top Level Project at the Apache Software Foundation, and its future looks brighter than ever — including for Google Cloud Dataflow users
Today, the Apache Software Foundation announced that Apache Beam has successfully graduated from incubation, becoming a Top Level Project following the community-driven development processes of the foundation. Congratulations, Apache Beam!
Apache Beam’s roots come from Google Cloud Dataflow, the fully managed service for executing both batch and streaming data processing pipelines that powers mission-critical processes for companies like Spotify, Citi, Qubit and Google itself. Its novel programming model enables users to write unified, efficient and portable data processing pipelines.
Last January, Google and partners from Cloudera, data Artisans, and Talend proposed the creation of a new project to generalize and extend the Dataflow programming model. The resulting Apache Beam project spent the last year in incubation with the Apache Software Foundation, building a vibrant and welcoming open source community, with a number of new contributors joining the original developers. Today’s announcement is a recognition of a sustainable community, with the potential to grow this technology more than any single company could alone.
As announced previously, the Cloud Dataflow SDKs will be based on Apache Beam going forward. This means that Cloud Dataflow users will continue to use the same intuitive programming model for expressing batch and streaming computations and get the same no-knobs, performant Cloud Dataflow runtime that’s tightly integrated with the rest of Google Cloud Platform (GCP). In addition, users will benefit from the portability provided by Apache Beam, which lets them move the same data processing pipeline onto any supported runtime environment, including but not limited to on-premises Apache Spark clusters, Apache Flink running in the cloud, and Cloud Dataflow on GCP.
On that note, we’re beaming with joy to announce the availability of the first Beam-based Dataflow SDK for Java, version 2.0.0-beta1. Now you can use both the Java and Python SDKs to run Beam-based pipelines with Beta support on the Cloud Dataflow service. Dataflow SDK for Java 1.9.0 continues to be the recommended version for production use. To get a taste of the Beam-based future, along with the usual new features and improvements, try out 2.0.0-beta1 and run your pipeline on the Cloud Dataflow service, or anywhere else you’d like.
We’re excited to be involved in the future of Apache Beam and its ecosystem, while continuing to ensure that Cloud Dataflow is the premier place to run Beam pipelines.