
Announcing a Firestore Connector for Apache Beam and Cloud Dataflow

November 8, 2021
Chris Wilcox

Staff Engineer, Google Cloud

Large-scale data processing workloads can be challenging to operationalize and orchestrate. We’re excited to announce the release of a Firestore in Native Mode connector for Apache Beam, which makes data processing easier than ever for Firestore users. Apache Beam is an open source project that supports large-scale data processing with a unified batch and streaming processing model. Beam is portable, works with many different backend runners, and allows for flexible deployment. The Firestore Beam I/O Connector joins BigQuery, Bigtable, and Datastore as Google databases with Apache Beam connectors. The Firestore I/O Connector is automatically included with the Google Cloud Platform IO module of the Apache Beam Java SDK.

The Firestore connector can be used with a variety of Apache Beam backend runners, including Google Cloud Dataflow. Dataflow, a managed Apache Beam runner, gives developers a structure for solving “embarrassingly parallel” problems; mutating every record of your database is one example of such a problem. Beam pipelines take over much of the work of orchestrating the parallelization, so developers can focus on the transforms they apply to the data.

The Firestore connector is as simple to use as other Beam connectors: you apply its read and write transforms to PCollections within a pipeline.
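As a minimal sketch, assuming the Beam Java SDK with the `beam-sdks-java-io-google-cloud-platform` module on the classpath, the pipeline below deletes every document in one top-level collection. It sends a `RunQueryRequest` through `FirestoreIO.v1().read().runQuery()`, turns each returned document into a delete `Write`, and passes those to `FirestoreIO.v1().write().batchWrite()`. The class name `DeleteCollection`, the helpers `queryAll` and `deleteWrite`, and the collection name `my-collection` are hypothetical, and the query assumes the `(default)` database.

```java
import com.google.firestore.v1.RunQueryRequest;
import com.google.firestore.v1.RunQueryResponse;
import com.google.firestore.v1.StructuredQuery;
import com.google.firestore.v1.Write;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
import org.apache.beam.sdk.extensions.protobuf.ProtoCoder;
import org.apache.beam.sdk.io.gcp.firestore.FirestoreIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

/** Hypothetical example: delete every document in one top-level collection. */
public class DeleteCollection {

  // Hypothetical helper: a query selecting all documents in a top-level collection
  // of the (default) database.
  static RunQueryRequest queryAll(String project, String collectionId) {
    String parent = String.format("projects/%s/databases/(default)/documents", project);
    StructuredQuery query =
        StructuredQuery.newBuilder()
            .addFrom(StructuredQuery.CollectionSelector.newBuilder().setCollectionId(collectionId))
            .build();
    return RunQueryRequest.newBuilder().setParent(parent).setStructuredQuery(query).build();
  }

  // Hypothetical helper: wraps a full document name in a Firestore delete operation.
  static Write deleteWrite(String documentName) {
    return Write.newBuilder().setDelete(documentName).build();
  }

  public static void main(String[] args) {
    // Reads --project and runner flags (e.g. --runner=DataflowRunner) from the command line.
    GcpOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(GcpOptions.class);
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        .apply("Query",
            Create.of(queryAll(options.getProject(), "my-collection"))
                .withCoder(ProtoCoder.of(RunQueryRequest.class)))
        // Executes the query against Firestore and emits RunQueryResponse messages.
        .apply("RunQuery", FirestoreIO.v1().read().runQuery().build())
        .apply("ToDeletes", ParDo.of(new DoFn<RunQueryResponse, Write>() {
          @ProcessElement
          public void processElement(
              @Element RunQueryResponse response, OutputReceiver<Write> out) {
            // Some responses carry only progress metadata and no document, so skip them.
            if (response.hasDocument()) {
              out.output(deleteWrite(response.getDocument().getName()));
            }
          }
        }))
        .setCoder(ProtoCoder.of(Write.class))
        // Groups the delete operations into BatchWrite calls against Firestore.
        .apply("BatchWrite", FirestoreIO.v1().write().batchWrite().build());

    pipeline.run().waitUntilFinish();
  }
}
```

To run it on Dataflow, you would pass the usual pipeline flags, such as `--project=<your-project> --runner=DataflowRunner`, with the Dataflow runner dependency available. A single `RunQueryRequest` is read by one worker; for large collections, the connector's `FirestoreIO.v1().read().partitionQuery()` transform can split a query into multiple `RunQueryRequest`s so the read is parallelized as well.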


There are many possible applications for this connector for Google Cloud users: joining disparate data in a Firestore in Native Mode database, relating data across multiple databases, deleting a large number of documents, writing Firestore data to BigQuery, and more. We’re excited to have contributed this connector to the Apache Beam ecosystem and can’t wait to see how you use it to build the next great thing.
