Partnerships & Integrations
Google Cloud Platform partners and third-party developers have built integrations with Dataflow to quickly and easily enable powerful data processing tasks of any size. These integrations are built on the open APIs that Dataflow provides.
- Resource Management
- Cloud Dataflow fully automates management of required processing resources. No more spinning up instances by hand.
- On Demand
- All resources are provided on demand, enabling you to scale to meet your business needs. No need to buy reserved compute instances.
- Intelligent Work Scheduling
- Automated and optimized work partitioning which can dynamically rebalance lagging work. No more chasing down “hot keys” or pre-processing your input data.
- Auto Scaling
- Horizontal autoscaling of worker resources to meet optimal throughput requirements, resulting in better overall price-to-performance.
- Unified Programming Model
- The Dataflow API enables you to express MapReduce-like operations, powerful data windowing, and fine-grained correctness control, regardless of data source.
- Open Source
- Developers wishing to extend the Dataflow programming model can fork the Apache Beam SDKs or submit pull requests against them. Dataflow pipelines can also run on alternative runners such as Apache Spark and Apache Flink.
- Monitoring
- Integrated into the Google Cloud Platform Console, Cloud Dataflow provides statistics such as pipeline throughput and lag, as well as consolidated worker log inspection, all in near real time.
- Integrated
- Integrates with Cloud Storage, Cloud Pub/Sub, Cloud Datastore, Cloud Bigtable, and BigQuery for seamless data processing. It can also be extended to interact with other sources and sinks, such as Apache Kafka and HDFS.
- Reliable & Consistent Processing
- Cloud Dataflow provides built-in support for fault-tolerant execution that is consistent and correct regardless of data size, cluster size, processing pattern or pipeline complexity.
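To make the "Unified Programming Model" concrete, here is a minimal sketch in plain Python (not the actual Beam/Dataflow API) of the combination it describes: a MapReduce-style grouping applied per fixed event-time window. The 60-second window size, event tuples, and function names are illustrative assumptions.

```python
from collections import defaultdict

# Conceptual sketch only -- plain Python, not the Beam API. It assigns
# timestamped (key, value) events to fixed 60-second windows, then groups
# and sums per (key, window): the MapReduce-like grouping plus windowing
# that the Dataflow/Beam model lets you express declaratively.

WINDOW_SECONDS = 60  # illustrative window size

def window_start(ts):
    """Start of the fixed window containing event-time `ts`."""
    return ts - (ts % WINDOW_SECONDS)

def windowed_sum(events):
    """events: iterable of (timestamp, key, value) tuples."""
    totals = defaultdict(int)
    for ts, key, value in events:
        totals[(key, window_start(ts))] += value
    return dict(totals)

events = [
    (10, "clicks", 1),
    (45, "clicks", 2),
    (70, "clicks", 5),  # falls into the next 60-second window
]
print(windowed_sum(events))
# {('clicks', 0): 3, ('clicks', 60): 5}
```

In the real model, the runner, not your code, decides how this grouping is parallelized and when windows are emitted.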
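The runner portability mentioned under "Open Source" is driven by Beam's `--runner` pipeline option: the same pipeline code targets different execution engines. The script name, project, and bucket below are placeholders, not real resources.

```shell
# Run one Beam pipeline (hypothetical wordcount.py) on different runners.

# Local development with the bundled DirectRunner:
python wordcount.py --runner=DirectRunner

# Managed execution on Cloud Dataflow (project and bucket are placeholders):
python wordcount.py --runner=DataflowRunner \
    --project=my-gcp-project \
    --temp_location=gs://my-bucket/tmp

# The same pipeline code on Apache Spark or Apache Flink:
python wordcount.py --runner=SparkRunner
python wordcount.py --runner=FlinkRunner
```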
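The source-and-sink extensibility described under "Integrated" can be sketched in plain Python (again, not the Beam API): a pipeline pulls records from a source, applies transforms, and pushes results to a sink, and any system that fits the read/write pattern, such as Kafka or HDFS, can plug in. All class and function names here are invented for illustration.

```python
# Conceptual sketch only -- plain Python, not the Beam connector API.

class InMemorySource:
    """Stand-in for a connector such as Pub/Sub, Kafka, or Cloud Storage."""
    def __init__(self, records):
        self.records = records
    def read(self):
        yield from self.records

class InMemorySink:
    """Stand-in for a sink such as BigQuery, Bigtable, or HDFS."""
    def __init__(self):
        self.written = []
    def write(self, record):
        self.written.append(record)

def run_pipeline(source, transforms, sink):
    """Pull from the source, apply each transform in order, push to the sink."""
    for record in source.read():
        for fn in transforms:
            record = fn(record)
        sink.write(record)

source = InMemorySource(["alpha", "beta"])
sink = InMemorySink()
run_pipeline(source, [str.upper], sink)
print(sink.written)
# ['ALPHA', 'BETA']
```

Swapping either end for a different system changes only the source or sink object, not the transforms in between, which is the point of the connector abstraction.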
“Streaming Google Cloud Dataflow perfectly fits requirements of time series analytics platform at Wix.com, in particular, its scalability, low latency data processing and fault-tolerant computing. Wide range of data collection transformations and grouping operations allow to implement complex stream data processing algorithms.”

- Gregory Bondar, Ph.D., Sr. Director of Data Services Platform, Wix.com