Data Transformation with Cloud Dataflow
- Automated Resource Management
- Cloud Dataflow automates provisioning and management of processing resources to minimize latency and maximize utilization; no more spinning up or reserving instances by hand.
- Dynamic Work Rebalancing
- Automated and optimized work partitioning dynamically rebalances lagging work. No need to chase down “hot keys” or pre-process your input data.
- Reliable & Consistent Exactly-once Processing
- Cloud Dataflow provides built-in support for fault-tolerant execution that is consistent and correct regardless of data size, cluster size, processing pattern, or pipeline complexity.
- Horizontal Auto-scaling
- Horizontal auto-scaling of worker resources for optimum throughput results in better overall price-to-performance.
- Unified Programming Model
- The Apache Beam SDK offers equally rich MapReduce-like operations, powerful data windowing, and fine-grained correctness control for streaming and batch data alike.
- Community-driven Innovation
- Developers wishing to extend the Cloud Dataflow programming model can fork and/or contribute to Apache Beam.
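To make the "unified programming model" idea above a little more concrete, here is a minimal plain-Python sketch of fixed-window aggregation. It deliberately does not use the Apache Beam API. The event shape `(key, value, timestamp)`, the 60-second window size, and the function names `window_start` and `sum_per_key_and_window` are all assumptions made for illustration. The point it shows is that one piece of windowing logic can consume a bounded list (batch) or an iterator that yields events as they arrive (streaming) without change.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (key, value, event-time in seconds).
Event = Tuple[str, int, float]


def window_start(timestamp: float, size: float) -> float:
    """Return the start of the fixed window that contains the timestamp."""
    return timestamp - (timestamp % size)


def sum_per_key_and_window(
    events: Iterable[Event], size: float = 60.0
) -> Dict[Tuple[str, float], int]:
    """Sum values per (key, window start), for any iterable of events."""
    totals: Dict[Tuple[str, float], int] = defaultdict(int)
    for key, value, ts in events:
        totals[(key, window_start(ts, size))] += value
    return dict(totals)


# Bounded input: a list that is fully available up front.
batch = [("a", 1, 5.0), ("a", 2, 59.0), ("b", 4, 61.0), ("a", 3, 65.0)]


def stream() -> Iterator[Event]:
    """Stand-in for an unbounded source: yields events one at a time."""
    yield from batch


batch_result = sum_per_key_and_window(batch)
stream_result = sum_per_key_and_window(stream())
```

Both calls group the same events into the same windows. A real Beam pipeline additionally handles late data, triggers, and distributed execution, none of which this sketch attempts.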
“Running our pipelines on Cloud Dataflow lets us focus on programming without having to worry about deploying and maintaining instances running our code (a hallmark of GCP overall).” - Jibran Saithi, Lead Architect, Qubit
|Dataflow Worker Type|vCPU|Local storage - Persistent Disk|Local storage - SSD based|Dataflow Shuffle|