Yesterday, Spotify engineer Igor Maravić published the third and final post in a blog series describing Spotify’s experience prototyping its streaming pipelines on Google Cloud Dataflow. Of note:
- Lessons learned working with the unified batch and stream processing model offered by Cloud Dataflow
- Dataflow’s windowing and watermark concepts for handling late-arriving data
- Performance and scalability of running Dataflow pipelines
- Plans to mature Spotify’s Pub/Sub - Dataflow architecture to production
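To make the windowing and watermark idea above concrete, here is a minimal, library-free sketch of how a watermark can classify events into fixed windows and decide when data is too late to count. This is an illustrative simulation, not the Dataflow or Beam API; the window size, lateness bound, and the max-event-time watermark heuristic are all hypothetical simplifications.

```python
WINDOW_SIZE = 60        # hypothetical: fixed windows of 60 seconds
ALLOWED_LATENESS = 30   # hypothetical: accept data up to 30s after a window closes

def assign_window(event_time):
    """Map an event timestamp to the start of its fixed window."""
    return (event_time // WINDOW_SIZE) * WINDOW_SIZE

def process(events):
    """events: list of (event_time, value) pairs in arrival order.

    Returns (per-window sums, dropped events). The watermark here simply
    tracks the maximum event time seen so far -- a toy stand-in for the
    heuristic watermarks a real streaming engine computes.
    """
    watermark = 0
    windows = {}   # window start -> running sum
    dropped = []
    for event_time, value in events:
        watermark = max(watermark, event_time)
        w = assign_window(event_time)
        # Too late: the window closed more than ALLOWED_LATENESS ago.
        if w + WINDOW_SIZE + ALLOWED_LATENESS < watermark:
            dropped.append((event_time, value))
            continue
        windows[w] = windows.get(w, 0) + value
    return windows, dropped
```

For example, with events arriving as `[(5, 1), (130, 1), (50, 1)]`, the event stamped at 50 arrives after the watermark has advanced to 130, so its window (0–60) is already past the lateness bound and it is dropped, while the other two are aggregated into their windows.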
Two weeks ago, Igor shared details with us about Spotify’s existing event delivery architecture, largely based on Kafka, HDFS, and Crunch-MapReduce.
Posted by Tino Tereshko, BigQuery Technical Program Manager