Google Cloud

Spotify’s experiments with stream processing on Google Cloud Dataflow

March 11, 2016

Tino Tereshko

Product Manager, Google BigQuery

Yesterday, Spotify engineer Igor Maravić published the third and final blog post in a series about Spotify’s experience implementing streaming pipelines with Google Cloud Dataflow and the prototype solution built so far. Of note:

Lessons learned from working with the unified batch and stream processing model offered by Cloud Dataflow
Dataflow’s window and watermark concepts for handling late-arriving data
Performance and scalability of running Dataflow pipelines
Plans to bring Spotify’s Pub/Sub and Dataflow architecture to production

Last week, Igor wrote about operating Kafka at scale and about Spotify’s choice to use Google Cloud Pub/Sub as the messaging queue for the next generation of its event delivery architecture.