Spotify’s experiments with stream processing on Google Cloud Dataflow

March 11, 2016
Tino Tereshko

Product Manager, Google BigQuery

Yesterday, Spotify engineer Igor Maravić published the third and final post in his blog series on Spotify's experience implementing and prototyping streaming pipelines with Google Cloud Dataflow. Highlights include:

  • Lessons learned from working with the unified batch and stream processing model offered by Cloud Dataflow
  • Dataflow's concepts of windows and watermarks for handling late-arriving data
  • Performance and scalability of running Dataflow pipelines
  • Plans to mature Spotify's Cloud Pub/Sub and Dataflow architecture for production

Last week, Igor discussed operating Kafka at scale and Spotify's decision to use Google Cloud Pub/Sub as the messaging queue for the next generation of its event delivery architecture.

Two weeks ago, Igor described Spotify's existing event delivery architecture, which is largely based on Kafka, HDFS, and Crunch-MapReduce.

You can find Igor’s latest blog post and the previous two posts in this series on the Spotify Engineering blog.

Posted in