Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

Spotify’s experiments with stream processing on Google Cloud Dataflow

Friday, March 11, 2016

Yesterday, Spotify engineer Igor Maravić published the third and final blog post in a series about Spotify’s experience implementing streaming pipelines with Google Cloud Dataflow and the state of their prototype so far. Of note:

  • Lessons learned working with the unified batch and stream processing model offered by Cloud Dataflow
  • Dataflow’s window and watermark concepts for handling late-arriving data
  • Performance and scalability of running Dataflow pipelines
  • Plans to mature Spotify’s Pub/Sub - Dataflow architecture to production

Last week, Igor wrote about operating Kafka at scale and Spotify’s choice to use Google Cloud Pub/Sub as a messaging queue for the next generation of their event delivery architecture.

Two weeks ago, Igor shared details about Spotify’s existing event delivery architecture, which is largely based on Kafka, HDFS, and Crunch-MapReduce.

You can find Igor’s latest blog post and the previous two posts in this series on the Spotify Engineering blog.

Posted by Tino Tereshko, BigQuery Technical Program Manager
