Joining fans and artists in perfect harmony with Cloud SQL
Editor’s note: We’re hearing today from Songkick, a U.K.-based concert discovery service owned by Warner Music Group. Annually, Songkick helps over 175 million music fans from all around the world to track their favorite artists, discover concerts and live streams, and buy tickets with confidence online and via their mobile apps and website. Here’s how the team was able to streamline their process and open up new potential by moving their data from physical servers to the cloud.
Since 2007, we have specialized in making it as easy, fun, and fair as possible for fans to see their favorite artists live. We do this by gathering information shared by artists, promoters, and ticketing partners, storing it in a database of event information, and cross-referencing it against user-flagged data in a tracking database. This lets our users know who is playing in their favorite venues, where their favorite artists are performing, and how to get tickets as soon as they’re on sale.
For many years, all of this depended on physical server space. We managed three racks in an offsite location, so whenever we had a hardware issue, someone had to physically go there to make changes, even in the middle of the night. That created unnecessary, time-consuming work for our team and a greater potential for long downtimes. When we were acquired by Warner Music Group, we evaluated what we should focus on and what kind of value we wanted to deliver as an engineering team. It became clear that maintaining physical machines or database servers was not part of it.
Moving to a global venue
Moving to the cloud was the obvious solution, and when we did our research, we found that Google Cloud was the best option for us. With Google Cloud managed services, all of our database infrastructure is managed for us, so we don’t have to deal with issues like hardware failure, especially not at 4 a.m. We also no longer have to deal with one of our biggest infrastructure headaches: software upgrades. Between testing and prep work, upgrading the physical offsite servers used to take over a month. Honestly, we are happy to let Google deal with that so our engineers can focus on creating software.
Migration was thankfully very easy with Google Cloud. Using external replication, we moved one database instance at a time, with about five minutes of downtime for each. We could have done it with almost zero downtime, but that was not necessary for our scenario. Today, all four of our databases run on Cloud SQL for MySQL, with the two largest (musical event information, and artist tour and show tracking information) hosted on dedicated instances. These are quite large: our total data usage is around 1.25 TB, which includes about 400 GB of event data and 100 GB of tracking data. The two larger databases each run on 8 CPUs with 30 GB of RAM, and the other two on 4 CPUs with 15 GB of RAM. We duplicate that data into our staging environment, so total data in Cloud SQL is about 2.5 TB.
Overall, we get to spend less time thinking about and dealing with MySQL, and more time making improvements that directly impact the business.
Keeping data clean and clear with Cloud SQL
One of the great things about Songkick is that we get data directly from artists, promoters, venues, and ticket sellers, meaning that we can get more accurate information as soon as it’s available. The drawback is that data from all of these sources arrives in multiple formats that often weren’t created to work together. We also often get the same information from multiple sources, which can make things confusing for users.
Cloud SQL acts as our source-of-truth datastore, ensuring that all of our teams and the 30 applications that contain our business logic are sharing the same information. We apply dedupe and normalization rules on incoming data before it is stored in Cloud SQL, thus reducing the risk of incorrect, inconsistent, duplicated, or incomplete data.
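To make the idea concrete, here is a minimal sketch in Python of what normalizing and deduplicating incoming records before storage could look like. It is an illustration only, not our production rules: the field names, the `Event` shape, the sample records, and the choice of artist, venue, and date as the duplicate key are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical common shape for an event after normalization."""
    artist: str
    venue: str
    day: date
    source: str


def normalize(raw: dict) -> Event:
    """Map one raw record (assumed field names) onto the common shape."""
    return Event(
        # Collapse repeated whitespace and standardize capitalization.
        artist=" ".join(raw["artist"].split()).title(),
        venue=" ".join(raw["venue"].split()).title(),
        day=date.fromisoformat(raw["date"]),
        source=raw["source"],
    )


def dedupe(events):
    """Keep the first event seen for each (artist, venue, day) key."""
    seen = {}
    for event in events:
        key = (event.artist.casefold(), event.venue.casefold(), event.day)
        seen.setdefault(key, event)
    return list(seen.values())


# Made-up sample feed: the first two records describe the same show.
raw_feed = [
    {"artist": "  example band ", "venue": "sample hall", "date": "2024-06-01", "source": "promoter"},
    {"artist": "Example Band", "venue": "Sample  Hall", "date": "2024-06-01", "source": "ticketing"},
    {"artist": "Example Band", "venue": "Other Room", "date": "2024-06-02", "source": "artist"},
]

clean = dedupe(normalize(record) for record in raw_feed)
```

In this sketch, the promoter and ticketing records collapse into a single event, so only two rows would be written to the database. A real rule set would likely need fuzzier matching than exact keys.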
This is only the beginning of what we’re looking to improve at Songkick on Google Cloud. We’re planning to expand our data processing operations, including creating a service for artists that will show them where their most engaged audiences are, helping them plan better tours. We want to streamline this process by running aggregation queries on BigQuery, then storing the summarized results back in Cloud SQL. That means a better experience for the fans and the artists, and it all starts with a better database in the cloud.
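As a rough illustration of the kind of summary such a service might store, the sketch below counts distinct tracking users per artist and city. It uses plain Python as a stand-in for the aggregation step, which in practice would run on BigQuery. The row fields, the sample data, and the use of tracker counts as a measure of engagement are assumptions for the example.

```python
from collections import Counter


def summarize_tracking(rows):
    """Count distinct tracking users per (artist, city)."""
    # A set removes repeated (artist, city, user) rows before counting.
    unique = {(r["artist"], r["city"], r["user_id"]) for r in rows}
    return Counter((artist, city) for artist, city, _ in unique)


def top_cities(summary, artist, limit=3):
    """Return up to `limit` (city, count) pairs for one artist, largest first."""
    ranked = sorted(
        ((city, n) for (a, city), n in summary.items() if a == artist),
        key=lambda pair: (-pair[1], pair[0]),  # count descending, then city name
    )
    return ranked[:limit]


# Made-up tracking rows; the third row repeats the second.
rows = [
    {"artist": "Example Band", "city": "London", "user_id": 1},
    {"artist": "Example Band", "city": "London", "user_id": 2},
    {"artist": "Example Band", "city": "London", "user_id": 2},
    {"artist": "Example Band", "city": "Leeds", "user_id": 3},
    {"artist": "Other Act", "city": "London", "user_id": 1},
]

summary = summarize_tracking(rows)
print(top_cities(summary, "Example Band"))
```

Only the small summarized result, one row per artist and city, would need to be written back to the transactional database. The heavy scan over raw tracking data stays in the analytics layer.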