
Architecting data pipelines at Universe.com puts customer experience on center stage

September 11, 2019

Ahmed El Hussaini

Data Architect, Universe.com

Editor’s note: Today we’re hearing from Universe, an event-based ticketing and marketing platform and a division of Live Nation. They moved to Google Cloud so they could develop new features faster, gather and act on data insights, and bring customers a great online experience.

At Universe, we serve customers day and night and are always working to make sure they have a great experience, whether online or at one of our live events. Our technology has to make that possible, and our legacy systems weren’t cutting it anymore. What we needed was a consistent, reliable infrastructure that would help our internal teams provide a fast and innovative ticket-buying experience to customers. With our data well-managed, we could free up time for our developers to bring new web features to customers, like tailored add-ons at checkout.

Our team of about 20 software engineers needed more flexibility and agility in our infrastructure; we were using various data processing tools, and it wasn’t easy to share data across teams so that everyone saw the same information. We also needed to incorporate streaming data into the data warehouse to ensure the consistency and integrity of data that’s read in a particular window of time from multiple sources. Our developer teams needed to be able to ship new features faster, and the data back ends were getting in the way.

In addition, when GDPR regulations went into effect, we needed to make sure all our data was anonymized, and we couldn’t do that with our legacy tools.

Finding the right data tools for the job
To make sure our customers were getting a top-notch online experience, we had to make the right technology choices. Our first step was to centralize multiple data sources and create a single data warehouse that could serve as the foundation for all of our reporting requirements, both internal and external. The new technology infrastructure we built had to let us move and analyze data easily, so our teams could focus on using that data and those insights to better serve our customers. Previously, we had lots of siloed systems and applications running in AWS. We did a trial using Redshift, but we needed more flexibility than it offered in how we loaded historical data into our cloud data warehouse. Though we were using MongoDB Atlas for our transactional database, it was important to continue using SQL for querying data.

The trial task that really sold us on BigQuery came when we wanted to alter a small table of about 20 million rows that we used for internal reporting. We needed to add a column, but our PostgreSQL system wouldn’t allow it. Using Apache Beam, we set up a simple pipeline that moved data from the original source into BigQuery to see if we could add the column there. BigQuery ingested the data and let us add the new column in seconds. That was a significant moment that led us to start looking at how we could build end-to-end solutions on Google Cloud. BigQuery gave us multiple options to load our historical data in batches and build powerful pipelines.

We also explored Google Cloud’s migration tools and data pipeline options. Once we saw how Cloud Dataflow runs pipelines written with the Apache Beam SDK, we never looked back. Google Cloud provided us with the data tools we needed to build our data infrastructure.

Cloud for data, and for users
Introducing new technologies isn’t always simple; companies sometimes avoid it altogether because it’s so hard. But our Google Cloud onboarding process has been easy.

It took us less than two months to fully deploy our BigQuery data warehouse using the Cloud Dataflow-Apache Beam combination. Moving to Google Cloud brought us a lot of technology advantages, but it’s also been hugely helpful for our internal users and customers. The data analytics capabilities that we’re now able to offer users have really impressed our internal teams, including developers and DevOps, even those who haven’t used this type of technology before. Some internal clients are already entirely self-service. We’ve hosted frequent demos, as well as some “hack days,” where we share knowledge with our internal teams to show them what’s possible.

We quickly found that BigQuery helped us solve scale and speed problems. One of our main pain points had been adding upsell opportunities for customers during the checkout process. The legacy technology hadn’t allowed us to quickly reflect those changes in the data warehouse. With BigQuery, we’re able to do that, and devote fewer resources to making it happen. We’ve also eliminated the time we were spending tuning memory and availability, since BigQuery handles that. Database administration and tuning required specialized knowledge and experience and took up time. With BigQuery, we don’t have to worry about configuring that hardware and software. It just works.

Two features in particular that we implemented using BigQuery have helped us improve the performance of our core transactional database. First, we use Cloud Dataflow to convert raw MongoDB logs into structured rows in a BigQuery table, which we can then query using SQL to identify slow or underperforming queries. Second, because we load Fastly logs into BigQuery, we can now query multiple logging tables at once using wildcards.
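Both ideas can be illustrated with a minimal Python sketch. It is not the actual Universe pipeline: the sample log line, the table prefix, and the helper names `parse_log_line` and `wildcard_sql` are hypothetical, and the parsing step is shown as a plain function of the kind a Dataflow pipeline might apply to each log line. The sketch assumes the plain-text MongoDB log layout (timestamp, severity, component, context in brackets, then the message, with slow operations ending in a duration such as `1532ms`) and date-suffixed Fastly log tables that BigQuery can match through `_TABLE_SUFFIX`.

```python
import re

# Assumed layout: <timestamp> <severity> <component> [<context>] <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<severity>[A-Z])\s+(?P<component>[A-Z_-]+)\s+"
    r"\[(?P<context>[^\]]+)\]\s+(?P<message>.*)$"
)
DURATION = re.compile(r"(\d+)ms$")        # trailing duration on slow operations
NAMESPACE = re.compile(r"^command\s+(\S+)")  # "<database>.<collection>"

# Hypothetical sample line, invented for illustration only.
SAMPLE_LINE = (
    "2019-09-11T14:03:21.456+0000 I COMMAND  [conn42] "
    'command universe.events command: find { find: "events", '
    'filter: { slug: "summer-fest" } } planSummary: COLLSCAN '
    "docsExamined:250000 nreturned:1 1532ms"
)


def parse_log_line(line):
    """Turn one raw log line into a flat dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    duration = DURATION.search(row["message"])
    namespace = NAMESPACE.match(row["message"])
    row["duration_ms"] = int(duration.group(1)) if duration else None
    row["namespace"] = namespace.group(1) if namespace else None
    return row


def wildcard_sql(table_prefix, start_suffix, end_suffix):
    """Build a BigQuery standard SQL query over a range of suffixed tables."""
    if not (start_suffix.isdigit() and end_suffix.isdigit()):
        raise ValueError("table suffixes are expected to be digits, e.g. 20190901")
    return (
        "SELECT *\n"
        f"FROM `{table_prefix}*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start_suffix}' AND '{end_suffix}'"
    )


if __name__ == "__main__":
    print(parse_log_line(SAMPLE_LINE))
    print(wildcard_sql("my-project.logs.fastly_", "20190901", "20190907"))
```

Once rows like these sit in a BigQuery table, finding slow queries comes down to filtering and sorting on a column such as `duration_ms`, and the wildcard query reads a whole week of log tables in one statement instead of a `UNION` over each table.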

Along with MongoDB Atlas as our main transactional database, much of our infrastructure now runs as microservices on Google Kubernetes Engine (GKE), including the home page and our payment system. Kubernetes cron jobs power scheduled background jobs, and we also use Cloud Pub/Sub. Cloud Storage handles data storage whenever space constraints emerge.

Our overall performance has increased by about 10x with BigQuery. Both our customers and internal clients, like our sales and finance teams, benefit from the new low-latency reporting. Reports that used to be weekly or monthly are now available in near-real time.

It’s not only faster to read records, but faster to move the data, too. We have Cloud Dataflow pipelines that write to multiple places, and the speed of moving and processing data is incredibly helpful. We stream in financial data using Cloud Dataflow in streaming mode, and plan to have different streaming pipelines as we grow. We have several batch pipelines that run every day. We can move terabytes of data without performance issues, and process more than 100,000 rows of data per second from the underlying database. It used to take us a month to move that volume of data into our data warehouse. With BigQuery, it takes two days.

We’re also enjoying how easy and productive these tools are. They make our life as software engineers easier, so we can focus on the problem at hand, not fighting with our tools.

What’s next for Universe
Our team will continue to push even more into Google Cloud’s data platform. We have plans to explore Cloud Datastore next. We’re also moving our databases to PostgreSQL on GCP, using Cloud Dataflow and Beam. BigQuery’s machine learning tools may also come into play as Universe’s cloud journey evolves, so we can start doing predictive analytics based on our data. We’re looking forward to gaining even more speed and agility to meet our business goals and customer needs.

Learn more here about Google Cloud, and more here about Universe.
