Data champions: How the Golden State Warriors are turning on-court data into a competitive advantage
The partnership between Google Cloud and the Golden State Warriors (GSW) began in 2019 and launched with the opening of the Chase Center, the state-of-the-art sports and entertainment venue in San Francisco. Beyond serving as the public cloud provider for the Warriors, we joined forces to help transform the franchise through data-driven decision making.
Today, the Warriors use intelligent technologies with Google’s Data Cloud to enable their next generation of machine learning and data analytics to better serve the needs of coaches, front office, staff, players and fans. Together, we developed a real-time data analysis pipeline that provides faster analytics on high volumes of data to help coaches and basketball operations make quicker, more informed decisions. The analytics team had been spending 70 percent of its time collecting and shaping data, and only 30 percent analyzing it. To get more from its data, the team wanted to spend less time preparing it.
An NBA basketball operations team covers all aspects of a team’s on-court performance. Within that domain, the Warriors’ Strategy team studies player and team metrics for the purposes of team strategy as well as player acquisition. They gather data, create reports, and explore analysis to help coaches and players produce (literal) wins, and any tool that can improve the speed or reliability of delivering these insights offers a significant competitive advantage. Think DevOps, but within basketball.
The GSW Data and Analytics team sees the real value of data in how it’s wielded, and is always exploring opportunities for process improvement, automation, and seamless collaboration. That exploration starts with data integration, which then leads to project deployment. When these elements become faster and simpler to maintain, teams can extract more and increasingly sophisticated analytic value. We wanted to apply that approach to the opportunity with the Warriors.
Integration: the data pipelines
The first step was to build a sustainable data pipeline. There were a few pieces to the puzzle: the size of the core data set, the type of data, where it lived, and how often it would need to update.
One integral data source for every NBA team is Second Spectrum, which provides real-time, 3D spatial data from optical tracking to capture nearly every movement that occurs on the basketball court: up to a million entries during a typical NBA game. While that’s not “big data” per se, at 30 teams playing 82 games per year (plus playoffs), and years of historical data, it still means terabytes of data to ingest on a constantly updating basis. (And since ingestion is a function of data engineering, the Warriors wanted to get it right from the start to prevent downstream problems.)
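To put a rough number on that volume, the sketch below works through the arithmetic for one regular season. It is an upper-bound estimate only: it treats the "up to a million entries" figure as a per-game ceiling, leaves out playoffs, and stops at row counts because the size of a single entry is not stated here.

```python
# Back-of-envelope estimate of tracking entries per regular season.
TEAMS = 30
GAMES_PER_TEAM = 82
MAX_ENTRIES_PER_GAME = 1_000_000  # "up to a million entries" per game

# Each game involves two teams, so divide team-games by two.
games_per_season = TEAMS * GAMES_PER_TEAM // 2
max_entries_per_season = games_per_season * MAX_ENTRIES_PER_GAME

print(games_per_season)        # 1230
print(max_entries_per_season)  # 1230000000
```

Multiply that ceiling by several seasons of history and the need for a warehouse that ingests and queries billions of rows becomes clear.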
Second Spectrum serves raw data to its storage buckets on Amazon S3, which means the Warriors needed to access a ton of raw data outside of the eventual Google Cloud ecosystem. The first tool they tapped for the pipeline was Google Cloud’s Storage Transfer Service. They configured a one-time copy of each S3 bucket in the Google Cloud UI for the first copy job, and within seconds, they had all of the raw data in Google Cloud Storage. The Warriors then scheduled daily pulls of any new or changed files so that Cloud Storage would stay current, and did it all without leaving the UI.
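The idea behind a daily pull of "new or changed files" can be sketched in a few lines. This is an illustration of the selection logic only, not how the transfer service is implemented or configured; the function name, the name-to-fingerprint mapping, and the object names are all hypothetical.

```python
from typing import Dict, List

def files_to_copy(source: Dict[str, str], destination: Dict[str, str]) -> List[str]:
    """Return object names that are new or changed in the source.

    Both arguments map object name -> content fingerprint (for example an
    ETag or checksum). The mapping shape is an assumption made for this sketch.
    """
    return sorted(
        name
        for name, fingerprint in source.items()
        if destination.get(name) != fingerprint
    )

# Hypothetical listings; the object names are invented for illustration.
s3_listing = {"game_001.jsonl": "a1", "game_002.jsonl": "b2", "game_003.jsonl": "c3"}
gcs_listing = {"game_001.jsonl": "a1", "game_002.jsonl": "old"}

print(files_to_copy(s3_listing, gcs_listing))
# ['game_002.jsonl', 'game_003.jsonl']
```

Only the changed file and the missing file are selected, which is why a scheduled incremental pull stays cheap even when the full archive is large.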
With raw file storage in Cloud Storage taken care of, the team could pivot to connecting its pipeline to BigQuery. This serverless and cost-effective multi-cloud data warehouse is designed for business agility, scales up to petabytes of data with zero operational overhead, and integrates seamlessly with Google Cloud products. The connection was built with the powerful combination of Apache Beam, a parallel processing tool, and Cloud Dataflow, Google Cloud’s fully managed service for stream and batch data processing. Had they not parallelized the data ingest, the initial data warehouse setup would have taken multiple days of runtime. Instead, the whole initial ingest took about half an hour of wall-clock runtime, while also providing an avenue for quick iteration in case table schemas changed or other file edits emerged in the future.
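The reason the ingest parallelizes so well is that each raw file can be parsed independently of the others. The sketch below shows that structure using Python's standard-library thread pool as a stand-in; it is not Apache Beam or Dataflow code, and the file names and fields are invented. It illustrates the shape of the work (independent per-file parsing, then a merge) rather than the speedup a distributed runner would provide.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def parse_file(lines: List[str]) -> List[dict]:
    """Parse one newline-delimited JSON file into row dictionaries."""
    return [json.loads(line) for line in lines if line.strip()]

def ingest(files: Dict[str, List[str]], workers: int = 4) -> List[dict]:
    """Parse every file concurrently and flatten the results into one row list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the output is deterministic.
        parsed = pool.map(parse_file, files.values())
        return [row for rows in parsed for row in rows]

# Hypothetical raw files; the field names are invented for illustration.
raw_files = {
    "game_001.jsonl": ['{"game": 1, "x": 10.5, "y": 3.0}', '{"game": 1, "x": 11.0, "y": 3.2}'],
    "game_002.jsonl": ['{"game": 2, "x": 40.1, "y": 22.7}'],
}

rows = ingest(raw_files)
print(len(rows))  # 3
```

Because no file depends on another, adding workers shortens wall-clock time without changing the result, which is the property a managed runner exploits at much larger scale.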
While Second Spectrum is one very important data source for NBA teams, there are a host of others that allow the strategy team to answer the various questions asked of them by the rest of the basketball ops organization. After the initial, singular pipeline outlined above, the strategy team started thinking about how to integrate and manage more data sources with similar properties. This would require a more robust, holistic integration to avoid having a series of fundamentally disjointed pipelines. The solution was Google Cloud Composer.
Cloud Composer is Google Cloud’s fully managed workflow orchestration tool, built on Apache Airflow, an open source framework for authoring, scheduling and monitoring workflows. The fully managed nature of Composer means that it integrates seamlessly with other Google Cloud services. For example, when you create a Composer environment in the Google Cloud UI, a Kubernetes pod is spun up where the Composer environment exists and the Airflow code runs.
The strategy team used Airflow and Composer to build out fully integrated, continuously updating data pipelines that bring more than a dozen different data sources into the BigQuery data warehouse, while also building out long-term storage within Cloud Storage and logging exports via Cloud Pub/Sub.
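At its core, workflow orchestration means declaring which tasks depend on which, then running them in a valid order. The sketch below shows that idea with Python's standard-library `graphlib` rather than Airflow itself; the task names are hypothetical and chosen only to mirror the stages described in this article.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key runs only after the tasks in its set finish.
dependencies = {
    "copy_to_cloud_storage": set(),
    "load_to_bigquery": {"copy_to_cloud_storage"},
    "run_transforms": {"load_to_bigquery"},
    "build_scouting_report": {"run_transforms"},
}

run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
# ['copy_to_cloud_storage', 'load_to_bigquery', 'run_transforms', 'build_scouting_report']
```

An orchestrator adds scheduling, retries and monitoring on top of this ordering, which is what keeps a dozen pipelines from becoming a dozen disconnected scripts.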
With those pipelines in place, the fun (and impact) could truly begin.
Extracting results: making this data actionable
Data warehousing is integral for any large-scale analysis project, but the fun part is leveraging that data and delivering analysis. It turns out that professional basketball teams aren’t too dissimilar from a typical enterprise: instead of customer purchase data, clickthrough data, or stock prices, they worry about shots, pick-and-rolls and scouting reports. And as in many businesses, certain types of analysis can be anticipated and repeated.
The strategy team uses dbt to drive a collection of data transforms within BigQuery to calculate thousands of metrics in new tables and views, which can then be queried just like any other table in BigQuery. For example, one data model and its target transform may take a collection of shots and shot locations and turn that into a player’s effective field goal percentage from a particular zone on the court, which will in turn feed into materials like a scouting report. These transforms and modeling operations are then orchestrated with Cloud Composer.
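To make the shot-zone example concrete, here is a minimal sketch of that kind of transform in Python rather than dbt. It uses the standard definition of effective field goal percentage, (FGM + 0.5 × 3PM) / FGA, which gives made three-pointers extra weight. The record shape, zone names and sample shots are assumptions for illustration and do not reflect the team's actual models.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each shot is (zone, made, is_three_pointer); this record shape is an assumption.
Shot = Tuple[str, bool, bool]

def efg_by_zone(shots: Iterable[Shot]) -> Dict[str, float]:
    """Effective field goal percentage per zone: (FGM + 0.5 * 3PM) / FGA."""
    attempts = defaultdict(int)
    makes = defaultdict(int)
    threes_made = defaultdict(int)
    for zone, made, is_three in shots:
        attempts[zone] += 1
        if made:
            makes[zone] += 1
            if is_three:
                threes_made[zone] += 1
    return {
        zone: (makes[zone] + 0.5 * threes_made[zone]) / attempts[zone]
        for zone in attempts
    }

# Invented sample data for illustration only.
sample_shots = [
    ("left_corner_three", True, True),
    ("left_corner_three", False, True),
    ("left_corner_three", True, True),
    ("left_corner_three", False, True),
    ("paint", True, False),
    ("paint", False, False),
]

print(efg_by_zone(sample_shots))
# {'left_corner_three': 0.75, 'paint': 0.5}
```

Both zones show a 50 percent make rate, but the corner three comes out at 0.75 because each make is worth more, which is exactly the distinction a scouting report needs.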
Shorter time from creation to delivery means faster extraction of value and more room to flex analytical muscle, especially when that process becomes automated. In many industries, latency and timing are key, and basketball is no different. Consider the NBA offseason. After a grueling season, players will get some well-earned rest, but front offices are busy trying to improve the roster for the upcoming season.
Teams obviously do their homework ahead of time and have a sense of who they want to take with a given pick, but when draft day rolls around, things generally get a bit crazy. Players and picks get traded, and once the team in front of you has made its pick, you have 300 seconds to make yours. Whether the initial question is “What did Scout X note about player Y’s work rate in his mid-January scouting trip?” or “What was player Z’s effective field goal percentage against top 25 teams?”, having the answers in a centralized data warehouse streamlines the process of answering those questions.
When integration time shrinks from multiple days to less than an hour, and deployment dwindles from hours to minutes, analysts are freed up to finally explore wherever the data may lead, while also building up accessible knowledge. This accessible knowledge promotes a more effective environment for analytical ideation and hypothesis testing, enabling coaches, players and analysts to build new paths of intelligence. When combined with wisdom, experience and leadership, this intelligence helps inform more efficient and objective decision making that spans what happens on and off the court.
Regardless of industry, data cloud services from Google Cloud enable you to reduce time across ingestion, transformation, modeling and insight extraction, providing ever-increasing value to your organization. In short, they enable you to build champions. Go Dubs!