Google Cloud Platform

Google Cloud and NCAA® team up for a unique March Madness® competition hosted on Kaggle

Today, we’re excited to launch the Google Cloud and NCAA® Machine Learning Competition, hosted on Kaggle and open to data scientists and data analysts of all skill levels. This year’s Kaggle competition includes both the Division I Men’s and Women’s Basketball Championships and a $100,000 prize pool.


The data:

March Madness®, the culmination of thousands of regular-season games, produces some of the most competitive and memorable games and moments of the year. This year, we are thrilled to release every NCAA Division I Men’s and Women’s Basketball play-by-play moment since 2009. Whether you write SQL, Pandas, R, or Apps Script (or use pen and paper), these 40,000,000+ plays let you build richer, more interactive features, which in turn support smarter models. Here is one of my Wile E. Coyote feature query ideas, which combines pressure, performance, competition, and experience:

How many times has a team with three freshman starters shot better than 50% from three-point range and had a 2:1 assist-to-turnover ratio while trailing by 8 points with 6 minutes to go, against an opponent ranked 5th in three-point shots allowed?

Having this data in a SQL-compliant serverless data warehouse like BigQuery lets you rapidly explore feature ideas and join the play-by-play data with other datasets, such as travel, weather, and more.

The competition:

As a result of the continued collaboration between Google Cloud and the NCAA, the fifth annual Kaggle-backed March Madness competition is open for registration from today through Thursday, March 15, at 3 PM UTC, when final brackets are due.

Whether you’re a data scientist by profession or by degree, or simply an enthusiast, the competition uses the NCAA March Madness tournament as a common backdrop for participants to strengthen their knowledge of basketball, statistics, data modeling, and cloud technology.

Since this is Kaggle we’re talking about, submissions will be scored by log loss, a metric commonly used to evaluate the probabilities that machine learning models predict. Log loss rewards well-calibrated probabilities and punishes confident mistakes: the more confidence a model places in a wrong answer, the larger its penalty. So the smaller the log loss, the better. Check out the full scoring methodology over at Kaggle.
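For reference, here is the standard binary log loss, written under the assumption (as in Kaggle’s NCAA competitions) that n is the number of scored games, ŷ_i is the submitted probability that the first team in game i wins, y_i is 1 if it does and 0 if it does not, and ln is the natural logarithm. Kaggle’s evaluation page remains the authority on the exact details. The per-game probabilities in the worked lines are hypothetical, chosen only to show how the penalty grows with misplaced confidence.

```latex
\begin{aligned}
\text{LogLoss} &= -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \ln \hat{y}_i + (1 - y_i)\ln(1 - \hat{y}_i) \,\right] \\[6pt]
y_i = 0,\ \hat{y}_i = 0.6 &:\quad -\ln(0.4) \approx 0.916 \\
y_i = 0,\ \hat{y}_i = 0.9 &:\quad -\ln(0.1) \approx 2.303 \\
y_i = 1,\ \hat{y}_i = 0.9 &:\quad -\ln(0.9) \approx 0.105
\end{aligned}
```

Being 90% sure of the team that loses costs roughly 2.5 times as much as being 60% sure of it, while being 90% sure of the team that wins costs very little.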

The learning:

This Wednesday at 1 PM EST, we’ll be hosting a Reddit AMA with Kaggle CEO Anthony Goldbloom, who will answer questions about applying machine learning to topical events like the Google Cloud & NCAA Machine Learning Competition. Kaggle’s discussion forums for past and present competitions are another great resource where competitors can get started or learn from others.

Throughout the competition, we’ll be providing starter code in Kaggle Kernels, and new GCP users can get $300 in GCP credits, so even novice data scientists have a starting point for building machine learning models that forecast which teams will advance through the tournament.

We’re also hitting the road and heading back to college to co-host twenty Google Cloud Advanced Bracketology campus events, where we’ll conduct hands-on GCP training, introduce college students to the Kaggle platform, and help them build skills in data science and machine learning.

The spoils:

There will be a total of $100,000 in prize money: in each of the men’s and women’s competitions, $25K will be awarded to first place, $15K to second, and $10K to third ($50K per competition), for the most innovative applications of machine learning as determined by the competition rules.

Life is about wins and losses; your challenge in this competition is to maximize your wins and minimize your losses. Don’t be too hot with your probabilities (overconfident), yet don’t be too cold (hedging every game toward 50%) either. This “Goldilocks” challenge is a tough one to crack given the Madness of the tournament.
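One way to see why the middle ground wins under log loss: assume, purely for illustration, a single game in which a team’s true chance of winning is p (with 0 < p < 1), and you submit probability q. The expected log loss for that game is minimized exactly when q = p, so both overconfidence and timidity cost you. The value p = 0.7 in the last line is a hypothetical example, not tournament data.

```latex
\begin{aligned}
\mathbb{E}[\ell(q)] &= -\left[\, p \ln q + (1 - p)\ln(1 - q) \,\right], \qquad 0 < q < 1 \\
\frac{d}{dq}\,\mathbb{E}[\ell(q)] &= -\frac{p}{q} + \frac{1 - p}{1 - q} = 0 \;\Longrightarrow\; q = p \\
\frac{d^2}{dq^2}\,\mathbb{E}[\ell(q)] &= \frac{p}{q^2} + \frac{1 - p}{(1 - q)^2} > 0 \quad \text{(so } q = p \text{ is a minimum)} \\[6pt]
p = 0.7:\quad \mathbb{E}[\ell(0.95)] &\approx 0.935 \ \text{(too hot)}, \quad
\mathbb{E}[\ell(0.5)] \approx 0.693 \ \text{(too cold)}, \quad
\mathbb{E}[\ell(0.7)] \approx 0.611 \ \text{(just right)}
\end{aligned}
```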

We look forward to seeing the Madness play out on the modeling court, and of course we’ll be tracking progress all the way to San Antonio for the greatest query result of all:

Language: SQL

-- Tongue-in-cheek: rank teams by their "One Shining Moment" count for the 2017-18 season
SELECT team, COUNT(moment) AS cut_down_the_net
FROM `bigquery-public-data.ncaa_basketball.mbb_pbp_ncaa`
WHERE moment = 'ONE SHINING' AND season = '2017_2018'
GROUP BY team
ORDER BY cut_down_the_net DESC

Be sure to sign up, and let the modeling games begin!