NCAA Basketball Data

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI, the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

Currently, BigQuery public datasets are stored in the US multi-region location. When you query a public dataset, supply the --location=US flag on the command line, choose US as the processing location in the BigQuery web UI, or specify the location property in the jobReference section of the job resource when you use the API. Because the public datasets are stored in the US, you cannot write public data query results to a table in another region, and you cannot join tables in public datasets with tables in another region.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

    Go to the Manage resources page

  3. Make sure that billing is enabled for your project.

    Learn how to enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

This public data includes statistics for NCAA men’s and women's basketball back to 1894. This dataset contains data about NCAA Basketball games, teams, and players. Game data covers play-by-play and box scores back to 2009, as well as final scores back to 1996. Additional data about wins and losses goes back to the 1894-5 season in some teams' cases.

You can start exploring this data in the BigQuery console:

Go to the NCAA Basketball public dataset

Sample Query

Here is an example of a SQL query you can run on this data in BigQuery.

This sample uses BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

How much does three-point shot accuracy change at the end of a game?

What if you wanted to see how much three-point shooting percentage changes in the last five minutes of Division I games? You'd start with the play-by-play data and group three point shots (both made and attempted) by when they occurred according to the game clock. You'd then divide three-pointers made by three- pointers attempted, giving you the three-point shooting percentages for the first 35 minutes and the last five minutes of the game.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  #first 35 minutes of regulation
  COUNTIF(event_type = "threepointmade"
    AND elapsed_time_sec < 2100) AS threes_made_first35,
  COUNTIF((event_type = "threepointmade"
      OR event_type = "threepointmiss")
    AND elapsed_time_sec < 2100) AS threes_att_first35,
  COUNTIF(event_type = "threepointmade"
    AND elapsed_time_sec < 2100) / COUNTIF((event_type = "threepointmade"
      OR event_type = "threepointmiss")
    AND elapsed_time_sec < 2100) AS three_pt_pct_first35,
  #last five minutes of regulation
  COUNTIF(event_type = "threepointmade"
    AND elapsed_time_sec >= 2100) AS threes_made_last5,
  COUNTIF((event_type = "threepointmade"
      OR event_type = "threepointmiss")
    AND elapsed_time_sec >= 2100) AS threes_att_last5,
  COUNTIF(event_type = "threepointmade"
    AND elapsed_time_sec >= 2100) / COUNTIF((event_type = "threepointmade"
      OR event_type = "threepointmiss")
    AND elapsed_time_sec >= 2100) AS three_pt_pct_last5
FROM
  bigquery-public-data.ncaa_basketball.mbb_pbp_sr
WHERE
  home_division_alias = "D1"
  AND away_division_alias = "D1"

Command-line

bq query --use_legacy_sql=false '
SELECT
COUNTIF(event_type = "threepointmade"
  AND elapsed_time_sec < 2100) AS threes_made_first35,
COUNTIF((event_type = "threepointmade"
    OR event_type = "threepointmiss")
  AND elapsed_time_sec < 2100) AS threes_att_first35,
COUNTIF(event_type = "threepointmade"
  AND elapsed_time_sec < 2100) / COUNTIF((event_type = "threepointmade"
    OR event_type = "threepointmiss")
  AND elapsed_time_sec < 2100) AS three_pt_pct_first35,
COUNTIF(event_type = "threepointmade"
  AND elapsed_time_sec >= 2100) AS threes_made_last5,
COUNTIF((event_type = "threepointmade"
    OR event_type = "threepointmiss")
  AND elapsed_time_sec >= 2100) AS threes_att_last5,
COUNTIF(event_type = "threepointmade"
  AND elapsed_time_sec >= 2100) / COUNTIF((event_type = "threepointmade"
    OR event_type = "threepointmiss")
  AND elapsed_time_sec >= 2100) AS three_pt_pct_last5

FROM bigquery-public-data.ncaa_basketball.mbb_pbp_sr WHERE home_division_alias = "D1" AND away_division_alias = "D1"'

Sample results are shown here:

+---------------------+--------------------+----------------------+-------------------+------------------+---------------------+
| threes_made_first35 | threes_att_first35 | three_pt_pct_first35 | threes_made_last5 | threes_att_last5 | three_pt_pct_last5  |
+---------------------+--------------------+----------------------+-------------------+------------------+---------------------+
|              132539 |             377798 |   0.3508197502368991 |             19775 |            63308 | 0.31236178681999116 |
+---------------------+--------------------+----------------------+-------------------+------------------+---------------------+

About the dataset

Dataset Source: Sportradar LLC, NCAA

Category: Sports, Analytics, Encyclopedic

Use: Copyright Sportradar LLC. Access to data is intended solely for internal research and testing purposes, and is not to be used for any business or commercial purpose. Data are not to be exploited in any manner without express approval from Sportradar. Display of data must include the phrase, “Data provided by Sportradar LLC,” and be hyperlinked to www.sportradar.com.

Copyright National Collegiate Athletic Association. Access to data is provided solely for internal research and testing purposes, and may not be used for any business or commercial purpose. Data are not to be exploited in any manner without express approval from the National Collegiate Athletic Association.

View in BigQuery: Go to the NCAA Basketball public dataset

Was this page helpful? Let us know how we did:

Send feedback about...