Major League Baseball Data

This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016.

This dataset contains the following tables:

Table Name Description
games_wide Every pitch, steal, or lineup event for each at bat in the 2016 regular season.*
games_post_wide Every pitch, steal, or lineup event for each at-bat in the 2016 post season.*
schedules The schedule for every team in the regular season.

*The schemas for the games_wide and games_post_wide tables are identical.

With this data you can effectively replay a game and rebuild basic statistics for players and teams.

You can start exploring this data in the BigQuery console:

Go to the Baseball public dataset

Sample Queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

What types of pitches were thrown, how many, top speeds?

#standardSQL
SELECT
  stats.pitchTypeDescription as ptDescription,
  COUNT(stats.pitchType) as thrown,
  MAX(stats.pitchSpeed) as speed_max,
  SUM(is_hit) as hits,
  SUM(out_on_strikes) as strike_outs,
  SUM(is_ground_out) as ground_outs,
  ROUND(SUM(is_ground_out)/ COUNT(stats.pitchType),4) as gdPerPitch,
  SUM(is_walk) as walks
FROM (
  SELECT
    inningNumber,
    inningHalf,
    inningHalfEventSequenceNumber as isn,
    inningEventType,
    atBatEventSequenceNumber as absn,
    description,
    atBatEventType,
    pitchType,
    pitchTypeDescription,
    pitchSpeed,
    pitchZone,
    balls,
    strikes,
    outs,
    status,
    outcomeId,
    outcomeDescription,
    is_hit,
    is_double_play,
    (if (strikes = 3, 1, 0)) as out_on_strikes,
    (if (balls = 4, 1, 0)) as is_walk,
    (if (pitchZone = 1 OR pitchZone = 2 or pitchZone = 3 or pitchZone = 10, 1, 0)) as high_zone,
    (if (pitchZone = 7 OR pitchZone = 8 or pitchZone = 9 or pitchZone = 12, 1, 0)) as low_zone,
    (if (outcomeId  = "oGO", 1, 0)) as is_ground_out
  FROM
    `bigquery-public-data.baseball.games_wide`
  WHERE
    atBatEventType = "PITCH"
  ORDER BY
    inningNumber ASC,
    inningHalf DESC,
    inningHalfEventSequenceNumber ASC,
    atBatEventSequenceNumber  ASC) as stats
GROUP BY
  ptDescription
ORDER BY
  thrown DESC

Only the top ten results are shown here, along with the first several columns:

pitch type query results

What was the at bat and pitching sequence for the fastest pitch(es) in the 2016 season?

#standardSQL
SELECT
  pitcherLastName,
  startTime,
  inningNumber,
  inningHalf as half,
  inningHalfEventSequenceNumber as seq,
  hitterPitchCount,
  pitcherPitchCount,
  pitchSpeed,
  pitchTypeDescription,
  pitchZone,
  hitterLastName,
  outcomeDescription
FROM
  `bigquery-public-data.baseball.games_wide`
WHERE
  atBatEventType = "PITCH"
  AND year = 2016
ORDER BY
  pitchSpeed DESC
LIMIT 10

The results are shown below:

fastest pitches query results

About the dataset

Dataset Source: Sportradar LLC

Category: Sports

Use: Copyright Sportradar LLC. Access to data is intended solely for internal research and testing purposes, and is not to be used for any business or commercial purpose. Data are not to be exploited in any manner without express approval from Sportradar. Display of data must include the phrase, “Data provided by Sportradar LLC,” and be hyperlinked to www.sportradar.com.

Update Frequency: Monthly (as needed)

View in BigQuery: Go to baseball data

Send feedback about...

BigQuery Documentation