This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016.
This dataset contains the following tables:
|games_wide||Every pitch, steal, or lineup event for each at bat in the 2016 regular season.*|
|games_post_wide||Every pitch, steal, or lineup event for each at-bat in the 2016 post season.*|
|schedules||The schedule for every team in the regular season.|
*The schemas for the games_wide and games_post_wide tables are identical.
With this data you can effectively replay a game and rebuild basic statistics for players and teams.
You can start exploring this data in the BigQuery console:
Here are some examples of SQL queries you can run on this data in BigQuery.
These samples use BigQuery’s support for standard SQL. Use the
tag to let BigQuery know you want to use standard SQL. For more information
#standardSQL prefix, see
Setting a query prefix.
What types of pitches were thrown, how many, top speeds?
#standardSQL SELECT stats.pitchTypeDescription as ptDescription, COUNT(stats.pitchType) as thrown, MAX(stats.pitchSpeed) as speed_max, SUM(is_hit) as hits, SUM(out_on_strikes) as strike_outs, SUM(is_ground_out) as ground_outs, ROUND(SUM(is_ground_out)/ COUNT(stats.pitchType),4) as gdPerPitch, SUM(is_walk) as walks FROM ( SELECT inningNumber, inningHalf, inningHalfEventSequenceNumber as isn, inningEventType, atBatEventSequenceNumber as absn, description, atBatEventType, pitchType, pitchTypeDescription, pitchSpeed, pitchZone, balls, strikes, outs, status, outcomeId, outcomeDescription, is_hit, is_double_play, (if (strikes = 3, 1, 0)) as out_on_strikes, (if (balls = 4, 1, 0)) as is_walk, (if (pitchZone = 1 OR pitchZone = 2 or pitchZone = 3 or pitchZone = 10, 1, 0)) as high_zone, (if (pitchZone = 7 OR pitchZone = 8 or pitchZone = 9 or pitchZone = 12, 1, 0)) as low_zone, (if (outcomeId = "oGO", 1, 0)) as is_ground_out FROM `bigquery-public-data.baseball.games_wide` WHERE atBatEventType = "PITCH" ORDER BY inningNumber ASC, inningHalf DESC, inningHalfEventSequenceNumber ASC, atBatEventSequenceNumber ASC) as stats GROUP BY ptDescription ORDER BY thrown DESC
Only the top ten results are shown here, along with the first several columns:
What was the at bat and pitching sequence for the fastest pitch(es) in the 2016 season?
#standardSQL SELECT pitcherLastName, startTime, inningNumber, inningHalf as half, inningHalfEventSequenceNumber as seq, hitterPitchCount, pitcherPitchCount, pitchSpeed, pitchTypeDescription, pitchZone, hitterLastName, outcomeDescription FROM `bigquery-public-data.baseball.games_wide` WHERE atBatEventType = "PITCH" AND year = 2016 ORDER BY pitchSpeed DESC LIMIT 10
The results are shown below:
About the dataset
Dataset Source: Sportradar LLC
Use: Copyright Sportradar LLC. Access to data is intended solely for internal research and testing purposes, and is not to be used for any business or commercial purpose. Data are not to be exploited in any manner without express approval from Sportradar. Display of data must include the phrase, “Data provided by Sportradar LLC,” and be hyperlinked to www.sportradar.com.
Update Frequency: Monthly (as needed)
View in BigQuery: Go to baseball data