Public Datasets
This page lists a special group of public datasets that Google BigQuery hosts for you to access and integrate into your applications.
USA Names Data
A Social Security Administration dataset that contains all names from Social Security card applications for births that occurred in the United States after 1879.
NYC TLC Trips
Data collected by the NYC Taxi and Limousine Commission (TLC) that includes records of all trips completed in yellow and green taxis in NYC from 2009 to 2015.
Hacker News Data
A dataset that contains all stories and comments from Hacker News since its launch in 2006.
USA Disease Data
A dataset published by the US Department of Health and Human Services that includes all weekly surveillance reports of nationally notifiable diseases for all US cities and states published between 1888 and 2013.
GDELT Books Corpus
A dataset that contains 3.5 million digitized books stretching back two centuries, encompassing the complete English-language public domain collections of the Internet Archive (1.3 million volumes) and HathiTrust (2.2 million volumes).
NOAA GSOD Weather
This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. It covers Global Surface Summary of the Day (GSOD) data from 1929 to 2016, collected from more than 9,000 stations.
GitHub Data
This public dataset contains GitHub activity data for more than 2.8 million open source GitHub repositories, more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files.
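As a rough illustration of how one of the datasets listed above might be queried from an application, the sketch below builds a SQL statement that totals name counts in the USA Names data. The helper names `build_top_names_query` and `run_query` are hypothetical, and the table ID and the `name` and `number` columns used in the usage note are assumptions that should be checked against the dataset's schema in the BigQuery console. Running the query additionally assumes that the `google-cloud-bigquery` client library is installed and that credentials are configured.

```python
import re

# Accepts fully qualified table IDs of the form project.dataset.table.
_TABLE_ID = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_top_names_query(table, limit=10):
    """Return SQL that sums the `number` column per `name`, largest first.

    The column names `name` and `number` are assumptions about the
    USA Names schema, not something stated on this page.
    """
    if not _TABLE_ID.match(table):
        raise ValueError("table must look like project.dataset.table")
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT name, SUM(number) AS total\n"
        f"FROM `{table}`\n"
        "GROUP BY name\n"
        "ORDER BY total DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql):
    """Execute the SQL with the BigQuery client library and return the rows.

    The import is deferred so that building a query does not require the
    library to be installed.
    """
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default project and credentials
    return list(client.query(sql).result())
```

A call such as `run_query(build_top_names_query("bigquery-public-data.usa_names.usa_1910_current", 5))` would then return the five most common names, provided that table ID matches what BigQuery currently hosts. The table ID is validated before it is interpolated because identifiers cannot be passed as query parameters.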