A public dataset is any dataset that is stored in BigQuery and made available to the general public. This page lists a special group of public datasets that Google BigQuery hosts for you to access and integrate into your applications. Google pays for the storage of these datasets and provides public access to the data via BigQuery. You pay only for the queries that you run on the data; the first 1 TB processed per month is free, subject to query pricing details.
1000 Cannabis Genomes Project
Genomic open dataset of approximately 850 strains of Cannabis via the Open Cannabis Project.
Bay Area Bike Share Trips
This data includes all Bay Area Bike Share trips from August 2013 to the present, and is updated daily.
Chicago Crime Data
This dataset reflects reported incidents of crime that occurred in the City of Chicago from 2001 to the present.
Chicago Taxi Trips
This dataset includes taxi trips from 2013 to the present, reported to the City of Chicago in its role as a regulatory agency.
EPA Historical Air Quality Data
This dataset is provided by the EPA and includes 16 measurements of air quality in the United States from 1990 to the present.
GDELT Book Corpus
A dataset that contains 3.5 million digitized books stretching back two centuries, encompassing the complete English-language public domain collections of the Internet Archive (1.3M volumes) and HathiTrust (2.2 million volumes).
GitHub Data
This public dataset contains GitHub activity data for more than 2.8 million open source GitHub repositories, more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files.
Hacker News Data
A dataset that contains all stories and comments from Hacker News since its launch in 2006.
Healthcare Common Procedure Coding System (HCPCS) Level II
This dataset includes classification of procedures performed for patients using the Healthcare Common Procedure Coding System (HCPCS) which is maintained by Centers for Medicare and Medicaid Services (CMS).
IRS Form 990 Data
A dataset that contains financial information about nonprofit/exempt organizations in the United States, gathered by the Internal Revenue Service (IRS) using Form 990.
Major League Baseball Data
This public dataset contains pitch-by-pitch activity data for Major League Baseball (MLB) in 2016.
Medicare Data
This public dataset summarizes the utilization and payments for procedures, services, and prescription drugs provided to Medicare beneficiaries by specific inpatient and outpatient hospitals, physicians, and other suppliers.
NHTSA Traffic Fatality Data
This public dataset was created by the United States Department of Transportation's National Highway Traffic Safety Administration and includes numerous aspects of traffic accidents that resulted in fatalities.
NOAA Global Historical Climatology Network Weather Data
This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes climate summaries from land surface stations across the globe that have been subjected to a common suite of quality assurance reviews. This dataset draws from more than 20 sources, including some data from every year since 1763.
NOAA GSOD Weather Data
This public dataset was created by the National Oceanic and Atmospheric Administration (NOAA) and includes global data obtained from the USAF Climatology Center. This dataset covers GSOD data between 1929 and 2016, collected from over 9,000 stations.
NOAA International Comprehensive Ocean-Atmosphere Data Set
The ICOADS dataset contains global marine data from ships (merchant, navy, research) and buoys, each capturing details according to the current weather or ocean conditions.
NYC 311 Service Requests
This public data includes all 311 service requests from 2010 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.
NYC Citi Bike Trips
Data collected by the NYC Citi Bike bicycle sharing program that includes trip records for 10,000 bikes and 600 stations across Manhattan, Brooklyn, Queens, and Jersey City since Citi Bike launched in September 2013.
NYC TLC Trips
Data collected by the NYC Taxi and Limousine Commission (TLC) that includes trip records from all trips completed in yellow and green taxis in NYC from 2009 to the present.
NYC Tree Census
The NYC street tree data includes data from the 1995, 2005 and 2015 Street Tree Censuses, which are conducted by volunteers organized by the NYC Department of Parks and Recreation.
NYPD Motor Vehicle Collisions
This dataset includes details of Motor Vehicle Collisions in New York City provided by the Police Department (NYPD) from 2012 to the present.
OpenAQ: Real-time Air Quality Data
OpenAQ is an open-source project to surface live, real-time air quality data from around the world.
Open Images Data
This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.
RxNorm Data
The RxNorm public dataset was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs. This dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.
San Francisco 311 Service Requests Data
This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily.
San Francisco Fire Department Service Calls Data
This data includes fire unit responses to calls from April 2000 to the present, and is updated daily. The data contains the call number, incident number, address, unit identifier, call type, and disposition.
San Francisco Police Reports Data
This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present.
San Francisco Street Trees Data
This data includes a list of street trees maintained by the San Francisco Department of Public Works, including planting date, species, and location.
Stack Overflow Data
This public dataset contains an archive of Stack Overflow content, including posts, votes, tags, and badges.
USA Bureau of Labor Statistics
This dataset includes economic statistics on inflation, prices, unemployment, and pay & benefits provided by the Bureau of Labor Statistics (BLS).
United States Census Data
This dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age, and location using ZIP Code Tabulation Areas (ZCTAs) and GEOIDs.
United States Census Bureau International Data
The United States Census Bureau’s International Dataset provides estimates of country populations since 1950 and projections through 2050.
USA Disease Surveillance
A dataset published by the US Department of Health and Human Services that includes all weekly surveillance reports of nationally notifiable diseases for all U.S. cities and states published between 1888 and 2013.
USA Names Data
A Social Security Administration dataset that contains all names from Social Security card applications for births that occurred in the United States after 1879.
World Bank: Education Statistics
This dataset from the World Bank provides global education information and statistics for over 200 countries around the world.
World Bank: Global Health, Nutrition, and Population
This dataset from the World Bank provides global health information and statistics for over 200 countries around the world.
World Bank: International Debt
This dataset from the World Bank provides international debt statistics for the world’s economies from 1970 to 2015, including scheduled debt payments on commitments to 2023.
How to query public data sets using BigQuery
BigQuery is a fully managed data warehouse and analytics platform. The public datasets listed on this page are available for you to analyze using SQL queries. You can access BigQuery public datasets using the web UI, the command-line tool, or by making calls to the BigQuery REST API using client libraries for languages such as Java, .NET, or Python.
The first terabyte of data processed per month is free, so you can start querying datasets without enabling billing. To get started running some sample queries, select or create a project and then run the example queries on the NOAA GSOD weather dataset.
- Select or create a Cloud Platform Console project.
- Go to the NOAA GSOD dataset in the BigQuery Web UI.
- Click the COMPOSE QUERY button.
- Copy and paste the SQL examples on the NOAA GSOD page.
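The steps above use the web UI. For the client-library route, a minimal Python sketch might look like the following. It assumes the `google-cloud-bigquery` package and application credentials are available. The table path `bigquery-public-data.noaa_gsod.gsod2016`, the `stn` and `temp` column names, and the reading of "1 TB" as 2**40 bytes are assumptions that this page does not confirm, so check them against the NOAA GSOD page and the query pricing details before relying on them.

```python
# Sketch only: the table path and column names are assumptions, not taken
# from this page.
GSOD_TABLE = "bigquery-public-data.noaa_gsod.gsod2016"

# Assumes "1 TB" means 2**40 bytes; confirm against the query pricing details.
FREE_BYTES_PER_MONTH = 2**40


def build_gsod_query(table=GSOD_TABLE, limit=10):
    """Return standard SQL listing the stations with the highest mean temperature."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT stn, AVG(temp) AS mean_temp\n"
        f"FROM `{table}`\n"
        "GROUP BY stn\n"
        "ORDER BY mean_temp DESC\n"
        f"LIMIT {int(limit)}"
    )


def within_free_tier(query_bytes, used_bytes=0):
    """Return True if a query scanning query_bytes still fits this month's free quota."""
    return used_bytes + query_bytes <= FREE_BYTES_PER_MONTH


def estimate_bytes(sql, project):
    """Dry-run the query so BigQuery reports the bytes it would scan without running it."""
    from google.cloud import bigquery  # lazy import: the helpers above work without it

    client = bigquery.Client(project=project)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    return client.query(sql, job_config=config).total_bytes_processed


def run_query(sql, project):
    """Run the query and return its rows as a list."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return list(client.query(sql).result())
```

With credentials configured, `run_query(build_gsod_query(), "my-project")` would return the result rows, where `my-project` is a placeholder for your own project ID. Calling `estimate_bytes` first and passing the result to `within_free_tier` is one way to check a query against the free quota before running it.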
Other Public Datasets
There are many other public datasets available for you to query. Some are also hosted by Google, and many more are hosted by third parties. You can share any of your datasets with the public by changing the sharing permissions associated with your dataset. For more information about sharing datasets, see Access Control.
- Sample Tables
- Google Genomics Public Data
- Datasets publicly available on Google BigQuery (reddit.com)
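As a rough illustration of changing a dataset's sharing permissions, as mentioned above, the sketch below adds a read-only entry for the `allAuthenticatedUsers` special group to a dataset's access list. Treating that group as "the public" is an assumption; see Access Control for the authoritative options. `with_public_read` and `share_dataset` are hypothetical helper names, and `share_dataset` assumes the `google-cloud-bigquery` package and credentials with permission to update the dataset.

```python
# REST-style access entry granting read access to any signed-in user.
# Assumption: this is what "sharing with the public" means for your dataset.
PUBLIC_READER = {"role": "READER", "specialGroup": "allAuthenticatedUsers"}


def with_public_read(access):
    """Return a copy of a dataset access list that contains PUBLIC_READER exactly once."""
    entries = [dict(entry) for entry in access]
    if PUBLIC_READER not in entries:
        entries.append(dict(PUBLIC_READER))
    return entries


def share_dataset(dataset_id):
    """Apply the same change through the Python client library (needs credentials)."""
    from google.cloud import bigquery  # lazy import: with_public_read works without it

    client = bigquery.Client()
    dataset = client.get_dataset(dataset_id)  # e.g. "my-project.my_dataset" (placeholder)
    entry = bigquery.AccessEntry("READER", "specialGroup", "allAuthenticatedUsers")
    entries = list(dataset.access_entries)
    if entry not in entries:
        entries.append(entry)
    dataset.access_entries = entries
    # Send only the access_entries field in the update request.
    return client.update_dataset(dataset, ["access_entries"])
```

The pure `with_public_read` helper shows the shape of the change and leaves the input list untouched. `share_dataset` is the part that actually modifies a dataset, so review the resulting access list before using it on real data.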
How to list your public data set on BigQuery
If you have any questions about listing a public data set on this page, please contact us at firstname.lastname@example.org.