This public data is published by the U.S. Department of Health and Human Services and includes all weekly surveillance reports of nationally notifiable diseases for all U.S. cities and states published between 1888 and 2013. The data set covers eight important vaccine-preventable contagious diseases: diphtheria, hepatitis A, measles, mumps, pertussis, polio, rubella and smallpox.
You can start exploring this data in the BigQuery console.
Here are some examples of SQL queries you can run on this data in BigQuery.
These samples use BigQuery’s legacy SQL by setting the #legacySQL prefix. For more information, see Setting a query prefix.
Diseases by year
#legacySQL
SELECT *
FROM (
  SELECT
    *,
    MIN(z___rank) OVER (PARTITION BY cdc_reports_epi_year) AS z___min_rank
  FROM (
    SELECT
      *,
      RANK() OVER (PARTITION BY cdc_reports_disease ORDER BY cdc_reports_epi_year) AS z___rank
    FROM (
      SELECT
        FLOOR(cdc_reports.epi_week/100) AS cdc_reports_epi_year,
        cdc_reports.disease AS cdc_reports_disease,
        COALESCE(CAST(SUM((FLOAT(cdc_reports.cases))) AS FLOAT), 0) AS cdc_reports_total_cases
      FROM [lookerdata:cdc.project_tycho_reports] AS cdc_reports
      GROUP EACH BY 1, 2
    ) ww
  ) aa
) xx
WHERE z___min_rank <= 500
LIMIT 30000
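The query above derives a year with FLOOR(epi_week/100), which suggests that epi_week packs the year and the week number into one integer (for example, 197001 for week 1 of 1970). That encoding is an assumption inferred from the query, not something the dataset description states. Under that assumption, a minimal Python sketch of the same year-level rollup could look like the following. The helper names and the sample rows are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def split_epi_week(epi_week: int) -> Tuple[int, int]:
    """Split a YYYYWW-style value into (year, week number).

    Assumes epi_week is encoded as year * 100 + week.
    """
    # Integer division by 100 mirrors FLOOR(epi_week/100) in the query.
    return divmod(epi_week, 100)


def cases_by_year(rows: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Sum case counts per year from (epi_week, cases) pairs."""
    totals: Dict[int, int] = defaultdict(int)
    for epi_week, cases in rows:
        year, _week = split_epi_week(epi_week)
        totals[year] += cases
    return dict(totals)


# Made-up rows for illustration only; these are not real surveillance counts.
sample_rows = [(197001, 5), (197002, 7), (197101, 3)]
print(cases_by_year(sample_rows))  # {1970: 12, 1971: 3}
```

The same split applies to the year filter in the mumps comparison, where FLOOR(epi_week/100) = 1970 keeps only the weeks of 1970.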
Comparing Mumps outbreak in California and Connecticut
#legacySQL
SELECT *
FROM (
  SELECT
    *,
    MIN(z___rank) OVER (PARTITION BY cdc_reports_epi_week) AS z___min_rank
  FROM (
    SELECT
      *,
      RANK() OVER (PARTITION BY cdc_reports_state ORDER BY cdc_reports_epi_week) AS z___rank
    FROM (
      SELECT
        cdc_reports.epi_week AS cdc_reports_epi_week,
        cdc_reports.state AS cdc_reports_state,
        COALESCE(CAST(SUM((FLOAT(cdc_reports.cases))) AS FLOAT), 0) AS cdc_reports_total_cases
      FROM [lookerdata:cdc.project_tycho_reports] AS cdc_reports
      WHERE (cdc_reports.disease = 'MUMPS')
        AND (FLOOR(cdc_reports.epi_week/100) = 1970)
        AND (cdc_reports.state = 'CA' OR cdc_reports.state = 'CT')
      GROUP EACH BY 1, 2
    ) ww
  ) aa
) xx
WHERE z___min_rank <= 500
LIMIT 30000
Mumps shows the same seasonal pattern coast to coast, in both California and Connecticut, during the 1970 outbreak.
About the data
Dataset Source: Data.gov
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
View in BigQuery: Go to the USA Contagious Disease Dataset