EPA Historical Air Quality Data

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

The United States Environmental Protection Agency (EPA) protects both public health and the environment by establishing the standards for national air quality. The EPA provides annual summary data as well as hourly and daily data across the following categories:

  • Criteria Gases

    • Carbon monoxide (CO)
    • Ground-level Ozone (O3)
    • Nitrogen Dioxide (NO2)
    • Sulfur Dioxide (SO2)
  • Particulates

    • PM2.5 FRM/FEM Mass
    • PM2.5 non FRM/FEM Mass
    • PM10 Mass
    • PM2.5 Speciation
  • Meteorological

    • Barometric Pressure
    • Relative Humidity and Dewpoint
    • Temperature
    • Winds (Resultant)
  • Toxics

    • Lead (Pb)
    • Hazardous Air Pollutants (HAPs)
    • Nitrous Oxides (NONOxNOy)
    • Volatile Organic Compounds (VOCs)

You can start exploring this data in the BigQuery console:

Go to EPA historical air quality data

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

What are the air quality monitoring sites local to Mountain View, California?

This query retrieves air quality monitoring sites close to Mountain View, California.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  address,
  city_name,
  state_name,
  county_name,
  latitude,
  longitude,
  local_site_name,
  site_num
FROM
  `bigquery-public-data.epa_historical_air_quality.air_quality_annual_summary`
WHERE
  latitude < 37.4
  AND latitude > 37.2
  AND longitude < -122.0
  AND longitude > -122.2
GROUP BY
  address,
  city_name,
  state_name,
  county_name,
  latitude,
  longitude,
  local_site_name,
  site_num
ORDER BY
  latitude,
  longitude

Command-line

bq query --use_legacy_sql=false '
SELECT
  address,
  city_name,
  state_name,
  county_name,
  latitude,
  longitude,
  local_site_name,
  site_num
FROM
  `bigquery-public-data.epa_historical_air_quality.air_quality_annual_summary`
WHERE
  latitude < 37.4
  AND latitude > 37.2
  AND longitude < -122.0
  AND longitude > -122.2
GROUP BY
  address,
  city_name,
  state_name,
  county_name,
  latitude,
  longitude,
  local_site_name,
  site_num
ORDER BY
  latitude,
  longitude'

Sample results are shown here:

+--------------------------------+----------------+-------------+--------------+-----------+--------------+---------------------------------+-----------+
| address                        | city_name      | state_name  | county_name  | latitude  | longitude    | local_site_name                 | site_num  |
+--------------------------------+----------------+-------------+--------------+-----------+--------------+---------------------------------+-----------+
| 22601 Voss Ave                 | Cupertino      | California  | Santa Clara  | 37.318435 | -122.069705  | Cupertino Monta Vista           | 2009      |
| 10300 Ainsworth Dr             | Cupertino      | California  | Santa Clara  | 37.32756  | -122.0646    | Stevens Creek Elementary School | 5506      |
| 910 TICONDEROGA DRIVE          | Sunnyvale      | California  | Santa Clara  | 37.355475 | -122.050845  | Sunnyvale - Ticonderoga         | 2007      |
| 160 CUESTA DR., MOUNTAIN VIEW  | Mountain View  | California  | Santa Clara  | 37.3735   | -122.077     | Mountain View - Cuesta          | 1002      |
+--------------------------------+----------------+-------------+--------------+-----------+--------------+---------------------------------+-----------+

What was the distribution of air quality for Los Angeles in 2015?

The Air Quality Index (AQI) is a color-based safety rating based on 5 critical pollutants regulated under the Clean Air Act. The query below shows the distribution of days in each category for Los Angeles in 2015.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  COUNT(date_local) AS NUM_OF_DAYS,
  CASE
    WHEN aqi < 51 THEN "Good (green)"
    WHEN aqi <101 THEN "Moderate (yellow)"
    WHEN aqi <151 THEN "Unhealthy for sensitive groups (orange)"
    WHEN aqi <201 THEN "Unhealthy (red)"
    WHEN aqi <301 THEN "Very unhealthy (purple)"
    WHEN aqi <501 THEN "Hazardous (maroon)"
    ELSE "unexpected data"
  END AS AQ_RATING
FROM
  `bigquery-public-data.epa_historical_air_quality.pm25_frm_daily_summary`
WHERE
  city_name = "Los Angeles"
  AND state_name = "California"
  AND sample_duration = "24 HOUR"
  AND poc = 1
  AND EXTRACT(YEAR FROM date_local) = 2015
GROUP BY
  2
ORDER BY
  1 DESC

Command-line

bq query --use_legacy_sql=false '
SELECT
  COUNT(date_local) AS NUM_OF_DAYS,
  CASE
    WHEN aqi < 51 THEN "Good (green)"
    WHEN aqi <101 THEN "Moderate (yellow)"
    WHEN aqi <151 THEN "Unhealthy for sensitive groups (orange)"
    WHEN aqi <201 THEN "Unhealthy (red)"
    WHEN aqi <301 THEN "Very unhealthy (purple)"
    WHEN aqi <501 THEN "Hazardous (maroon)"
    ELSE "unexpected data"
  END AS AQ_RATING
FROM
  `bigquery-public-data.epa_historical_air_quality.pm25_frm_daily_summary`
WHERE
  city_name = "Los Angeles"
  AND state_name = "California"
  AND sample_duration = "24 HOUR"
  AND poc = 1
  AND EXTRACT(YEAR FROM date_local) = 2015
GROUP BY
  2
ORDER BY
  1 DESC'

Sample results are shown here:

+-------------+-----------------------------------------+
| NUM_OF_DAYS | AQ_RATING                               |
+-------------+-----------------------------------------+
| 208         | Good (green)                            |
| 124         | Moderate (yellow)                       |
| 6           | Unhealthy for sensitive groups (orange) |
| 1           | Unhealthy (red)                         |
+-------------+-----------------------------------------+

About the data

Dataset Source: United States Environmental Protection Agency

Category: Science

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Annual

View in BigQuery: Go to EPA historical air quality data

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...