Google Cloud Big Data and Machine Learning Blog

Innovation in data processing and machine learning technology

U.S. EPA and OpenAQ air quality data now available in BigQuery

Wednesday, June 7, 2017

By Mike Hamberg, Partner Operations Manager, gTech Feeds

Using these new public datasets in BigQuery is a great way to understand air quality in your community.

Take a deep breath: The average person takes between 17,000 and 23,000 breaths a day. But how often do you breathe in poor-quality air? Do you know if the air in your town is clean?

We’re helping answer those questions. We’ve leveraged decades of data from the U.S. EPA and real-time information from OpenAQ to add two air quality datasets to the Google Cloud Public Datasets program:

OpenAQ: Real-time air quality

The OpenAQ dataset is updated hourly [1] to show a nearly live look at government-reported air quality around the world. With this dataset, you can answer questions like:

  • Where are the global hotspots for poor air quality right now?
  • How does one city compare to others?

Let’s take a closer look at a couple of these by charting them in Data Studio. First, where are the global hotspots for poor air quality right now? Here we measure air quality by the concentration of PM10, particulate matter with a diameter of 10 micrometers or less.

Answer: At the time of writing, Hualpén, Chile and two locations in Turkey have the highest concentrations of PM10.

You can try this query yourself in BigQuery using the following standard SQL:
#standardSQL
# PM10 readings for every reporting location, highest concentration first.
SELECT
  location, city, country, value,
  # Combine the coordinates into a single "lat, long" string for map charts.
  CONCAT(CAST(latitude AS STRING), ', ', CAST(longitude AS STRING)) AS latlong
FROM
  `bigquery-public-data.openaq.global_air_quality`
WHERE
  pollutant = "pm10"
ORDER BY
  value DESC

Zooming the dashboard in on Europe shows how cities compare, and the gaps on the map indicate places where we don’t yet have data.

You can also see an interactive map on the OpenAQ website.
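
The city comparison can also be run as a query rather than read off a map. The following is a minimal sketch, not the query behind the dashboard: it assumes the country column holds two-letter country codes (the three codes listed are only illustrative) and it discards negative readings on the assumption that they are sensor errors.

```sql
#standardSQL
# Sketch: rank cities in a few countries by their average reported PM10 value.
SELECT
  city, country,
  COUNT(*) AS readings,                  # rows contributing to the average
  ROUND(AVG(value), 1) AS avg_pm10
FROM
  `bigquery-public-data.openaq.global_air_quality`
WHERE
  pollutant = "pm10"
  AND country IN ("DE", "FR", "NL")      # assumption: two-letter country codes
  AND value >= 0                         # assumption: negative values are sensor errors
GROUP BY
  city, country
ORDER BY
  avg_pm10 DESC
```

Averaging by city smooths over individual stations, so a city with a single reporting location is more sensitive to one unusual reading than a city with many.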

EPA: Historical air quality

The EPA dataset contains over 25 GB of data, ranging from annual summaries to hourly particulate measurements from across the United States. With BigQuery, you can query these vast archives in seconds, helping to answer questions like:

  • Which states have the cleanest air?
  • Is my city’s air quality getting better or worse over time?

In 2015, which states had the cleanest air (measured by the lowest average PM2.5 concentration)?

Montana had the lowest PM2.5 concentrations. We use 2015 because it is the most recent year for which we have a full year of data, but the trend in 2016 looks similar.

Try this query yourself in BigQuery using the following standard SQL statement:
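#standardSQL
# Average daily PM2.5 by state for 2015, cleanest states first.
SELECT
  state_name, AVG(arithmetic_mean) AS avg_value
FROM
  `bigquery-public-data.epa_historical_air_quality.pm25_frm_daily_summary`
WHERE
  sample_duration = "24 HOUR"
  # poc is the EPA parameter occurrence code; 1 keeps one instrument per site.
  AND poc = 1
  AND EXTRACT(YEAR FROM date_local) = 2015
GROUP BY
  state_name
ORDER BY
  avg_value
LIMIT 15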
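
To check the remark that 2016 looks similar, the two years can be placed side by side. This is a hedged sketch that assumes the same table also contains the partial 2016 data mentioned above; the avg_2015 and avg_2016 column aliases are illustrative names.

```sql
#standardSQL
# Sketch: per-state average PM2.5 for 2015 and 2016 in adjacent columns.
SELECT
  state_name,
  # AVG ignores NULLs, so each column averages only the rows from its own year.
  AVG(IF(EXTRACT(YEAR FROM date_local) = 2015, arithmetic_mean, NULL)) AS avg_2015,
  AVG(IF(EXTRACT(YEAR FROM date_local) = 2016, arithmetic_mean, NULL)) AS avg_2016
FROM
  `bigquery-public-data.epa_historical_air_quality.pm25_frm_daily_summary`
WHERE
  sample_duration = "24 HOUR"
  AND poc = 1
  AND EXTRACT(YEAR FROM date_local) IN (2015, 2016)
GROUP BY
  state_name
ORDER BY
  avg_2015
LIMIT 15
```

Because 2016 is incomplete, its averages may be skewed toward whichever seasons are covered, so treat the comparison as indicative only.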

Let’s look at PM10 concentrations to see if the air quality in Pittsburgh, Pennsylvania is getting better or worse over time.

It’s getting better! To run the query below in BigQuery for your own location, replace the city and state names. For other areas, you can filter on the city_name, county_name, or cbsa_name fields, or on latitude and longitude.

Try it yourself with the following standard SQL:

#standardSQL
# Yearly average PM10 for Pittsburgh, Pennsylvania, oldest year first.
SELECT
  EXTRACT(YEAR FROM date_local) AS year, AVG(arithmetic_mean) AS avg_value
FROM
  `bigquery-public-data.epa_historical_air_quality.pm10_daily_summary`
WHERE
  # poc is the EPA parameter occurrence code; 1 keeps one instrument per site.
  poc = 1
  AND sample_duration = "24 HOUR"
  AND city_name = "Pittsburgh"
  AND state_name = "Pennsylvania"
GROUP BY
  year
ORDER BY
  year
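
The same trend can be scoped to a county or to a latitude/longitude box instead of a city. The sketch below is a variation under stated assumptions: the county_name column and the "Allegheny" spelling are assumed rather than taken from this post, and the coordinate bounds are rough illustrative values around Pittsburgh.

```sql
#standardSQL
# Sketch: yearly average PM10 for a county rather than a city.
SELECT
  EXTRACT(YEAR FROM date_local) AS year, AVG(arithmetic_mean) AS avg_value
FROM
  `bigquery-public-data.epa_historical_air_quality.pm10_daily_summary`
WHERE
  poc = 1
  AND sample_duration = "24 HOUR"
  AND state_name = "Pennsylvania"
  AND county_name = "Allegheny"   # assumed column name and spelling
  # Alternative to the county filter: a rough bounding box (illustrative bounds).
  # AND latitude BETWEEN 40.2 AND 40.7
  # AND longitude BETWEEN -80.3 AND -79.7
GROUP BY
  year
ORDER BY
  year
```

Keeping the state_name filter alongside the county filter avoids mixing in counties that share a name in other states.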

Take action

Making air quality data available is one more way that Google organizes the world’s information and makes it universally accessible and useful. We hope that you try these datasets (EPA, OpenAQ) for yourself and learn something about your own community. If your area doesn’t have any data available, work with your local leaders to publish or share this information. You can even contribute data or code to the OpenAQ open-source project.

We’d love to hear how you're using air quality metrics to make an impact! Share your success on Reddit or by telling us at gcp-public-data@google.com.

If you’d like to learn more about OpenAQ, check out their blog. For more on EPA national standards, see their standards overview.

[1] While we update the dataset in BigQuery hourly, the individual locations may send updates less frequently. We display the latest information we have.
