By Mike Hamberg, Partner Operations Manager, gTech Feeds
Using these new public datasets in BigQuery is a great way to understand air quality in your community.
Take a deep breath: The average person takes between 17,000 and 23,000 breaths a day. But how often do you breathe in poor quality air? Do you know if the air in your town is clean?
We’re helping answer those questions. We’ve leveraged decades of data from the U.S. EPA and real-time information from OpenAQ to add two air quality datasets to the Google Cloud Public Datasets program:
- OpenAQ, which includes real-time air quality from 47 countries around the world
- EPA, which includes the last 27 years of air quality from around the United States
OpenAQ: Real-time air quality
The OpenAQ dataset is updated hourly [1] to show a nearly live look at government-reported air quality around the world. With this dataset, you can answer questions like:
- Where are the global hotspots for poor air quality right now?
- How does one city compare to others?
Let’s take a deeper dive on a couple of these by charting them in Data Studio. First, where are the global hotspots for poor air quality right now (using concentrations of PM10: Particulate Matter with a size of 10 micrometers or less)?
Answer: Hualpén, Chile and two locations in Turkey currently have the highest concentrations of PM10. You can try this query yourself in BigQuery using the following standard SQL:
#standardSQL
SELECT
  location, city, country, value,
  CONCAT(CAST(latitude AS STRING), ', ', CAST(longitude AS STRING)) AS latlong
FROM `bigquery-public-data.openaq.global_air_quality`
WHERE pollutant = "pm10"
ORDER BY value DESC
Zooming the dashboard in on Europe, we can see how cities compare (and we can infer that we don’t yet have data in some places).
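To compare cities with numbers rather than on the map, a query along the following lines could average the latest PM10 readings per city. This is only a sketch, not the query behind the dashboard: it assumes all PM10 values in the table share the same unit, and the filter on negative values is our own assumption about bad sensor readings.

```sql
#standardSQL
-- Sketch: average the latest PM10 reading per city for side-by-side comparison.
SELECT
  city,
  country,
  COUNT(*) AS num_locations,          -- reporting locations in each city
  ROUND(AVG(value), 1) AS avg_pm10
FROM `bigquery-public-data.openaq.global_air_quality`
WHERE pollutant = "pm10"
  AND value >= 0                      -- assumption: negative values are sensor errors
GROUP BY city, country
ORDER BY avg_pm10 DESC
LIMIT 20
```

Cities with a single reporting location can be skewed by one unusual reading, so the num_locations column is worth checking before drawing conclusions.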
You can also see an interactive map on the OpenAQ website.
EPA: Historical air quality
The EPA dataset contains over 25GB of data, ranging from annual summaries to hourly particulate measurements from around the country. With BigQuery, you can query these vast archives in seconds, helping to answer questions like:
- Which states have the cleanest air?
- Is my city’s air quality getting better or worse over time?
In 2015, which states had the cleanest air (in terms of least concentration of PM2.5)?
Montana had the lowest PM2.5 concentrations. 2015 is the last year for which we have an entire year of data, but the trend in 2016 looks similar.
#standardSQL
SELECT
  state_name,
  AVG(arithmetic_mean) AS avg_value
FROM `bigquery-public-data.epa_historical_air_quality.pm25_frm_daily_summary`
WHERE sample_duration = "24 HOUR"
  AND poc = 1
  AND EXTRACT(YEAR FROM date_local) = 2015
GROUP BY state_name
ORDER BY avg_value
LIMIT 15
Let’s look at PM10 concentrations to see if the air quality in Pittsburgh, Pennsylvania is getting better or worse over time.
It’s getting better! You can replace the city/state name with your location and use the query below in BigQuery. For smaller areas, you can leverage the city_name, county_name, or cbsa_name fields (or you can always use latitude/longitude).
Try it yourself with the following standard SQL:
#standardSQL
SELECT
  EXTRACT(YEAR FROM date_local) AS year,
  AVG(arithmetic_mean) AS avg_value
FROM `bigquery-public-data.epa_historical_air_quality.pm10_daily_summary`
WHERE poc = 1
  AND sample_duration = "24 HOUR"
  AND city_name = "Pittsburgh"
  AND state_name = "Pennsylvania"
GROUP BY year
ORDER BY year
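For places that don’t match a city name, the latitude/longitude option mentioned above might look like the following sketch. It assumes the table exposes numeric latitude and longitude columns, and the bounding box (roughly centered on Pittsburgh) is an illustrative guess rather than an official boundary.

```sql
#standardSQL
-- Sketch: same yearly PM10 trend, selecting monitors by a bounding box
-- instead of by city and state name.
SELECT
  EXTRACT(YEAR FROM date_local) AS year,
  AVG(arithmetic_mean) AS avg_value
FROM `bigquery-public-data.epa_historical_air_quality.pm10_daily_summary`
WHERE poc = 1
  AND sample_duration = "24 HOUR"
  AND latitude BETWEEN 40.3 AND 40.6      -- hypothetical box around Pittsburgh
  AND longitude BETWEEN -80.2 AND -79.8
GROUP BY year
ORDER BY year
```

Widening or narrowing the box changes which monitors are included, so results may differ from the city-name version of the query.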
Making air quality data available is one more way that Google organizes the world’s information and makes it universally accessible and useful. We hope you’ll try these datasets (EPA, OpenAQ) for yourself and learn something about your own community. If your area doesn’t have any data available, work with your local leaders to publish or share this information. You can even contribute data or code to the OpenAQ open-source project.
[1] While we update the dataset in BigQuery hourly, the individual locations may send updates less frequently. We display the latest information we have.