OpenAQ: Real-time Air Quality Data

How to query public data sets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public data sets using the web UI the command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, Enable the BigQuery API.

    Enable the API

Dataset overview

OpenAQ is an open-source project to surface live, real-time air quality data from around the world. OpenAQ's ”mission is to enable previously impossible science, impact policy, and empower the public to fight air pollution.” The data includes air quality measurements from 5490 locations in 47 countries.

Scientists, researchers, developers, and citizens can use this data to understand current air quality near them. The dataset only includes the most current measurement available for the location (no historical data).

Read more about OpenAQ.

You can start exploring this data in the BigQuery console:

Go to OpenAQ data

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.

What is the current global picture of PM10 concentration?

PM10 (small particulate matter of 10 microns or less in diameter) is known to cause disease and cancer, contributing to an estimated 3 million premature deaths worldwide per year in 2012. This query reveals PM10 concentrations by location.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  location,
  country,
  value,
  latitude,
  longitude
FROM
  `bigquery-public-data.openaq.global_air_quality`
WHERE
  pollutant = "pm10"

Command-line

bq query --use_legacy_sql=false '
SELECT
  location,
  country,
  value,
  latitude,
  longitude
FROM
  `bigquery-public-data.openaq.global_air_quality`
WHERE
  pollutant = "pm10"'

Sample results are shown here:

+-------------------------+---------+---------+---------------+---------------+
| location                | country | value   | latitude      | longitude     |
+-------------------------+---------+---------+---------------+---------------+
| Grünbach bei Freistadt  | AT      | 17.8966 | 48.5311111095 | 14.5747222222 |
| Traismauer              | AT      | 9.0     | 48.3533333295 | 15.74694444   |
| Salzburg Mirabellplatz  | AT      | 11.0    | 47.8055555995 | 13.043333     |
| Wien Stadlau            | AT      | 9.0     | 48.2280593769 | 16.4605496031 |
| Enzenkirchen im Sauwald | AT      | 11.0    | 48.3916468078 | 13.6711009829 |
| ...                                                                         |
+-------------------------+---------+---------+---------------+---------------+

If you concatenate latitude and longitude, you can use the query results to visualize global PM10 hotspots. For example, you can create a visualization like the following in Google Data Studio. For a tutorial on using Google Data Studio with BigQuery, see Visualizing BigQuery Data Using Google Data Studio.

Population by zip code

One interesting result in the visualization is the point shown with bad air quality directly below western Africa. This is actually bad data in the dataset recorded at 0 degrees latitude, 0 degrees longitude.

Which 10 locations have had the worst air quality this month (as measured by high PM10)?

This query retrieves the top 10 locations with the highest concentrations of PM10.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  location,
  city,
  country,
  value,
  timestamp
FROM
  `bigquery-public-data.openaq.global_air_quality`
WHERE
  pollutant = "pm10"
  AND timestamp > "2017-04-01"
ORDER BY
  value DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  location,
  city,
  country,
  value,
  timestamp
FROM
  `bigquery-public-data.openaq.global_air_quality`
WHERE
  pollutant = "pm10"
  AND timestamp > "2017-04-01"
ORDER BY
  value DESC
LIMIT
  10'

Sample results are shown here:

+----------------+-----------------+-----------+----------+--------------------------+
| location       | city            | country   | value    | timestamp                |
+----------------+-----------------+-----------+----------+--------------------------+
| JUNJI          | Hualpén         | CL        | 2088.35  | 2017-04-07 15:00:00 UTC  |
| Yatağan        | Muğla           | TR        | 1000.0   | 2017-04-24 17:00:00 UTC  |
| MOGUER         | CCAA Andalucia  | ES        | 764.4    | 2017-04-21 07:00:00 UTC  |
| MARBELLA ARCO  | CCAA Andalucia  | ES        | 493.0    | 2017-04-21 07:00:00 UTC  |
| Anand Vihar    | Delhi           | IN        | 454.0    | 2017-04-20 06:50:00 UTC  |
| Mandir Marg    | Delhi           | IN        | 322.0    | 2017-04-20 06:55:00 UTC  |
| R K Puram      | Delhi           | IN        | 318.0    | 2017-04-20 03:50:00 UTC  |
| Punjabi Bagh   | Delhi           | IN        | 290.0    | 2017-04-20 06:05:00 UTC  |
| Chandrapur     | Chandrapur      | IN        | 278.0    | 2017-04-24 16:30:00 UTC  |
| Nisekh         | Ulaanbaatar     | MN        | 263.0    | 2017-04-24 16:00:00 UTC  |
+----------------+-----------------+-----------+----------+--------------------------+

In the sample results, Hualpén, Chile has the worst air quality in April, 2017. Depending on when you query the dataset, your results may vary. The information is updated hourly.

Which city in the US has improved its air quality (PM2.5) the most since 1990?

This query joins the EPA historical dataset with OpenAQ to retrieve the cities in the US with the most improved PM2.5 concentrations since 1990.

Web UI

Open the following query in the Web UI

#standardSQL
SELECT
  aq.location,
  aq.city,
  epa.arithmetic_mean,
  aq.value,
  (epa.arithmetic_mean - aq.value) AS air_quality_difference,
  aq.timestamp,
  epa.state_code,
  epa.latitude,
  epa.longitude
FROM
  `bigquery-public-data.openaq.global_air_quality` AS aq
JOIN
  `bigquery-public-data.epa_historical_air_quality.air_quality_annual_summary` AS epa
ON
  ROUND(aq.latitude, 1) = ROUND(epa.latitude, 1)
  AND ROUND(aq.longitude, 1) = ROUND(epa.longitude, 1)
WHERE
  epa.units_of_measure = "Micrograms/cubic meter (LC)"
  AND epa.parameter_name = "Acceptable PM2.5 AQI & Speciation Mass"
  AND epa.year = 1990
  AND epa.sample_duration = "24 HOUR"
  AND epa.poc = 1
  AND aq.pollutant = "pm25"
ORDER BY
  air_quality_difference DESC
LIMIT
  10

Command-line

bq query --use_legacy_sql=false '
SELECT
  aq.location,
  aq.city,
  epa.arithmetic_mean,
  aq.value,
  (epa.arithmetic_mean - aq.value) AS air_quality_difference,
  aq.timestamp,
  epa.state_code,
  epa.latitude,
  epa.longitude
FROM
  `bigquery-public-data.openaq.global_air_quality` AS aq
JOIN
  `bigquery-public-data.epa_historical_air_quality.air_quality_annual_summary` AS epa
ON
  ROUND(aq.latitude, 1) = ROUND(epa.latitude, 1)
  AND ROUND(aq.longitude, 1) = ROUND(epa.longitude, 1)
WHERE
  epa.units_of_measure = "Micrograms/cubic meter (LC)"
  AND epa.parameter_name = "Acceptable PM2.5 AQI & Speciation Mass"
  AND epa.year = 1990
  AND epa.sample_duration = "24 HOUR"
  AND epa.poc = 1
  AND aq.pollutant = "pm25"
ORDER BY
  air_quality_difference DESC
LIMIT
  10'

Sample results are shown here:

+-----------------------+--------------+-----------------+--------+------------------------+-------------------------+------------+-----------+------------+
| location              | city         | arithmetic_mean | value  | air_quality_difference | timestamp               | state_code | latitude  | longitude  |
+-----------------------+--------------+-----------------+--------+------------------------+-------------------------+------------+-----------+------------+
| McMillan NCORE        | WASHINGTON   | 15.31707        | 0.002  | 15.315069999999999     | 2017-04-27 20:00:00 UTC | 11         | 38.876233 | -77.034076 |
| Acadia NP - McFarlan  | HANCOCK      | 8.179546        | -1.0   | 9.179546               | 2017-04-27 18:00:00 UTC | 47         | 44.37705  | -68.2609   |
| Big Bend NP           | BREWSTER     | 6.214457        | 1.4    | 4.814457000000001      | 2017-04-27 20:00:00 UTC | 46         | 29.30265  | -103.17781 |
| Badlands              | JACKSON      | 5.304384        | 1.2    | 4.104384               | 2017-04-27 20:00:00 UTC | 23         | 43.74561  | -101.941218|
| Look Rock - GSMNP     | Knoxville    | 14.81396        | 11.2   | 3.6139600000000005     | 2016-12-20 15:00:00 UTC | 48         | 35.63348  | -83.941606 |
| Shenandoah NP         | MADISON      | 11.873345       | 8.6    | 3.273345000000001      | 2017-04-27 20:00:00 UTC | 51         | 38.5231   | -78.43471  |
| ...                                                                                                                                                      |
+-----------------------+--------------+-----------------+--------+------------------------+-----------+-------------+------------+-----------+------------+

In the query results, the Washington, D.C. metro area has improved the most in the timeframe measured. Your results may vary. OpenAQ has live data that updates hourly.

Caveats: The datasets were joined using a rounded latitude/longitude (1 decimal) which is not a perfect way to make matches. Matches should be made on readings within 11km of each other. Rounding to 1 decimal only yields six matches. This is primarily because monitoring did not become required until 1998. If you expand the rounding to 0 digits, you get 36 matches with some duplicates. The dataset of potential matches is limited since you need to have comparable units of measure.

About the data

Dataset Source: openaq.org

Category: Science

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://openaq.org/#/about - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Hourly

View in BigQuery: Go to OpenAQ data

Monitor your resources on the go

Get the Google Cloud Console app to help you manage your projects.

Send feedback about...