Open Images Data

How to query public datasets using BigQuery

BigQuery is a fully managed data warehouse and analytics platform. Public datasets are available for you to analyze using SQL queries. You can access BigQuery public datasets by using the web UI or the command-line tool, or by calling the BigQuery REST API through client libraries for languages such as Java, .NET, or Python.

To get started using a BigQuery public dataset, create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you should also enable billing.

  1. Sign in to your Google account.

    If you don't already have one, sign up for a new account.

  2. Select or create a Cloud Platform project.

    Go to the Manage resources page

  3. Enable billing for your project if you intend to go beyond the free tier.

    Enable billing

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, enable the BigQuery API.

    Enable the API

Dataset overview

This public dataset contains approximately 9 million URLs and metadata for images that have been annotated with labels spanning more than 6,000 categories.

You can start exploring this data in the BigQuery console:

Go to Open Images Dataset

Sample queries

Here are some examples of SQL queries you can run on this data in BigQuery.

These samples use BigQuery’s support for standard SQL. Use the #standardSQL tag to let BigQuery know you want to use standard SQL. For more information about the #standardSQL prefix, see Setting a query prefix.
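If you assemble query text in a script, a small helper can make sure the prefix is present exactly once. The following is a minimal Python sketch, not part of BigQuery itself. The function name `with_standard_sql_prefix` is hypothetical, and the case-insensitive comparison assumes that BigQuery accepts the prefix in any letter case.

```python
PREFIX = "#standardSQL"


def with_standard_sql_prefix(query):
    """Return the query with a #standardSQL first line, adding it only if missing."""
    stripped = query.lstrip()
    first_line = stripped.splitlines()[0] if stripped else ""
    # Assumption: the prefix is matched case-insensitively, so an existing
    # "#standardsql" line is treated as already present.
    if first_line.strip().lower() == PREFIX.lower():
        return stripped
    # The prefix goes on its own line, ahead of the query text.
    return f"{PREFIX}\n{stripped}"


print(with_standard_sql_prefix("SELECT 1"))
```

The helper leaves an already prefixed query unchanged, so it is safe to apply more than once.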

Which labels are in the dataset?

#standardSQL
SELECT
  *
FROM
  `bigquery-public-data.open_images.dict`
LIMIT
  10;

The results are shown here:

+------------+--------------------+
| label_name | label_display_name |
+------------+--------------------+
| /m/0h989   | go                 |
| /m/03bx7vb | ox                 |
| /m/0m09    | ale                |
| /m/0_k2    | ant                |
| /m/01hf_2  | ape                |
| /m/0dzf4   | arm                |
| /m/0jjw    | art                |
| /m/0n5v01m | bag                |
| /m/01nz0z  | bar                |
| /m/01h44   | bat                |
+------------+--------------------+
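The same kind of query can also be issued from a client library. The sketch below uses the Python client and assumes that the `google-cloud-bigquery` package is installed and that default credentials and a project are configured in your environment. The helper names `build_label_query` and `fetch_labels` are illustrative, not part of any library.

```python
DICT_TABLE = "bigquery-public-data.open_images.dict"


def build_label_query(limit=10):
    """Return a standard SQL query that lists labels from the dictionary table."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT label_name, label_display_name "
        f"FROM `{DICT_TABLE}` "
        f"LIMIT {limit}"
    )


def fetch_labels(limit=10):
    """Run the query and return (label_name, label_display_name) tuples."""
    # Imported here so that build_label_query works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default project and credentials
    rows = client.query(build_label_query(limit)).result()
    return [(row.label_name, row.label_display_name) for row in rows]
```

Calling `fetch_labels(10)` contacts BigQuery and counts against your monthly quota. `build_label_query` only builds the SQL text, so you can inspect it offline.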

Which labels have "bus" in their display names?

#standardSQL
SELECT
  *
FROM
  `bigquery-public-data.open_images.dict`
WHERE
  label_display_name LIKE '%bus%'
LIMIT
  20;

The results are shown here:

+------------+--------------------+
| label_name | label_display_name |
+------------+--------------------+
| /m/01bjv   | bus                |
| /m/04yqq2  | bust               |
| /m/015zfz  | airbus             |
| /m/02539r  | sorbus             |
| /m/045jsc  | minibus            |
| /m/01jw_1  | bus stop           |
| /m/0c5q0q  | mi rebus           |
| /m/05jlh5  | saltbush           |
| /m/016_bh  | shadbush           |
| /m/02yvhj  | school bus         |
| /m/0f6pl   | trolleybus         |
| /m/015zbk  | airbus a330        |
| /m/018rl2  | airbus a380        |
| /m/03qk36c | airport bus        |
| /m/03_k0c  | busy lizzie        |
| /m/0hgryjx | business bag       |
| /m/01kqwy  | business jet       |
| /m/012t_z  | businessperson     |
| /m/02w11w8 | tour bus service   |
| /m/03n9vx  | double-decker bus  |
+------------+--------------------+
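Note that `LIKE '%bus%'` matches the substring anywhere in the display name, which is why "bust" and "businessperson" appear in the results. The following Python sketch contrasts substring matching with whole-word matching on a few display names copied from the table above. The function names are illustrative, and the regular expression is one possible way to express "whole word", not the only one.

```python
import re

# Display names copied from the result table above.
DISPLAY_NAMES = [
    "bus", "bust", "airbus", "minibus", "bus stop", "saltbush",
    "school bus", "trolleybus", "busy lizzie", "businessperson",
    "double-decker bus",
]


def like_contains(name, needle="bus"):
    """Mimic LIKE '%bus%': true when the needle occurs anywhere in the name."""
    return needle in name


def has_word(name, word="bus"):
    """True only when the word appears as a whole word in the name."""
    return re.search(rf"\b{re.escape(word)}\b", name) is not None


substring_hits = [n for n in DISPLAY_NAMES if like_contains(n)]
word_hits = [n for n in DISPLAY_NAMES if has_word(n)]

print(len(substring_hits))  # 11: every name in the list contains "bus"
print(word_hits)  # ['bus', 'bus stop', 'school bus', 'double-decker bus']
```

In BigQuery standard SQL, a comparable whole-word filter could likely be written with `REGEXP_CONTAINS(label_display_name, r'\bbus\b')`. Treat that as a suggestion to verify against your own results.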

How many images of a trolleybus are in the dataset?

#standardSQL
SELECT
  COUNT(*)
FROM
  `bigquery-public-data.open_images.labels` a
INNER JOIN
  `bigquery-public-data.open_images.images` b
ON
  a.image_id = b.image_id
WHERE
  a.label_name='/m/0f6pl'
  AND a.confidence > 0.5;

The results are shown here:

+------+
| f0_  |
+------+
| 3550 |
+------+
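To see how the join and the confidence filter interact, you can reproduce the query shape on a toy dataset. The sketch below uses Python's built-in `sqlite3` module rather than BigQuery. The table and column names mirror the query above, but the rows are invented for illustration and do not come from the Open Images dataset.

```python
import sqlite3


def build_toy_database():
    """Create an in-memory database with made-up rows shaped like the real tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE images (image_id TEXT PRIMARY KEY, subset TEXT);
        CREATE TABLE labels (image_id TEXT, label_name TEXT, confidence REAL);
        """
    )
    conn.executemany(
        "INSERT INTO images VALUES (?, ?)",
        [("img1", "train"), ("img2", "train"), ("img3", "validation")],
    )
    conn.executemany(
        "INSERT INTO labels VALUES (?, ?, ?)",
        [
            ("img1", "/m/0f6pl", 0.9),  # trolleybus, above the threshold
            ("img2", "/m/0f6pl", 0.3),  # trolleybus, below the threshold
            ("img3", "/m/0f6pl", 1.0),  # trolleybus, above the threshold
            ("img3", "/m/01bjv", 0.8),  # a different label (bus)
        ],
    )
    return conn


def count_confident_images(conn, label_name, min_confidence=0.5):
    """Count label rows for one label whose confidence exceeds the threshold."""
    query = """
        SELECT COUNT(*)
        FROM labels AS a
        INNER JOIN images AS b ON a.image_id = b.image_id
        WHERE a.label_name = ? AND a.confidence > ?
    """
    (count,) = conn.execute(query, (label_name, min_confidence)).fetchone()
    return count


conn = build_toy_database()
print(count_confident_images(conn, "/m/0f6pl"))  # 2
```

The result column in the BigQuery output is named `f0_` because the query gives `COUNT(*)` no alias. Adding one, for example `COUNT(*) AS image_count`, yields a more readable column name.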

What are some landing pages of images with a trolleybus?

#standardSQL
SELECT
  original_landing_url,
  confidence
FROM
  `bigquery-public-data.open_images.labels` l
INNER JOIN
  `bigquery-public-data.open_images.images` i
ON
  l.image_id = i.image_id
WHERE
  label_name='/m/0f6pl'
  AND confidence = 1
  AND subset='validation'
LIMIT
  10;

The results are shown here:

+----------------------------------------------------------+------------+
|                   original_landing_url                   | confidence |
+----------------------------------------------------------+------------+
| https://www.flickr.com/photos/gazeronly/6356698903       |        1.0 |
| https://www.flickr.com/photos/hisgett/3453032426         |        1.0 |
| https://www.flickr.com/photos/metrocincinnati/4400806389 |        1.0 |
| https://www.flickr.com/photos/tjc/165330995              |        1.0 |
| https://www.flickr.com/photos/koraxdc/10888199614        |        1.0 |
| https://www.flickr.com/photos/toms/128871696             |        1.0 |
| https://www.flickr.com/photos/tadokoro/8615989093        |        1.0 |
| https://www.flickr.com/photos/sergejf/8706867707         |        1.0 |
| https://www.flickr.com/photos/daveiam/3492373572         |        1.0 |
| https://www.flickr.com/photos/cityoftoronto/10732215443  |        1.0 |
+----------------------------------------------------------+------------+

Image credits:

5 FULTON Transbay Terminal, by torbakhopper, under CC BY 2.0
Trolley Bus, by Tony Hisgett, under CC BY 2.0
Trolley bus #1472, by Metro Bus, under CC BY 2.0
Intersection, by TimothyJ, under CC BY 2.0

Which images with cherries are in the training set?

#standardSQL
SELECT
  i.image_id AS image_id,
  original_url,
  confidence
FROM
  `bigquery-public-data.open_images.labels` l
INNER JOIN
  `bigquery-public-data.open_images.images` i
ON
  l.image_id = i.image_id
WHERE
  label_name='/m/0f8sw'
  AND confidence >= 0.85
  AND subset='train'
LIMIT
  10;

The results are shown here:

+------------------+-----------------------------------------------------------------+------------+
|     image_id     |                          original_url                           | confidence |
+------------------+-----------------------------------------------------------------+------------+
| 16abc5e3dd5aee38 | https://c2.staticflickr.com/4/3276/2734551390_b3b1f46826_o.jpg  |        0.9 |
| 275344e5e05fbd55 | https://c2.staticflickr.com/6/5515/11645877016_a813d091c1_o.jpg |        0.9 |
| cd9f51a7d2909088 | https://c1.staticflickr.com/3/2661/3704400114_0f37df3c76_o.jpg  |        0.9 |
| 87754460acc77207 | https://c1.staticflickr.com/5/4138/4913426822_a1539dc915_o.jpg  |        0.9 |
| d923fb3fdb415915 | https://c1.staticflickr.com/9/8352/8303394799_d321c27b35_o.jpg  |        0.9 |
| 0fbbf595e9eb88b1 | https://c1.staticflickr.com/1/202/508179173_0a112bdedd_o.jpg    |        0.9 |
| 0485896eb3297811 | https://c2.staticflickr.com/8/7458/9221235455_02570f8348_o.jpg  |        0.9 |
| 118491448098cb46 | https://c1.staticflickr.com/3/2428/3769786809_e12895e412_o.jpg  |        0.9 |
| 847212d0c174bff7 | https://c2.staticflickr.com/4/3124/2397453988_1b3819bde3_o.jpg  |        0.9 |
| 26ed578010c2126b | https://c2.staticflickr.com/8/7557/15584458587_beeaf99d1d_o.jpg |        0.9 |
+------------------+-----------------------------------------------------------------+------------+

Image credits:

Cherries, by Kevin, under CC BY 2.0
Bowl of cherries, by Rebecca Wilson, under CC BY 2.0
Cherries, by liz west, under CC BY 2.0
Cherry-o, by d3adcrab, under CC BY 2.0

About the data

Dataset Source: https://github.com/openimages/dataset

Category: Image, Creative Commons

APA-style citation: Google Research (2016). The Open Images dataset [Image URLs and labels]. Available from GitHub: https://github.com/openimages/dataset.

Use: The annotations are licensed by Google Inc. under the CC BY 4.0 license.

The images referenced in the dataset are listed as having a CC BY 2.0 license. Note: while we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.

Update Frequency: Quarterly

View in BigQuery: Go to Open Images dataset
