Query a public dataset with the bq tool
Learn how to examine and query a public dataset with the bq command-line tool.
To follow step-by-step guidance for this task directly in the Google Cloud console, click Guide me:
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
Ensure that the BigQuery API is enabled.
If you created a new project, the BigQuery API is automatically enabled.
-
In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
If you don't enable billing for the Google Cloud project that you use in this tutorial, then you will work with data in the BigQuery sandbox. The BigQuery sandbox lets you learn BigQuery with a limited set of BigQuery features at no charge.
Examine a public dataset
BigQuery offers several sample tables in the
bigquery-public-data.samples
dataset that you can query. In this tutorial,
you run queries on the shakespeare
table, which contains an entry for every
word in every Shakespeare play.
Examine the shakespeare
table in the samples
dataset:
bq show bigquery-public-data:samples.shakespeare
The output is similar to the following. Some columns are omitted to simplify the output.
Last modified Schema Total Rows Total Bytes
----------------- ------------------------------------ ------------ ------------
14 Mar 17:16:45 |- word: string (required) 164656 6432064
|- word_count: integer (required)
|- corpus: string (required)
|- corpus_date: integer (required)
Query a public dataset
Use the bq query
command to run SQL queries on data.
Determine how many times the substring
raisin
appears in Shakespeare's works:bq query --use_legacy_sql=false \ 'SELECT word, SUM(word_count) AS count FROM `bigquery-public-data.samples.shakespeare` WHERE word LIKE "%raisin%" GROUP BY word;'
The output is similar to the following:
+---------------+-------+ | word | count | +---------------+-------+ | praising | 8 | | Praising | 4 | | raising | 5 | | dispraising | 2 | | dispraisingly | 1 | | raisins | 1 | +---------------+-------+
Search for the substring
huzzah
in Shakespeare's works:bq query --use_legacy_sql=false \ 'SELECT word FROM `bigquery-public-data.samples.shakespeare` WHERE word = "huzzah";'
Because the substring doesn't appear in Shakespeare's works, no results are returned.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, delete the Google Cloud project with the resources.
Delete the project
If you used the BigQuery sandbox to query the public dataset, then billing is not enabled for your project.The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Learn more about using the bq tool.
- Learn about the BigQuery sandbox.
- Learn more about BigQuery public datasets.
- Learn how to load data into BigQuery.
- Learn more about querying data in BigQuery.
- Get updates about BigQuery.
- Learn about BigQuery pricing.
- Learn about BigQuery quotas and limits.