This page explains how to use the bq
command-line tool
to run queries and load data.
Before you begin
Before you begin this quickstart, use the Google Cloud Console to create or select a project and install the Cloud SDK.
-
Sign in to your Google Account.
If you don't already have one, sign up for a new account.
-
In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Install and initialize the Cloud SDK.
- BigQuery is automatically enabled in new projects. To activate BigQuery in a preexisting project, go to Enable the BigQuery API.
- BigQuery provides a sandbox if you do not want to provide a credit card or enable billing for your project. The steps in this topic work for a project whether or not your project has billing enabled. If you optionally want to enable billing, see Learn how to enable billing.
Examine a table
BigQuery offers several
sample tables
that you can query. In this quickstart, you run some queries against the
shakespeare
table, which contains an entry for every word in every Shakespeare play.
To examine the schema of a specific table, run the following command. If the
project and dataset IDs are the default values for your bq
tool,
you can omit the project and dataset IDs.
bq show PROJECT_ID:DATASET_ID.TABLE_ID
The following example examines the shakespeare
table in the
samples
dataset:
bq show bigquery-public-data:samples.shakespeare
Output:
Table bigquery-public-data:samples.shakespeare Last modified Schema Total Rows Total Bytes Expiration ----------------- ------------------------------------ ------------ ------------- ------------ 26 Aug 14:43:49 |- word: string (required) 164656 6432064 |- word_count: integer (required) |- corpus: string (required) |- corpus_date: integer (required)
Run the help command
Use bq help
to get detailed information about the bq
command-line tool.
bq help
Include a command name to get information about a specific command. For example,
the following call to bq help
retrieves information about the query
command.
bq help query
Run a query
Run a query to see how many times the substring raisin
appears in
Shakespeare's works.
To run a query, run the command bq query "SQL_STATEMENT"
.
Escape any quotation marks inside the
SQL_STATEMENT
with a\
mark, orUse a different quotation mark type than the surrounding marks (
"
versus'
).
The following standard SQL query does a word count for the number of times that
the substring raisin
appears in all of Shakespeare's works.
bq query --use_legacy_sql=false \ 'SELECT word, SUM(word_count) AS count FROM `bigquery-public-data`.samples.shakespeare WHERE word LIKE "%raisin%" GROUP BY word'
Output:
Waiting on job_dcda37c0bbed4c669b04dfd567859b90 ... (0s) Current status: DONE +---------------+-------+ | word | count | +---------------+-------+ | Praising | 4 | | raising | 5 | | raisins | 1 | | praising | 8 | | dispraising | 2 | | dispraisingly | 1 | +---------------+-------+
If you search for a word that isn't in Shakespeare's works, no results are
returned. For example, the following search for huzzah
returns no matches.
bq query --use_legacy_sql=false \ 'SELECT word FROM `bigquery-public-data`.samples.shakespeare WHERE word = "huzzah"'
Output:
Waiting on job_e19 ... (4s) Current status: DONE $
Create a new table
Now create your own table. Every table is stored inside a dataset. A dataset is a group of resources such as tables and views.
Step 1: Download the sample data
The sample data contains approximately 7 MB of data about popular baby names, and it is provided by the US Social Security Administration.
Extract the file.
The zip file contains a read me file that describes the dataset schema. Learn more about the dataset.
Open the file named
yob2010.txt
to see what it looks like. The file is a CSV file with the following three columns: name, sex (M
orF
), and number of children with that name. The file has no header row.Copy or move the
yob2010.txt
file into the directory where you are running thebq
command-line tool. If you are running thebq
command-line tool in Cloud Shell, upload theyob2010.txt
file; see Managing Files with Cloud Shell.
Step 2: Create a new dataset
-
Use the
bq ls
command to see whether your default project has any existing datasets.bq ls
Sample output:
datasetId ------------- olddataset
-
Run
bq ls
again to list the datasets in a specific project by including the project ID followed by a colon (:). The following example lists the datasets in thepublicdata
project.bq ls publicdata:
Output:
datasetId ----------- samples
-
Use the
bq mk
command to create a new dataset namedbabynames
in your default project. A dataset name can be up to 1,024 characters long, and consist of A-Z, a-z, 0-9, and the underscore, but it cannot start with a number or underscore, or have spaces.bq mk babynames
Sample output:
Dataset 'myprojectid:babynames' successfully created.
-
Run
bq ls
to confirm that the dataset now appears as part of the default project:bq ls
Sample output:
datasetId ------------- olddataset babynames
Step 3: Upload the table
The bq load
command creates or updates a table and loads data in a single step.
-
Run the
bq load
command to load your source file into a new table callednames2010
in thebabynames
dataset you created. By default, this command runs synchronously and takes a few seconds to complete.bq load babynames.names2010 yob2010.txt name:string,gender:string,count:integer
The
bq load
command arguments:- datasetID:
babynames
- tableID:
names2010
- source:
yob2010.txt
: if necessary, include the full path to the file - schema:
name:string,gender:string,count:integer
Sample output:
Waiting on job_4f0c0878f6184119abfdae05f5194e65 ... (35s) Current status: DONE
- datasetID:
-
Run
bq ls
to confirm that the table now appears in the dataset:bq ls babynames
Output:
tableId Type ----------- ------- names2010 TABLE
-
Run
bq show
to see the schema:bq show babynames.names2010
Output:
Table myprojectid:babynames.names2010 Last modified Schema Total Rows Total Bytes Expiration ----------------- ------------------- ------------ ------------- ------------ 13 Mar 15:31:00 |- name: string 34041 653855 |- gender: string |- count: integer
By default, when you load data, BigQuery expects UTF-8 encoded data.
If you have data in ISO-8859-1 (or Latin-1) encoding and you're having
problems with it, tell BigQuery to treat your data as Latin-1
using the -E
flag. For more information, see
Encoding.
Step 4: Run queries
-
Run the following command to return the most popular girls' names:
bq query "SELECT name,count FROM babynames.names2010 WHERE gender = 'F' ORDER BY count DESC LIMIT 5"
Output:
Waiting on job_58c0f5ca52764ef1902eba611b71c651 ... (0s) Current status: DONE +----------+-------+ | name | COUNT | +----------+-------+ | Isabella | 22731 | | Sophia | 20477 | | Emma | 17179 | | Olivia | 16860 | | Ava | 15300 | +----------+-------+
-
Run the following command to see the most unusual boys' names.
The minimum count is 5 because the source data omits names
with fewer than 5 occurrences.
bq query "SELECT name,count FROM babynames.names2010 WHERE gender = 'M' ORDER BY count ASC LIMIT 5"
Output:
Waiting on job_556ba2e5aad340a7b2818c3e3280b7a3 ... (1s) Current status: DONE +----------+-------+ | name | COUNT | +----------+-------+ | Aarian | 5 | | Aaidan | 5 | | Aamarion | 5 | | Aadhavan | 5 | | Aaqib | 5 | +----------+-------+
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this quickstart, follow these steps.
Run the
bq rm
command to remove thebabynames
dataset. Use the-r
flag to delete all tables in the dataset, include thenames2010
table.bq rm -r babynames
Confirm the delete command by typing
y
.
What's next
To learn more about the
bq
command-line tool, see Using thebq
command-line tool.To learn more about loading data into BigQuery, see Introduction to loading data.
To learn more about querying data, see Overview of querying BigQuery data.
To learn how to export data out of BigQuery, see Exporting table data.
To learn more about accessing BigQuery programmatically, see the REST API reference or the BigQuery API client libraries page.