Load data into BigQuery with the bq
tool
This quickstart shows you how to use the bq
command-line tool
to run queries and load data into BigQuery.
Before you begin
The BigQuery sandbox lets you explore BigQuery at no cost. For more information, see Enable the BigQuery sandbox.
Install and
initialize the Google Cloud CLI. Create a Google Cloud project.
gcloud projects create PROJECT_ID
Run queries
This section shows you how to write queries and get help.
Examine a table
Examine the schema of a specific table:
bq show PROJECT_ID:DATASET_ID.TABLE_ID
Replace the following:
PROJECT_ID
: your project IDDATASET_ID
: your dataset IDTABLE_ID
: your table ID
If the project and dataset IDs are the default values for your bq
command-line tool, then
you can omit the project and dataset IDs:
bq show TABLE_ID
BigQuery offers several
sample tables
that you can query. In this quickstart, you will run queries against the
shakespeare
table, which contains an entry for every word in every Shakespeare
play.
Examine the shakespeare
table in the samples
dataset:
bq show bigquery-public-data:samples.shakespeare
The output is similar to the following:
Table bigquery-public-data:samples.shakespeare Last modified Schema Total Rows Total Bytes Expiration ----------------- ------------------------------------ ------------ ------------- ------------ 26 Aug 14:43:49 |- word: string (required) 164656 6432064 |- word_count: integer (required) |- corpus: string (required) |- corpus_date: integer (required)
View bq
tool help
bq
tool help provides a list of commands and arguments available for
the bq
command-line tool.
View detailed information about the
bq
tool:bq help
View information about a specific command:
bq help query
In this example, the call to
bq help
retrieves information about thebq query
command.
Search for strings
The bq query
command lets you run SQL queries on data.
Run a query to see how many times the substring
raisin
appears in Shakespeare's works using thebq query
command:bq query --use_legacy_sql=false \ 'SELECT word, SUM(word_count) AS count FROM `bigquery-public-data`.samples.shakespeare WHERE word LIKE "%raisin%" GROUP BY word'
The output is similar to the following:
Waiting on job_dcda37c0bbed4c669b04dfd567859b90 ... (0s) Current status: DONE +---------------+-------+ | word | count | +---------------+-------+ | Praising | 4 | | raising | 5 | | raisins | 1 | | praising | 8 | | dispraising | 2 | | dispraisingly | 1 | +---------------+-------+
Run a query to see how many times the substring
huzzah
appears in Shakespeare's works using thebq query
command:bq query --use_legacy_sql=false \ 'SELECT word FROM `bigquery-public-data`.samples.shakespeare WHERE word = "huzzah"'
Since the substring doesn't appear in Shakespeare's works, no results are returned.The output is similar to the following:
Waiting on job_e19 ... (4s) Current status: DONE
Create a table and load data
In the next sections, you create a new table and place it in a new dataset. Every table is stored inside a dataset. A dataset is a group of resources such as tables and views.
Download the sample data
Follow these steps to download the sample data. The sample data is provided by the US Social Security Administration and contains approximately 7 MB of data about popular baby names.
Download and extract the baby names zip file.
The zip file contains a file named
NationalReadMe.pdf
that describes the dataset schema. Learn more about the dataset.Open the file
yob2010.txt
to see what it looks like. It's a comma-separated value (CSV) file that contains the following three columns: name, sex (M
orF
), and number of children with that name. The file has no header row.Copy or move the file
yob2010.txt
into the directory where you're running thebq
command-line tool. If you're running thebq
command-line tool in Cloud Shell, upload theyob2010.txt
file. For more information, see Managing files with Cloud Shell.
Create a dataset
Next, create a new dataset.
-
To see whether your default project has any existing datasets, use the
bq ls
command.bq ls
The output looks similar to the following:
datasetId ------------- olddataset
- List the datasets in a specific project by including
the project ID followed by a colon (
:
):bq ls publicdata:
This example lists the datasets in the
publicdata
project.The output is similar to the following:
datasetId ----------- samples
-
In the project you selected for this quickstart, create a new dataset named
babynames
:bq mk babynames
A dataset name can be up to 1,024 characters long and consist of A-Z, a-z, 0-9, and the underscore. The name cannot start with a number or underscore, and it cannot have spaces.
The output is similar to the following:
Dataset 'myprojectid:babynames' successfully created.
- Confirm that the dataset now appears as part
of the default project:
bq ls
The output is similar to the following:
datasetId ------------- olddataset babynames
Load the data
-
In the
babynames
dataset that you created, load your source fileyob2010.txt
into a new table callednames2010
:bq load babynames.names2010 yob2010.txt name:string,gender:string,count:integer
The
bq load
command creates a table and loads data in a single step.The command includes the following arguments:
- datasetID:
babynames
- tableID:
names2010
- source:
yob2010.txt
(if necessary, include the full path) - schema:
name:string,gender:string,count:integer
- For the names of supported data types, see Standard SQL data types.
The output is similar to the following:
Upload complete. Waiting on job_4f0c0878f6184119abfdae05f5194e65 ... (35s) Current status: DONE
- datasetID:
-
Confirm that the table now appears in the dataset:
bq ls babynames
The output is similar to the following:
tableId Type ----------- ------- names2010 TABLE
-
View the schema:
bq show babynames.names2010
The output is similar to the following
Table myprojectid:babynames.names2010 Last modified Schema Total Rows Total Bytes Expiration ----------------- ------------------- ------------ ------------- ------------ 13 Mar 15:31:00 |- name: string 34041 653855 |- gender: string |- count: integer
By default, when you load data, BigQuery expects UTF-8 encoded data.
If you have data in ISO-8859-1 (or Latin-1) encoding and you're having
problems with it, instruct BigQuery to treat your data as Latin-1
using the -E
flag. For more information, see
Encoding.
Query the sample data
-
Query the data for the most popular girls' names:
bq query 'SELECT name, count FROM babynames.names2010 WHERE gender = "F" ORDER BY count DESC LIMIT 5'
The output is similar to the following:
Waiting on job_58c0f5ca52764ef1902eba611b71c651 ... (0s) Current status: DONE +----------+-------+ | name | COUNT | +----------+-------+ | Isabella | 22731 | | Sophia | 20477 | | Emma | 17179 | | Olivia | 16860 | | Ava | 15300 | +----------+-------+
-
Query the data for the most unusual boys' names:
bq query 'SELECT name, count FROM babynames.names2010 WHERE gender = "M" ORDER BY count ASC LIMIT 5'
The minimum count is 5 because the source data omits names with fewer than 5 occurrences.
The output is similar to the following:
Waiting on job_556ba2e5aad340a7b2818c3e3280b7a3 ... (1s) Current status: DONE +----------+-------+ | name | COUNT | +----------+-------+ | Aarian | 5 | | Aaidan | 5 | | Aamarion | 5 | | Aadhavan | 5 | | Aaqib | 5 | +----------+-------+
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, delete the Cloud project with the resources.
Remove the
babynames
dataset:bq rm --recursive=true babynames
The
--recursive
flag deletes all tables in the dataset, including thenames2010
table.To confirm the delete command, type
y
.
If you followed this quickstart in a new project, then you can delete the project.
What's next
To learn more about the
bq
command-line tool, see Using thebq
command-line tool.To learn more about loading data into BigQuery, see Introduction to loading data.
To learn more about querying data, see Overview of querying BigQuery data.
To learn how to export data out of BigQuery, see Exporting table data.
To learn more about accessing BigQuery programmatically, see the REST API reference or the BigQuery API Client Libraries page.