Using the bq Command-Line Tool

bq is a Python-based tool that accesses BigQuery from the command line.


Before you begin, install the Google Cloud SDK, which includes the bq tool.

General usage

Flag positioning

bq supports two kinds of flags—common flags and command flags—and they must be used in the order shown here:

bq --common_flags <bq_command> --command-specific_flags <command_arguments>
  • Common flags are flags that apply to all commands, such as --project_id or --apilog. Call bq --helpshort for a list of often-used common flags.
  • Command-specific flags are any flags that apply to a specific command: for example, -p, -j, -d for the ls command (to list projects, jobs, and datasets respectively).

Placing common flags after the command usually fails.

Example of correct ordering: bq --format=json ls -d

Help and debugging

  • For a full list of commands, run bq help
  • For help on a specific command, run bq help <command>
  • To see the actual requests sent and received, add the flag --apilog=<file> to save a log of operations to <file>; this is useful for debugging your applications. Using - instead of a file path prints the log to the console. bq works by making standard REST-based API calls, so this log shows exactly what is sent to and returned by the server. It is also helpful to attach this log when reporting issues.
  • To help troubleshoot errors, use the --format=prettyjson flag when getting the job status. Using this flag outputs the error object, including the reason property, which you can use to look up troubleshooting steps at troubleshooting errors. For example, bq --format=prettyjson show -j <job_id>
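For instance, the request/response log can be captured to a file or streamed to the console (these commands require valid credentials and a configured project; <job_id> is a placeholder):

```shell
# Save the REST request/response log to a file while listing datasets:
bq --apilog=bq_api.log ls -d

# Stream the log to the console while checking a job's status:
bq --format=prettyjson --apilog=- show -j <job_id>
```

Both --apilog and --format are common flags, so they go before the command name.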

Setting default values for common flags

The first time you run bq, you will be prompted for a default project ID. This is stored in your .bigqueryrc file.

You can update this value, or add defaults for other common flags, by editing your .bigqueryrc file. Add new default values, one per line, as flagname=flagvalue. Only common flags can be given defaults in this file; a command-specific flag there causes bq to raise an error. The file is read every time you run bq in single-use mode, so changes take effect immediately. However, when you run bq in interactive mode (bq shell), you must restart the shell before changes are picked up.

cat ~/.bigqueryrc
project_id = 1092187650

Running asynchronous operations

The bq tool runs asynchronous commands by updating a status counter on the command line. If a command will take a very long time, you can instead ask for the job ID when starting the job, return control immediately, and later ask bq to attach to that running job. This frees up your command line for other work. Note that this is useful only for potentially long-running jobs, such as queries or loads.

To start an asynchronous job and ask for the ID, call your bq command with the --nosync flag, which will return the job ID. To get the job results (or print out job status), pass this job ID into the command bq wait <job_id>.

The following example starts an asynchronous load job, lists the datasets in the current project, and later makes a call to get the job results.

$ bq --nosync load mydataset.names2010 gs://bigquerybucket/yob2010.txt name:string,gender:string,count:integer
Successfully started load job job_f0afc92f7a9043849e5cae7b2e7ea2f9
$ bq ls
$ bq wait job_f0afc92f7a9043849e5cae7b2e7ea2f9

Running bq in interactive mode

You can run bq in an interactive mode, where you don't need to prefix the commands with "bq". To start interactive mode, call bq shell. The prompt is the ID of the default project. To exit interactive mode, type "exit".

bq shell
Welcome to BigQuery! (Type help for more information.)
10000000021> ls
10000000021> exit

Useful common flags

Common flags are placed between bq and the command. For a full list of flags, call bq --help.

Here are some of the most useful flags:

  • --apilog - Turn on logging of all server requests and responses. If no string is provided (--apilog=), log to stdout; if a string is provided, instead log to that file (--apilog=filename).
  • --format [none|json|prettyjson|csv|sparse|pretty] - The output format.

Listing projects, datasets, and tables

To list objects, use the bq ls command with the following syntax:

bq ls [<project_id>:][<dataset_id>]

<project_id> and <dataset_id> are optional if you have those values defined in your .bigqueryrc file; however, values that you pass to bq ls override any values defined in .bigqueryrc. A project ID is not the friendly name; it is usually just a string of digits.


Working with projects

List projects

To list all projects:

bq ls -p

Set a default project

When you first run bq it will ask you for a default project ID and save that in the .bigqueryrc file. You can edit this file to change the value of your default project.

See your default project

Run cat ~/.bigqueryrc to see your default values. Alternatively, when you run in interactive mode, the prompt lists the default project ID.

Working with datasets

Create a dataset

Use the bq mk command to create a dataset:

bq mk [DATASET_ID]

Where [DATASET_ID] is a valid dataset ID. A valid dataset ID:

  • Contains only letters, numbers, or underscores [a-zA-Z0-9_]
  • Is unique within its project

Dataset IDs are case-sensitive. For example, my_dataset and My_Dataset are different IDs that can coexist in the same project.

Set a default dataset

You can specify a default dataset, so that you don't have to qualify your tables or other actions with a dataset ID when using your default dataset. To do so, add the following line to your .bigqueryrc file:

dataset_id=[DATASET_ID]
A quick way to do this (replace [DATASET_ID] with your dataset ID):

echo dataset_id=[DATASET_ID] >> ~/.bigqueryrc

List datasets

The syntax differs, depending on whether you have a default project or dataset defined:

# List datasets in the default project:
bq ls

# List datasets in another project:
bq ls [PROJECT_ID]:

# List datasets when you have a default dataset defined:
bq ls -d
bq ls :

Working with tables

Get table information

bq show <project_id>:<dataset_id>.<table_id>

For example:
bq show publicdata:samples.shakespeare
    tableId      Last modified                  Schema
 ------------- ----------------- ------------------------------------
  shakespeare   01 Sep 13:46:28   |- word: string (required)
                                  |- word_count: integer (required)
                                  |- corpus: string (required)
                                  |- corpus_date: integer (required)

Previewing table data

bq head [-n <rows>] <project_id>:<dataset_id>.<table_id>

For example:
bq head -n 10 publicdata:samples.shakespeare
|     word     | word_count |    corpus    | corpus_date |
| brave        |          6 | 1kinghenryiv |        1597 |
| profession   |          1 | 1kinghenryiv |        1597 |
| treason      |          2 | 1kinghenryiv |        1597 |
| Ned          |          9 | 1kinghenryiv |        1597 |
| answered     |          1 | 1kinghenryiv |        1597 |
| Perceived    |          1 | 1kinghenryiv |        1597 |
| 'You         |          1 | 1kinghenryiv |        1597 |
| degenerate   |          1 | 1kinghenryiv |        1597 |
| neighbouring |          1 | 1kinghenryiv |        1597 |
| grandam      |          1 | 1kinghenryiv |        1597 |

Note that this operation is intended only for previewing the contents of a table; it is not an efficient way to extract a large portion of a table. If -n is not specified, the command returns 100 rows by default.

List tables

The syntax differs, depending on whether you have a default project or dataset defined:

# In the default project:
bq ls <dataset_id>

# In the default project and dataset:
bq ls

# In another project or dataset:
bq ls [<project_id>:][<dataset_id>]

Create a table from a file

You can load the following file formats to create a table or append data to an existing table:

  • Uncompressed CSV, JSON, or Avro files from Google Cloud Storage
  • Compressed (gzipped) CSV or JSON files from Google Cloud Storage
  • Cloud Datastore Backup files from Google Cloud Storage
  • Uncompressed CSV, JSON, or Avro files from your disk

Note: Schema files cannot be loaded from Google Cloud Storage. If you are creating a table using both a data file and a schema file, the schema file must be local.

You can create a new table by creating a new schema and then loading the files in separate calls, or you can combine both actions into a single call with the following syntax:

bq load <destination_table> <data_source_uri> <table_schema>

<destination_table>
The fully-qualified name of the table to create, or to append to if the table already exists.

<data_source_uri>
The source CSV data file used to populate the table. This can be an uncompressed local file or a fully-qualified Google Cloud Storage URI referring to an uncompressed or gzipped file, in the format gs://bucket/file. For example, the following are all valid: my_file.csv, gs://[BUCKET_NAME]/my_file.csv.gz, or my_file.csv.gz.

You can load multiple files into a table by specifying a comma-separated list of files or by using a single wildcard character (*). The wildcard character cannot be used as part of the bucket name.

<table_schema>
A description of the table schema to use. This can be either a local file name or a comma-separated list of column_name:datatype pairs. This example uses the comma-separated form. Try the following schema descriptor for your table: name:string,gender:string,count:integer, where "name", "gender", and "count" are the labels assigned to the columns in the new table.

If you would like, you can also specify your schema in a separate file and provide that file as the table schema. Your schema file must contain a single array object with entries that provide the following properties:

  • "name": Name of the column
  • "type": Type of data, e.g. string. For a full list of allowed data types, see BigQuery data types.
  • "mode" (optional): Whether this field can be null

A sample schema file might look like the following:

  {"name": "name", "type": "string", "mode": "required"},
  {"name": "gender", "type": "string", "mode": "nullable"},
  {"name": "count", "type": "integer", "mode": "required"}

See the bq command-line tool quickstart for a detailed walkthrough of creating and populating a table using bq.

When using the bq load command, you can specify the following optional flags:

--source_format
Type: string
Description: The source file type. This can be CSV, newline-delimited JSON, Avro, or Cloud Datastore Backup. The default value is CSV.
Valid Values:
  • CSV
  • NEWLINE_DELIMITED_JSON
  • AVRO
  • DATASTORE_BACKUP
bq load [--source_format=NEWLINE_DELIMITED_JSON|CSV|AVRO|DATASTORE_BACKUP] <destination_table> <data_source_uri> [<table_schema>]
--field_delimiter, -F
Type: string
Description: The character that indicates the boundary between columns in the input file. By default, this is a comma.
Valid Values: BigQuery converts the delimiter string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. BigQuery also supports the escape sequence "\t" to specify a tab separator. The default value is a comma (',').
bq load -F '|'  <destination_table> <data_source_uri> [<table_schema>]
bq load --field_delimiter='|' <destination_table> <data_source_uri> [<table_schema>]
--encoding, -E
Type: string
Description: The character encoding used by the input file.
Valid Values:
  • UTF-8
  • ISO-8859-1
bq load -E ISO-8859-1 <destination_table> <data_source_uri> [<table_schema>]
bq load --encoding=UTF-8 <destination_table> <data_source_uri> [<table_schema>]
--max_bad_records
Type: integer
Description: The maximum number of bad rows to skip before the load job is aborted and no updates are performed. If this value is larger than 0, the job succeeds as long as the number of bad records does not exceed it. This is useful if you would like to load files that may contain bad records. The default value is 0 (all rows are required to be valid).
Valid Values: Any integer
bq load --max_bad_records=3 <destination_table> <data_source_uri> [<table_schema>]
--skip_leading_rows
Type: integer
Description: Skips a given number of rows at the top of the source file. This is useful for skipping header rows in your source CSV file. The default value is 0 (all rows are treated as data rows).
Valid Values: Any integer
bq load --skip_leading_rows=1 <destination_table> <data_source_uri> [<table_schema>]
--autodetect
Type: boolean
Description: Enables autodetection of schema and options for formats, such as CSV and JSON, that are not self-describing. The default value is --noautodetect.
Valid Values: --autodetect, --noautodetect
bq load --autodetect <destination_table> <data_source_uri>
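These flags can be combined in a single call. For example, a sketch loading a pipe-delimited, ISO-8859-1 encoded CSV that has a header row (the dataset, table, and bucket names are placeholders, and running it requires credentials):

```shell
bq load --source_format=CSV \
    --field_delimiter='|' \
    --encoding=ISO-8859-1 \
    --skip_leading_rows=1 \
    --max_bad_records=3 \
    mydataset.mytable gs://mybucket/data.csv name:string,count:integer
```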

Character encodings

By default, the BigQuery service expects all source data to be UTF-8 encoded. If you have CSV files with data encoded in ISO-8859-1, explicitly specify the encoding when you import your data so that BigQuery can properly convert it to UTF-8 during the import process. Currently, only ISO-8859-1 and UTF-8 encoded data can be imported. Keep the following in mind when you specify the character encoding of your data:

  • If you don't specify an encoding, or explicitly specify that your data is UTF-8 but then provide a CSV file that is not UTF-8 encoded, BigQuery attempts to convert your CSV file to UTF-8.

    Generally, your data will be imported successfully but may not match byte-for-byte what you expect. To avoid this, specify the correct encoding and try your import again.

  • Delimiters must be encoded as ISO-8859-1.

    Generally, it is best practice to use a standard delimiter, such as a tab, pipe, or comma.

  • If BigQuery cannot convert a character, it is converted to the standard Unicode replacement character: �.
  • JSON files must always be encoded in UTF-8.
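As the first bullet notes, an undeclared non-UTF-8 file may not round-trip byte-for-byte. If you prefer to convert locally rather than declare the encoding with -E, a sketch using iconv (file and table names are placeholders):

```shell
# Create a small CSV encoded in ISO-8859-1 ("\351" is Latin-1 e-acute).
printf 'name,count\nRen\351,3\n' > latin1.csv

# Convert to UTF-8 locally instead of relying on service-side conversion:
iconv -f ISO-8859-1 -t UTF-8 latin1.csv > utf8.csv
cat utf8.csv

# Then load the converted file (requires credentials):
# bq load mydataset.names ./utf8.csv name:string,count:integer
```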

Copy an existing table

To copy a table, run the bq cp command:

bq cp <source_table> <destination_table>

For example, a sample copy command might look like the following:

bq cp dataset.mytable dataset2.mynewtable

You can also copy tables across projects by specifying the project ID in the source and destination paths:

bq cp 123456789123:dataset.mytable 0987654321098:dataset2.mynewtable

Note that when you perform a table copy, the destination table must be new. For example, if a table called mynewtable already exists in dataset2, the above command fails and the command-line tool throws an exception. To append to or overwrite an existing table, start a copy job programmatically through the API.

Create a table from a query

Run a query, and specify the --destination_table flag

bq query --destination_table=mydataset.happyhalloween "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"

Create an empty table

Create an empty table by using the bq mk command with the -t flag.

bq mk --schema name:string,value:integer -t mydataset.newtable

Where mydataset is the name of an existing dataset, and newtable is the name of the new table. This example provided a comma-separated list of fields for the --schema value. Alternatively, you can store your schema in a file and supply that file's name as the --schema value.
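A sketch of the schema-file variant (the file name is a placeholder; the bq call itself requires credentials, so it is shown commented out):

```shell
# Store the same two-column schema in a file.
cat > newtable_schema.json <<'EOF'
[
  {"name": "name", "type": "string"},
  {"name": "value", "type": "integer"}
]
EOF

# Supply the file name as the --schema value:
# bq mk --schema ./newtable_schema.json -t mydataset.newtable

# Local sanity check that the file is well-formed JSON:
python3 -m json.tool newtable_schema.json > /dev/null && echo "schema OK"
```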

Working with queries

Run a query

Run the following command to run a query:

bq query <query_string>
  • The query string must be contained within double quote characters.
  • An ending ; (semicolon) mark is not required.
  • You do not need to specify the project ID or dataset ID if you have defined default values.
  • You can use the common --format flag to specify other output formats, such as JSON or CSV.
  • If a query will take a long time to run, you can run it asynchronously in the command line.
  • To save the results to a table, use the --destination_table flag.

The following examples use a public dataset of baby names that is described in the Create a new table section of the command line tool quickstart page.

bq ls -d
bq ls :mydataset
bq query "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"
Waiting on job_a4e77f793e7b4d5bbc1fd69244d9e792 ... (0s) Current status: DONE
|   name   | COUNT |
| Zachary  | 22731 |
| Alfred   | 20477 |
| Gregory  | 17179 |
| Ned      | 16860 |
| Ulrich   | 15300 |
| Thomas   | 14995 |

You can also run batch queries by using the --batch flag.

The following example shows how to start an asynchronous batch query:

bq --nosync query --batch "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"

Enabling standard SQL

To enable standard SQL for a query, set the --use_legacy_sql flag to false. For example, the following query runs using standard SQL:

bq query --use_legacy_sql=false "SELECT word FROM publicdata.samples.shakespeare"

Working with query results

Allow large results

If your query results exceed the maximum response size, you can allow large results and write the result set to a destination table. Use the --allow_large_results flag with the --destination_table flag to create a destination table to hold the large results set:

bq query --destination_table '[DATASET].[TABLE_NAME]' --allow_large_results "[QUERY]"

For more information on allowing large results, see: Returning large query results.
