Querying Data

This document describes several ways to query data using BigQuery's SQL dialect, including running synchronous queries, asynchronous queries, interactive queries, and batch queries. It also discusses how to opt out of query caching, return large query results, create a view, and query meta-tables.

Running synchronous queries

A synchronous query waits for the query to complete before returning a response. The response includes the query results in the form of a table, which can be a temporary or permanent table.

If you prefer to submit a query request that returns before the query completes, see Running asynchronous queries.

By default, synchronous queries write results to temporary tables. To run a synchronous query that writes to a temporary table:

Web UI

  1. Go to the BigQuery web UI.
  2. Click the Compose query button.
  3. Enter a valid BigQuery SQL query in the New Query text area.
  4. Click the Run query button.

This generates a synchronous query that writes the output to a temporary table.

Command-line

The bq command-line tool runs your query synchronously by default. For information on how to run a query, see Run a query.
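
For example, the following command runs a query against a mydataset.babynames table (the same table used in later examples in this document) and waits for the results:

bq query "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"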

C#

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

public BigqueryQueryJob SyncQuery(string projectId, string datasetId, string tableId,
    string query, double timeoutMs, BigqueryClient client)
{
    var table = client.GetTable(projectId, datasetId, tableId);
    BigqueryJob job = client.CreateQueryJob(query,
        new CreateQueryJobOptions { UseQueryCache = false });
    // Get the query result, waiting for the timespan specified in milliseconds.
    BigqueryQueryJob result = client.GetQueryResults(job.Reference.JobId,
        new GetQueryResultsOptions { Timeout = TimeSpan.FromMilliseconds(timeoutMs) });
    return result;
}

Go

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

import (
	"context"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

// Query runs the query q synchronously in project proj and returns all rows.
func Query(proj, q string) ([][]bigquery.Value, error) {
	ctx := context.Background()

	client, err := bigquery.NewClient(ctx, proj)
	if err != nil {
		return nil, err
	}

	query := client.Query(q)
	iter, err := query.Read(ctx)
	if err != nil {
		return nil, err
	}

	var rows [][]bigquery.Value

	for {
		var row []bigquery.Value
		err := iter.Next(&row)
		if err == iterator.Done {
			return rows, nil
		}
		if err != nil {
			return nil, err
		}
		rows = append(rows, row)
	}
}

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValue;
import com.google.cloud.bigquery.QueryRequest;
import com.google.cloud.bigquery.QueryResponse;
import com.google.cloud.bigquery.QueryResult;

import java.io.IOException;
import java.io.PrintStream;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

public static void run(
    final PrintStream out,
    final String queryString,
    final long waitTime,
    final boolean useLegacySql) throws IOException {
  BigQuery bigquery =
      new BigQueryOptions.DefaultBigqueryFactory().create(BigQueryOptions.getDefaultInstance());

  QueryRequest queryRequest =
      QueryRequest.newBuilder(queryString)
          .setMaxWaitTime(waitTime)
          // Use standard SQL syntax or legacy SQL syntax for queries.
          // See: https://cloud.google.com/bigquery/sql-reference/
          .setUseLegacySql(useLegacySql)
          .build();
  QueryResponse response = bigquery.query(queryRequest);

  if (response.hasErrors()) {
    throw new RuntimeException(
        response
            .getExecutionErrors()
            .stream()
            .<String>map(err -> err.getMessage())
            .collect(Collectors.joining("\n")));
  }

  QueryResult result = response.getResult();
  Iterator<List<FieldValue>> iter = result.iterateAll();
  while (iter.hasNext()) {
    List<FieldValue> row = iter.next();
    out.println(row.stream().map(val -> val.toString()).collect(Collectors.joining(",")));
  }
}

Node.js

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

// Imports the Google Cloud client library
const BigQuery = require('@google-cloud/bigquery');

function syncQuery (sqlQuery) {
  // Instantiates a client
  const bigquery = BigQuery();

  // Query options list: https://cloud.google.com/bigquery/docs/reference/v2/jobs/query
  const options = {
    query: sqlQuery,
    timeoutMs: 10000, // Time out after 10 seconds.
    useLegacySql: false // Use standard SQL syntax for queries.
  };

  // Runs the query
  return bigquery.query(options)
    .then((results) => {
      const rows = results[0];
      console.log('Rows:');
      rows.forEach((row) => console.log(row));
      return rows;
    });
}

PHP

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

use Google\Cloud\BigQuery\BigQueryClient;

/**
 * Run a BigQuery query.
 * Example:
 * ```
 * $query = 'SELECT TOP(corpus, 10) as title, COUNT(*) as unique_words ' .
 *          'FROM [publicdata:samples.shakespeare]';
 * run_query($projectId, $query, true);
 * ```
 *
 * @param string $projectId The Google project ID.
 * @param string $query     A SQL query to run.
 * @param bool $useLegacySql Specifies whether to use BigQuery's legacy SQL
 *        syntax or standard SQL syntax for this query.
 */
function run_query($projectId, $query, $useLegacySql)
{
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $queryResults = $bigQuery->runQuery(
        $query,
        ['useLegacySql' => $useLegacySql]);

    if ($queryResults->isComplete()) {
        $i = 0;
        $rows = $queryResults->rows();
        foreach ($rows as $row) {
            printf('--- Row %s ---' . PHP_EOL, ++$i);
            foreach ($row as $column => $value) {
                printf('%s: %s' . PHP_EOL, $column, $value);
            }
        }
        printf('Found %s row(s)' . PHP_EOL, $i);
    } else {
        throw new Exception('The query failed to complete');
    }
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

from google.cloud import bigquery


def sync_query(query):
    client = bigquery.Client()
    query_results = client.run_sync_query(query)

    # Use standard SQL syntax for queries.
    # See: https://cloud.google.com/bigquery/sql-reference/
    query_results.use_legacy_sql = False

    query_results.run()

    # Drain the query results by requesting a page at a time.
    page_token = None

    while True:
        rows, total_rows, page_token = query_results.fetch_data(
            max_results=10,
            page_token=page_token)

        for row in rows:
            print(row)

        if not page_token:
            break

Ruby

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# project_id   = "your google cloud project id"
# query_string = "query string to execute (using bigquery query syntax)"

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: project_id

query_results = bigquery.query query_string

query_results.each do |row|
  puts row.inspect
end

Storing synchronous results in a permanent table

To save your query results in a permanent table:

Web UI

  1. Go to the BigQuery web UI.

  2. Click the Compose query button.

  3. Enter a valid BigQuery SQL query in the New Query text area.

  4. Click the Show Options button.

  5. Click the Select Table button in the Destination Table section.

  6. Enter a table ID and click OK.

  7. Click the Run query button.

This generates a synchronous query that writes the output to a permanent table.

Alternatively, if you forget to specify a destination table before running your query, you can copy the temporary table to a permanent table by clicking the Save as Table button in the results window.

Command-line


Use the --destination_table flag to create a permanent table based on the query results. For example, the following query creates a permanent table named happyhalloween in the mydataset dataset:

bq query --destination_table=mydataset.happyhalloween "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"

API

To save the query results to a permanent table, run the query asynchronously and specify a permanent table.

Storing synchronous results in a view

For information on how to create views, see Creating views.

Running asynchronous queries

An asynchronous query returns a response immediately, generally before the query completes. You then periodically check whether the query completed using a separate API call.

If you prefer that the query wait until the query completes before returning a response, use a synchronous query.

Asynchronous queries are always saved to a table: either a new table, an existing table, or a temporary table. You can choose whether to append or overwrite data in an existing table, and whether to create a new table if none exists by that name.

By default, asynchronous queries write results to temporary tables. To run an asynchronous query that writes to a temporary table, use the command-line tool or the BigQuery API:

Command-line

Use the --nosync flag to run a query asynchronously. For more information on how to run an asynchronous query using the bq tool, see Running asynchronous operations.
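
For example, the following command submits the query used in later examples and returns without waiting for it to complete:

bq --nosync query "SELECT name,count FROM mydataset.names_2013 WHERE gender = 'M' ORDER BY count DESC LIMIT 6"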

C#

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

public BigqueryQueryJob AsyncQuery(string projectId, string datasetId, string tableId,
    string query, BigqueryClient client)
{
    var table = client.GetTable(projectId, datasetId, tableId);
    BigqueryJob job = client.CreateQueryJob(query,
        new CreateQueryJobOptions { UseQueryCache = false });

    // Wait for the job to complete.
    job.PollUntilCompleted();

    // Then we can fetch the results, either via the job or by accessing
    // the destination table.
    return client.GetQueryResults(job.Reference.JobId);
}

Go

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

// This snippet assumes an initialized *bigquery.Client (client), a
// context.Context (ctx), and dataset/table IDs (datasetID, tableID).
q := client.Query(fmt.Sprintf(`
	SELECT name, count
	FROM [%s.%s]
`, datasetID, tableID))
job, err := q.Run(ctx)
if err != nil {
	return err
}

// Wait until async querying is done.
status, err := job.Wait(ctx)
if err != nil {
	return err
}
if err := status.Err(); err != nil {
	return err
}

it, err := job.Read(ctx)
if err != nil {
	return err
}
for {
	var row []bigquery.Value
	err := it.Next(&row)
	if err == iterator.Done {
		break
	}
	if err != nil {
		return err
	}
	fmt.Println(row)
}

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

// This snippet assumes an initialized BigQuery client (bigquery) and a
// query string (query).
QueryRequest request = QueryRequest.of(query);
QueryResponse response = bigquery.query(request);
// Wait for things to finish
while (!response.jobCompleted()) {
  Thread.sleep(1000);
  response = bigquery.getQueryResults(response.getJobId());
}
if (response.hasErrors()) {
  // handle errors
}
QueryResult result = response.getResult();
Iterator<List<FieldValue>> rowIterator = result.iterateAll();
while (rowIterator.hasNext()) {
  List<FieldValue> row = rowIterator.next();
  // do something with the data
}

Node.js

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

// Imports the Google Cloud client library
const BigQuery = require('@google-cloud/bigquery');

function asyncQuery (sqlQuery) {
  // Instantiates a client
  const bigquery = BigQuery();

  // Query options list: https://cloud.google.com/bigquery/docs/reference/v2/jobs/query
  const options = {
    query: sqlQuery,
    useLegacySql: false // Use standard SQL syntax for queries.
  };

  let job;

  // Runs the query as a job
  return bigquery.startQuery(options)
    .then((results) => {
      job = results[0];
      console.log(`Job ${job.id} started.`);
      return job.promise();
    })
    .then(() => {
      console.log(`Job ${job.id} completed.`);
      return job.getQueryResults();
    })
    .then((results) => {
      const rows = results[0];
      console.log('Rows:');
      rows.forEach((row) => console.log(row));
      return rows;
    });
}

PHP

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\ExponentialBackoff;

/**
 * Run a BigQuery query as a job.
 * Example:
 * ```
 * $query = 'SELECT TOP(corpus, 10) as title, COUNT(*) as unique_words ' .
 *          'FROM [publicdata:samples.shakespeare]';
 * run_query_as_job($projectId, $query, true);
 * ```
 *
 * @param string $projectId The Google project ID.
 * @param string $query     A SQL query to run.
 * @param bool $useLegacySql Specifies whether to use BigQuery's legacy SQL
 *        syntax or standard SQL syntax for this query.
 */
function run_query_as_job($projectId, $query, $useLegacySql)
{
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $job = $bigQuery->runQueryAsJob(
        $query,
        ['jobConfig' => ['useLegacySql' => $useLegacySql]]);
    $backoff = new ExponentialBackoff(10);
    $backoff->execute(function () use ($job) {
        print('Waiting for job to complete' . PHP_EOL);
        $job->reload();
        if (!$job->isComplete()) {
            throw new Exception('Job has not yet completed', 500);
        }
    });
    $queryResults = $job->queryResults();

    if ($queryResults->isComplete()) {
        $i = 0;
        $rows = $queryResults->rows();
        foreach ($rows as $row) {
            printf('--- Row %s ---' . PHP_EOL, ++$i);
            foreach ($row as $column => $value) {
                printf('%s: %s' . PHP_EOL, $column, $value);
            }
        }
        printf('Found %s row(s)' . PHP_EOL, $i);
    } else {
        throw new Exception('The query failed to complete');
    }
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

import time
import uuid

from google.cloud import bigquery


def async_query(query):
    client = bigquery.Client()
    query_job = client.run_async_query(str(uuid.uuid4()), query)
    query_job.use_legacy_sql = False
    query_job.begin()

    wait_for_job(query_job)

    # Drain the query results by requesting a page at a time.
    query_results = query_job.results()
    page_token = None

    while True:
        rows, total_rows, page_token = query_results.fetch_data(
            max_results=10,
            page_token=page_token)

        for row in rows:
            print(row)

        if not page_token:
            break


def wait_for_job(job):
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)

Ruby

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# project_id   = "your google cloud project id"
# query_string = "query string to execute (using bigquery query syntax)"

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: project_id

puts "Running query"
query_job = bigquery.query_job query_string

puts "Waiting for query to complete"
query_job.wait_until_done!

puts "Query results:"
query_job.query_results.each do |row|
  puts row.inspect
end

Storing asynchronous results in a permanent table

To store the query results from an asynchronous query in a permanent table, include the name of a destination table.

Command-line


Use the --destination_table flag to create a permanent table based on the query results. For example, the following query creates a permanent table named happyhalloween in the mydataset dataset:

bq query --nosync --destination_table=mydataset.happyhalloween "SELECT name,count FROM mydataset.names_2013 WHERE gender = 'M' ORDER BY count DESC LIMIT 6"

API

To save the query results to a permanent table, include a value for the jobs#configuration.query.destinationTable property.
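
As a sketch, the relevant portion of a jobs.insert request body, with hypothetical project, dataset, and table names:

# Sketch of a jobs.insert request body that writes query results to a
# permanent table. The project, dataset, and table names are hypothetical.
job_config = {
    "configuration": {
        "query": {
            "query": "SELECT name, count FROM mydataset.babynames",
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "mydataset",
                "tableId": "happyhalloween"
            }
        }
    }
}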

Storing asynchronous results in a view

For information on how to create views, see Creating views.

Running interactive or batch queries

By default, BigQuery runs interactive queries, which means that the query is executed as soon as possible. Interactive queries count towards your concurrent rate limit and your daily limit. All of the examples in Running synchronous queries are interactive queries.

BigQuery also offers batch queries. BigQuery queues each batch query on your behalf, and starts the query as soon as idle resources are available, usually within a few minutes. If BigQuery hasn't started the query within 24 hours, BigQuery changes the job priority to interactive. Batch queries don't count towards your concurrent rate limit, which can make it easier to start many queries at once.

To run a batch query:

Web UI

  1. Go to the BigQuery web UI.

  2. Click the Compose query button.

  3. Enter a valid BigQuery SQL query in the New Query text area.

  4. Click the Show Options button.

  5. Select the Batch option in the Query Priority section.

  6. Click the Run query button.

Command-line


Use the --batch flag to run a batch query. For example, the following query shows how to start an asynchronous batch query:

bq --nosync query --batch "SELECT name,count FROM mydataset.names_2013 WHERE gender = 'M' ORDER BY count DESC LIMIT 6"

API

Include the configuration.query.priority property, with the value set to BATCH. You can adapt any of the language-specific examples in Running asynchronous queries by setting this property in the job configuration.
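
As a sketch, a jobs.insert request body for a batch query might look like the following (query text hypothetical):

# Sketch of a jobs.insert request body for a batch-priority query.
job_config = {
    "configuration": {
        "query": {
            "query": "SELECT name, count FROM mydataset.babynames",
            "priority": "BATCH"  # default is "INTERACTIVE"
        }
    }
}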

Opting out of query caching

BigQuery caches query results on a best-effort basis for increased performance. Results are cached for approximately 24 hours and cache lifetimes are extended when a query returns a cached result.

You aren't charged for cached queries, but cached queries are subject to the same quota policies as non-cached queries. BigQuery caches all queries that don't specify a destination table, including both interactive and batch queries.

To prevent query caching:

Web UI

  1. Go to the BigQuery web UI.

  2. Click the Compose query button.

  3. Enter a valid BigQuery SQL query in the New Query text area.

  4. Click Show Options.

  5. Uncheck Use Cached Results.

Command-line


Use the --nouse_cache flag to prevent query caching. For example, the following query does not use the query cache:

bq query --nouse_cache --batch "SELECT name,count FROM mydataset.names_2013 WHERE gender = 'M' ORDER BY count DESC LIMIT 6"

API

To disable query caching, set the useQueryCache property to false.
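
For example, a sketch of a jobs.query request body that opts out of the cache (query text hypothetical):

# Sketch of a jobs.query request body that opts out of the query cache.
request_body = {
    "query": "SELECT name, count FROM mydataset.babynames",
    "useQueryCache": False  # do not return cached results
}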

Ensuring cached query results

If you use the jobs.insert() function to run a query, you can ensure that the job returns a query result from the cache, if it exists, by setting the createDisposition property of the job configuration to CREATE_NEVER.

If the query result doesn't exist in the cache, a NOT_FOUND error is returned.
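
As a sketch, the relevant jobs.insert configuration (query text hypothetical):

# Sketch of a jobs.insert configuration that returns results only from the
# cache. If no cached result exists, the job fails with a NOT_FOUND error.
job_config = {
    "configuration": {
        "query": {
            "query": "SELECT name, count FROM mydataset.babynames",
            "createDisposition": "CREATE_NEVER"
        }
    }
}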

Checking to see if BigQuery returned a cached result

There are two ways to determine if BigQuery returned a cached result:

  • If using the BigQuery web UI, the result string does not contain information about the number of processed bytes, and displays the word "cached".

  • If using the BigQuery API, the cacheHit property in the query result is set to true, as shown in the sketch below.
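
The following sketch uses the google-api-python-client library to call jobs.query and inspect the cacheHit property of the response. The project ID and query are hypothetical, and application default credentials are assumed.

from googleapiclient.discovery import build

# Build a BigQuery API client (uses application default credentials).
bigquery_service = build('bigquery', 'v2')

response = bigquery_service.jobs().query(
    projectId='my-project',  # hypothetical project ID
    body={'query': 'SELECT name, count FROM mydataset.babynames'}
).execute()

# cacheHit is true when BigQuery served the result from the cache.
print(response.get('cacheHit'))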

Returning large query results

Normally, queries have a maximum response size. If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration.

Queries that return large results take longer to execute, even if the result set is small, and are subject to additional limitations:

  • You must specify a destination table.
  • You can't specify a top-level ORDER BY, TOP, or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
  • Window functions can return large query results only if used in conjunction with a PARTITION BY clause.
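
As a sketch, a jobs.insert configuration that satisfies these requirements (all names hypothetical):

# Sketch of a jobs.insert configuration for a query with large results.
job_config = {
    "configuration": {
        "query": {
            "query": "SELECT * FROM mydataset.bigtable",
            "allowLargeResults": True,
            # A destination table is required when allowLargeResults is set.
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "mydataset",
                "tableId": "big_results"
            }
        }
    }
}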

Querying table subsets by using table decorators

Normally, BigQuery performs a full column scan when running a query. You can use table decorators to perform a more cost-effective query of a subset of your data. For more information, see table decorators.
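
For example, a snapshot decorator in legacy SQL queries a table as of a point in time. The following hypothetical query scans mydataset.mytable as it existed one hour (3,600,000 milliseconds) ago:

    SELECT COUNT(*) FROM [mydataset.mytable@-3600000];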

Creating views

A view is a virtual table defined by a SQL query. You can query views in the browser tool, or by using a query job.

BigQuery's views are logical views, not materialized views, which means that the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query.

BigQuery supports up to eight levels of nested views; if there are more than eight levels, an INVALID_INPUT error is returned. You are also limited to 1,000 authorized views per dataset. In addition, views can only reference other tables and views in the same dataset location.

Querying a view requires the READER role for all datasets that contain the tables in the view chain. For more information about roles, see access control.

Web UI


After running a query, click the Save View button in the query results window to save the results as a view.

Command-line


Use the mk command with the --view flag.

bq mk --view [SQL_QUERY] [DATASET].[VIEW]

Where:

  • SQL_QUERY is a valid BigQuery SQL query
  • DATASET is a dataset in your project
  • VIEW is the name of the view you want to create

If your query references external user-defined function resources stored in Google Cloud Storage or local files, use the --view_udf_resource flag to specify those resources. For example:

bq mk --view="SELECT foo FROM myUdf(table1)" --view_udf_resource="gs://my-bucket/some_library.js" --view_udf_resource="path/to/local/file.js"

API


Call tables.insert() with a table resource that contains a view property.

You can edit a view by calling tables.patch() in the BigQuery API.
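
As a sketch, the table resource for such a call, with hypothetical project, dataset, and view names:

# Sketch of the table resource passed to tables.insert to create a view.
view_resource = {
    "tableReference": {
        "projectId": "my-project",
        "datasetId": "mydataset",
        "tableId": "myview"
    },
    "view": {
        # The query that defines the view.
        "query": "SELECT name, count FROM mydataset.babynames"
    }
}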

Using meta-tables

BigQuery offers some special tables whose contents represent metadata, such as the names of your tables. The "meta-tables" are read-only. Normally, you use one by referencing it in a SELECT statement.

Meta-tables can be used in other API operations besides a query job, such as tables.get or tabledata.list. They do not support tables.insert and cannot be used as a destination table; they also do not support table decorators. Meta-tables do not appear in a tables.list of the dataset.

Metadata about tables in a dataset

You can access metadata about the tables in a dataset by using the __TABLES__ or __TABLES_SUMMARY__ meta-table.

Use the following syntax to query a meta-table:

    SELECT [FIELD] FROM [DATASET].__TABLES__;

Where DATASET is the name of your dataset, and FIELD is one of the following:

Field               Description
project_id          Name of the project.
dataset_id          Name of the dataset.
table_id            Name of the table.
creation_time       The time at which the table was created, in milliseconds
                    since January 1, 1970 UTC.
last_modified_time  The time at which the table was most recently changed, in
                    milliseconds since January 1, 1970 UTC.
row_count           Number of rows in the table.
size_bytes          Total size of the table, measured in bytes.
type                An integer representing the table type: a regular table (1)
                    or a view (2).

Example

The following query retrieves metadata about the tables in the publicdata:samples dataset.

    SELECT * FROM publicdata:samples.__TABLES__;

Returns:

+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+
| project_id | dataset_id |    table_id     | creation_time | last_modified_time | row_count |  size_bytes  | type |
+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+
| publicdata | samples    | github_nested   | 1348782587310 |      1348782587310 |   2541639 |   1694950811 |    1 |
| publicdata | samples    | github_timeline | 1335915950690 |      1335915950690 |   6219749 |   3801936185 |    1 |
| publicdata | samples    | gsod            | 1335916040125 |      1440625349328 | 114420316 |  17290009238 |    1 |
| publicdata | samples    | natality        | 1335916045005 |      1440625330604 | 137826763 |  23562717384 |    1 |
| publicdata | samples    | shakespeare     | 1335916045099 |      1440625429551 |    164656 |      6432064 |    1 |
| publicdata | samples    | trigrams        | 1335916127449 |      1445684180324 |  68051509 | 277168458677 |    1 |
| publicdata | samples    | wikipedia       | 1335916132870 |      1445689914564 | 313797035 |  38324173849 |    1 |
+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+

Scaling to a large number of tables

__TABLES__ provides more information than tables.list, but tables.list is faster on a dataset with many tables.

__TABLES_SUMMARY__ is a meta-table that is faster than __TABLES__ because it does not contain the data-dependent fields: last_modified_time, row_count, and size_bytes.
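
The query syntax is otherwise identical. For example:

    SELECT * FROM publicdata:samples.__TABLES_SUMMARY__;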

Generally, __TABLES__ and __TABLES_SUMMARY__ are reasonably fast for datasets with up to a few thousand tables. For larger datasets they will become increasingly slow, and may exceed available resources.

Temporary and permanent tables

BigQuery saves all query results to a table, which can be either permanent or temporary:

  • A temporary table is a randomly named table saved in a special dataset; the table has a lifetime of approximately 24 hours. Temporary tables are not available for sharing, and are not visible using any of the standard list or other table manipulation methods.

  • A permanent table can be a new or existing table in any dataset in which you have WRITE privileges.

Additional limits

In addition to the standard query quotas, the following limits also apply for querying data.

  • Maximum tables per query: 1,000
  • Maximum query length: 256 KB
