Querying Data

This document describes several ways to query data using BigQuery, including running query jobs, interactive queries, and batch queries. It also discusses how to use query caching, return large query results, and query meta-tables.

Running queries

Query results are always saved to either a temporary or permanent table. You can choose whether to append or overwrite data in an existing table and whether to create a new table if none exists by that name.
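
In the API, these choices correspond to the configuration.query.writeDisposition property (WRITE_TRUNCATE, WRITE_APPEND, or WRITE_EMPTY) and the configuration.query.createDisposition property (CREATE_IF_NEEDED or CREATE_NEVER). The following sketch shows where the properties sit in a jobs.insert request body; the query text, table name, and disposition values are illustrative:

{
  "configuration": {
    "query": {
      "query": "SELECT name FROM [mydataset.mytable]",
      "writeDisposition": "WRITE_APPEND",
      "createDisposition": "CREATE_IF_NEEDED"
    }
  }
}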

To run a query that writes to a temporary table:

Web UI

  1. Go to the BigQuery web UI.
  2. Click the Compose query button.
  3. Enter a valid BigQuery SQL query in the New Query text area.
  4. Click the Run query button.

This creates a query job that writes the output to a temporary table.

Command-line

Use the bq command-line tool to run a query.
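
For example, the following command (reusing the public-dataset query shown later in this document) runs a query and prints the results:

bq query "SELECT name,number
    FROM [bigquery-public-data:usa_names.usa_1910_current]
    WHERE gender = 'M'
    ORDER BY number DESC
    LIMIT 6"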

API

To run a query using the API, insert a new job and populate the jobs#configuration.query property.

Poll for results by calling getQueryResults until jobComplete equals true. Check for errors and warnings in the errors list.
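
For example, a minimal jobs.insert request body might look like the following sketch; the query text is illustrative:

{
  "configuration": {
    "query": {
      "query": "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` LIMIT 10",
      "useLegacySql": false
    }
  }
}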

C#

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

public BigQueryResults AsyncQuery(string query, BigQueryClient client)
{
    // Create and start an asynchronous query job.
    BigQueryJob job = client.CreateQueryJob(query,
        new CreateQueryJobOptions { UseQueryCache = false });

    // Wait for the job to complete.
    job.PollUntilCompleted();

    // Then we can fetch the results, either via the job or by accessing
    // the destination table.
    return client.GetQueryResults(job.Reference.JobId);
}

Go

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

q := client.Query(fmt.Sprintf(`
	SELECT name, count
	FROM [%s.%s]
`, datasetID, tableID))
job, err := q.Run(ctx)
if err != nil {
	return err
}

// Wait until async querying is done.
status, err := job.Wait(ctx)
if err != nil {
	return err
}
if err := status.Err(); err != nil {
	return err
}

it, err := job.Read(ctx)
if err != nil {
	return err
}
for {
	var row []bigquery.Value
	err := it.Next(&row)
	if err == iterator.Done {
		break
	}
	if err != nil {
		return err
	}
	fmt.Println(row)
}

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

Create a new QueryJobConfiguration and use the BigQuery.create() function to insert the query job.

Once the job completes, you can page through the results by calling BigQuery.getQueryResults().

public static void runSimpleQuery(String queryString)
    throws TimeoutException, InterruptedException {
  QueryJobConfiguration queryConfig =
      QueryJobConfiguration.newBuilder(queryString).build();

  runQuery(queryConfig);
}

public static void runQuery(QueryJobConfiguration queryConfig)
    throws TimeoutException, InterruptedException {
  BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

  // Create a job ID so that we can safely retry.
  JobId jobId = JobId.of(UUID.randomUUID().toString());
  Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

  // Wait for the query to complete.
  queryJob = queryJob.waitFor();

  // Check for errors
  if (queryJob == null) {
    throw new RuntimeException("Job no longer exists");
  } else if (queryJob.getStatus().getError() != null) {
    // You can also look at queryJob.getStatus().getExecutionErrors() for all
    // errors, not just the latest one.
    throw new RuntimeException(queryJob.getStatus().getError().toString());
  }

  // Get the results.
  QueryResponse response = bigquery.getQueryResults(jobId);
  QueryResult result = response.getResult();

  // Print all pages of the results.
  while (result != null) {
    for (List<FieldValue> row : result.getValues()) {
      for (FieldValue val : row) {
        System.out.printf("%s,", val.toString());
      }
      System.out.printf("\n");
    }

    result = result.getNextPage();
  }
}

Node.js

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

// Imports the Google Cloud client library
const BigQuery = require('@google-cloud/bigquery');

// The project ID to use, e.g. "your-project-id"
// const projectId = "your-project-id";

// Instantiates a client
const bigquery = BigQuery({
  projectId: projectId
});

// Query options list: https://cloud.google.com/bigquery/docs/reference/v2/jobs/query
const options = {
  query: sqlQuery,
  useLegacySql: false // Use standard SQL syntax for queries.
};

let job;

// Runs the query as a job
bigquery
  .startQuery(options)
  .then((results) => {
    job = results[0];
    console.log(`Job ${job.id} started.`);
    return job.promise();
  })
  .then(() => {
    console.log(`Job ${job.id} completed.`);
    return job.getQueryResults();
  })
  .then((results) => {
    const rows = results[0];
    console.log('Rows:');
    rows.forEach((row) => console.log(row));
  });

PHP

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/**
 * Run a BigQuery query as a job.
 * Example:
 * ```
 * $query = 'SELECT TOP(corpus, 10) as title, COUNT(*) as unique_words ' .
 *          'FROM [publicdata:samples.shakespeare]';
 * run_query_as_job($projectId, $query, true);
 * ```
 *
 * @param string $projectId The Google project ID.
 * @param string $query     A SQL query to run.
 * @param bool $useLegacySql Specifies whether to use BigQuery's legacy SQL
 *        syntax or standard SQL syntax for this query.
 */
function run_query_as_job($projectId, $query, $useLegacySql)
{
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $job = $bigQuery->runQueryAsJob(
        $query,
        ['jobConfig' => ['useLegacySql' => $useLegacySql]]);
    $backoff = new ExponentialBackoff(10);
    $backoff->execute(function () use ($job) {
        print('Waiting for job to complete' . PHP_EOL);
        $job->reload();
        if (!$job->isComplete()) {
            throw new Exception('Job has not yet completed', 500);
        }
    });
    $queryResults = $job->queryResults();

    if ($queryResults->isComplete()) {
        $i = 0;
        $rows = $queryResults->rows();
        foreach ($rows as $row) {
            printf('--- Row %s ---' . PHP_EOL, ++$i);
            foreach ($row as $column => $value) {
                printf('%s: %s' . PHP_EOL, $column, $value);
            }
        }
        printf('Found %s row(s)' . PHP_EOL, $i);
    } else {
        throw new Exception('The query failed to complete');
    }
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

import time
import uuid

from google.cloud import bigquery


def async_query(query):
    client = bigquery.Client()
    query_job = client.run_async_query(str(uuid.uuid4()), query)
    query_job.use_legacy_sql = False
    query_job.begin()

    wait_for_job(query_job)

    # Drain the query results by requesting a page at a time.
    query_results = query_job.results()
    page_token = None

    while True:
        rows, total_rows, page_token = query_results.fetch_data(
            max_results=10,
            page_token=page_token)

        for row in rows:
            print(row)

        if not page_token:
            break


def wait_for_job(job):
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)

Ruby

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

# project_id   = "your google cloud project id"
# query_string = "query string to execute (using bigquery query syntax)"

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new project: project_id

puts "Running query"
query_job = bigquery.query_job query_string

puts "Waiting for query to complete"
query_job.wait_until_done!

puts "Query results:"
query_job.query_results.each do |row|
  puts row.inspect
end

Storing query results in a permanent table

To save your query results in a permanent table:

Web UI

  1. Go to the BigQuery web UI.

  2. Click the Compose query button.

  3. Enter a valid BigQuery SQL query in the New Query text area.

  4. Click the Show Options button.

  5. Click the Select Table button in the Destination Table section.

  6. Enter a table ID and click OK.

  7. Click the Run query button.

This creates a query job that writes the output to a permanent table.

Alternatively, if you forget to specify a destination table before running your query, you can copy the temporary table to a permanent table by clicking the Save as Table button in the results window.

Command-line

Use the --destination_table flag to create a permanent table based on the query results. For example, the following command creates a permanent table named happyhalloween in the mydataset dataset:

bq query --destination_table=mydataset.happyhalloween "SELECT name,number
    FROM [bigquery-public-data:usa_names.usa_1910_current]
    WHERE gender = 'M'
    ORDER BY number DESC
    LIMIT 6"

API

To save the query results to a permanent table, include a value for the jobs#configuration.query.destinationTable property.
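
For example, the relevant portion of the job configuration might look like this sketch; the project, dataset, and table names are placeholders:

{
  "configuration": {
    "query": {
      "query": "SELECT name, number FROM [bigquery-public-data:usa_names.usa_1910_current] LIMIT 6",
      "destinationTable": {
        "projectId": "myproject",
        "datasetId": "mydataset",
        "tableId": "happyhalloween"
      }
    }
  }
}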

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

To save query results to a permanent table, set the destination table to the desired TableId in a QueryJobConfiguration.

public static void runQueryPermanentTable(
    String queryString,
    String destinationDataset,
    String destinationTable,
    boolean allowLargeResults) throws TimeoutException, InterruptedException {
  QueryJobConfiguration queryConfig =
      QueryJobConfiguration.newBuilder(queryString)
          // Save the results of the query to a permanent table.
          // See: https://cloud.google.com/bigquery/querying-data#permanent-table
          .setDestinationTable(TableId.of(destinationDataset, destinationTable))
          // Allow results larger than the maximum response size.
          // If true, a destination table must be set.
          // See: https://cloud.google.com/bigquery/querying-data#large-results
          .setAllowLargeResults(allowLargeResults)
          .build();

  runQuery(queryConfig);
}

public static void runQuery(QueryJobConfiguration queryConfig)
    throws TimeoutException, InterruptedException {
  BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

  // Create a job ID so that we can safely retry.
  JobId jobId = JobId.of(UUID.randomUUID().toString());
  Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

  // Wait for the query to complete.
  queryJob = queryJob.waitFor();

  // Check for errors
  if (queryJob == null) {
    throw new RuntimeException("Job no longer exists");
  } else if (queryJob.getStatus().getError() != null) {
    // You can also look at queryJob.getStatus().getExecutionErrors() for all
    // errors, not just the latest one.
    throw new RuntimeException(queryJob.getStatus().getError().toString());
  }

  // Get the results.
  QueryResponse response = bigquery.getQueryResults(jobId);
  QueryResult result = response.getResult();

  // Print all pages of the results.
  while (result != null) {
    for (List<FieldValue> row : result.getValues()) {
      for (FieldValue val : row) {
        System.out.printf("%s,", val.toString());
      }
      System.out.printf("\n");
    }

    result = result.getNextPage();
  }
}

Storing query results in a view

For information on how to create views, see Using Views.

Running parameterized queries

BigQuery supports query parameters to help prevent SQL injection when queries are constructed using user input. This feature is only available with standard SQL syntax.

To specify a named parameter, use the @ character followed by an identifier, such as @param_name. For example, this query finds all the words in a specific Shakespeare corpus with counts that are at least the specified value.

#standardSQL
SELECT
  word,
  word_count
FROM
  `bigquery-public-data.samples.shakespeare`
WHERE
  corpus = @corpus
  AND word_count >= @min_word_count
ORDER BY
  word_count DESC;

Alternatively, use the placeholder value ? to specify a positional parameter. Note that a query can use positional or named parameters but not both.

Command-line

Use --parameter to provide values for parameters in the form "name:type:value". An empty name produces a positional parameter. If the type is omitted, STRING is assumed.

The --parameter flag must be used with the --use_legacy_sql=False flag to specify standard SQL syntax.

bq query --use_legacy_sql=False \
    --parameter=corpus::romeoandjuliet \
    --parameter=min_word_count:INT64:250 \
    'SELECT word, word_count
    FROM `bigquery-public-data.samples.shakespeare`
    WHERE corpus = @corpus
    AND word_count >= @min_word_count
    ORDER BY word_count DESC;'
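
Following the rules above, a positional version of the same command would leave the parameter names empty and replace the named references with ? placeholders; a sketch:

bq query --use_legacy_sql=False \
    --parameter=::romeoandjuliet \
    --parameter=:INT64:250 \
    'SELECT word, word_count
    FROM `bigquery-public-data.samples.shakespeare`
    WHERE corpus = ?
    AND word_count >= ?
    ORDER BY word_count DESC;'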

API

To use named parameters, set the jobs#configuration.query.parameterMode to NAMED.

Populate jobs#configuration.query.queryParameters[] with the list of parameters. Set the name of each parameter with the @param_name used in the query.

Enable standard SQL syntax by setting useLegacySql to false.

{
  "query": "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` WHERE corpus = @corpus AND word_count >= @min_word_count ORDER BY word_count DESC;",
  "queryParameters": [
    {
      "parameterType": {
        "type": "STRING"
      },
      "parameterValue": {
        "value": "romeoandjuliet"
      },
      "name": "corpus"
    },
    {
      "parameterType": {
        "type": "INT64"
      },
      "parameterValue": {
        "value": "250"
      },
      "name": "min_word_count"
    }
  ],
  "useLegacySql": false,
  "parameterMode": "NAMED"
}

Try it in the Google APIs Explorer.

To use positional parameters, set the jobs#configuration.query.parameterMode to POSITIONAL.
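
For example, a positional version of the previous request body omits the parameter names and replaces the @ references with ? placeholders:

{
  "query": "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` WHERE corpus = ? AND word_count >= ? ORDER BY word_count DESC;",
  "queryParameters": [
    {
      "parameterType": {
        "type": "STRING"
      },
      "parameterValue": {
        "value": "romeoandjuliet"
      }
    },
    {
      "parameterType": {
        "type": "INT64"
      },
      "parameterValue": {
        "value": "250"
      }
    }
  ],
  "useLegacySql": false,
  "parameterMode": "POSITIONAL"
}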

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

private static void runNamed(final String corpus, final long minWordCount)
    throws InterruptedException {
  BigQuery bigquery =
      new BigQueryOptions.DefaultBigqueryFactory().create(BigQueryOptions.getDefaultInstance());

  String queryString =
      "SELECT word, word_count\n"
          + "FROM `bigquery-public-data.samples.shakespeare`\n"
          + "WHERE corpus = @corpus\n"
          + "AND word_count >= @min_word_count\n"
          + "ORDER BY word_count DESC";
  QueryRequest queryRequest =
      QueryRequest.newBuilder(queryString)
          .addNamedParameter("corpus", QueryParameterValue.string(corpus))
          .addNamedParameter("min_word_count", QueryParameterValue.int64(minWordCount))
          // Standard SQL syntax is required for parameterized queries.
          // See: https://cloud.google.com/bigquery/sql-reference/
          .setUseLegacySql(false)
          .build();

  // Execute the query.
  QueryResponse response = bigquery.query(queryRequest);

  // Wait for the job to finish (if the query takes more than 10 seconds to complete).
  while (!response.jobCompleted()) {
    Thread.sleep(1000);
    response = bigquery.getQueryResults(response.getJobId());
  }

  // Check for errors.
  if (response.hasErrors()) {
    String firstError = "";
    if (response.getExecutionErrors().size() != 0) {
      firstError = response.getExecutionErrors().get(0).getMessage();
    }
    throw new RuntimeException(firstError);
  }

  // Print all pages of the results.
  QueryResult result = response.getResult();
  while (result != null) {
    for (List<FieldValue> row : result.getValues()) {
      System.out.printf("%s: %d\n", row.get(0).getStringValue(), row.get(1).getLongValue());
    }

    result = result.getNextPage();
  }
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

import time
import uuid

from google.cloud import bigquery


def wait_for_job(job):
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)


def print_results(query_results):
    """Print the query results by requesting a page at a time."""
    page_token = None

    while True:
        rows, total_rows, page_token = query_results.fetch_data(
            max_results=10,
            page_token=page_token)

        for row in rows:
            print(row)

        if not page_token:
            break


def query_named_params(corpus, min_word_count):
    client = bigquery.Client()
    query = """
        SELECT word, word_count
        FROM `bigquery-public-data.samples.shakespeare`
        WHERE corpus = @corpus
        AND word_count >= @min_word_count
        ORDER BY word_count DESC;
        """
    query_job = client.run_async_query(
        str(uuid.uuid4()),
        query,
        query_parameters=(
            bigquery.ScalarQueryParameter('corpus', 'STRING', corpus),
            bigquery.ScalarQueryParameter(
                'min_word_count', 'INT64', min_word_count)))
    query_job.use_legacy_sql = False

    # Start the query and wait for the job to complete.
    query_job.begin()
    wait_for_job(query_job)
    print_results(query_job.results())

Using arrays in parameterized queries

To use an array type in a query parameter, set the type to ARRAY<T>, where T is the type of the elements in the array. Construct the value as a comma-separated list of elements enclosed in square brackets, such as [1, 2, 3].

See the data types reference for more information about the array type.

Command-line

This query selects the most popular names for baby boys born in US states starting with the letter W.

bq query --use_legacy_sql=False \
    --parameter='gender::M' \
    --parameter='states:ARRAY<STRING>:["WA", "WI", "WV", "WY"]' \
    'SELECT name, sum(number) as count
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE gender = @gender
    AND state IN UNNEST(@states)
    GROUP BY name
    ORDER BY count DESC
    LIMIT 10;'

Be careful to enclose the array type declaration in single quotes so that the shell does not interpret the > character in the type declaration as redirecting the command output to a file.

API

To use an array-valued parameter, set the jobs#configuration.query.queryParameters[].parameterType.type to ARRAY.

If the array values are scalars, set the jobs#configuration.query.queryParameters[].parameterType.arrayType.type to the type of the values, such as STRING. If the array values are structures, set this to STRUCT and add the needed field definitions to structTypes.

For example, this query selects the most popular names for baby boys born in US states starting with the letter W.

{
 "query": "SELECT name, sum(number) as count\nFROM `bigquery-public-data.usa_names.usa_1910_2013`\nWHERE gender = @gender\nAND state IN UNNEST(@states)\nGROUP BY name\nORDER BY count DESC\nLIMIT 10;",
 "queryParameters": [
  {
   "parameterType": {
    "type": "STRING"
   },
   "parameterValue": {
    "value": "M"
   },
   "name": "gender"
  },
  {
   "parameterType": {
    "type": "ARRAY",
    "arrayType": {
     "type": "STRING"
    }
   },
   "parameterValue": {
    "arrayValues": [
     {
      "value": "WA"
     },
     {
      "value": "WI"
     },
     {
      "value": "WV"
     },
     {
      "value": "WY"
     }
    ]
   },
   "name": "states"
  }
 ],
 "useLegacySql": false,
 "parameterMode": "NAMED"
}

Try it in the Google APIs Explorer.

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

private static void runArray(String gender, String[] states) throws InterruptedException {
  BigQuery bigquery =
      new BigQueryOptions.DefaultBigqueryFactory().create(BigQueryOptions.getDefaultInstance());

  String queryString =
      "SELECT name, sum(number) as count\n"
          + "FROM `bigquery-public-data.usa_names.usa_1910_2013`\n"
          + "WHERE gender = @gender\n"
          + "AND state IN UNNEST(@states)\n"
          + "GROUP BY name\n"
          + "ORDER BY count DESC\n"
          + "LIMIT 10;";
  QueryRequest queryRequest =
      QueryRequest.newBuilder(queryString)
          .addNamedParameter("gender", QueryParameterValue.string(gender))
          .addNamedParameter("states", QueryParameterValue.array(states, String.class))
          // Standard SQL syntax is required for parameterized queries.
          // See: https://cloud.google.com/bigquery/sql-reference/
          .setUseLegacySql(false)
          .build();

  // Execute the query.
  QueryResponse response = bigquery.query(queryRequest);

  // Wait for the job to finish (if the query takes more than 10 seconds to complete).
  while (!response.jobCompleted()) {
    Thread.sleep(1000);
    response = bigquery.getQueryResults(response.getJobId());
  }

  // Check for errors.
  if (response.hasErrors()) {
    String firstError = "";
    if (response.getExecutionErrors().size() != 0) {
      firstError = response.getExecutionErrors().get(0).getMessage();
    }
    throw new RuntimeException(firstError);
  }

  // Print all pages of the results.
  QueryResult result = response.getResult();
  while (result != null) {
    for (List<FieldValue> row : result.getValues()) {
      System.out.printf("%s: %d\n", row.get(0).getStringValue(), row.get(1).getLongValue());
    }

    result = result.getNextPage();
  }
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

import time
import uuid

from google.cloud import bigquery


def wait_for_job(job):
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)


def print_results(query_results):
    """Print the query results by requesting a page at a time."""
    page_token = None

    while True:
        rows, total_rows, page_token = query_results.fetch_data(
            max_results=10,
            page_token=page_token)

        for row in rows:
            print(row)

        if not page_token:
            break


def query_array_params(gender, states):
    client = bigquery.Client()
    query = """
        SELECT name, sum(number) as count
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE gender = @gender
        AND state IN UNNEST(@states)
        GROUP BY name
        ORDER BY count DESC
        LIMIT 10;
        """
    query_job = client.run_async_query(
        str(uuid.uuid4()),
        query,
        query_parameters=(
            bigquery.ScalarQueryParameter('gender', 'STRING', gender),
            bigquery.ArrayQueryParameter('states', 'STRING', states)))
    query_job.use_legacy_sql = False

    # Start the query and wait for the job to complete.
    query_job.begin()
    wait_for_job(query_job)
    print_results(query_job.results())

Using timestamps in parameterized queries

To use a timestamp in a query parameter, set the type to TIMESTAMP. The value should be in the format YYYY-MM-DD HH:MM:SS.DDDDDD time_zone.

See the data types reference for more information about the timestamp type.

Command-line

This query adds an hour to the timestamp parameter value.

bq query --use_legacy_sql=False \
    --parameter='ts_value:TIMESTAMP:2016-12-07 08:00:00' \
    'SELECT TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR);'

API

To use a timestamp parameter, set the jobs#configuration.query.queryParameters[].parameterType.type to TIMESTAMP.

This query adds an hour to the timestamp parameter value.

{
  "query": "SELECT TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR);",
  "queryParameters": [
    {
      "name": "ts_value",
      "parameterType": {
        "type": "TIMESTAMP"
      },
      "parameterValue": {
        "value": "2016-12-07 08:00:00"
      }
    }
  ],
  "useLegacySql": false,
  "parameterMode": "NAMED"
}

Try it in the Google APIs Explorer.

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

private static void runTimestamp() throws InterruptedException {
  BigQuery bigquery =
      new BigQueryOptions.DefaultBigqueryFactory().create(BigQueryOptions.getDefaultInstance());

  DateTime timestamp = new DateTime(2016, 12, 7, 8, 0, 0, DateTimeZone.UTC);

  String queryString = "SELECT TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR);";
  QueryRequest queryRequest =
      QueryRequest.newBuilder(queryString)
          .addNamedParameter(
              "ts_value",
              QueryParameterValue.timestamp(
                  // Timestamp takes microseconds since 1970-01-01T00:00:00 UTC
                  timestamp.getMillis() * 1000))
          // Standard SQL syntax is required for parameterized queries.
          // See: https://cloud.google.com/bigquery/sql-reference/
          .setUseLegacySql(false)
          .build();

  // Execute the query.
  QueryResponse response = bigquery.query(queryRequest);

  // Wait for the job to finish (if the query takes more than 10 seconds to complete).
  while (!response.jobCompleted()) {
    Thread.sleep(1000);
    response = bigquery.getQueryResults(response.getJobId());
  }

  // Check for errors.
  if (response.hasErrors()) {
    String firstError = "";
    if (response.getExecutionErrors().size() != 0) {
      firstError = response.getExecutionErrors().get(0).getMessage();
    }
    throw new RuntimeException(firstError);
  }

  // Print all pages of the results.
  QueryResult result = response.getResult();
  DateTimeFormatter formatter = ISODateTimeFormat.dateTimeNoMillis().withZoneUTC();
  while (result != null) {
    for (List<FieldValue> row : result.getValues()) {
      System.out.printf(
          "%s\n",
          formatter.print(
              new DateTime(
                  // Timestamp values are returned in microseconds since 1970-01-01T00:00:00 UTC,
                  // but org.joda.time.DateTime constructor accepts times in milliseconds.
                  row.get(0).getTimestampValue() / 1000, DateTimeZone.UTC)));
    }

    result = result.getNextPage();
  }
}

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

import datetime
import time
import uuid

import pytz
from google.cloud import bigquery


def wait_for_job(job):
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)


def print_results(query_results):
    """Print the query results by requesting a page at a time."""
    page_token = None

    while True:
        rows, total_rows, page_token = query_results.fetch_data(
            max_results=10,
            page_token=page_token)

        for row in rows:
            print(row)

        if not page_token:
            break


def query_timestamp_params(year, month, day, hour, minute):
    client = bigquery.Client()
    query = 'SELECT TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR);'
    query_job = client.run_async_query(
        str(uuid.uuid4()),
        query,
        query_parameters=[
            bigquery.ScalarQueryParameter(
                'ts_value',
                'TIMESTAMP',
                datetime.datetime(
                    year, month, day, hour, minute, tzinfo=pytz.UTC))])
    query_job.use_legacy_sql = False

    # Start the query and wait for the job to complete.
    query_job.begin()
    wait_for_job(query_job)
    print_results(query_job.results())

Using structs in parameterized queries

To use a struct in a query parameter, set the type to STRUCT<T>, where T defines the fields and types within the struct. Field definitions are separated by commas and are of the form field_name TF, where TF is the type of the field. For example, STRUCT<x INT64, y STRING> defines a struct with a field named x of type INT64 and a second field named y of type STRING.

See the data types reference for more information about the struct type.

Command-line

This trivial query demonstrates the use of structured types by returning the parameter value.

bq query --use_legacy_sql=False \
    --parameter='struct_value:STRUCT<x INT64, y STRING>:{"x": 1, "y": "foo"}' \
    'SELECT @struct_value AS s;'

API

To use a struct parameter, set the jobs#configuration.query.queryParameters.parameterType.type to STRUCT.

Add an object for each field of the struct to jobs#configuration.query.queryParameters.parameterType.structTypes. If the struct values are scalars, set the type to the type of the values, such as STRING. If the struct values are arrays, set this to ARRAY and set the nested arrayType field to the appropriate type. If the struct values are structures, set the type to STRUCT and add the needed structTypes.

This trivial query demonstrates the use of structured types by returning the parameter value.

{
  "query": "SELECT @struct_value AS s;",
  "queryParameters": [
    {
      "name": "struct_value",
      "parameterType": {
        "type": "STRUCT",
        "structTypes": [
          {
            "name": "x",
            "type": {
              "type": "INT64"
            }
          },
          {
            "name": "y",
            "type": {
              "type": "STRING"
            }
          }
        ]
      },
      "parameterValue": {
        "structValues": {
          "x": {
            "value": "1"
          },
          "y": {
            "value": "foo"
          }
        }
      }
    }
  ],
  "useLegacySql": false,
  "parameterMode": "NAMED"
}

Try it in the Google APIs Explorer.

Python

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

import time
import uuid

from google.cloud import bigquery


def wait_for_job(job):
    while True:
        job.reload()  # Refreshes the state via a GET request.
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)


def print_results(query_results):
    """Print the query results by requesting a page at a time."""
    page_token = None

    while True:
        rows, total_rows, page_token = query_results.fetch_data(
            max_results=10,
            page_token=page_token)

        for row in rows:
            print(row)

        if not page_token:
            break


def query_struct_params(x, y):
    client = bigquery.Client()
    query = 'SELECT @struct_value AS s;'
    query_job = client.run_async_query(
        str(uuid.uuid4()),
        query,
        query_parameters=[
            bigquery.StructQueryParameter(
                'struct_value',
                bigquery.ScalarQueryParameter('x', 'INT64', x),
                bigquery.ScalarQueryParameter('y', 'STRING', y))])
    query_job.use_legacy_sql = False

    # Start the query and wait for the job to complete.
    query_job.begin()
    wait_for_job(query_job)
    print_results(query_job.results())

Running interactive or batch queries

By default, BigQuery runs interactive queries, which means that the query is executed as soon as possible. Interactive queries count towards your concurrent rate limit and your daily limit. All of the examples in Running queries are interactive queries.

BigQuery also offers batch queries. BigQuery queues each batch query on your behalf, and starts the query as soon as idle resources are available, usually within a few minutes. If BigQuery hasn't started the query within 24 hours, BigQuery changes the job priority to interactive. Batch queries don't count towards your concurrent rate limit, which can make it easier to start many queries at once.

To run a batch query:

Web UI

  1. Go to the BigQuery web UI.

  2. Click the Compose query button.

  3. Enter a valid BigQuery SQL query in the New Query text area.

  4. Click the Show Options button.

  5. Select the Batch option in the Query Priority section.

  6. Click the Run query button.

Command-line

Use the --batch flag to run a batch query. For example, the following command starts a batch query:

bq query --batch "SELECT name,number
    FROM [bigquery-public-data:usa_names.usa_1910_current]
    WHERE gender = 'M'
    ORDER BY number DESC
    LIMIT 6"

API

Include the configuration.query.priority property, with the value set to BATCH.
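
For example, the relevant portion of a jobs.insert request body might look like this sketch; the query text is illustrative:

{
  "configuration": {
    "query": {
      "query": "SELECT name, number FROM [bigquery-public-data:usa_names.usa_1910_current] LIMIT 6",
      "priority": "BATCH"
    }
  }
}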

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

To run a batch query, set the query priority to QueryJobConfiguration.Priority.BATCH when creating a QueryJobConfiguration.

public static void runBatchQuery(String queryString)
    throws TimeoutException, InterruptedException {
  QueryJobConfiguration queryConfig =
      QueryJobConfiguration.newBuilder(queryString)
          // Run at batch priority, which won't count toward concurrent rate
          // limit.
          // See: https://cloud.google.com/bigquery/querying-data#interactive-batch
          .setPriority(QueryJobConfiguration.Priority.BATCH)
          .build();

  runQuery(queryConfig);
}

public static void runQuery(QueryJobConfiguration queryConfig)
    throws TimeoutException, InterruptedException {
  BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

  // Create a job ID so that we can safely retry.
  JobId jobId = JobId.of(UUID.randomUUID().toString());
  Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

  // Wait for the query to complete.
  queryJob = queryJob.waitFor();

  // Check for errors
  if (queryJob == null) {
    throw new RuntimeException("Job no longer exists");
  } else if (queryJob.getStatus().getError() != null) {
    // You can also look at queryJob.getStatus().getExecutionErrors() for all
    // errors, not just the latest one.
    throw new RuntimeException(queryJob.getStatus().getError().toString());
  }

  // Get the results.
  QueryResponse response = bigquery.getQueryResults(jobId);
  QueryResult result = response.getResult();

  // Print all pages of the results.
  while (result != null) {
    for (List<FieldValue> row : result.getValues()) {
      for (FieldValue val : row) {
        System.out.printf("%s,", val.toString());
      }
      System.out.printf("\n");
    }

    result = result.getNextPage();
  }
}

Using cached query results

BigQuery writes all query results to a table. The table is either explicitly identified by the user (a destination table), or it is a temporary, cached results table. Temporary, cached results tables are maintained per-user, per-project.

When you run a duplicate query, BigQuery attempts to reuse cached results. When query results are retrieved from a cached results table, the job statistics property statistics.query.cacheHit returns as true, and you are not charged for the query. Though you are not charged for queries that use cached results, the queries are subject to the BigQuery quota policies. In addition to reducing costs, queries that use cached results are significantly faster because BigQuery does not need to compute the result set.

Query results from both interactive and batch queries are cached in temporary tables for approximately 24 hours, with some exceptions. Query results are not cached:

  • When a destination table is specified in the job configuration, the web UI, the command line, or the API

  • If any of the referenced tables or logical views have changed since the results were previously cached

  • When any of the tables referenced by the query have recently received streaming inserts (a streaming buffer is attached to the table) even if no new rows have arrived

  • If the query uses non-deterministic functions. For example, date and time functions such as CURRENT_TIMESTAMP() and NOW(), and other functions such as CURRENT_USER(), return different values depending on when a query is executed

  • If the cached results have expired; typical cache lifetime is 24 hours, but the cached results are best-effort and may be invalidated sooner

For query results to persist in a cached results table, the result set must be smaller than the maximum response size. For more information about managing large result sets, see Returning large query results.

Cached result tables have additional constraints on their use, given their special status. You cannot target cached result tables with DML statements. Although current semantics allow it, the use of cached results as input for dependent jobs is discouraged. For example, you should not submit query jobs that retrieve results from the cache table. Instead, write your results to a named destination table. To enable easy cleanup, features such as the dataset-level defaultTableExpirationMs property can expire the data automatically after a given duration.

Disabling retrieval of cached results

The Use cached results option reuses results from a previous run of the same query unless the tables being queried have changed. Using cached results is only beneficial for repeated queries. For new queries, the Use cached results option has no effect, though it is enabled by default.

When you repeat a query with the Use cached results option disabled, the existing cached result is overwritten. This requires BigQuery to compute the query result, and you are charged for the query. This is particularly useful in benchmarking scenarios.

If you want to disable retrieving cached results and force live evaluation of a query job, you can set the configuration.query.useQueryCache property of your query job to false.

To disable the Use cached results option:

Web UI

  1. Go to the BigQuery web UI.

  2. Click the Compose query button.

  3. Enter a valid BigQuery SQL query in the New Query text area.

  4. Click Show Options.

  5. Uncheck Use Cached Results.

Command-line

Use the --nouse_cache flag to overwrite the query cache. The following example forces BigQuery to process the query without using the existing cached results:

 bq query --nouse_cache --batch "SELECT name,count FROM mydataset.names_2013 WHERE gender = 'M' ORDER BY count DESC LIMIT 6"

API

To process a query without using the existing cached results, set the useQueryCache property to false.
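
For example, the relevant portion of a jobs.insert request body might look like this sketch; the query text is illustrative:

{
  "configuration": {
    "query": {
      "query": "SELECT name, number FROM [bigquery-public-data:usa_names.usa_1910_current] LIMIT 6",
      "useQueryCache": false
    }
  }
}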

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

To process a query without using the existing cached results, call setUseQueryCache(false) when creating a QueryJobConfiguration.

public static void runUncachedQuery(String queryString)
    throws TimeoutException, InterruptedException {
  QueryJobConfiguration queryConfig =
      QueryJobConfiguration.newBuilder(queryString)
          // Do not use the query cache. Force live query evaluation.
          // See: https://cloud.google.com/bigquery/querying-data#query-caching
          .setUseQueryCache(false)
          .build();

  runQuery(queryConfig);
}

public static void runQuery(QueryJobConfiguration queryConfig)
    throws TimeoutException, InterruptedException {
  BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

  // Create a job ID so that we can safely retry.
  JobId jobId = JobId.of(UUID.randomUUID().toString());
  Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

  // Wait for the query to complete.
  queryJob = queryJob.waitFor();

  // Check for errors
  if (queryJob == null) {
    throw new RuntimeException("Job no longer exists");
  } else if (queryJob.getStatus().getError() != null) {
    // You can also look at queryJob.getStatus().getExecutionErrors() for all
    // errors, not just the latest one.
    throw new RuntimeException(queryJob.getStatus().getError().toString());
  }

  // Get the results.
  QueryResponse response = bigquery.getQueryResults(jobId);
  QueryResult result = response.getResult();

  // Print all pages of the results.
  while (result != null) {
    for (List<FieldValue> row : result.getValues()) {
      for (FieldValue val : row) {
        System.out.printf("%s,", val.toString());
      }
      System.out.printf("\n");
    }

    result = result.getNextPage();
  }
}

Ensuring use of the cache

If you use the jobs.insert() function to run a query, you can force a query job to fail unless cached results can be used by setting the createDisposition property of the job configuration to CREATE_NEVER.

If the query result does not exist in the cache, a NOT_FOUND error is returned.
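
For example, the relevant portion of the job configuration might look like this sketch; the query text is illustrative:

{
  "configuration": {
    "query": {
      "query": "SELECT name, number FROM [bigquery-public-data:usa_names.usa_1910_current] LIMIT 6",
      "createDisposition": "CREATE_NEVER"
    }
  }
}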

Verifying use of the cache

There are two ways to determine if BigQuery returned a result using the cache:

  • If you are using the BigQuery web UI, the result string does not contain information about the number of processed bytes, and displays the word "cached".

  • If you are using the BigQuery API, the cacheHit property in the query result is set to true.

Returning large query results

Normally, queries have a maximum response size. If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration. Configuring large results requires you to specify a destination table. You incur storage charges for the destination table.

Queries with large results are subject to these limitations:

  • You must specify a destination table.
  • You cannot specify a top-level ORDER BY, TOP, or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
  • Window functions can return large query results only if used in conjunction with a PARTITION BY clause.

To allow large query results:

Web UI

  1. Go to the BigQuery web UI.

  2. Click the Compose query button.

  3. Enter a valid BigQuery SQL query in the New Query text area.

  4. Click Show Options.

  5. For Destination Table, click Select Table and type a name in the Table ID field.

  6. For Results Size, check Allow Large Results.

Command-line

Use the --allow_large_results flag with the --destination_table flag to create a destination table to hold the large results set:

 bq query --destination_table '[DATASET].[TABLE_NAME]' --allow_large_results "[QUERY]"

API

To enable large results, set the configuration.query.allowLargeResults property to true and specify a destination table using configuration.query.destinationTable.
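
For example, the relevant portion of the job configuration might look like this sketch; the project, dataset, and table names are placeholders:

{
  "configuration": {
    "query": {
      "query": "SELECT name, number FROM [bigquery-public-data:usa_names.usa_1910_current]",
      "allowLargeResults": true,
      "destinationTable": {
        "projectId": "myproject",
        "datasetId": "mydataset",
        "tableId": "mylargeresults"
      }
    }
  }
}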

Java

For more on installing and creating a BigQuery client, refer to BigQuery Client Libraries.

To enable large results, call setAllowLargeResults(true) and set the destination table to the desired TableId in a QueryJobConfiguration.

public static void runQueryPermanentTable(
    String queryString,
    String destinationDataset,
    String destinationTable,
    boolean allowLargeResults) throws TimeoutException, InterruptedException {
  QueryJobConfiguration queryConfig =
      QueryJobConfiguration.newBuilder(queryString)
          // Save the results of the query to a permanent table.
          // See: https://cloud.google.com/bigquery/querying-data#permanent-table
          .setDestinationTable(TableId.of(destinationDataset, destinationTable))
          // Allow results larger than the maximum response size.
          // If true, a destination table must be set.
          // See: https://cloud.google.com/bigquery/querying-data#large-results
          .setAllowLargeResults(allowLargeResults)
          .build();

  runQuery(queryConfig);
}

public static void runQuery(QueryJobConfiguration queryConfig)
    throws TimeoutException, InterruptedException {
  BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

  // Create a job ID so that we can safely retry.
  JobId jobId = JobId.of(UUID.randomUUID().toString());
  Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

  // Wait for the query to complete.
  queryJob = queryJob.waitFor();

  // Check for errors
  if (queryJob == null) {
    throw new RuntimeException("Job no longer exists");
  } else if (queryJob.getStatus().getError() != null) {
    // You can also look at queryJob.getStatus().getExecutionErrors() for all
    // errors, not just the latest one.
    throw new RuntimeException(queryJob.getStatus().getError().toString());
  }

  // Get the results.
  QueryResponse response = bigquery.getQueryResults(jobId);
  QueryResult result = response.getResult();

  // Print all pages of the results.
  while (result != null) {
    for (List<FieldValue> row : result.getValues()) {
      for (FieldValue val : row) {
        System.out.printf("%s,", val.toString());
      }
      System.out.printf("\n");
    }

    result = result.getNextPage();
  }
}

Querying table subsets by using table decorators

Normally, BigQuery performs a full column scan when running a query. You can use table decorators to perform a more cost-effective query of a subset of your data. For more information, see table decorators.
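
For example, a legacy SQL snapshot decorator references a table at a point in time. The following sketch (with a hypothetical table) queries mydataset.mytable as it existed one hour ago; the negative value is a relative offset in milliseconds:

    SELECT COUNT(*) FROM [mydataset.mytable@-3600000];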

Using meta-tables

BigQuery offers some special tables whose contents represent metadata, such as the names of your tables. The "meta-tables" are read-only. Normally, you use one by referencing it in a SELECT statement.

Meta-tables can be used in other API operations besides a query job, such as tables.get or tabledata.list. They do not support tables.insert and cannot be used as a destination table; they also do not support table decorators. Meta-tables do not appear in a tables.list of the dataset.

Metadata about tables in a dataset

You can access metadata about the tables in a dataset by using the __TABLES__ or __TABLES_SUMMARY__ meta-table.

Use the following syntax to query a meta-table:

    SELECT [FIELD] FROM [DATASET].__TABLES__;

Where DATASET is the name of your dataset, and FIELD is one of the following:

Field                Description
project_id           Name of the project.
dataset_id           Name of the dataset.
table_id             Name of the table.
creation_time        The time at which the table was created, in milliseconds since January 1, 1970 UTC.
last_modified_time   The time at which the table was most recently changed, in milliseconds since January 1, 1970 UTC.
row_count            Number of rows in the table.
size_bytes           Total size of the table, measured in bytes.
type                 An integer representing the table type: a regular table (1) or a view (2).

Example

The following query retrieves metadata about the tables in the publicdata:samples dataset.

    SELECT * FROM publicdata:samples.__TABLES__;

Returns:

+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+
| project_id | dataset_id |    table_id     | creation_time | last_modified_time | row_count |  size_bytes  | type |
+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+
| publicdata | samples    | github_nested   | 1348782587310 |      1348782587310 |   2541639 |   1694950811 |    1 |
| publicdata | samples    | github_timeline | 1335915950690 |      1335915950690 |   6219749 |   3801936185 |    1 |
| publicdata | samples    | gsod            | 1335916040125 |      1440625349328 | 114420316 |  17290009238 |    1 |
| publicdata | samples    | natality        | 1335916045005 |      1440625330604 | 137826763 |  23562717384 |    1 |
| publicdata | samples    | shakespeare     | 1335916045099 |      1440625429551 |    164656 |      6432064 |    1 |
| publicdata | samples    | trigrams        | 1335916127449 |      1445684180324 |  68051509 | 277168458677 |    1 |
| publicdata | samples    | wikipedia       | 1335916132870 |      1445689914564 | 313797035 |  38324173849 |    1 |
+------------+------------+-----------------+---------------+--------------------+-----------+--------------+------+

Scaling to a large number of tables

__TABLES__ provides more information than tables.list, but tables.list is faster on a dataset with many tables.

__TABLES_SUMMARY__ is a meta-table that is faster than __TABLES__ because it does not contain the data-dependent fields: last_modified_time, row_count, and size_bytes.
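
For example, the following query returns the summary fields for the same dataset queried above:

    SELECT * FROM publicdata:samples.__TABLES_SUMMARY__;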

Generally, __TABLES__ and __TABLES_SUMMARY__ are reasonably fast for datasets with up to a few thousand tables. For larger datasets they will become increasingly slow, and may exceed available resources.

Temporary and permanent tables

BigQuery saves all query results to a table, which can be either permanent or temporary:

  • A temporary table is a randomly named table saved in a special dataset; the table has a lifetime of approximately 24 hours. Temporary tables are not available for sharing, and are not visible using any of the standard list or other table manipulation methods.

  • A permanent table can be a new or existing table in any dataset in which you have WRITE privileges.

Additional limits

In addition to the standard query quotas, the following limits also apply for querying data.

  • Maximum tables per query: 1,000
  • Maximum query length: 256 KB
