Using Cached Query Results

This document describes how to use cached results in BigQuery.

Overview

BigQuery writes all query results to a table. The table is either explicitly identified by the user (a destination table), or it is a temporary, cached results table. Temporary, cached results tables are maintained per-user, per-project. There are no storage costs for temporary tables, but if you write query results to a permanent table, you are charged for storing the data.

All query results, including those of both interactive and batch queries, are cached in temporary tables for approximately 24 hours, with some exceptions.

Limitations

Using the query cache is subject to the following limitations:

  • When you run a duplicate query, BigQuery attempts to reuse cached results. To retrieve data from the cache, the duplicate query text must be exactly the same as the original query.
  • For query results to persist in a cached results table, the result set must be smaller than the maximum response size. For more information about managing large result sets, see Returning large query results.
  • You cannot target cached result tables with DML statements.
  • Although current semantics allow it, the use of cached results as input for dependent jobs is strongly discouraged. For example, you should not submit query jobs that retrieve results from the cache table. Instead, write your results to a named destination table. To enable easy cleanup, features such as the dataset-level defaultTableExpirationMs property can expire the data automatically after a given duration.
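
As a minimal sketch of the cleanup suggestion in the last item, the following builds a dataset resource body that sets defaultTableExpirationMs. The helper name, the project and dataset IDs, and the 24-hour lifetime are illustrative assumptions. The value is converted to a string because the JSON representation carries 64-bit integers as strings.

```python
from datetime import timedelta


def dataset_body_with_expiration(project_id, dataset_id, lifetime):
    """Build a dataset resource body whose tables expire after `lifetime`.

    Hypothetical helper; only defaultTableExpirationMs comes from the text above.
    """
    expiration_ms = int(lifetime.total_seconds() * 1000)
    return {
        "datasetReference": {"projectId": project_id, "datasetId": dataset_id},
        # 64-bit integers are serialized as strings in the JSON body.
        "defaultTableExpirationMs": str(expiration_ms),
    }


# Placeholder IDs; substitute your own project and dataset.
body = dataset_body_with_expiration(
    "your-project-id", "query_results", timedelta(hours=24)
)
print(body["defaultTableExpirationMs"])
```

Query results written to tables in such a dataset would then be removed automatically, without relying on the cached results table.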

Pricing and quotas

When query results are retrieved from a cached results table, the job statistics property statistics.query.cacheHit returns as true, and you are not charged for the query. Though you are not charged for queries that use cached results, the queries are subject to the BigQuery quota policies. In addition to reducing costs, queries that use cached results are significantly faster because BigQuery does not need to compute the result set.
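
The check described above can be sketched as a small helper that reads statistics.query.cacheHit from a job resource that has already been fetched and parsed into a dictionary. The helper name and the two hand-written job dictionaries are assumptions for illustration, not real API output.

```python
def was_served_from_cache(job_resource):
    """Return True if the job resource reports statistics.query.cacheHit as true."""
    query_stats = job_resource.get("statistics", {}).get("query", {})
    return bool(query_stats.get("cacheHit", False))


# Hand-written stand-ins for job resources returned by the API.
cached_job = {"statistics": {"query": {"cacheHit": True}}}
fresh_job = {"statistics": {"query": {"cacheHit": False}}}

print(was_served_from_cache(cached_job))
print(was_served_from_cache(fresh_job))
```

A missing statistics block is treated as "not cached" here, which is a conservative choice for cost reporting.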

Exceptions to query caching

Query results are not cached:

  • When a destination table is specified in the job configuration, the web UI, the command line, or the API
  • If any of the referenced tables or logical views have changed since the results were previously cached
  • When any of the tables referenced by the query have recently received streaming inserts (a streaming buffer is attached to the table) even if no new rows have arrived
  • If the query uses non-deterministic functions; for example, date and time functions such as CURRENT_TIMESTAMP() and NOW(), and other functions such as CURRENT_USER(), return different values depending on when a query is executed
  • If you are querying multiple tables using a wildcard
  • If the cached results have expired; typical cache lifetime is 24 hours, but the cached results are best-effort and may be invalidated sooner
  • If the query runs against an external data source
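
To spot one of these exceptions before submitting a query, a rough client-side check can scan the query text for the non-deterministic functions named in the list above. This is only a heuristic sketch under stated assumptions: the helper name is hypothetical, the function list is limited to the three examples given here, and a regular expression will also match names that appear inside string literals or comments. It does not reproduce how BigQuery itself decides whether to cache.

```python
import re

# Function names taken from the list above; this is not an exhaustive set.
NON_DETERMINISTIC = ("CURRENT_TIMESTAMP", "NOW", "CURRENT_USER")
_PATTERN = re.compile(
    r"\b(" + "|".join(NON_DETERMINISTIC) + r")\s*\(", re.IGNORECASE
)


def find_non_deterministic_calls(sql):
    """Return the sorted, upper-cased names of listed functions called in sql."""
    return sorted({match.group(1).upper() for match in _PATTERN.finditer(sql)})


print(find_non_deterministic_calls(
    "SELECT CURRENT_TIMESTAMP() AS ts, name FROM mydataset.names_2013"
))
print(find_non_deterministic_calls("SELECT name FROM mydataset.names_2013"))
```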

How cached results are stored

When you run a query, a temporary, cached results table is created in an internal dataset referred to as an "anonymous dataset". By default, the user that runs the query job is given OWNER access to the anonymous dataset, which in turn, gives the user full control over the cached results table. The user that runs the query job is added to the dataset's access controls by using the user's email address via the "User by e-mail" option. For more information on dataset access controls, see Controlling access to a dataset.

Though the user that runs the query has full access to the dataset and the cached results table, using them as inputs for dependent jobs is strongly discouraged.

The names of anonymous datasets begin with an underscore. This hides them from the datasets list in the BigQuery web UI. You can list anonymous datasets and audit anonymous dataset access controls by using the CLI or the API.
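
Because the leading underscore is the distinguishing mark, a list of dataset IDs obtained from the CLI or the API can be split into anonymous and named datasets with a simple prefix test. The sketch below assumes the IDs have already been retrieved; the helper name and the sample IDs are made up for illustration.

```python
def split_anonymous(dataset_ids):
    """Separate dataset IDs that start with an underscore from the rest."""
    anonymous = [d for d in dataset_ids if d.startswith("_")]
    named = [d for d in dataset_ids if not d.startswith("_")]
    return anonymous, named


# Made-up IDs standing in for a real dataset listing.
ids = ["mydataset", "_0a1b2c3d4e5f", "query_results", "_9f8e7d6c5b4a"]
anonymous, named = split_anonymous(ids)
print(anonymous)
print(named)
```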

Disabling retrieval of cached results

The Use cached results option reuses results from a previous run of the same query unless the tables being queried have changed. Using cached results is only beneficial for repeated queries. For new queries, the Use cached results option has no effect, though it is enabled by default.

When you repeat a query with the Use cached results option disabled, the existing cached result is overwritten. This requires BigQuery to compute the query result, and you are charged for the query. Disabling the option is particularly useful in benchmarking scenarios, where each run should be evaluated live.

If you want to disable retrieving cached results and force live evaluation of a query job, you can set the configuration.query.useQueryCache property of your query job to false.
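
As a sketch of where that property sits, the following builds the JSON body of a query job with configuration.query.useQueryCache set to false. The helper name is hypothetical and the body contains only the fields needed to show the setting; a real request would also need authentication and a project, which are omitted here.

```python
import json


def query_job_body(sql, use_query_cache=True):
    """Build a minimal query job body; the helper name is illustrative only."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                # False forces live evaluation instead of reusing cached results.
                "useQueryCache": use_query_cache,
            }
        }
    }


body = query_job_body(
    "SELECT name,count FROM mydataset.names_2013 "
    "WHERE gender = 'M' ORDER BY count DESC LIMIT 6",
    use_query_cache=False,
)
print(json.dumps(body, indent=2))
```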

To disable the Use cached results option:

Web UI

  1. Go to the BigQuery web UI.

  2. Click the Compose query button.

  3. Enter a valid BigQuery SQL query in the New Query text area.

  4. Click Show Options.

  5. Uncheck Use Cached Results.

Command-line

Use the --nouse_cache flag to overwrite the query cache. The following example forces BigQuery to process the query without using the existing cached results:

 bq --location=US query --nouse_cache --batch "SELECT name,count FROM mydataset.names_2013 WHERE gender = 'M' ORDER BY count DESC LIMIT 6"

API

To process a query without using the existing cached results, set the useQueryCache property to false.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Go API reference documentation.

// To run this sample, you will need to create (or reuse) a context and
// an instance of the bigquery client.  For example:
// import (
//	"context"
//	"fmt"
//
//	"cloud.google.com/go/bigquery"
//	"google.golang.org/api/iterator"
// )
// ctx := context.Background()
// client, err := bigquery.NewClient(ctx, "your-project-id")

q := client.Query(
	"SELECT corpus FROM `bigquery-public-data.samples.shakespeare` GROUP BY corpus;")
q.DisableQueryCache = true
// Location must match that of the dataset(s) referenced in the query.
q.Location = "US"
job, err := q.Run(ctx)
if err != nil {
	return err
}
status, err := job.Wait(ctx)
if err != nil {
	return err
}
if err := status.Err(); err != nil {
	return err
}
it, err := job.Read(ctx)
if err != nil {
	return err
}
for {
	var row []bigquery.Value
	err := it.Next(&row)
	if err == iterator.Done {
		break
	}
	if err != nil {
		return err
	}
	fmt.Println(row)
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Java API reference documentation.

To process a query without using the existing cached results, call setUseQueryCache(false) when building a QueryJobConfiguration.

// BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
String query = "SELECT corpus FROM `bigquery-public-data.samples.shakespeare` GROUP BY corpus;";
QueryJobConfiguration queryConfig =
    QueryJobConfiguration.newBuilder(query)
        // Disable the query cache to force live query evaluation.
        .setUseQueryCache(false)
        .build();

// Print the results.
for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
  for (FieldValue val : row) {
    System.out.printf("%s,", val.toString());
  }
  System.out.printf("\n");
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery Quickstart Using Client Libraries. For more information, see the BigQuery Python API reference documentation.

# from google.cloud import bigquery
# client = bigquery.Client()

job_config = bigquery.QueryJobConfig()
job_config.use_query_cache = False
sql = """
    SELECT corpus
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus;
"""
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query.
    location='US',
    job_config=job_config)  # API request

# Print the results.
for row in query_job:  # API request - fetches results
    print(row)

Ensuring use of the cache

If you use the jobs.insert() function to run a query, you can force a query job to fail unless cached results can be used by setting the createDisposition property of the job configuration to CREATE_NEVER.

If the query result does not exist in the cache, a NOT_FOUND error is returned.
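
A minimal sketch of such a job body follows, with createDisposition set to CREATE_NEVER as described above. The helper name is hypothetical, and only the fields mentioned in this section are included; sending the request and handling the NOT_FOUND error are left out.

```python
def cache_only_query_body(sql):
    """Build a query job body that should fail unless cached results can be used."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                # With no cached result available, the job is expected to fail
                # rather than compute and store a new result.
                "createDisposition": "CREATE_NEVER",
            }
        }
    }


body = cache_only_query_body("SELECT name,count FROM mydataset.names_2013")
print(body["configuration"]["query"]["createDisposition"])
```

This pattern suits callers that would rather receive an error than incur the cost of a fresh query.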

Verifying use of the cache

There are two ways to determine if BigQuery returned a result using the cache:

  • If you are using the BigQuery web UI, the result string does not contain information about the number of processed bytes, and displays the word "cached".

  • If you are using the BigQuery API, the cacheHit property in the query result is set to true.
