Using cached query results
BigQuery writes all query results to a table. The table is either explicitly identified by the user (a destination table), or it is a temporary, cached results table. If you run the exact same query again, BigQuery returns the results from the cached table, if it exists. Temporary, cached results tables are maintained per-user, per-project. Depending on your edition, you might have access to cached results from other users running queries in the same project. There are no storage costs for cached query result tables, but if you write query results to a permanent table, you are charged for storing the data.
All query results, including both interactive and batch queries, are cached in temporary tables for approximately 24 hours with some exceptions.
Limitations
Using the query cache is subject to the following limitations:
- When you run a duplicate query, BigQuery attempts to reuse cached results. To retrieve data from the cache, the duplicate query text must be the same as the original query.
- For query results to persist in a cached results table, the result set must be smaller than the maximum response size. For more information about managing large result sets, see Returning large query results.
- You cannot target cached result tables with DML statements.
- Although current semantics allow it, the use of cached results as
input for dependent jobs is discouraged. For example, you shouldn't
submit query jobs that retrieve results from the cache table. Instead, write
your results to a named destination table. To simplify cleanup, features such
as the dataset level
defaultTableExpirationMs
property can expire the data automatically after a given duration.
Pricing and quotas
Cached query results are stored as temporary tables. You aren't charged for the
storage of cached query results in temporary tables. When query results are
retrieved from a cached results table, the job statistics property
statistics.query.cacheHit
returns as true
, and you are not charged for the
query. Though you are not charged for queries that use cached results, the
queries are subject to the BigQuery
quota policies.
In addition to reducing costs, queries that use cached results are
significantly faster because BigQuery does not need to compute
the result set.
Exceptions to query caching
Query results are not cached:
- When a destination table is specified in the job configuration, the Google Cloud console, the bq command-line tool, or the API.
- If any of the referenced tables or logical views have changed since the results were previously cached.
- When any of the tables referenced by the query have recently received streaming inserts (table has data in the write-optimized storage) even if no new rows have arrived.
- If the query uses non-deterministic functions; for example, date and time
functions such as
CURRENT_TIMESTAMP()
andCURRENT_DATE
, and other functions such asSESSION_USER()
, it returns different values depending on when a query is executed. - If you are querying multiple tables using a wildcard.
- If the cached results have expired; typical cache lifetime is 24 hours, but the cached results are best-effort and may be invalidated sooner.
- If the query runs against an external data source. other than Cloud Storage. (GoogleSQL queries on Cloud Storage are supported by cached query results.)
- If the query runs against a table protected by row-level security, then the results are not cached.
- If the query runs against a table protected by column-level security, results may not be cached.
How cached results are stored
When you run a query, a temporary, cached results table is created in a special
type of hidden dataset
referred to as an anonymous dataset.
Unlike regular datasets which
inherit permissions from the IAM resource hierarchy model (project and
organization permissions), access to anonymous datasets is restricted to the
owner. The owner of an anonymous dataset is the user who ran the query that
produced the cached result. In addition, the bigquery.jobs.create
permission
is checked on the project to verify that the user has access to the project.
BigQuery doesn't support sharing anonymous datasets. If you intend to share query results, don't use the cached results stored in an anonymous dataset. Instead, write the results to a named destination table.
Although the user that runs the query has full access to the dataset and the cached results table, using them as inputs for dependent jobs is discouraged.
The names of anonymous datasets begin with an underscore. This hides them from the datasets list in the Google Cloud console. You can list anonymous datasets and audit anonymous dataset access controls by using the bq command-line tool or the API.
For more information about listing and getting information about datasets, including anonymous datasets, see Listing datasets.
Cross-user caching
If you are using the Enterprise or Enterprise Plus edition and have the required permissions to execute a query that is cached in your project for another user, then BigQuery produces the cached result. The cached result is copied into your personal anonymous dataset and remains there for 24 hours from when you ran the query. The same limits and exceptions for single-user caching apply to cross-user caching.
Disabling retrieval of cached results
The Use cached results option reuses results from a previous run of the same query unless the tables being queried have changed. Using cached results is only beneficial for repeated queries. For new queries, the Use cached results option has no effect, though it is enabled by default.
When you repeat a query with the Use cached results option disabled, the existing cached result is overwritten. This requires BigQuery to compute the query result, and you are charged for the query. This is particularly useful in benchmarking scenarios.
If you want to disable retrieving cached results and force live evaluation of a
query job, you can set the configuration.query.useQueryCache
property of your query job to false
.
To disable the Use cached results option:
Console
Open the Google Cloud console.
Go to the BigQuery pageClick Compose new query.
Enter a valid SQL query in the Query editor text area.
Click More and select Query settings.
For Cache preference, clear Use cached results.
bq
Use the nouse_cache
flag to overwrite the query cache. The following
example forces BigQuery to process the query without using the
existing cached results:
bq query \
--nouse_cache \
--batch \
'SELECT
name,
count
FROM
`my-project`.mydataset.names_2013
WHERE
gender = "M"
ORDER BY
count DESC
LIMIT
6'
API
To process a query without using the existing cached results, set the
useQueryCache
property to false
in the query
job configuration.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
To process a query without using the existing cached results,
set use query cache
to false
when creating a
QueryJobConfiguration.
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Ensuring use of the cache
If you use the jobs.insert
method
to run a query, you can force a query job to fail unless cached results can be
used by setting the createDisposition
property of the query
job
configuration to CREATE_NEVER
.
If the query result does not exist in the cache, a NOT_FOUND
error is
returned.
bq
Use the --require_cache
flag to require results from the query cache. The
following example forces BigQuery to process the query if its
results exist in the cache:
bq query \
--require_cache \
--batch \
'SELECT
name,
count
FROM
`my-project`.mydataset.names_2013
WHERE
gender = "M"
ORDER BY
count DESC
LIMIT
6'
API
To process a query with existing cached results, set the
createDisposition
property
to CREATE_NEVER
in the query
job configuration.
Verifying use of the cache
Use one of the following methods to determine if BigQuery returned a result using the cache:
- Use the Google Cloud console. Go to Query results and click Job Information. Bytes processed shows 0 B (results cached).
- Use the BigQuery API.
The
cacheHit
property in the query result is set totrue
.
Impact of column-level security
By default, BigQuery caches query results for 24 hours, with the exceptions noted previously. Queries against a table protected by column-level security might not be cached. If BigQuery does cache the result, the 24-hour cache lifetime applies.
A change such as removing a group or a user from the Data Catalog Fine Grained Reader role used for a policy tag does not invalidate the 24-hour cache. A change to the Data Catalog Fine Grained Reader access control group itself is propagated immediately, but the change does not invalidate the cache.
The impact is if a user ran a query, the query results remain visible to the user on screen. The user can also retrieve those results from the cache even if they lost access to the data within the last 24 hours.
During the 24 hours after a user is removed from the Data Catalog Fine Grained Reader role for a policy tag, the user can access the cached data only for data that the user was previously allowed to see. If rows are added to the table, the user can't see the added rows, even if the results are cached.