Estimate query costs
BigQuery offers two compute (analysis) pricing models:
On-demand pricing: You pay for the data scanned by your queries. You have a fixed, per-project query processing capacity, and your cost is based on the number of bytes processed by each query.
Capacity-based pricing: You pay for dedicated or autoscaled query processing capacity, measured in slots, over a period of time. Multiple queries share the same slot capacity.
This page provides information about estimating your query costs in BigQuery.
Estimate on-demand query costs
To estimate costs before running a query using the on-demand pricing model, you can use one of the following methods:
- Query validator in the Google Cloud console
- `--dry_run` flag in the `bq` command-line tool
- `dryRun` parameter when submitting a query job using the API or client libraries
The estimated costs provided by these methods might vary from the actual costs due to several factors. For example, consider the following scenarios:
- A query clause that filters data, such as a `WHERE` clause, might significantly reduce the number of bytes that are read.
- Data that is added or deleted after the estimate is provided could increase or decrease the number of bytes that are read when the query is run.
Console
When you enter a query in the Google Cloud console, the query validator verifies the query syntax and provides an estimate of the number of bytes read. You can use this estimate to calculate query cost in the Pricing Calculator.
bq
When you run a query in the `bq` command-line tool, you can use the `--dry_run` flag to estimate the number of bytes read. You can use this estimate to calculate query cost in the Pricing Calculator.
A `bq` tool query that uses the `--dry_run` flag looks like the following:
```sh
bq query \
--use_legacy_sql=false \
--dry_run \
'SELECT
  column1,
  column2,
  column3
FROM
  `project_id.dataset.table`
LIMIT
  1000'
```
When you run the command, the response contains the estimated bytes read:
Query successfully validated. Assuming the tables are not modified, running
this query will process 10918 bytes of data.
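To turn that byte count into an approximate charge, divide by the number of bytes in a TiB and multiply by your region's on-demand rate. A minimal sketch in Python, assuming a hypothetical rate of $6.25 per TiB (check the current pricing page for your region):

```python
# Convert a dry-run byte estimate into an approximate on-demand cost.
# The $6.25/TiB rate is an assumption for illustration; on-demand rates
# vary by region and change over time.
ON_DEMAND_USD_PER_TIB = 6.25

bytes_processed = 10918  # from the dry-run output above
cost = bytes_processed / 2**40 * ON_DEMAND_USD_PER_TIB
print(f"Estimated cost: ${cost:.8f}")  # tiny for an example this small
```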
API
To perform a dry run by using the API, submit a query job with `dryRun` set to `true`.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Set the `QueryJobConfig.dry_run` property to `True`. `Client.query()` always returns a completed `QueryJob` when provided a dry run query configuration.
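A minimal sketch of such a dry run with the Python client library, assuming a table at the hypothetical path `project_id.dataset.table`:

```python
from google.cloud import bigquery

client = bigquery.Client()

# dry_run=True validates the query and estimates bytes read without
# running it or incurring cost.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

query_job = client.query(
    "SELECT column1, column2, column3 "
    "FROM `project_id.dataset.table` LIMIT 1000",  # hypothetical table
    job_config=job_config,
)

# A dry run query completes immediately; there is no job to wait on.
print(f"This query will process {query_job.total_bytes_processed} bytes.")
```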
On-demand query size calculation
This section describes how to calculate the number of bytes processed by various types of queries using the on-demand billing model.
DML statements
If you use on-demand billing, BigQuery charges for data manipulation language (DML) statements based on the number of bytes processed by the statement.
Non-partitioned tables
For non-partitioned tables, the number of bytes processed is calculated as follows:
- q = The sum of bytes processed by the DML statement itself, including any columns referenced in tables scanned by the DML statement.
- t = The sum of bytes for all columns in the table updated by the DML statement at the time the query starts, regardless of whether those columns are referenced or modified by the DML statement.
| DML statement | Bytes processed |
|---|---|
| `INSERT` | q |
| `UPDATE` | q + t |
| `DELETE` | q + t |
| `MERGE` | If there are only `INSERT` clauses: q. If there is an `UPDATE` or `DELETE` clause: q + t. |
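You can observe the q + t behavior directly with a dry run. A sketch using the Python client, assuming a hypothetical `project_id.dataset.table`:

```python
from google.cloud import bigquery

client = bigquery.Client()
dry = bigquery.QueryJobConfig(dry_run=True)

# For an UPDATE, the estimate covers the bytes scanned by the statement (q)
# plus the bytes of all columns in the updated table (t), even though only
# one column is modified.
job = client.query(
    "UPDATE `project_id.dataset.table` SET col = 10 WHERE TRUE",  # hypothetical
    job_config=dry,
)
print(f"{job.total_bytes_processed} bytes (q + t)")
```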
Partitioned tables
For partitioned tables, the number of bytes processed is calculated as follows:
- q' = The sum of bytes processed by the DML statement itself, including any columns referenced in all partitions scanned by the DML statement.
- t' = The sum of bytes for all columns in the partitions updated by the DML statement at the time the query starts, regardless of whether those columns are referenced or modified by the DML statement.
| DML statement | Bytes processed |
|---|---|
| `INSERT` | q' |
| `UPDATE` | q' + t' |
| `DELETE` | q' + t' |
| `MERGE` | If there are only `INSERT` clauses in the `MERGE` statement: q'. If there is an `UPDATE` or `DELETE` clause in the `MERGE` statement: q' + t'. |
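Because t' counts only the partitions being updated, restricting a DML statement to specific partitions can substantially lower its cost. A sketch, assuming a hypothetical table partitioned on a `ts` column:

```python
from google.cloud import bigquery

client = bigquery.Client()
dry = bigquery.QueryJobConfig(dry_run=True)

# The partition filter limits both the bytes scanned (q') and the bytes of
# the partitions rewritten (t') to a single day's partition.
job = client.query(
    """
    UPDATE `project_id.dataset.partitioned_table`  -- hypothetical
    SET col = 10
    WHERE DATE(ts) = "2016-05-01"
    """,
    job_config=dry,
)
print(f"{job.total_bytes_processed} bytes (q' + t')")
```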
DDL statements
If you use on-demand billing, BigQuery charges for data definition language (DDL) queries based on the number of bytes processed by the query.
| DDL statement | Bytes processed |
|---|---|
| `CREATE TABLE` | None. |
| `CREATE TABLE ... AS SELECT ...` | The sum of bytes processed for all the columns referenced from the tables scanned by the query. |
| `CREATE VIEW` | None. |
| `DROP TABLE` | None. |
| `DROP VIEW` | None. |
Multi-statement queries
If you use on-demand billing, BigQuery charges for multi-statement queries based on the number of bytes processed during execution of the multi-statement queries.
The following pricing applies for these multi-statement queries:
- `DECLARE`: the sum of bytes scanned for any tables referenced in the `DEFAULT` expression. `DECLARE` statements with no table references do not incur a cost.
- `SET`: the sum of bytes scanned for any tables referenced in the expression. `SET` statements with no table references do not incur a cost.
- `IF`: the sum of bytes scanned for any tables referenced in the condition expression. `IF` condition expressions with no table references do not incur a cost. Any statements within the `IF` block that are not executed do not incur a cost.
- `WHILE`: the sum of bytes scanned for any tables referenced in the condition expression. `WHILE` statements with no table references in the condition expression do not incur a cost. Any statements within the `WHILE` block that are not executed do not incur a cost.
- `CONTINUE` or `ITERATE`: no associated cost.
- `BREAK` or `LEAVE`: no associated cost.
- `BEGIN` or `END`: no associated cost.
If a multi-statement query fails, the cost of any statements up until the failure still applies. The statement that failed does not incur any costs.
Multi-statement query pricing example
The following example contains a comment before each statement that explains what cost, if any, the statement that follows it incurs.
```sql
-- No cost, since no tables are referenced.
DECLARE x DATE DEFAULT CURRENT_DATE();

-- Incurs the cost of scanning string_col from dataset.table.
DECLARE y STRING DEFAULT (SELECT MAX(string_col) FROM dataset.table);

-- Incurs the cost of copying the data from dataset.big_table. Once the
-- table is created, you are not charged for storage while the rest of the
-- multi-statement query runs.
CREATE TEMP TABLE t AS SELECT * FROM dataset.big_table;

-- Incurs the cost of scanning column1 from temporary table t.
SELECT column1 FROM t;

-- No cost, since y = 'foo' doesn't reference a table.
IF y = 'foo' THEN
  -- Incurs the cost of scanning all columns from dataset.other_table, if
  -- y was equal to 'foo', or otherwise no cost since it is not executed.
  SELECT * FROM dataset.other_table;
ELSE
  -- Incurs the cost of scanning all columns from dataset.different_table, if
  -- y was not equal to 'foo', or otherwise no cost since it is not executed.
  UPDATE dataset.different_table
  SET col = 10
  WHERE true;
END IF;

-- Incurs the cost of scanning date_col from dataset.table for each
-- iteration of the loop.
WHILE x < (SELECT MIN(date_col) FROM dataset.table) DO
  -- No cost, since the expression does not reference any tables.
  SET x = DATE_ADD(x, INTERVAL 1 DAY);

  -- No cost, since the expression does not reference any tables.
  IF true THEN
    -- LEAVE has no associated cost.
    LEAVE;
  END IF;

  -- Never executed, since the IF branch is always taken, so does not incur
  -- a cost.
  SELECT * FROM dataset.big_table;
END WHILE;
```
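Each statement in a multi-statement query typically runs as a child job of the parent, so you can attribute scanned bytes statement by statement after the fact. A sketch with the Python client, assuming a hypothetical `dataset.table`:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Run a small multi-statement query; its statements become child jobs
# that carry per-statement statistics.
parent = client.query("""
DECLARE n INT64 DEFAULT (SELECT COUNT(*) FROM dataset.table);  -- hypothetical
SELECT n;
""")
parent.result()

# List the child jobs and report what each statement scanned.
for child in client.list_jobs(parent_job=parent.job_id):
    print(child.statement_type, child.total_bytes_processed)
```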
Clustered tables
Clustered tables can help you to reduce query costs by pruning data so it is not processed by the query. This process is called block pruning.
Block pruning
BigQuery sorts the data in a clustered table based on the values in the clustering columns and organizes them into blocks.
When you run a query against a clustered table, and the query includes a filter on the clustered columns, BigQuery uses the filter expression and the block metadata to prune the blocks scanned by the query. This allows BigQuery to scan only the relevant blocks.
When a block is pruned, it is not scanned. Only the scanned blocks are used to calculate the bytes of data processed by the query. The number of bytes processed by a query against a clustered table equals the sum of the bytes read in each column referenced by the query in the scanned blocks.
If a clustered table is referenced multiple times in a query that uses several filters, BigQuery charges for scanning the columns in the appropriate blocks in each of the respective filters.
Clustered table pricing example
You have a clustered table named `ClusteredSalesData`. The table is partitioned by the `timestamp` column, and it is clustered by the `customer_id` column. The data is organized into the following set of blocks:
| Partition identifier | Block ID | Minimum value for `customer_id` in the block | Maximum value for `customer_id` in the block |
|---|---|---|---|
| 20160501 | B1 | 10000 | 19999 |
| 20160501 | B2 | 20000 | 24999 |
| 20160502 | B3 | 15000 | 17999 |
| 20160501 | B4 | 22000 | 27999 |
You run the following query against the table. The query contains a filter on the `customer_id` column.
```sql
SELECT
  SUM(totalSale)
FROM
  `mydataset.ClusteredSalesData`
WHERE
  customer_id BETWEEN 20000 AND 23000
  AND DATE(timestamp) = "2016-05-01"
```
This query:
- Scans the `timestamp`, `customer_id`, and `totalSale` columns in blocks B2 and B4.
- Prunes the B3 block because of the `DATE(timestamp) = "2016-05-01"` filter predicate on the `timestamp` partitioning column.
- Prunes the B1 block because of the `customer_id BETWEEN 20000 AND 23000` filter predicate on the `customer_id` clustering column.
Query columnar formats on Cloud Storage
If your external data is stored in ORC or Parquet, the number of bytes charged is limited to the columns that BigQuery reads. Because the data types from an external data source are converted to BigQuery data types by the query, the number of bytes read is computed based on the size of BigQuery data types. For information about data type conversions, see the data type conversion pages for ORC and Parquet.
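As an illustration, you can attach a temporary external table definition to a query and check the bytes it processed; the Cloud Storage path and column names below are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Define a temporary external table over Parquet files in Cloud Storage.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/sales/*.parquet"]  # hypothetical

job_config = bigquery.QueryJobConfig(
    table_definitions={"sales_ext": external_config},
)

# Only the referenced columns count toward bytes billed, sized as their
# BigQuery (not Parquet) types.
job = client.query("SELECT order_id, total FROM sales_ext", job_config=job_config)
job.result()
print(f"{job.total_bytes_processed} bytes processed")
```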
Use the Google Cloud Pricing Calculator
The Google Cloud Pricing Calculator can help you create an overall monthly cost estimate for BigQuery based on projected usage.
On-demand
To estimate costs in the Google Cloud Pricing Calculator when using the on-demand pricing model, follow these steps:
- Open the Google Cloud Pricing Calculator.
- Click BigQuery.
- Click the On-Demand tab.
- For Table Name, type the name of the table. For example, `airports`.
- For Storage Pricing, enter the estimated size of the table in the storage fields. You only need to estimate either physical storage or logical storage, depending on the dataset storage billing model.
- For Query Pricing, enter the estimated bytes read from your dry run or the query validator.
- Click Add To Estimate.
- The estimate appears to the right. You can save or email the estimate.
For more information, see on-demand pricing.
Flat-rate
To estimate costs in the Google Cloud Pricing Calculator when using the capacity-based pricing model with flat-rate commitments, follow these steps:
- Open the Google Cloud Pricing Calculator.
- Click BigQuery.
- Click the Flat-Rate tab.
- Choose the location where the slots are used.
- Choose the Commitment period.
- Specify the number of slots.
- Choose the location where the data is stored.
- Enter your estimates of storage usage for Active storage, Long-term storage, Streaming inserts, and Streaming reads. You only need to estimate either physical storage or logical storage, depending on the dataset storage billing model.
- Click Add to Estimate.
For more information, see capacity-based pricing.
Editions
To estimate costs in the Google Cloud Pricing Calculator when using the capacity-based pricing model with BigQuery editions, follow these steps:
- Open the Google Cloud Pricing Calculator.
- Click BigQuery.
- Click the Editions tab.
- Choose the location where the slots are used.
- Choose your Edition.
- Choose the Maximum slots, Baseline slots, optional Commitment, and Estimated utilization of autoscaling.
- Choose the location where the data is stored.
- Enter your estimates of storage usage for Active storage, Long-term storage, Streaming inserts, and Streaming reads. You only need to estimate either physical storage or logical storage, depending on the dataset storage billing model.
- Click Add to Estimate.
For more information, see capacity-based pricing.