Estimate storage and query costs
In BigQuery, on-demand queries are charged based on the number of bytes read. For current on-demand query pricing, see the Pricing page.
To estimate costs before running a query, you can use one of the following methods:
- The query validator in the Google Cloud console
- The `--dry_run` flag in the `bq` command-line tool
- The `dryRun` parameter when submitting a query job using the API
- The Google Cloud Pricing Calculator
- Client libraries
The estimated costs provided by these methods might vary from the actual costs due to several factors. For example, consider the following scenarios:
- A query clause that filters data, such as a `WHERE` clause, might significantly reduce the number of bytes that are read.
- Data that is added or deleted after the estimate is provided could increase or decrease the number of bytes that are read when the query is run.
Estimate query costs
To estimate query costs:
Console
When you enter a query in the Google Cloud console, the query validator verifies the query syntax and provides an estimate of the number of bytes read. You can use this estimate to calculate query cost in the Pricing Calculator.
bq
When you run a query in the `bq` command-line tool, you can use the `--dry_run` flag to estimate the number of bytes read. You can use this estimate to calculate query cost in the Pricing Calculator.
A `bq` tool query that uses the `--dry_run` flag looks like the following:

bq query \
--use_legacy_sql=false \
--dry_run \
'SELECT
  column1,
  column2,
  column3
FROM
  `project_id.dataset.table`
LIMIT
  1000'
When you run the command, the response contains the estimated bytes read:
Query successfully validated. Assuming the tables are not modified, running
this query will process 10918 bytes of data.
API
To perform a dry run by using the API, submit a query job with `dryRun` set to `true`.
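As a sketch, the request body for a dry-run query job submitted through the `jobs.insert` REST method might look like the following. The project, dataset, and table names are placeholders:

```json
{
  "configuration": {
    "dryRun": true,
    "query": {
      "query": "SELECT column1 FROM `project_id.dataset.table`",
      "useLegacySql": false
    }
  }
}
```

The response for a dry run includes statistics such as the total bytes that the query would process, without actually running the query or incurring charges.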
go
Before trying this sample, follow the Go setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Go API
reference documentation.
Java
Before trying this sample, follow the Java setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Java API
reference documentation.
Node.js
Before trying this sample, follow the Node.js setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Node.js API
reference documentation.
PHP
Before trying this sample, follow the PHP setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery PHP API
reference documentation.
Python
Before trying this sample, follow the Python setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Python API
reference documentation.
Set the `QueryJobConfig.dry_run` property to `True`. `Client.query()` always returns a completed `QueryJob` when provided a dry run query configuration.
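A minimal dry-run sketch using the `google-cloud-bigquery` client library might look like the following. The project, dataset, and table names are placeholders, and running it requires the library installed and valid Google Cloud credentials:

```python
from google.cloud import bigquery

client = bigquery.Client()

# A dry run validates the query and reports the bytes it would scan
# without actually running it; no bytes are billed.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

query_job = client.query(
    "SELECT column1, column2, column3 "
    "FROM `project_id.dataset.table` LIMIT 1000",
    job_config=job_config,
)  # Returns a completed QueryJob immediately.

print(f"This query will process {query_job.total_bytes_processed} bytes.")
```

You can then enter the reported byte count into the Pricing Calculator to estimate the query cost.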
Estimate query costs with the Google Cloud Pricing Calculator
To estimate on-demand query costs in the Google Cloud Pricing Calculator, enter the number of bytes that are processed by the query as B, KB, MB, GB, TB, or PB. If your query processes less than 1 TB, the estimate is $0 because BigQuery provides 1 TB of on-demand query processing free per month.
To estimate the cost of a query using the pricing calculator:
- Open the Google Cloud Pricing Calculator.
- Click BigQuery.
- Click the On-Demand tab.
- For Table Name, type the name of the table. For example, `airports`.
- For Storage Pricing, enter `0` in the Storage field.
- For Query Pricing, enter the estimated bytes read from your dry run or the query validator.
- Click Add To Estimate.
- The estimate appears to the right. Notice that you can save or email the
estimate.
In this case, the number of bytes read by the query is below the 1 TB of on-demand processing provided through the free tier. As a result, the estimated cost is $0.
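The free-tier arithmetic above can be sketched in a few lines of Python. This is an illustrative estimate only: the $6.25-per-TiB rate is an assumption (check the Pricing page for current on-demand rates), and the 1 TB free tier applies across all queries in a month, not per query as modeled here:

```python
def estimate_on_demand_cost(bytes_processed, price_per_tib=6.25, free_tib=1.0):
    """Rough on-demand query cost estimate in USD.

    Assumes the full monthly free tier is still available and a
    hypothetical price of $6.25 per TiB; see the Pricing page for
    current rates.
    """
    tib = bytes_processed / 2**40
    billable_tib = max(tib - free_tib, 0.0)
    return billable_tib * price_per_tib

# The 10918-byte dry run from earlier falls well under the free tier.
print(estimate_on_demand_cost(10918))      # 0.0
print(estimate_on_demand_cost(3 * 2**40))  # 12.5 (2 billable TiB)
```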
Include flat-rate pricing in the Pricing Calculator
If you have flat-rate pricing applied to your billing account, you can click the Flat-Rate tab, choose your flat-rate plan, and add your storage costs to the estimate.
For more information, see Flat-rate pricing.
Estimate storage costs with the Google Cloud Pricing Calculator
To estimate storage costs in the Google Cloud Pricing Calculator, enter the number of bytes that are stored as B, KB, MB, GB, TB, or PB. BigQuery provides 10 GB of storage free per month.
To estimate storage costs using the pricing calculator:
- Open the Google Cloud Pricing Calculator.
- Click BigQuery.
- Click the On-Demand tab.
- For Table Name, type the name of the table. For example, `airports`.
- For Storage Pricing, enter `100` in the Storage field. Leave the measure set to `GB`.
. - Click Add To Estimate.
- The estimate appears to the right. Notice that you can save or email the
estimate.
Query size calculation
This section describes how to calculate the number of bytes processed by various types of queries using the on-demand billing model.
DML statements
If you use on-demand billing, BigQuery charges for data manipulation language (DML) statements based on the number of bytes processed by the statement.
Non-partitioned tables
For non-partitioned tables, the number of bytes processed is calculated as follows:
- q = The sum of bytes processed for the columns referenced in the tables scanned by the query.
- t = The sum of bytes for all columns in the updated table, at the time the query starts, regardless of whether those columns are referenced or modified in the query.
DML statement | Bytes processed |
---|---|
INSERT | q |
UPDATE | q + t |
DELETE | q + t |
MERGE | If there are only INSERT clauses: q. If there is an UPDATE or DELETE clause: q + t. |
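The rules in the table above can be expressed as a small helper function. This is an illustrative sketch, not an official API; the `merge_has_update_or_delete` parameter is hypothetical and indicates whether a MERGE statement contains an UPDATE or DELETE clause:

```python
def dml_bytes_processed(statement, q, t, merge_has_update_or_delete=False):
    """Bytes processed by a DML statement on a non-partitioned table.

    q: bytes processed for the columns referenced in the scanned tables.
    t: bytes for all columns in the updated table at the time the
       query starts, regardless of which columns are modified.
    """
    stmt = statement.upper()
    if stmt == "INSERT":
        return q
    if stmt in ("UPDATE", "DELETE"):
        return q + t
    if stmt == "MERGE":
        # Only INSERT clauses: q. Any UPDATE or DELETE clause: q + t.
        return q + t if merge_has_update_or_delete else q
    raise ValueError(f"unsupported DML statement: {statement}")

print(dml_bytes_processed("UPDATE", 100, 500))  # 600
```

The partitioned-table rules in the next section are analogous, with q' and t' computed over only the scanned or updated partitions.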
Partitioned tables
For partitioned tables, the number of bytes processed is calculated as follows:
- q' = The sum of bytes processed for the columns referenced in all partitions scanned by the query.
- t' = The sum of bytes for all columns in the updated or scanned partitions for the updated rows, at the time the query starts, regardless of whether those columns are referenced or modified in the query.
DML statement | Bytes processed |
---|---|
INSERT | q' |
UPDATE | q' + t' |
DELETE | q' + t' |
MERGE | If there are only INSERT clauses in the MERGE statement: q'. If there is an UPDATE or DELETE clause in the MERGE statement: q' + t'. |
DDL statements
If you use on-demand billing, BigQuery charges for data definition language (DDL) queries based on the number of bytes processed by the query.
DDL statement | Bytes processed |
---|---|
CREATE TABLE | None. |
CREATE TABLE ... AS SELECT ... | The sum of bytes processed for all the columns referenced from the tables scanned by the query. |
CREATE VIEW | None. |
DROP TABLE | None. |
DROP VIEW | None. |
Multi-statement queries
If you use on-demand billing, BigQuery charges for multi-statement queries based on the number of bytes processed during execution of the multi-statement queries.
The following pricing applies for these multi-statement queries:
- `DECLARE`: the sum of bytes scanned for any tables referenced in the `DEFAULT` expression. `DECLARE` statements with no table references do not incur a cost.
- `SET`: the sum of bytes scanned for any tables referenced in the expression. `SET` statements with no table references do not incur a cost.
- `IF`: the sum of bytes scanned for any tables referenced in the condition expression. `IF` condition expressions with no table reference do not incur a cost. Any statements within the `IF` block that are not executed do not incur a cost.
- `WHILE`: the sum of bytes scanned for any tables referenced in the condition expression. `WHILE` statements with no table references in the condition expression do not incur a cost. Any statements within the `WHILE` block that are not executed do not incur a cost.
- `CONTINUE` or `ITERATE`: no associated cost.
- `BREAK` or `LEAVE`: no associated cost.
- `BEGIN` or `END`: no associated cost.
Temporary tables do not incur charges for storage while a multi-statement query is running. However, regular pricing occurs for any statements that create, modify, or query them.
If a multi-statement query fails, the cost of any statements up until the failure still applies. The statement that failed does not incur any costs.
Multi-statement query pricing example
The following example contains a comment before each statement that explains what cost, if any, is incurred by the statement that follows it.
-- No cost, since no tables are referenced.
DECLARE x DATE DEFAULT CURRENT_DATE();
-- Incurs the cost of scanning string_col from dataset.table.
DECLARE y STRING DEFAULT (SELECT MAX(string_col) FROM dataset.table);
-- Incurs the cost of copying the data from dataset.big_table. Once the
-- table is created, you are not charged for storage while the rest of the
-- multi-statement query runs.
CREATE TEMP TABLE t AS SELECT * FROM dataset.big_table;
-- Incurs the cost of scanning column1 from temporary table t.
SELECT column1 FROM t;
-- No cost, since y = 'foo' doesn't reference a table.
IF y = 'foo' THEN
-- Incurs the cost of scanning all columns from dataset.other_table, if
-- y was equal to 'foo', or otherwise no cost since it is not executed.
SELECT * FROM dataset.other_table;
ELSE
-- Incurs the cost of scanning all columns from dataset.different_table, if
-- y was not equal to 'foo', or otherwise no cost since it is not executed.
UPDATE dataset.different_table
SET col = 10
WHERE true;
END IF;
-- Incurs the cost of scanning date_col from dataset.table for each
-- iteration of the loop.
WHILE x < (SELECT MIN(date_col) FROM dataset.table) DO
-- No cost, since the expression does not reference any tables.
SET x = DATE_ADD(x, INTERVAL 1 DAY);
-- No cost, since the expression does not reference any tables.
IF true THEN
-- LEAVE has no associated cost.
LEAVE;
END IF;
-- Never executed, since the IF branch is always taken, so does not incur
-- a cost.
SELECT * FROM dataset.big_table;
END WHILE;
Clustered tables
Clustered tables can help you to reduce query costs by pruning data so it is not processed by the query. This process is called block pruning.
Block pruning
BigQuery sorts the data in a clustered table based on the values in the clustering columns and organizes them into blocks.
When you run a query against a clustered table, and the query includes a filter on the clustered columns, BigQuery uses the filter expression and the block metadata to prune the blocks scanned by the query. This allows BigQuery to scan only the relevant blocks.
When a block is pruned, it is not scanned. Only the scanned blocks are used to calculate the bytes of data processed by the query. The number of bytes processed by a query against a clustered table equals the sum of the bytes read in each column referenced by the query in the scanned blocks.
If a clustered table is referenced multiple times in a query that uses several filters, BigQuery charges for scanning the columns in the appropriate blocks in each of the respective filters.
Clustered table pricing example
You have a clustered table named `ClusteredSalesData`. The table is partitioned by the `timestamp` column, and it is clustered by the `customer_id` column. The data is organized into the following set of blocks:
Partition identifier | Block ID | Minimum value for customer_id in the block | Maximum value for customer_id in the block |
---|---|---|---|
20160501 | B1 | 10000 | 19999 |
20160501 | B2 | 20000 | 24999 |
20160502 | B3 | 15000 | 17999 |
20160501 | B4 | 22000 | 27999 |
You run the following query against the table. The query contains a filter on the `customer_id` column.
SELECT
  SUM(totalSale)
FROM
  `mydataset.ClusteredSalesData`
WHERE
  customer_id BETWEEN 20000 AND 23000
  AND DATE(timestamp) = "2016-05-01"
This query:
- Scans the `timestamp`, `customer_id`, and `totalSale` columns in blocks B2 and B4.
- Prunes the B3 block because of the `DATE(timestamp) = "2016-05-01"` filter predicate on the `timestamp` partitioning column.
- Prunes the B1 block because of the `customer_id BETWEEN 20000 AND 23000` filter predicate on the `customer_id` clustering column.
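The pruning decision in this example can be sketched as a simple overlap test: a block is scanned only if it belongs to a matching partition and its minimum-to-maximum `customer_id` range overlaps the filter range. This is an illustrative model of the block metadata, not BigQuery's internal representation:

```python
# Hypothetical block metadata mirroring the ClusteredSalesData example.
blocks = [
    {"id": "B1", "partition": "20160501", "min_id": 10000, "max_id": 19999},
    {"id": "B2", "partition": "20160501", "min_id": 20000, "max_id": 24999},
    {"id": "B3", "partition": "20160502", "min_id": 15000, "max_id": 17999},
    {"id": "B4", "partition": "20160501", "min_id": 22000, "max_id": 27999},
]

def scanned_blocks(blocks, partition, lo, hi):
    """Return IDs of blocks in the given partition whose [min_id, max_id]
    range overlaps the filter range [lo, hi]; all other blocks are pruned."""
    return [
        b["id"]
        for b in blocks
        if b["partition"] == partition
        and b["min_id"] <= hi
        and b["max_id"] >= lo
    ]

print(scanned_blocks(blocks, "20160501", 20000, 23000))  # ['B2', 'B4']
```

Only the columns read in the surviving blocks (B2 and B4) count toward the bytes processed by the query.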
Query columnar formats on Cloud Storage
If your external data is stored in ORC or Parquet, the number of bytes charged is limited to the columns that BigQuery reads. Because the data types from an external data source are converted to BigQuery data types by the query, the number of bytes read is computed based on the size of BigQuery data types. For information about data type conversions, see the following pages: