Estimating storage and query costs

In BigQuery, on-demand queries are charged based on the number of bytes read. For current on-demand query pricing, see the Pricing page.

To estimate costs before running a query, you can use one of the following methods:

  • Query validator in the Cloud Console
  • --dry_run flag in the bq command-line tool
  • dryRun parameter when submitting a query job using the API
  • The Google Cloud Pricing Calculator
  • Client libraries

The estimated costs provided by these methods might vary from the actual costs due to several factors. For example, consider the following scenarios:

  • A query clause that filters data, such as a WHERE clause, might significantly reduce the number of bytes that are read.
  • Data that is added or deleted after the estimate is provided could increase or decrease the number of bytes that are read when the query is run.

Estimating query costs

To estimate query costs:

Console

When you enter a query in the Cloud Console, the query validator verifies the query syntax and provides an estimate of the number of bytes read. You can use this estimate to calculate query cost in the Pricing Calculator.

Query validator

bq

When you run a query in the bq command-line tool, you can use the --dry_run flag to estimate the number of bytes read. You can use this estimate to calculate query cost in the Pricing Calculator.

A bq tool query that uses the --dry_run flag looks like the following:

bq query \
--use_legacy_sql=false \
--dry_run \
'SELECT
  column1,
  column2,
  column3
FROM
  `project_id.dataset.table`
LIMIT
  1000'

When you run the command, the response contains the estimated bytes read: Query successfully validated. Assuming the tables are not modified, running this query will process 10918 bytes of data.

API

To perform a dry run by using the API, submit a query job with dryRun set to true.

go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
)

// queryDryRun demonstrates issuing a dry run query to validate query structure and
// provide an estimate of the bytes scanned.
func queryDryRun(w io.Writer, projectID string) error {
	// projectID := "my-project-id"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	q := client.Query(`
	SELECT
		name,
		COUNT(*) as name_count
	FROM ` + "`bigquery-public-data.usa_names.usa_1910_2013`" + `
	WHERE state = 'WA'
	GROUP BY name`)
	q.DryRun = true
	// Location must match that of the dataset(s) referenced in the query.
	q.Location = "US"

	job, err := q.Run(ctx)
	if err != nil {
		return err
	}
	// Dry run is not asynchronous, so get the latest status and statistics.
	status := job.LastStatus()
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "This query will process %d bytes\n", status.Statistics.TotalBytesProcessed)
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.JobStatistics;
import com.google.cloud.bigquery.QueryJobConfiguration;

// Sample to run dry query on the table
public class QueryDryRun {

  public static void runQueryDryRun() {
    String query =
        "SELECT name, COUNT(*) as name_count "
            + "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
            + "WHERE state = 'WA' "
            + "GROUP BY name";
    queryDryRun(query);
  }

  public static void queryDryRun(String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration queryConfig =
          QueryJobConfiguration.newBuilder(query).setDryRun(true).setUseQueryCache(false).build();

      Job job = bigquery.create(JobInfo.of(queryConfig));
      JobStatistics.QueryStatistics statistics = job.getStatistics();

      System.out.println(
          "Query dry run performed successfully." + statistics.getTotalBytesProcessed());
    } catch (BigQueryException e) {
      System.out.println("Query not performed \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function queryDryRun() {
  // Runs a dry query of the U.S. given names dataset for the state of Texas.

  const query = `SELECT name
    FROM \`bigquery-public-data.usa_names.usa_1910_2013\`
    WHERE state = 'TX'
    LIMIT 100`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
    // Location must match that of the dataset(s) referenced in the query.
    location: 'US',
    dryRun: true,
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);

  // Print the status and statistics
  console.log('Status:');
  console.log(job.metadata.status);
  console.log('\nJob Statistics:');
  console.log(job.metadata.statistics);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To perform a dry run using the Python client library, set the QueryJobConfig.dry_run property to True. Client.query() always returns a completed QueryJob when provided a dry run query configuration.
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

# Start the query, passing in the extra configuration.
query_job = client.query(
    (
        "SELECT name, COUNT(*) as name_count "
        "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
        "WHERE state = 'WA' "
        "GROUP BY name"
    ),
    job_config=job_config,
)  # Make an API request.

# A dry run query completes immediately.
print("This query will process {} bytes.".format(query_job.total_bytes_processed))

Estimating query costs using the Google Cloud Pricing Calculator

To estimate on-demand query costs in the Google Cloud Pricing Calculator, enter the number of bytes that are processed by the query as B, KB, MB, GB, TB, or PB. If your query processes less than 1 TB, the estimate is $0 because BigQuery provides 1 TB of on-demand query processing free per month.

Pricing calculator

To estimate the cost of a query using the pricing calculator:

  1. Open the Google Cloud Pricing Calculator,
  2. Click BigQuery.
  3. Click the On-Demand tab.
  4. For Table Name, type the name of the table. For example, airports.
  5. For Storage Pricing, enter 0 in the Storage field.
  6. For Query Pricing, enter the estimated bytes read from your dry run or the query validator. Calculator
  7. Click Add To Estimate.
  8. The estimate appears to the right. Notice that you can save or email the estimate. On-demand calculator

In this case, the number of bytes read by the query is below the 1 TB of on-demand processing provided via the free tier. As a result, the estimated cost is $0.

Including flat-rate pricing in the Pricing Calculator

If you have flat-rate pricing applied to your billing account, you can click the Flat-Rate tab, choose your flat-rate plan, and add your storage costs to the estimate.

Flat-rate calculator

For more information, see Flat-rate pricing.

Estimating storage costs using the Google Cloud Pricing Calculator

To estimate storage costs in the Google Cloud Pricing Calculator, enter the number of bytes that are stored as B, KB, MB, GB, TB, or PB. BigQuery provides 10 GB of storage free per month.

To estimate storage costs using the pricing calculator:

  1. Open the Google Cloud Pricing Calculator.
  2. Click BigQuery.
  3. Click the On-Demand tab.
  4. For Table Name, type the name of the table. For example, airports.
  5. For Storage Pricing, enter 100 in the Storage field. Leave the measure set to GB.
  6. Click Add To Estimate.
  7. The estimate appears to the right. Notice that you can save or email the estimate. Pricing calculator

Query size calculation

This section describes how to calculate the number of bytes processed by various types of query using the on-demand billing model.

DML statements

If you use on-demand billing, BigQuery charges for data manipulation language (DML) statements based on the number of bytes processed by the statement.

Non-partitioned tables

For non-partitioned tables, the number of bytes processed is calculated as follows:

  • q = The sum of bytes processed for the columns referenced in the tables scanned by the query.
  • t = The sum of bytes for all columns in the updated table, at the time the query starts, regardless of whether those columns are referenced or modified in the query.
DML statement Bytes processed
INSERT q
UPDATE q + t
DELETE q + t
MERGE If there are only INSERT clauses: q.
If there is an UPDATE or DELETE clause: q + t.

Partitioned tables

For partitioned tables, the number of bytes processed is calculated as follows:

  • q' = The sum of bytes processed for the columns referenced in all partitions scanned by the query.
  • t' = The sum of bytes for all columns in the updated or scanned partitions for the updated rows, at the time the query starts, regardless of whether those columns are referenced or modified in the query.
DML statement Bytes processed
INSERT q'
UPDATE q' + t'
DELETE q' + t'
MERGE If there are only INSERT clauses in the MERGE statement: q'.
If there is an UPDATE or DELETE clause in the MERGE statement: q' + t'.

DDL statements

If you use on-demand billing, BigQuery charges for data definition language (DDL) queries based on the number of bytes processed by the query.

DDL statement Bytes processed
CREATE TABLE None.
CREATE TABLE ... AS SELECT ... The sum of bytes processed for all the columns referenced from the tables scanned by the query.
CREATE VIEW None.
DROP TABLE None.
DROP VIEW None.

Scripting

If you use on-demand billing, BigQuery charges for scripting based on the number of bytes processed during execution of the script.

The following pricing applies for scripting-specific statement types:

  • DECLARE: the sum of bytes scanned for any tables referenced in the DEFAULT expression. DECLARE statements with no table references do not incur a cost.
  • SET: the sum of bytes scanned for any tables referenced in the expression. SET statements with no table references do not incur a cost.
  • IF: the sum of bytes scanned for any tables referenced in the condition expression. IF condition expressions with no table reference do not incur a cost. Any statements within the IF block that are not executed do not incur a cost.
  • WHILE: the sum of bytes scanned for any tables referenced in the condition expression. WHILE statements with no table references in the condition expression do not incur a cost. Any statements within the WHILE block that are not executed do not incur a cost.
  • CONTINUE or ITERATE: No associated cost.
  • BREAK or LEAVE: No associated cost.
  • BEGIN or END: No associated cost.

Temporary tables do not incur charges for storage while the script is running. However, regular pricing occurs for any statements that create, modify, or query them.

If a script fails, the cost of any statements up until the failure still applies. The statement that failed does not incur any costs.

Scripting pricing example

The following example script contains comments above every statement that explain what cost, if any, is incurred by the following statement.

-- No cost, since no tables are referenced.
DECLARE x DATE DEFAULT CURRENT_DATE();
-- Incurs the cost of scanning string_col from dataset.table.
DECLARE y STRING DEFAULT (SELECT MAX(string_col) FROM dataset.table);
-- Incurs the cost of copying the data from dataset.big_table.  Once the
-- table is created, you are not charged for storage while the rest of the
-- script runs.
CREATE TEMP TABLE t AS SELECT * FROM dataset.big_table;
-- Incurs the cost of scanning column1 from temporary table t.
SELECT column1 FROM t;
-- No cost, since y = 'foo' doesn't reference a table.
IF y = 'foo' THEN
  -- Incurs the cost of scanning all columns from dataset.other_table, if
  -- y was equal to 'foo', or otherwise no cost since it is not executed.
  SELECT * FROM dataset.other_table;
ELSE
  -- Incurs the cost of scanning all columns from dataset.different_table, if
  -- y was not equal to 'foo', or otherwise no cost since it is not executed.
  UPDATE dataset.different_table
  SET col = 10
  WHERE true;
END IF;
-- Incurs the cost of scanning date_col from dataset.table for each
-- iteration of the loop.
WHILE x < (SELECT MIN(date_col) FROM dataset.table) DO
  -- No cost, since the expression does not reference any tables.
  SET x = DATE_ADD(x, INTERVAL 1 DAY);
  -- No cost, since the expression does not reference any tables.
  IF true THEN
    -- LEAVE has no associated cost.
    LEAVE;
  END IF;
  -- Never executed, since the IF branch is always taken, so does not incur
  -- a cost.
  SELECT * FROM dataset.big_table;
END WHILE;

Clustered tables

Clustered tables can help you to reduce query costs by pruning data so it is not processed by the query. This process is called block pruning.

Block pruning

BigQuery sorts the data in a clustered table based on the values in the clustering columns and organizes them into blocks.

When you run a query against a clustered table, and the query includes a filter on the clustered columns, BigQuery uses the filter expression and the block metadata to prune the blocks scanned by the query. This allows BigQuery to scan only the relevant blocks.

When a block is pruned, it is not scanned. Only the scanned blocks are used to calculate the bytes of data processed by the query. The number of bytes processed by a query against a clustered table equals the sum of the bytes read in each column referenced by the query in the scanned blocks.

If a clustered table is referenced multiple times in a query that uses several filters, BigQuery charges for scanning the columns in the appropriate blocks in each of the respective filters.

Clustered table pricing example

You have a clustered table named ClusteredSalesData. The table is partitioned by the timestamp column, and it is clustered by the customer_id column. The data is organized into the following set of blocks:

Partition identifier Block ID Minimum value for customer_id in the block Maximum value for customer_id in the block
20160501 B1 10000 19999
20160501 B2 20000 24999
20160502 B3 15000 17999
20160501 B4 22000 27999

You run the following query against the table. The query contains a filter on the customer_id column.

SELECT
  SUM(totalSale)
FROM
  `mydataset.ClusteredSalesData`
WHERE
  customer_id BETWEEN 20000
  AND 23000
  AND DATE(timestamp) = "2016-05-01"

This query:

  • Scans the timestamp, customer_id, and totalSale columns in blocks B2 and B4.
  • Prunes the B3 block because of the DATE(timestamp) = "2016-05-01" filter predicate on the timestamp partitioning column.
  • Prunes the B1 block because of the customer_id BETWEEN 20000 AND 23000 filter predicate on the customer_id clustering column.

Querying columnar formats on Cloud Storage

If your external data is stored in ORC or Parquet, the number of bytes charged is limited to the columns that BigQuery reads. Because the data types from an external data source are converted to BigQuery data types by the query, the number of bytes read is computed based on the size of BigQuery data types. For information about data type conversions, see the following pages: