Estimate query costs

BigQuery offers two compute (analysis) pricing models:

  • On-demand pricing: You pay for the data scanned by your queries. You have a fixed, per-project query processing capacity, and your cost is based on the number of bytes processed by each query.

  • Capacity-based pricing: You pay for dedicated or autoscaled query processing capacity, measured in slots, over a period of time. Multiple queries share the same slot capacity.

This page provides information about estimating your query costs in BigQuery.

Estimate on-demand query costs

To estimate costs before running a query using the on-demand pricing model, you can use one of the following methods:

  • Query validator in the Google Cloud console
  • --dry_run flag in the bq command-line tool
  • dryRun parameter when submitting a query job using the API or client libraries

The estimated costs provided by these methods might vary from the actual costs due to several factors. For example, consider the following scenarios:

  • A query clause that filters data, such as a WHERE clause, might significantly reduce the number of bytes that are read.
  • Data that is added or deleted after the estimate is provided could increase or decrease the number of bytes that are read when the query is run.

Console

When you enter a query in the Google Cloud console, the query validator verifies the query syntax and provides an estimate of the number of bytes read. You can use this estimate to calculate query cost in the Pricing Calculator.

Query validator

bq

When you run a query in the bq command-line tool, you can use the --dry_run flag to estimate the number of bytes read. You can use this estimate to calculate query cost in the Pricing Calculator.

A bq tool query that uses the --dry_run flag looks like the following:

bq query \
--use_legacy_sql=false \
--dry_run \
'SELECT
  column1,
  column2,
  column3
FROM
  `project_id.dataset.table`
LIMIT
  1000'

When you run the command, the response contains the estimated bytes read:

Query successfully validated. Assuming the tables are not modified, running this query will process 10918 bytes of data.
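
To get a rough dollar figure from an estimate like this, you can apply the on-demand rate to the reported bytes yourself. The following Python sketch is illustrative only: the price per TiB is an assumption based on the US multi-region on-demand list price, rates differ by region and change over time, and the calculation ignores per-query minimums and the monthly free tier.

# Rough conversion from a dry-run byte estimate to an on-demand cost.
BYTES_PER_TIB = 2**40
PRICE_PER_TIB_USD = 6.25  # Assumed rate; check the pricing page for your region.


def estimate_on_demand_cost(bytes_processed: int) -> float:
    """Returns a rough on-demand cost in USD, ignoring minimums and the free tier."""
    return bytes_processed / BYTES_PER_TIB * PRICE_PER_TIB_USD


# The 10918 bytes reported by the dry run above.
print(f"Estimated cost: ${estimate_on_demand_cost(10918):.8f}")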

API

To perform a dry run by using the API, submit a query job with dryRun set to true.
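
If you call the REST API directly rather than through a client library, dryRun is a field of the job configuration that you pass to jobs.insert. The following Python sketch illustrates this using google-auth; the query, scopes, and minimal error handling are assumptions, not part of the official samples below.

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Assumes Application Default Credentials are configured.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/bigquery"]
)
session = AuthorizedSession(credentials)

# jobs.insert with dryRun set to true; the job is validated but not run.
body = {
    "configuration": {
        "dryRun": True,
        "query": {
            "query": (
                "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` "
                "WHERE state = 'WA' LIMIT 100"
            ),
            "useLegacySql": False,
        },
    }
}
response = session.post(
    f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/jobs",
    json=body,
)
response.raise_for_status()
statistics = response.json()["statistics"]
print(f"This query will process {statistics['totalBytesProcessed']} bytes")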

go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
)

// queryDryRun demonstrates issuing a dry run query to validate query structure and
// provide an estimate of the bytes scanned.
func queryDryRun(w io.Writer, projectID string) error {
	// projectID := "my-project-id"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	q := client.Query(`
	SELECT
		name,
		COUNT(*) as name_count
	FROM ` + "`bigquery-public-data.usa_names.usa_1910_2013`" + `
	WHERE state = 'WA'
	GROUP BY name`)
	q.DryRun = true
	// Location must match that of the dataset(s) referenced in the query.
	q.Location = "US"

	job, err := q.Run(ctx)
	if err != nil {
		return err
	}
	// Dry run is not asynchronous, so get the latest status and statistics.
	status := job.LastStatus()
	if err := status.Err(); err != nil {
		return err
	}
	fmt.Fprintf(w, "This query will process %d bytes\n", status.Statistics.TotalBytesProcessed)
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.JobStatistics;
import com.google.cloud.bigquery.QueryJobConfiguration;

// Sample to run a dry-run query on a table
public class QueryDryRun {

  public static void runQueryDryRun() {
    String query =
        "SELECT name, COUNT(*) as name_count "
            + "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
            + "WHERE state = 'WA' "
            + "GROUP BY name";
    queryDryRun(query);
  }

  public static void queryDryRun(String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration queryConfig =
          QueryJobConfiguration.newBuilder(query).setDryRun(true).setUseQueryCache(false).build();

      Job job = bigquery.create(JobInfo.of(queryConfig));
      JobStatistics.QueryStatistics statistics = job.getStatistics();

      System.out.println(
          "Query dry run performed successfully. Total bytes processed: "
              + statistics.getTotalBytesProcessed());
    } catch (BigQueryException e) {
      System.out.println("Query not performed \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function queryDryRun() {
  // Runs a dry query of the U.S. given names dataset for the state of Texas.

  const query = `SELECT name
    FROM \`bigquery-public-data.usa_names.usa_1910_2013\`
    WHERE state = 'TX'
    LIMIT 100`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
    // Location must match that of the dataset(s) referenced in the query.
    location: 'US',
    dryRun: true,
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);

  // Print the status and statistics
  console.log('Status:');
  console.log(job.metadata.status);
  console.log('\nJob Statistics:');
  console.log(job.metadata.statistics);
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\BigQuery\BigQueryClient;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $query = 'SELECT id, view_count FROM `bigquery-public-data.stackoverflow.posts_questions`';

// Construct a BigQuery client object.
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);

// Set job configs
$jobConfig = $bigQuery->query($query);
$jobConfig->useQueryCache(false);
$jobConfig->dryRun(true);

// Extract query results
$queryJob = $bigQuery->startJob($jobConfig);
$info = $queryJob->info();

printf('This query will process %s bytes' . PHP_EOL, $info['statistics']['totalBytesProcessed']);

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

To perform a dry run using the Python client library, set the QueryJobConfig.dry_run property to True. Client.query() always returns a completed QueryJob when provided a dry run query configuration.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

# Start the query, passing in the extra configuration.
query_job = client.query(
    (
        "SELECT name, COUNT(*) as name_count "
        "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
        "WHERE state = 'WA' "
        "GROUP BY name"
    ),
    job_config=job_config,
)  # Make an API request.

# A dry run query completes immediately.
print("This query will process {} bytes.".format(query_job.total_bytes_processed))

On-demand query size calculation

This section describes how to calculate the number of bytes processed by various types of queries when you use the on-demand billing model.

DML statements

If you use on-demand billing, BigQuery charges for data manipulation language (DML) statements based on the number of bytes processed by the statement.

Non-partitioned tables

For non-partitioned tables, the number of bytes processed is calculated as follows:

  • q = The sum of bytes processed by the DML statement itself, including any columns referenced in tables scanned by the DML statement.
  • t = The sum of bytes for all columns in the table updated by the DML statement at the time the query starts, regardless of whether those columns are referenced or modified by the DML statement.
DML statement   Bytes processed
INSERT          q
UPDATE          q + t
DELETE          q + t
MERGE           If there are only INSERT clauses: q.
                If there is an UPDATE or DELETE clause: q + t.
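
As a worked illustration of q and t, the following Python sketch computes the bytes processed for an UPDATE against a hypothetical non-partitioned table; the table, the statement, and the per-column sizes are made up.

# Hypothetical total size, in bytes, of each column in the table at the time
# the UPDATE starts. These numbers are illustrative only.
table_column_bytes = {
    "customer_id": 8_000_000,   # referenced in the WHERE clause
    "status": 1_000_000,        # modified by SET
    "notes": 50_000_000,        # neither referenced nor modified
}

# UPDATE mydataset.customers SET status = 'inactive' WHERE customer_id < 1000

# q: bytes processed by the statement itself (the columns it references).
q = table_column_bytes["customer_id"] + table_column_bytes["status"]

# t: bytes for all columns in the updated table, referenced or not.
t = sum(table_column_bytes.values())

print(f"The UPDATE is billed for q + t = {q + t} bytes")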

Partitioned tables

For partitioned tables, the number of bytes processed is calculated as follows:

  • q' = The sum of bytes processed by the DML statement itself, including any columns referenced in all partitions scanned by the DML statement.
  • t' = The sum of bytes for all columns in the partitions updated by the DML statement at the time the query starts, regardless of whether those columns are referenced or modified by the DML statement.
DML statement   Bytes processed
INSERT          q'
UPDATE          q' + t'
DELETE          q' + t'
MERGE           If there are only INSERT clauses in the MERGE statement: q'.
                If there is an UPDATE or DELETE clause in the MERGE statement: q' + t'.
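
DML statements can be dry-run like any other query to preview their estimated bytes processed before you run them. The following sketch uses the Python client library; the project, dataset, table, and column names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()

# Dry-run the UPDATE to see its estimated bytes processed before running it.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query_job = client.query(
    """
    UPDATE `my-project.mydataset.partitioned_sales`
    SET status = 'shipped'
    WHERE DATE(order_timestamp) = '2024-05-01'
    """,
    job_config=job_config,
)
print(f"This UPDATE would process {query_job.total_bytes_processed} bytes.")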

DDL statements

If you use on-demand billing, BigQuery charges for data definition language (DDL) queries based on the number of bytes processed by the query.

DDL statement                    Bytes processed
CREATE TABLE                     None.
CREATE TABLE ... AS SELECT ...   The sum of bytes processed for all the columns referenced from the tables scanned by the query.
CREATE VIEW                      None.
DROP TABLE                       None.
DROP VIEW                        None.
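
Because only the SELECT portion of a CREATE TABLE ... AS SELECT statement scans data, a dry run gives a useful estimate of its on-demand cost. The following sketch uses the Python client library; the destination dataset and table are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()

# Dry-run a CREATE TABLE ... AS SELECT statement. The estimate reflects the
# columns scanned by the SELECT portion of the statement.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query_job = client.query(
    """
    CREATE TABLE `my-project.mydataset.wa_names` AS
    SELECT name, COUNT(*) AS name_count
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'WA'
    GROUP BY name
    """,
    job_config=job_config,
)
print(f"This statement would process {query_job.total_bytes_processed} bytes.")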

Multi-statement queries

If you use on-demand billing, BigQuery charges for multi-statement queries based on the number of bytes processed during execution of the multi-statement queries.

The following pricing applies to the statements in a multi-statement query:

  • DECLARE: the sum of bytes scanned for any tables referenced in the DEFAULT expression. DECLARE statements with no table references do not incur a cost.
  • SET: the sum of bytes scanned for any tables referenced in the expression. SET statements with no table references do not incur a cost.
  • IF: the sum of bytes scanned for any tables referenced in the condition expression. IF condition expressions with no table reference do not incur a cost. Any statements within the IF block that are not executed do not incur a cost.
  • WHILE: the sum of bytes scanned for any tables referenced in the condition expression. WHILE statements with no table references in the condition expression do not incur a cost. Any statements within the WHILE block that are not executed do not incur a cost.
  • CONTINUE or ITERATE: No associated cost.
  • BREAK or LEAVE: No associated cost.
  • BEGIN or END: No associated cost.

If a multi-statement query fails, the cost of any statements up until the failure still applies. The statement that failed does not incur any costs.
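
After a multi-statement query runs, you can see how the processed bytes break down per statement by listing the child jobs of the script. The following sketch uses the Python client library; the script itself is only an example.

from google.cloud import bigquery

client = bigquery.Client()

# Run a multi-statement query (script). The DECLARE references a table, so it
# incurs a scan cost; the final SELECT references no tables.
parent_job = client.query(
    """
    DECLARE y STRING DEFAULT (
      SELECT MAX(name) FROM `bigquery-public-data.usa_names.usa_1910_2013`);
    SELECT y AS max_name;
    """
)
parent_job.result()  # Wait for the script to finish.

# Each statement in the script runs as a child job with its own statistics.
total = 0
for child in client.list_jobs(parent_job=parent_job):
    bytes_processed = child.total_bytes_processed or 0
    total += bytes_processed
    print(f"{child.job_id}: {bytes_processed} bytes")
print(f"Total bytes processed by the script: {total}")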

Multi-statement query pricing example

In the following example, a comment precedes each statement and explains what cost, if any, that statement incurs.

-- No cost, since no tables are referenced.
DECLARE x DATE DEFAULT CURRENT_DATE();
-- Incurs the cost of scanning string_col from dataset.table.
DECLARE y STRING DEFAULT (SELECT MAX(string_col) FROM dataset.table);
-- Incurs the cost of copying the data from dataset.big_table.  Once the
-- table is created, you are not charged for storage while the rest of the
-- multi-statement query runs.
CREATE TEMP TABLE t AS SELECT * FROM dataset.big_table;
-- Incurs the cost of scanning column1 from temporary table t.
SELECT column1 FROM t;
-- No cost, since y = 'foo' doesn't reference a table.
IF y = 'foo' THEN
  -- Incurs the cost of scanning all columns from dataset.other_table, if
  -- y was equal to 'foo', or otherwise no cost since it is not executed.
  SELECT * FROM dataset.other_table;
ELSE
  -- Incurs the cost of scanning all columns from dataset.different_table, if
  -- y was not equal to 'foo', or otherwise no cost since it is not executed.
  UPDATE dataset.different_table
  SET col = 10
  WHERE true;
END IF;
-- Incurs the cost of scanning date_col from dataset.table for each
-- iteration of the loop.
WHILE x < (SELECT MIN(date_col) FROM dataset.table) DO
  -- No cost, since the expression does not reference any tables.
  SET x = DATE_ADD(x, INTERVAL 1 DAY);
  -- No cost, since the expression does not reference any tables.
  IF true THEN
    -- LEAVE has no associated cost.
    LEAVE;
  END IF;
  -- Never executed, since the IF branch is always taken, so does not incur
  -- a cost.
  SELECT * FROM dataset.big_table;
END WHILE;

Clustered tables

Clustered tables can help you reduce query costs by pruning data so that it is not processed by the query. This process is called block pruning.

Block pruning

BigQuery sorts the data in a clustered table based on the values in the clustering columns and organizes them into blocks.

When you run a query against a clustered table, and the query includes a filter on the clustered columns, BigQuery uses the filter expression and the block metadata to prune the blocks scanned by the query. This allows BigQuery to scan only the relevant blocks.

When a block is pruned, it is not scanned. Only the scanned blocks are used to calculate the bytes of data processed by the query. The number of bytes processed by a query against a clustered table equals the sum of the bytes read in each column referenced by the query in the scanned blocks.

If a clustered table is referenced multiple times in a query that uses several filters, BigQuery charges for scanning the columns in the appropriate blocks in each of the respective filters.

Clustered table pricing example

You have a clustered table named ClusteredSalesData. The table is partitioned by the timestamp column, and it is clustered by the customer_id column. The data is organized into the following set of blocks:

Partition identifier   Block ID   Minimum customer_id in the block   Maximum customer_id in the block
20160501               B1         10000                              19999
20160501               B2         20000                              24999
20160502               B3         15000                              17999
20160501               B4         22000                              27999

You run the following query against the table. The query contains a filter on the customer_id column.

SELECT
  SUM(totalSale)
FROM
  `mydataset.ClusteredSalesData`
WHERE
  customer_id BETWEEN 20000
  AND 23000
  AND DATE(timestamp) = "2016-05-01"

This query:

  • Scans the timestamp, customer_id, and totalSale columns in blocks B2 and B4.
  • Prunes the B3 block because of the DATE(timestamp) = "2016-05-01" filter predicate on the timestamp partitioning column.
  • Prunes the B1 block because of the customer_id BETWEEN 20000 AND 23000 filter predicate on the customer_id clustering column.
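
A dry run can't predict block pruning, so for clustered tables it reports an upper bound on bytes processed; the actual charge is known only after the query runs. The following Python sketch compares the two using the example query above, assuming you have created a table like ClusteredSalesData yourself.

from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT SUM(totalSale)
    FROM `mydataset.ClusteredSalesData`
    WHERE customer_id BETWEEN 20000 AND 23000
      AND DATE(timestamp) = '2016-05-01'
"""

# The dry run can't know which blocks will be pruned, so it reports an
# upper bound on the bytes processed.
dry_run_job = client.query(
    query, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
)
print(f"Upper bound (dry run): {dry_run_job.total_bytes_processed} bytes")

# Running the query shows the bytes actually processed after block pruning.
query_job = client.query(
    query, job_config=bigquery.QueryJobConfig(use_query_cache=False)
)
query_job.result()  # Wait for the query to finish.
print(f"Actually processed:    {query_job.total_bytes_processed} bytes")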

Query columnar formats on Cloud Storage

If your external data is stored in ORC or Parquet, the number of bytes charged is limited to the columns that BigQuery reads. Because the data types from an external data source are converted to BigQuery data types by the query, the number of bytes read is computed based on the size of BigQuery data types. For information about data type conversions, see the ORC and Parquet conversion documentation.
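
As a sketch of how column-limited reads surface in practice, the following Python example queries a hypothetical set of Parquet files in Cloud Storage through a temporary external table definition and then prints the bytes processed and billed; the bucket path and column name are assumptions.

from google.cloud import bigquery

client = bigquery.Client()

# Define a temporary external table over Parquet files in Cloud Storage.
# The bucket URI below is hypothetical.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/sales/*.parquet"]

job_config = bigquery.QueryJobConfig(
    table_definitions={"sales_external": external_config},
    use_query_cache=False,
)

# Only the column the query reads (sale_amount) counts toward the charge,
# measured by its size as a BigQuery data type after conversion.
query_job = client.query(
    "SELECT SUM(sale_amount) FROM sales_external", job_config=job_config
)
query_job.result()  # Wait for the query to finish.
print(f"Bytes processed: {query_job.total_bytes_processed}")
print(f"Bytes billed:    {query_job.total_bytes_billed}")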

Use the Google Cloud Pricing Calculator

The Google Cloud Pricing Calculator can help you create an overall monthly cost estimate for BigQuery based on projected usage.

On-demand

To estimate costs in the Google Cloud Pricing Calculator when using the on-demand pricing model, follow these steps:

  1. Open the Google Cloud Pricing Calculator.
  2. Click BigQuery.
  3. Click the On-Demand tab.
  4. For Table Name, type the name of the table. For example, airports.
  5. For Storage Pricing, enter the estimated size of the table in the storage fields. You only need to estimate either physical or logical storage, depending on the dataset's storage billing model.
  6. For Query Pricing, enter the estimated bytes read from your dry run or the query validator.
  7. Click Add To Estimate.
  8. Review the estimate, which appears to the right. You can save or email the estimate.

For more information, see on-demand pricing.

Flat-rate

To estimate costs in the Google Cloud Pricing Calculator when using the capacity-based pricing model with flat-rate commitments, follow these steps:

  1. Open the Google Cloud Pricing Calculator.
  2. Click BigQuery.
  3. Click the Flat-Rate tab.
  4. Choose the location where the slots are used.
  5. Choose the Commitment period.
  6. Specify the number of slots.
  7. Choose the location where the data is stored.
  8. Enter your estimated storage usage for Active storage, Long-term storage, Streaming inserts, and Streaming reads. You only need to estimate either physical or logical storage, depending on the dataset's storage billing model.
  9. Click Add to Estimate.

For more information, see capacity-based pricing.

Editions

To estimate costs in the Google Cloud Pricing Calculator when using the capacity-based pricing model with BigQuery editions, follow these steps:

  1. Open the Google Cloud Pricing Calculator.
  2. Click BigQuery.
  3. Click the Editions tab.
  4. Choose the location where the slots are used.
  5. Choose your Edition.
  6. Choose the Maximum slots, Baseline slots, optional Commitment, and Estimated utilization of autoscaling.
  7. Choose the location where the data is stored.
  8. Enter your estimated storage usage for Active storage, Long-term storage, Streaming inserts, and Streaming reads. You only need to estimate either physical or logical storage, depending on the dataset's storage billing model.
  9. Click Add to Estimate.

For more information, see capacity-based pricing.