Estimate and control costs
This page describes how to estimate cost and lists best practices for controlling costs in BigQuery. BigQuery offers two types of pricing models, on-demand and capacity-based pricing. For information about pricing, see BigQuery pricing.
With BigQuery, you can estimate the cost of running a query, calculate the byte processed by various queries, and get a monthly cost estimate based on your projected usage. To control cost, you must also follow the best practices for optimizing query computation and BigQuery storage. For cost-specific best practices, see Control query costs.
To monitor query costs and BigQuery usage, analyze BigQuery audit logs.
Estimate query costs
BigQuery provides various methods to estimate cost:
- Use the query dry run option to estimate costs before running a query using the on-demand pricing model.
- Calculate the number of bytes processed by various types of query.
- Get the monthly cost based on projected usage by using the Google Cloud Pricing Calculator.
On-demand query size calculation
To calculate the number of bytes processed by the various types of query using the on-demand billing model, see the following sections:
Query columnar formats on Cloud Storage
If your external data is stored in ORC or Parquet, the number of bytes charged is limited to the columns that BigQuery reads. Because the data types from an external data source are converted to BigQuery data types by the query, the number of bytes read is computed based on the size of BigQuery data types. For information about data type conversions, see the following pages:
Use the Google Cloud Pricing Calculator
The Google Cloud Pricing Calculator can help you create an overall monthly cost estimate for BigQuery based on projected usage.
On-demand
To estimate costs in the Google Cloud Pricing Calculator when using the on-demand pricing model, follow these steps:
- Open the Google Cloud Pricing Calculator.
- Click BigQuery.
- Click the On-Demand tab.
- For Storage Pricing, enter the estimated size of the table in the storage fields. You only need to estimate either physical storage or logical storage, depending on the dataset storage billing model.
- For Query Pricing, enter the estimated bytes read from your dry run or the query validator.
- Click Add To Estimate.
- The estimate appears to the right. Notice that you can save or email the estimate.
For more information, see on-demand pricing.
Editions
To estimate costs in the Google Cloud Pricing Calculator when using the capacity-based pricing model with BigQuery editions, follow these steps:
- Open the Google Cloud Pricing Calculator.
- Click BigQuery.
- Click the Editions tab.
- Choose the location where the slots are used.
- Choose your Edition.
- Choose the Maximum slots, Baseline slots, optional Commitment, and Estimated utilization of autoscaling.
- Choose the location where the data is stored.
- Enter your estimations of storage usage for Active storage, Long-term storage, Streaming inserts, and Streaming reads. You only need to estimate either physical storage or logical storage, depending on the dataset storage billing model.
- Click Add to Estimate.
For more information, see capacity-based pricing.
Control query costs
To optimize query costs, ensure that you have optimized storage and query computation. For additional methods to control the query cost, see the following sections:
Check the query cost before running them
Best practice: Before running queries, preview them to estimate costs.
Queries are billed according to the number of bytes read. To estimate costs before running a query:
- Use the query validator in the Google Cloud console.
- Use the Google Cloud Pricing Calculator.
- Perform a dry run for queries.
Use the query validator
When you enter a query in the Google Cloud console, the query validator verifies the query syntax and provides an estimate of the number of bytes read. You can use this estimate to calculate query cost in the pricing calculator.
If your query is not valid, then the query validator displays an error message. For example:
Not found: Table myProject:myDataset.myTable was not found in location US
If your query is valid, then the query validator provides an estimate of the number of bytes required to process the query. For example:
This query will process 623.1 KiB when run.
Perform a dry run
To perform a dry run, do the following:
Console
Go to the BigQuery page.
Enter your query in the query editor.
If the query is valid, then a check mark automatically appears along with the amount of data that the query will process. If the query is invalid, then an exclamation point appears along with an error message.
bq
Enter a query like the following using the --dry_run
flag.
bq query \ --use_legacy_sql=false \ --dry_run \ 'SELECT COUNTRY, AIRPORT, IATA FROM `project_id`.dataset.airports LIMIT 1000'
For a valid query, the command produces the following response:
Query successfully validated. Assuming the tables are not modified, running this query will process 10918 bytes of data.
API
To perform a dry run by using the API, submit a query job with
dryRun
set to true
in the
JobConfiguration
type.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
Before trying this sample, follow the Node.js setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery Node.js API
reference documentation.
To authenticate to BigQuery, set up Application Default Credentials.
For more information, see
Set up authentication for client libraries.
PHP
Before trying this sample, follow the PHP setup instructions in the
BigQuery quickstart using
client libraries.
For more information, see the
BigQuery PHP API
reference documentation.
To authenticate to BigQuery, set up Application Default Credentials.
For more information, see
Set up authentication for client libraries.
Python
Set the
QueryJobConfig.dry_run
property to True
.
Client.query()
always returns a completed
QueryJob
when provided a dry run query configuration.
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Avoid running queries to explore table data
Best practice: Don't run queries to explore or preview table data.
If you are experimenting with or exploring your data, you can use table preview options to view data for free and without affecting quotas.
BigQuery supports the following data preview options:
- In the Google Cloud console, on the table details page, click the Preview tab to sample the data.
- In the bq command-line tool, use the
bq head
command and specify the number of rows to preview. - In the API, use
tabledata.list
to retrieve table data from a specified set of rows. - Avoid using
LIMIT
in non-clustered tables. For non-clustered tables, aLIMIT
clause won't reduce compute costs.
Restrict the number of bytes billed
Best practice: Use the maximum bytes billed setting to limit query costs.
You can limit the number of bytes billed for a query using the maximum bytes billed setting. When you set maximum bytes billed, the number of bytes that the query reads is estimated before the query execution. If the number of estimated bytes is beyond the limit, then the query fails without incurring a charge.
For clustered tables, the estimation of the number of bytes billed for a query is an upper bound, and can be higher than the actual number of bytes billed after running the query. So in some cases, if you set the maximum bytes billed, a query on a clustered table can fail, even though the actual bytes billed wouldn't exceed the maximum bytes billed setting.
If a query fails because of the maximum bytes billed setting, an error similar to following is returned:
Error: Query exceeded limit for bytes billed: 1000000. 10485760 or higher
required.
To set the maximum bytes billed:
Console
- In the Query editor, click More > Query settings > Advanced options.
- In the Maximum bytes billed field, enter an integer.
- Click Save.
bq
Use the bq query
command with the --maximum_bytes_billed
flag.
bq query --maximum_bytes_billed=1000000 \ --use_legacy_sql=false \ 'SELECT word FROM `bigquery-public-data`.samples.shakespeare'
API
Set the maximumBytesBilled
property in JobConfigurationQuery
or
QueryRequest
.
Avoid using LIMIT
in non-clustered tables
Best practice: For non-clustered tables, don't use a LIMIT
clause as a
method of cost control.
For non-clustered tables, applying a LIMIT
clause to a query doesn't affect
the amount of data that is read. You are billed for reading all bytes in the
entire table as indicated by the query, even though the query returns only a
subset. With a clustered table, a LIMIT
clause can reduce the number of bytes
scanned, because scanning stops when enough blocks are scanned to get the
result. You are billed for only the bytes that are scanned.
View costs using a dashboard and query your audit logs
Best practice: Create a dashboard to view your billing data so you can make adjustments to your BigQuery usage. Also consider streaming your audit logs to BigQuery so you can analyze usage patterns.
You can export your billing data to BigQuery and visualize it in a tool such as Looker Studio. For a tutorial about creating a billing dashboard, see Visualize Google Cloud billing using BigQuery and Looker Studio.
You can also stream your audit logs to BigQuery and analyze the logs for usage patterns such as query costs by user.
Materialize query results in stages
Best practice: If possible, materialize your query results in stages.
If you create a large, multi-stage query, each time you run it, BigQuery reads all the data that is required by the query. You are billed for all the data that is read each time the query is run.
Instead, break your query into stages where each stage materializes the query results by writing them to a destination table. Querying the smaller destination table reduces the amount of data that is read and lowers costs. The cost of storing the materialized results is much less than the cost of processing large amounts of data.
Use table expiration for destination tables
Best practice: If you are writing large query results to a destination table, use the default table expiration time to remove the data when it's no longer needed.
Keeping large result sets in BigQuery storage has a cost. If you don't need permanent access to the results, use the default table expiration to automatically delete the data for you.
For more information, see storage pricing.
What's next
- Learn about BigQuery pricing.
- Learn how to optimize query.
- Learn how to optimize storage.
To learn about billing, alerts, and visualizing data, see the following topics: