Reliability: Query data
This document explains how to build reliable solutions with BigQuery and looks at how to reliably query data within your BigQuery environment.
When BigQuery runs a query job, it converts the submitted declarative SQL statement into a graph of execution, broken up into a series of query stages, which themselves are composed of more granular sets of execution steps. BigQuery uses a heavily distributed parallel architecture to run these queries, and the stages model the units of work that many potential workers can execute in parallel. The resources of these workers are called slots.
In the simplest of terms, a BigQuery slot is a virtual CPU used by BigQuery to execute SQL queries. For more information about slots and how they function, see Understanding slots.
BigQuery is able to achieve a 99.99% SLA for queries by providing redundant slot capacity in two separate zones. This distribution protects customers from zonal failures.
The compute resources that are used to run queries are purchased in either an on-demand or flat-rate pricing model. Under the on-demand model, compute resources are charged based on the number of bytes processed by the submitted queries. Under the flat-rate model, you purchase a dedicated amount of query processing capacity which provides a stable cost model.
On-demand analysis model
With BigQuery's on-demand pricing model, you are charged solely on usage based on the amount of bytes processed, also referred to as "bytes read," by the queries you submit. Projects that use the BigQuery on-demand pricing model have access to a default capacity of slots, which is typically more than sufficient, and are subject to per-project slot quotas.
Flat-rate slot model
Customers who prefer a stable cost for queries rather than paying for on-demand pricing model should want to consider the flat-rate pricing model. When enrolled in flat-rate pricing, you purchase dedicated query processing capacity, measured in BigQuery slots. Your queries consume this capacity, and you are not billed for bytes processed.
The flat-rate structure enables customers to purchase slots at different lengths of commitment time (yearly, monthly, or no commitment), prioritize slots to specified projects or folders with slot reservations, and share idle slots across multiple reservations. For more information, see Introduction to reservations.
The flat-rate model explicitly sets a maximum number of slot resources available to your folders or projects, so buying too few slots may impact the performance of your queries, while buying too many slots may introduce idle processing capacity, resulting in underutilized resources and unnecessary costs.
Considerations for flat-rate slot models
While cost considerations are often the biggest driving factor between choosing the on-demand model and the flat-rate model, this decision can also affect reliability. For example, scalability can differ between the on-demand pricing model, which provides a baseline of 2,000 available slots per project during peak load, and the flat-rate model which can be configured with up to hundreds of thousands of slots. The number of slots consumed by your project depends on query complexity, the amount of data scanned, use of special functions like user defined functions (UDFs), and the number of concurrent queries submitted. However the 2,000 slots available to on-demand customers is typically more than sufficient for users starting off with BigQuery. For more information about how to choose between the BigQuery on-demand and flat-rate models, see Choosing between BigQuery on-demand and flat rate pricing.
BigQuery's 99.99% availability SLA is applicable to both the on-demand and flat-rate models, but the number of slot resources is only guaranteed under the flat-rate model as the on-demand model uses a pool of shared slot resources. Note that BigQuery doesn't provide any guarantees on the availability of reserved slots until they have been purchased and provisioned. For this reason, Google does not recommend that you rely on acquiring reserved slots just-in-time for business critical workloads. The larger the slot request, the higher the likelihood of requiring lead time to provision slot resources.
It is also possible to combine both the on-demand and flat-rate model within your organization or project in order to best fit your needs. This is possible through BigQuery's concept of BigQuery Reservations. A good example of this might be using a flat-rate reservation for a production US multi-region dataset and on-demand capacity for a smaller development dataset based in an alternative region.
BigQuery Reservations allow you to customize your allocations of slot capacity across your organization, and through doing so, prioritize resources based on your workloads. For example, you might create a reservation named prod for production workloads, and a separate reservation named test for testing. That way, your test jobs won't compete for resources that your production workloads require. A reservation specifies the prioritization of slot resources to the assigned project, folder, or organization. Each level in the resource hierarchy inherits the assignment from the level above it, unless overridden. In other words, a project inherits the assignment of its parent folder, and a folder inherits the assignment of its organization.
BigQuery reservations optimize resource utilization by also sharing idle slot resources across other reservations by default. This allows for non-prioritized workloads like dev/test environments to benefit from better query performance during times where production reservations may not be fully utilized. In the event where a dev/test workload is consuming unused slots from a production reservation, and that production reservation now becomes fully utilized, the BigQuery Slot Scheduler will intelligently pull slot resources from the in-flight dev/test queries and make them available to the production reservation. The dev/test queries will continue to run, albeit slower as they have less compute resources to use.
Depending on the nature of your analytics needs, it often makes sense to create multiple BigQuery Reservations within your environment, by separating workloads based on business unit, function (prod vs dev/test), or application type (business intelligence dashboard vs scheduled queries/jobs vs ad-hoc queries from users).
Please note that idle slots are only shared within the reservations created within a single BigQuery Administration project.
Size a reservation
Accurately rightsizing a reservation is typically best done by monitoring current and historical BigQuery slot usage through either BigQuery Admin Resource Charts or Cloud Monitoring Dashboard. Admin Resource Charts provide real-time and historical data going back 14 days with details around slot usage, job concurrency, and job performance at the organization, folder, project, reservation, user, or job level.
Optimize job reliability
Two types of queries can be submitted to analyze data:
- Interactive queries: a submitted interactive query is immediately dispatched for execution.
- Batch queries: a submitted batch query is queued and started as soon as sufficient slot resources are available. If BigQuery hasn't started the query within 24 hours, BigQuery changes the job priority to interactive and immediately dispatches the query.
Both interactive and batch queries use the same slot resources. In fact after a batch query has been started by the BigQuery scheduler, there is no difference in priority between an interactive or batch query. However, interactive and batch queries do affect query limits and quotas differently. Batch queries don't count towards your concurrent rate limit, which can make it easier to start many queries at once. Your project can run up to 10 concurrent batch queries.
If the concurrent query limit is reached, further interactive queries will fail
quotaExceeded error. This quota is in place for your
protection, so as long as available slots haven't been fully consumed, this
quota can generally be increased by contacting
Customer Care or
However, high degrees of concurrently running queries reduce overall
availability of compute resources for each query and therefore can introduce
resource contention and degrade per-query processing throughput. As such, you
shouldn't raise the concurrent query quota beyond the point of slot resource
saturation. If concurrent query errors are encountered, you can retry your query
with exponential backoff logic to avoid negatively impacting query job latency.
Possible options to reduce the number of concurrent queries in your environment include:
- Run queries in dry run mode which estimates the number of bytes read but does not actually process the query.
- When experimenting or exploring data, rather than running queries themselves, preview table data with BigQuery's table preview capability.
- Use cached query results. All query results, including both interactive and batch queries, are cached in temporary tables for approximately 24 hours with some exceptions. While running a cached query does still count against your concurrent query limit, queries that use cached results are significantly faster because BigQuery does not need to compute the result set.
- Use BI Engine, BigQuery's fast in-memory analysis service. BI Engine lets you build interactive dashboards and reports with tools like Google Data Studio and SQL interface and offers improved concurrency for data stored in BigQuery. This is especially effective for large numbers of small queries.
Other considerations relating to query concurrency and job performance are job isolation and staggering jobs. Isolating query jobs by application use case across projects or reservations can reduce noisy neighbor concerns where a query consumes a large quantity of resources, which can negatively affect other queries' performance. This also reduces the likelihood of running into per-project concurrent query limits.
Staggering jobs, or the act of submitting multiple jobs in sequence rather than submitting multiple jobs simultaneously, may also benefit overall performance and reduce error rates. As an example, consider a set of five complex jobs. Let's assume running these five queries concurrently will consume a high number of slots and complete execution in an hour. Submitting the same five jobs in sequence may still execute in an hour, but will free up slot resources to be used by other queries and reduce the chance of running into concurrent query limits.
BigQuery uses fair scheduling to allocate resources among competing queries within a project or reservation. This means that every query has equal access to all available slots at any time, and capacity is dynamically and automatically re-allocated among active queries as each query's capacity demands change. For this reason, running business critical production and dev/test queries within the same project or reservation may not be ideal as each query has equal access to slots so non-critical queries may impact the performance of critical inflight queries. It is best practice to prioritize queries prior to query execution.
DML considerations for reliability optimization
BigQuery Data Manipulation Language (DML) statements enable you to update, insert, and delete data from your BigQuery tables and have their own guidelines around reliability.
- BigQuery is fully atomic, so each DML statement initiates an implicit transaction, which means that changes made by the statement are automatically committed at the end of each successful DML statement.
- DML UPDATE, DELETE, or MERGE statements cannot be used against a table, which has data in the write-optimized storage. Attempting to do so will result in a query failure.
- DML INSERT, UPDATE, DELETE, and MERGE statements do not have concurrent quota limits. However, once a certain threshold of concurrent jobs is reached, new jobs will be queued in a pending status.
- In the event of concurrently running mutating DML statements on the same table, BigQuery determines if DMLs can mutate the same backend files and only allows one DML to proceed, while it retries the others up to three times. If the subsequent three DML retries fail, the DML statement(s) will fail due to conflicts in the changes they attempted to make.
- While BigQuery fully supports DML UPDATE, DELETE, and MERGE operations, it is not intended to be a transactional OLTP-style database.
As a best practice, avoid submitting large numbers of individual DML row updates or insertions. Instead, group DML operations together when possible.
UDF considerations for reliability optimization
- UDFs typically consume more slot resources compared to standard SQL queries and may have an impact on job performance. If the function can be expressed in SQL, it is often more optimal to run the code as a standard SQL query job.
- UDFs are subject to their own quotas and concurrent query limits. For more information, see UDF limits.
Manage quotas and limits
As called out previously, query jobs are subject to a number of limits and quotas. These limits vary by query type as well as other factors, but the maximum query or script execution time limit is 6 hours. This limit cannot be changed. In some cases, queries can be retried. When this happens, the retried query can run for an additional six hours and can be retried up to three times.
Monitor query jobs
Monitoring BigQuery jobs is crucial for running reliable
applications. Monitoring can be accomplished through
BigQuery Admin Resource Charts,
Cloud Monitoring Dashboards,
The BigQuery Admin Resource Charts and Cloud Monitoring Dashboards allow for native monitoring of bytes queried, number of concurrent queries executed, slot consumption, query latency, etc. While the INFORMATION_SCHEMA tables provide additional metadata around job types, specific error messages, cache rates, job complexity, etc. Leveraging the INFORMATION_SCHEMA tables provides additional layers of customization like the below query which provides a list of the current queries that are either pending or currently running ordered by the length of time since they were created:
SELECT creation_time, project_id, user_email, job_id, job_type, priority, state, TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), start_time,second) as running_time_sec FROM region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP() AND state != "DONE" ORDER BY running_time_sec DESC
Use cases for querying data
In this use case, streaming data is being used to feed a real-time BI dashboard, which is being used by a large number of users who are running interactive queries.
In this use case, you should keep the BI dashboard queries isolated from other general purpose queries by using separate projects or reservations in order to avoid concerns with the BI dashboard becoming a noisy neighbor and impacting the performance of other general purpose query jobs. In addition, the monitoring of in-flight queries, expensive queries, and SQL queries that exhibit BigQuery anti-patterns should be achieved by reviewing INFORMATION_SCHEMA tables and corrected to avoid unnecessary spend and query latency.
Consider leveraging BI Engine if high numbers of concurrent query jobs are submitted against the BI dashboard or if query latency is a concern.
Consider creating custom cost control limits to limit the amount of data that can be processed by day per user or per table.
Batch data processing
In this use case, complex nightly batch jobs are processed to conduct thorough analysis on the entire day's worth of data and are joined with other data sources for data enrichment.
Similar to the guidance for real-time use cases, it is recommended to keep these complex batch queries isolated from other, general-purpose queries by using separate projects or reservations in order to avoid concerns with these jobs becoming a noisy neighbor and impacting the performance of other query jobs. In addition, long in-flight queries, expensive queries, and SQL queries that exhibit BigQuery anti-patterns should be monitored with INFORMATION_SCHEMA tables and corrected to avoid unnecessary spend and/or query latency.
Consider creating alerts in Cloud Monitoring to alert on things like low slot availability, high query times, and high bytes scanned in order to alert on underperforming jobs.
- Introduction to reliability
- Reliability: Importing data
- Reliability: Reading data
- Reliability: Disaster recovery