Understand slots

A BigQuery slot is a virtual CPU used by BigQuery to execute SQL queries. During query execution, BigQuery automatically calculates how many slots a query requires, depending on the query's size and complexity.

You have a choice of using an on-demand pricing model or a capacity-based pricing model. Both models use slots for data processing. With a capacity-based model, you can pay for dedicated or autoscaled query processing capacity. The capacity-based model gives you explicit control over slots and analytics capacity, whereas the on-demand model does not.

On the capacity-based pricing model, you explicitly choose how many slots to reserve. Your queries run within that capacity, and you pay for it continuously, for every second it's deployed. For example, if you purchase 2,000 BigQuery slots, your queries in aggregate are limited to using 2,000 virtual CPUs at any given time. You keep this capacity, and pay for all 2,000 slots, until you delete it.
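To make the per-second billing concrete, the following minimal sketch computes the slot-seconds that reserved capacity accrues while it stays deployed, whether or not queries use it. The function names and the price argument are illustrative assumptions, not part of any BigQuery API; supply the rate that applies to your own purchase.

```python
def capacity_slot_seconds(slots: int, seconds_deployed: int) -> int:
    """Slot-seconds accrued by capacity that stays deployed, used or not."""
    if slots < 0 or seconds_deployed < 0:
        raise ValueError("slots and seconds_deployed must be non-negative")
    return slots * seconds_deployed


def capacity_cost(slots: int, seconds_deployed: int, price_per_slot_hour: float) -> float:
    """Cost of the deployed capacity; the price is a caller-supplied placeholder."""
    return capacity_slot_seconds(slots, seconds_deployed) * price_per_slot_hour / 3600


# 2,000 slots deployed for one hour accrue 7,200,000 slot-seconds.
print(capacity_slot_seconds(2000, 3600))  # 7200000
```

Because the calculation depends only on deployed capacity and time, an idle hour costs the same as a busy one in this model.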

Projects on the BigQuery on-demand pricing model are subject to a per-project slot quota with transient burst capability. Most users on the on-demand model find the default slot capacity more than sufficient. Depending on the workload, access to more slots can improve query performance. To check how many slots your account uses, see BigQuery monitoring.

Estimate how many slots to purchase

BigQuery is architected to scale efficiently with increased resources. Depending on the workload, incremental capacity is likely to give you incremental benefits. Therefore, choosing the optimal number of slots to purchase depends on your requirements for performance, throughput, and utility.

You can experiment with baseline and autoscaling slots to determine the best configuration of slots. For example, you can test your workload with 500 baseline slots, then 1,000, then 1,500, and 2,000, and observe the impact on performance.

You can also examine the current slot usage of your projects and weigh it against the monthly price that you want to pay. On-demand workloads have a soft slot cap of 2,000 slots, but it is important to check how many slots your projects actually use by consulting the INFORMATION_SCHEMA.JOBS* views, Cloud Logging, the Jobs API, or BigQuery audit logs. For more information, see Visualizing slots available and slots allocated.

[Figure: Slot usage timeline.]
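One way to turn job history into a slot estimate is to divide the slot-milliseconds that jobs consumed by the length of the observation window. The sketch below assumes you have already exported per-job slot-millisecond totals (for example, a value such as the total_slot_ms column in the INFORMATION_SCHEMA.JOBS* views); the record layout and the function name are hypothetical.

```python
from typing import Iterable, Mapping


def average_slots(jobs: Iterable[Mapping[str, int]], window_ms: int) -> float:
    """Average number of slots in use over a window of window_ms milliseconds."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    total_slot_ms = sum(job["total_slot_ms"] for job in jobs)
    return total_slot_ms / window_ms


# Two jobs that together consumed 3.6 billion slot-ms during one hour.
jobs = [{"total_slot_ms": 1_800_000_000}, {"total_slot_ms": 1_800_000_000}]
print(average_slots(jobs, window_ms=3_600_000))  # 1000.0
```

An average hides short peaks, so a shorter window (per minute or per second) gives a better picture of the capacity needed at the busiest times.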

After you purchase slots and run your workloads for at least seven days, you can use the slot estimator to analyze performance and model the effect of adding or reducing slots. For more information, see Estimate slot capacity requirements.

Query execution using slots

When BigQuery executes a query job, it converts the declarative SQL statement into an execution graph, broken up into a series of query stages, which are themselves composed of more granular sets of execution steps. BigQuery uses a heavily distributed parallel architecture to run these queries, and the stages model the units of work that many potential workers may execute in parallel. Stages communicate with one another by using a fast distributed shuffle architecture, which is discussed in more detail on the Google Cloud blog.

BigQuery query execution is dynamic, which means that the query plan can be modified while a query is in flight. Stages that are introduced while a query is running are often used to improve data distribution throughout query workers.

BigQuery can run multiple stages concurrently. BigQuery can use speculative execution to accelerate a query, and BigQuery can dynamically repartition a stage to achieve optimal parallelization.

BigQuery slots execute individual units of work at each stage of the query. For example, if BigQuery determines that a stage's optimal parallelization factor is 10, it requests 10 slots to process that stage.

[Figure: A GoogleSQL query is a dynamic DAG.]

Query execution under slot resource economy

If a query requests more slots than currently available, BigQuery queues up individual units of work and waits for slots to become available. As progress on query execution is made, and as slots free up, these queued up units of work get dynamically picked up for execution.

BigQuery can request any number of slots for a particular stage of a query. The number of slots requested is not related to the amount of capacity you purchase; rather, it indicates the optimal parallelization factor that BigQuery chose for that stage. Units of work queue up and get executed as slots become available.

When query demands exceed the slots you committed to, you are not charged for additional slots, and you are not charged on-demand rates. Instead, your individual units of work queue up.

For example:

A query stage requests 2,000 slots, but only 1,000 are available.
BigQuery consumes all 1,000 slots and queues up the other 1,000 units of work.
Thereafter, if 100 slots finish their work, they dynamically pick up 100 units of work from the 1,000 queued up units of work. 900 units of queued up work remain.
Thereafter, if 500 slots finish their work, they dynamically pick up 500 units of work from the 900 queued up units of work. 400 units of queued up work remain.
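The sequence above can be reproduced with a small simulation. This is only an illustrative model of the queueing behavior described here, not BigQuery's scheduler; the class and method names are hypothetical.

```python
class SlotQueue:
    """Toy model: units of work run on free slots and queue up otherwise."""

    def __init__(self, available_slots: int) -> None:
        self.free = available_slots
        self.running = 0
        self.queued = 0

    def request(self, units: int) -> None:
        """A stage asks for `units` slots; whatever does not fit is queued."""
        started = min(units, self.free)
        self.free -= started
        self.running += started
        self.queued += units - started

    def finish(self, units: int) -> int:
        """`units` slots finish their work and pick up queued units of work."""
        if units > self.running:
            raise ValueError("cannot finish more units than are running")
        self.running -= units
        self.free += units
        picked = min(self.queued, self.free)
        self.queued -= picked
        self.free -= picked
        self.running += picked
        return picked


q = SlotQueue(available_slots=1000)
q.request(2000)
print(q.running, q.queued)  # 1000 1000
q.finish(100)
print(q.running, q.queued)  # 1000 900
q.finish(500)
print(q.running, q.queued)  # 1000 400
```

In this model all 1,000 slots stay busy for as long as queued work remains, which matches the behavior in the example.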

[Figure: BigQuery slots queue up if demand exceeds availability.]

Idle slots

At any given time, some slots might be idle. This can include:

Slot commitments that are not allocated to any reservation baseline.
Slots that are allocated to a reservation baseline but aren't in use.

By default, queries running in a reservation automatically use idle slots from other reservations within the same administration project. BigQuery immediately allocates slots to an assigned reservation when they are needed. Idle slots that were in use by another reservation are quickly preempted. There might be a short time when you see total slot consumption exceed the maximum you specified across all reservations, but you aren't charged for this additional slot usage.

For example, suppose you have the following reservation setup:

project_a is assigned to reservation_a, which has 500 baseline slots with no autoscaling.
project_b is assigned to reservation_b, which has 100 baseline slots with no autoscaling.
Both reservations are in the same administration project and there are no other projects assigned to these reservations.

You run query_b in project_b. If no query is running in project_a, then query_b has access to the 500 idle slots from reservation_a. While query_b is still running, it may use up to 600 slots: 100 baseline slots plus 500 idle slots.

While query_b is running, suppose you run query_a in project_a that can use 500 slots.

Since you have 500 baseline slots reserved for project_a, query_a immediately starts and is allocated 500 slots.
The number of slots allocated to query_b quickly decreases to 100 baseline slots.
Additional queries run in project_b share those 100 slots. If subsequent queries don't have enough slots to start, then they queue up until currently running queries complete and slots become available.

In this example, if project_b were assigned to a reservation with no baseline slots and no autoscaling, then query_b would have no slots after query_a starts running. BigQuery would pause query_b until idle slots are available or the query times out. Additional queries in project_b would queue up until idle slots are available.
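The reservation_a and reservation_b example can be checked with a short model. The sketch below is a simplification under stated assumptions: each reservation first uses its own baseline, and leftover idle slots are then handed to reservations with unmet demand in dictionary order. The text does not specify how idle slots are divided among several borrowers, so that ordering is an assumption, and the function name is hypothetical.

```python
from typing import Dict


def allocate_with_idle_sharing(baseline: Dict[str, int], demand: Dict[str, int]) -> Dict[str, int]:
    """Slots each reservation receives, given baselines and current demand."""
    # Step 1: every reservation is served from its own baseline first.
    alloc = {name: min(demand.get(name, 0), slots) for name, slots in baseline.items()}
    idle = sum(baseline[name] - alloc[name] for name in baseline)
    # Step 2: idle slots go to reservations that still have unmet demand.
    for name in baseline:
        extra = min(demand.get(name, 0) - alloc[name], idle)
        alloc[name] += extra
        idle -= extra
    return alloc


baseline = {"reservation_a": 500, "reservation_b": 100}

# Only query_b is running and could use 600 slots.
print(allocate_with_idle_sharing(baseline, {"reservation_b": 600}))
# {'reservation_a': 0, 'reservation_b': 600}

# query_a starts and needs its 500 baseline slots.
print(allocate_with_idle_sharing(baseline, {"reservation_a": 500, "reservation_b": 600}))
# {'reservation_a': 500, 'reservation_b': 100}
```

Setting the baseline of reservation_b to 0 in the second call leaves it with no slots, which corresponds to the paused-query case described above.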

To ensure a reservation only uses its provisioned slots, set ignore_idle_slots to true. Reservations with ignore_idle_slots set to true can, however, share their idle slots with other reservations.

You cannot share idle slots between reservations of different editions. You can share only the baseline slots or committed slots. Autoscaled slots might be temporarily available but are not shareable as idle slots for other reservations because they might scale down.

As long as ignore_idle_slots is false, a reservation can have a slot count of 0 and still have access to unused slots. If you use only the default reservation, toggle off ignore_idle_slots as a best practice. You can then assign a project or folder to that reservation and it will only use idle slots.
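The sharing rules in this section can be summarized as a small predicate-style model. This is a sketch under assumptions: the Reservation class, the borrowable_idle_slots function, and the edition labels are hypothetical placeholders, and only baseline slots are treated as shareable, following the statement that autoscaled slots are not shared.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reservation:
    name: str
    baseline: int
    edition: str                      # placeholder label, not a real edition name
    ignore_idle_slots: bool = False


def borrowable_idle_slots(borrower: Reservation, all_reservations: List[Reservation],
                          in_use: Dict[str, int]) -> int:
    """Idle baseline slots that `borrower` could use from other reservations."""
    if borrower.ignore_idle_slots:
        return 0  # the reservation is limited to its own provisioned slots
    total = 0
    for other in all_reservations:
        if other.name == borrower.name or other.edition != borrower.edition:
            continue  # idle slots are not shared across editions
        # A lender shares unused baseline even if it ignores idle slots itself.
        total += max(other.baseline - in_use.get(other.name, 0), 0)
    return total


a = Reservation("a", baseline=500, edition="edition_x")
b = Reservation("b", baseline=0, edition="edition_x")
c = Reservation("c", baseline=100, edition="edition_x", ignore_idle_slots=True)
d = Reservation("d", baseline=300, edition="edition_y")
everyone = [a, b, c, d]

print(borrowable_idle_slots(b, everyone, in_use={"a": 200}))  # 400
print(borrowable_idle_slots(c, everyone, in_use={"a": 200}))  # 0
```

Reservation b has a slot count of 0 but can still borrow 400 idle slots, while reservation c lends its 100 unused slots without borrowing any.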

Assignments of type ML_EXTERNAL are an exception in that slots used by BigQuery ML external model creation jobs are not preemptible. The slots in a reservation with both ML_EXTERNAL and QUERY assignment types are only available for other query jobs when the slots are not occupied by the ML_EXTERNAL jobs. Moreover, these jobs cannot use idle slots from other reservations.

Slot allocation within reservations

BigQuery allocates slot capacity within a single reservation using an algorithm called fair scheduling.

The BigQuery scheduler enforces the equal sharing of slots among projects with running queries within a reservation, and then within jobs of a given project. The scheduler provides eventual fairness. During short periods, some jobs might get a disproportionate share of slots, but the scheduler eventually corrects this. The goal of the scheduler is to find a balance between aggressively evicting running tasks (which results in wasting slot time) and being too lenient (which results in jobs with long running tasks getting a disproportionate share of the slot time).
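The two-level sharing described above can be sketched as follows. This is an idealized steady-state split for illustration only; the actual scheduler provides eventual fairness rather than an exact division at every instant, and the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def fair_shares(reservation_slots: int,
                jobs_by_project: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Split slots equally among active projects, then among each project's jobs."""
    active = {project: jobs for project, jobs in jobs_by_project.items() if jobs}
    shares: Dict[Tuple[str, str], float] = {}
    if not active:
        return shares
    per_project = reservation_slots / len(active)
    for project, jobs in active.items():
        for job in jobs:
            shares[(project, job)] = per_project / len(jobs)
    return shares


shares = fair_shares(1000, {"p1": ["j1"], "p2": ["j2", "j3", "j4", "j5"], "p3": []})
print(shares[("p1", "j1")])  # 500.0
print(shares[("p2", "j2")])  # 125.0
```

In this model the single job in p1 receives four times the share of each job in p2, because slots are divided by project before they are divided by job. Project p3 has no running queries and receives nothing.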

If an important job consistently needs more slots than it receives from the scheduler, consider creating an additional reservation with a guaranteed number of slots and assigning the job to that reservation.

Excess slot usage

When a job holds on to slots for too long, it can receive an unfair share of slots, as described in the preceding section. To prevent delays, other jobs can borrow additional slots, resulting in periods when total slot use exceeds your specified slot capacity. Any excess slot usage is attributed only to the jobs that receive more than their fair share.

The excess slots are not billed directly to you. Instead, jobs continue to run and accrue slot usage at their fair share until all of their excess usage is covered by your regular capacity. Excess slots are excluded from reported slot usage with the exception of certain detailed execution statistics.

Note that some preemptive borrowing of slots can occur to reduce future delays and to provide other benefits such as reduced slot cost variability and reduced tail latency. Slot borrowing is limited to a small fraction of your total slot capacity.

Fair scheduling in BigQuery

Slots are distributed fairly among projects and then among the jobs within each project. This means that every query has access to all available slots at any time, and capacity is dynamically and automatically re-allocated among active queries as each query's capacity demands change. As queries complete and new queries are submitted, capacity is re-allocated under the following conditions:

Whenever a new query is submitted, capacity is automatically re-allocated across executing queries. Individual units of work can be gracefully paused, resumed, and queued up as more capacity becomes available to each query.
Whenever a query completes, capacity consumed by that query automatically becomes immediately available for all other queries to use.
Whenever a query's capacity demands change due to changes in the query's dynamic DAG, BigQuery automatically re-evaluates capacity availability for this and all other queries, re-allocating and pausing slots as necessary.

[Figure: Fair scheduling in BigQuery.]

Depending on complexity and size, a query might not require all the slots it is entitled to, or it might require more. BigQuery dynamically ensures that, given fair scheduling, all slots can be fully used at any point in time.
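One way to picture how unused entitlement flows to queries that need more is max-min fairness (sometimes called water-filling). The sketch below uses that technique as an assumption for illustration; the source does not state that BigQuery implements this exact algorithm, and the function name and query labels are hypothetical.

```python
from typing import Dict


def max_min_allocate(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Give each query an equal share, then redistribute what small queries leave."""
    alloc = {query: 0.0 for query in demands}
    unsatisfied = [query for query, demand in demands.items() if demand > 0]
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        satisfied = []
        for query in unsatisfied:
            grant = min(share, demands[query] - alloc[query])
            alloc[query] += grant
            remaining -= grant
            if demands[query] - alloc[query] <= 1e-9:
                satisfied.append(query)
        if not satisfied:
            break  # every query took a full share, so capacity is used up
        unsatisfied = [query for query in unsatisfied if query not in satisfied]
    return alloc


# q1 needs less than its equal share; q2 and q3 split what it leaves unused.
result = max_min_allocate(1000, {"q1": 100, "q2": 2000, "q3": 600})
print({query: round(slots) for query, slots in result.items()})
# {'q1': 100, 'q2': 450, 'q3': 450}
```

All 1,000 slots are in use even though an equal three-way split would have left part of the share for q1 idle.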