Query Plan and Timeline

Embedded within query jobs, BigQuery includes diagnostic query plan and timing information. This is similar to the information provided by statements such as EXPLAIN in other database and analytical systems. This information can be retrieved from the API responses of methods such as jobs.get.

For long-running queries, BigQuery periodically updates these statistics. These updates happen independently of the rate at which the job status is polled, but typically no more frequently than every 30 seconds. Additionally, query jobs that do not consume execution resources, such as dry run requests or queries whose results can be served from the cache, do not include this additional diagnostic information, though other statistics may be present.
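
For example, here is a minimal sketch of such polling using the google-cloud-bigquery Python client (the SQL and the 30-second interval are illustrative assumptions); reload() re-fetches the job resource, which is the client-side equivalent of calling jobs.get:

import time

from google.cloud import bigquery

client = bigquery.Client()
# Start a query job; the SQL here is an arbitrary placeholder.
job = client.query("SELECT COUNT(1) FROM `publicdata.samples.shakespeare`")

# Refresh the job's statistics while it runs. BigQuery typically updates
# the diagnostic statistics no more than every 30 seconds, so polling
# faster than that will not surface new plan or timeline data.
while True:
    job.reload()  # equivalent to calling jobs.get on the job
    if job.query_plan:
        print("stages so far:", [stage.name for stage in job.query_plan])
    if job.state == "DONE":
        break
    time.sleep(30)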

Background

When BigQuery executes a query job, it converts the declarative SQL statement into a graph of execution, broken up into a series of query stages, which themselves are composed of more granular sets of execution steps. BigQuery leverages a heavily distributed parallel architecture to run these queries, and stages model the units of work that many potential workers may execute in parallel. Stages communicate with one another via a fast distributed shuffle architecture, which has been discussed in more detail elsewhere.

Within the query plan, the terms work units and workers are used, because the plan conveys information specifically about parallelism. Elsewhere within BigQuery you may encounter the term "slot", which is an abstracted representation of multiple facets of query execution, including compute, memory, and I/O resources. Top-level job statistics estimate the cost of an individual query through the totalSlotMs field, expressed in this abstracted accounting.

Another important property of the query execution architecture is that it is dynamic, which means that the query plan may be modified while a query is in flight. Stages that are introduced while a query is running are often used to improve data distribution throughout query workers. In query plans where this occurs, these are typically labeled as Repartition stages.

In addition to the query plan, query jobs also expose a timeline of execution, which provides an accounting of units of work completed, pending and active within query workers. A query may have multiple stages with active workers simultaneously, so the timeline is intended to show overall progress of the query.

Viewing information with the BigQuery web UI

If you use the BigQuery web UI, you can see details of the query plan for a completed query by clicking the Details button (to the right of the Results button).

Screenshot of the query plan

For long-running queries, you can view the query plan as it progresses by clicking the link within the query status line below the query composer pane.

Screenshot of the elapsed time status line

Query plan information

Within the API response, query plans are represented as a list of query stages, which expose per-stage overview statistics, detailed step information, and stage timing classifications. Not all details are rendered within the web UI, but they may be present within the API responses.

Stage overview

The overview fields for each stage may include the following:

id: Unique numeric ID for the stage.
name: Simple summary name for the stage. The steps within the stage provide additional details about execution.
status: Execution status of the stage. Possible states include PENDING, RUNNING, COMPLETE, FAILED, and CANCELLED.
inputStages: A list of the stage IDs that form the dependency graph of the stage. For example, a JOIN stage often needs two dependent stages that prepare the data on the left and right sides of the JOIN relationship.
startMs: Timestamp, in epoch milliseconds, of when the first worker within the stage began execution.
endMs: Timestamp, in epoch milliseconds, of when the last worker within the stage completed execution.
steps: More detailed list of execution steps within the stage. See the next section for more information.
recordsRead: Input size of the stage, as a number of records, across all stage workers.
recordsWritten: Output size of the stage, as a number of records, across all stage workers.
parallelInputs: Number of parallelizable units of work for the stage. Depending on the stage and query, this may represent the number of columnar segments within a table or the number of partitions within an intermediate shuffle.
completedParallelInputs: Number of units of work within the stage that were completed. For some queries, not all inputs within a stage need to be completed for the stage to complete.
shuffleOutputBytes: Total bytes written across all workers within the query stage.
shuffleOutputBytesSpilled: Queries that transmit significant data between stages may need to fall back to disk-based transmission; this statistic reports how much data was spilled to disk.
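
As a sketch of reading these fields programmatically, assuming the google-cloud-bigquery Python client (where the API fields above surface as snake_case properties on QueryPlanEntry) and the completed job object from the earlier sketch:

# Dump the per-stage overview for a completed query job.
for stage in job.query_plan:
    print(f"stage {stage.entry_id} ({stage.name}): {stage.status}")
    print(f"  inputs: {stage.completed_parallel_inputs}"
          f"/{stage.parallel_inputs} completed")
    print(f"  records: {stage.records_read} read, "
          f"{stage.records_written} written")
    print(f"  shuffle: {stage.shuffle_output_bytes} bytes, "
          f"{stage.shuffle_output_bytes_spilled} spilled to disk")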

Per-stage step information

Steps represent the more granular operations that each worker within a stage must execute, presented as an ordered list of operations. Steps are categorized, with some operations providing more detailed information. The operation categories present in the query plan include the following:

READ: A read of one or more columns from an input table or intermediate shuffle.
WRITE: A write of one or more columns to an output table or intermediate result. For HASH-partitioned outputs from a stage, this also includes the columns used as the partition key.
COMPUTE: Operations such as expression evaluation and SQL functions.
FILTER: Operator implementing the WHERE, OMIT IF, and HAVING clauses.
SORT: Sort or ORDER BY operation; includes the column keys and the sort direction.
AGGREGATE: An aggregation operation, such as GROUP BY or COUNT.
LIMIT: Operator implementing the LIMIT clause.
JOIN: A JOIN operation, which includes the join type and the columns used.
ANALYTIC_FUNCTION: An invocation of an analytic function (also known as a "window function").
USER_DEFINED_FUNCTION: A call to a user-defined function.
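
Continuing the hedged Python sketch from above, each stage's ordered step list can be printed alongside its substep details:

# Each QueryPlanEntry exposes its steps in execution order; `kind` is
# the category from the table above, and `substeps` holds the detailed
# operations (for example, "$1:corpus" for a column read).
for stage in job.query_plan:
    print(stage.name)
    for step in stage.steps:
        print(f"  {step.kind}: {'; '.join(step.substeps)}")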

Per-stage timing classification

The query stages also provide stage timing classifications, in both relative and absolute form. Because each stage of execution represents work undertaken by one or more independent workers, information is provided as both average and worst-case times, representing the average performance across all workers in a stage as well as the long-tail slowest worker's performance for a given classification. For the ratio-based statistics, the data is provided as a fraction of the longest time spent by any worker in any segment.

The web UI presents stage timing using the relative timing representations.

The stage timing information is reported as follows:

Relative timing / Absolute timing: Web UI*
waitRatioAvg / waitMsAvg: dark yellow bar
waitRatioMax / waitMsMax: yellow bar
readRatioAvg / readMsAvg: dark purple bar
readRatioMax / readMsMax: purple bar
computeRatioAvg / computeMsAvg: dark orange bar
computeRatioMax / computeMsMax: orange bar
writeRatioAvg / writeMsAvg: dark blue bar
writeRatioMax / writeMsMax: blue bar

* The labels 'AVG' and 'MAX' are for illustration only and do not appear in the web UI.
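
As a sketch of how these statistics might be used (again assuming the Python client and a completed job; the phase labels below are my own, not API values), you can rank the four phases by their worst-case relative timing within each stage:

# For each stage, find the phase in which the slowest worker spent the
# most time, using the max (long-tail) ratio statistics.
for stage in job.query_plan:
    phases = {
        "wait": stage.wait_ratio_max,
        "read": stage.read_ratio_max,
        "compute": stage.compute_ratio_max,
        "write": stage.write_ratio_max,
    }
    dominant = max(phases, key=phases.get)
    print(f"{stage.name}: slowest worker dominated by {dominant} "
          f"({phases[dominant]:.0%} of the longest segment)")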

Timeline Metadata

The query timeline reports progress at specific points in time, providing snapshot views of overall query progress. The timeline is represented as a series of samples, which report the following details:

elapsedMs: Milliseconds elapsed since the start of query execution.
totalSlotMs: Cumulative slot-milliseconds used by the query.
pendingUnits: Total units of work scheduled and waiting for execution.
activeUnits: Total units of work currently being processed by workers.
completedUnits: Total units of work that have been completed while executing this query.
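
A minimal sketch of reading these samples with the Python client (the fields surface as snake_case properties on TimelineEntry, with totalSlotMs exposed as slot_millis):

# Print each timeline sample as a snapshot of overall query progress.
for sample in job.timeline:
    print(f"t={sample.elapsed_ms}ms: {sample.completed_units} done, "
          f"{sample.active_units} active, {sample.pending_units} pending, "
          f"{sample.slot_millis} cumulative slot-ms")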

An Example Query

If you run a simple query that counts the number of rows in the Shakespeare public dataset, with a second conditional count that restricts results to rows that reference 'hamlet':

#standardSQL
SELECT
  COUNT(1) as rowcount,
  COUNTIF(corpus = 'hamlet') as rowcount_hamlet
FROM `publicdata.samples.shakespeare`
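
You can also run the same query programmatically; here is a minimal sketch with the google-cloud-bigquery Python client:

from google.cloud import bigquery

client = bigquery.Client()
job = client.query("""
    SELECT
      COUNT(1) AS rowcount,
      COUNTIF(corpus = 'hamlet') AS rowcount_hamlet
    FROM `publicdata.samples.shakespeare`
""")
for row in job.result():  # blocks until the query completes
    print(row.rowcount, row.rowcount_hamlet)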

You can click Details to see the following information about the query plan. First, let's examine the first section, which contains the query timeline:

Screenshot hamlet query timeline

In this example, we're dealing with a very small sample table and a simple query, so there are only two units of work in total, and all work completes almost immediately.

So, scanning further down, let's look at the query plan:

Screenshot of the hamlet query plan

In this example, the color indicators show the relative timings for all stages. From the parallel input information, we can see that each stage required only a single worker, so there is no variance between the average and slowest timings.

We can also see that, for this trivial query, the longest time in any segment was the time the single worker in Stage 01 spent waiting for Stage 00 to complete. This is because Stage 01 depended on Stage 00's output as its input, and could not start until the first stage wrote its output (1 row, ~18 bytes) into intermediate shuffle.

Now, let's examine the steps of our execution stages in more detail. To the left of the stage label, click the triangle to expand details for the stage:

Screenshot of the hamlet query plan step details

In this case, we can see the execution plan for the single worker that completed the work for Stage 00. First, data was READ from the "corpus" column of the referenced shakespeare table. Next, AGGREGATIONS were established for the COUNT and COUNTIF projections. Scanning the data required a COMPUTE step, which provided data for both the normal and conditional counts, and the output was written to intermediate shuffle output, labeled as __stage00_output in this plan.

Error reporting

It is possible for query jobs to fail mid-execution. Because plan information is updated periodically, you can observe where within the execution graph the failure occurred. Within the UI, successful and failed stages are labeled with a checkmark or an exclamation point, respectively, next to the stage name.

For further information about interpreting and addressing errors, please see the troubleshooting guide.

API Sample Representation

Query plan information is embedded automatically in the job response without the need to call additional methods; it can be retrieved simply by calling jobs.get for the job details. For example, the following is an excerpt of the JSON response for the sample hamlet query job, showing both the query plan and timeline information.

"statistics": {
  "query": {
    "cacheHit": false,
    "queryPlan": [
      {
        "completedParallelInputs": "1",
        "computeMsAvg": "25",
        "computeMsMax": "25",
        "computeRatioAvg": 0.17857142857142858,
        "computeRatioMax": 0.17857142857142858,
        "endMs": "1522787349945",
        "id": "0",
        "name": "S00: Input",
        "parallelInputs": "1",
        "readMsAvg": "28",
        "readMsMax": "28",
        "readRatioAvg": 0.2,
        "readRatioMax": 0.2,
        "recordsRead": "164656",
        "recordsWritten": "1",
        "shuffleOutputBytes": "18",
        "shuffleOutputBytesSpilled": "0",
        "startMs": "1522787349898",
        "status": "COMPLETE",
        "steps": [
          {
            "kind": "READ",
            "substeps": [
              "$1:corpus",
              "FROM publicdata.samples.shakespeare"
            ]
          },
          {
            "kind": "AGGREGATE",
            "substeps": [
              "$20 := COUNT($30)",
              "$21 := COUNTIF($31)"
            ]
          },
          {
            "kind": "COMPUTE",
            "substeps": [
              "$30 := 1",
              "$31 := equal($1, 'hamlet')"
            ]
          },
          {
            "kind": "WRITE",
            "substeps": [
              "$20, $21",
              "TO __stage00_output"
            ]
          }
        ],
        "waitMsAvg": "0",
        "waitMsMax": "0",
        "waitRatioAvg": 0.0,
        "waitRatioMax": 0.0,
        "writeMsAvg": "5",
        "writeMsMax": "5",
        "writeRatioAvg": 0.03571428571428571,
        "writeRatioMax": 0.03571428571428571
      },
      {
        "completedParallelInputs": "1",
        "computeMsAvg": "14",
        "computeMsMax": "14",
        "computeRatioAvg": 0.1,
        "computeRatioMax": 0.1,
        "endMs": "1522787350180",
        "id": "1",
        "inputStages": [
          "0"
        ],
        "name": "S01: Output",
        "parallelInputs": "1",
        "readMsAvg": "0",
        "readMsMax": "0",
        "readRatioAvg": 0.0,
        "readRatioMax": 0.0,
        "recordsRead": "1",
        "recordsWritten": "1",
        "shuffleOutputBytes": "16",
        "shuffleOutputBytesSpilled": "0",
        "startMs": "1522787350038",
        "status": "COMPLETE",
        "steps": [
          {
            "kind": "READ",
            "substeps": [
              "$20, $21",
              "FROM __stage00_output"
            ]
          },
          {
            "kind": "AGGREGATE",
            "substeps": [
              "$10 := SUM_OF_COUNTS($20)",
              "$11 := SUM_OF_COUNTS($21)"
            ]
          },
          {
            "kind": "WRITE",
            "substeps": [
              "$10, $11",
              "TO __stage01_output"
            ]
          }
        ],
        "waitMsAvg": "140",
        "waitMsMax": "140",
        "waitRatioAvg": 1.0,
        "waitRatioMax": 1.0,
        "writeMsAvg": "129",
        "writeMsMax": "129",
        "writeRatioAvg": 0.9214285714285714,
        "writeRatioMax": 0.9214285714285714
      }
    ],
    "referencedTables": [
      {
        "datasetId": "samples",
        "projectId": "publicdata",
        "tableId": "shakespeare"
      }
    ],
    "statementType": "SELECT",
    "timeline": [
      {
        "activeUnits": "0",
        "completedUnits": "2",
        "elapsedMs": "999",
        "pendingUnits": "0",
        "totalSlotMs": "185"
      },
      {
        "activeUnits": "0",
        "completedUnits": "2",
        "elapsedMs": "1197",
        "pendingUnits": "0",
        "totalSlotMs": "185"
      }
    ],
    "totalBytesBilled": "10485760",
    "totalBytesProcessed": "2464625",
    "totalPartitionsProcessed": "0",
    "totalSlotMs": "127"
  },
  "totalBytesProcessed": "2464625"
},
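
To retrieve this structure for an existing job, here is a hedged sketch with the Python client (the job ID and location below are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
# get_job wraps the jobs.get API method; substitute a real job ID.
job = client.get_job("my_job_id", location="US")
print("cache hit:", job.cache_hit)
print("total bytes processed:", job.total_bytes_processed)
print("total slot-ms:", job.slot_millis)
print("stages:", [stage.name for stage in job.query_plan])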

Using execution information

BigQuery query plans provide information about how the service executes queries, but the managed nature of the service limits whether some details are directly actionable. Many optimizations happen automatically simply by using the service, in contrast to other environments where tuning, provisioning, and monitoring may require dedicated, knowledgeable staff.

For specific techniques that can improve query execution and performance, see the best practices documentation. The query plan and timeline statistics can help you understand if certain stages dominate resource utilization. For example, a JOIN stage that generates far more output rows than input rows may indicate an opportunity to filter earlier in the query.
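
As a rough sketch of such a check, continuing the Python examples above (the 10x expansion threshold is an arbitrary assumption, not a BigQuery guideline):

# Flag stages whose output row count far exceeds their input row count,
# which for a JOIN stage can hint at a filter that could run earlier.
for stage in job.query_plan:
    if not stage.records_read or not stage.records_written:
        continue
    if stage.records_written > 10 * stage.records_read:
        print(f"{stage.name}: {stage.records_read} rows in -> "
              f"{stage.records_written} rows out; consider filtering earlier")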

Additionally, timeline information can help you identify whether a given query is slow in isolation or slowed by contention with other queries for the same resources. If you observe that the number of active units remains limited throughout the lifetime of the query while the number of pending units of work remains high, this may represent a case where reducing the number of concurrent queries significantly improves overall execution time.
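
A sketch of such a contention check over the timeline samples (the thresholds are illustrative assumptions):

# Flag samples where work is queued but few units are active, a possible
# sign of contention with other concurrent queries for the same slots.
for sample in job.timeline:
    if sample.pending_units > 100 and sample.active_units < 10:
        print(f"t={sample.elapsed_ms}ms: {sample.pending_units} pending "
              f"units vs {sample.active_units} active")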
