Query execution reference

This page explains the output of a query executed with Query Explain. To learn how to execute a query with Query Explain, see Analyze query execution with Query Explain.

Common Concepts

The following common concepts and terms are used throughout the execution tree.

Rows and records

The terms row and record are used to generically refer to a document or index entry.

Variables

$ denotes a variable, which is created or referenced in the execution tree. For example: $foo_1. These variables are typically used to refer to the contents of a document or the value of an expression evaluated during the execution of a query.

The following internal variables can appear in the execution nodes:

  • $__key__-the key is an internal identifier for a document. This is an absolute, unique identifier with the project, database, and the full path of the document.
  • $__id__-the ID is a unique identifier for a document within its collection. This is unique within a single collection.

Consider an example where a Compute node is used to compute the __id__ from the document __key__:

Compute
    |  $__id__1: _id($__key__)
    |  records returned: 1

Constraints and ranges

Some scan nodes use constraints and ranges attributes to describe the range of values that are scanned. These attributes use a range tree format which contains a list of values. These values correspond to the ordered list of keys which appear in the index definition. For example, the first range which appears in the tree, here (1..5], corresponds to the constraints on the first key, here a, in the ordered list of keys:

| index: type=CollectionGroupIndex, id=CICAgOjXh#EK, keys=[a ASC, b ASC, __key__ ASC]
| constraints: /
               |----(1..5]
                    |----[1L]

Each level of indentation indicates the constraint applying to the next key in the list. Square brackets represent an inclusive range, rounded brackets are an exclusive range. In this case, the constraint translates to 1 < "a" <= 5, and "b" = 1.

In the following example with multiple branches for a, the constraint corresponds to 1 < a <= 5 OR a = 10:

| constraints: /
               |----(1L, 5L]
               |----[10L]

Key Variables

In some scan nodes (such as SequentialScan), there is both a list of keys as part of the index attribute, and a separate keys attribute in the Scan node. The keys attribute in the Scan node denotes the variable name of each key in the index definition, in order. The variables can be used to reference the runtime values of the scanned field further up in the execution tree.

In the following example, the value of the user field for the current document maps to variable $user_1 and the value of date_placed to $date_placed_1.

index: type=CollectionGroupIndex, id=CICAgOjXh4EK, keys=[user ASC, date_placed ASC, __key__ ASC]
keys: [user ASC, date_placed ASC, __key__ ASC]

Execution Nodes

A query execution tree can contain the following nodes.

SeekingScan

Represents a dynamic scan where the rows returned may not be along a single sequential range of the index, and multiple distinct scans must be performed to satisfy the query.

For example, a query where a exists and b equals 1 working on an index of ["a" ASC, "b" ASC], would need to scan and return a separate, potentially non-sequential range for each distinct value of a. This is more efficient than a full TableScan, but less efficient than a single SequentialScan on a composite index of ["b" ASC, "a" ASC].

• SeekingScan
| constraints: /
               |----(-∞..+∞)
                    |----[1L]
| index: type=CollectionGroupIndex, id=CAE, keys=[user ASC, quantity ASC, __key__ ASC]
| keys: [user ASC, quantity ASC, __key__ ASC]
| properties: Selection { user }
| records returned: 1
| records scanned: 1

SequentialScan

Represents a scan of a static, sequential range of rows in storage that can be performed in a single read operation.

The key ordering length refers to the number of keys that must be preserved and returned in original key order. For a schema of [k1, k2, k3], a key ordering length of 0 means the scan can return in any order, 1 means order by k1, but rows with the same k1 value can come with any order, 3 returns documents in exact sorted order.

• SequentialScan
| index: type=CollectionGroupIndex, id=CAE, keys=[user ASC, date_placed ASC, __key__ ASC]
| key ordering length: 3
| keys: [user ASC, date_placed ASC, __key__ ASC]
| limit: 10
| properties: Selection { a }
| ranges: /
| records returned: 1
| records scanned: 1

UniqueScan

Represents a scan of a static, sequential range of rows in storage with in-memory deduplication of rows.

• UniqueScan
| index: type=CollectionGroupIndex, id=CAE, keys=[user ASC, date_placed ASC, __key__ ASC]
| keys: [user ASC, date_placed ASC, __key__ ASC]
| properties: Selection { a }
| ranges: /
          |----(-∞..+∞)
| records returned: 1
| records scanned: 1

TableAccess

Back-joins the supplied row's identifier to the actual row contents from primary storage. TableAccess is required if a parent node (or the final query result) requires a subset of fields from the documents.

• TableAccess
|  order: PRESERVE_INPUT_ORDER
|  peak memory usage: 4.00 KiB (4,096 B)
|  properties: *
|  records returned: 1

TableScan

A full, unordered scan of a collection. Used when a query is run without an associated index.

Order can be either STABLE or UNDEFINED, with STABLE denoting a deterministic ordering.

• TableScan
|  order: STABLE
|  properties: *
|  records returned: 1
|  records scanned: 1
|  source: (default)#/**/collection

HashAggregate

Hash-backed implementation of aggregate operations. Requires materializing the full group in-memory before returning the result and must not exceed the the query memory limit.

• HashAggregate
|  aggregations: [sum($b_1) AS total]
|  groups: [$a_1]
|  peak memory usage: 4.00 KiB (4,096 B)
|  records returned: 0

StreamAggregate

Specialized aggregate node which only maintains state for a single group at a time, reducing peak memory usage. Used when the underlying child node will return groups sequentially. For example, when grouping by distinct values of a field while using an index on that field.

• StreamAggregate
|  keys: [foo ASC, bar ASC]
|  properties: Selection { baz }
|  aggregations: [$sum($foo_1) AS baz]

MajorSort

Performs a sort operation on a fixed set of properties. Materializes all records in memory at once and returns the sorted values in order, the size of the sort set is limited by the query memory limit.

When a subsequent limit is provided, a top-k sorting algorithm is used to reduce the memory usage. With it, sorts can be performed on an arbitrarily large set of records so long as the memory used by storing the k considered elements does not exceed the limit.

• MajorSort
|  fields: [a ASC, b DESC]
|  limit: 10
|  peak memory usage: 4.00 KiB (4,096 B)
|  records returned: 1

Concat

Concatenates the results of multiple child nodes and returns the result to the parent node. This node does not deduplicate results that appear in multiple children, and the order of returned results is nondeterministic.

• Concat
├── • TableAccess
...
├── • TableAccess

Compute

Evaluates a set of expressions, assigning the results to a set of variables.

• Compute
|  $user_1: user
|  $full_name_1: str_concat($first_name_1, " ", $last_name_1)
|  $address_1: UNSET
|  records returned: 1

Filter

Selectively returns rows if and only if they match the supplied expression.

• Filter
|  expression: $eq(foo, "bar")
|  records returned: 1

Values

Produces a sequence of literal values to work on. Used primarily when a set list of documents is provided as the input to a query.

• Values
| expression: [{__key__=/col/1}, {__key__=/col/2}]

ReplaceWith

Replaces the fields of rows produced by the child node with the fields from the supplied map expression.

• ReplaceWith
|  map: map("full_name", str_concat(first_name, " ", last_name)), current_context())
|  records returned: 1

Unnest

Unnests the value produced by the child node.

• Unnest
|  expression: foo AS unnested_foo

Limit

Limits the number of rows returned to the parent node.

• Limit
|  limit: 10
|  records returned: 1

Offset

Skips a set number of rows produced by the child node.

• Offset
|  offset: 10
|  records returned: 1