Tuning a query using the query plan visualizer

The query plan visualiser allows you to quickly understand the structure of the plan chosen by Cloud Spanner to evaluate a query. In this guide, we'll look at how to use this visual representation of a query plan to help you understand the execution of your queries.

Before you begin

To familiarise yourself with the parts of the Cloud Console user interface mentioned in this guide, read the following:

Running a query in Cloud Console

  1. Navigate to the Cloud Spanner Instances page in Cloud Console.

    Go to the Instances page

  2. Select the name of the instance containing the database you want to query.

    Cloud Console displays the instance's Overview page.

  3. Select the name of the database you want to query.

    Cloud Console displays the database's Overview page.

  4. In the side menu, click Query.

    Cloud Console displays the database's Query page.

  5. Enter the SQL query in the editor pane.
  6. Click Run.

    Cloud Spanner runs the query.

  7. Click the Explanation tab to see the query plan visualization.

A tour of the query editor

The Query page provides query tabs that let you type or paste SQL query and DML statements, run them against your database, and view their results and query execution plans. The key components of the Query page are numbered in the following screenshot.

Annotated query page.
Figure 7. Annotated Query page.
  1. The tab bar shows the query tabs you have open. To create a new tab, click New tab.

    The tab bar also provides a list of Query templates you can use to paste queries that provide insights about database queries, transactions, reads and more, as described in Overview of introspection tools.

  2. The editor commands bar provides these options:
    • The Run command executes the statements entered in the editing pane, producing query results in the Results tab and query execution plans in the Explanation tab. Change the default behavior using the drop down to produce Results only or Explanation only.

      Highlighting something in the editor changes the Run command to Run selected, allowing you to execute just what you have selected.

    • The Clear query command deletes all text in the editor and clears the Results and Explanation subtabs.
    • The Format query command formats statements in the editor so that they are easier to read.
    • The Shortcuts command displays the set of keyboard shortcuts you can use in the editor.
    • The SQL query help link opens a browser tab to documentation about SQL query syntax.

    Queries are validated automatically any time they are updated in the editor. If the statements are valid, the editor commands bar displays a confirmation check mark and the message Valid. If there are any issues, it displays an error message with details.

  3. The editor is where you enter SQL query and DML statements. They are color-coded and line numbers are automatically added for multi-line statements.

    If you enter more than one statement in the editor, you must use a terminating semicolon after each statement except the last one.

  4. The bottom pane of a query tab provides three subtabs:
    • The Schema subtab shows the tables in the database and their schemas. Use it as a quick reference when composing statements in the editor.
    • The Results subtab shows the results when you run the statements in the editor. For queries it shows a results table, and for DML statements like INSERT and >UPDATE it shows a message about how many rows were affected.
    • The Explanation subtab shows visual graphs of the query plans created when you run the statements in the editor.
  5. The Results and Explanation subtabs both provide a statement selector you use to choose which statement's results or query plan you want to view.

A tour of the query plan visualizer

The key components of the visualizer are annotated in the following screenshot and described in more detail. After running a query in a query tab, select the EXPLANATION tab below the query editor to open the query execution plan visualizer.

The dataflow in the following diagram is bottom-up, that is, all the tables and indexes are at the bottom of the diagram and the final output is at the top.

Annotated query plan visualizer
Figure 8. Annotated query plan visualizer.
  1. The visualization of your plan can be large, depending on the query you executed. To hide and show details toggle the EXPANDED/COMPACT view selector. You can customize how much of the plan you see at any one time using the zoom control.
  2. The algebra that explains how Cloud Spanner runs the query is drawn as an acyclic graph, where each node corresponds to an iterator that consumes rows from its inputs and produces rows to its parent. A sample plan is shown in Figure 9. Click on the diagram to see an expanded view of some of the details of the plan.

    Visual plan screenshot thumbnail
    Figure 9. Sample visual plan (Click to zoom in).
    Visual plan zoomed-in screenshot

    Each node, or card, on the graph represents an iterator and contains the following information:

    • The iterator name. An iterator consumes rows from its input and produces rows.
    • Runtime statistics telling you how many rows were returned, what the latency was, and how much CPU was consumed.
    • We provide the following visual cues to help you identify potential issues within the query execution plan.
    • Red bars in a node are visual indicators of the percentage of latency or CPU time for this iterator compared to the total for the query.
    • The thickness of lines connecting each node represents the row count. The thicker the line, the larger the number of rows passed to the next node. The actual number of rows is displayed in each card and when you hover over a connector.
    • A warning triangle is displayed on a node where a full table scan was performed. More details in the information panel include recommendations such as adding an index, or revising the query or schema in other ways if possible in order to avoid a full scan.
    • Select a card in the plan to see details in the information panel on the right (5).

  3. The execution plan mini-map shows a zoomed-out view of the full plan and is useful for determining overall shape of the execution plan and for navigating to different parts of the plan quickly. Drag directly on the mini-map or click where you'd like to focus, in order to navigate to another part of the visual plan.
  4. Select DOWNLOAD JSON to download a JSON version of the execution plan, which is useful when you contact the Cloud Spanner team for support.
  5. The information panel shows detailed contextual information about the selected node in the query plan diagram. The information is organized into the following categories.
    • Iterator information provides details, as well as runtime statistics, for the iterator card you selected in the graph.
    • Query summary provides details about the number of rows returned and the time it took to run the query. Prominent operators are those that exhibit significant latency, consume significant CPU relative to other operators and return significant numbers of data rows.
    • Query execution timeline is a time-based graph that shows how long each machine group was running its portion of the query. A machine group may not necessarily be running for the entire duration of the query's running time. It is also possible that a machine group ran multiple times during the course of running the query, but the timeline here only represents the start of the first time it ran and the end of the last time it ran.

Tuning a query that exhibits poor performance

Imagine your company runs an online movie database that contains information about movies such as cast, production companies, movie details, and more. The service runs on Cloud Spanner, but has been experiencing some performance issues lately.

As lead developer for the service, you are asked to investigate these performance issues because they are causing poor ratings for the service. You open the Cloud Console, navigate to your database instance and then open the query editor. You enter the following query into the editor and run it.

SELECT
  t.title,
  MIN(t.production_year) AS year,
  ANY_VALUE(mc.note HAVING MIN t.production_year) AS note
FROM
  title AS t
JOIN
  movie_companies AS mc
ON
  t.id = mc.movie_id
WHERE
  t.title LIKE '% the %'
GROUP BY
  title;

The result of running this query is shown in the following screenshot. We formatted the query in the editor by selecting FORMAT QUERY. There is also a note in the top right of the screen telling us that the query is valid.

Query editor displaying original query
Figure 1. Query editor displaying the original query.

The RESULTS tab below the query editor shows that the query completed in just over two minutes. You decide to look closer at the query to see whether the query is efficient.

Analyzing our slow query with the query plan visualizer

At this point, we know that the query in the preceding step takes over two minutes, but we don't know whether the query is as efficient as possible and, therefore, whether this duration is expected.

You select the EXPLANATION tab just below the query editor to view a visual representation of the execution plan that Cloud Spanner created to run the query and return results.

The plan shown in the following screenshot is relatively large but, even at this zoom level, you can make the following observations.

  • Based on the Query summary in the information panel on the right, we learn that nearly 3 million rows were scanned and under 64K were ultimately returned.

  • We can also see from the Query execution timeline panel that 4 machine groups were involved in the query. A machine group is responsible for the execution of a portion of the query. Operators may execute on one or more machines. Selecting a machine group in the timeline highlights on the visual plan what part of the query was executed on that group.

Query plan visualizer displaying visual explanation of the original query
Figure 2. Query plan visualizer showing the visual plan of the original query.

Because of these factors, you decide that an improvement in performance may be possible by changing the join from an apply join, which Cloud Spanner chose by default, to a hash join.

Improving the query

To improve the performance of the query, you use a join hint to change the join method to a hash join. This join implementation executes set-based processing.

Here's the updated query:

SELECT
  t.title,
  MIN(t.production_year) AS year,
  ANY_VALUE(mc.note HAVING MIN t.production_year) AS note
FROM
  title AS t
JOIN
  @{join_method=hash_join} movie_companies AS mc
ON
  t.id = mc.movie_id
WHERE
  t.title LIKE '% the %'
GROUP BY
  title;

The following screenshot illustrates the updated query. As shown in the screenshot, the query completed in less than 5 seconds, a significant improvement over 120 seconds runtime before this change.

Query editor displaying improved query
Figure 3. Query editor displaying the improved query.

Let's examine the new visual plan, shown in the following diagram, to see what it tells us about this improvement.

Query visualization in Cloud Console UI
Figure 4. Query plan visualization after the query improvements (Click to zoom in).

Visual plan zoomed-in screenshot

Immediately, you notice some differences:

  • Only one machine group was involved in this query execution.

  • The number of aggregations has been reduced dramatically.

Conclusion

In this scenario, we ran a slow query and looked at its visual plan to look for inefficiencies. The following is a summary of the queries and plans before and after any changes were made. Each tab shows the query that was run and a compact view of the full query execution plan visualization.

Before

SELECT
  t.title,
  MIN(t.production_year) AS year,
  ANY_VALUE(mc.note
  HAVING
    MIN t.production_year) AS note
FROM
  title AS t
JOIN
  movie_companies AS mc
ON
  t.id = mc.movie_id
WHERE
  t.title LIKE '% the %'
GROUP BY
  title;
Compact view of visual plan before improvements.
Figure 5. Compact view of the visual plan before improvements.

After

SELECT
  t.title,
  MIN(t.production_year) AS year,
  ANY_VALUE(mc.note
  HAVING
    MIN t.production_year) AS note
FROM
  title AS t
JOIN
  @{join_method=hash_join} movie_companies AS mc
ON
  t.id = mc.movie_id
WHERE
  t.title LIKE '% the %'
GROUP BY
  title;
Compact view of visual plan after improvements.
Figure 6. Compact view of the visual plan after improvements.

An indicator that something could be improved in this scenario was that a large proportion of the rows from the table title qualified the filter LIKE '% the %'. Seeking into another table with so many rows is likely to be expensive. Changing our join implementation to a hash join improved performance significantly.

What's Next