Use Query insights to improve query performance

Stay organized with collections Save and categorize content based on your preferences.

This page describes how to use the Query insights dashboard to detect and analyze performance problems.

Introduction

Query insights helps you detect, diagnose, and prevent query performance problems for Cloud SQL databases. It supports intuitive monitoring and provides diagnostic information that helps you go beyond detection to identify the root cause of performance problems.

With Query insights, you can monitor performance at an application level and trace the source of a problematic query across the application stack by model, view, controller, route, user, and host. The Query insights tool can integrate with your existing application monitoring (APM) tools and Google Cloud services by using open standards and APIs. This way, you can monitor and troubleshoot query problems by using your favorite tool.

Query insights helps you improve Cloud SQL query performance by guiding you through the following steps:

  1. View the database load for top queries.
  2. Identify a potentially problematic query or tag.
  3. Examine the query or tag to identify issues.
  4. Trace the source of the problem.

Query insights is supported on all Cloud SQL machine types and available in all Google Cloud regions.

Pricing

There's no additional cost for Query insights. You can access one week of data on the Query insights dashboard.

Query insights doesn't occupy any storage space in your Cloud SQL instance storage space. Metrics are stored in Cloud Monitoring. For API requests, see the Cloud Monitoring Pricing. Cloud Monitoring has a tier that you can use at no additional cost.

Before you begin

To view a query plan or perform end-to-end tracing, you need specific IAM permissions. Create a custom role and add the cloudtrace.traces.get IAM permission to it. Then, add this role to each user account that needs to use Query insights.

Enable Query insights

Query insights metrics are encrypted at rest. Users who have access to the Cloud SQL dashboard can access Query insights metrics on the Query insights dashboard. If you have permission to update instances, you can configure Query insights. For a list of permissions required for Cloud SQL instances, see Cloud SQL project access control.

Console

To enable Query insights for a Cloud SQL instance by using the Google Cloud console, follow these steps:

  1. In the Google Cloud console, go to the Cloud SQL Instances page.

    Go to Cloud SQL Instances

  2. To open the Overview page of an instance, click the instance name.
  3. On the Configuration tile, click Edit configuration.
  4. In the Configuration options section, expand Query insights.
  5. Select the Enable Query insights checkbox.
  6. Optional: Select one or more of the following Query insights options:

    Store client IP addresses

    Stores the client IP addresses where queries are coming from and helps you group that data to run metrics against it. Queries come from more than one host. Reviewing graphs for queries from client IP addresses can help identify the source of a problem.

    Store application tags

    Stores application tags that help you determine the APIs and model-view-controller (MVC) routes that are making requests and group the data to run metrics against it. This option requires you to comment queries with a specific set of tags using the sqlcommenter open source object-relational mapping (ORM) auto-instrumentation library. This information helps Query insights identify the source of a problem and the MVC the problem is coming from. Application paths help you with application monitoring.

    Customize query lengths

    Sets the default query length limit to a specified value from 256 bytes to 4500 bytes. The default query length is 1024 bytes. Higher query lengths are more useful for analytical queries, but they also require more memory. Changing the query length requires you to restart the instance. You can still add tags to queries that exceed the length limit.

    Set the maximum sampling rate

    By default, a maximum of 5 executed query plan samples are captured per minute across all databases on the instance. Change this value to a number from 0 (to disable sampling) to 20. Increasing the sampling rate is likely to give you more data points but might add a performance overhead.

  7. Click Save.

gcloud

To enable Query insights for a Cloud SQL instance by using gcloud, run gcloud sql instances patch with the --insights-config-query-insights-enabled flag as follows after replacing INSTANCE_ID with the ID of the instance.

gcloud sql instances patch INSTANCE_ID \
--insights-config-query-insights-enabled
  

Also, use one or more of the following optional flags:

  • --insights-config-record-client-address

    Stores the client IP addresses where queries are coming from and helps you group that data to run metrics against it. Queries come from more than one host. Reviewing graphs for queries from client IP addresses can help identify the source of a problem.

  • --insights-config-record-application-tags

    Stores application tags that help you determine the APIs and model-view-controller (MVC) routes that are making requests and group the data to run metrics against it. This option requires you to comment queries with a specific set of tags. You can do this by using the sqlcommenter open source object-relational mapping (ORM) auto-instrumentation library. This information helps Query insights identify the source of a problem and the MVC the problem is coming from. Application paths help you with application monitoring.

  • --insights-config-query-string-length

    Sets the default query length limit to a specified value from 256 to 4500 bytes. The default query length is 1024 bytes. Higher query lengths are more useful for analytical queries, but they also require more memory. Changing the query length requires you to restart the instance. You can still add tags to queries that exceed the length limit.

  • --query_plans_per_minute

    By default, a maximum of 5 executed query plan samples are captured per minute across all databases on the instance. Change this value to a number from 0 (to disable sampling) to 20. Increasing the sampling rate is likely to give you more data points but might add a performance overhead.

Replace the following:

  • INSIGHTS_CONFIG_QUERY_STRING_LENGTH: The query string length to be stored, in bytes.
  • API_TIER_STRING: The custom instance configuration to use for the instance.
  • REGION: The region for the instance.
gcloud sql instances patch INSTANCE_ID \
--insights-config-query-insights-enabled \
--insights-config-query-string-length=INSIGHTS_CONFIG_QUERY_STRING_LENGTH \
--query_plans_per_minute=QUERY_PLANS_PER_MINUTE \
--insights-config-record-application-tags \
--insights-config-record-client-address \
--tier=API_TIER_STRING \
--region=REGION
  

REST v1

To enable Query insights for a Cloud SQL instance by using the REST API, call the instances.patch method with insightsConfig settings.

Before using any of the request data, make the following replacements:

  • project-id: The project ID.
  • instance-id: The instance ID.

HTTP method and URL:

PATCH https://sqladmin.googleapis.com/sql/v1beta4/projects/project-id/instances/instance-id

Request JSON body:

{
  "settings" : { "insightsConfig" : { "queryInsightsEnabled" : true } }
}

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

{
  "kind": "sql#operation",
  "targetLink": "https://sqladmin.googleapis.com/sql/v1beta4/projects/project-id/instances/instance-id",
  "status": "PENDING",
  "user": "user@example.com",
  "insertTime": "2021-01-28T22:43:40.009Z",
  "operationType": "UPDATE",
  "name": "operation-id",
  "targetId": "instance-id",
  "selfLink": "https://sqladmin.googleapis.com/sql/v1beta4/projects/project-id/operations/operation-id",
  "targetProject": "project-id"
}

Metrics are expected to be available in Query insights within minutes of query completion. Review the Cloud Monitoring data retention policy. Query insights traces are stored in Cloud Trace. Review the Cloud Trace data retention policy.

View the Query insights dashboard

The Query insights dashboard shows the query load based on factors that you select. Query load is a measurement of the total work for all the queries in the instance in the selected time range. The dashboard provides a series of filters that help you view query load.

To open the Query insights dashboard, follow these steps:

  1. To open the Overview page of an instance, click the instance name.
  2. Either select the Query insights tab in the left navigation panel or click the Go to Query insights for more in-depth info on queries and performance link.

The Query insights dashboard opens. It shows details about the instance at the top.

Shows the Query insights dashboard, with drop-down menus
         for databases, users, and addresses. To the right of the drop-down
         menus, there's a filter for setting a time range. Also,
         a graph shows the database load for top queries. At the bottom
         of the graph, there are selection boxes for CPU capacity, CPU and
         CPU wait, IO Wait, and Lock Wait and a tab each for queries and tags.

The areas of the dashboard include:

  1. Databases: Filters query load on a specific database or all databases.
  2. User: Filters query load from a specific user account.
  3. Client address: Filters query load from a specific IP address.
  4. Time range: Filters query load by time ranges, such as hour, day, week, or a custom range.
  5. Database load graph: Displays the query load graph, based on filtered data.
  6. CPU capacity, CPU and CPU wait, IO wait, and Lock wait: Filters loads based on the options that you select. See View the database load for top queries for details about each of these filters.
  7. Queries and tags. Filters query load by either a selected query or a selected SQL query tag. See Filter the database load.

View the database load for all queries

Database query load is a measure of the work (in CPU seconds) that the executed queries in your selected database perform over time. Each running query is either using or waiting for CPU resources, IO resources, or lock resources. Database query load is the ratio of the amount of time taken by all the queries that are completed in a given time window to the wall-clock time.

The top-level Query insights dashboard shows the Database load — all top queries graph. Drop-down menus on the dashboard let you filter the graph for a specific database, user, or client address.

Shows the database load graph with a load for CPU capacity, CPU and
         CPU wait, IO Wait, and Lock Wait.

Colored lines in the graph show the query load, split into four categories:

  • CPU capacity: The number of CPUs available on the instance.
  • CPU and CPU Wait: The ratio of the time taken by queries in an active state to wall-clock time. IO and Lock waits don't block queries that are in an active state. This metric might mean that the query is either using the CPU or waiting for the Linux scheduler to schedule the server process running the query while other processes are using the CPU.

  • IO Wait: The ratio of time taken by queries that are waiting for IO to wall-clock time. IO wait includes Read IO Wait and Write IO Wait.

    See the PostgreSQL event table.

    If you want a breakdown of information for IO waits, you can see it in Cloud Monitoring. See Cloud SQL metrics for more information.

  • Lock Wait: The ratio of time taken by queries that are waiting for Locks to wall-clock time. It includes Lock Waits, LwLock Waits, and Buffer pin Lock waits. To see a breakdown of information for lock waits, use Cloud Monitoring. See Cloud SQL metrics for more information.

Review the graph and use the filtering options to explore these questions:

  1. Is the query load high? Is the graph spiking or elevated over time? If you don't see a high load, then the problem isn't with your query.
  2. How long has the load been high? Is it high only now or has it been high for a long time? Use the range selector to select various time periods to find out how long the problem has lasted. Zoom in to view a time window where query load spikes are observed. Zoom out to view up to one week of the timeline.
  3. What's causing the high load? You can select options to examine the CPU capacity, CPU and CPU wait, Lock wait, or IO wait. The graph for each of these options is a different color so that you can easily spot the one with the highest load. The dark blue line on the graph shows the maximum CPU capacity of the system. It lets you compare the query load with the maximum CPU system capacity. This comparison helps you determine whether an instance is running out of CPU resources.
  4. Which database is experiencing the load? Select different databases from the Databases drop-down menu to find the databases with the highest loads.
  5. Do specific users or IP addresses cause higher loads? Select different users and addresses from the drop-down menus to identify the ones that are causing higher loads.

Filter the database load

You can filter the database load by queries or tags.

Filter by queries

The Queries table provides an overview of the queries that cause the most query load. The table shows all the normalized queries for the time window and options selected on the Query insights dashboard. It sorts queries by the total execution time during the time window that you selected.

Shows the database load graph with a load for queries, with
         filters selected for CPU capacity, CPU and
         CPU wait, IO Wait, and Lock Wait.

To sort the table, select a column heading or a property from Filter queries. The table shows the following properties:

  • Query: The normalized query string. Query insights shows only 1024 characters in the query string by default.

    Queries labeled UTILITY COMMAND usually include BEGIN, COMMIT, and EXPLAIN commands or wrapper commands.

  • Database: The database against which the query was run.

  • Load by total time/Load by CPU/Load by IO wait/Load by lock wait: The options by which you can filter specific queries to find the largest load.

  • Avg execution time (ms): The average time for the query to execute.

  • Times called: The number of times the application called the query.

  • Avg rows returned: The average number of rows returned for the query.

Query insights stores and displays only normalized queries. By default, Query insights doesn't collect IP addresses or tag information. You can enable Query insights to collect this information and, when required, disable collection. Query plan traces don't collect or store any constant values and removes any PII information that the constant might show.

For PostgreSQL 9.6 and 10, Query insights displays normalized queries, that is, ? replaces the literal constant value. In the following example, the name constant is removed and ? replaces it.

UPDATE
  "demo_customer"
SET
  "customer_id" = ?::uuid,
  "name" = ?,
  "address" = ?,
  "rating" = ?,
  "balance" = ?,
  "current_city" = ?,
  "current_location" = ?
WHERE
  "demo_customer"."id" = ?

For PostgreSQL version 11 and later, $1, $2, and so on, replace literal constant values.

UPDATE
  "demo_customer"
SET
  "customer_id" = $1::uuid,
  "name" = $2,
  "address" = $3,
  "rating" = $4,
  "balance" = $5,
  "current_city" = $6,
  "current_location" = $7
WHERE
  "demo_customer"."id" = $8

Filter by query tags

To troubleshoot an application, you must first add tags to your SQL queries. Query load tags provide a breakdown of the query load of the selected tag over time.

Query insights provides application-centric monitoring to diagnose performance problems for applications built using ORMs. If you're responsible for the entire application stack, Query insights provides query monitoring from an application view. Query tagging helps you find issues at higher-level constructs, such as with the business logic or a microservice.

You might tag queries by the business logic, for example, the payment, inventory, business analytics, or shipping tags. You can then find the query load that the various business logics create. For example, you might observe unexpected events, such as spikes for a business analytics tag at 1 PM or abnormal growth for a payment service trending over the previous week.

To calculate the Database load for tag, Query insights uses the amount of time taken by every query that uses the tag that you select. The tool calculates the completion time at the minute boundary by using wall-clock time.

On the Query insights dashboard, to view the tags table, select Tags. The table sorts tags by their total load by total time.

Shows the Query insights dashboard, with load for tags and a list of tags.

You can sort the table by selecting a property from Filter tags, or by clicking a column heading. The table shows the following properties:

  • Action, Controller, Framework, Route, Application, DB driver: Each property that you added to your queries appears as a column. At least one of these properties must be added if you want to filter by tags.
  • Load by total time/Load by CPU/Load by IO wait/Load by lock wait: Options to filter specific queries to find the largest load for each option.
  • Avg execution time (ms): The average time for the query to run.
  • Avg rows returned: The average number of rows returned for the query.
  • Times called: The number of times the application called the query.
  • Database: The database against which the query was run.

Examine a specific query or tag

To determine whether a query or a tag is the root cause of the problem, do the following from the Queries tab or Tags tab, respectively:

  1. To sort the list in descending order, click the Load by total time header.
  2. Click the query or tag at the top of the list. It has the highest load and is taking more time than the others.

A dashboard opens showing the details of the selected query or tag.

Examine a specific query load

The dashboard for a selected query appears as follows:

Shows the database load and latency graphs for a
         specific query.

The Database load — specific query graph shows a measure of the work (in CPU seconds) that your normalized query has performed in your selected query over time. To calculate load, it uses the amount of time taken by the normalized queries that are completed at the minute boundary to the wall-clock time. At the top of the table, the first 1024 characters of the normalized query, with literals removed for aggregation and PII reasons, are displayed.

Shows the database load graph with a load for a specific query, with
         filters selected for CPU capacity, CPU and CPU wait, IO wait, and Lock
         wait.

As with the total queries graph, you can filter the load for a specific query by Database, User, and Client address. Query load is split into CPU capacity, CPU and CPU wait, IO wait, and Lock wait.

Examine a specific tagged query load

The dashboard for a selected tag appears as follows. For example, if all queries from a microservices payment are tagged as payment, you can see the amount of query load that's trending by viewing the tag payment.

Shows the database load and latency graphs on the page for a
         specific tag.

The Database load — specific tags graph shows a measure of the work (in CPU seconds) that queries matching your selected tags have performed in your selected database over time. As with the total queries graph, you can filter the load for a specific tag by Database, User, and Client address.

Examine operations in a sampled query plan

A query plan takes a sample of your query and breaks it down into individual operations. It explains and analyzes each operation in the query.

The Query plan samples graph shows all the query plans running at particular times and the amount of time each plan took to run. You can change the rate at which query plan samples are captured per minute. See Enable Query insights.

A graph for sample query plans, with the time when they
         were run at the bottom of the graph (x axis) and the number of seconds
         they ran on the right (y axis).

By default, the panel on the right shows the details for the sample query plan that takes the longest time, as visible on the Query plan samples graph. To see the details for another sample query plan, click the relevant circle on the graph. Expanded details show a model of all the operations in the query plan. Each operation shows the latency, rows returned, and the cost of the operation. When you select an operation, you can see more details, such as shared hit blocks, the type of schema, loops, and plan rows.

The query plan shows the latency and cost for each operation
         run for the query. It starts with an aggregate, which returns 48 rows,
         with a latency of 31.06 ms and a cost. of 296.34. The next operation is
         a nested loop, which splits into another nested loop and a materialize.
         The nested loop splits into another nested loop and an index scan. The
         materialize leads to a sequence scan.

Try to narrow down the problem by looking into the following questions:

  1. What's the resource consumption?
  2. How does it relate to other queries?
  3. Does consumption change over time?

Examine latency

Latency is the time taken for the normalized query to complete, in wall-clock time. You use the Latency graph to examine latency on the query or tag. The latency dashboard shows the 50th, 95th, and 99th percentile latencies to find outlier behaviors.

The following image shows the database load graph at the 50th percentile for a specific query with filters selected for CPU capacity, CPU and CPU wait, IO wait, and Lock wait.

Shows the query latency graph for a specific
         query with filters selected for CPU capacity, CPU and CPU wait,
         IO wait, and Lock wait.

The latency of parallel queries is measured in wall-clock time even though the query load can be higher for the query due to multiple cores being used to run part of the query.

Try to narrow down the problem by looking into the following questions:

  1. What's causing the high load? Select options to look at the CPU capacity, CPU and CPU wait, I/O wait, or Lock wait.
  2. How long has the load been high? Is it only high now? Or has it been high for a long time? Change the time range to find the date and time that the load started performing poorly.
  3. Were there spikes in latency? Change the time window to study the historical latency for the normalized query.

Trace the source of the problem

When you find the areas and times where the load was the highest, identify the source of the problem by using tracing to drill down further.

To help you identify the specific source of the problem, such as a model, view, controller, route, host, or user, Query insights provides an in-context end-to-end application trace view. This view helps you understand what's going on at the Database Layer for a specific request and to find the source of a problematic query by model, view, controllers, and route.

If you enable OpenCensus or OpenTelemetry, opencensus span information is sent to the database along with the tag information inside SQL comments. Any traces from the application to Cloud Logging are linked with database query plan traces to help identify the source of the problem.

Click the End to end tab in the Sample Query screen to look at the in-context trace.

Select an End-to-end tag to view specific information
         about the tag. The Summary shows the RPCs and Total Duration in ms for
         each operation for that tag.

To determine the client and user causing the problem, use the Top client addresses and Top users tables to find the highest loads. You can add a user or IP address to the filter to further analyze a specific user or client address. The details in the tables include the percentage of the query load, the average execution time in milliseconds, and the times called.

The image shows that for top client addresses, the load was
         100%, the average execution time was 19,568 seconds, and the times
         called was 1,226. For top users, the user postgres had 100% of the load,
         had an average execution time of 19,568 ms, and was called 1,226
         times.

You can use Cloud Trace to see end-to-end tracing for each step in the query plan. On the Query insights dashboard, click the View in trace link to open the Cloud Trace tool. The trace graph shows all the traces that have been run for the selected period.

The trace graph shows all the traces that have been run for
         the selected period, in this case, one hour. The
         page also has a table that shows latency, HTTP method, URL, and the
         time when the trace was run.cd

For details, see Finding and viewing traces.

Add tags to SQL queries

Tagging SQL queries simplifies application troubleshooting. You can use sqlcommenter to add tags to your SQL queries either automatically or manually.

Use sqlcommenter with ORM

When you use ORM instead of directly writing SQL queries, you might not find application code that's causing performance challenges. You might also have trouble analyzing how your application code affects query performance. To tackle this issue, Query insights provides an open source library called sqlcommenter. This library is useful for developers and administrators using ORM tools to detect which application code is causing performance problems.

If you're using ORM and sqlcommenter together, the tags are automatically created. You don't need to add or change code in your application.

You can install sqlcommenter on the application server. The instrumentation library allows application information related to your MVC framework to be propagated to the database along with the queries as a SQL comment. The database picks up these tags and starts recording and aggregating statistics by tags, which are orthogonal to statistics aggregated by normalized queries. Query insights shows the tags so that you know which application is causing the query load and can find the application code that's causing performance problems.

When you examine results in SQL database logs, they appear as follows:

SELECT * from USERS /*action='run+this',
controller='foo%3',
traceparent='00-01',
tracestate='rojo%2'*/

Supported tags include the controller name, route, framework, and action.

The set of ORM tools in sqlcommenter is supported for the following programming languages:

Python
  • Django
  • psycopg2
  • Sqlalchemy
  • Flask
Java
  • Hibernate
  • Spring
Ruby
  • Rails
Node.js
  • Knex.js
  • Sequelize.js
  • Express.js

For more information about sqlcommenter and how to use it in your ORM framework, see the sqlcommenter documentation.

Use sqlcommenter to add tags

If you're not using ORM, you must manually add sqlcommenter tags or comments in the correct SQL comment format to your SQL query. You must also augment each SQL statement with a comment containing a serialized key-value pair. Use at least one of the following keys:

  • action=''
  • controller=''
  • framework=''
  • route=''
  • application=''
  • db driver=''

Query insights drops all other keys.

Disable Query insights

Console

To disable Query insights for a Cloud SQL instance by using the Google Cloud console, follow these steps:

  1. In the Google Cloud console, go to the Cloud SQL Instances page.

    Go to Cloud SQL Instances

  2. To open the Overview page of an instance, click the instance name.
  3. On the Configuration tile, click Edit configuration.
  4. In the Configuration options section, expand Query insights.
  5. Clear the Enable Query insights checkbox.
  6. Click Save.

gcloud

To disable Query insights for a Cloud SQL instance by using gcloud, run gcloud sql instances patch with the --no-insights-config-query-insights-enabled flag as follows, after replacing INSTANCE_ID with the ID of the instance.

gcloud sql instances patch INSTANCE_ID \
--no-insights-config-query-insights-enabled
  

REST

To disable Query insights for a Cloud SQL instance by using the REST API, call the instances.patch method with queryInsightsEnabled set to false as follows.

Before using any of the request data, make the following replacements:

  • project-id: The project ID.
  • instance-id: The instance ID.

HTTP method and URL:

PATCH https://sqladmin.googleapis.com/sql/v1beta4/projects/project-id/instances/instance-id

Request JSON body:

{
  "settings" : { "insightsConfig" : { "queryInsightsEnabled" : false } }
}

To send your request, expand one of these options:

You should receive a JSON response similar to the following:

{
  "kind": "sql#operation",
  "targetLink": "https://sqladmin.googleapis.com/sql/v1beta4/projects/project-id/instances/instance-id",
  "status": "PENDING",
  "user": "user@example.com",
  "insertTime": "2021-01-28T22:43:40.009Z",
  "operationType": "UPDATE",
  "name": "operation-id",
  "targetId": "instance-id",
  "selfLink": "https://sqladmin.googleapis.com/sql/v1beta4/projects/project-id/operations/operation-id",
  "targetProject": "project-id"
}

What's next

  • See Cloud SQL metrics. The Query insights metric type strings start with database/postgresql/insights.