Read statistics

Cloud Spanner provides built-in tables that store statistics about reads. You can retrieve statistics from these SPANNER_SYS.READ_STATS* tables using SQL statements.

When to use read statistics

Read statistics provide insight into how an application is using the database, and are useful when investigating performance issues. For example, you can check what read shapes are running against a database, how frequently they run and explain the performance characteristics of these read shapes. You can use the read statistics for your database to identify read shapes that result in high CPU usage. At a high level, read statistics will help you to understand the behavior of the traffic going into a database in terms of resource usage.

Limitations

  • This tool is best suited for analyzing streams of similar reads that account for most of the CPU usage. It is not good for searching for reads that were run only one time.

  • The CPU usage tracked in these statistics represents Cloud Spanner server side CPU usage, excluding prefetch CPU usage and some other overheads.

  • Statistics are collected on a best-effort basis. As a result, it is possible for statistics to be missed if there are issues with underlying systems. For instance, if there is internal networking issues, it is possible for some statistics to be missed.

Availability

SPANNER_SYS data is available only through SQL interfaces (for example, executeQuery and gcloud spanner databases execute-sql); other single read methods Cloud Spanner provides do not support SPANNER_SYS.

CPU usage grouped by read shape

The following tables track the read shapes with the highest CPU usage during a specific time period:

  • SPANNER_SYS.READ_STATS_TOP_MINUTE: Read shape statistics aggregated across 1 minute intervals.
  • SPANNER_SYS.READ_STATS_TOP_10MINUTE: Read shape statistics aggregated across 10 minute intervals.
  • SPANNER_SYS.READ_STATS_TOP_HOUR: Read shape statistics aggregated across 1 hour intervals.

These tables have the following properties:

  • Each table contains data for non-overlapping time intervals of the length the table name specifies.

  • Intervals are based on clock times. 1 minute intervals end on the minute, 10 minute intervals end every 10 minutes starting on the hour, and 1 hour intervals end on the hour. After each interval, Cloud Spanner collects data from all servers and then makes the data available in the SPANNER_SYS tables shortly thereafter.

    For example, at 11:59:30 AM, the most recent intervals available to SQL queries are:

    • 1 minute: 11:58:00–11:58:59 AM
    • 10 minute: 11:40:00–11:49:59 AM
    • 1 hour: 10:00:00–10:59:59 AM
  • Cloud Spanner groups the statistics by read shape. FPRINT (Fingerprint) is calculated based on the columns involved in the reads.

  • Each row contains statistics for all executions of a particular read shape that Cloud Spanner captures statistics for during the specified interval.

  • If Cloud Spanner is unable to store information about every distinct read shape run during the interval, the system prioritizes read shapes with the highest CPU usage during the specified interval.

Table schema

Column name Type Description
INTERVAL_END TIMESTAMP End of the time interval that the included read executions occurred in.
READ_COLUMNS ARRAY<STRING> The set of columns that were read. These are in alphabetical order.
FPRINT INT64 Hash of the read column names.
EXECUTION_COUNT INT64 Number of times Cloud Spanner executed the read shape during the interval.
AVG_ROWS FLOAT64 Average number of rows that the read returned.
AVG_BYTES FLOAT64 Average number of data bytes that the read returned, excluding transmission encoding overhead.
AVG_CPU_SECONDS FLOAT64 Average number of Cloud Spanner server side CPU seconds executing the read, excluding prefetch CPU and other overhead.
AVG_LOCKING_DELAY_SECONDS FLOAT64 Average number of seconds spent waiting due to locking.
AVG_CLIENT_WAIT_SECONDS FLOAT64 Average number of seconds spent waiting due to the client not consuming data as fast as Cloud Spanner could generate it.
AVG_LEADER_REFRESH_DELAY_SECONDS FLOAT64 Average number of seconds spent waiting to confirm with the Paxos leader that all writes have been observed..

Example queries

This section includes several example SQL statements that retrieve read statistics. You can run these SQL statements using the client libraries, the gcloud command-line tool, or the Cloud Console.

Listing the basic statistics for each read shape in a given time period

The following query returns the raw data for the top read shapes in the most recent 1-minute time intervals.

SELECT fprint,
       read_columns,
       execution_count,
       avg_cpu_seconds,
       avg_rows,
       avg_bytes,
       avg_locking_delay_seconds,
       avg_client_wait_seconds
FROM spanner_sys.read_stats_top_minute
ORDER BY interval_end DESC LIMIT 3;
Query output
fprint read_columns execution_count avg_cpu_seconds avg_rows avg_bytes avg_locking_delay_seconds avg_client_wait_seconds
125062082139 ["Singers.id", "Singers.name"] 8514387 0.000661355290396507 310.79 205 8.3232564943763752e-06 0
151238888745 ["Singers.singerinfo"] 3341542 6.5992827184280315e-05 12784 54 4.6859741349028595e-07 0
14105484 ["Albums.id", "Albums.title"] 9306619 0.00017855774721667873 1165.4 2964.71875 1.4328191393074178e-06 0

Listing the read shapes, ordered by highest total CPU usage

The following query returns the read shapes with the highest CPU usage in the most recent hour:

SELECT read_columns,
       execution_count,
       avg_cpu_seconds,
       execution_count * avg_cpu_seconds AS total_cpu
FROM spanner_sys.read_stats_top_hour
WHERE interval_end =
  (SELECT MAX(interval_end)
   FROM spanner_sys.read_stats_top_hour)
ORDER BY total_cpu DESC LIMIT 3;
Query output
read_columns execution_count avg_cpu_seconds total_cpu
["ID", "NAME", "ADDRESS"] 1647 0.00023380297430622681 0.2579
["Routes"] 720 0.00016738889440282034 0.221314999999999
["id", "no"] 3223 0.00037764625882302246 0.188053

Aggregate statistics

SPANNER_SYS also contains tables to store aggregate read statistics captured by Cloud Spanner in a specific time period:

  • SPANNER_SYS.READ_STATS_TOTAL_MINUTE: Aggregate statistics for all read shapes during 1 minute intervals.
  • SPANNER_SYS.READ_STATS_TOTAL_10MINUTE: Aggregate statistics for all read shapes during 10 minute intervals.
  • SPANNER_SYS.READ_STATS_TOTAL_HOUR: Aggregate statistics for all read shapes during 1 hour intervals.

Aggregate statistics tables have the following properties:

  • Each table contains data for non-overlapping time intervals of the length the table name specifies.

  • Intervals are based on clock times. 1 minute intervals end on the minute, 10 minute intervals end every 10 minutes starting on the hour, and 1 hour intervals end on the hour.

    For example, at 11:59:30 AM, the most recent intervals available to SQL queries on aggregate read statistics are:

    • 1 minute: 11:58:00–11:58:59 AM
    • 10 minute: 11:40:00–11:49:59 AM
    • 1 hour: 10:00:00–10:59:59 AM
  • Each row contains statistics for all read shapes executed over the database during the specified interval, aggregated together. There is only one row per time interval.

  • The statistics captured in the SPANNER_SYS.READ_STATS_TOTAL_* tables might include read shapes that Cloud Spanner did not capture in the SPANNER_SYS.READ_STATS_TOP_* tables.

Table schema

Column name Type Description
INTERVAL_END TIMESTAMP End of the time interval that the included read shape executions occurred in.
EXECUTION_COUNT INT64 Number of times Cloud Spanner executed the read shape during the interval.
AVG_ROWS FLOAT64 Average number of rows that the reads returned.
AVG_BYTES FLOAT64 Average number of data bytes that the reads returned, excluding transmission encoding overhead.
AVG_CPU_SECONDS FLOAT64 Average number of Cloud Spanner server side CPU seconds executing the read, excluding prefetch CPU and other overhead.
AVG_LOCKING_DELAY_SECONDS FLOAT64 Average number of seconds spent waiting due to locking.
AVG_CLIENT_WAIT_SECONDS FLOAT64 Average number of seconds spent waiting due to throttling.
AVG_LEADER_REFRESH_DELAY_SECONDS FLOAT64 Average number of seconds spent coordinating the reads across instances in multi-region configurations.

Example queries

This section includes several example SQL statements that retrieve aggregate read statistics. You can run these SQL statements using the client libraries, the gcloud command-line tool, or the Cloud Console.

Finding the total CPU usage across all read shapes

The following query returns the number of CPU hours consumed by read shapes in the most recent hour:

SELECT (avg_cpu_seconds * execution_count / 60 / 60)
  AS total_cpu_hours
FROM spanner_sys.read_stats_total_hour
WHERE interval_end =
  (SELECT MAX(interval_end)
   FROM spanner_sys.read_stats_total_hour);
Query output
total_cpu_hours
0.00026186111111111115

Finding the total execution count in a given time period

The following query returns the total number of read shapes executed in the most recent complete 1-minute interval:

SELECT interval_end,
       execution_count
FROM spanner_sys.read_stats_total_minute
WHERE interval_end =
  (SELECT MAX(interval_end)
   FROM spanner_sys.read_stats_total_minute);
Query output
interval_end execution_count
2020-05-28 11:02:00-07:00 12861966

Data retention

At a minimum, Cloud Spanner keeps data for each table for the following time periods:

  • SPANNER_SYS.READ_STATS_TOP_MINUTE and SPANNER_SYS.READ_STATS_TOTAL_MINUTE: Intervals covering the previous 6 hours.

  • SPANNER_SYS.READ_STATS_TOP_10MINUTE and SPANNER_SYS.READ_STATS_TOTAL_10MINUTE: Intervals covering the previous 4 days.

  • SPANNER_SYS.READ_STATS_TOP_HOUR and SPANNER_SYS.READ_STATS_TOTAL_HOUR: Intervals covering the previous 30 days.

Troubleshooting high CPU usage with read statistics

Cloud Spanner read statistics come in handy in cases where you need to investigate high CPU usage on your Cloud Spanner database or when you are just trying to understand the CPU-heavy read shapes on your database. Inspection of read shapes that use significant amounts of database resources gives Cloud Spanner users a potential way to reduce operational costs and possibly improve general system latencies. Using the following steps, we'll show you how to use read statistics to investigate high CPU usage in your database.

Selecting a time period to investigate

Start your investigation by looking for a time when your application began to experience high CPU usage. For example, let's say the issue started occurring around 5:20pm on May 28th 2020.

Gathering read statistics for the selected time period

Having selected a time period to start our investigation, we'll look at statistics gathered in the READ_STATS_TOTAL_10MINUTE table around that time. The results of this query might give us clues about how CPU and other read statistics changed over that period of time. The following query returns the aggregated read statistics from 4:30 pm to 7:30 pm (inclusive).

SELECT
  interval_end,
  ROUND(avg_cpu_seconds,4) as avg_cpu_seconds,
  execution_count,
  avg_locking_delay_seconds
FROM SPANNER_SYS.READ_STATS_TOTAL_10MINUTE
WHERE
  interval_end >= "2020-05-28T16:30:00"
  AND interval_end <= "2020-05-28T19:30:00"
ORDER BY interval_end;

Let's take the following data as an example of the result we get back from our query.

interval_end avg_cpu_seconds execution_count avg_locking_delay_seconds
2020-05-28 16:40:00-07:00 0.0004 11111421 8.3232564943763752e-06
2020-05-28 16:50:00-07:00 0.0002 8815637 8.98734051776406e-05
2020-05-28 17:00:00-07:00 0.0001 8260215 6.039129247846453e-06
2020-05-28 17:10:00-07:00 0.0001 8514387 9.0535466616680686e-07
2020-05-28 17:20:00-07:00 0.0006 13715466 2.6801485272173765e-06
2020-05-28 17:30:00-07:00 0.0007 12861966 4.6859741349028595e-07
2020-05-28 17:40:00-07:00 0.0007 3755954 2.7131391918005383e-06
2020-05-28 17:50:00-07:00 0.0006 4248137 1.4328191393074178e-06
2020-05-28 18:00:00-07:00 0.0006 3986198 2.6973481999639748e-06
2020-05-28 18:10:00-07:00 0.0006 3510249 3.7577083563017905e-06
2020-05-28 18:20:00-07:00 0.0004 3341542 4.0940589703795433e-07
2020-05-28 18:30:00-07:00 0.0002 8695147 1.9914494947583975e-05
2020-05-28 18:40:00-07:00 0.0003 11679702 1.8331461539001595e-05
2020-05-28 18:50:00-07:00 0.0003 9306619 1.2527332321222135e-05
2020-05-28 19:00:00-07:00 0.0002 8520508 6.2268448078447915e-06
2020-05-28 19:10:00-07:00 0.0006 13715466 2.6801485272173765e-06
2020-05-28 19:20:00-07:00 0.0005 11947323 3.3029114639321295e-05
2020-05-28 19:30:00-07:00 0.0002 8514387 9.0535466616680686e-07

Here we see that average CPU time, avg_cpu_seconds, is higher in the highlighted intervals. The interval_end with the value 2020-05-28 19:20:00 has a higher CPU time, so we'll choose that interval to investigate further in the next step.

Finding which read shapes are causing high CPU usage

Digging a little deeper, we now query the READ_STATS_TOP_10MINUTE table for the interval which was picked in the preceding step. The results of this query can help indicate which read shapes cause high CPU usage.

SELECT
  read_columns,
  ROUND(avg_cpu_seconds,4) as avg_cpu_seconds,
  execution_count,
  avg_rows
FROM SPANNER_SYS.READ_STATS_TOP_10MINUTE
WHERE
  interval_end = "2020-05-28T19:20:00"
ORDER BY avg_cpu_seconds DESC LIMIT 3;

The following data as an example of the result we get back from our query, returning information about the top three read shapes ranked by avg_cpu_seconds. Note the use of ROUND in our query to restrict the output of avg_cpu_seconds to 4 decimal places.

read_columns avg_cpu_seconds execution_count avg_rows
[TestHigherLatency._exists,TestHigherLatency.lang_status,TestHigherLatency.score,globalTagAffinity.shares] 0.4192 1182 11650.42216582
[TestHigherLatency._exists,TestHigherLatency.lang_status,TestHigherLatency.likes,globalTagAffinity.score] 0.0852 4 12784
[TestHigherLatency._exists,TestHigherLatency.lang_status,TestHigherLatency.score,globalTagAffinity.ugcCount] 0.0697 1140 310.7921052631

One reason for high CPU usage might be that you start to execute a few read shapes more frequently (execution_count). Perhaps the average number of rows that the read returned has increased (avg_rows). If none of those properties of the read shape reveal anything interesting, you can examine other properties such as avg_locking_delay_seconds, avg_client_wait_seconds or avg_bytes.

Applying best practices to reduce high CPU usage

When you have gone through the preceding steps, consider whether supplying any of these best practices will help your situation.

  • The number of times Cloud Spanner executed read shapes during the interval is a good example of a metric that needs a baseline to tell you if a measurement is reasonable or a sign of a problem. Having established a baseline for the metric, you'll be able to detect and investigate the cause of any unexpected deviations from normal behavior.

  • If CPU Usage is relatively constant most of the time, but suddenly shows a spike that can be correlated with a similar sudden spike in user requests or application behavior, it might be an indication that everything is working as expected.

  • Try the following query to find the top read shapes ranked by the number of times Cloud Spanner executed for each read shape:

    SELECT interval_end, read_columns, execution_count
    FROM SPANNER_SYS.READ_STATS_TOP_MINUTE
    ORDER BY execution_count DESC
    LIMIT 10;
    
  • If you are looking for the lowest possible read latencies, especially when using multi-region instance configurations, use stale reads instead of strong reads to reduce/remove the AVG_LEADER_REFRESH_DELAY_SECONDS component of read latency.

  • If you are only doing reads, and you can express your read using a single read method, you should use that single read method. Single reads do not lock, unlike read-write transactions, therefore you should use read-only transactions over more expensive read-write transactions when you are not writing data.

What's next