Spanner provides built-in tables that store statistics about reads. You can
retrieve statistics from these SPANNER_SYS.READ_STATS*
tables using SQL
statements.
When to use read statistics
Read statistics provide insight into how an application is using the database, and are useful when investigating performance issues. For example, you can check what read shapes are running against a database, how frequently they run and explain the performance characteristics of these read shapes. You can use the read statistics for your database to identify read shapes that result in high CPU usage. At a high level, read statistics will help you to understand the behavior of the traffic going into a database in terms of resource usage.
Limitations
This tool is best suited for analyzing streams of similar reads that account for most of the CPU usage. It is not good for searching for reads that were run only one time.
The CPU usage tracked in these statistics represents Spanner server side CPU usage, excluding prefetch CPU usage and some other overheads.
Statistics are collected on a best-effort basis. As a result, it is possible for statistics to be missed if there are issues with underlying systems. For example, if there is internal networking issues, it is possible for some statistics to be missed.
Availability
SPANNER_SYS
data is available only through SQL interfaces; for example:
A database's Spanner Studio page in the Google Cloud console
The
gcloud spanner databases execute-sql
commandThe
executeQuery
API
Other single read methods that Spanner provides don't support
SPANNER_SYS
.
CPU usage grouped by read shape
The following tables track the read shapes with the highest CPU usage during a specific time period:
SPANNER_SYS.READ_STATS_TOP_MINUTE
: Read shape statistics aggregated across 1 minute intervals.SPANNER_SYS.READ_STATS_TOP_10MINUTE
: Read shape statistics aggregated across 10 minute intervals.SPANNER_SYS.READ_STATS_TOP_HOUR
: Read shape statistics aggregated across 1 hour intervals.
These tables have the following properties:
Each table contains data for non-overlapping time intervals of the length the table name specifies.
Intervals are based on clock times. 1 minute intervals end on the minute, 10 minute intervals end every 10 minutes starting on the hour, and 1 hour intervals end on the hour. After each interval, Spanner collects data from all servers and then makes the data available in the SPANNER_SYS tables shortly thereafter.
For example, at 11:59:30 AM, the most recent intervals available to SQL queries are:
- 1 minute: 11:58:00–11:58:59 AM
- 10 minute: 11:40:00–11:49:59 AM
- 1 hour: 10:00:00–10:59:59 AM
Spanner groups the statistics by read shape. If a tag is present, FPRINT is the hash of the tag. Otherwise, it is the hash of the
READ_COLUMNS
value.Each row contains statistics for all executions of a particular read shape that Spanner captures statistics for during the specified interval.
If Spanner is unable to store information about every distinct read shape run during the interval, the system prioritizes read shapes with the highest CPU usage during the specified interval.
Table schema
Column name | Type | Description |
---|---|---|
INTERVAL_END |
TIMESTAMP |
End of the time interval that the included read executions occurred in. |
REQUEST_TAG |
STRING |
The optional request tag for this read operation. For more information about using tags, see Troubleshooting with request tags. Statistics for multiple reads that have the same tag string are grouped in a single row with the `REQUEST_TAG` matching that tag string. |
READ_TYPE |
STRING |
Indicates if a read is a PARTITIONED_READ or
READ . A read with a
partitionToken obtained from the PartitionRead API is represented by the
PARTITIONED_READ read type and the other read APIs by
READ .
|
READ_COLUMNS |
ARRAY<STRING> |
The set of columns that were read. These are in alphabetical order. |
FPRINT |
INT64 |
The hash of the REQUEST_TAG value if present; Otherwise,
the hash of the READ_COLUMNS value. |
EXECUTION_COUNT |
INT64 |
Number of times Spanner executed the read shape during the interval. |
AVG_ROWS |
FLOAT64 |
Average number of rows that the read returned. |
AVG_BYTES |
FLOAT64 |
Average number of data bytes that the read returned, excluding transmission encoding overhead. |
AVG_CPU_SECONDS |
FLOAT64 |
Average number of Spanner server side CPU seconds executing the read, excluding prefetch CPU and other overhead. |
AVG_LOCKING_DELAY_SECONDS |
FLOAT64 |
Average number of seconds spent waiting due to locking. |
AVG_CLIENT_WAIT_SECONDS |
FLOAT64 |
Average number of seconds spent waiting due to the client not consuming data as fast as Spanner could generate it. |
AVG_LEADER_REFRESH_DELAY_SECONDS |
FLOAT64 |
Average number of seconds spent waiting to confirm with the Paxos leader that all writes have been observed. |
RUN_IN_RW_TRANSACTION_EXECUTION_COUNT |
INT64 |
The number of times the read was run as part of a read-write transaction. This column helps you determine if you can avoid lock contentions by moving the read to a read-only transaction. |
Example queries
This section includes several example SQL statements that retrieve read statistics. You can run these SQL statements using the client libraries, the gcloud spanner, or the Google Cloud console.
List the basic statistics for each read shape in a given time period
The following query returns the raw data for the top read shapes in the most recent 1-minute time intervals.
SELECT fprint,
read_columns,
execution_count,
avg_cpu_seconds,
avg_rows,
avg_bytes,
avg_locking_delay_seconds,
avg_client_wait_seconds
FROM spanner_sys.read_stats_top_minute
ORDER BY interval_end DESC LIMIT 3;
Query output
fprint | read_columns | execution_count | avg_cpu_seconds | avg_rows | avg_bytes | avg_locking_delay_seconds | avg_client_wait_seconds |
---|---|---|---|---|---|---|---|
125062082139 |
["Singers.id", "Singers.name"] |
8514387 |
0.000661355290396507 |
310.79 |
205 |
8.3232564943763752e-06 |
0 |
151238888745 |
["Singers.singerinfo"] |
3341542 |
6.5992827184280315e-05 |
12784 |
54 |
4.6859741349028595e-07 |
0 |
14105484 |
["Albums.id", "Albums.title"] |
9306619 |
0.00017855774721667873 |
1165.4 |
2964.71875 |
1.4328191393074178e-06 |
0 |
List the read shapes, ordered by highest total CPU usage
The following query returns the read shapes with the highest CPU usage in the most recent hour:
SELECT read_columns,
execution_count,
avg_cpu_seconds,
execution_count * avg_cpu_seconds AS total_cpu
FROM spanner_sys.read_stats_top_hour
WHERE interval_end =
(SELECT MAX(interval_end)
FROM spanner_sys.read_stats_top_hour)
ORDER BY total_cpu DESC LIMIT 3;
Query output
read_columns | execution_count | avg_cpu_seconds | total_cpu |
---|---|---|---|
["Singers.id", "Singers.name"] |
1647 |
0.00023380297430622681 |
0.2579 |
["Albums.id", "Albums.title"] |
720 |
0.00016738889440282034 |
0.221314999999999 |
["Singers.singerinfo""] |
3223 |
0.00037764625882302246 |
0.188053 |
Aggregate statistics
SPANNER_SYS
also contains tables to store aggregate read statistics captured
by Spanner in a specific time period:
SPANNER_SYS.READ_STATS_TOTAL_MINUTE
: Aggregate statistics for all read shapes during 1 minute intervals.SPANNER_SYS.READ_STATS_TOTAL_10MINUTE
: Aggregate statistics for all read shapes during 10 minute intervals.SPANNER_SYS.READ_STATS_TOTAL_HOUR
: Aggregate statistics for all read shapes during 1 hour intervals.
Aggregate statistics tables have the following properties:
Each table contains data for non-overlapping time intervals of the length the table name specifies.
Intervals are based on clock times. 1 minute intervals end on the minute, 10 minute intervals end every 10 minutes starting on the hour, and 1 hour intervals end on the hour.
For example, at 11:59:30 AM, the most recent intervals available to SQL queries on aggregate read statistics are:
- 1 minute: 11:58:00–11:58:59 AM
- 10 minute: 11:40:00–11:49:59 AM
- 1 hour: 10:00:00–10:59:59 AM
Each row contains statistics for all read shapes executed over the database during the specified interval, aggregated together. There is only one row per time interval.
The statistics captured in the
SPANNER_SYS.READ_STATS_TOTAL_*
tables might include read shapes that Spanner did not capture in theSPANNER_SYS.READ_STATS_TOP_*
tables.Some columns in these tables are exposed as metrics in Cloud Monitoring. The exposed metrics are:
- Rows returned count
- Read execution count
- Read CPU time
- Locking delays
- Client wait time
- Leader refresh delay
- Bytes returned count
For more information, see Spanner metrics.
Table schema
Column name | Type | Description |
---|---|---|
INTERVAL_END |
TIMESTAMP |
End of the time interval that the included read shape executions occurred in. |
EXECUTION_COUNT |
INT64 |
Number of times Spanner executed the read shape during the interval. |
AVG_ROWS |
FLOAT64 |
Average number of rows that the reads returned. |
AVG_BYTES |
FLOAT64 |
Average number of data bytes that the reads returned, excluding transmission encoding overhead. |
AVG_CPU_SECONDS |
FLOAT64 |
Average number of Spanner server side CPU seconds executing the read, excluding prefetch CPU and other overhead. |
AVG_LOCKING_DELAY_SECONDS |
FLOAT64 |
Average number of seconds spent waiting due to locking. |
AVG_CLIENT_WAIT_SECONDS |
FLOAT64 |
Average number of seconds spent waiting due to throttling. |
AVG_LEADER_REFRESH_DELAY_SECONDS |
FLOAT64 |
Average number of seconds spent coordinating the reads across instances in multi-region configurations. |
RUN_IN_RW_TRANSACTION_EXECUTION_COUNT |
INT64 |
The number of times that reads were run as part of read-write transactions. This column helps you determine if you can avoid lock contentions by moving some reads to read-only transactions. |
Example queries
This section includes several example SQL statements that retrieve aggregate read statistics. You can run these SQL statements using the client libraries, the gcloud spanner, or the Google Cloud console.
Find the total CPU usage across all read shapes
The following query returns the number of CPU hours consumed by read shapes in the most recent hour:
SELECT (avg_cpu_seconds * execution_count / 60 / 60)
AS total_cpu_hours
FROM spanner_sys.read_stats_total_hour
WHERE interval_end =
(SELECT MAX(interval_end)
FROM spanner_sys.read_stats_total_hour);
Query output
total_cpu_hours |
---|
0.00026186111111111115 |
Find the total execution count in a given time period
The following query returns the total number of read shapes executed in the most recent complete 1-minute interval:
SELECT interval_end,
execution_count
FROM spanner_sys.read_stats_total_minute
WHERE interval_end =
(SELECT MAX(interval_end)
FROM spanner_sys.read_stats_total_minute);
Query output
interval_end | execution_count |
---|---|
2020-05-28 11:02:00-07:00 |
12861966 |
Data retention
At a minimum, Spanner keeps data for each table for the following time periods:
SPANNER_SYS.READ_STATS_TOP_MINUTE
andSPANNER_SYS.READ_STATS_TOTAL_MINUTE
: Intervals covering the previous 6 hours.SPANNER_SYS.READ_STATS_TOP_10MINUTE
andSPANNER_SYS.READ_STATS_TOTAL_10MINUTE
: Intervals covering the previous 4 days.SPANNER_SYS.READ_STATS_TOP_HOUR
andSPANNER_SYS.READ_STATS_TOTAL_HOUR
: Intervals covering the previous 30 days.
Troubleshoot high CPU usage with read statistics
Spanner read statistics come in handy in cases where you need to investigate high CPU usage on your Spanner database or when you are just trying to understand the CPU-heavy read shapes on your database. Inspection of read shapes that use significant amounts of database resources gives Spanner users a potential way to reduce operational costs and possibly improve general system latencies. Using the following steps, we'll show you how to use read statistics to investigate high CPU usage in your database.
Select a time period to investigate
Start your investigation by looking for a time when your application began to experience high CPU usage. For example, in the following scenario, the issue started occurring around 5:20pm on May 28th 2020.
Gather read statistics for the selected time period
Having selected a time period to start our investigation, we'll look at
statistics gathered in the READ_STATS_TOTAL_10MINUTE
table around that time.
The results of this query might give us clues about how CPU and other read
statistics changed over that period of time. The following query returns the
aggregated read statistics from 4:30 pm
to 7:30 pm
(inclusive).
SELECT
interval_end,
ROUND(avg_cpu_seconds,4) as avg_cpu_seconds,
execution_count,
avg_locking_delay_seconds
FROM SPANNER_SYS.READ_STATS_TOTAL_10MINUTE
WHERE
interval_end >= "2020-05-28T16:30:00"
AND interval_end <= "2020-05-28T19:30:00"
ORDER BY interval_end;
The following data is an example of the result we get back from our query.
interval_end | avg_cpu_seconds | execution_count | avg_locking_delay_seconds |
---|---|---|---|
2020-05-28 16:40:00-07:00 | 0.0004 | 11111421 | 8.3232564943763752e-06 |
2020-05-28 16:50:00-07:00 | 0.0002 | 8815637 | 8.98734051776406e-05 |
2020-05-28 17:00:00-07:00 | 0.0001 | 8260215 | 6.039129247846453e-06 |
2020-05-28 17:10:00-07:00 | 0.0001 | 8514387 | 9.0535466616680686e-07 |
2020-05-28 17:20:00-07:00 | 0.0006 | 13715466 | 2.6801485272173765e-06 |
2020-05-28 17:30:00-07:00 | 0.0007 | 12861966 | 4.6859741349028595e-07 |
2020-05-28 17:40:00-07:00 | 0.0007 | 3755954 | 2.7131391918005383e-06 |
2020-05-28 17:50:00-07:00 | 0.0006 | 4248137 | 1.4328191393074178e-06 |
2020-05-28 18:00:00-07:00 | 0.0006 | 3986198 | 2.6973481999639748e-06 |
2020-05-28 18:10:00-07:00 | 0.0006 | 3510249 | 3.7577083563017905e-06 |
2020-05-28 18:20:00-07:00 | 0.0004 | 3341542 | 4.0940589703795433e-07 |
2020-05-28 18:30:00-07:00 | 0.0002 | 8695147 | 1.9914494947583975e-05 |
2020-05-28 18:40:00-07:00 | 0.0003 | 11679702 | 1.8331461539001595e-05 |
2020-05-28 18:50:00-07:00 | 0.0003 | 9306619 | 1.2527332321222135e-05 |
2020-05-28 19:00:00-07:00 | 0.0002 | 8520508 | 6.2268448078447915e-06 |
2020-05-28 19:10:00-07:00 | 0.0006 | 13715466 | 2.6801485272173765e-06 |
2020-05-28 19:20:00-07:00 | 0.0005 | 11947323 | 3.3029114639321295e-05 |
2020-05-28 19:30:00-07:00 | 0.0002 | 8514387 | 9.0535466616680686e-07 |
Here we see that average CPU time, avg_cpu_seconds
, is higher in the
highlighted intervals. The interval_end
with the value
2020-05-28 19:20:00 has a higher CPU time, so we'll choose that interval to
investigate further in the next step.
Find which read shapes are causing high CPU usage
Digging a little deeper, we now query the READ_STATS_TOP_10MINUTE
table for
the interval which was picked in the preceding step. The results of this query
can help indicate which read shapes cause high CPU usage.
SELECT
read_columns,
ROUND(avg_cpu_seconds,4) as avg_cpu_seconds,
execution_count,
avg_rows
FROM SPANNER_SYS.READ_STATS_TOP_10MINUTE
WHERE
interval_end = "2020-05-28T19:20:00"
ORDER BY avg_cpu_seconds DESC LIMIT 3;
The following data as an example of the result we get back from our query,
returning information about the top three read shapes ranked by
avg_cpu_seconds
. Note the use of ROUND
in our query to restrict
the output of avg_cpu_seconds
to 4 decimal places.
read_columns | avg_cpu_seconds | execution_count | avg_rows |
---|---|---|---|
[TestHigherLatency._exists,TestHigherLatency.lang_status,TestHigherLatency.score,globalTagAffinity.shares] 1 |
0.4192 | 1182 | 11650.42216582 |
[TestHigherLatency._exists,TestHigherLatency.lang_status,TestHigherLatency.likes,globalTagAffinity.score] |
0.0852 | 4 | 12784 |
[TestHigherLatency._exists,TestHigherLatency.lang_status,TestHigherLatency.score,globalTagAffinity.ugcCount] |
0.0697 | 1140 | 310.7921052631 |
1 _exists
is an internal field that is used to check whether a
certain row exists or not.
One reason for high CPU usage might be that you start to execute a few read
shapes more frequently (execution_count
). Perhaps the average number of rows
that the read returned has increased (avg_rows
). If none of those properties
of the read shape reveal anything interesting, you can examine other properties
such as avg_locking_delay_seconds
, avg_client_wait_seconds
or avg_bytes
.
Apply best practices to reduce high CPU usage
When you have gone through the preceding steps, consider whether supplying any of these best practices will help your situation.
The number of times Spanner executed read shapes during the interval is a good example of a metric that needs a baseline to tell you if a measurement is reasonable or a sign of a problem. Having established a baseline for the metric, you'll be able to detect and investigate the cause of any unexpected deviations from normal behavior.
If CPU Usage is relatively constant most of the time, but suddenly shows a spike that can be correlated with a similar sudden spike in user requests or application behavior, it might be an indication that everything is working as expected.
Try the following query to find the top read shapes ranked by the number of times Spanner executed for each read shape:
SELECT interval_end, read_columns, execution_count FROM SPANNER_SYS.READ_STATS_TOP_MINUTE ORDER BY execution_count DESC LIMIT 10;
If you are looking for the lowest possible read latencies, especially when using multi-region instance configurations, use stale reads instead of strong reads to reduce or remove the
AVG_LEADER_REFRESH_DELAY_SECONDS
component of read latency.If you are only doing reads, and you can express your read using a single read method, you should use that single read method. Single reads don't lock, unlike read-write transactions, therefore you should use read-only transactions over more expensive read-write transactions when you are not writing data.
What's next
- Learn about other Introspection tools.
- Learn about other information Spanner stores for each database in the database's information schema tables.
- Learn more about SQL best practices for Spanner.
- Learn more about Investigating high CPU utilization.