Cloud Spanner provides built-in tables that store statistics about transactions. You
can retrieve statistics from these SPANNER_SYS.TXN_STATS*
tables using SQL
statements.
When to use transaction statistics
Transaction statistics are useful when investigating performance issues. For example, you can check whether there are any slow running transactions that might be impacting performance or Queries Per Second (QPS) in your database. Another scenario is when your client applications are experiencing high transaction execution latency. Analyzing transaction statistics may help to discover potential bottlenecks, such as large volumes of updates to a particular column, which might be impacting latency.
Availability
SPANNER_SYS
data is available only through SQL interfaces (for example,
executeQuery
and gcloud spanner databases execute-sql
); other single read
methods Cloud Spanner provides do not support SPANNER_SYS
.
Latency statistics grouped by transaction
The following tables track the statistics for TOP
resource consuming
transactions during a specific time period.
SPANNER_SYS.TXN_STATS_TOP_MINUTE
: Transaction statistics aggregated across 1 minute intervals.SPANNER_SYS.TXN_STATS_TOP_10MINUTE
: Transaction statistics aggregated across 10 minute intervals.SPANNER_SYS.TXN_STATS_TOP_HOUR
: Transaction statistics aggregated across 1 hour intervals.
These tables have the following properties:
Each table contains data for non-overlapping time intervals of the length the table name specifies.
Intervals are based on clock times. 1 minute intervals end on the minute, 10 minute intervals end every 10 minutes starting on the hour, and 1 hour intervals end on the hour.
For example, at 11:59:30 AM, the most recent intervals available to SQL queries are:
- 1 minute: 11:58:01–11:59:00 AM
- 10 minute: 11:40:01–11:50:00 AM
- 1 hour: 10:00:01–11:00:00 AM
Cloud Spanner groups the statistics by FPRINT (Fingerprint) of the transactions. FPRINT is the hash calculated based on the operations involved in the transaction.
Since statistics are grouped based on FPRINT, if the same transaction is executed multiple times within any time interval, we still see only one entry for that transaction in these tables.
Each row contains statistics for all executions of a particular transaction that Cloud Spanner captures statistics for during the specified interval.
If Cloud Spanner is unable to store statistics for all transactions run during the interval in these tables, the system prioritizes transactions with the highest latency, commit attempts and bytes written during the specified interval.
Table schema
Column name | Type | Description |
---|---|---|
INTERVAL_END |
TIMESTAMP |
End of the time interval in which the included transaction executions occurred. |
FPRINT |
INT64 |
Fingerprint is the hash calculated based on the operations involved in the transaction. INTERVAL_END and FPRINT together act as an unique key for these tables. |
READ_COLUMNS |
ARRAY<STRING> |
The set of columns that were read by the transaction. |
WRITE_CONSTRUCTIVE_COLUMNS |
ARRAY<STRING> |
The set of columns that were constructively written (i.e. assigned to new values) by the transaction. |
WRITE_DELETE_TABLES |
ARRAY<STRING> |
The set of tables that had rows deleted or replaced by the transaction. |
COMMIT_ATTEMPT_COUNT |
INT64 |
Total number of commit attempts on the transaction. |
COMMIT_FAILED_PRECONDITION_COUNT |
INT64 |
Total number of precondition failures (FAILED_PRECONDITION ) for the transaction. |
COMMIT_ABORT_COUNT |
INT64 |
Number of times the commits were aborted for the transaction. |
AVG_PARTICIPANTS |
FLOAT64 |
Average number of participants in each commit attempt. To learn more about participants, see Life of Cloud Spanner Reads & Writes. |
AVG_TOTAL_LATENCY_SECONDS |
FLOAT64 |
Average seconds taken from the first operation of the transaction to commit/abort. |
AVG_COMMIT_LATENCY_SECONDS |
FLOAT64 |
Average seconds taken to perform the commit operation. |
AVG_BYTES |
FLOAT64 |
Average number of bytes written by the transaction. |
Example queries
This section includes several example SQL statements that retrieve transaction
statistics. You can run these SQL statements using the
client libraries, the
gcloud
command-line
tool, or the Cloud Console.
Listing the basic statistics for each transaction in a given time period
The following query returns the raw data for the top transactions in the previous minute.
SELECT fprint,
read_columns,
write_constructive_columns,
write_delete_tables,
avg_total_latency_seconds,
avg_commit_latency_seconds,
avg_bytes
FROM spanner_sys.txn_stats_top_minute
WHERE interval_end =
(SELECT MAX(interval_end)
FROM spanner_sys.txn_stats_top_minute);
Query output
fprint | read_columns | write_constructive_columns | write_delete_tables | avg_total_latency_seconds | avg_commit_latency_seconds | avg_bytes |
---|---|---|---|---|---|---|
40015598317 |
[] |
["Routes"] |
["Users"] |
0.006578737 |
0.006547737 |
25286 |
20524969030 |
["id", "no"] |
[] |
[] |
0.001732442 |
0.000247442 |
0 |
77848338483 |
[] |
[] |
["Cars", "Routes"] |
0.033467418 |
0.000251418 |
0 |
Listing the transactions with the highest average commit latency
The following query returns the transactions with high average commit latency in the previous hour, ordered from highest to lowest average commit latency.
SELECT fprint,
read_columns,
write_constructive_columns,
write_delete_tables,
avg_total_latency_seconds,
avg_commit_latency_seconds,
avg_bytes
FROM spanner_sys.txn_stats_top_hour
WHERE interval_end =
(SELECT MAX(interval_end)
FROM spanner_sys.txn_stats_top_hour)
ORDER BY avg_commit_latency_seconds DESC;
Query output
fprint | read_columns | write_constructive_columns | write_delete_tables | avg_total_latency_seconds | avg_commit_latency_seconds | avg_bytes |
---|---|---|---|---|---|---|
40015598317 |
[] |
["Routes"] |
["Users"] |
0.006578737 |
0.006547737 |
25286 |
77848338483 |
[] |
[] |
["Cars", "Routes"] |
0.033467418 |
0.000251418 |
0 |
20524969030 |
["id", "no"] |
[] |
[] |
0.001732442 |
0.000247442 |
0 |
Finding the average latency of transactions which read certain columns
The following query returns the average latency information for transactions that read the column ADDRESS from 1 hour stats:
SELECT fprint,
read_columns,
write_constructive_columns,
write_delete_tables,
avg_total_latency_seconds
FROM spanner_sys.txn_stats_top_hour
WHERE 'ADDRESS' IN UNNEST(read_columns)
ORDER BY avg_total_latency_seconds DESC;
Query output
fprint | read_columns | write_constructive_columns | write_delete_tables | avg_total_latency_seconds |
---|---|---|---|---|
77848338483 |
["ID", "ADDRESS"] |
[] |
["Cars", "Routes"] |
0.033467418 |
40015598317 |
["ID", "NAME", "ADDRESS"] |
[] |
["Users"] |
0.006578737 |
Listing transactions by the average number of bytes modified
The following query returns the transactions sampled in the last hour, ordered by the average number of bytes modified by the transaction.
SELECT fprint,
read_columns,
write_constructive_columns,
write_delete_tables,
avg_bytes
FROM spanner_sys.txn_stats_top_hour
ORDER BY avg_bytes DESC;
Query output
fprint | read_columns | write_constructive_columns | write_delete_tables | avg_bytes |
---|---|---|---|---|
40015598317 |
[] |
[] |
["Users"] |
25286 |
77848338483 |
[] |
[] |
["Cars", "Routes"] |
12005 |
20524969030 |
["ID", "ADDRESS"] |
[] |
["Users"] |
10923 |
Aggregate statistics
SPANNER_SYS
also contains tables to store aggregate data for all the transactions for
which Cloud Spanner captured statistics in a specific time period:
SPANNER_SYS.TXN_STATS_TOTAL_MINUTE
: Aggregate statistics for all transactions during 1 minute intervalsSPANNER_SYS.TXN_STATS_TOTAL_10MINUTE
: Aggregate statistics for all transactions during 10 minute intervalsSPANNER_SYS.TXN_STATS_TOTAL_HOUR
: Aggregate statistics for all transactions during 1 hour intervals
Aggregate statistics tables have the following properties:
Each table contains data for non-overlapping time intervals of the length the table name specifies.
Intervals are based on clock times. 1 minute intervals end on the minute, 10 minute intervals end every 10 minutes starting on the hour, and 1 hour intervals end on the hour.
For example, at 11:59:30 AM, the most recent intervals available to SQL queries on aggregate transaction statistics are:
- 1 minute: 11:58:01–11:59:00 AM
- 10 minute: 11:40:01–11:50:00 AM
- 1 hour: 10:00:01–11:00:00 AM
Each row contains statistics for all transactions executed over the database during the specified interval, aggregated together. There is only one row per time interval.
The statistics captured in the
SPANNER_SYS.TXN_STATS_TOTAL_*
tables might include transactions that Cloud Spanner did not capture in theSPANNER_SYS.TXN_STATS_TOP_*
tables.
Table schema
Column name | Type | Description |
---|---|---|
INTERVAL_END |
TIMESTAMP |
End of the time interval in which this statistic was captured. |
COMMIT_ATTEMPT_COUNT |
INT64 |
Total number of commit attempts on the transaction. |
COMMIT_FAILED_PRECONDITION_COUNT |
INT64 |
Total number of precondition failures (FAILED_PRECONDITION ) for the transaction. |
COMMIT_ABORT_COUNT |
INT64 |
Number of times the commits were aborted for the transaction. |
AVG_PARTICIPANTS |
FLOAT64 |
Average number of participants in each commit attempt. To learn more about participants, see Life of Cloud Spanner Reads & Writes. |
AVG_TOTAL_LATENCY_SECONDS |
FLOAT64 |
Average seconds taken from the first operation of the transaction to commit/abort. |
AVG_COMMIT_LATENCY_SECONDS |
FLOAT64 |
Average seconds taken to perform commit operation. |
AVG_BYTES |
FLOAT64 |
Average number of bytes written by the transaction. |
Example queries
This section includes several example SQL statements that retrieve transaction
statistics. You can run these SQL statements using the
client libraries, the
gcloud
command-line
tool, or the Cloud Console.
Finding the total number of commit attempts for a transaction in a given time period
The following query returns the total number of commit attempts for all transactions in the most recent complete 1-minute interval:
SELECT interval_end,
commit_attempt_count
FROM spanner_sys.txn_stats_total_minute
WHERE interval_end =
(SELECT MAX(interval_end)
FROM spanner_sys.txn_stats_total_minute)
ORDER BY interval_end;
Query output
interval_end | commit_attempt_count |
---|---|
2020-01-17 11:46:00-08:00 |
21 |
Note that there is only one row in the result because aggregated stats have only
one entry per interval_end
for any time duration.
Finding the total commit latency across all transactions
The following query returns the total commit latency across all transactions in the previous 10 minutes:
SELECT (avg_commit_latency_seconds * commit_attempt_count / 60 / 60)
AS total_commit_latency_hours
FROM spanner_sys.txn_stats_total_10minute
WHERE interval_end =
(SELECT MAX(interval_end)
FROM spanner_sys.txn_stats_total_10minute);
Query output
total_commit_latency_hours |
---|
0.8967 |
Note that there is only one row in the result because aggregated stats have only
one entry per interval_end
for any time duration.
Data retention
At a minimum, Cloud Spanner keeps data for each table for the following time periods:
SPANNER_SYS.TXN_STATS_TOP_MINUTE
andSPANNER_SYS.TXN_STATS_TOTAL_MINUTE
: Intervals covering the previous 6 hours.SPANNER_SYS.TXN_STATS_TOP_10MINUTE
andSPANNER_SYS.TXN_STATS_TOTAL_10MINUTE
: Intervals covering the previous 4 days.SPANNER_SYS.TXN_STATS_TOP_HOUR
andSPANNER_SYS.TXN_STATS_TOTAL_HOUR
: Intervals covering the previous 30 days.
Transaction statistics in Cloud Spanner give insight into how an application is using the database, and are useful when investigating performance issues. For example, you can check whether there are any slow running transactions that might be causing contention, or you can identify potential sources of high load, such as large volumes of updates to a particular column. Using the following steps, we'll show you how to use transaction statistics to investigate contentions in your database.
Troubleshooting database contentions using transaction statistics
Selecting a time period to investigate
This can be found from the application which is using Cloud Spanner.
For the purposes of this exercise, let's say that the issue started occurring at around 5:20pm on the 17th May, 2020.
Gathering transaction statistics for the selected time period
To start our investigation, we'll query the TXN_STATS_TOTAL_10MINUTE
table
around the start of the issue. The results of this query will show us how
latency and other transaction statistics changed over that period of time.
For example, the following query returns the aggregated transaction statistics
from 4:30 pm
to 7:40 pm
(inclusive).
SELECT
interval_end,
ROUND(avg_total_latency_seconds,4) as avg_total_latency_seconds,
commit_attempt_count,
commit_abort_count
FROM SPANNER_SYS.TXN_STATS_TOTAL_10MINUTE
WHERE
interval_end >= "2020-05-17T16:40:00"
AND interval_end <= "2020-05-17T19:40:00"
ORDER BY interval_end;
The following table lists example data returned from our query.
interval_end | avg_total_latency_seconds | commit_attempt_count | commit_abort_count |
---|---|---|---|
2020-05-17 16:40:00-07:00 | 0.0284 | 315691 | 5170 |
2020-05-17 16:50:00-07:00 | 0.0250 | 302124 | 3828 |
2020-05-17 17:00:00-07:00 | 0.0460 | 346087 | 11382 |
2020-05-17 17:10:00-07:00 | 0.0864 | 379964 | 33826 |
2020-05-17 17:20:00-07:00 | 0.1291 | 390343 | 52549 |
2020-05-17 17:30:00-07:00 | 0.1314 | 456455 | 76392 |
2020-05-17 17:40:00-07:00 | 0.1598 | 507774 | 121458 |
2020-05-17 17:50:00-07:00 | 0.1641 | 516587 | 115875 |
2020-05-17 18:00:00-07:00 | 0.1578 | 552711 | 122626 |
2020-05-17 18:10:00-07:00 | 0.1750 | 569460 | 154205 |
2020-05-17 18:20:00-07:00 | 0.1727 | 613571 | 160772 |
2020-05-17 18:30:00-07:00 | 0.1588 | 601994 | 143044 |
2020-05-17 18:40:00-07:00 | 0.2025 | 604211 | 170019 |
2020-05-17 18:50:00-07:00 | 0.1615 | 601622 | 135601 |
2020-05-17 19:00:00-07:00 | 0.1653 | 596804 | 129511 |
2020-05-17 19:10:00-07:00 | 0.1414 | 560023 | 112247 |
2020-05-17 19:20:00-07:00 | 0.1367 | 570864 | 100596 |
2020-05-17 19:30:00-07:00 | 0.0894 | 539729 | 65316 |
2020-05-17 19:40:00-07:00 | 0.0820 | 479151 | 40398 |
Here we see that aggregated latency and abort count is higher in the highlighted
periods. We can pick any 10 minute interval where aggregated latency
and/or abort count are high. Let's choose the interval ending at
2020-05-17T18:40:00
and use it in the next step to identify which transactions
are contributing to high latency and abort count.
Identifying transactions that are experiencing high latency
Let's now query the TXN_STATS_TOP_10MINUTE
table for the interval which was
picked in the previous step. Using this data, we can start to identify which
transactions are experiencing high latency and/or high abort count.
Execute the following query to get top performance-impacting transactions in
descending order of total latency for our example interval ending at
2020-05-17T18:40:00
.
SELECT
interval_end,
fprint,
ROUND(avg_total_latency_seconds,4) as avg_total_latency_seconds,
ROUND(avg_commit_latency_seconds,4) as avg_commit_latency_seconds,
commit_attempt_count,
commit_abort_count
FROM SPANNER_SYS.TXN_STATS_TOP_10MINUTE
WHERE
interval_end = "2020-05-17T18:40:00"
ORDER BY avg_total_latency_seconds DESC;
interval_end | fprint | avg_total_latency_seconds | avg_commit_latency_seconds | commit_attempt_count | commit_abort_count |
---|---|---|---|---|---|
2020-05-17 18:40:00-07:00 | 15185072816865185658 | 0.3508 | 0.0139 | 278802 | 142205 |
2020-05-17 18:40:00-07:00 | 15435530087434255496 | 0.1633 | 0.0142 | 129012 | 27177 |
2020-05-17 18:40:00-07:00 | 14175643543447671202 | 0.1423 | 0.0133 | 5357 | 636 |
2020-05-17 18:40:00-07:00 | 898069986622520747 | 0.0198 | 0.0158 | 6 | 0 |
2020-05-17 18:40:00-07:00 | 10510121182038036893 | 0.0168 | 0.0125 | 7 | 0 |
2020-05-17 18:40:00-07:00 | 9287748709638024175 | 0.0159 | 0.0118 | 4269 | 1 |
2020-05-17 18:40:00-07:00 | 7129109266372596045 | 0.0142 | 0.0102 | 182227 | 0 |
2020-05-17 18:40:00-07:00 | 15630228555662391800 | 0.0120 | 0.0107 | 58 | 0 |
2020-05-17 18:40:00-07:00 | 7907238229716746451 | 0.0108 | 0.0097 | 65 | 0 |
2020-05-17 18:40:00-07:00 | 10158167220149989178 | 0.0095 | 0.0047 | 3454 | 0 |
2020-05-17 18:40:00-07:00 | 9353100217060788102 | 0.0093 | 0.0045 | 725 | 0 |
2020-05-17 18:40:00-07:00 | 9521689070912159706 | 0.0093 | 0.0045 | 164 | 0 |
2020-05-17 18:40:00-07:00 | 11079878968512225881 | 0.0064 | 0.0019 | 65 | 0 |
We can see clearly that the first row (highlighted) in the preceding table shows a transaction experiencing high latency because of a high number of commit aborts. In the next step, we'll investigate further to see what is causing this issue.
Discovering what columns are involved in a transaction experiencing high latency
In this step, we'll check whether high latency transactions are operating on the
same set of columns by fetching read_columns
, write_constructive_columns
and
write_delete_tables
data for transactions with high abort count. The FPRINT
value will also be useful in the next step.
SELECT
fprint,
read_columns,
write_constructive_columns,
write_delete_tables
FROM SPANNER_SYS.TXN_STATS_TOP_10MINUTE
WHERE
interval_end = "2020-05-17T18:40:00"
ORDER BY avg_total_latency_seconds DESC LIMIT 3;
fprint | read_columns | write_constructive_columns | write_delete |
---|---|---|---|
15185072816865185658 | [TestHigherLatency._exists,TestHigherLatency.lang_status,TestHigherLatency.score,globalTagAffinity.shares] |
[TestHigherLatency._exists,TestHigherLatency.shares,TestHigherLatency_lang_status_score_index.shares] |
[] |
15435530087434255496 | [TestHigherLatency._exists,TestHigherLatency.lang_status,TestHigherLatency.likes,globalTagAffinity.score] |
[TestHigherLatency._exists,TestHigherLatency.likes,TestHigherLatency_lang_status_score_index.likes] |
[] |
14175643543447671202 | [TestHigherLatency._exists,TestHigherLatency.lang_status,TestHigherLatency.score,globalTagAffinity.ugcCount] |
[TestHigherLatency._exists,TestHigherLatency.ugcCount,TestHigherLatency_lang_status_score_index.ugcCount] |
[] |
As the output shows in the preceding table, the transactions with the highest
average total latency are reading the same columns. We can also observe some
write contention since the transactions are writing to the same column, that is
to say, TestHigherLatency._exists
.
Determining how the transaction performance has changed over time
We can see how the statistics associated with this transaction shape have changed over a period of time. Use the following query, where $FPRINT is the fingerprint of the high latency transaction from the previous step.
SELECT
interval_end,
ROUND(avg_total_latency_seconds, 3) AS latency,
ROUND(avg_commit_latency_seconds, 3) AS commit_latency,
commit_attempt_count,
commit_abort_count,
commit_failed_precondition_count,
avg_bytes
FROM SPANNER_SYS.TXN_STATS_TOP_10MINUTE
WHERE
interval_end >= "2020-05-17T16:40:00"
AND interval_end <= "2020-05-17T19:40:00"
AND fprint = $FPRINT
ORDER BY interval_end;
interval_end | latency | commit_latency | commit_attempt_count | commit_abort_count | commit_failed_precondition_count | avg_bytes |
---|---|---|---|---|---|---|
2020-05-17 16:40:00-07:00 | 0.095 | 0.010 | 53230 | 4752 | 0 | 91 |
2020-05-17 16:50:00-07:00 | 0.069 | 0.009 | 61264 | 3589 | 0 | 91 |
2020-05-17 17:00:00-07:00 | 0.150 | 0.010 | 75868 | 10557 | 0 | 91 |
2020-05-17 17:10:00-07:00 | 0.248 | 0.013 | 103151 | 30220 | 0 | 91 |
2020-05-17 17:20:00-07:00 | 0.310 | 0.012 | 130078 | 45655 | 0 | 91 |
2020-05-17 17:30:00-07:00 | 0.294 | 0.012 | 160064 | 64930 | 0 | 91 |
2020-05-17 17:40:00-07:00 | 0.315 | 0.013 | 209614 | 104949 | 0 | 91 |
2020-05-17 17:50:00-07:00 | 0.322 | 0.012 | 215682 | 100408 | 0 | 90 |
2020-05-17 18:00:00-07:00 | 0.310 | 0.012 | 230932 | 106728 | 0 | 91 |
2020-05-17 18:10:00-07:00 | 0.309 | 0.012 | 259645 | 131049 | 0 | 91 |
2020-05-17 18:20:00-07:00 | 0.315 | 0.013 | 272171 | 137910 | 0 | 90 |
2020-05-17 18:30:00-07:00 | 0.292 | 0.013 | 258944 | 121475 | 0 | 91 |
2020-05-17 18:40:00-07:00 | 0.350 | 0.013 | 278802 | 142205 | 0 | 91 |
2020-05-17 18:50:00-07:00 | 0.302 | 0.013 | 256259 | 115626 | 0 | 91 |
2020-05-17 19:00:00-07:00 | 0.315 | 0.014 | 250560 | 110662 | 0 | 91 |
2020-05-17 19:10:00-07:00 | 0.271 | 0.014 | 238384 | 99025 | 0 | 91 |
2020-05-17 19:20:00-07:00 | 0.273 | 0.014 | 219687 | 84019 | 0 | 91 |
2020-05-17 19:30:00-07:00 | 0.198 | 0.013 | 195357 | 59370 | 0 | 91 |
2020-05-17 19:40:00-07:00 | 0.181 | 0.013 | 167514 | 35705 | 0 | 91 |
In above output we can observe that total latency is high for the highlighted
period of time. And, wherever total latency is high, both commit_attempt_count
and commit_abort_coun
are also high even though commit latency (commit_latency
)
has not changed very much. Since transaction commits are getting aborted more
frequently, commit attempts are also high because of commit retries.
Conclusion
In this example, we saw that high commit abort count was the cause of high
latency. The next step is to look at the commit abort error messages received by
the application to know the reason for aborts. By inspecting logs in the
application, we see the application actually changed its workload during this
time, i.e. some other transaction shape showed up with high attempts_per_second
,
and that different transaction (maybe a nightly cleanup job) was responsible for
the additional lock conflicts.
What's next
- Learn about other Introspection tools.
- Learn about other information Cloud Spanner stores for each database in the database's information schema tables.
- Learn more about SQL best practices for Cloud Spanner.