Active partitioned Data Manipulation Language (DML) provides real time progress for the partitioned DMLs that are active in your database.
Spanner provides a built-in table,
SPANNER_SYS.ACTIVE_PARTITIONED_DMLS
, that lists running partitioned DMLs
and the progress made on them.
This page describes the table in detail, show some example queries that use this table and, finally, demonstrate how to use these queries to help mitigate issues caused by active partitioned DMLs. The information on this page is applicable to GoogleSQL-dialect databases and PostgreSQL-dialect databases.
Availability
SPANNER_SYS
data is available only through SQL interfaces; for example:
A database's Spanner Studio page in the Google Cloud console
The
gcloud spanner databases execute-sql
commandThe
executeQuery
API
Other single read methods that Spanner provides don't support
SPANNER_SYS
.
ACTIVE_PARTITIONED_DMLS
SPANNER_SYS.ACTIVE_PARTITIONED_DMLS
returns a list of active partitioned DMLs
sorted by their start time.
Table schema
The following shows the table schema for SPANNER_SYS.ACTIVE_PARTITIONED_DMLS.
Column name | Type | Description |
---|---|---|
TEXT |
STRING |
The partitioned DML query statement text. |
TEXT_FINGERPRINT |
INT64 |
Fingerprint is a hash of the partitioned DML text. |
SESSION_ID |
STRING |
The ID of the session that's executing the partitioned DML. Deleting the session ID will cancel the query. |
NUM_PARTITIONS_TOTAL |
INT64 |
The total number of partitions in the partitioned DML. |
NUM_PARTITIONS_COMPLETE |
INT64 |
The number of partitions that the partitioned DML has completed. |
NUM_TRIVIAL_PARTITIONS_COMPLETE |
INT64 |
The number of complete partitions where no rows were processed. |
PROGRESS |
DOUBLE |
The progress of a partitioned DML is calculated as the number of completed non-trivial partitions divided by the total number of non-trivial partitions. |
ROWS_PROCESSED |
INT64 |
The number of rows processed so far, updated after each partition completes. |
START_TIMESTAMP . |
TIMESTAMP |
An upper bound on the start time of a partitioned DML. |
LAST_UPDATE_TIMESTAMP |
TIMESTAMP |
Last timestamp when the partitioned DML made progress. Updated after a partition completes. |
Example queries
You can run the following example SQL statements using the client libraries, the Google Cloud CLI, or the Google Cloud console.
Listing oldest running queries
The following query returns a list of running partitioned DMLs sorted by the start time of the query.
SELECT text,
session_id,
num_partitions_total,
num_partitions_complete,
num_trivial_partitions_complete,
progress,
rows_processed,
start_timestamp,
last_update_timestamp
FROM spanner_sys.active_partitioned_dmls
ORDER BY start_timestamp ASC;
text | session_id | num_partitions_total | num_partitions_complete | num_trivial_partitions_complete | progress | rows_processed | start_timestamp | last_update_timestamp |
---|---|---|---|---|---|---|---|---|
UPDATE Concerts SET VenueId = \'amazing venue\' WHERE SingerId < 900000 | 5bd37a99-200c-5d2e-9021-15d0dbbd97e6 | 27 | 15 | 3 | 50.00% | 2398654 | 2024-01-21 15:56:30.498744-08:00 | 2024-01-22 15:56:39.049799-08:00 |
UPDATE Singers SET LastName = NULL WHERE LastName = '' | 0028284f-0190-52f9-b396-aa588e034806 | 8 | 4 | 4 | 00.00% | 0 | 2024-01-22 15:55:18.498744-08:00 | 2024-01-22 15:56:28.049799-08:00 |
DELETE from Singers WHERE SingerId > 1000000 | 0071a85e-7e5c-576b-8a17-f9bc3d157eea | 8 | 4 | 3 | 20.00% | 238654 | 2024-01-22 15:56:30.498744-08:00 | 2024-01-22 15:56:19.049799-08:00 |
UPDATE Singers SET MarketingBudget = 1000 WHERE true | 036097a9-91d4-566a-a399-20c754eabdc2 | 8 | 5 | 0 | 62.50% | 238654 | 2024-01-22 15:57:47.498744-08:00 | 2024-01-22 15:57:39.049799-08:00 |
Limitations
Using the SPANNER_SYS.ACTIVE_PARTITIONED_DMLS
table has the following
limitations:
PROGRESS
,ROWS_PROCESSED
, andLAST_UPDATE_TIMESTAMP
results are incremented at completed partition boundaries so the partitioned DML might keep updating rows while the values in these three fields stay the same.If there are millions of partitions in a partitioned DML, the value in the
PROGRESS
column might not capture all incremental progress. UseNUM_PARTITIONS_COMPLETE
andNUM_TRIVIAL_PARTITIONS_COMPLETE
to refer finer granularity progress.If you cancel a partitioned DML using an RPC request, the cancelled partitioned DML might still appear in the table. If you cancel a partitioned DML using session deletion, it's removed from the table immediately. For more information, see Deleting the session ID.
Use active partitioned DML queries data to troubleshoot high CPU utilization
Query statistics and transaction statistics provide useful information when troubleshooting latency in a Spanner database. These tools provide information about the queries that have already completed. However, sometimes it's necessary to know what is running in the system. For example, consider the scenario when CPU utilization is high and you want to answer the following questions.
- How many partitioned DMLs are running at the moment?
- What are these partitioned DMLs?
- How many of those partitioned DMLs are running for a long time?
- Which session is running the query?
If you have answers for the preceding questions, you can decide to take the following action.
- Delete the session executing the query for an immediate resolution.
- Reduce the frequency of a partitioned DML.
In the following walkthrough, we examine active partitioned DMLs and determine what action, if any, to take.
Retrieve a summary of active partitioned DMLs
In our example scenario, we notice higher than normal CPU usage, so we decide to run the following query to return the count of active partitioned DMLs.
SELECT count(*) as active_count
FROM spanner_sys.active_partitioned_dmls;
The query yields the following result.
active_count |
---|
22 |
Listing the top 2 oldest running partitioned DMLs
We can then run a query to find more information about the top 2 oldest running partitioned DMLs sorted by the start time of the partitioned DML.
SELECT text,
session_id,
num_partitions_total,
num_partitions_complete,
num_trivial_partitions_complete,
progress,
rows_processed,
start_timestamp,
last_update_timestamp
FROM spanner_sys.active_partitioned_dmls
ORDER BY start_timestamp ASC LIMIT 2;
text | session_id | num_partitions_total | num_partitions_complete | num_trivial_partitions_complete | progress | rows_processed | start_timestamp | last_update_timestamp |
---|---|---|---|---|---|---|---|---|
UPDATE Concerts SET VenueId = \'amazing venue\' WHERE SingerId < 900000 | 5bd37a99-200c-5d2e-9021-15d0dbbd97e6 | 27 | 15 | 3 | 50.00% | 2398654 | 2024-01-21 15:56:30.498744-08:00 | 2024-01-22 15:56:39.049799-08:00 |
UPDATE Singers SET LastName = NULL WHERE LastName = '' | 0028284f-0190-52f9-b396-aa588e034806 | 8 | 4 | 4 | 00.00% | 0 | 2024-01-22 15:55:18.498744-08:00 | 2024-01-22 15:56:28.049799-08:00 |
Cancel an expensive query
We found a partitioned DML that has been running for days and isn't making
progress. We can therefore run the following gcloud spanner databases sessions
delete
command to delete the session using the session ID
which cancels the partitioned DML.
gcloud spanner databases sessions delete\
5bd37a99-200c-5d2e-9021-15d0dbbd97e6 \
--database=singer_db --instance=test-instance
What's next
- Learn about other Introspection tools.
- Learn about other information Spanner stores for each database in the database's information schema tables.
- Learn more about SQL best practices for Spanner.