This page describes best practices for using Data Manipulation Language (DML) and Partitioned DML for GoogleSQL-dialect databases and PostgreSQL-dialect databases.
Use a WHERE
clause to reduce the scope of locks
You execute DML statements inside read-write transactions. When Spanner reads data, it
acquires shared read locks on limited portions of the row ranges that you read. Specifically, it
acquires these locks only on the columns you access. The locks can include data that does not
satisfy the filter condition of the WHERE
clause.
When Spanner modifies data using DML statements, it acquires exclusive locks on the specific data that you are modifying. In addition, it acquires shared locks in the same way as when you read data. If your request includes large row ranges, or an entire table, the shared locks might prevent other transactions from making progress in parallel.
To modify data as efficiently as possible, use a WHERE
clause that enables
Spanner to read only the necessary rows. You can achieve this goal with a filter on the
primary key, or on the key of a secondary index. The WHERE
clause limits the scope of
the shared locks and enables Spanner to process the update more efficiently.
For example, suppose that one of the musicians in the Singers
table changes their
first name, and you need to update the name in your database. You could execute the following DML
statement, but it forces Spanner to scan the entire table and acquires shared locks that
cover the entire table. As a result, Spanner must read more data than necessary, and
concurrent transactions cannot modify the data in parallel:
-- ANTI-PATTERN: SENDING AN UPDATE WITHOUT THE PRIMARY KEY COLUMN
-- IN THE WHERE CLAUSE
UPDATE Singers SET FirstName = "Marcel"
WHERE FirstName = "Marc" AND LastName = "Richards";
To make the update more efficient, include the SingerId
column in the
WHERE
clause. The SingerId
column is the only primary key column for
the Singers
table:
-- ANTI-PATTERN: SENDING AN UPDATE THAT MUST SCAN THE ENTIRE TABLE
UPDATE Singers SET FirstName = "Marcel"
WHERE FirstName = "Marc" AND LastName = "Richards"
If there is no index on FirstName
or LastName
, you need to
scan the entire table to find the target singers. If you don't want to add a secondary
index to make the update more efficient, then include the SingerId
column
in the WHERE
clause.
The SingerId
column is the only primary key column for the
Singers
table. To find it, run SELECT
in a separate,
read-only transaction prior to the update transaction:
SELECT SingerId
FROM Singers
WHERE FirstName = "Marc" AND LastName = "Richards"
-- Recommended: Including a seekable filter in the where clause
UPDATE Singers SET FirstName = "Marcel"
WHERE SingerId = 1;
Avoid using DML statements and mutations in the same transaction
Spanner buffers insertions, updates, and deletions performed using DML statements on the server-side, and the results are visible to subsequent SQL and DML statements within the same transaction. This behavior is different from the Mutation API, where Spanner buffers the mutations on the client-side and sends the mutations server-side as part of the commit operation. As a result, mutations in the commit request aren't visible to SQL or DML statements within the same transaction.
Avoid using both DML statements and mutations in the same transaction. If you use both in the same transaction, you need to account for the order of execution in your client library code. If a transaction contains both DML statements and mutations in the same request, Spanner executes the DML statements before the mutations.
For operations that are only supported using mutations, you might want to
combine DML statements and mutations in the same transaction—for example,
insert_or_update
.
If you use both, the buffer writes only at the very end of the transaction.
Use the PENDING_COMMIT_TIMESTAMP
function to write commit timestamps
GoogleSQL
You use the PENDING_COMMIT_TIMESTAMP
function to write the commit
timestamp in a DML statement. Spanner selects the commit timestamp when the transaction
commits.
PostgreSQL
You use the SPANNER.PENDING_COMMIT_TIMESTAMP()
function to write the commit
timestamp in a DML statement. Spanner selects the commit timestamp when the transaction
commits.
Partitioned DML and date and timestamp functions
Partitioned DML uses one or more transactions that might run and commit at different times. If you use the date or timestamp functions, the modified rows might contain different values.
Improve latency with Batch DML
To reduce latency, use batch DML to send multiple DML statements to Spanner within a single client-server round trip.
Batch DML can apply optimizations to groups of statements within a batch to enable faster and more efficient data updates.
Execute writes with a single request
Spanner automatically optimizes contiguous groups of similar
INSERT
,UPDATE
, orDELETE
batched statements that have different parameter values, if they don't violate data dependencies.For example, consider a scenario where you want to insert a large set of new rows into a table called
Albums
. To let Spanner optimize all the requiredINSERT
statements into a single, efficient server-side action, begin by writing an appropriate DML statement that uses SQL query parameters:INSERT INTO Albums (SingerId, AlbumId, AlbumTitle) VALUES (@Singer, @Album, @Title);
Then, send Spanner a DML batch that invokes this statement repeatedly and contiguously, with the repetitions differing only in the values you bind to the statement's three query parameters. Spanner optimizes these structurally identical DML statements into a single server-side operation before executing it.
Execute writes in parallel
Spanner automatically optimizes contiguous groups of DML statements by executing in parallel when doing so doesn't violate data dependencies. This optimization brings performance benefits to a wider set of batched DML statements because it can apply to a mix of DML statement types (
INSERT
,UPDATE
andDELETE
) and to both parameterized or non-parameterized DML statements.For example, our sample schema has the tables
Singers
,Albums
, andAccounts
.Albums
is interleaved withinSingers
and stores information about albums forSingers
. The following contiguous group of statements writes new rows to multiple tables and doesn't have complex data dependencies.INSERT INTO Singers (SingerId, Name) VALUES(1, "John Doe"); INSERT INTO Singers (SingerId, Name) VALUES(2, "Marcel Richards"); INSERT INTO Albums(SingerId, AlbumId, AlbumTitle) VALUES (1, 10001, "Album 1"); INSERT INTO Albums(SingerId, AlbumId, AlbumTitle) VALUES (1, 10002, "Album 2"); INSERT INTO Albums(SingerId, AlbumId, AlbumTitle) VALUES (2, 10001, "Album 1"); UPDATE Accounts SET Balance = 100 WHERE AccountId = @AccountId;
Spanner optimizes this group of DML statements by executing the statements in parallel. The writes are applied in order of the statements in the batch and maintains batch DML semantics if a statement fails during execution.