Data Manipulation Language best practices

Stay organized with collections Save and categorize content based on your preferences.

This page describes best practices for using Data Manipulation Language (DML) and Partitioned DML.

Use a WHERE clause to reduce the scope of locks

You execute DML statements inside read-write transactions. When Spanner reads data, it acquires shared read locks on limited portions of the row ranges that you read. Specifically, it acquires these locks only on the columns you access. The locks can include data that does not match the filter condition of the WHERE clause.

When Spanner modifies data using DML statements, it acquires exclusive locks on the specific data that you are modifying. In addition, it acquires shared locks in the same way as when you read data. If your request includes large row ranges, or an entire table, the shared locks might prevent other transactions from completing in parallel.

To modify data as efficiently as possible, use a WHERE clause that enables Spanner to read only the necessary rows. You can achieve this goal with a filter on the primary key, or on the key of a secondary index. The WHERE clause limits the scope of the shared locks and enables Spanner to process the update more efficiently.

For example, suppose that one of the musicians in the Singers table changes their first name, and you need to update the name in your database. You could execute the following DML statement, but it forces Spanner to scan the entire table and acquires shared locks that cover the entire table. As a result, Spanner must read more data than necessary, and concurrent transactions cannot modify the data in parallel:

-- ANTI-PATTERN: SENDING AN UPDATE WITHOUT THE PRIMARY KEY COLUMN
-- IN THE WHERE CLAUSE

UPDATE Singers SET FirstName = "Marcel"
WHERE FirstName = "Marc" AND LastName = "Richards";

To make the update more efficient, include the SingerId column in the WHERE clause. The SingerId column is the only primary key column for the Singers table:

-- RECOMMENDED: INCLUDING THE PRIMARY KEY COLUMN IN THE WHERE CLAUSE

UPDATE Singers SET FirstName = "Marcel"
WHERE FirstName = "Marc" AND LastName = "Richards" AND SingerId = 1;

Don't use DML statements and mutations in the same transaction

Spanner buffers insertions, updates, and deletions performed using DML statements on the server-side, and the results are visible to subsequent SQL and DML statements within the same transaction. This behavior is different from the Mutation API, where Spanner buffers the mutations on the client-side and sends the mutations server-side as part of the commit operation. As a result, mutations in the commit request are not visible to SQL or DML statements within the same transaction.

You might want to combine DML statements and mutations in the same transaction, because some operations are only supported in the Mutation API. An example is insert_or_update. If a transaction contains both DML statements and mutations in the commit request, Spanner executes the DML statements before the mutations. To avoid having to account for the order of execution in your client library code, you should use either DML statements or the mutations in a single transaction, but not both. If you use both, you should buffer writes only at the very end of the transaction.

Use the PENDING_COMMIT_TIMESTAMP function to write commit timestamps

Google Standard SQL

You use the PENDING_COMMIT_TIMESTAMP function to write the commit timestamp in a DML statement. Spanner selects the commit timestamp when the transaction commits.

PostgreSQL

You use the SPANNER.PENDING_COMMIT_TIMESTAMP() function to write the commit timestamp in a DML statement. Spanner selects the commit timestamp when the transaction commits.

Partitioned DML and date and timestamp functions

Partitioned DML uses one or more transactions that might run and commit at different times. If you use the date or timestamp functions, the modified rows might contain different values.

Use batch DML to make multiple writes with a single request

To reduce latency, use batch DML to send multiple DML statements to Spanner within a single client-server round trip.

Additionally, grouping similar INSERT, UPDATE, or DELETE statements together within batches can result in faster and more efficient data updates. Spanner automatically optimizes the way it processes contiguous, batched DML statements that differ only in parameter values, and which do not contain complex data dependencies.

For example, consider a scenario where you wish to insert a large set of new rows into a table called Albums. To let Spanner optimize all the required INSERT statements into a single, efficient server-side action, begin by writing an appropriate DML statement that makes use of SQL query parameters:

INSERT INTO Albums (SingerId, AlbumId, AlbumTitle) VALUES (@Singer, @Album, @Title);

Then, send Spanner a DML batch that invokes this statement repeatedly and contiguously, with the repetitions differing only in the values you bind to the statement's three query parameters. Spanner optimizes these structurally identical DML statements into a single server-side operation before executing it.