Partitioned Data Manipulation Language (Partitioned DML) is designed for bulk updates and deletes:
- Periodic cleanup and garbage collection. Examples are deleting old rows or setting columns to NULL.
- Backfilling new columns with default values. An example is using an UPDATE statement to set a new column's value to False where it is currently NULL.
DML and Partitioned DML
Cloud Spanner supports two execution modes for DML statements.
DML is suitable for transaction processing. For more information, see Using DML.
Partitioned DML enables large-scale, database-wide operations with minimal impact on concurrent transaction processing by partitioning the key space and running the statement over partitions in separate, smaller-scoped transactions. For more information, see Using Partitioned DML.
The following table highlights some of the differences between the two execution modes.
DML | Partitioned DML |
---|---|
Rows that do not match the WHERE clause might be locked. | Only rows that match the WHERE clause are locked. |
Transaction size limits apply. | Cloud Spanner handles the transaction limits and per-transaction concurrency limits. |
Statements do not need to be idempotent. | A DML statement must be idempotent to guarantee consistent results. |
A transaction can include multiple DML and SQL statements. | A partitioned transaction can include only one DML statement. |
There are no restrictions on complexity of statements. | Statements must be fully partitionable. |
You create read-write transactions in your client code. | Cloud Spanner creates the transactions. |
Partitionable and idempotent
When a Partitioned DML statement runs, rows in one partition do not have access
to rows in other partitions, and you cannot choose how Cloud Spanner creates
the partitions. Partitioning ensures scalability, but it also means that
Partitioned DML statements must be fully partitionable. That is, the
Partitioned DML statement must be expressible as the union of a set of
statements, where each statement accesses a single row of the table and each
statement accesses no other tables. For example, a DML statement that accesses
multiple tables or performs a self-join is not partitionable. If the DML
statement is not partitionable, Cloud Spanner returns the error BadUsage.
These DML statements are fully partitionable, because each statement can be applied to a single row in the table:
UPDATE Singers SET Available = TRUE WHERE Available IS NULL
DELETE FROM Concerts
WHERE DATE_DIFF(CURRENT_DATE(), ConcertDate, DAY) > 365
This DML statement is not fully partitionable, because it accesses multiple tables:
# Not fully partitionable
DELETE FROM Singers WHERE
SingerId NOT IN (SELECT SingerId FROM Concerts);
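A toy Python model shows why row-local statements can be partitioned. The helpers below (split_into_partitions, run_partitioned, set_available) are hypothetical and do not call Spanner, and in practice Cloud Spanner chooses the partition boundaries. The sketch applies the row-local statement UPDATE Singers SET Available = TRUE WHERE Available IS NULL to each partition independently. It then checks that every partition size produces the same table.

```python
"""Toy model of why Partitioned DML must be fully partitionable.

Hypothetical illustration only: Spanner chooses partitions itself, and
these helpers are not part of any client library.
"""
from typing import Callable, Dict, List

Row = Dict[str, object]


def split_into_partitions(rows: List[Row], size: int) -> List[List[Row]]:
    """Split rows into consecutive chunks (a stand-in for key ranges)."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def run_partitioned(rows: List[Row], row_update: Callable[[Row], Row], size: int) -> List[Row]:
    """Apply a row-local update to each partition independently."""
    result: List[Row] = []
    for partition in split_into_partitions(rows, size):
        # Each partition sees only its own rows; nothing crosses partitions.
        result.extend(row_update(dict(r)) for r in partition)
    return result


def set_available(row: Row) -> Row:
    # Models: UPDATE Singers SET Available = TRUE WHERE Available IS NULL
    if row["Available"] is None:
        row["Available"] = True
    return row


singers = [
    {"SingerId": 1, "Available": None},
    {"SingerId": 2, "Available": False},
    {"SingerId": 3, "Available": None},
    {"SingerId": 4, "Available": True},
]

# The update reads and writes only the row it is applied to, so the result
# does not depend on how the key space is split.
whole_table = [set_available(dict(r)) for r in singers]
for size in (1, 2, 3, 4):
    assert run_partitioned(singers, set_available, size) == whole_table
```

The NOT IN example has no equivalent row-local function. Whether a Singers row is deleted depends on rows in Concerts, and a partition of Singers cannot see those rows.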
Cloud Spanner might execute a Partitioned DML statement multiple times against some partitions due to network-level retries. As a result, a statement might be executed more than once against a row. The statement must therefore be idempotent to yield consistent results. A statement is idempotent if executing it multiple times against a single row leads to the same result.
This DML statement is idempotent:
UPDATE Singers SET MarketingBudget = 1000 WHERE true
This DML statement is not idempotent:
UPDATE Singers SET MarketingBudget = 1.5 * MarketingBudget WHERE true
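The following toy Python model shows the effect of a retry on the two statements above. The functions set_budget, scale_budget and apply_with_retries are hypothetical stand-ins for the SQL and do not call Spanner. Each one applies a statement to a single row once, then more than once, as a retried partition might.

```python
"""Toy model of retrying a Partitioned DML statement against one row.

Hypothetical illustration: the functions model the two UPDATE statements
above; they do not call Spanner.
"""


def set_budget(row: dict) -> dict:
    # Models: UPDATE Singers SET MarketingBudget = 1000 WHERE true
    return {**row, "MarketingBudget": 1000}


def scale_budget(row: dict) -> dict:
    # Models: UPDATE Singers SET MarketingBudget = 1.5 * MarketingBudget WHERE true
    return {**row, "MarketingBudget": 1.5 * row["MarketingBudget"]}


def apply_with_retries(row: dict, statement, attempts: int) -> dict:
    """Apply the statement `attempts` times, as a retried partition might."""
    for _ in range(attempts):
        row = statement(row)
    return row


singer = {"SingerId": 1, "MarketingBudget": 200}

# Idempotent: a single run and a retried run leave the row in the same state.
assert apply_with_retries(singer, set_budget, 1) == apply_with_retries(singer, set_budget, 2)

# Not idempotent: each retry compounds the change.
print(apply_with_retries(singer, scale_budget, 1)["MarketingBudget"])  # 300.0
print(apply_with_retries(singer, scale_budget, 2)["MarketingBudget"])  # 450.0
```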
Locking
Cloud Spanner acquires a lock only if a row is a candidate for update or
deletion. This behavior is different from
DML execution, which might read-lock
rows that do not match the WHERE
clause.
Execution and transactions
Whether a DML statement is partitioned or not depends on the client library method that you choose for execution. Each client library provides separate methods for DML execution and Partitioned DML execution.
You can execute only one Partitioned DML statement in a call to the client library method.
Cloud Spanner does not apply a Partitioned DML statement atomically across the entire table. It does, however, apply the statement atomically within each partition.
Partitioned DML does not support commit or rollback. Cloud Spanner executes and applies the DML statement immediately.
- If you cancel the operation, Cloud Spanner cancels the executing partitions and doesn't start the remaining partitions. Cloud Spanner does not roll back any partitions that have already executed.
- If the execution of the statement causes an error, then execution stops across all partitions and Cloud Spanner returns that error for the entire operation. Some examples of errors are violations of data type constraints, violations of UNIQUE INDEX, and violations of ON DELETE NO ACTION. Depending on the point in time when the execution failed, the statement might have successfully run against some partitions, and might never have been run against other partitions.
If the Partitioned DML statement succeeds, then Cloud Spanner ran the statement at least once against each partition of the key range.
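The following toy Python model illustrates the error behavior described above. It uses one all-or-nothing "commit" per partition, no rollback of committed partitions, and stops at the first error. The helper names, the hypothetical Budget and BudgetText columns, and the use of Python's int() as a stand-in for a failed CAST are all assumptions for illustration. The model runs partitions one after another. Spanner may run them concurrently, so which partitions have completed when an error occurs is not predictable.

```python
"""Toy model of Partitioned DML execution: one transaction per partition.

Hypothetical illustration: helper and column names are assumptions,
not Spanner APIs or schema.
"""


def execute_partitioned(partitions: list, update) -> None:
    """Apply `update` partition by partition, committing each one separately."""
    for i, partition in enumerate(partitions):
        # Compute every new row before "committing", so a failure leaves this
        # partition untouched (atomic within a partition).
        new_rows = [update(dict(row)) for row in partition]
        partitions[i] = new_rows  # committed; never rolled back later
    # Returning normally means every partition was processed at least once.


def cast_budget(row: dict) -> dict:
    # Models: UPDATE Singers SET Budget = CAST(BudgetText AS INT64) WHERE true
    # int() raises ValueError on non-numeric text, like a failed CAST.
    return {**row, "Budget": int(row["BudgetText"])}


partitions = [
    [{"SingerId": 1, "BudgetText": "100"}, {"SingerId": 2, "BudgetText": "250"}],
    [{"SingerId": 3, "BudgetText": "n/a"}, {"SingerId": 4, "BudgetText": "75"}],
    [{"SingerId": 5, "BudgetText": "300"}],
]

try:
    execute_partitioned(partitions, cast_budget)
except ValueError as err:
    # The error is reported for the whole operation.
    print("Partitioned DML failed:", err)

# Partition 0 committed before the failure; partition 1 failed as a unit,
# and partition 2 never ran.
print(["Budget" in row for partition in partitions for row in partition])
# [True, True, False, False, False]
```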
Count of modified rows
A Partitioned DML statement returns a lower bound on the number of modified rows. It might not be an exact count of the number of rows modified, because there is no guarantee that Cloud Spanner counts all the modified rows.
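The returned count is only a lower bound, so checking the end state directly can be more reliable than comparing the count with an expected number. The following sketch assumes the Python client library's Database.execute_partitioned_dml() and Database.snapshot() methods. The function name backfill_available and the choice to raise RuntimeError are illustrative. Concurrent writers might insert new rows with Available = NULL, so a non-zero remaining count does not by itself prove that the statement failed.

```python
"""Backfill with Partitioned DML, then verify the end state (sketch).

`database` is assumed to be a google.cloud.spanner Database object, or any
object with compatible execute_partitioned_dml() and snapshot() methods.
"""

BACKFILL_SQL = "UPDATE Singers SET Available = TRUE WHERE Available IS NULL"
CHECK_SQL = "SELECT COUNT(*) FROM Singers WHERE Available IS NULL"


def backfill_available(database) -> int:
    """Run the backfill and return the lower-bound row count it reports."""
    # A lower bound on the modified rows, not an exact count.
    lower_bound = database.execute_partitioned_dml(BACKFILL_SQL)

    # Instead of trusting the count, read the end state. snapshot() without
    # a timestamp bound performs a strong read.
    with database.snapshot() as snapshot:
        rows = list(snapshot.execute_sql(CHECK_SQL))
    remaining = rows[0][0]
    if remaining:
        raise RuntimeError(f"{remaining} rows still have Available = NULL")
    return lower_bound
```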
Transaction limits
Cloud Spanner creates the partitions and transactions that it needs to execute a Partitioned DML statement. Transaction limits or per-transaction concurrency limits apply, but Cloud Spanner attempts to keep the transactions within the limits.
Cloud Spanner allows a maximum of 20,000 concurrent Partitioned DML statements per database.
Features that aren't supported
Cloud Spanner does not support some features for Partitioned DML:
- INSERT is not supported.
- Cloud console: You can't execute Partitioned DML statements in the Cloud console.
- Query plans and profiling: The Google Cloud CLI and the client libraries do not support query plans and profiling.
- Subqueries that read from another table, or a different row of the same table.
For complex scenarios, such as moving a table or transformations that require joins across tables, consider Using the Dataflow connector.
Examples
The following code example updates the MarketingBudget
column of the Albums
table.
C++
You use the ExecutePartitionedDml()
function to execute a Partitioned DML statement.
C#
You use the ExecutePartitionedUpdateAsync()
method to execute a Partitioned DML statement.
Go
You use the PartitionedUpdate()
method to execute a Partitioned DML statement.
Java
You use the executePartitionedUpdate()
method to execute a Partitioned DML statement.
Node.js
You use the runPartitionedUpdate()
method to execute a Partitioned DML statement.
PHP
You use the executePartitionedUpdate()
method to execute a Partitioned DML statement.
Python
You use the execute_partitioned_dml()
method to execute a Partitioned DML statement.
Ruby
You use the execute_partitioned_update()
method to execute a Partitioned DML statement.
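The following Python sketch of the update example is built on the execute_partitioned_dml() method named above. The WHERE predicate, the budget value, and the helper names update_albums_budget and open_database are illustrative assumptions. The instance and database IDs are placeholders. The client import is deferred to open_database() so that the rest of the module loads without the google-cloud-spanner package.

```python
"""Partitioned DML update with the Python client library (sketch)."""

# Idempotent (sets a constant) and row-local, so safe to partition and retry.
UPDATE_SQL = "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"


def open_database(instance_id: str, database_id: str):
    """Return a Database handle; needs google-cloud-spanner and credentials."""
    from google.cloud import spanner  # deferred import

    return spanner.Client().instance(instance_id).database(database_id)


def update_albums_budget(database) -> int:
    """Run UPDATE_SQL as Partitioned DML; return the lower-bound row count."""
    # Spanner partitions the key space and commits each partition itself,
    # so there is no transaction object, commit, or rollback here.
    row_count = database.execute_partitioned_dml(UPDATE_SQL)
    print(f"At least {row_count} records updated.")
    return row_count
```

A typical call is update_albums_budget(open_database("my-instance", "example-db")), where both IDs are placeholders.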
The following code example deletes rows from the Singers
table, based on the
SingerId
column.
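The following minimal Python sketch of the delete example also relies on execute_partitioned_dml(). The WHERE predicate and the function name delete_singers are illustrative assumptions. The database argument is assumed to be a Database handle, typically obtained with spanner.Client().instance(INSTANCE_ID).database(DATABASE_ID).

```python
"""Partitioned DML delete with the Python client library (sketch)."""

# Row-local predicate on SingerId, so the statement is fully partitionable.
DELETE_SQL = "DELETE FROM Singers WHERE SingerId > 10"


def delete_singers(database) -> int:
    """Run DELETE_SQL as Partitioned DML; return a lower bound on rows deleted."""
    # Exactly one statement per call. If it fails partway (for example on an
    # ON DELETE NO ACTION violation), partitions already applied stay deleted.
    row_count = database.execute_partitioned_dml(DELETE_SQL)
    print(f"At least {row_count} records deleted.")
    return row_count
```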
What's next?
Learn how to modify data Using DML.
Learn about Data Manipulation Language (DML) best practices.
To learn about the differences between DML and mutations, see Comparing DML and Mutations.
Consider Using the Dataflow connector for other data transformation scenarios.