Partitioned Data Manipulation Language

Partitioned Data Manipulation Language (partitioned DML) is designed for the following types of bulk updates and deletes:

  • Periodic cleanup and garbage collection. Examples are deleting old rows or setting columns to NULL.
  • Backfilling new columns with default values. An example is using an UPDATE statement to set a new column's value to False where it is currently NULL.

Partitioned DML isn't suitable for small-scale transaction processing. If you want to run a statement on a few rows, use transactional DMLs with identifiable primary keys. For more information, see Using DML.

If you need to commit a large number of blind writes, but don't require an atomic transaction, you can bulk modify your Spanner tables using batch write. For more information, see Modify data using batch writes.

You can get insights on active partitioned DML queries and their progress from statistics tables in your Spanner database. For more information, see Active partitioned DMLs statistics.

DML and partitioned DML

Spanner supports two execution modes for DML statements:

  • DML, which is suitable for transaction processing. For more information, see Using DML.

  • Partitioned DML, which enables large-scale, database-wide operations with minimal impact on concurrent transaction processing by partitioning the key space and running the statement over partitions in separate, smaller-scoped transactions. For more information, see Using partitioned DML.

The following table highlights some of the differences between the two execution modes.

DML Partitioned DML
Rows that don't match the WHERE clause might be locked. Only rows that match the WHERE clause are locked.
Transaction size limits apply. Spanner handles the transaction limits and per-transaction concurrency limits.
Statements don't need to be idempotent. A DML statement must be idempotent to ensure consistent results.
A transaction can include multiple DML and SQL statements. A partitioned transaction can include only one DML statement.
There are no restrictions on complexity of statements. Statements must be fully partitionable.
You create read-write transactions in your client code. Spanner creates the transactions.

Partitionable and idempotent

When a partitioned DML statement runs, rows in one partition don't have access to rows in other partitions, and you cannot choose how Spanner creates the partitions. Partitioning ensures scalability, but it also means that partitioned DML statements must be fully partitionable. That is, the partitioned DML statement must be expressible as the union of a set of statements, where each statement accesses a single row of the table and each statement accesses no other tables. For example, a DML statement that accesses multiple tables or performs a self-join is not partitionable. If the DML statement is not partitionable, Spanner returns the error BadUsage.

These DML statements are fully partitionable, because each statement can be applied to a single row in the table:

UPDATE Singers SET LastName = NULL WHERE LastName = '';

DELETE FROM Albums WHERE MarketingBudget > 10000;

This DML statement is not fully partitionable, because it accesses multiple tables:

# Not fully partitionable
DELETE FROM Singers WHERE
SingerId NOT IN (SELECT SingerId FROM Concerts);

Spanner might execute a partitioned DML statement multiple times against some partitions due to network-level retries. As a result, a statement might be executed more than once against a row. The statement must therefore be idempotent to yield consistent results. A statement is idempotent if executing it multiple times against a single row leads to the same result.

This DML statement is idempotent:

UPDATE Singers SET MarketingBudget = 1000 WHERE true;

This DML statement is not idempotent:

UPDATE Singers SET MarketingBudget = 1.5 * MarketingBudget WHERE true;

Row locking

Spanner acquires a lock only if a row is a candidate for update or deletion. This behavior is different from DML execution, which might read-lock rows that don't match the WHERE clause.

Execution and transactions

Whether a DML statement is partitioned or not depends on the client library method that you choose for execution. Each client library provides separate methods for DML execution and Partitioned DML execution.

You can execute only one partitioned DML statement in a call to the client library method.

Spanner does not apply the partitioned DML statements atomically across the entire table. Spanner does, however, apply partitioned DML statements atomically across each partition.

Partitioned DML does not support commit or rollback. Spanner executes and applies the DML statement immediately.

  • If you cancel the operation, Spanner cancels the executing partitions and doesn't start the remaining partitions. Spanner does not roll back any partitions that have already executed.
  • If the execution of the statement causes an error, then execution stops across all partitions and Spanner returns that error for the entire operation. Some examples of errors are violations of data type constraints, violations of UNIQUE INDEX, and violations of ON DELETE NO ACTION. Depending on the point in time when the execution failed, the statement might have successfully run against some partitions, and might never have been run against other partitions.

If the partitioned DML statement succeeds, then Spanner ran the statement at least once against each partition of the key range.

Count of modified rows

A partitioned DML statement returns a lower bound on the number of modified rows. It might not be an exact count of the number of rows modified, because there is no guarantee that Spanner counts all the modified rows.

Transaction limits

Spanner creates the partitions and transactions that it needs to execute a partitioned DML statement. Transaction limits or per-transaction concurrency limits apply, but Spanner attempts to keep the transactions within the limits.

Spanner allows a maximum of 20,000 concurrent partitioned DML statements per database.

Unsupported features

Spanner does not support some features for partitioned DML:

  • INSERT is not supported.
  • Google Cloud console: You can't execute partitioned DML statements in the Google Cloud console.
  • Query plans and profiling: The Google Cloud CLI and the client libraries don't support query plans and profiling.
  • Subqueries that read from another table, or a different row of the same table.

For complex scenarios, such as moving a table or transformations that require joins across tables, consider using the Dataflow connector.

Examples

The following code example updates the MarketingBudget column of the Albums table.

C++

You use the ExecutePartitionedDml() function to execute a partitioned DML statement.

void DmlPartitionedUpdate(google::cloud::spanner::Client client) {
  namespace spanner = ::google::cloud::spanner;
  auto result = client.ExecutePartitionedDml(
      spanner::SqlStatement("UPDATE Albums SET MarketingBudget = 100000"
                            "  WHERE SingerId > 1"));
  if (!result) throw std::move(result).status();
  std::cout << "Updated at least " << result->row_count_lower_bound
            << " row(s) [spanner_dml_partitioned_update]\n";
}

C#

You use the ExecutePartitionedUpdateAsync() method to execute a partitioned DML statement.


using Google.Cloud.Spanner.Data;
using System;
using System.Threading.Tasks;

public class UpdateUsingPartitionedDmlCoreAsyncSample
{
    public async Task<long> UpdateUsingPartitionedDmlCoreAsync(string projectId, string instanceId, string databaseId)
    {
        string connectionString = $"Data Source=projects/{projectId}/instances/{instanceId}/databases/{databaseId}";

        using var connection = new SpannerConnection(connectionString);
        await connection.OpenAsync();

        using var cmd = connection.CreateDmlCommand("UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1");
        long rowCount = await cmd.ExecutePartitionedUpdateAsync();

        Console.WriteLine($"{rowCount} row(s) updated...");
        return rowCount;
    }
}

Go

You use the PartitionedUpdate() method to execute a partitioned DML statement.


import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/spanner"
)

func updateUsingPartitionedDML(w io.Writer, db string) error {
	ctx := context.Background()
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		return err
	}
	defer client.Close()

	stmt := spanner.Statement{SQL: "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"}
	rowCount, err := client.PartitionedUpdate(ctx, stmt)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "%d record(s) updated.\n", rowCount)
	return nil
}

Java

You use the executePartitionedUpdate() method to execute a partitioned DML statement.

static void updateUsingPartitionedDml(DatabaseClient dbClient) {
  String sql = "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1";
  long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
  System.out.printf("%d records updated.\n", rowCount);
}

Node.js

You use the runPartitionedUpdate() method to execute a partitioned DML statement.

// Imports the Google Cloud client library
const {Spanner} = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

try {
  const [rowCount] = await database.runPartitionedUpdate({
    sql: 'UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1',
  });
  console.log(`Successfully updated ${rowCount} records.`);
} catch (err) {
  console.error('ERROR:', err);
} finally {
  // Close the database when finished.
  database.close();
}

PHP

You use the executePartitionedUpdate() method to execute a partitioned DML statement.

use Google\Cloud\Spanner\SpannerClient;

/**
 * Updates sample data in the database by partition with a DML statement.
 *
 * This updates the `MarketingBudget` column which must be created before
 * running this sample. You can add the column by running the `add_column`
 * sample or by running this DDL statement against your database:
 *
 *     ALTER TABLE Albums ADD COLUMN MarketingBudget INT64
 *
 * Example:
 * ```
 * update_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function update_data_with_partitioned_dml(string $instanceId, string $databaseId): void
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $rowCount = $database->executePartitionedUpdate(
        'UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1'
    );

    printf('Updated %d row(s).' . PHP_EOL, $rowCount);
}

Python

You use the execute_partitioned_dml() method to execute a partitioned DML statement.

# instance_id = "your-spanner-instance"
# database_id = "your-spanner-db-id"

spanner_client = spanner.Client()
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

row_ct = database.execute_partitioned_dml(
    "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
)

print("{} records updated.".format(row_ct))

Ruby

You use the execute_partitioned_update() method to execute a partitioned DML statement.

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

row_count = client.execute_partition_update(
  "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
)

puts "#{row_count} records updated."

The following code example deletes rows from the Singers table, based on the SingerId column.

C++

void DmlPartitionedDelete(google::cloud::spanner::Client client) {
  namespace spanner = ::google::cloud::spanner;
  auto result = client.ExecutePartitionedDml(
      spanner::SqlStatement("DELETE FROM Singers WHERE SingerId > 10"));
  if (!result) throw std::move(result).status();
  std::cout << "Deleted at least " << result->row_count_lower_bound
            << " row(s) [spanner_dml_partitioned_delete]\n";
}

C#


using Google.Cloud.Spanner.Data;
using System;
using System.Threading.Tasks;

public class DeleteUsingPartitionedDmlCoreAsyncSample
{
    public async Task<long> DeleteUsingPartitionedDmlCoreAsync(string projectId, string instanceId, string databaseId)
    {
        string connectionString = $"Data Source=projects/{projectId}/instances/{instanceId}/databases/{databaseId}";

        using var connection = new SpannerConnection(connectionString);
        await connection.OpenAsync();

        using var cmd = connection.CreateDmlCommand("DELETE FROM Singers WHERE SingerId > 10");
        long rowCount = await cmd.ExecutePartitionedUpdateAsync();

        Console.WriteLine($"{rowCount} row(s) deleted...");
        return rowCount;
    }
}

Go


import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/spanner"
)

func deleteUsingPartitionedDML(w io.Writer, db string) error {
	ctx := context.Background()
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		return err
	}
	defer client.Close()

	stmt := spanner.Statement{SQL: "DELETE FROM Singers WHERE SingerId > 10"}
	rowCount, err := client.PartitionedUpdate(ctx, stmt)
	if err != nil {
		return err

	}
	fmt.Fprintf(w, "%d record(s) deleted.", rowCount)
	return nil
}

Java

static void deleteUsingPartitionedDml(DatabaseClient dbClient) {
  String sql = "DELETE FROM Singers WHERE SingerId > 10";
  long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
  System.out.printf("%d records deleted.\n", rowCount);
}

Node.js

// Imports the Google Cloud client library
const {Spanner} = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

try {
  const [rowCount] = await database.runPartitionedUpdate({
    sql: 'DELETE FROM Singers WHERE SingerId > 10',
  });
  console.log(`Successfully deleted ${rowCount} records.`);
} catch (err) {
  console.error('ERROR:', err);
} finally {
  // Close the database when finished.
  database.close();
}

PHP

use Google\Cloud\Spanner\SpannerClient;

/**
 * Delete sample data in the database by partition with a DML statement.
 *
 * This updates the `MarketingBudget` column which must be created before
 * running this sample. You can add the column by running the `add_column`
 * sample or by running this DDL statement against your database:
 *
 *     ALTER TABLE Albums ADD COLUMN MarketingBudget INT64
 *
 * Example:
 * ```
 * update_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function delete_data_with_partitioned_dml(string $instanceId, string $databaseId): void
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $rowCount = $database->executePartitionedUpdate(
        'DELETE FROM Singers WHERE SingerId > 10'
    );

    printf('Deleted %d row(s).' . PHP_EOL, $rowCount);
}

Python

# instance_id = "your-spanner-instance"
# database_id = "your-spanner-db-id"
spanner_client = spanner.Client()
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

row_ct = database.execute_partitioned_dml("DELETE FROM Singers WHERE SingerId > 10")

print("{} record(s) deleted.".format(row_ct))

Ruby

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

row_count = client.execute_partition_update(
  "DELETE FROM Singers WHERE SingerId > 10"
)

puts "#{row_count} records deleted."

What's next?