Partitioned Data Manipulation Language

Partitioned Data Manipulation Language (Partitioned DML) is designed for bulk updates and deletes:

  • Periodic cleanup and garbage collection. Examples are deleting old rows or setting columns to NULL.
  • Backfilling new columns with default values. An example is using an UPDATE statement to set a new column's value to False where it is currently NULL.

DML and Partitioned DML

Cloud Spanner supports two execution modes for DML statements.

  • DML is suitable for transaction processing.
  • Partitioned DML enables large-scale, database-wide operations with minimal impact on concurrent transaction processing by partitioning the key space and running the statement over partitions in separate, smaller-scoped transactions.

The following table highlights some of the differences between the two execution modes.

DML Partitioned DML
Rows that do not match the WHERE clause might be locked. Only rows that match the WHERE clause are locked.
Transaction size limits apply. Cloud Spanner handles the transaction limits and per-transaction concurrency limits.
Statements do not need to be idempotent. A DML statement must be idempotent to guarantee consistent results.
A transaction can include multiple DML and SQL statements. A partitioned transaction can include only one DML statement.
There are no restrictions on complexity of statements. Statements must be fully partitionable.
You create read-write transactions in your client code. Cloud Spanner creates the transactions.

Partitionable and idempotent

When a Partitioned DML statement runs, rows in one partition do not have access to rows in other partitions, and you cannot choose how Cloud Spanner creates the partitions. Partitioning ensures scalability, but it also means that Partitioned DML statements must be fully partitionable. That is, the Partitioned DML statement must be expressible as the union of a set of statements, where each statement accesses a single row of the table and each statement accesses no other tables. For example, a DML statement that accesses multiple tables or performs a self-join is not partitionable. If the DML statement is not partitionable, Cloud Spanner returns the error BadUsage.

These DML statements are fully partitionable, because each statement can be applied to a single row in the table:

UPDATE Singers SET Available = TRUE WHERE Available IS NULL

DELETE FROM Concerts
WHERE DATE_DIFF(CURRENT_DATE(), ConcertDate) > 365

This DML statement is not fully partitionable, because it accesses multiple tables:

# Not fully partitionable
DELETE FROM Singers WHERE
SingerId NOT IN (SELECT SingerId FROM Concerts);

Cloud Spanner might execute a Partitioned DML statement multiple times against some partitions due to network-level retries. As a result, a statement might be executed more than once against a row. The statement must therefore be idempotent to yield consistent results. A statement is idempotent if executing it multiple times against a single row leads to the same result.

This DML statement is idempotent:

UPDATE Singers SET MarketingBudget = 1000 WHERE true

This DML statement is not idempotent:

UPDATE Singers SET MarketingBudget = 1.5 * MarketingBudget WHERE true

Locking

Cloud Spanner acquires a lock only if a row is a candidate for update or deletion. This behavior is different from DML execution, which might read-lock rows that do not match the WHERE clause.

Execution and transactions

Whether a DML statement is partitioned or not depends on the client library method that you choose for execution. Each client library provides separate methods for DML execution and Partitioned DML execution.

You can execute only one Partitioned DML statement in a call to the client library method.

Cloud Spanner does not apply the Partitioned DML statements atomically across the entire table. Cloud Spanner does, however, apply Partitioned DML statements atomically across each partition.

Partitioned DML does not support commit or rollback. Cloud Spanner executes and applies the DML statement immediately.

  • If you cancel the operation, Cloud Spanner cancels the executing partitions and doesn't start the remaining partitions. Cloud Spanner does not roll back any partitions that have already executed.
  • If the execution of the statement causes an error, then execution stops across all partitions and Cloud Spanner returns that error for the entire operation. Some examples of errors are violations of data type constraints, violations of UNIQUE INDEX, and violations of ON DELETE NO ACTION. Depending on the point in time when the execution failed, the statement might have successfully run against some partitions, and might never have been run against other partitions.

If the Partitioned DML statement succeeds, then Cloud Spanner ran the statement at least once against each partition of the key range.

Count of modified rows

A Partitioned DML statement returns a lower bound on the number of modified rows. It might not be an exact count of the number of rows modified, because there is no guarantee that Cloud Spanner counts all the modified rows.

Transaction limits

Cloud Spanner creates the partitions and transactions that it needs to execute a Partitioned DML statement. Transaction limits or per-transaction concurrency limits apply, but Cloud Spanner attempts to keep the transactions within the limits.

Cloud Spanner allows a maximum of 20,000 concurrent Partitioned DML statements per database.

Features that aren't supported

Cloud Spanner does not support some features for Partitioned DML:

  • INSERT is not supported.
  • GCP Console: You can't execute Partitioned DML statements in the GCP Console.
  • Query plans and profiling: The gcloud command-line tool and the client libraries do not support query plans and profiling.

Examples

The following code example updates the MarketingBudget column of the Albums table.

C#

You use the ExecutePartitionedUpdateAsync() method to execute a Partitioned DML statement.

public static async Task UpdateUsingPartitionedDmlCoreAsync(
    string projectId,
    string instanceId,
    string databaseId)
{
    string connectionString =
        $"Data Source=projects/{projectId}/instances/{instanceId}"
        + $"/databases/{databaseId}";

    // Create connection to Cloud Spanner.
    using (var connection =
        new SpannerConnection(connectionString))
    {
        await connection.OpenAsync();

        SpannerCommand cmd = connection.CreateDmlCommand(
            "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
        );
        long rowCount = await cmd.ExecutePartitionedUpdateAsync();
        Console.WriteLine($"{rowCount} row(s) updated...");
    }
}

Go

You use the PartitionedUpdate() method to execute a Partitioned DML statement.

func updateUsingPartitionedDML(ctx context.Context, w io.Writer, client *spanner.Client) error {
	stmt := spanner.Statement{SQL: "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"}
	rowCount, err := client.PartitionedUpdate(ctx, stmt)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "%d record(s) updated.\n", rowCount)
	return nil
}

Java

You use the executePartitionedUpdate() method to execute a Partitioned DML statement.

static void updateUsingPartitionedDml(DatabaseClient dbClient) {
  String sql = "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1";
  long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
  System.out.printf("%d records updated.\n", rowCount);
}

Node.js

You use the runPartitionedUpdate() method to execute a Partitioned DML statement.

// Imports the Google Cloud client library
const {Spanner} = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

database
  .runPartitionedUpdate({
    sql: `UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1`,
  })
  .then(rowCount => {
    console.log(`Successfully updated ${rowCount} records.`);
  })
  .catch(err => {
    console.error('ERROR:', err);
  })
  .then(() => {
    // Close the database when finished.
    return database.close();
  });

PHP

You use the executePartitionedUpdate() method to execute a Partitioned DML statement.

use Google\Cloud\Spanner\SpannerClient;

/**
 * Updates sample data in the database by partition with a DML statement.
 *
 * This updates the `MarketingBudget` column which must be created before
 * running this sample. You can add the column by running the `add_column`
 * sample or by running this DDL statement against your database:
 *
 *     ALTER TABLE Albums ADD COLUMN MarketingBudget INT64
 *
 * Example:
 * ```
 * update_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function update_data_with_partitioned_dml($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $rowCount = $database->executePartitionedUpdate(
        "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
    );

    printf('Updated %d row(s).' . PHP_EOL, $rowCount);
}

Python

You use the execute_partitioned_dml() method to execute a Partitioned DML statement.

# instance_id = "your-spanner-instance"
# database_id = "your-spanner-db-id"

spanner_client = spanner.Client()
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

row_ct = database.execute_partitioned_dml(
    "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
)

print("{} records updated.".format(row_ct))

Ruby

You use the execute_partitioned_update() method to execute a Partitioned DML statement.

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

row_count = client.execute_partitioned_update(
  "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
)

puts "#{row_count} records updated."

The following code example deletes rows from the Singers table, based on the SingerId column.

C#

public static async Task DeleteUsingPartitionedDmlCoreAsync(
    string projectId,
    string instanceId,
    string databaseId)
{
    string connectionString =
        $"Data Source=projects/{projectId}/instances/{instanceId}"
        + $"/databases/{databaseId}";

    // Create connection to Cloud Spanner.
    using (var connection =
        new SpannerConnection(connectionString))
    {
        await connection.OpenAsync();

        SpannerCommand cmd = connection.CreateDmlCommand(
            "DELETE Singers WHERE SingerId > 10"
        );
        long rowCount = await cmd.ExecutePartitionedUpdateAsync();
        Console.WriteLine($"{rowCount} row(s) deleted...");
    }
}

Go

func deleteUsingPartitionedDML(ctx context.Context, w io.Writer, client *spanner.Client) error {
	stmt := spanner.Statement{SQL: "DELETE Singers WHERE SingerId > 10"}
	rowCount, err := client.PartitionedUpdate(ctx, stmt)
	if err != nil {
		return err

	}
	fmt.Fprintf(w, "%d record(s) deleted.", rowCount)
	return nil
}

Java

static void deleteUsingPartitionedDml(DatabaseClient dbClient) {
  String sql = "DELETE FROM Singers WHERE SingerId > 10";
  long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
  System.out.printf("%d records deleted.\n", rowCount);
}

Node.js

// Imports the Google Cloud client library
const {Spanner} = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

database
  .runPartitionedUpdate({
    sql: `DELETE Singers WHERE SingerId > 10`,
  })
  .then(rowCount => {
    console.log(`Successfully deleted ${rowCount} records.`);
  })
  .catch(err => {
    console.error('ERROR:', err);
  })
  .then(() => {
    // Close the database when finished.
    return database.close();
  });

PHP

use Google\Cloud\Spanner\SpannerClient;

/**
 * Delete sample data in the database by partition with a DML statement.
 *
 * This updates the `MarketingBudget` column which must be created before
 * running this sample. You can add the column by running the `add_column`
 * sample or by running this DDL statement against your database:
 *
 *     ALTER TABLE Albums ADD COLUMN MarketingBudget INT64
 *
 * Example:
 * ```
 * update_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function delete_data_with_partitioned_dml($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $rowCount = $database->executePartitionedUpdate(
        "DELETE Singers WHERE SingerId > 10"
    );

    printf('Deleted %d row(s).' . PHP_EOL, $rowCount);
}

Python

# instance_id = "your-spanner-instance"
# database_id = "your-spanner-db-id"
spanner_client = spanner.Client()
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

row_ct = database.execute_partitioned_dml(
    "DELETE Singers WHERE SingerId > 10"
)

print("{} record(s) deleted.".format(row_ct))

Ruby

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

row_count = client.execute_partitioned_update(
  "DELETE FROM Singers WHERE SingerId > 10"
)

puts "#{row_count} records deleted."

Was this page helpful? Let us know how we did:

Send feedback about...

Cloud Spanner Documentation