分区数据操纵语言

分区数据操纵语言(分区 DML)是专为批量更新和删除而设计的:

  • 定期清理和垃圾回收。例如,删除旧行或将列设置为 NULL
  • 使用默认值回填新列。例如,使用 UPDATE 语句将新列当前的 NULL 值设置为False

DML 和分区 DML

Cloud Spanner 支持 DML 语句的两种执行模式。

  • DML 适用于事务处理。 如需了解详情,请参阅使用 DML

  • 通过在较小范围的单独事务中对键空间进行分区并在分区上运行语句,分区 DML 实现了大规模、数据库范围的操作,同时将对并发事务处理的影响降至最低。 如需了解详情,请参阅使用分区 DML

下表着重说明了两种执行模式之间的一些差异。

DML 分区 DML
WHERE 子句不匹配的行可能会加以锁定。 仅锁定与 WHERE 子句匹配的行。
适用事务大小限制。 Cloud Spanner 处理事务限制和每个事务的并发限制。
语句不需要具有幂等性。 DML 语句必须具有幂等性,以保证一致结果。
一个事务可以包括多个 DML 和 SQL 语句。 一个分区事务只能包含一个 DML 语句。
对语句的复杂程度没有限制。 语句必须是完全可分区的
您可以在客户端代码中创建读写事务。 Cloud Spanner 创建事务。

可分区和幂等性

运行分区 DML 语句时,一个分区中的行无权访问其他分区中的行,且您无法选择 Cloud Spanner 创建分区的方式。分区确保了可伸缩性,但这也意味着分区 DML 语句必须是“完全可分区的”。也就是说,分区 DML 语句必须可表示为一组语句的并集,其中每个语句访问表的单一行,并且每个语句不访问其他表。例如,访问多个表或执行自连接的 DML 语句不可分区。如果 DML 语句不可分区,则 Cloud Spanner 返回错误 BadUsage

这些 DML 语句是完全可分区的,因为每个语句都可以应用于表中的单一行:

UPDATE Singers SET Available = TRUE WHERE Available IS NULL

DELETE FROM Concerts
WHERE DATE_DIFF(CURRENT_DATE(), ConcertDate, DAY) > 365

此 DML 语句不是完全可分区的,因为它访问多个表:

# Not fully partitionable
DELETE FROM Singers WHERE
SingerId NOT IN (SELECT SingerId FROM Concerts);

由于网络级层重试,Cloud Spanner 可能会对某些分区多次执行分区 DML 语句。因此,语句可能会对一行多次执行。所以该语句必须具有幂等性才能产生一致的结果。如果对单一行多次执行语句会产生相同的结果,则该语句具有幂等性。

此 DML 语句具有幂等性:

UPDATE Singers SET MarketingBudget = 1000 WHERE true

此 DML 语句不具有幂等性:

UPDATE Singers SET MarketingBudget = 1.5 * MarketingBudget WHERE true

锁定

仅当行是更新或删除的候选对象时,Cloud Spanner 才会获取锁定。此行为与 DML 执行不同,后者可能会对与 WHERE 子句不匹配的行进行读取锁定。

执行和事务

DML 语句是否已分区取决于您选择执行的客户端库方法。每个客户端库会为 DML 执行分区 DML 执行提供单独的方法。

在调用客户端库方法时,一次只能执行一个分区 DML 语句。

Cloud Spanner 不会在整个表中以原子方式应用分区 DML 语句。但是,Cloud Spanner 会在每个分区上以原子方式应用分区 DML 语句。

分区 DML 不支持提交或回滚。Cloud Spanner 立即执行并应用 DML 语句。

  • 如果您取消操作,则 Cloud Spanner 将取消正在执行的分区,并且不会启动其余分区。Cloud Spanner 不会回滚已执行的任何分区。
  • 如果语句的执行导致错误,则会在所有分区上停止执行,Cloud Spanner 将针对整个操作返回该错误。错误示例包括违反数据类型限制、违反 UNIQUE INDEX 和违反 ON DELETE NO ACTION。根据执行失败的时间点,语句可能已成功针对某些分区运行,并且可能永远不会针对其他分区运行。

如果分区 DML 语句成功,则 Cloud Spanner 对键范围的每个分区运行该语句至少一次。

已修改行的计数

分区 DML 语句返回已修改行数的下限。它可能不是已修改行数的精确计数,因为无法保证 Cloud Spanner 计算了所有已修改的行。

事务限制

Cloud Spanner 创建执行分区 DML 语句所需的分区和事务。系统会适用事务限制或每个事务的并发限制,但 Cloud Spanner 尝试将事务保持在限制范围内。

Cloud Spanner 允许每个数据库最多 20,000 个并发的分区 DML 语句。

不支持的功能

Cloud Spanner 不支持分区 DML 的某些功能:

  • 不支持 INSERT
  • Cloud Console:您无法在 Cloud Console 中执行分区 DML 语句。
  • 查询方案和剖析:gcloud 命令行工具和客户端库不支持查询方案和剖析。

示例

以下代码示例更新 Albums 表的 MarketingBudget 列。

C++

您可以使用 ExecutePartitionedDml() 函数来执行分区 DML 语句。

void DmlPartitionedUpdate(google::cloud::spanner::Client client) {
  namespace spanner = ::google::cloud::spanner;
  auto result = client.ExecutePartitionedDml(
      spanner::SqlStatement("UPDATE Albums SET MarketingBudget = 100000"
                            " WHERE SingerId > 1"));
  if (!result) throw std::runtime_error(result.status().message());
  std::cout << "Update was successful [spanner_dml_partitioned_update]\n";
}

C#

使用 ExecutePartitionedUpdateAsync() 方法来执行分区 DML 语句。

public static async Task UpdateUsingPartitionedDmlCoreAsync(
    string projectId,
    string instanceId,
    string databaseId)
{
    string connectionString =
        $"Data Source=projects/{projectId}/instances/{instanceId}"
        + $"/databases/{databaseId}";

    // Create connection to Cloud Spanner.
    using (var connection =
        new SpannerConnection(connectionString))
    {
        await connection.OpenAsync();

        SpannerCommand cmd = connection.CreateDmlCommand(
            "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
        );
        long rowCount = await cmd.ExecutePartitionedUpdateAsync();
        Console.WriteLine($"{rowCount} row(s) updated...");
    }
}

Go

使用 PartitionedUpdate() 方法来执行分区 DML 语句。


import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/spanner"
)

func updateUsingPartitionedDML(w io.Writer, db string) error {
	ctx := context.Background()
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		return err
	}
	defer client.Close()

	stmt := spanner.Statement{SQL: "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"}
	rowCount, err := client.PartitionedUpdate(ctx, stmt)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "%d record(s) updated.\n", rowCount)
	return nil
}

Java

使用 executePartitionedUpdate() 方法来执行分区 DML 语句。

static void updateUsingPartitionedDml(DatabaseClient dbClient) {
  String sql = "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1";
  long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
  System.out.printf("%d records updated.\n", rowCount);
}

Node.js

使用 runPartitionedUpdate() 方法来执行分区 DML 语句。

// Imports the Google Cloud client library
const {Spanner} = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

try {
  const [rowCount] = await database.runPartitionedUpdate({
    sql: 'UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1',
  });
  console.log(`Successfully updated ${rowCount} records.`);
} catch (err) {
  console.error('ERROR:', err);
} finally {
  // Close the database when finished.
  database.close();
}

PHP

使用 executePartitionedUpdate() 方法来执行分区 DML 语句。

use Google\Cloud\Spanner\SpannerClient;

/**
 * Updates sample data in the database by partition with a DML statement.
 *
 * This updates the `MarketingBudget` column which must be created before
 * running this sample. You can add the column by running the `add_column`
 * sample or by running this DDL statement against your database:
 *
 *     ALTER TABLE Albums ADD COLUMN MarketingBudget INT64
 *
 * Example:
 * ```
 * update_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function update_data_with_partitioned_dml($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $rowCount = $database->executePartitionedUpdate(
        "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
    );

    printf('Updated %d row(s).' . PHP_EOL, $rowCount);
}

Python

使用 execute_partitioned_dml() 方法来执行分区 DML 语句。

# instance_id = "your-spanner-instance"
# database_id = "your-spanner-db-id"

spanner_client = spanner.Client()
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

row_ct = database.execute_partitioned_dml(
    "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
)

print("{} records updated.".format(row_ct))

Ruby

使用 execute_partitioned_update() 方法来执行分区 DML 语句。

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

row_count = client.execute_partition_update(
  "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
)

puts "#{row_count} records updated."

以下代码示例根据 SingerId 列从 Singers 表中删除行。

C++

void DmlPartitionedDelete(google::cloud::spanner::Client client) {
  namespace spanner = ::google::cloud::spanner;
  auto result = client.ExecutePartitionedDml(
      spanner::SqlStatement("DELETE FROM Singers WHERE SingerId > 10"));
  if (!result) throw std::runtime_error(result.status().message());
  std::cout << "Delete was successful [spanner_dml_partitioned_delete]\n";
}

C#

public static async Task DeleteUsingPartitionedDmlCoreAsync(
    string projectId,
    string instanceId,
    string databaseId)
{
    string connectionString =
        $"Data Source=projects/{projectId}/instances/{instanceId}"
        + $"/databases/{databaseId}";

    // Create connection to Cloud Spanner.
    using (var connection =
        new SpannerConnection(connectionString))
    {
        await connection.OpenAsync();

        SpannerCommand cmd = connection.CreateDmlCommand(
            "DELETE FROM Singers WHERE SingerId > 10"
        );
        long rowCount = await cmd.ExecutePartitionedUpdateAsync();
        Console.WriteLine($"{rowCount} row(s) deleted...");
    }
}

Go


import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/spanner"
)

func deleteUsingPartitionedDML(w io.Writer, db string) error {
	ctx := context.Background()
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		return err
	}
	defer client.Close()

	stmt := spanner.Statement{SQL: "DELETE FROM Singers WHERE SingerId > 10"}
	rowCount, err := client.PartitionedUpdate(ctx, stmt)
	if err != nil {
		return err

	}
	fmt.Fprintf(w, "%d record(s) deleted.", rowCount)
	return nil
}

Java

static void deleteUsingPartitionedDml(DatabaseClient dbClient) {
  String sql = "DELETE FROM Singers WHERE SingerId > 10";
  long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
  System.out.printf("%d records deleted.\n", rowCount);
}

Node.js

// Imports the Google Cloud client library
const {Spanner} = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

try {
  const [rowCount] = await database.runPartitionedUpdate({
    sql: 'DELETE FROM Singers WHERE SingerId > 10',
  });
  console.log(`Successfully deleted ${rowCount} records.`);
} catch (err) {
  console.error('ERROR:', err);
} finally {
  // Close the database when finished.
  database.close();
}

PHP

use Google\Cloud\Spanner\SpannerClient;

/**
 * Delete sample data in the database by partition with a DML statement.
 *
 * This updates the `MarketingBudget` column which must be created before
 * running this sample. You can add the column by running the `add_column`
 * sample or by running this DDL statement against your database:
 *
 *     ALTER TABLE Albums ADD COLUMN MarketingBudget INT64
 *
 * Example:
 * ```
 * update_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function delete_data_with_partitioned_dml($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $rowCount = $database->executePartitionedUpdate(
        "DELETE FROM Singers WHERE SingerId > 10"
    );

    printf('Deleted %d row(s).' . PHP_EOL, $rowCount);
}

Python

# instance_id = "your-spanner-instance"
# database_id = "your-spanner-db-id"
spanner_client = spanner.Client()
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

row_ct = database.execute_partitioned_dml("DELETE FROM Singers WHERE SingerId > 10")

print("{} record(s) deleted.".format(row_ct))

Ruby

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

row_count = client.execute_partition_update(
  "DELETE FROM Singers WHERE SingerId > 10"
)

puts "#{row_count} records deleted."

后续步骤