分区数据操纵语言

分区数据操纵语言(分区 DML)专为以下类型的批量更新和删除而设计:

  • 定期清理和垃圾回收。例如,删除旧行或将列设置为 NULL
  • 使用默认值回填新列。例如,使用 UPDATE 语句将新列当前的 NULL 值设置为False

DML 和分区 DML

Spanner 支持 DML 语句的两种执行模式:

  • DML,适用于事务处理。如需了解详情,请参阅使用 DML

  • 分区 DML 支持大规模数据库级操作,同时对键空间进行分区,并在较小范围的单独事务中对分区运行语句,因而对并发事务处理的影响微乎其微。如需了解详情,请参阅使用分区 DML

下表着重说明了两种执行模式之间的一些差异。

DML 分区 DML
WHERE 子句不匹配的行可能会加以锁定。 仅锁定与 WHERE 子句匹配的行。
适用事务大小限制。 Spanner 可处理事务限制和每个事务的并发限制。
语句不需要具有幂等性。 DML 语句必须具有幂等性,以保证一致结果。
一个事务可以包括多个 DML 和 SQL 语句。 一个分区事务只能包含一个 DML 语句。
对语句的复杂程度没有限制。 语句必须是完全可分区的
您可以在客户端代码中创建读写事务。 Spanner 创建事务。

可分区和幂等性

运行分区 DML 语句时,一个分区中的行无权访问其他分区中的行,并且您无法选择 Spanner 创建分区的方式。分区可确保可伸缩性,但这也意味着分区 DML 语句必须是“完全可分区的”。也就是说,分区 DML 语句必须可表示为一组语句的并集,其中每个语句访问表的一行,并且每个语句不访问其他表。例如,访问多个表或执行自连接的 DML 语句不可分区。如果 DML 语句不可分区,Spanner 会返回错误 BadUsage

这些 DML 语句是完全可分区的,因为每个语句都可以应用于表中的单一行:

UPDATE Singers SET LastName = NULL WHERE LastName = '';

DELETE FROM Albums WHERE MarketingBudget > 10000;

此 DML 语句不是完全可分区的,因为它访问多个表:

# Not fully partitionable
DELETE FROM Singers WHERE
SingerId NOT IN (SELECT SingerId FROM Concerts);

由于网络级重试,Spanner 可能会对某些分区多次执行分区 DML 语句。因此,语句可能会对一行多次执行。所以该语句必须具有幂等性才能产生一致的结果。如果对单一行多次执行语句会产生相同的结果,则该语句具有幂等性。

此 DML 语句具有幂等性:

UPDATE Singers SET MarketingBudget = 1000 WHERE true;

此 DML 语句不具有幂等性:

UPDATE Singers SET MarketingBudget = 1.5 * MarketingBudget WHERE true;

行锁定

仅当某行是更新或删除的候选行时,Spanner 才会获取锁定。此行为与 DML 执行不同,后者可能会对与 WHERE 子句不匹配的行进行读取锁定。

执行和事务

DML 语句是否已分区取决于您选择执行的客户端库方法。每个客户端库会为 DML 执行分区 DML 执行提供单独的方法。

在调用客户端库方法时,您只能执行一个分区 DML 语句。

Spanner 不会在整个表中以原子方式应用分区 DML 语句。但是,Spanner 会在每个分区上以原子方式应用分区 DML 语句。

分区 DML 不支持提交或回滚。Spanner 会立即执行并应用 DML 语句。

  • 如果取消操作,Spanner 会取消正在执行的分区,并且不会启动其余分区。Spanner 不会回滚已执行的任何分区。
  • 如果语句的执行导致错误,则执行会在所有分区上停止,并且 Spanner 会针对整个操作返回该错误。错误示例包括违反数据类型限制、违反 UNIQUE INDEX 和违反 ON DELETE NO ACTION。根据执行失败的时间点,语句可能已成功针对某些分区运行,并且可能永远不会针对其他分区运行。

如果分区 DML 语句成功,则 Spanner 至少对键范围的每个分区运行一次该语句。

已修改行的计数

分区 DML 语句会返回已修改行数的下限。此数字可能并不等于修改后的行数,因为无法保证 Spanner 会计算所有修改后的行。

事务限制

Spanner 会创建执行分区 DML 语句所需的分区和事务。存在事务限制或每个事务的并发限制,但 Spanner 会尝试将事务保持在限制范围内。

Spanner 允许每个数据库最多 20000 个并发分区 DML 语句。

不受支持的功能

Spanner 不支持分区 DML 的某些功能:

  • 不支持 INSERT
  • 控制台:您无法在控制台中执行分区 DML 语句。
  • 查询计划和剖析:Google Cloud CLI 和客户端库不支持查询计划和剖析。
  • 从另一个表或同一表的不同行读取的子查询。

对于复杂场景(例如移动表或需要跨表联接的转换),请考虑使用 Dataflow 连接器

示例

以下代码示例更新 Albums 表的 MarketingBudget 列。

C++

您可以使用 ExecutePartitionedDml() 函数执行分区 DML 语句。

void DmlPartitionedUpdate(google::cloud::spanner::Client client) {
  namespace spanner = ::google::cloud::spanner;
  auto result = client.ExecutePartitionedDml(
      spanner::SqlStatement("UPDATE Albums SET MarketingBudget = 100000"
                            " WHERE SingerId > 1"));
  if (!result) throw std::runtime_error(result.status().message());
  std::cout << "Update was successful [spanner_dml_partitioned_update]\n";
}

C#

使用 ExecutePartitionedUpdateAsync() 方法执行分区 DML 语句。


using Google.Cloud.Spanner.Data;
using System;
using System.Threading.Tasks;

public class UpdateUsingPartitionedDmlCoreAsyncSample
{
    public async Task<long> UpdateUsingPartitionedDmlCoreAsync(string projectId, string instanceId, string databaseId)
    {
        string connectionString = $"Data Source=projects/{projectId}/instances/{instanceId}/databases/{databaseId}";

        using var connection = new SpannerConnection(connectionString);
        await connection.OpenAsync();

        using var cmd = connection.CreateDmlCommand("UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1");
        long rowCount = await cmd.ExecutePartitionedUpdateAsync();

        Console.WriteLine($"{rowCount} row(s) updated...");
        return rowCount;
    }
}

Go

使用 PartitionedUpdate() 方法执行分区 DML 语句。


import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/spanner"
)

func updateUsingPartitionedDML(w io.Writer, db string) error {
	ctx := context.Background()
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		return err
	}
	defer client.Close()

	stmt := spanner.Statement{SQL: "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"}
	rowCount, err := client.PartitionedUpdate(ctx, stmt)
	if err != nil {
		return err
	}
	fmt.Fprintf(w, "%d record(s) updated.\n", rowCount)
	return nil
}

Java

使用 executePartitionedUpdate() 方法执行分区 DML 语句。

static void updateUsingPartitionedDml(DatabaseClient dbClient) {
  String sql = "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1";
  long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
  System.out.printf("%d records updated.\n", rowCount);
}

Node.js

使用 runPartitionedUpdate() 方法执行分区 DML 语句。

// Imports the Google Cloud client library
const {Spanner} = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

try {
  const [rowCount] = await database.runPartitionedUpdate({
    sql: 'UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1',
  });
  console.log(`Successfully updated ${rowCount} records.`);
} catch (err) {
  console.error('ERROR:', err);
} finally {
  // Close the database when finished.
  database.close();
}

PHP

使用 executePartitionedUpdate() 方法执行分区 DML 语句。

use Google\Cloud\Spanner\SpannerClient;

/**
 * Updates sample data in the database by partition with a DML statement.
 *
 * This updates the `MarketingBudget` column which must be created before
 * running this sample. You can add the column by running the `add_column`
 * sample or by running this DDL statement against your database:
 *
 *     ALTER TABLE Albums ADD COLUMN MarketingBudget INT64
 *
 * Example:
 * ```
 * update_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function update_data_with_partitioned_dml($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $rowCount = $database->executePartitionedUpdate(
        'UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1'
    );

    printf('Updated %d row(s).' . PHP_EOL, $rowCount);
}

Python

使用 execute_partitioned_dml() 方法执行分区 DML 语句。

# instance_id = "your-spanner-instance"
# database_id = "your-spanner-db-id"

spanner_client = spanner.Client()
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

row_ct = database.execute_partitioned_dml(
    "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
)

print("{} records updated.".format(row_ct))

Ruby

使用 execute_partitioned_update() 方法执行分区 DML 语句。

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

row_count = client.execute_partition_update(
  "UPDATE Albums SET MarketingBudget = 100000 WHERE SingerId > 1"
)

puts "#{row_count} records updated."

以下代码示例根据 SingerId 列从 Singers 表中删除行。

C++

void DmlPartitionedDelete(google::cloud::spanner::Client client) {
  namespace spanner = ::google::cloud::spanner;
  auto result = client.ExecutePartitionedDml(
      spanner::SqlStatement("DELETE FROM Singers WHERE SingerId > 10"));
  if (!result) throw std::runtime_error(result.status().message());
  std::cout << "Delete was successful [spanner_dml_partitioned_delete]\n";
}

C#


using Google.Cloud.Spanner.Data;
using System;
using System.Threading.Tasks;

public class DeleteUsingPartitionedDmlCoreAsyncSample
{
    public async Task<long> DeleteUsingPartitionedDmlCoreAsync(string projectId, string instanceId, string databaseId)
    {
        string connectionString = $"Data Source=projects/{projectId}/instances/{instanceId}/databases/{databaseId}";

        using var connection = new SpannerConnection(connectionString);
        await connection.OpenAsync();

        using var cmd = connection.CreateDmlCommand("DELETE FROM Singers WHERE SingerId > 10");
        long rowCount = await cmd.ExecutePartitionedUpdateAsync();

        Console.WriteLine($"{rowCount} row(s) deleted...");
        return rowCount;
    }
}

Go


import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/spanner"
)

func deleteUsingPartitionedDML(w io.Writer, db string) error {
	ctx := context.Background()
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		return err
	}
	defer client.Close()

	stmt := spanner.Statement{SQL: "DELETE FROM Singers WHERE SingerId > 10"}
	rowCount, err := client.PartitionedUpdate(ctx, stmt)
	if err != nil {
		return err

	}
	fmt.Fprintf(w, "%d record(s) deleted.", rowCount)
	return nil
}

Java

static void deleteUsingPartitionedDml(DatabaseClient dbClient) {
  String sql = "DELETE FROM Singers WHERE SingerId > 10";
  long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));
  System.out.printf("%d records deleted.\n", rowCount);
}

Node.js

// Imports the Google Cloud client library
const {Spanner} = require('@google-cloud/spanner');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = 'my-project-id';
// const instanceId = 'my-instance';
// const databaseId = 'my-database';

// Creates a client
const spanner = new Spanner({
  projectId: projectId,
});

// Gets a reference to a Cloud Spanner instance and database
const instance = spanner.instance(instanceId);
const database = instance.database(databaseId);

try {
  const [rowCount] = await database.runPartitionedUpdate({
    sql: 'DELETE FROM Singers WHERE SingerId > 10',
  });
  console.log(`Successfully deleted ${rowCount} records.`);
} catch (err) {
  console.error('ERROR:', err);
} finally {
  // Close the database when finished.
  database.close();
}

PHP

use Google\Cloud\Spanner\SpannerClient;

/**
 * Delete sample data in the database by partition with a DML statement.
 *
 * This updates the `MarketingBudget` column which must be created before
 * running this sample. You can add the column by running the `add_column`
 * sample or by running this DDL statement against your database:
 *
 *     ALTER TABLE Albums ADD COLUMN MarketingBudget INT64
 *
 * Example:
 * ```
 * update_data($instanceId, $databaseId);
 * ```
 *
 * @param string $instanceId The Spanner instance ID.
 * @param string $databaseId The Spanner database ID.
 */
function delete_data_with_partitioned_dml($instanceId, $databaseId)
{
    $spanner = new SpannerClient();
    $instance = $spanner->instance($instanceId);
    $database = $instance->database($databaseId);

    $rowCount = $database->executePartitionedUpdate(
        'DELETE FROM Singers WHERE SingerId > 10'
    );

    printf('Deleted %d row(s).' . PHP_EOL, $rowCount);
}

Python

# instance_id = "your-spanner-instance"
# database_id = "your-spanner-db-id"
spanner_client = spanner.Client()
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)

row_ct = database.execute_partitioned_dml("DELETE FROM Singers WHERE SingerId > 10")

print("{} record(s) deleted.".format(row_ct))

Ruby

# project_id  = "Your Google Cloud project ID"
# instance_id = "Your Spanner instance ID"
# database_id = "Your Spanner database ID"

require "google/cloud/spanner"

spanner = Google::Cloud::Spanner.new project: project_id
client  = spanner.client instance_id, database_id

row_count = client.execute_partition_update(
  "DELETE FROM Singers WHERE SingerId > 10"
)

puts "#{row_count} records deleted."

后续步骤