This document describes strategies to migrate primary keys from your database tables to Spanner to avoid hotspotting issues. A hotspot is a concentration of operations on a single node, which lowers the write throughput to the node's capacity instead of benefiting from load-balancing all writes among the Spanner nodes. The use of monotonically increasing or decreasing columns as the first part of your primary key (examples include regular sequences or timestamps) is the most common cause of hot spots.
Overall strategies
In Spanner, every table that must store more than one row must have a primary key consisting of one or more columns of the table. Your table's primary key uniquely identifies each row in a table, and Spanner uses the primary key to sort the table rows. Because Spanner is highly distributed, you can use the following techniques to generate unique primary key values and reduce the risk of hot spots:
- Use a hotspot-proof auto-generated key feature that Spanner
supports (more details are described in the section Migrate auto-generated
keys):
- Use the
GENERATE_UUID()
function (GoogleSQL, PostgreSQL) to generate universally unique identifier (UUID Version 4) values with theSTRING(36)
data type. RFC 4122 defines the UUID Version 4 format. - Use a bit-reversed positive sequence (GoogleSQL, PostgreSQL). Such a sequence generates positive values with high order bits already reversed, so that they are evenly distributed across the positive 64-bit number space.
- Use the
- Swap the order of keys so that the column that contains the monotonically increasing or decreasing value is not the first key part.
- Hash the unique key and spread the writes across logical shards by creating a column that contains the hash of the unique key, and then using the hash column (or the hash column and the unique key columns together) as the primary key. This approach helps avoid hot spots because new rows are spread more evenly across the keyspace.
After you designate your primary key for your table, you can't change it later without deleting and recreating the table. For more information on how to designate your primary key, see Schema and data model - primary keys.
The following is an example DDL statement that creates a table for a database of music tracks:
GoogleSQL
CREATE TABLE Singers ( SingerId INT64 NOT NULL, FirstName STRING(1024), LastName STRING(1024), SingerInfo BYTES(MAX), BirthDate DATE, ) PRIMARY KEY(SingerId);
PostgreSQL
CREATE TABLE Singers ( SingerId bigint NOT NULL, FirstName varchar(1024), LastName varchar(1024), SingerInfo bytea, BirthDate date, PRIMARY KEY(SingerId) );
Migrate auto-generated keys
This section describes the strategies and examples for the following scenarios where the source table already uses an auto-generated key feature:
- Migrating from a source table that uses a UUID primary key.
- Migrating from a source table that uses a sequential auto-generated integer
key. Examples include, but are not limited to, sequences (that various
databases support),
IDENTITY
columns (that various databases support), PostgreSQLSERIAL
data types, and the MySQLAUTO_INCREMENT
column attribute. - Migrating from a source table that uses a bit-reversed key. Maybe the source database is Spanner, where you create key values using the guide bit-reverse sequential values.
It's important to note that in all strategies, Spanner doesn't change data it migrates from a source database. You are only changing the method to generate new data.
Migrate UUID key columns
If your source table is using a UUID column, you can convert the column to
STRING type, make the values lowercase per the
RFC 4122 specification, and use the
GENERATE_UUID()
function (GoogleSQL,
PostgreSQL) as the column default value. For
example:
GoogleSQL
CREATE TABLE UserAccessLog ( UserId STRING(36) DEFAULT (GENERATE_UUID()), ... ) PRIMARY KEY (UserId);
PostgreSQL
CREATE TABLE UserAccessLog ( UserId varchar(36) DEFAULT SPANNER.GENERATE_UUID(), ... PRIMARY KEY (UserId) );
Migrate sequential key columns
If your source database system generates sequential values for a key column, you can use a bit-reversed positive sequence (GoogleSQL, PostgreSQL) in your Spanner schema to generate values that distribute evenly across the positive 64-bit integer number space. To prevent the Spanner sequence from generating a value that overlaps with a migrated value, you can define a skipped range for it. For example, you can skip the range from 1 to 4,294,967,296 (2^32) for the following two sequences, if you know the source database only generates 32-bit integers:
GoogleSQL
CREATE SEQUENCE MyFirstSequence OPTIONS ( sequence_kind = "bit_reversed_positive", skip_range_min = 1, skip_range_max = 4294967296 ); ALTER SEQUENCE MySecondSequence SET OPTIONS ( skip_range_min = 1, skip_range_max = 4294967296 );
PostgreSQL
CREATE SEQUENCE MyFirstSequence BIT_REVERSED_POSITIVE SKIP RANGE 1 4294967296; ALTER SEQUENCE MySecondSequence SKIP RANGE 1 4294967296;
Migrate bit-reversed key columns
If you already bit-reversed your key values to avoid hot spot issues in your source database, you can also use a Spanner bit-reversed positive sequence (GoogleSQL, PostgreSQL) to continue generating such values. To avoid generating duplicate values, you can configure the sequence to start its counter from a custom number.
For example, if you reversed numbers from 1 to 1000 to generate primary key values, the Spanner sequence can start its counter from any number larger than 10,000. Optionally, you can choose a high number to leave a buffer for new writes that occur in the source database after data migration. In the following example, the counters start at 11,000:
GoogleSQL
CREATE SEQUENCE MyFirstSequence OPTIONS ( sequence_kind = "bit_reversed_positive", start_with_counter = 11000 ); ALTER SEQUENCE MySecondSequence SET OPTIONS ( start_with_counter = 11000 );
PostgreSQL
CREATE SEQUENCE MyFirstSequence BIT_REVERSED_POSITIVE START COUNTER 11000; ALTER SEQUENCE MySecondSequence RESTART COUNTER 11000;
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-07-09 UTC.