Migrating primary key strategies

This document describes strategies to migrate primary keys from your database tables to Spanner to avoid hotspotting issues. A hotspot is a concentration of operations on a single node, which lowers the write throughput to the node's capacity instead of benefiting from load-balancing all writes among the Spanner nodes. The use of monotonically increasing or decreasing columns as the first part of your primary key (examples include regular sequences or timestamps) is the most common cause of hot spots.

Overall strategies

In Spanner, every table that must store more than one row must have a primary key consisting of one or more columns of the table. Your table's primary key uniquely identifies each row in a table, and Spanner uses the primary key to sort the table rows. Because Spanner is highly distributed, you can use the following techniques to generate unique primary key values and reduce the risk of hot spots:

  • Use a hotspot-proof auto-generated key feature that Spanner supports (more details are described in the section Migrate auto-generated keys):
    • Use the GENERATE_UUID() function (GoogleSQL, PostgreSQL) to generate universally unique identifier (UUID Version 4) values with the STRING(36) data type. RFC 4122 defines the UUID Version 4 format.
    • Use a bit-reversed positive sequence (GoogleSQL, PostgreSQL). Such a sequence generates positive values with high order bits already reversed, so that they are evenly distributed across the positive 64-bit number space.
  • Swap the order of keys so that the column that contains the monotonically increasing or decreasing value is not the first key part.
  • Hash the unique key and spread the writes across logical shards by creating a column that contains the hash of the unique key, and then using the hash column (or the hash column and the unique key columns together) as the primary key. This approach helps avoid hot spots because new rows are spread more evenly across the keyspace.

After you designate your primary key for your table, you can't change it later without deleting and recreating the table. For more information on how to designate your primary key, see Schema and data model - primary keys.

The following is an example DDL statement that creates a table for a database of music tracks:

GoogleSQL

CREATE TABLE Singers (
  SingerId   INT64 NOT NULL,
  FirstName  STRING(1024),
  LastName   STRING(1024),
  SingerInfo BYTES(MAX),
  BirthDate  DATE,
) PRIMARY KEY(SingerId);

PostgreSQL

CREATE TABLE Singers (
  SingerId   bigint NOT NULL,
  FirstName  varchar(1024),
  LastName   varchar(1024),
  SingerInfo bytea,
  BirthDate  date,
  PRIMARY KEY(SingerId)
);

Migrate auto-generated keys

This section describes the strategies and examples for the following scenarios where the source table already uses an auto-generated key feature:

  • Migrating from a source table that uses a UUID primary key.
  • Migrating from a source table that uses a sequential auto-generated integer key. Examples include, but are not limited to, sequences (that various databases support), IDENTITY columns (that various databases support), PostgreSQL SERIAL data types, and the MySQL AUTO_INCREMENT column attribute.
  • Migrating from a source table that uses a bit-reversed key. Maybe the source database is Spanner, where you create key values using the guide bit-reverse sequential values.

It's important to note that in all strategies, Spanner doesn't change data it migrates from a source database. You are only changing the method to generate new data.

Migrate UUID key columns

If your source table is using a UUID column, you can convert the column to STRING type, make the values lowercase per the RFC 4122 specification, and use the GENERATE_UUID() function (GoogleSQL, PostgreSQL) as the column default value. For example:

GoogleSQL


CREATE TABLE UserAccessLog (
UserId     STRING(36) DEFAULT (GENERATE_UUID()),
...
) PRIMARY KEY (UserId);

PostgreSQL


CREATE TABLE UserAccessLog (
UserId     varchar(36) DEFAULT SPANNER.GENERATE_UUID(),
...
PRIMARY KEY (UserId)
);

Migrate sequential key columns

If your source database system generates sequential values for a key column, you can use a bit-reversed positive sequence (GoogleSQL, PostgreSQL) in your Spanner schema to generate values that distribute evenly across the positive 64-bit integer number space. To prevent the Spanner sequence from generating a value that overlaps with a migrated value, you can define a skipped range for it. For example, you can skip the range from 1 to 4,294,967,296 (2^32) for the following two sequences, if you know the source database only generates 32-bit integers:

GoogleSQL


CREATE SEQUENCE MyFirstSequence OPTIONS (
  sequence_kind = "bit_reversed_positive",
  skip_range_min = 1,
  skip_range_max = 4294967296
);

ALTER SEQUENCE MySecondSequence SET OPTIONS (
  skip_range_min = 1,
  skip_range_max = 4294967296
);

PostgreSQL


CREATE SEQUENCE MyFirstSequence BIT_REVERSED_POSITIVE 
  SKIP RANGE 1 4294967296;

ALTER SEQUENCE MySecondSequence SKIP RANGE 1 4294967296;

Migrate bit-reversed key columns

If you already bit-reversed your key values to avoid hot spot issues in your source database, you can also use a Spanner bit-reversed positive sequence (GoogleSQL, PostgreSQL) to continue generating such values. To avoid generating duplicate values, you can configure the sequence to start its counter from a custom number.

For example, if you reversed numbers from 1 to 1000 to generate primary key values, the Spanner sequence can start its counter from any number larger than 10,000. Optionally, you can choose a high number to leave a buffer for new writes that occur in the source database after data migration. In the following example, the counters start at 11,000:

GoogleSQL


CREATE SEQUENCE MyFirstSequence OPTIONS (
  sequence_kind = "bit_reversed_positive",
  start_with_counter = 11000
);

ALTER SEQUENCE MySecondSequence SET OPTIONS (
  start_with_counter = 11000
);

PostgreSQL


CREATE SEQUENCE MyFirstSequence BIT_REVERSED_POSITIVE
  START COUNTER 11000;

ALTER SEQUENCE MySecondSequence RESTART COUNTER 11000;