This document offers instructions to migrate primary keys from your source database tables to Spanner. You must be familiar with the information available in Primary key migration overview.
Before you begin
-
To get the permissions that you need to migrate primary keys to Spanner, ask your administrator to grant you the Cloud Spanner Database Admin (
roles/spanner.databaseAdmin
) IAM role on instance.
Migrate auto-generated sequential keys
If you are migrating from a database that uses sequential monotonic keys,
such as AUTO_INCREMENT
in MySQL, SERIAL
in PostgreSQL, or the standard IDENTITY
type in SQL Server or Oracle,
consider the following high-level migration strategy:
- In Spanner, replicate the table structure from your source database, using an integer primary key.
- For each column in Spanner that contains sequential values,
create a sequence and assign the
GET_NEXT_SEQUENCE_VALUE
(GoogleSQL, PostgreSQL) function as the default value for the column. - Migrate existing data with original keys from the source database into Spanner. Consider using the Spanner migration tool or a Dataflow template.
- Optionally, you can establish foreign key constraints for any dependent tables.
- Before you insert new data, adjust the Spanner sequence to skip the range of existing key values.
- Insert new data, allowing the sequence to generate unique keys automatically.
Sample migration workflow
The following code defines the table structure and related sequence in
Spanner using a
SEQUENCE
object and sets the object
as the default primary value of the destination table:
GoogleSQL
CREATE SEQUENCE singer_id_sequence OPTIONS ( SequenceKind = 'bit_reversed_positive' ); CREATE TABLE Singers ( SingerId INT64 DEFAULT (GET_NEXT_SEQUENCE_VALUE(SEQUENCE SingerIdSequence)), Name STRING(1024), Biography STRING(MAX), ) PRIMARY KEY (SingerId); CREATE TABLE Albums ( AlbumId INT64, SingerId INT64, AlbumName STRING(1024), SongList STRING(MAX), CONSTRAINT FK_singer_album FOREIGN KEY (SingerId) REFERENCES Singers (SingerId) ) PRIMARY KEY (AlbumId);
PostgreSQL
CREATE SEQUENCE SingerIdSequence BIT_REVERSED_POSITIVE; CREATE TABLE Singers ( SingerId BIGINT DEFAULT nextval('SingerIdSequence') PRIMARY KEY, Name VARCHAR(1024) NOT NULL, Biography TEXT ); CREATE TABLE Albums ( AlbumId BIGINT PRIMARY KEY, SingerId BIGINT, AlbumName VARCHAR(1024), SongList TEXT, CONSTRAINT FK_singer_album FOREIGN KEY (SingerId) REFERENCES Singers (SingerId) );
The bit_reversed_positive
option indicates that the values generated by the
sequence are of type INT64
, are greater than zero, and are not sequential.
As you migrate existing rows from your source database to Spanner, the primary keys remain unchanged.
For new inserts that don't specify a primary key, Spanner
automatically retrieves a new value by calling
the GET_NEXT_SEQUENCE_VALUE()
(GoogleSQL
or PostgreSQL)
function.
These values are uniformly distributed across the range [1, 263]
and there
could be possible collisions with the existing keys. To prevent this, you can
configure the sequence using ALTER_SEQUENCE
(GoogleSQL or PostgreSQL) to skip the range of
values covered by the existing keys.
Assume the singers
table was migrated from PostgreSQL, where its primary key
singer_id
is of SERIAL
type. The following PostgreSQL shows your source database DDL:
PostgreSQL
CREATE TABLE Singers ( SingerId SERIAL PRIMARY KEY, Name varchar(1024), Biography varchar );
The primary key values are monotonically increasing. After the migration,
you can retrieve the maximum value of the primary key singer_id
on Spanner.
Use the following code in Spanner:
GoogleSQL
SELECT MAX(SingerId) FROM Singers;
PostgreSQL
SELECT MAX(SingerId) FROM Singers;
Assume the returned value is 20,000. You can configure the Spanner
sequence to skip the range [1, 21000]
. The additional 1,000 serves as a
buffer to accommodate writes to the source database after the initial migration.
New keys generated in Spanner don't conflict with the range of
primary keys generated in the source PostgreSQL database.
Use the following code in Spanner:
GoogleSQL
ALTER SEQUENCE SingerIdSequence SET OPTIONS ( skip_range_min = 1, skip_range_max = 21000 );
PostgreSQL
ALTER SEQUENCE SingerIdSequence SKIP RANGE 1 21000;
Use Spanner and your source database
You can use the skip range concept to support scenarios where either Spanner or your source database generates primary keys, for example to enable replication in either direction for disaster recovery during a migration cutover.
To support this, both databases generate primary keys and the data is synchronized between them. You can configure each database to create primary keys in non-overlapping key ranges. When you define a range for your source database, you can configure Spanner sequence to skip over that range.
For example, after the migration of the music tracks application, replicate the data from PostgreSQL to Spanner to reduce the time taken to cut over.
Once you've updated and tested the application on Spanner, you can stop using your source PostgreSQL database and use Spanner, making it the system of record for updates and new primary keys. Once Spanner takes over, you can reverse the flow of data between the databases into the PostgreSQL instance.
Assume your source PostgreSQL database uses SERIAL
primary keys, which are
32-bit signed integers. Spanner primary keys are larger 64-bit
numbers. In PostgreSQL, alter the primary key column to be a 64-bit column,
or bigint
. Use the following code on your source PostgreSQL database:
PostgreSQL
ALTER TABLE Singers ALTER COLUMN SingerId TYPE bigint;
You can set a CHECK
constraint to the table in the source PostgreSQL database
to ensure the values of the SingerId
primary key are always smaller than or
equal to 231-1
.
Use the following code on your source PostgreSQL database:
PostgreSQL
ALTER TABLE Singers ADD CHECK (SingerId <= 2147483647);
In Spanner, we can alter the sequence to skip the [1, 231-1]
range.
Use the following code in Spanner:
GoogleSQL
ALTER SEQUENCE SingerIdSequence SET OPTIONS (
skip_range_min = 1,
skip_range_max = 2147483647 -- 231-1
);
PostgreSQL
ALTER SEQUENCE SingerIdSequence SKIP RANGE 1 2147483648;
Your source PostgreSQL database always generates keys in the 32-bit integer space, while Spanner keys are restricted to the 64-bit integer space, larger than all of the 32-bit integer values. This ensures both your databases can independently generate primary keys that don't conflict.
Migrate UUID key columns
UUIDv4 keys are effectively unique regardless of where they are generated. UUID keys generated elsewhere integrate with new UUID keys generated in Spanner.
Consider the following high-level strategy to migrate UUID keys to Spanner:
- Define your UUID keys in Spanner using string columns with a
default expression. Use the
GENERATE_UUID()
function (GoogleSQL, PostgreSQL). - Export the data from the source system, serializing the UUID keys as strings.
- Import the primary keys into Spanner.
- Optional: Enable foreign keys.
Here's a sample migration workflow:
In Spanner, define a UUID primary key column as a STRING
or
TEXT
type and assign GENERATE_UUID()
(GoogleSQL or PostgreSQL) as its default value. Migrate all the
data from your source database to Spanner. After migration, as
new rows are inserted, Spanner calls GENERATE_UUID()
to
generate new UUID values for the primary keys. For example, the primary key
FanClubId
gets a UUIDv4 value when a new row is inserted in the table FanClubs
.
Use the following code in Spanner:
GoogleSQL
CREATE TABLE Fanclubs ( FanClubId STRING(36) DEFAULT (GENERATE_UUID()), ClubName STRING(1024), ) PRIMARY KEY (FanClubId); INSERT INTO FanClubs (ClubName) VALUES ("SwiftFanClub");
PostgreSQL
CREATE TABLE FanClubs ( FanClubId TEXT DEFAULT spanner.generate_uuid() PRIMARY KEY, ClubName VARCHAR(1024) ); INSERT INTO FanClubs (ClubName) VALUES ('SwiftFanClub');
Migrate your own primary keys
Your application may rely on the primary key order to determine how recent the data is or to sequence newly created data. To use externally generated sequential keys in Spanner, you can create a composite key that combines a uniformly distributed value, such as a hash, as the first component and your sequential key as the second component. This way, you can preserve the sequential key values, without creating hot spots at scale. Consider the following migration workflow:
Assume you need to migrate a MySQL table students
with an
AUTO_INCREMENT
primary key to Spanner.
Use the following code in your source MySQL database:
MySQL
CREATE TABLE Students ( StudentId INT NOT NULL AUTO_INCREMENT, Info VARCHAR(2048), PRIMARY KEY (StudentId) );
In Spanner, you can add a generated column StudentIdHash
by creating a hash of
the StudentId
column.
For example:
StudentIdHash = FARM_FINGERPRINT(CAST(StudentId AS STRING))
You can use the following code in Spanner:
GoogleSQL
CREATE TABLE student ( StudentIdHash INT64 AS (FARM_FINGERPRINT(cast(StudentId as string))) STORED, StudentId INT64 NOT NULL, Info STRING(2048), ) PRIMARY KEY(StudentIdHash, StudentId);
PostgreSQL
CREATE TABLE Student ( StudentIdHash bigint GENERATED ALWAYS AS (FARM_FINGERPRINT(cast(StudentId AS varchar))) STORED, StudentId bigint NOT NULL, Info varchar(2048), PRIMARY KEY (StudentIdHash, StudentId) );