Migrating from PostgreSQL to Cloud Spanner

This page provides guidance on migrating a PostgreSQL database to Cloud Spanner. It describes several aspects of a PostgreSQL to Cloud Spanner migration:

  • Mapping a PostgreSQL schema to a Cloud Spanner schema.
  • Creating a Cloud Spanner instance, database, and schema.
  • Refactoring the application to work with your Cloud Spanner database.
  • Migrating your data.
  • Verifying the new system and moving it to production status.

This page also provides some example schemas using tables from the MusicBrainz PostgreSQL database.

Mapping your PostgreSQL schema to Cloud Spanner

Your first step in moving a database from PostgreSQL to Cloud Spanner is to determine what schema changes you must make. Use pg_dump to create Data Definition Language (DDL) statements that define the objects in your PostgreSQL database, and then modify the statements as described in the following sections. After you update the DLL statements, use the DDL statements to create your database in a Cloud Spanner instance.

Data types

The following table describes how PostgreSQL data types map to Cloud Spanner data types. Update the data types in your DDL statements from PostgreSQL data types to Cloud Spanner data types.

PostgreSQL Cloud Spanner
Bigint

int8

INT64
Bigserial

serial8

INT64
bit [ (n) ] ARRAY<BOOL>
bit varying [ (n) ]

varbit [ (n) ]

ARRAY<BOOL>
Boolean

bool

BOOL
box ARRAY<FLOAT64>
bytea BYTES
character [ (n) ]

char [ (n) ]

STRING
character varying [ (n) ]

varchar [ (n) ]

STRING
cidr STRING, using standard CIDR notation.
circle ARRAY<FLOAT64>
date DATE
double precision

float8

FLOAT64
inet STRING
Integer

int

int4

INT64
interval[ fields ] [ (p) ] INT64 if storing the value in milliseconds, or STRING if storing the value in an application-defined interval format.
json STRING
jsonb BYTES
line ARRAY<FLOAT64>
lseg ARRAY<FLOAT64>
macaddr STRING, using standard MAC address notation.
money INT64, or STRING for arbitrary precision numbers.
numeric [ (p, s) ]

decimal [ (p, s) ]

INT64, or STRING for arbitrary precision numbers.
path ARRAY<FLOAT64>
pg_lsn This data type is PostgreSQL-specific, so there isn't a Cloud Spanner equivalent.
point ARRAY<FLOAT64>
polygon ARRAY<FLOAT64>
Real

float4

FLOAT64
Smallint

int2

INT64
Smallserial

serial2

INT64
Serial

serial4

INT64
text STRING
time [ (p) ] without time zone STRING, using HH:MM:SS.sss notation.
time [ (p) ] with time zone

timetz

STRING, using HH:MM:SS.sss+ZZZZ notation.
timestamp [ (p) ] without time zone      TIMESTAMP
timestamp [ (p) ] with time zone

timestamptz

STRING, using a standard format like ISO 8601. Alternatively, you can store the time in a TIMESTAMP column and then store the time zone in a separate column.
tsquery No equivalent. Define a storage mechanism in your application instead.
tsvector No equivalent. Define a storage mechanism in your application instead.
txid_snapshot No equivalent. Define a storage mechanism in your application instead.
uuid STRING or BYTES
xml STRING

Primary keys

For tables in your Cloud Spanner database that you frequently append to, avoid using primary keys that monotonically increase or decrease, as this approach causes hotspots during writes. Instead, modify the DDL CREATE TABLE statements so that they use supported primary key strategies. Careful schema design is important, because you can't add or remove a primary key column after you create a table.

During migration, you might need to keep some existing monotonically increasing integer keys. If you need to keep these kinds of keys on a frequently updated table with a lot of operations on these keys, you can avoid creating hotspots by prefixing the existing key with a pseudo-random number. This technique causes Cloud Spanner to redistribute the rows. See What DBAs need to know about Cloud Spanner, part 1: Keys and indexes for more information on using this approach.

Foreign keys and referential integrity

Cloud Spanner doesn't have foreign key constraints or triggers. If you rely on these features, you must move this functionality to your application.

When there is a parent-child relationship between tables, and you want the records in those tables co-located for faster access, you can create interleaved tables. When you use an interleaved table, you can choose to enforce referential integrity to delete rows in the child table when the related row in the parent table is deleted. You can't delete the parent row if there are child rows and you used the ON DELETE NO ACTION clause. You also can't add a child row if the parent row doesn't exist.

To find the foreign keys on your PostgreSQL tables, query the information_schema.table_constraints view using a WHERE constraint_type = 'FOREIGN KEY' clause.

Update the CREATE TABLE statements so that they create interleaved tables as appropriate.

Indexes

PostgreSQL b-tree indexes are similar to secondary indexes in Cloud Spanner. In a Cloud Spanner database you use secondary indexes to index commonly searched columns for better performance, and to replace any unique constraints specified in your tables. For example, if your PostgreSQL DDL has this statement:

 CREATE TABLE customer (
    id CHAR (5) PRIMARY KEY,
    first_name VARCHAR (50),
    last_name VARCHAR (50),
    email VARCHAR (50) UNIQUE
 );

You would use this statement in your Cloud Spanner DDL:

CREATE TABLE customer (
   id STRING(5),
   first_name STRING(50),
   last_name STRING(50),
   email STRING(50)
   ) PRIMARY KEY (id);

CREATE UNIQUE INDEX customer_emails ON customer(email);

You can find the indexes for any of your PostgreSQL tables by running the \di meta-command in psql.

After you determine the indexes that you need, add CREATE INDEX statements to create them. Follow the guidance at Creating indexes.

Cloud Spanner implements indexes as tables, so indexing monotonically increasing columns (like those containing TIMESTAMP data) can cause a hotspot. See What DBAs need to know about Cloud Spanner, part 1: Keys and indexes for more information on methods to avoid hotspots.

Other database objects

You must create the functionality of the following objects in your application logic:

  • Views
  • Triggers
  • Stored procedures
  • User-defined functions (UDFs)
  • Columns that use serial data types as sequence generators

Keep the following tips in mind when migrating this functionality into application logic:

Creating your Cloud Spanner instance

After you update your DDL statements to conform to Cloud Spanner schema requirements, use it to create your database in Cloud Spanner.

  1. Create a Cloud Spanner instance. Follow the guidance in Instances to determine the correct regional configuration and number of nodes to support your performance goals.

  2. Create the database by using either the Google Cloud Platform Console or the gcloud command-line tool:

Console

  1. Go to the instances page
  2. Click on the name of the instance that you want to create the example database in to open the Instance details page.
  3. Click Create Database.
  4. Type a name for the database and click Continue.
  5. In the Define your database schema section, toggle the Edit as text control.
  6. Copy and paste your DDL statements into the DDL statements field.
  7. Click Create.

gcloud

  1. Install the gcloud tool.
  2. Use the gcloud spanner databases create command to create the database:
    gcloud spanner databases create DATABASE_NAME --instance=INSTANCE_NAME
    --ddl='DDL1' --ddl='DDL2'
    
  • DATABASE_NAME is the name of your database.
  • INSTANCE_NAME is the Cloud Spanner instance that you created.
  • DDLn are your modified DDL statements.

After you create the database, follow the instructions in Applying IAM Roles to create user accounts and grant permissions to the Cloud Spanner instance and database.

Refactoring the applications and data access layers

In addition to the code needed to replace the preceding database objects, you must add application logic to handle the following functionality:

  • Hashing primary keys for writes, for tables that have high write rates to sequential keys.
  • Validating data, to replace constraints that you could not migrate from the PostgreSQL schema.
  • Referential integrity checks not already covered by table interleaving or application logic, including functionality handled by triggers in the PostgreSQL schema.

We recommend using the following process when refactoring:

  1. Find all of your application code that accesses the database, and refactor it into a single module or library. That way, you know exactly what code accesses to the database, and therefore exactly what code needs to be modified.
  2. Write code that performs reads and writes on the Cloud Spanner instance, providing parallel functionality to the original code that reads and writes to PostgreSQL. During writes, update the entire row, not just the columns that have been changed, to ensure that the data in Cloud Spanner is identical to that in PostgreSQL.
  3. Write code that replaces the functionality of the database objects and functions that aren't available in Cloud Spanner.

Migrating data

After you create your Cloud Spanner database and refactor your application code, you can migrate your data to Cloud Spanner.

  1. Use the PostgreSQL COPY command to dump data to .csv files.
  2. Upload the .csv files to Cloud Storage.

    1. Create a Cloud Storage bucket.
    2. In the Cloud Storage console, click on the bucket name to open the bucket browser.
    3. Click Upload Files.
    4. Navigate to the directory containing the .csv files and select them.
    5. Click Open.
  3. Create an application to import data into Cloud Spanner. This application could use Cloud Dataflow or it could use the client libraries directly. Make sure to follow the guidance in Bulk data loading best practices to get the best performance.

Testing

Test all application functions against the Cloud Spanner instance to verify that they work as expected. Run production-level workloads to ensure the performance meets your needs. Update the number of nodes as needed to meet your performance goals.

Moving to the new system

After you complete the initial application testing, turn up the new system using one of the following processes. Offline migration is the simplest way to migrate. However, this approach makes your application unavailable for a period of time, and it provides no rollback path if you find data issues later on. To perform an offline migration:

  1. Delete all the data in the Cloud Spanner database.
  2. Shut down the application that targets the PostgreSQL database.
  3. Export all data from the PostgreSQL database and import it into the Cloud Spanner database as described in Migrating data.
  4. Start up the application that targets the Cloud Spanner database.

    Offline migration dataflow.

Live migration is possible and requires extensive changes to your application to support the migration.

Schema migration examples

These examples show the CREATE TABLE statements for several tables in the MusicBrainz PostgreSQL database schema. Each example includes both the PostgreSQL schema and the Cloud Spanner schema.

artist_credit table

PostgreSQL version:

CREATE TABLE artist_credit (
  id SERIAL,
  name VARCHAR NOT NULL,
  artist_count SMALLINT NOT NULL,
  ref_count INTEGER DEFAULT 0,
  created TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Cloud Spanner version:

CREATE TABLE artist_credit (
  hashed_id STRING(4),
  id INT64,
  name STRING(MAX) NOT NULL,
  artist_count INT64 NOT NULL,
  ref_count INT64,
  created TIMESTAMP OPTIONS (
     allow_commit_timestamp = true
  ),
) PRIMARY KEY(hashed_id, id);

recording table

PostgreSQL version:

CREATE TABLE recording (
  id SERIAL,
  gid UUID NOT NULL,
  name VARCHAR NOT NULL,
  artist_credit INTEGER NOT NULL, -- references artist_credit.id
  length INTEGER CHECK (length IS NULL OR length > 0),
  comment VARCHAR(255) NOT NULL DEFAULT '',
  edits_pending INTEGER NOT NULL DEFAULT 0 CHECK (edits_pending >= 0),
  last_updated TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  video BOOLEAN NOT NULL DEFAULT FALSE
);

Cloud Spanner version:

CREATE TABLE recording (
  hashed_id STRING(36),
  id INT64,
  gid STRING(36) NOT NULL,
  name STRING(MAX) NOT NULL,
  artist_credit_hid STRING(36) NOT NULL,
  artist_credit_id INT64 NOT NULL,
  length INT64,
  comment STRING(255) NOT NULL,
  edits_pending INT64 NOT NULL,
  last_updated TIMESTAMP OPTIONS (
     allow_commit_timestamp = true
  ),
  video BOOL NOT NULL,
) PRIMARY KEY(hashed_id, id);

recording-alias table

PostgreSQL version:

CREATE TABLE recording_alias (
  id SERIAL, --PK
  recording INTEGER NOT NULL, -- references recording.id
  name VARCHAR NOT NULL,
  locale TEXT,
  edits_pending INTEGER NOT NULL DEFAULT 0 CHECK (edits_pending >=0),
  last_updated TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
  type INTEGER, -- references recording_alias_type.id
  sort_name VARCHAR NOT NULL,
  begin_date_year SMALLINT,
  begin_date_month SMALLINT,
  begin_date_day SMALLINT,
  end_date_year SMALLINT,
  end_date_month SMALLINT,
  end_date_day SMALLINT,
  primary_for_locale BOOLEAN NOT NULL DEFAULT false,
  ended BOOLEAN NOT NULL DEFAULT FALSE
  -- CHECK constraint skipped for brevity
);

Cloud Spanner version:

CREATE TABLE recording_alias (
  hashed_id STRING(36)  NOT NULL,
  id INT64  NOT NULL,
  alias_id INT64,
  name STRING(MAX)  NOT NULL,
  locale STRING(MAX),
  edits_pending INT64  NOT NULL,
  last_updated TIMESTAMP NOT NULL OPTIONS (
     allow_commit_timestamp = true
  ),
  type INT64,
  sort_name STRING(MAX)  NOT NULL,
  begin_date_year INT64,
  begin_date_month INT64,
  begin_date_day INT64,
  end_date_year INT64,
  end_date_month INT64,
  end_date_day INT64,
  primary_for_locale BOOL NOT NULL,
  ended BOOL NOT NULL,
) PRIMARY KEY(hashed_id, id, alias_id),
 INTERLEAVE IN PARENT recording ON DELETE NO ACTION;
Оцените, насколько информация на этой странице была вам полезна:

Оставить отзыв о...

Текущей странице
Cloud Spanner Documentation