Migrate from PostgreSQL to Spanner (PostgreSQL dialect)

This page explains how to migrate an open source PostgreSQL database (from now on referred to as just PostgreSQL) to a Spanner PostgreSQL-dialect database (from now on referred to as Spanner).

For information about migrating to Spanner and the GoogleSQL dialect, see Migrating from PostgreSQL to Spanner (GoogleSQL dialect).

Migration constraints

Spanner uses certain concepts differently from other enterprise database management tools, so you might need to adjust your application's architecture to take full advantage of its capabilities. You might also need to supplement Spanner with other services from Google Cloud to meet your needs.

Stored procedures and triggers

Spanner does not support running user code in the database level, so as part of the migration, business logic implemented by database-level stored procedures and triggers must be moved into the application.

Sequences

Spanner recommends using UUID Version 4 as the default method to generate primary key values. The GENERATE_UUID() function (GoogleSQL, PostgreSQL) returns UUID Version 4 values represented as STRING type.

If you need to generate integer values, Spanner supports bit-reversed positive sequences (GoogleSQL, PostgreSQL), which produce values that distribute evenly across the positive 64-bit number space. You can use these numbers to avoid hotspotting issues.

For more information, see primary key default value strategies.

Access controls

Spanner supports fine-grained access control at the table and column level. Fine-grained access control for views is not supported. For more information, see About fine-grained access control.

Migration process

Migration involves the following tasks:

  • Mapping a PostgreSQL schema to Spanner.
  • Translating SQL queries.
  • Creating a Spanner instance, database, and schema.
  • Refactoring the application to work with your Spanner database.
  • Migrating your data.
  • Verifying the new system and moving it to production status.

Step 1: Map your PostgreSQL schema to Spanner

Your first step in moving a database from open source PostgreSQL to Spanner is to determine what schema changes you must make.

Primary keys

In Spanner, every table that must store more than one row must have a primary key consisting of one or more columns of the table. Your table's primary key uniquely identifies each row in a table, and Spanner uses the primary key to sort the table rows. Because Spanner is highly distributed, it's important that you choose a primary key generation technique that scales well with your data growth. For more information, see the primary key migration strategies that we recommend.

Note that after you designate your primary key, you can't add or remove a primary key column, or change a primary key value later without deleting and recreating the table. For more information on how to designate your primary key, see Schema and data model - primary keys.

Indexes

PostgreSQL b-tree indexes are similar to secondary indexes in Spanner. In a Spanner database you use secondary indexes to index commonly searched columns for better performance, and to replace any UNIQUE constraints specified in your tables. For example, if your PostgreSQL DDL has this statement:

 CREATE TABLE customer (
    id CHAR (5) PRIMARY KEY,
    first_name VARCHAR (50),
    last_name VARCHAR (50),
    email VARCHAR (50) UNIQUE
 );

You would use this statement in your Spanner DDL:

CREATE TABLE customer (
   id VARCHAR(5) PRIMARY KEY,
   first_name VARCHAR(50),
   last_name VARCHAR(50),
   email VARCHAR(50)
   );

CREATE UNIQUE INDEX customer_emails ON customer(email);

You can find the indexes for any of your PostgreSQL tables by running the \di meta-command in psql.

After you determine the indexes that you need, add CREATE INDEX statements to create them. Follow the guidance at Secondary indexes.

Spanner implements indexes as tables, so indexing monotonically increasing columns (like those containing TIMESTAMP data) can cause a hotspot. See What DBAs need to know about Spanner, part 1: Keys and indexes for more information on methods to avoid hotspots.

Spanner implements secondary indexes in the same way as tables, so the column values to be used as index keys will have the same constraints as the primary keys of tables. This also means that indexes have the same consistency guarantees as Spanner tables.

Value lookups using secondary indexes are effectively the same as a query with a table join. You can improve the performance of queries using indexes by storing copies of the original table's column values in the secondary index using the INCLUDE clause, making it a covering index.

Spanner's query optimizer only automatically uses a secondary index when the index itself stores all the columns being queried (a covered query). To force the use of an index when querying columns in the original table, you must use a FORCE INDEX directive in the SQL statement, for example:

SELECT *
FROM MyTable /*@ FORCE_INDEX=MyTableIndex */
WHERE IndexedColumn=$1;

Here is an example DDL statement creating a secondary index for the Albums table:

CREATE INDEX AlbumsByAlbumTitle ON Albums(AlbumTitle);

If you create additional indexes after your data is loaded, populating the index might take some time. We recommend that you limit the rate at which you add them to an average of three per day. For more guidance on creating secondary indexes, see Secondary indexes. For more information on the limitations on index creation, see Schema updates.

Views

Spanner views are read-only. They can't be used to insert, update, or delete data. For more information, see Views.

Generated columns

Spanner supports generated columns. See Create and manage generated columns for syntax differences and restrictions.

Table interleaving

Spanner has a feature where you can define two tables as having a 1-many, parent-child relationship. This feature interleaves the child data rows next to their parent row in storage, effectively pre-joining the table and improving data retrieval efficiency when the parent and children are queried together.

The child table's primary key must start with the primary key column(s) of the parent table. From the child row's perspective, the parent row primary key is referred to as a foreign key. You can define up to 6 levels of parent-child relationships.

You can define ON DELETE actions for child tables to determine what happens when the parent row is deleted: either all child rows are deleted, or the parent row deletion is blocked while child rows exist.

Here is an example of creating an Albums table interleaved in the parent Singers table defined earlier:

CREATE TABLE Albums (
 SingerID      bigint,
 AlbumID       bigint,
 AlbumTitle    varchar,
 PRIMARY KEY (SingerID, AlbumID)
 )
 INTERLEAVE IN PARENT Singers ON DELETE CASCADE;

For more information, see Create interleaved tables.

Data types

The following table lists the open source PostgreSQL data types that the PostgreSQL interface for Spanner doesn't support.

Data type Use instead
bigserial,serial8 bigint, int8
bit [ (n) ] -
bit varying [ (n) ], varbit [ (n) ] -
box -
character [ (n) ], char [ (n) ] character varying
cidr text
circle -
inet text
integer, int4 bigint, int8
interval [fields] [ (p) ] bigint
json jsonb
line -
lseg -
macaddr text
money numeric, decimal
path -
pg_lsn -
point -
polygon -
realfloat4 double precision, float8
smallint, int2 bigint, int8
smallserial, serial2 bigint, int8
serial, serial4 bigint, int8
time [ (p) ] [ without time zone ] text, using HH:MM:SS.sss notation
time [ (p) ] with time zonetimetz text, using HH:MM:SS.sss+ZZZZ notation. Or use two columns.
timestamp [ (p) ] [ without time zone ] text or timestamptz
tsquery -
tsvector -
txid_snapshot -
uuid text or bytea
xml text

Step 2: Translate any SQL queries

Spanner has many of the open source PostgreSQL functions available to help reduce the conversion burden.

SQL queries can be profiled using the Spanner Studio page in the Google Cloud console to execute the query. In general, queries that perform full table scans on large tables are very expensive, and should be used sparingly. For more information on optimizing SQL queries, see the SQL best practices documentation.

Step 3: Create the Spanner instance, database, and schema

Create the instance and create a database in the PostgreSQL dialect. Then create your schema using the PostgreSQL data definition language (DDL).

Use pg_dump to create DDL statements that define the objects in your PostgreSQL database, and then modify the statements as described in the preceding sections. After you update the DDL statements, use the DDL statements to create your database in the Spanner instance.

For more information, see:

Step 4: Refactor the application

Add application logic to account for the modified schema and revised SQL queries, and to replace database-resident logic such as procedures and triggers.

Step 5: Migrate your data

There are two ways to migrate your data:

  • By using Harbourbridge.

    Harbourbridge supports both schema and data migration. You can import a pg_dump file or CSV file, or you can import through a direct connection to the open source PostgreSQL database.

  • By using the COPY FROM STDIN command.

    For details, see COPY command for importing data.