Migrate an on-premises PostgreSQL cluster to Google Cloud Platform

This article explains how to migrate an on-premises PostgreSQL cluster to Google Cloud Platform (GCP). The approach uses PgBouncer as a connection pooler to minimize application downtime, and shows you how to set up tools to monitor the results. This article is intended for PostgreSQL administrators and sysadmins working in a Linux environment.

The following diagram shows a PgBouncer instance placed in front of an on-premises master.

Figure: Architecture of a PgBouncer instance placed in front of an on-premises master.

With connection pooling, clients are routed to an alternative node when the failover to GCP happens, without having to redeploy application configurations or make application-level changes.

The following diagram illustrates the migration.

Figure: Architecture of migration to GCP.

To perform the migration, you shut down the current master and then promote the head GCP replica to master. PgBouncer reroutes traffic to the new master node on GCP.

Costs

This tutorial uses the following billable components of GCP:

  • Compute Engine
  • Cloud Storage

You can use the pricing calculator to generate a cost estimate based on your projected usage.

Objectives

  • Set up PostgreSQL in GCP.
  • Set up replication on Compute Engine.
  • Seed your data to a new instance.
  • Set up the PostgreSQL cluster on GCP.
  • Switch over to the new server.
  • Implement monitoring.

Before you begin

This tutorial uses gcloud and gsutil commands, which you can run from a Cloud Shell instance launched from the GCP Console. If you want to use gcloud and gsutil on your local workstation, install the Cloud SDK. The tutorial shows you how to run commands in Cloud Shell; if you use the Cloud SDK on your workstation, adjust the instructions accordingly.

  1. Sign in to your Google Account.

    If you don't already have one, sign up for a new account.

  2. Select or create a GCP project.

  3. Make sure that billing is enabled for your Google Cloud Platform project. Learn how to enable billing.

  4. Enable the Compute Engine API.

Setting up your environment

To begin, you perform the following tasks:

  • Set up PostgreSQL in GCP.
  • Set up replication on Compute Engine.

Then, you'll seed the data to a new instance before you start replicating the master.

Set up PostgreSQL in GCP

Follow the steps in How to Set Up PostgreSQL on Google Compute Engine to set up PostgreSQL on GCP in just a few minutes. You configure PostgreSQL on an Ubuntu virtual machine instance on Compute Engine.

Set up replication on Compute Engine

Follow the steps in How to Set Up PostgreSQL for High Availability and Replication with Hot Standby to configure PostgreSQL to run in Hot Standby mode on Compute Engine. You'll use two Compute Engine instances: one runs the primary PostgreSQL server, and the other runs the standby server.

Although a few configuration properties differ between a master and a subordinate instance, keep the configuration files otherwise identical so that failovers are seamless.

A subordinate instance in a PostgreSQL cluster is denoted by the presence of the recovery.conf file.

In most cases, it's important to separate the data directory for the database from the boot disk. This example stores database files under a mount at /database.
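
A quick way to check which role an instance currently has, given this convention (assuming $PGDATA points at the data directory):

# A subordinate (standby) has recovery.conf in its data directory.
sudo test -f "$PGDATA/recovery.conf" && echo "standby" || echo "master"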

Add the following settings to the postgresql.conf file on the master instance to set up replication:

wal_level = 'hot_standby'
archive_mode = on
archive_command = 'test ! -f /postgresql/archivedir/%f && cp %p /postgresql/archivedir/%f'
max_wal_senders = 3
listen_addresses = '*'
wal_keep_segments = 8
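
Replication connections from the replica authenticate as a dedicated user (repuser in the replica configuration below). The following is a minimal sketch of creating that role and allowing it to connect; the password and REPLICA_IP are placeholders:

# On the master: create a role with replication privileges.
echo "CREATE ROLE repuser WITH REPLICATION LOGIN PASSWORD 'secret';" | sudo su - postgres -c psql

# Allow replication connections from the replica, then reload the configuration.
echo "host replication repuser REPLICA_IP/32 md5" | sudo tee -a $PGDATA/pg_hba.conf
sudo service postgresql reload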

On the replica, set the following in the postgresql.conf file:

hot_standby = on

In PostgreSQL 9.x, the remaining standby settings belong in the recovery.conf file in the replica's data directory, not in postgresql.conf:

standby_mode = on
primary_conninfo = 'host=${PRIMARY_IP} port=5432 user=repuser'

You can offload some of the load on the master by sending read requests to the replica.
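
For example, a reporting query can target the replica directly; the host, user, database, and table names here are placeholders:

psql -h REPLICA_IP -U report_user -d mydb -c 'select count(*) from orders;'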

Seed the data

Because the master database contains a capped transaction log, most PostgreSQL migrations require the data to be seeded to a new instance before you can start replicating the master. You can seed the data in one of the following ways:

  • Dump a single database into a script or archive file using pg_dump.
  • Take a binary copy of a running database cluster using pg_basebackup (see the sketch after this list).
  • Copy the data folder to the replica using rsync.
  • Restore a previous backup to the replica.
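
For reference, a minimal pg_basebackup sketch for the second option; MASTER_IP is a placeholder, and repuser and the /database directory match the examples used elsewhere in this tutorial:

# Stream a binary copy of the running cluster into the replica's
# (empty) data directory, showing progress.
pg_basebackup -h MASTER_IP -p 5432 -U repuser -D /database -X stream -P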

Of these options, we recommend that you restore a previous backup to the replica. This approach avoids a performance hit on the master while large volumes of data are transferred, and the current cluster can continue to operate as normal.

After the initial seeding of the database, you can use the rsync command to feed changes to the replica that have occurred since the backup; the command syncs the data directories between the two instances. This step is important if the backup has fallen too far behind the master to catch up through normal replication.
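
For example, a minimal rsync sketch, run while PostgreSQL is stopped on the replica; the /database paths and REPLICA_IP are placeholders based on the mount used in this tutorial:

# Sync the data directory to the replica; --delete removes files that no
# longer exist on the master, and postmaster.pid must not be copied.
sudo rsync -av --delete --exclude postmaster.pid \
    /database/ postgres@REPLICA_IP:/database/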

Setting up the PostgreSQL cluster on GCP

You can create the PostgreSQL cluster using cascade replication. First, migrate the database, as shown in the following diagram.

Figure: Architecture of your current environment migrated to GCP.

Migrate the database

  1. Take a full backup from the running master server (the label can be any string):

    echo "select pg_start_backup('label',true);" |sudo su - postgres -c psql
    sudo tar cvfz postgresql_AAAAMMDD.tar.gz $PGDATA
    echo "select pg_stop_backup();" |sudo su - postgres -c psql
    

    $PGDATA is the main data directory for PostgreSQL. The pg_stop_backup call ends the backup mode that pg_start_backup began.

  2. Create a bucket in your GCP project named gs://pg-repo/.
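
    For example, you can create the bucket from Cloud Shell; bucket names are globally unique, so you might need to choose a different name:

    gsutil mb gs://pg-repo/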

  3. Transfer the backup to the bucket you just created:

    master$ gsutil cp postgresql_AAAAMMDD.tar.gz gs://pg-repo/
    
  4. Transfer the backup file to the GCP master:

    new_master$ gsutil cp gs://pg-repo/postgresql_AAAAMMDD.tar.gz .
    
  5. Restore the backup file into the GCP master:

    new_master$ sudo tar xvfz postgresql_AAAAMMDD.tar.gz -C /
    
  6. Create a recovery.conf file in the $PGDATA directory that includes the following:

    standby_mode     = 'on'
    primary_conninfo = 'port=5432 host=${running_master_ip} user=${replication_user} application_name=cloud_master'
    trigger_file     = '/tmp/failover.postgresql.5432'
    
  7. Start the PostgreSQL service:

    sudo service postgresql start
    
  8. Wait until the GCP master server syncs with the running master. In the log, you see something like the following:

    tail -f /var/log/postgresql/postgresql*log
    ...
    2018-09-22 17:59:54 UTC LOG:  consistent recovery state reached at 0/230000F0
    2018-09-22 17:59:54 UTC LOG:  database system is ready to accept read only connections
    ...
    

    Additionally, you can query pg_stat_replication on the running master to determine whether the new subordinate (called cloud_master) is connected:

    postgres=# \x
    Expanded display is on.
    postgres=# select * from pg_stat_replication where application_name='cloud_master';
    -[ RECORD 1 ]----+------------------------------
    pid              | 16940
    usesysid         | 16402
    usename          | repmgr
    application_name | cloud_master
    ...
    

Create a subordinate database

  1. Shut down the database and the server:

    sudo service postgresql stop
    sudo shutdown -h now
    
  2. To verify that the instance has stopped, run the following command:

    gcloud compute instances describe master-instance-name | grep status
    

    The output shows the status of the instance as TERMINATED:

    status: TERMINATED
    

    Next, create snapshots of the data disk to help create new subordinates.

  3. In the GCP Console, go to the Snapshots page.

  4. From the PostgreSQL disk, create a new snapshot.
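
    Alternatively, a sketch of the same step with gcloud; the disk name, snapshot name, and zone are placeholders:

    gcloud compute disks snapshot pg-data-disk --snapshot-names=pg-data-snapshot --zone=us-central1-a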

  5. Start the GCP master server.

  6. Go to the VM instances page, click master-instance-name, and then click Start.

    The PostgreSQL service starts automatically.

  7. To verify that PostgreSQL is running, run the following command:

    ps ax | grep postgres
    

    The result should be something like the following:

     1398 ?     S   0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
     1454 ?     Ss  0:00 postgres: checkpointer process
     1455 ?     Ss  0:00 postgres: writer process
     1456 ?     Ss  0:00 postgres: wal writer process
     1457 ?     Ss  0:00 postgres: stats collector process
    
  8. In the GCP Console, go to the VM instances page, and then click Create instance.

  9. For the boot disk, choose Ubuntu 14.04.

  10. Click Management, disks, networking, SSH keys, and add a new disk based on the snapshot that you created earlier.

  11. Start the new server and mount the disk:

    sudo mkdir /database && sudo mount /dev/sdb1 /database
    
  12. Install PostgreSQL:

    sudo apt-get install postgresql && sudo service postgresql stop
    
  13. Configure the data directory and replication values. Copy the postgresql.conf file and the pg_hba.conf file from the GCP master, and edit the recovery.conf file to include the following:

    standby_mode = 'on'
    primary_conninfo = 'port=5432 host=${cloud_master_ip} user=${replication_user} application_name=cloud_slave_${identifier}'
    recovery_target_timeline = 'latest'
    
  14. Start the PostgreSQL service with the new config file, pointing to the GCP master:

    sudo service postgresql restart
    
  15. Verify that the service is running:

    ps ax | grep postgres
    
  16. Check the GCP master server with this query:

    postgres=# \x
    Expanded display is on.
    postgres=# select * from pg_stat_replication where application_name like 'cloud_slave%';
    -[ RECORD 1 ]----+-----------------------------
    pid              | 2466
    usesysid         | 16402
    usename          | repmgr
    application_name | cloud_slave_1
    ...
    
  17. Repeat these steps to create additional subordinates.

Switching over to the new server

  1. Change the config file in PgBouncer to point to the new GCP master server (see the configuration sketch after these steps).
  2. In the PgBouncer instance, shut down PgBouncer, promote the new master using the failover.postgresql.5432 trigger file, and then restart PgBouncer:

    service pgbouncer stop ; ssh ${cloud_master_ip} 'touch /tmp/failover.postgresql.5432' ; service pgbouncer start
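
The following is a minimal sketch of the relevant pgbouncer.ini fragment; the database name and authentication settings are placeholders, and ${cloud_master_ip} stands for the new master's address:

    ; Route client connections for mydb to the new GCP master.
    [databases]
    mydb = host=${cloud_master_ip} port=5432 dbname=mydb

    [pgbouncer]
    listen_addr = *
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt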
    

Setting up monitoring

Stackdriver has partnered with Blue Medora to provide PostgreSQL metrics. You can collect metrics and logs from your existing PostgreSQL instance and from your new GCP instance.

For more information about Blue Medora's PostgreSQL monitoring support, visit their Getting started page.

Sample queries

Check all concurrent connections on the server:

select * from pg_stat_activity;

(Master) Check the replication status:

select * from pg_stat_replication;

(Master) Check to see the lag in applying data on the replica:

select pg_xlog_location_diff(write_location, replay_location) from pg_stat_replication;

(Master) Check the byte lag in replication:

select client_hostname, client_addr,
  pg_xlog_location_diff(pg_stat_replication.sent_location,
  pg_stat_replication.replay_location) AS byte_lag
from pg_stat_replication;

(Subordinate) Check if the database is a replica or not:

select pg_is_in_recovery();

(Subordinate) Check the last data received from the master:

select pg_last_xlog_receive_location();

(Subordinate) Check the data last applied from the master:

select pg_last_xlog_replay_location();

(Subordinate) Check replication delay in seconds:

select now() - pg_last_xact_replay_timestamp();

Cleaning up

Delete the project

  1. In the GCP Console, go to the Manage resources page.

  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.
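
Alternatively, you can delete the project from Cloud Shell; replace my-project-id with your project ID:

    gcloud projects delete my-project-id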

Next steps

  • Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.
  • Learn how to use Google Cloud Platform products to build end-to-end solutions.