Performing a PITR of a PostgreSQL database on Compute Engine


This tutorial shows how to set up the archiving process, and then perform a point-in-time recovery (PITR) of a PostgreSQL database running on Compute Engine.

In this tutorial, you create a demonstration database and run an application workload. Then, you configure the archive and backup processes. Next, you learn how to verify the backup, archive, and recovery processes. Finally, you learn how to recover the database to a specific point in time.

This tutorial is intended for database administrators, system operators, DevOps professionals, and cloud architects interested in configuring a backup and recovery strategy for PostgreSQL databases.

This tutorial assumes that you are familiar with Docker containers and that you are comfortable with Linux commands, PostgreSQL database engines, and Compute Engine.

Objectives

  • Set up a backup and archiving process.
  • Perform a PITR.
  • Monitor your backup.
  • Verify a recovery.

Costs

In this document, you use the following billable components of Google Cloud:

  • Compute Engine
  • Cloud Storage

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Compute Engine and Cloud Storage APIs.

    Enable the APIs

  5. Install the Google Cloud CLI.
  6. To initialize the gcloud CLI, run the following command:

    gcloud init
  7. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

    At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

Concepts

Before you start the tutorial, review the following PostgreSQL concepts:

  • Continuous archiving. The process by which the database continuously saves sequential transactions to a file.
  • Write-Ahead Log (WAL). Changes to data files are recorded in the WAL before they are made to the data files.
  • WAL record. Each transaction applied to the database is formatted and stored as a WAL record.
  • Segment files. Segment files have monotonically increasing filenames and contain as many WAL records as possible. The file size is configurable, with a default of 16 MiB. You might choose a larger size if you expect transactions that are voluminous in size or count, in order to lower the aggregate number of generated segment files and to decrease the file management burden.

For more information, see Reliability and the Write-Ahead Log.

The following diagram shows how WALs are persisted in two stages.

2 stages of persistent WALs.

In the preceding diagram, the first stage of persisting WALs consists of the database engine recording write transactions in the WAL buffer concurrently with the write to a table. When the transaction is committed, the WAL buffer is written (flushed) to disk during the second stage with an append to the WAL segment file.
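If you want to inspect these WAL-related settings on a running server, you can query them with psql. The following is a minimal sketch that assumes the postgres-db container you create later in this tutorial; the parameters shown are standard PostgreSQL settings.

    # Show WAL-related settings on a running PostgreSQL server.
    # Assumes the postgres-db container created later in this tutorial.
    docker exec postgres-db psql --username=postgres \
        --command="SHOW wal_segment_size;" \
        --command="SHOW wal_buffers;" \
        --command="SHOW synchronous_commit;"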

Choosing a PITR

A PITR is appropriate for the following scenarios:

  • Minimize the recovery point objective (RPO). The RPO is the maximum amount of data loss, measured in time, that can be tolerated before it significantly impacts business processes. Saving all transactions in the WALs between backup snapshots drastically decreases the amount of lost data, because you can apply the transactions recorded since the last full backup to the database.
  • Minimize the recovery time objective (RTO). The RTO is the amount of time required to recover a database if a destructive event occurs. After you set up binary backups and log archiving, the time required to recover the database can be minimal.
  • Remediation for a data corruption bug, or an administrative misstep. If a code release causes catastrophic data corruption, or an unrecoverable mistake is made during routine maintenance, you can recover to before that moment.

In some application architectures, such as a microservices architecture, there might be parallel databases that could require independent recoveries. For example, a retail application might have customer data in one database and retail order details and inventory information in other databases. Depending on the overall state of data, one, two, or all databases might need to be recovered in parallel.

A PITR is not appropriate in the following scenarios:

  • RPO is large. If your disaster recovery policy can tolerate the loss of any transactions received after the most recent snapshot, you can avoid additional steps and focus on reducing the time to recover your data.
  • A complete database recovery is needed. If your goal is to recover to the most recent transaction, your recovery target is the timestamp of the last persisted transaction. This scenario is a specific case of PITR but semantically this goal is referred to as a full recovery.

Performance considerations

The archiving process puts additional I/O load on your database server. The additional load depends on the characteristics of your workload, because it is proportional to the volume of write, update, and delete transactions.

If you want to reduce the I/O impact that the WAL archive activity might incur on your primary database, you can perform the periodic WAL archives using a read-only replica.

This configuration isolates the primary database from the batch-oriented I/O activities related to the transfer of the WAL files. Transactions destined for the read-only replica are transmitted in a constant stream from the primary database, which has a much lower impact on steady-state throughput.

Further, if your production database topology already includes a read-only replica, this configuration doesn't add any management, price, or other burden.
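The following is an illustrative sketch only, not part of this tutorial's scripts: it shows how archiving could be enabled on a standby server by using the standard archive_mode = always setting. The file paths are hypothetical placeholders.

    # Illustrative only: enable WAL archiving on a read-only replica (standby).
    # Run on the standby host; the paths are hypothetical placeholders.
    # 'always' archives completed WAL segments on a standby as well as on a primary.
    echo "archive_mode = always" | sudo tee -a /path/to/standby/postgresql.conf
    echo "archive_command = 'cp %p /path/to/standby/wal_archive/%f'" | \
        sudo tee -a /path/to/standby/postgresql.conf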

Reference architecture

The following diagram illustrates the architecture that you implement in this tutorial.

Cloud infrastructure of PITR using Compute Engine and Cloud Storage.

In this tutorial, you create cloud infrastructure to observe a PITR that is using the following components:

  • A PostgreSQL database server running on Compute Engine.
  • Cloud Storage for storage of snapshots and transaction logs.

The following diagram shows the two Docker containers that are launched on the PostgreSQL database virtual machine (VM). As a separation of concerns, the database server is running in one of the containers, and the WAL archiver is running in the other container.

Docker containers for the database server and the WAL archiver.

This diagram shows how the Docker volumes in each container are mapped to Persistent Disk mount points on the host VM.

Setting up environment variables

The scripts and commands used in this tutorial rely on shell environment variables.

  1. In Cloud Shell, set environment variables for your project, the instance name, and the demonstration PostgreSQL database.

    export PROJECT_ID=your-gcp-project
    export PG_INSTANCE_NAME=instance-pg-pitr
    export POSTGRES_PASSWORD=PasswordIsThis
    export POSTGRES_PITR_DEMO_DBNAME=pitr_demo
    

    Replace the following:

    • your-gcp-project: the name of the project that you created for this tutorial.
    • PasswordIsThis: a secure password for the PostgreSQL database.
  2. Set the environment variable for the Google Cloud zone. Replace choose-an-appropriate-zone with a Google Cloud zone.

    export ZONE=choose-an-appropriate-zone
    export REGION=${ZONE%-[a-z]}
    
  3. Set the environment variable for the default Virtual Private Cloud (VPC) subnet for the region of your zone:

    export SUBNETWORK_URI=$(gcloud compute networks subnets \
        describe default --format=json --region=$REGION | \
        jq --raw-output '.ipCidrRange')
    
  4. Set the environment variable for the Cloud Storage bucket. Replace archive-bucket with a unique name for the Cloud Storage bucket where WALs are saved.

    export ARCHIVE_BUCKET=archive-bucket
    

Creating a Cloud Storage bucket

  • Create a Cloud Storage bucket to archive the WAL files from the PostgreSQL database:

    gcloud storage buckets create gs://${ARCHIVE_BUCKET}
    

Allowing access to instances with private IP addresses

For the instances used in this tutorial, as in many production use cases, there is no need for the VM instances to obtain public IP addresses. However, the instances require access to the public internet to pull the example container images, and you require access in order to connect by using a secure shell. You configure a network address translation (NAT) gateway and Identity-Aware Proxy (IAP) for TCP forwarding.

Create a NAT gateway

Because the VM instances that you create don't have public IP addresses, you create a NAT gateway so that the instances can pull container images from the Docker Hub.

  1. In Cloud Shell, create a Cloud Router:

    export CLOUD_ROUTER_NAME=${PROJECT_ID}-nat-router
    gcloud compute routers create $CLOUD_ROUTER_NAME \
        --network=default --region=$REGION
    
  2. Create the NAT gateway:

    gcloud compute routers nats create ${PROJECT_ID}-nat-gateway \
        --region=$REGION \
        --router=$CLOUD_ROUTER_NAME \
        --auto-allocate-nat-external-ips \
        --nat-all-subnet-ip-ranges
    

Configure IAP for TCP forwarding

IAP controls access to your cloud applications and VMs running on Google Cloud. IAP verifies the user identity and context of the request to determine whether a user is allowed to access a VM.

  1. In Cloud Shell, allow traffic from the TCP forwarding net block to the instances in your project:

    export IAP_FORWARDING_CIDR=35.235.240.0/20
    gcloud compute --project=$PROJECT_ID firewall-rules create \
        cloud-iap-tcp-forwarding --direction=INGRESS  \
        --priority=1000 --network=default \
        --action=ALLOW --rules=all  \
        --source-ranges=$IAP_FORWARDING_CIDR
    
  2. To connect by using a TCP forwarding tunnel, add an Identity and Access Management (IAM) policy binding. Replace your-email-address with the email address that you use to sign in to the Google Cloud console.

    export GRANT_EMAIL_ADDRESS=your-email-address
    gcloud projects add-iam-policy-binding $PROJECT_ID \
       --member=user:$GRANT_EMAIL_ADDRESS \
       --role=roles/iap.tunnelResourceAccessor
    

Creating the PostgreSQL database infrastructure

  1. In Cloud Shell, clone the source repository that contains the configuration scripts, and change the shell context to the local repository:

    git clone https://github.com/GoogleCloudPlatform/gcs-postgresql-recovery-tutorial
    cd gcs-postgresql-recovery-tutorial
    
  2. To create and configure the database VM instance, run the following script:

    cd bin
    ./create_postgres_instance.sh
    

    For this tutorial, this script starts a VM instance in your chosen zone that runs Container-Optimized OS and has two new persistent disks attached. In this case, you can ignore the warning message returned by the API about poor I/O performance, because the scripts create small persistent disks.
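If you want to confirm what the script created before you continue, the following optional check lists the new instance and its persistent disks; it assumes the environment variables set earlier in this tutorial.

    # Optional check: confirm the VM and its persistent disks exist.
    gcloud compute instances describe $PG_INSTANCE_NAME \
        --zone=$ZONE --format="value(name,status)"
    gcloud compute disks list \
        --filter="name~^${PG_INSTANCE_NAME}" \
        --format="table(name,sizeGb,type)"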

Reviewing the cloud-init configuration

Cloud-init is a multi-distribution package that initializes a cloud instance.

Review the following cloud-init code sample:

write_files:
- path: /var/tmp/docker-entrypoint-initdb.d/init-pitr-demo-db.sql
  permissions: 0644
  owner: root
  content: |
    CREATE DATABASE ${POSTGRES_PITR_DEMO_DBNAME};

    \c ${POSTGRES_PITR_DEMO_DBNAME}

    CREATE SCHEMA pitr_db_schema;

    CREATE TABLE pitr_db_schema.customer
       (id SERIAL NOT NULL,
        name VARCHAR(255),
        create_timestamp TIMESTAMP DEFAULT current_timestamp,
        PRIMARY KEY (id));

    CREATE TABLE pitr_db_schema.invoice
       (id SERIAL NOT NULL,
        customer_id INTEGER
          REFERENCES pitr_db_schema.customer(id),
        description VARCHAR(1000),
        create_timestamp TIMESTAMP DEFAULT current_timestamp,
        PRIMARY KEY (customer_id, id));

- path: /etc/systemd/system/postgres.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Requires=docker.service
    After=docker.service
    Description=postgres docker container

    [Service]
    TimeoutStartSec=0
    KillMode=none
    Restart=always
    RestartSec=5s
    ExecStartPre=-/usr/bin/docker kill postgres-db
    ExecStartPre=-/usr/bin/docker rm -v postgres-db
    ExecStart=/usr/bin/docker run -u postgres --name postgres-db \
                                  -v /var/tmp/docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d \
                                  -v /mnt/disks/data:/var/lib/postgresql/data \
                                  -v /mnt/disks/wal:/var/lib/postgresql/wal \
                                  -e PGDATA=/var/lib/postgresql/data/pgdata \
                                  -e POSTGRES_PASSWORD=${POSTGRES_PASSWORD} \
                                  -e POSTGRES_INITDB_WALDIR=/var/lib/postgresql/wal/pg_wal \
                                  -p ${POSTGRES_PORT}:${POSTGRES_PORT} \
                               postgres:11-alpine
    ExecStop=-/usr/bin/docker stop postgres-db
    ExecStopPost=-/usr/bin/docker rm postgres-db

- path: /etc/systemd/system/wal_archive.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Requires=docker.service postgres.service
    After=docker.service postgres.service
    Description=WAL archive docker container

    [Service]
    TimeoutStartSec=10min
    Type=oneshot
    ExecStart=/usr/bin/docker run --name wal-archive \
                                  -v /mnt/disks/wal/pg_wal_archive:/mnt/wal_archive \
                               google/cloud-sdk:slim gsutil mv /mnt/wal_archive/[0-9A-F]*[0-9A-F] gs://${ARCHIVE_BUCKET}
    ExecStopPost=-/usr/bin/docker rm wal-archive

- path: /etc/systemd/system/wal_archive.timer
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Archive WAL to GCS (every 5 minutes)

    [Timer]
    OnBootSec=5min
    OnUnitInactiveSec=5min
    OnUnitActiveSec=5min

    [Install]
    WantedBy=timers.target

For this tutorial, cloud-init is used to do the following:

  1. Create two Persistent Disk block storage devices.
  2. Create the file systems on the two devices: one for the data and one for the archive logs.
  3. Mount the devices at logical mount points on the VM instance that are shared with the Docker containers.
  4. Create and then start a systemd service (postgres.service), which starts a PostgreSQL Docker container with the following:
    • The persistent disks mounted as volumes.
    • The PostgreSQL port (5432) published to the VM host.
  5. Create a /var/tmp/docker-entrypoint-initdb.d/init-pitr-demo-db.sql file to create a simple set of tables in a demonstration database and schema.
  6. Create and start a second systemd service (wal_archive.service) that runs a Google Cloud CLI Docker container with the WAL disks mounted as a volume. This service backs up archived WAL files to Cloud Storage.
  7. Create, enable, and then start a systemd timer (wal_archive.timer) that periodically runs the wal_archive.service.
  8. Ensure that the PostgreSQL port (5432) is open for the VPC subnet so that the transaction generator can reach the database port.
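If you want to verify that cloud-init created and started these units, you can inspect them on the instance-pg-pitr VM after you connect over SSH; the unit names below come from the preceding cloud-init sample.

    # Run on the instance-pg-pitr VM to inspect the units created by cloud-init.
    sudo systemctl status postgres.service --no-pager
    sudo systemctl list-timers wal_archive.timer --no-pager
    sudo journalctl -u wal_archive.service --no-pager | tail -n 20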

Modify the database instance configuration

The database server is running, but you need to configure network access, and to start the WAL archiving process.

Connect to the database VM instance

  1. In the Google Cloud console, go to the VM instances page.

    Go to VM instance

  2. To open a terminal shell, next to the instance-pg-pitr instance that you created, click SSH.

  3. In the terminal shell, check that the Docker container started:

    docker ps

    The output is similar to the following:

    CONTAINER ID   IMAGE                COMMAND                  CREATED              STATUS              PORTS   NAMES
    8bb65d8c1197   postgres:11-alpine   "docker-entrypoint.s…"   About a minute ago   Up About a minute           postgres-db
    

    If the container is not yet running, wait a moment, and then use the same command to check again.

Allow inbound network connections to database

  1. In the terminal shell of the instance-pg-pitr instance, open the PostgreSQL host-based authentication configuration file for editing:

    sudoedit /mnt/disks/data/pgdata/pg_hba.conf
    
  2. To remove the default access to the database from all IP addresses, comment out the following line at the end of the file by adding # at the beginning of the line. After you comment it out, the line looks similar to the following:

    #host all all all md5
    
  3. To allow password-protected connections from hosts in the 10.0.0.0/8 CIDR block, add the following line to the end of the file:

    host    all             all             10.0.0.0/8               md5
    

    This entry enables connectivity from the VPC subnet where the transaction generator is later created.

  4. Save and then close the file.

Configure WAL archiving

  1. In the terminal shell of the instance-pg-pitr instance, edit the postgresql.conf file:

    sudoedit /mnt/disks/data/pgdata/postgresql.conf
    
  2. Replace the existing commented-out archive_mode, archive_command, and archive_timeout lines with the following:

    archive_mode=on
    archive_command = '( ARCHIVE_PATH=/var/lib/postgresql/wal/pg_wal_archive;  test ! -f $ARCHIVE_PATH/%f && cp %p $ARCHIVE_PATH/%f.cp && mv $ARCHIVE_PATH/%f.cp $ARCHIVE_PATH/%f ) '
    archive_timeout = 120
    

    When you replace the lines in the modified file, it looks similar to the following code snippet:

    
    .... illustrative snippet start ....
    
    # - Archiving -
    archive_mode=on
    archive_command = '( ARCHIVE_PATH=/var/lib/postgresql/wal/pg_wal_archive;  test ! -f $ARCHIVE_PATH/%f && cp %p $ARCHIVE_PATH/%f.cp && mv $ARCHIVE_PATH/%f.cp $ARCHIVE_PATH/%f ) '
    archive_timeout = 120
    #------------------------------------------------------------------------------
    # REPLICATION
    #------------------------------------------------------------------------------
    
    .... illustrative snippet end ....
    
    
  3. Save and then close the file.

Apply and verify the configuration changes

  1. In the terminal shell of the instance-pg-pitr instance, restart the container to apply the changes:

    sudo systemctl restart postgres
    
  2. Check for the WAL segment files:

    sudo ls -l /mnt/disks/wal/pg_wal
    

    The output is similar to the following:

    total 16388
    -rw------- 1 postgres 70 16777216 Sep  5 23:07 000000010000000000000001
    drwx------ 2 postgres 70     4096 Sep  5 23:05 archive_status
    
  3. Check for network connectivity to the database:

    export LOCAL_IP=127.0.0.1
    docker exec postgres-db psql -w --host=$LOCAL_IP \
          --command='SELECT 1'
    

    The output is similar to the following:

    ?column?
    ----------
           1
    (1 row)
    
  4. Close the SSH connection to the instance.
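If you want an additional confirmation that WAL segments are being archived, you can run the following optional check in the terminal shell of the instance-pg-pitr instance before you close it (or from a new SSH session). It forces a segment switch and queries the standard pg_stat_archiver view.

    docker exec postgres-db psql --username=postgres \
        --command="SELECT pg_switch_wal();" \
        --command="SELECT archived_count, last_archived_wal FROM pg_stat_archiver;"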

Starting the transaction generator to populate database

The following steps start a Go program that generates transactions for this tutorial. The program runs inside a container on a VM instance.

The container image is already built and hosted in a publicly accessible Container Registry repository.

  1. In Cloud Shell, change to the transaction generator directory:

    cd ~/gcs-postgresql-recovery-tutorial/bin
    
  2. Set the environment variables:

    export TRANS_GEN_INSTANCE_NAME=instance-trans-gen
    export POSTGRES_HOST_IP=$(gcloud compute instances describe  \
        --format=json --zone=${ZONE} ${PG_INSTANCE_NAME} | \
        jq --raw-output '.networkInterfaces[0].networkIP')
    
  3. To run the transaction generator, start the instance:

    ./run_trans_gen_instance.sh
    

    Ignore the warning message about poor I/O performance.

  4. Wait a few moments, and check that transactions are reaching the PostgreSQL database:

    gcloud compute ssh $PG_INSTANCE_NAME \
       --tunnel-through-iap \
       --zone=$ZONE \
       --command="docker exec postgres-db psql \
       --dbname=$POSTGRES_PITR_DEMO_DBNAME \
       --command='SELECT COUNT(*) FROM pitr_db_schema.customer;'"
    

    The output contains a count greater than 0 when records are added to the database by the transaction generator:

     count
    -------
       413
    (1 row)
    

Configuring the binary snapshot backup schedule

You can back up persistent disks according to a schedule, and retain the snapshots for a period that is defined in the resource policy.

Create the snapshot schedule

  1. In Cloud Shell, set environment variables:

    export ZONE=zone-of-your-instance
    export SNAPSHOT_SCHEDULE_NAME=pg-disk-schedule
    export REGION=${ZONE%-[a-z]}
    export SNAPSHOT_WINDOW_START=$(TZ=":GMT" date "+%H:00")
    export SNAPSHOT_RETENTION_DAYS=2
    export SNAPSHOT_FREQUENCY_HOURS=1
    

    Replace zone-of-your-instance with the Google Cloud zone where you previously started the database VM.

  2. Create the snapshot schedule:

    gcloud compute resource-policies create snapshot-schedule \
        $SNAPSHOT_SCHEDULE_NAME \
        --region=$REGION \
        --max-retention-days=$SNAPSHOT_RETENTION_DAYS \
        --on-source-disk-delete=apply-retention-policy \
        --hourly-schedule=$SNAPSHOT_FREQUENCY_HOURS \
        --start-time=$SNAPSHOT_WINDOW_START \
        --storage-location=$REGION
    

Attach the snapshot schedule to the disks

When you ran the script to create an instance, the data and WAL volumes were created as two independent persistent disks. To create persistent disk snapshots according to a defined schedule, you associate a resource policy with each persistent disk. In this case, you want the disk snapshots to occur concurrently, so you use the same policy for both persistent disks attached to the Compute Engine VM.

  1. In Cloud Shell, set environment variables:

    export SNAPSHOT_SCHEDULE_NAME=pgdata-disk-schedule
    export PG_INSTANCE_NAME=instance-pg-pitr
    export ZONE=zone-of-your-instance
    
  2. Attach the schedule policy to the persistent data disk:

    gcloud beta compute disks add-resource-policies ${PG_INSTANCE_NAME}-data \
        --resource-policies $SNAPSHOT_SCHEDULE_NAME \
        --zone $ZONE
    
  3. Attach the schedule policy to the persistent WAL disk:

    gcloud beta compute disks add-resource-policies ${PG_INSTANCE_NAME}-wal \
        --resource-policies $SNAPSHOT_SCHEDULE_NAME \
        --zone $ZONE
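To confirm that the policy is attached to the disks, you can describe either disk and inspect its resourcePolicies field; this check is optional.

    gcloud compute disks describe ${PG_INSTANCE_NAME}-data \
        --zone=$ZONE --format="value(resourcePolicies)"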
    

Manually run a snapshot

(Optional) Scheduled snapshots happen within the schedule window, so it's unlikely that a snapshot is taken immediately when you create the schedule. If you don't want to wait for the scheduled snapshot, you can run the initial snapshot manually.

  1. In Cloud Shell, set environment variables:

    export ZONE=zone-of-your-instance
    export PG_INSTANCE_NAME=instance-pg-pitr
    export REGION=${ZONE%-[a-z]}
    
  2. Create a snapshot of the two PostgreSQL instance persistent disks:

    gcloud compute disks snapshot \
        ${PG_INSTANCE_NAME}-data ${PG_INSTANCE_NAME}-wal \
        --snapshot-names=${PG_INSTANCE_NAME}-data-`date +%s`,${PG_INSTANCE_NAME}-wal-`date +%s` \
        --zone=$ZONE --storage-location=$REGION
    
  3. View the snapshots that you created:

    gcloud compute snapshots list
    

    The output is similar to the following:

    NAME                              DISK_SIZE_GB  SRC_DISK                                   STATUS
    instance-pg-pitr-data-1578339767  200           us-central1-f/disks/instance-pg-pitr-data  READY
    instance-pg-pitr-wal-1578339767   100           us-central1-f/disks/instance-pg-pitr-wal   READY
    

Performing a PITR

A PITR is often performed to recover data that was lost due to an operational or programmatic misstep.

In this section of the tutorial, you perform a database update to simulate a catastrophic loss of data. You then simulate a panicked response, before starting a recovery to the moment before the erroneous command was issued.

Ensure that a PITR can be performed

Before performing a PITR, you have to allow sufficient time for the following to run:

  • The binary backups (disk snapshots)
  • WAL archiving

For this tutorial, the archive_timeout was set to an atypically low value of 120 seconds to force frequent WAL file rotation. You also have to wait until at least one scheduled disk snapshot has been taken, or you need to snapshot the disks manually.

  1. Check that at least one snapshot was taken:

    1. In the Google Cloud console, go to the Snapshots page.

      Go to the Snapshots page

    2. Verify that there are at least two snapshots—one for the data volume and one for the WAL volume, for example, instance-pg-pitr--us-central1-a-20190805023535-i3hpw7kn.

  2. Check that the segment files are archived to Cloud Storage:

    1. In the Google Cloud console, go to the Cloud Storage Browser page.

      Go to the Cloud Storage Browser page

    2. Click archive-bucket.

      Cloud Storage bucket containing objects.
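If you prefer to perform these checks from Cloud Shell instead of the console, the following sketch lists recent snapshots and the most recently archived segment files; it assumes the environment variables set earlier in this tutorial.

    # List snapshots taken from this tutorial's disks.
    gcloud compute snapshots list --filter="name~${PG_INSTANCE_NAME}"
    # List the most recently archived WAL segment files in Cloud Storage.
    gcloud storage ls gs://${ARCHIVE_BUCKET} | tail -n 10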

Damage the data

To simulate a catastrophic loss of data, open a command-line shell to the PostgreSQL database, and damage the data in the table populated by the transaction generator.

  1. In the Google Cloud console, go to the VM instances page.

    Go to the VM instances page

  2. For the instance-pg-pitr instance, click SSH.

  3. In the SSH terminal, run the PostgreSQL terminal-based frontend in the Docker container:

    docker exec -it postgres-db psql --dbname=pitr_demo
    
  4. To modify a row in the customer table, submit a SQL DML statement with an intentional typo in the PostgreSQL shell:

    UPDATE pitr_db_schema.customer
    SET name = 'New Name for customer id=1';
    WHERE id = 1;
    

    The output is similar to the following:

    UPDATE 999
    pitr_demo=#  WHERE id = 1;
    ERROR:  syntax error at or near "WHERE"
    LINE 1: WHERE id = 1;
            ^
     

    The error was generated because an extra semicolon was inserted before the WHERE clause. All rows in the customer table were updated. You can now perform a PITR to recover the rows that the incorrect statement modified.

Determine target time for the recovery

The first step in a PITR is to determine the target time for the recovery. This time is determined by examining your data to identify a point slightly before the data-damaging event.

  1. In the terminal shell of the instance-pg-pitr instance, obtain the maximum timestamp of the damaged rows:

    SELECT MAX(create_timestamp)::timestamptz
      FROM pitr_db_schema.customer
    WHERE name = 'New Name for customer id=1';
    

    The output is similar to the following:

                  max
    -------------------------------
    2019-08-05 18:14:58.226511+00
    (1 row)
    

    In a production database, the query to determine the recovery target is more complex, especially in cases where the affected table is large, and the indicative column is unindexed.

  2. Copy the result; you use the value returned by this query in the next step.

Recover the database

For this tutorial, a recovery script automates the PITR. We recommend that you have an automated process to recover the database and that you periodically test this process.

  1. In Cloud Shell, change the current working directory to the location of the recovery script:

    cd ~/gcs-postgresql-recovery-tutorial/bin
    
  2. Set the required environment variables for the script. Replace YYYY-MM-DD HH:MM:SS.999999+00 with the query output that you previously copied.

    export PROJECT_ID=$(gcloud config get-value project)
    export PG_INSTANCE_NAME=instance-pg-pitr
    export POSTGRES_PASSWORD=PasswordIsThis
    export RECOVER_BUCKET=archive-bucket
    export PIT_RECOVERY_TARGET="YYYY-MM-DD HH:MM:SS.999999+00"
    export ZONE=zone-of-your-instance
    
  3. Run the recovery script:

    ./recover_to_point_in_time.sh
    

Understand the recovery script

This section provides some details about the input parameters and the steps taken by the script.

The script requires that the following environment variables are set:

  • PIT_RECOVERY_TARGET: the recovery target time.
  • PROJECT_ID: the project where the PG_INSTANCE_NAME instance is located.
  • ZONE: the zone where the PG_INSTANCE_NAME instance is located.
  • PG_INSTANCE_NAME: the instance where the production PostgreSQL instance is running.
  • RECOVER_BUCKET: the Cloud Storage bucket where WAL segment files are archived.
  • POSTGRES_PASSWORD: the password used for the PostgreSQL database user.

The script performs the following steps:

  1. Determines the most recent disk snapshots based on the recovery target date and time.
  2. Creates a cloud-init.yaml file that is provided to a Container-Optimized OS VM that runs the PITR database. The cloud-init.yaml file creates configuration files and runs several system commands to establish the following environment:

    • A gcsfuse container that mounts the WAL segment file archive bucket as a volume which is then exposed to the host with a Docker bind mount.
    • A postgres-db container where the database engine runs with the following:

      • The host file system where the persistent disks are attached as volumes.
      • The host file system where the Cloud Storage bucket is attached as a volume.
    • A recovery.conf recovery file in the PostgreSQL data directory with the following information:

      • The target date.
      • The restore command: a parameterized copy command that the database uses to copy WAL segment files as needed from the archive file system. %f is the segment file and %p is the path used by the database for processing files during the recovery.
    • The archive_* settings are commented out in the postgresql.conf settings file to avoid corrupting the WAL archive directory.

  3. Starts the PITR instance with the following information:

    • A name created by combining the $PG_INSTANCE_NAME environment variable and the alphanumeric values from the $PIT_RECOVERY_TARGET environment variable.
    • Persistent disks created from the previously identified disk snapshots.

The following is an example recovery.conf file:

restore_command = '(test -d /var/lib/postgresql/wal/pg_wal_recover && cp /var/lib/postgresql/wal/pg_wal_recover/%f %p ) '
recovery_target_time='YYYY-MM-DD HH:MM:SS UTC'
recovery_target_inclusive=true

Validate the recovery

  1. In the Google Cloud console, go to the VM instances page.

    Go to the VM instances page

  2. For the instance-pg-pitr-YYYYMMDDHHMMSS instance, click SSH.

  3. In the SSH terminal, run the PostgreSQL terminal-based frontend in the Docker container:

    docker exec -it postgres-db psql --dbname=pitr_demo
    

    If you get the following error, wait a moment for the PostgreSQL container to start, and rerun the command:

    Error: No such container: postgres-db
    
  4. Check the data in the customer table:

    SELECT * FROM pitr_db_schema.customer
    WHERE id > (SELECT MAX(id)-10 FROM pitr_db_schema.customer);
    

    The output is similar to the following:

       id  |           name            |      create_timestamp
    ------+---------------------------+----------------------------
      711 | customer_name_from_golang | 2019-12-06 18:03:51.229444
      712 | customer_name_from_golang | 2019-12-06 18:03:52.531755
      713 | customer_name_from_golang | 2019-12-06 18:03:53.555441
      714 | customer_name_from_golang | 2019-12-06 18:03:54.581872
      715 | customer_name_from_golang | 2019-12-06 18:03:55.607459
      716 | customer_name_from_golang | 2019-12-06 18:03:56.633362
      717 | customer_name_from_golang | 2019-12-06 18:03:57.658523
      718 | customer_name_from_golang | 2019-12-06 18:03:58.685469
      719 | customer_name_from_golang | 2019-12-06 18:03:59.706939
    

    The name shows the value created by the transaction generator. The final row has a timestamp that is earlier than the recovery target (which you provided to the recovery script in an environment variable). Depending on the number of records that you need to recover, you might have to wait a few moments for all of the rows to update.
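If you also want to check whether the recovered instance is still replaying WAL, you can query the standard pg_is_in_recovery() function from the same psql session (or by using docker exec as shown below); it returns t while recovery is in progress and f after recovery ends.

    docker exec postgres-db psql --dbname=pitr_demo \
        --command="SELECT pg_is_in_recovery();"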

Clean up

The easiest way to eliminate billing is to delete the Google Cloud project that you created for the tutorial. Alternatively, you can delete the individual resources.

Delete the project

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next