Containers & Kubernetes

Leveraging Backup for GKE (BfG) for Effortless Volume Migration: From In-tree to CSI

February 14, 2024
Arun Singh

Technical Account Manager, AppMod SME, Google Cloud

Matthew Cary

Software Engineer, GKE, Google Cloud


In Kubernetes, persistent volumes were initially managed by in-tree plugins, but this approach hindered development and feature delivery because in-tree plugins were compiled and shipped as part of the core Kubernetes source code. To address this, the Container Storage Interface (CSI) was introduced, standardizing how storage systems are exposed to containerized workloads. CSI drivers for common volume types such as Google Cloud Persistent Disk were developed and continue to evolve, and the implementation behind the in-tree plugins is being transitioned to CSI drivers.

If you have Google Kubernetes Engine (GKE) clusters that are still using in-tree volumes, follow the instructions below to learn how to migrate to CSI-provisioned volumes.

Why migrate?

There are various benefits to using the Compute Engine persistent disk CSI driver, including improved deployment automation, customer-managed encryption keys (CMEK), volume snapshots, and more.

In GKE version 1.22 and later, CSI Migration is enabled: existing volumes that use the gce-pd provisioner are managed through the CSI driver via transparent migration in the Kubernetes controller backend, and no changes to any StorageClass are required. However, you must use the pd.csi.storage.gke.io provisioner in the StorageClass to enable features like CMEK or volume snapshots.

Here is an example of a StorageClass with the in-tree storage plugin and one with the CSI driver:

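The snippet below is a minimal sketch of the two variants, assuming the class names GKE uses by default (standard for the in-tree plugin, standard-rwo for the CSI driver); the parameters are illustrative.

# StorageClass using the in-tree plugin
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
---
# StorageClass using the Compute Engine persistent disk CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-rwo
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
volumeBindingMode: WaitForFirstConsumer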

[Please perform the actions below in a test/dev environment first.]

Before you begin:

To test the migration, create a GKE cluster. Once the cluster is ready, check the provisioner of your default storage class. If it is already the CSI provisioner (pd.csi.storage.gke.io), change it to gce-pd (in-tree) by following these instructions.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_v1_Un0ui3T.max-600x600.jpg
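A rough command-line sketch of that check and switch, assuming the default class names standard (in-tree) and standard-rwo (CSI):

# List storage classes and note which one is marked (default)
kubectl get storageclass

# Check the provisioner of the current default class
kubectl get storageclass standard-rwo -o jsonpath='{.provisioner}{"\n"}'

# Make the in-tree class the default and unset the CSI class
kubectl patch storageclass standard-rwo \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass standard \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'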
  • Refer to this page if you want to deploy a stateful PostgreSQL database application in a GKE cluster. We will refer to this sample application throughout this blog.

  • Again, make sure that the storage class (standard) with the gce-pd provisioner creates the volumes (PVCs) attached to the pods, as verified below.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_k8f3Rc2.max-600x600.jpg
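For example, assuming the sample application runs in a namespace called blog (the namespace used later in this post):

# Confirm the PVCs are bound and use the "standard" StorageClass
kubectl -n blog get pvc

# Check the provisioner recorded on one of the backing PVs
kubectl describe pv PV_NAME | grep -iE 'provisioned-by|storageclass'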

As a next step, we will back up this application using Backup for GKE (BfG) and restore it while changing the provisioner from gce-pd (in-tree) to pd.csi.storage.gke.io (the CSI driver).

Create a backup plan

Please follow this page to ensure you have BfG enabled on your cluster. 
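If you prefer the command line, the agent can also be enabled as a cluster add-on; a sketch, with PROJECT_ID, REGION, and CLUSTER_NAME as placeholders:

# Enable the Backup for GKE agent on an existing cluster
gcloud container clusters update CLUSTER_NAME \
  --project=PROJECT_ID \
  --region=REGION \
  --update-addons=BackupRestore=ENABLED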

When you enable the BfG agent in your GKE cluster, BfG provides a CustomResourceDefinition that introduces a new kind of Kubernetes resource: the ProtectedApplication. For more on ProtectedApplication, please visit this page.

A sample manifest file:

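The manifest below is an illustrative sketch for the sample PostgreSQL HA deployment; the resource name, labels, and component layout are assumptions and should be adapted to your application.

apiVersion: gkebackup.gke.io/v1
kind: ProtectedApplication
metadata:
  name: postgresql
  namespace: blog
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app.kubernetes.io/instance: db
  components:
    - name: postgresql
      resourceKind: StatefulSet
      resourceNames: ["db-postgresql-ha-postgresql"]
      strategy:
        type: BackupAllRestoreAll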

If the Ready to backup status shows as true, your application is ready for backup.

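You can check the status from the command line, for example (the resource name postgresql follows the sketch above):

kubectl -n blog get protectedapplication
kubectl -n blog describe protectedapplication postgresql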

Let's create a backup plan by following these instructions.
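As an alternative to the console, a backup plan can be created with gcloud; this is a sketch with placeholder project, region, and cluster names:

# Create a backup plan that protects the application's namespace,
# including Secrets and volume data
gcloud beta container backup-restore backup-plans create postgresql-plan \
  --project=PROJECT_ID \
  --location=REGION \
  --cluster=projects/PROJECT_ID/locations/REGION/clusters/CLUSTER_NAME \
  --selected-namespaces=blog \
  --include-secrets \
  --include-volume-data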

Up until now, we have only created a backup plan and haven’t taken an actual backup. But before we start the backup process, we have to bring down the application.

Bring down the Application

We have to bring down the application right before taking its backup (this is where the application downtime starts). We do this to prevent any data loss during the migration.

My application is currently exposed via a service db-postgresql-ha-pgpool with the following selectors:

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_wCd0TOZ.max-600x600.jpg

We'll patch this service by overriding the above selectors with a null value so that no new requests can reach the database.

Save this file as patch.yaml and apply it using kubectl.

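A sketch of what the patch might look like, assuming Bitnami-style selector labels as in the screenshot above (adjust the keys to match your Service):

# patch.yaml - null out the selectors so the Service stops routing traffic
spec:
  selector:
    app.kubernetes.io/component: null
    app.kubernetes.io/instance: null
    app.kubernetes.io/name: null

# Apply the patch (strategic merge) to the pgpool Service
kubectl -n blog patch service db-postgresql-ha-pgpool --patch-file patch.yaml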

You should no longer be able to connect to your application (i.e., the database).

Start a Backup Manually

Navigate to the GKE Console → Backup for GKE → Backup Plans

Click Start a backup as shown below.

https://storage.googleapis.com/gweb-cloudblog-publish/images/4_j4uBvrZ.max-2000x2000.png
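Alternatively, a backup can be started from the command line; a sketch using the plan name from the earlier example:

gcloud beta container backup-restore backups create postgresql-migration-backup \
  --project=PROJECT_ID \
  --location=REGION \
  --backup-plan=postgresql-plan \
  --wait-for-completion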

Restore from the Backup

We will restore this backup to a target cluster. Note that you have the option to select the same cluster as both the source and the target; however, the recommendation is to use a new GKE cluster as your target cluster.

The restore process consists of the following two steps:

Create a restore plan

Restore a backup using the restore plan

Create a restore plan

You can follow these instructions to create a restore plan.

While adding the transformation rule(s), we will change the storage class from standard to standard-rwo.

Add transformation rules → Add Rule (Rename a PVC’s Storage Class)

https://storage.googleapis.com/gweb-cloudblog-publish/images/5_AXsnQ4A.max-1600x1600.png

Please see this page for more details.

Next, review the configuration and create a plan.
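If you script the restore plan with gcloud instead of the console, the same rename can be expressed in a transformation rules file; the sketch below is an approximation of that schema and of the create command (check the flag names and values against the linked documentation):

# transformation-rules.yaml - rename the PVCs' StorageClass during restore
- description: Change StorageClass from standard to standard-rwo
  resourceFilter:
    groupKinds:
      - resourceKind: PersistentVolumeClaim
  fieldActions:
    - op: REPLACE
      path: /spec/storageClassName
      value: standard-rwo

gcloud beta container backup-restore restore-plans create postgresql-restore-plan \
  --project=PROJECT_ID \
  --location=REGION \
  --backup-plan=projects/PROJECT_ID/locations/REGION/backupPlans/postgresql-plan \
  --cluster=projects/PROJECT_ID/locations/REGION/clusters/TARGET_CLUSTER \
  --namespaced-resource-restore-mode=delete-and-restore \
  --volume-data-restore-policy=restore-volume-data-from-backup \
  --all-namespaces \
  --transformation-rules-file=transformation-rules.yaml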

Restore backup using the (previously created) restore plan

When a backup is restored, the Kubernetes resources are re-created in the target cluster. 

Navigate to the GKE Console → Backup for GKE → BACKUPS tab to see the latest backup(s). Select the backup you took before bringing down the application to view its details and click SET UP A RESTORE. Fill in all the mandatory fields and click RESTORE.
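The command-line equivalent is roughly as follows (names are placeholders following the earlier sketches):

gcloud beta container backup-restore restores create postgresql-restore \
  --project=PROJECT_ID \
  --location=REGION \
  --restore-plan=postgresql-restore-plan \
  --backup=projects/PROJECT_ID/locations/REGION/backupPlans/postgresql-plan/backups/postgresql-migration-backup \
  --wait-for-completion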

Once done, switch the context to the target cluster and confirm that BfG has restored the application successfully in the same namespace.

The data was restored into new PVCs (verify with kubectl -n blog get pvc). Their StorageClass is gce-pd-gkebackup-de, a special StorageClass used to provision volumes from the backup.

https://storage.googleapis.com/gweb-cloudblog-publish/images/6_cRTjeFc.max-600x600.jpg

Let's get the details of one of the restored volumes to confirm that BfG has successfully changed the provisioner from in-tree to CSI.

https://storage.googleapis.com/gweb-cloudblog-publish/images/7_X0A5PSQ.max-1700x1700.png
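The same check from the command line might look like this (take the PV name from the kubectl -n blog get pvc output):

# The PV should now show pd.csi.storage.gke.io as its provisioner/CSI driver
kubectl describe pv PV_NAME | grep -iE 'provisioned-by|driver'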

New volumes are created by the CSI provisioner. Great! 

Bring up the application

Let's patch the service db-postgresql-ha-pgpool back with the original selectors to bring our application up. Save this patch file as new_patch.yaml and apply it using kubectl.

https://storage.googleapis.com/gweb-cloudblog-publish/images/8_yXInIDj.max-600x600.jpg
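A sketch of the reverse patch, assuming the same illustrative selector labels used earlier:

# new_patch.yaml - restore the original selectors
spec:
  selector:
    app.kubernetes.io/component: pgpool
    app.kubernetes.io/instance: db
    app.kubernetes.io/name: postgresql-ha

kubectl -n blog patch service db-postgresql-ha-pgpool --patch-file new_patch.yaml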

We are able to connect to our database application now.

https://storage.googleapis.com/gweb-cloudblog-publish/images/9_K5Kg2W3.max-600x600.jpg

Note: The downtime will depend on your application's size. For more information, please see this link.

Use it today

Backup for GKE can help you reduce the overhead of this migration with minimal downtime. It can also help you prepare for disaster recovery.
