Define custom backup and restore logic

Administrators can create a ProtectedApplication resource to customize the backup and restore of individual stateful applications. The ProtectedApplication resource defines which Kubernetes resources belong to an application instance. Administrators can then set up specialized orchestration for backup and restore of those applications in scenarios such as the following:

  • Run hooks before and after backing up the volumes. Hooks are commands that run in the application's containers. These hooks are often used for flush and quiesce or unquiesce operations, providing an application-consistent backup.
  • Identify a set of resources in a namespace that may be backed up or restored independently of the other resources in that namespace. This method allows applications that follow primary or replica deployment modes to reduce backup time and avoid duplicate storage costs.
  • Enable an application-mediated backup and restore. This method typically follows a data dump and load workflow, for example, logical backup and restore for databases.

Defining a ProtectedApplication is an optional configuration step that applies additional backup and restore logic to a specific application. If you do not define ProtectedApplications for your backups, consider the following implications:

  • The most fine-grained option for defining backup and restore scope is the namespace.
  • Volume backups are taken during the backup process without any attempt to quiesce the workloads using those volumes.

Only applications that are deployed with either a Deployment or StatefulSet can use the ProtectedApplication resource.

Create ProtectedApplication resources

There are three backup and restore strategies that you can use when defining a ProtectedApplication resource:

Backup all and restore all

This strategy backs up all the resources associated with the application during backup, and restores all those resources during restore. This strategy works best with standalone applications, meaning applications that have no replication between pods.

For a backup all and restore all strategy, you must include the following information in the resource definition:

  • Hooks: defines commands that are executed before and after taking volume backups, such as application quiesce and unquiesce steps. These commands are executed on all pods within a component.
  • Volume selection: provides finer granularity on which volumes are backed up and restored within the component. Any volumes not selected are not backed up. During a restore, any volumes skipped during backup are restored as empty volumes.

This example creates a ProtectedApplication resource that quiesces the filesystem before backing up the logs volume and unquiesces after the backup:

kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1alpha2
metadata:
  name: nginx
  namespace: sales
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: nginx
  components:
  - name: nginx-app
    resourceKind: Deployment
    resourceNames: ["nginx"]
    strategy:
      type: BackupAllRestoreAll
      backupAllRestoreAll:
        backupPreHooks:
        - name: fsfreeze
          container: nginx
          command: [ /sbin/fsfreeze, -f, /var/log/nginx ]
        backupPostHooks:
        - name: fsunfreeze
          container: nginx
          command: [ /sbin/fsfreeze, -u, /var/log/nginx ]
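
The example above does not use the volume selection option, so it does not restrict which volumes are backed up. As a minimal sketch of volume selection, the following script writes a variant of the same manifest that adds a volumeSelector. It assumes that volumeSelector is accepted under backupAllRestoreAll in the same form that the dump and load strategy uses it; the label backup-scope: logs and the file name are hypothetical. Check the ProtectedApplication reference before relying on this layout.

```shell
# Sketch: write a ProtectedApplication manifest that backs up only labeled volumes.
# Assumptions (not from the original page): volumeSelector is valid under
# backupAllRestoreAll, and the volumes to back up carry the hypothetical
# label "backup-scope: logs".
manifest="${TMPDIR:-/tmp}/nginx-protectedapplication.yaml"

cat > "$manifest" <<'EOF'
kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1alpha2
metadata:
  name: nginx
  namespace: sales
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: nginx
  components:
  - name: nginx-app
    resourceKind: Deployment
    resourceNames: ["nginx"]
    strategy:
      type: BackupAllRestoreAll
      backupAllRestoreAll:
        backupPreHooks:
        - name: fsfreeze
          container: nginx
          command: [ /sbin/fsfreeze, -f, /var/log/nginx ]
        backupPostHooks:
        - name: fsunfreeze
          container: nginx
          command: [ /sbin/fsfreeze, -u, /var/log/nginx ]
        volumeSelector:
          matchLabels:
            backup-scope: logs
EOF

echo "Wrote $manifest"
# To create the resource, run the following against your cluster:
#   kubectl apply -f "$manifest"
```

Writing the manifest to a file first lets you review it before you apply it.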

Backup one and restore all

This strategy backs up one copy of a selected Pod. This single copy is the source for restoring all Pods during a restore. This method can help reduce storage cost and backup time. This strategy works in a high availability configuration when a component is deployed with one primary PersistentVolumeClaim and multiple secondary PersistentVolumeClaims.

For a backup one and restore all strategy, you must include the following information in the resource definition:

  • Backup target: specifies the Deployment or StatefulSet that you want to use to back up the data. The best Pod to back up is automatically selected. In a high availability configuration, it's recommended to back up from a secondary PersistentVolumeClaim.
  • Hooks: defines commands that are executed before and after taking volume backups, such as application quiesce and unquiesce steps. These commands are executed only on the selected backup Pod.
  • Volume selection: provides finer granularity on which volumes are backed up and restored within the component.

If a component is configured with multiple Deployments or StatefulSets, all resources must have the same PersistentVolume structure, meaning they must follow these rules:

  • The number of PersistentVolumeClaims used by all Deployments or StatefulSets must be the same.
  • The purpose of PersistentVolumeClaims in the same index must be the same. For StatefulSets, the index is defined in the volumeClaimTemplate. For Deployments, the index is defined in Volumes and any non-persistent volumes are skipped.
  • If the application component consists of Deployments, each Deployment must have exactly one replica.

Given these considerations, multiple volume sets can be selected for backup, but only one volume from each volume set will be selected.
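
For StatefulSets, one way to check the first two rules before you create the resource is to compare the volumeClaimTemplates of each StatefulSet. The following bash sketch defines two hypothetical helper functions, claim_templates and same_claim_layout, that are not part of Backup for GKE. Matching template names in the same order is only a proxy for "same purpose", so treat a match as a hint rather than proof.

```shell
# Print the volumeClaimTemplates names of a StatefulSet, in index order.
claim_templates() {
  # $1 = namespace, $2 = StatefulSet name
  kubectl get statefulset "$2" -n "$1" \
    -o jsonpath='{.spec.volumeClaimTemplates[*].metadata.name}'
}

# Succeed only if both StatefulSets declare the same claim templates
# in the same order. Returns 2 if kubectl itself fails.
same_claim_layout() {
  # $1 = namespace, $2 and $3 = StatefulSet names
  local first second
  first="$(claim_templates "$1" "$2")" || return 2
  second="$(claim_templates "$1" "$3")" || return 2
  [ -n "$first" ] && [ "$first" = "$second" ]
}

# Example, using the names from the mariadb example in this section:
#   same_claim_layout mariadb mariadb-primary mariadb-secondary && echo "layouts match"
```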

This example, assuming an architecture of one primary StatefulSet and one secondary StatefulSet, shows a backup of the volumes of one Pod in the secondary StatefulSet, and then a restore to all other volumes:

kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1alpha2
metadata:
  name: mariadb
  namespace: mariadb
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: mariadb
  components:
  - name: mariadb
    resourceKind: StatefulSet
    resourceNames: ["mariadb-primary", "mariadb-secondary"]
    strategy:
      type: BackupOneRestoreAll
      backupOneRestoreAll:
        backupTargetName: mariadb-secondary
        backupPreHooks:
        - name: quiesce
          container: mariadb
          command: [...]
        backupPostHooks:
        - name: unquiesce
          container: mariadb
          command: [...]

Dump and load

This strategy uses a dedicated volume for backup and restore processes and requires a dedicated PersistentVolumeClaim attached to a component that stores dump data.

For a dump and load strategy, you must include the following information in the resource definition:

  • Dump target: specifies which Deployment or StatefulSet should be used to dump the data. The best Pod to back up is automatically selected. In a high availability configuration, it's recommended to back up from a secondary PersistentVolumeClaim.
  • Load target: specifies which Deployment or StatefulSet should be used to load the data. The best Pod to load the data into is automatically selected. The load target does not have to be the same as the dump target.
  • Hooks: defines commands that are executed before and after taking volume backups. There are specific hooks you must define for dump and load strategies:
    • Dump hooks: defines the hooks that dump the data into the dedicated volume before backup. This hook is executed only on the selected dump Pod.
    • Load hooks: defines the hooks that load the data after the application starts. This hook is executed only on the selected load Pod.
    • (optional) Post-backup hooks: defines the hooks that are executed after the dedicated volumes are backed up, such as cleanup steps. This hook is executed only on the selected dump Pod.
  • Volume selection: specifies all dedicated volumes to store the dump data. You should select only one volume for each dump and load Pod.

If the application consists of Deployments, each Deployment must have exactly one replica.

This example, assuming an architecture of one primary StatefulSet and a secondary StatefulSet with dedicated PersistentVolumeClaims for both primary and secondary StatefulSets, shows a dump and load strategy:

kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1alpha2
metadata:
  name: mariadb
  namespace: mariadb
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: mariadb
  components:
  - name: mariadb-dump
    resourceKind: StatefulSet
    resourceNames: ["mariadb-primary", "mariadb-secondary"]
    strategy:
      type: DumpAndLoad
      dumpAndLoad:
        loadTarget: mariadb-primary
        dumpTarget: mariadb-secondary
        dumpHooks:
        - name: db_dump
          container: mariadb
          command:
          - bash
          - "-c"
          - |
            mysqldump -u root --all-databases > /backup/mysql_backup.dump
        loadHooks:
        - name: db_load
          container: mariadb
          command:
          - bash
          - "-c"
          - |
            mysql -u root < /backup/mysql_backup.dump
        volumeSelector:
          matchLabels:
            gkebackup.gke.io/backup: dedicated-volume
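
Before you rely on the volumeSelector in this example, it can help to confirm which volumes it would match. The following bash sketch lists the PersistentVolumeClaims that carry the label from the example. It assumes that the selector is evaluated against PersistentVolumeClaim labels; the helper function names are hypothetical.

```shell
# List the PersistentVolumeClaims that carry the label used by volumeSelector.
# Assumption: volumeSelector matches labels on PersistentVolumeClaims.
dedicated_claims() {
  # $1 = namespace
  kubectl get pvc -n "$1" -l 'gkebackup.gke.io/backup=dedicated-volume' \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Count the matching claims. grep -c prints 0 and exits non-zero when
# nothing matches, so "|| true" keeps the function's exit status clean.
count_dedicated_claims() {
  dedicated_claims "$1" | grep -c . || true
}

# Example, using the namespace from the example above:
#   dedicated_claims mariadb
#   count_dedicated_claims mariadb
```

Compare the count with the number of dump and load Pods, because only one dedicated volume should be selected for each of them.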

Check if a ProtectedApplication is ready for backup

You can check whether a ProtectedApplication is ready for a backup by running the following command:

kubectl describe protectedapplication APPLICATION_NAME

Replace APPLICATION_NAME with the name of your application.

If the application is ready, the Status section of the output shows Ready To Backup as true, as in this example:

% kubectl describe protectedapplication nginx
Name:         nginx
Namespace:    default
API Version:  gkebackup.gke.io/v1alpha2
Kind:         ProtectedApplication
Metadata:
  UID:               90c04a86-9dcd-48f2-abbf-5d84f979b2c2
Spec:
  Components:
    Name:           nginx
    Resource Kind:  Deployment
    Resource Names:
      nginx
    Strategy:
      Backup All Restore All:
        Backup Pre Hooks:
          Command:
             /sbin/fsfreeze
             -f
             /var/log/nginx
          Container:         nginx
          Name:              freeze
        Backup Post Hooks:
          Command:
             /sbin/fsfreeze
             -u
             /var/log/nginx
          Container:         nginx
          Name:              unfreeze
      Type:                  BackupAllRestoreAll
  Resource Selection:
    Selector:
      Match Labels:
        app:        nginx
    Type:           Selector
Status:
  Ready To Backup:  true
Events:             <none>
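
To check readiness from a script instead of reading the describe output, you can query the status field directly. The following bash sketch assumes that the field is named status.readyToBackup, which is inferred from the Ready To Backup line in the output above and is not confirmed by this page. The function name is hypothetical.

```shell
# Succeed if the ProtectedApplication reports that it is ready for backup.
# Assumption: the status field is .status.readyToBackup.
is_ready_to_backup() {
  # $1 = namespace, $2 = ProtectedApplication name
  local ready
  ready="$(kubectl get protectedapplication "$2" -n "$1" \
    -o jsonpath='{.status.readyToBackup}')" || return 2
  [ "$ready" = "true" ]
}

# Example, using the names from the output above:
#   is_ready_to_backup default nginx && echo "ready for backup"
```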

Define ProtectedApplication hooks

For some application workloads, you might want to run additional processes or scripts before and/or after a backup or restore process. You can use hooks to run those processes.

Here are some examples of how you can use hooks:

  • Quiesce a database before backing up its volumes. Then, unquiesce it after the backup finishes.
  • Dump a logical backup of the database before a backup. For example, run mysqldump to export a MySQL database.
  • Load a logical backup of the database during a restore. For example, run mysql to import data from a MySQL backup file.
  • Clean up after a backup finishes.

You can use the command field to execute specific commands or scripts, which is useful for applications with complex requirements for backup and restore. For more information, see Define a Command and Arguments for a Container.

Test hooks

Before applying ProtectedApplication hooks to your application, you should test the command by using kubectl exec to ensure that the hooks behave as expected:

kubectl exec POD_NAME -- COMMAND

Replace the following:

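  • POD_NAME: the name of a Pod that belongs to the application that the ProtectedApplication resource protects.
  • COMMAND: the command that you want to run in the container, written as space-separated arguments, for example /sbin/fsfreeze -f /var/log/nginx. These are the same elements that you list in the hook's command array.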

This example shows a completed test command:

kubectl exec nginx-pod -- /sbin/fsfreeze -f /var/log/nginx

To test a shell script, you can get a shell in the running container by running the following command:

kubectl exec -it POD_NAME -- /bin/bash

Within the shell, you can run scripts as necessary. For more information about getting a shell and executing commands in a running container, see Executing shell commands on your container and Get a Shell to a Running Container.
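
Because pre and post hooks usually come in pairs, it can be useful to test them together and in the container that the hook names. The following bash sketch runs the fsfreeze pre-hook from the earlier example and, if it succeeds, the matching post-hook. The function name test_hook_pair is hypothetical, and the sketch assumes that fsfreeze exists in the container and that you have permission to run it.

```shell
# Run the freeze hook, then the unfreeze hook, in a specific container.
test_hook_pair() {
  # $1 = namespace, $2 = Pod name, $3 = container name, $4 = directory on the volume
  if ! kubectl exec "$2" -n "$1" -c "$3" -- /sbin/fsfreeze -f "$4"; then
    echo "pre-hook failed" >&2
    return 1
  fi
  # The freeze succeeded, so always attempt the unfreeze before returning.
  if ! kubectl exec "$2" -n "$1" -c "$3" -- /sbin/fsfreeze -u "$4"; then
    echo "post-hook failed; the filesystem may still be frozen" >&2
    return 1
  fi
  echo "both hooks succeeded"
}

# Example, using names from the examples in this section:
#   test_hook_pair sales nginx-pod nginx /var/log/nginx
```

Passing -c selects the container explicitly, which mirrors the container field of a hook. Without it, kubectl exec runs the command in the Pod's default container.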

What's next