Define custom backup and restore logic


When you enable the Backup for GKE agent in your Google Kubernetes Engine cluster, Backup for GKE provides a CustomResourceDefinition that introduces a new kind of Kubernetes resource: the ProtectedApplication.

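Composing a ProtectedApplication involves three activities: selecting the resources that belong to the application, defining orchestration rules for a subset of those resources, and checking that the ProtectedApplication is ready for backup.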

ProtectedApplication resources provide you with these capabilities when customizing backup and restore logic at the application level:

  • More fine-grained backup and restore operations. Without ProtectedApplications, the scope of your backups must be defined at the Namespace level (by selecting either allNamespaces or selectedNamespaces). Similar logic applies to namespaced resource restoration. Creating ProtectedApplication resources lets you give a name to a subset of the resources in a Namespace. You can then back up and restore that subset by listing selectedApplications in your backup scope (and similarly in your restore scope).

  • Orchestrating fine-grained details of the backup or restore process, including:

    • Skipping selected volumes during backup.

    • Incorporating application topology into backup and restore (for example, only backing up one instance of a replicated database and using it to restore multiple instances).

    • Executing user-defined hooks before and after volumes are snapshotted. These can be used, for example, to flush and quiesce a workload before snapshotting and unquiesce it afterwards.

You create ProtectedApplication resources with kubectl, like other Kubernetes resources. They are completely optional. If no ProtectedApplication resources are present, Backup for GKE creates volume backups for all volumes within the scope of a backup, and the resulting volume backups are crash consistent: all writes flushed to disk at a particular point in time are captured (that is, there are no partial writes). However, some applications keep data in memory that isn't flushed to disk, so whether an application can recover successfully from a crash-consistent backup depends on the application logic.

Selecting resources

The first step in building your ProtectedApplication resource is to identify the other resources in the same Namespace that you want to include as part of the application. This is the set of resources that will be backed up or restored if you supply the selectedApplications scope option in your BackupPlan configuration.

Resources are identified using a label selector. This requires that you label all your resources (using the metadata.labels field in each resource) with the same label. This also applies to resources that are automatically created by controllers; these auto-created resources are labeled using their corresponding template. It is common to reuse the same label you already use to associate generated Pods and PersistentVolumeClaims with their parent resource. The following example shows how you can apply the app: nginx label to the other resources in addition to the Deployment.

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-vars
  namespace: webserver
  labels:
    app: nginx
data:
  ...
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-logs
  namespace: webserver
  labels:
    app: nginx
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi
  storageClassName: standard-rwo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: webserver
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
        - name: nginx-logs
          persistentVolumeClaim:
            claimName: nginx-logs
      containers:
      ...

Once you have your selected label applied to all your target resources (and the templates from which additional resources are generated), then you can reference those resources from a ProtectedApplication. For example:

kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1
metadata:
  name: nginx
  namespace: webserver
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: nginx
  ...

Define orchestration rules

Once you have identified all the resources in your ProtectedApplication, you can choose to define detailed orchestration rules for a subset of these resources. These rules can apply to only two kinds of resources, Deployments and StatefulSets, and are referenced in the components section of the ProtectedApplication.

Component overview

Configuring a component involves the following:

  • Selecting a fundamental strategy for how backup and restore will work for this component. There are three strategies available:

    • BackupAllRestoreAll - back up the volumes associated with all instances of the component and restore them all from the backups.

    • BackupOneRestoreAll - back up the volumes from only one instance of the component and use those backups to restore all instances.

    • DumpAndLoad - export data from the application to a single volume at backup time and import that data into the application at restore time.

  • Defining execution hooks to run during backup (and possibly restore, depending on the strategy). A hook is a command that is executed in specific containers.

  • Selecting a subset of volumes to back up.

Execution hooks

A hook is a shell command that Backup for GKE executes in a container at a particular phase of the backup or restore process.

There are four different types of hooks:

  • pre hooks - these commands are executed right before volumes are backed up and are generally expected to flush any data in memory to disk and then quiesce the application so that no new disk writes are occurring. These hooks are used in the BackupAllRestoreAll and BackupOneRestoreAll strategies.

  • post hooks - these commands are executed during the volume backup process, right after the SNAPSHOTTING step and before the UPLOADING step. The SNAPSHOTTING step generally takes just a few seconds. Post hooks are generally expected to unquiesce the application (i.e. allow normal processing and disk writes to proceed). These hooks are used in the BackupAllRestoreAll, BackupOneRestoreAll, and DumpAndLoad strategies.

  • dump hooks - these commands are executed before the volume is backed up in the DumpAndLoad strategy and are generally expected to export data from the application into the designated backup volume.

  • load hooks - these commands are executed at restore time after the backup volume is restored in DumpAndLoad strategy cases. They are generally expected to import the data from the backup volume into the application.

You may provide more than one hook for each type and Backup for GKE will execute them in the order you define them.

You define hooks as part of the component section of the ProtectedApplication specification. All hook definitions have the same available fields:

  • name - a name you assign to the hook.

  • container - (optional) name of container to run command in. If you don't supply the container, Backup for GKE will run the hook in the first container defined for the target Pod(s).

  • command - this is the actual command sent into the container, constructed as an array of words. The first word in the array is the path to the command and subsequent words are the arguments to be passed to the command.

  • timeoutSeconds - (optional) time before hook execution is aborted. If you don't supply this, then it defaults to 30 seconds.

  • onError - (optional) behavior taken when the hook fails. May be set to Ignore or Fail (default). If you set this to Fail, then when a hook fails, the volume backup will fail. If you set this to Ignore, failures of this hook are ignored.
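As a sketch of how these fields fit together, the following ProtectedApplication defines two pre hooks, which run in the order listed, and one post hook. The flush hook, the assumption that a sync binary exists at /bin/sync in the container image, and the timeout values are illustrative only, not recommended settings. The fsfreeze commands mirror the ones used elsewhere on this page.

```yaml
kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1
metadata:
  name: nginx
  namespace: webserver
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: nginx
  components:
  - name: nginx-app
    resourceKind: Deployment
    resourceNames: ["nginx-deployment"]
    strategy:
      type: BackupAllRestoreAll
      backupAllRestoreAll:
        backupPreHooks:
        # Hooks run in the order listed: flush first, then freeze.
        - name: flush                  # hypothetical hook name
          container: nginx
          command: [ /bin/sync ]       # assumes /bin/sync exists in the image
          timeoutSeconds: 10           # illustrative value; the default is 30
          onError: Ignore              # a failed flush does not fail the backup
        - name: fsfreeze
          container: nginx
          command: [ /sbin/fsfreeze, -f, /var/log/nginx ]
          timeoutSeconds: 60           # illustrative value
          onError: Fail                # default: a failed hook fails the volume backup
        backupPostHooks:
        - name: fsunfreeze
          # container omitted: the hook runs in the first container of the Pod
          command: [ /sbin/fsfreeze, -u, /var/log/nginx ]
```

Setting onError to Ignore on the best-effort flush step while leaving the freeze step at Fail is one possible design: the backup proceeds only if the file system was actually quiesced.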

Before applying ProtectedApplication hooks to your application, you should test the command by using kubectl exec to ensure that the hooks behave as expected:

kubectl exec POD_NAME -- COMMAND

Replace the following:

  • POD_NAME: the name of a Pod that belongs to the ProtectedApplication and in which you want to test the command.
  • COMMAND: the command that you want to run in the container, written as space-separated words rather than as an array, for example /sbin/fsfreeze -f /var/log/nginx.

Selecting a subset of volumes to back up

Sometimes, applications write to volumes that are not interesting to restore (for example, certain log or scratch volumes). You can suppress the backup of these volumes by using a volume selector.

To use this feature, first apply a common label to the volumes you want to back up, and leave this label off the volumes you do not want backed up. Then include a volumeSelector clause in your component definition as follows:

spec:
  ...
  components:
  ...
    strategy:
      ...
      volumeSelector:
        matchLabels:
          label_name: label_value

If you supply a volumeSelector for a component, then only the volumes that have the given label will be backed up and restored. At restore time, any other volumes will be provisioned as empty instead of restored from a volume backup.
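The following sketch shows one way the pieces could fit together. It assumes that the label is applied to the PersistentVolumeClaim metadata and that volumeSelector is nested under the strategy-specific block, as in the DumpAndLoad example on this page. The resource names (web, app-data, app-scratch) and the backup-volume label are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                 # hypothetical: volume that should be backed up
  namespace: webserver
  labels:
    app: web
    backup-volume: "true"        # hypothetical label matched by volumeSelector
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard-rwo
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-scratch              # hypothetical: scratch volume, not backed up
  namespace: webserver
  labels:
    app: web                     # no backup-volume label, so it is skipped
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard-rwo
---
kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1
metadata:
  name: web
  namespace: webserver
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: web
  components:
  - name: web-app
    resourceKind: Deployment
    resourceNames: ["web"]       # hypothetical Deployment that mounts both claims
    strategy:
      type: BackupAllRestoreAll
      backupAllRestoreAll:
        volumeSelector:
          matchLabels:
            backup-volume: "true"
```

With this configuration, app-scratch would be provisioned as an empty volume at restore time, while app-data would be restored from its volume backup.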

Strategy: BackupAllRestoreAll

This is the simplest strategy and backs up all the component's volumes at backup time and restores them all from their volume backups at restore time. It is your best choice when your application has no replication between Pods.

This strategy supports the following parameters:

  • backupPreHooks - (optional) an ordered list of hooks that are executed right before volumes are backed up. These commands are executed on all Pods in the component.

  • backupPostHooks - (optional) an ordered list of hooks that are executed after volume backups have reached the UPLOADING phase. These commands are executed on all Pods in the component.

  • volumeSelector - (optional) logic for matching a subset of volumes to back up.

This example creates a ProtectedApplication resource that quiesces the file system before backing up the logs volume and unquiesces after the backup:

kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1
metadata:
  name: nginx
  namespace: sales
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: nginx
  components:
  - name: nginx-app
    resourceKind: Deployment
    resourceNames: ["nginx"]
    strategy:
      type: BackupAllRestoreAll
      backupAllRestoreAll:
        backupPreHooks:
        - name: fsfreeze
          container: nginx
          command: [ /sbin/fsfreeze, -f, /var/log/nginx ]
        backupPostHooks:
        - name: fsunfreeze
          container: nginx
          command: [ /sbin/fsfreeze, -u, /var/log/nginx ]

Strategy: BackupOneRestoreAll

This strategy backs up one copy of a selected Pod. This single copy is the source for restoring all Pods during a restore. This method can help reduce storage cost and backup time. This strategy works in a high availability configuration when a component is deployed with one primary PersistentVolumeClaim and multiple secondary PersistentVolumeClaims.

This strategy supports the following parameters:

  • backupTargetName - (required) specifies which Deployment or StatefulSet you want to use to back up the data. The best Pod to back up is automatically selected. In a high availability configuration, we recommend that you set this to one of your application replicas.

  • backupPreHooks - (optional) an ordered list of hooks that are executed right before volumes are backed up. These commands are executed only on the selected backup Pod.

  • backupPostHooks - (optional) an ordered list of hooks that are executed after volume backups have reached the UPLOADING phase. These commands are executed only on the selected backup Pod.

  • volumeSelector - (optional) logic for matching a subset of volumes to back up.

If a component is configured with multiple Deployments or StatefulSets, all resources must have the same PersistentVolume structure, meaning they must follow these rules:

  • The number of PersistentVolumeClaims used by all Deployments or StatefulSets must be the same.
  • The purpose of PersistentVolumeClaims in the same index must be the same. For StatefulSets, the index is defined in the volumeClaimTemplate. For Deployments, the index is defined in Volumes and any non-persistent volumes are skipped.
  • If the application component consists of Deployments, each Deployment must have exactly one replica.

Given these considerations, multiple volume sets can be selected for backup, but only one volume from each volume set will be selected.

This example, assuming an architecture of one primary StatefulSet and one secondary StatefulSet, shows a backup of the volumes of one Pod in the secondary StatefulSet, followed by a restore to all other volumes:

kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1
metadata:
  name: mariadb
  namespace: mariadb
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: mariadb
  components:
  - name: mariadb
    resourceKind: StatefulSet
    resourceNames: ["mariadb-primary", "mariadb-secondary"]
    strategy:
      type: BackupOneRestoreAll
      backupOneRestoreAll:
        backupTargetName: mariadb-secondary
        backupPreHooks:
        - name: quiesce
          container: mariadb
          command: [...]
        backupPostHooks:
        - name: unquiesce
          container: mariadb
          command: [...]

Strategy: DumpAndLoad

This strategy uses a dedicated volume for backup and restore processes and requires a dedicated PersistentVolumeClaim attached to a component that stores dump data.

This strategy supports the following parameters:

  • dumpTarget - (required) specifies which Deployment or StatefulSet you want to use to back up the data. The best Pod to back up is automatically selected. In a high availability configuration, we recommend that you set this to one of your application replicas.

  • loadTarget - (required) specifies which Deployment or StatefulSet should be used to load the data. The best Pod to load the data into is automatically selected. The load target does not have to be the same as the dump target.

  • dumpHooks - (required) an ordered list of hooks that are executed to populate the dedicated backup volume. These commands are only executed on the selected dump Pod.

  • backupPostHooks - (optional) an ordered list of hooks that are executed after volume backups have reached the UPLOADING phase. These commands are executed only on the selected dump Pod.

  • loadHooks - (required) an ordered list of hooks that are executed to load the data from the restored volume after the application starts. These commands are executed only on the selected load Pod.

  • volumeSelector - (required) logic for matching a single volume to backup and restore (the "dump" volume). Though it must only match a single volume, you configure this the same way you do the subset of volumes to backup used by other strategies.

If the application consists of Deployments, each Deployment must have exactly one replica.

This example, assuming an architecture of one primary StatefulSet and a secondary StatefulSet with dedicated PersistentVolumeClaims for both primary and secondary StatefulSets, shows a DumpAndLoad strategy:

kind: ProtectedApplication
apiVersion: gkebackup.gke.io/v1
metadata:
  name: mariadb
  namespace: mariadb
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app: mariadb
  components:
  - name: mariadb-dump
    resourceKind: StatefulSet
    resourceNames: ["mariadb-primary", "mariadb-secondary"]
    strategy:
      type: DumpAndLoad
      dumpAndLoad:
        loadTarget: mariadb-primary
        dumpTarget: mariadb-secondary
        dumpHooks:
        - name: db_dump
          container: mariadb
          command:
          - bash
          - "-c"
          - |
            mysqldump -u root --all-databases > /backup/mysql_backup.dump
        loadHooks:
        - name: db_load
          container: mariadb
          command:
          - bash
          - "-c"
          - |
            mysql -u root < /backup/mysql_backup.dump
        volumeSelector:
          matchLabels:
            gkebackup.gke.io/backup: dedicated-volume
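The hooks above read and write files under /backup, and the volumeSelector matches the gkebackup.gke.io/backup: dedicated-volume label, but the example does not show where either comes from. The following is a minimal sketch of how the mariadb-secondary StatefulSet might declare its dedicated dump volume, assuming the label is set in the volumeClaimTemplate so that it is copied to the generated PersistentVolumeClaim. The Service name, the role label, the volume names, and the storage sizes are hypothetical, and database settings such as credentials are omitted.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mariadb-secondary
  namespace: mariadb
  labels:
    app: mariadb
spec:
  serviceName: mariadb-secondary       # hypothetical headless Service name
  replicas: 1
  selector:
    matchLabels:
      app: mariadb
      role: secondary                  # hypothetical label
  template:
    metadata:
      labels:
        app: mariadb
        role: secondary
    spec:
      containers:
      - name: mariadb                  # matches `container` in the hooks
        image: mariadb                 # image tag and database settings omitted
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        - name: backup
          mountPath: /backup           # path used by the dump and load hooks
  volumeClaimTemplates:
  - metadata:
      name: data                       # regular database volume, not selected
      labels:
        app: mariadb
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi                # illustrative size
  - metadata:
      name: backup                     # dedicated dump volume
      labels:
        app: mariadb
        gkebackup.gke.io/backup: dedicated-volume   # matched by volumeSelector
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi                # illustrative size
```

Under the architecture assumed by the example, mariadb-primary would declare an equivalent dedicated volume so that the load hook can read the restored dump file from /backup.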

Check if a ProtectedApplication is ready for backup

You can check whether a ProtectedApplication is ready for a backup by running the following command:

kubectl describe protectedapplication APPLICATION_NAME

Replace APPLICATION_NAME with the name of your application.

If the application is ready, its description shows the Ready To Backup status as true, as in this example:

% kubectl describe protectedapplication nginx
Name:         nginx
Namespace:    default
API Version:  gkebackup.gke.io/v1
Kind:         ProtectedApplication
Metadata:
  UID:               90c04a86-9dcd-48f2-abbf-5d84f979b2c2
Spec:
  Components:
    Name:           nginx
    Resource Kind:  Deployment
    Resource Names:
      nginx
    Strategy:
      Backup All Restore All:
        Backup Pre Hooks:
          Command:
             /sbin/fsfreeze
             -f
             /var/log/nginx
          Container:         nginx
          Name:              freeze
        Backup Post Hooks:
          Command:
             /sbin/fsfreeze
             -u
             /var/log/nginx
          Container:         nginx
          Name:              unfreeze
      Type:                  BackupAllRestoreAll
  Resource Selection:
    Selector:
      Match Labels:
        app:        nginx
    Type:           Selector
Status:
  Ready To Backup:  true
Events:             <none>
