Back up and restore notebook data

Google Distributed Cloud (GDC) air-gapped lets you back up and restore the data in the home directory of your Vertex AI Workbench JupyterLab instances. For information about Vertex AI Workbench notebooks, see Create a notebook.

Create a protected application to use in backup or restore

You can define protected applications that back up the home directory of an individual JupyterLab instance or the home directories of all JupyterLab instances in a project at once.

Create a ProtectedApplication custom resource (CR) in the user cluster where you want to schedule backups. Backup and restore plans use protected applications to select resources. For more information about creating protected applications, see Protected application strategies.

The ProtectedApplication CR contains the following fields:

Field Description
resourceSelection Configure how the ProtectedApplication CR selects resources for backup or restore.
type Choose the method used to select resources. The value Selector indicates that resources are selected by matching labels.
selector Define the selection rules.
matchLabels Configure the labels that the ProtectedApplication CR uses to match resources.
app.kubernetes.io/part-of Select resources created by Vertex AI Workbench that provide storage for JupyterLab instances.
app.kubernetes.io/component Select resources created by Vertex AI Workbench that provide storage for JupyterLab instances.
app.kubernetes.io/instance Narrow the scope to select resources for the JupyterLab instance. The value is the same as the JupyterLab instance name shown on the GDC console.

The following example shows a ProtectedApplication CR that selects the storage for a JupyterLab instance named my-jupyterlab-instance-name in the my-project namespace:

apiVersion: gkebackup.gke.io/v1
kind: ProtectedApplication
metadata:
  name: my-protected-application
  namespace: my-project
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app.kubernetes.io/part-of: vtxwb
        app.kubernetes.io/component: storage
        app.kubernetes.io/instance: my-jupyterlab-instance-name

The following example shows a ProtectedApplication CR that selects the storage for all JupyterLab instances in the my-project project:

apiVersion: gkebackup.gke.io/v1
kind: ProtectedApplication
metadata:
  name: my-protected-application
  namespace: my-project
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app.kubernetes.io/part-of: vtxwb
        app.kubernetes.io/component: storage
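
Before you schedule backups, it can help to preview which resources a label selector matches. The following is a minimal sketch, assuming that the storage for each JupyterLab instance is a PersistentVolumeClaim resource that carries the labels shown in the preceding examples. The function names vtxwb_selector and list_matching_pvcs are hypothetical helpers, not part of GDC tooling.

```shell
# Build the label selector used by the ProtectedApplication examples.
# vtxwb_selector is a hypothetical helper, not part of any GDC tooling.
# With no argument, the selector matches the storage of all instances.
# With an instance name, the selector is narrowed to that instance.
vtxwb_selector() {
  _selector="app.kubernetes.io/part-of=vtxwb,app.kubernetes.io/component=storage"
  if [ -n "${1:-}" ]; then
    _selector="${_selector},app.kubernetes.io/instance=$1"
  fi
  printf '%s\n' "$_selector"
}

# List the PersistentVolumeClaim resources that the selector matches.
# Assumption: the selected storage resources are PVCs.
# Arguments: NAMESPACE [INSTANCE_NAME]
list_matching_pvcs() {
  kubectl get pvc -l "$(vtxwb_selector "${2:-}")" -n "$1"
}
```

For example, list_matching_pvcs my-project my-jupyterlab-instance-name previews the single-instance selection, and list_matching_pvcs my-project previews the project-wide selection.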

Back up and restore JupyterLab instance data

To back up and restore JupyterLab instance data, follow the instructions to plan a set of backups and plan a set of restores using the ProtectedApplication CR you defined.

Copy restored data to a new JupyterLab instance

To copy restored data from the restored PersistentVolumeClaim resource of a JupyterLab instance to a new JupyterLab instance, follow these steps:

  1. To get the permissions that you need to copy restored data, ask your Organization IAM Admin to grant you the User Cluster Developer (user-cluster-developer) role.
  2. Use the GDC console to create the JupyterLab instance to which you want to copy the restored data.
  3. Get the name of the JupyterLab instance Pod resource:

    kubectl get pods -l notebook-name=INSTANCE_NAME -n PROJECT_NAMESPACE
    

    Replace the following:

    • INSTANCE_NAME: the name of the JupyterLab instance you created in the previous step.
    • PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
  4. Get the name of the image that the JupyterLab instance is running:

    kubectl get pods POD_NAME -n PROJECT_NAMESPACE -o jsonpath="{.spec.containers[0].image}"
    

    Replace the following:

    • POD_NAME: the name of the JupyterLab instance Pod resource you obtained in the previous step.
    • PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
  5. Find the name of the PersistentVolumeClaim resource that was restored in your user cluster:

    kubectl get pvc -l app.kubernetes.io/part-of=vtxwb,app.kubernetes.io/component=storage,app.kubernetes.io/instance=RESTORED_INSTANCE_NAME -n PROJECT_NAMESPACE
    

    Replace the following:

    • RESTORED_INSTANCE_NAME: the name of the JupyterLab instance that was restored.
    • PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
  6. Create a YAML file named vtxwb-data.yaml with the following content:

    apiVersion: v1
    kind: Pod
    metadata:
      name: vtxwb-data
      namespace: PROJECT_NAMESPACE
      labels:
        aiplatform.gdc.goog/service-type: workbench
    spec:
      containers:
      - args:
        - sleep infinity
        command:
        - bash
        - -c
        image: IMAGE_NAME
        imagePullPolicy: IfNotPresent
        name: vtxwb-data
        resources:
          limits:
            cpu: "1"
            memory: 1Gi
          requests:
            cpu: "1"
            memory: 1Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /home/jovyan
          name: restore-data
        workingDir: /home/jovyan
      volumes:
      - name: restore-data
        persistentVolumeClaim:
          claimName: RESTORED_PVC_NAME
    

    Replace the following:

    • PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
    • IMAGE_NAME: the name of the container image that the JupyterLab instance is running, which you obtained in step 4.
    • RESTORED_PVC_NAME: the name of the restored PersistentVolumeClaim resource that you obtained in the previous step.
  7. Create a new Pod for your restored PersistentVolumeClaim resource:

    kubectl apply -f ./vtxwb-data.yaml --kubeconfig USER_CLUSTER_KUBECONFIG
    

    Replace USER_CLUSTER_KUBECONFIG with the path to the kubeconfig file for the user cluster.

  8. Wait for the vtxwb-data Pod to reach the Running state.

  9. Copy your restored data to a new JupyterLab instance:

    kubectl cp PROJECT_NAMESPACE/vtxwb-data:/home/jovyan ./restore --kubeconfig USER_CLUSTER_KUBECONFIG
    
    kubectl cp ./restore PROJECT_NAMESPACE/POD_NAME:/home/jovyan/restore --kubeconfig USER_CLUSTER_KUBECONFIG
    
    rm -r ./restore
    

    Replace the following:

    • PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
    • USER_CLUSTER_KUBECONFIG: the path to the kubeconfig file for the user cluster.
    • POD_NAME: the name of the JupyterLab instance Pod resource you obtained in step 3.

    After copying the data, your restored data is available in the /home/jovyan/restore directory.

  10. Delete the Pod resource that you created to access your restored data:

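    kubectl delete pod vtxwb-data -n PROJECT_NAMESPACE --kubeconfig USER_CLUSTER_KUBECONFIG
    

    Replace the following:

    • PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
    • USER_CLUSTER_KUBECONFIG: the path to the kubeconfig file for the user cluster.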
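
The wait and copy steps above can be sketched as a pair of shell functions. This is a minimal sketch under stated assumptions, not an official procedure: the function names are hypothetical, the 300-second timeout is an arbitrary choice, and kubectl wait with the Ready condition is used as one way to detect that the vtxwb-data Pod is up. The sketch also stages the data in a temporary directory instead of ./restore so that cleanup does not touch existing files.

```shell
# Hypothetical helper: block until the vtxwb-data Pod reports Ready.
# Arguments: PROJECT_NAMESPACE USER_CLUSTER_KUBECONFIG
# Assumption: a 300s timeout is long enough for the image pull and PVC mount.
wait_for_data_pod() {
  kubectl wait --for=condition=Ready pod/vtxwb-data \
    -n "$1" --timeout=300s --kubeconfig "$2"
}

# Hypothetical helper: copy the restored home directory from the vtxwb-data
# Pod to /home/jovyan/restore in the new JupyterLab instance Pod.
# Arguments: PROJECT_NAMESPACE POD_NAME USER_CLUSTER_KUBECONFIG
copy_restored_data() {
  _ns=$1
  _pod=$2
  _kubeconfig=$3
  # Stage the data in a temporary directory on the local machine.
  _tmp=$(mktemp -d) || return 1
  kubectl cp "$_ns/vtxwb-data:/home/jovyan" "$_tmp/restore" \
    --kubeconfig "$_kubeconfig" &&
    kubectl cp "$_tmp/restore" "$_ns/$_pod:/home/jovyan/restore" \
      --kubeconfig "$_kubeconfig"
  _status=$?
  # Remove the local staging copy whether or not the copy succeeded.
  rm -rf "$_tmp"
  return "$_status"
}
```

For example, run wait_for_data_pod PROJECT_NAMESPACE USER_CLUSTER_KUBECONFIG followed by copy_restored_data PROJECT_NAMESPACE POD_NAME USER_CLUSTER_KUBECONFIG, then delete the vtxwb-data Pod as described in the last step.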