Google Distributed Cloud (GDC) air-gapped lets you back up and restore the data in the home directory of your Vertex AI Workbench JupyterLab instances. For information about Vertex AI Workbench notebooks, see Create a notebook.
Create a protected application to use in backup or restore
You can define protected applications that back up the home directory of an individual JupyterLab instance or the home directories of all JupyterLab instances in a project at once.
Create a ProtectedApplication
custom resource (CR) in the user cluster where
you want to schedule backups. Backup and restore plans use protected
applications to select resources. For more information about creating protected
applications, see Protected application strategies.
The ProtectedApplication
CR contains the following fields:
Field | Description
---|---
resourceSelection | Configure how the ProtectedApplication CR selects resources for backup or restore.
resourceSelection.type | Choose the method used to select resources. The value Selector indicates that the matching labels must select resources.
resourceSelection.selector | Define the selection rules.
resourceSelection.selector.matchLabels | Configure the labels that the ProtectedApplication CR uses to match resources.
matchLabels: app.kubernetes.io/part-of | Select resources created by Vertex AI Workbench that provide storage for JupyterLab instances.
matchLabels: app.kubernetes.io/component | Select resources created by Vertex AI Workbench that provide storage for JupyterLab instances.
matchLabels: app.kubernetes.io/instance | Narrow the scope to select resources for the JupyterLab instance. The value is the same as the JupyterLab instance name shown on the GDC console.
The following example shows a ProtectedApplication
CR that selects the storage
for a JupyterLab instance named my-jupyterlab-instance-name
in the my-project
namespace.

    apiVersion: gkebackup.gke.io/v1
    kind: ProtectedApplication
    metadata:
      name: my-protected-application
      namespace: my-project
    spec:
      resourceSelection:
        type: Selector
        selector:
          matchLabels:
            app.kubernetes.io/part-of: vtxwb
            app.kubernetes.io/component: storage
            app.kubernetes.io/instance: my-jupyterlab-instance-name

The following example shows a ProtectedApplication
CR that selects the storage
for all JupyterLab instances in the my-project
project:

    apiVersion: gkebackup.gke.io/v1
    kind: ProtectedApplication
    metadata:
      name: my-protected-application
      namespace: my-project
    spec:
      resourceSelection:
        type: Selector
        selector:
          matchLabels:
            app.kubernetes.io/part-of: vtxwb
            app.kubernetes.io/component: storage

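
Before you schedule backups, it can help to preview which storage resources a given set of labels matches. The following is a minimal sketch, not part of the product: the function name `vtxwb_selector` is hypothetical, and it assumes that the `PersistentVolumeClaim` resources of JupyterLab instances carry the same labels as the `matchLabels` entries, as the `kubectl get pvc` command in the restore steps of this page suggests.

```shell
# Sketch only: vtxwb_selector is a hypothetical helper, not a GDC tool.
# It prints the label selector that corresponds to the matchLabels
# entries of the ProtectedApplication examples.
# Usage: vtxwb_selector [INSTANCE_NAME]
vtxwb_selector() {
  selector="app.kubernetes.io/part-of=vtxwb,app.kubernetes.io/component=storage"
  # With an instance name, narrow the selector to that single instance.
  if [ -n "${1:-}" ]; then
    selector="${selector},app.kubernetes.io/instance=$1"
  fi
  printf '%s\n' "$selector"
}

# Example (requires access to the user cluster):
# kubectl get pvc -l "$(vtxwb_selector my-jupyterlab-instance-name)" -n my-project
```

Calling the function without an argument yields the project-wide selector, which mirrors the second example.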
Back up and restore JupyterLab instance data
To back up and restore JupyterLab instance data, follow the instructions to plan a set of backups and plan a set of restores using the ProtectedApplication
CR you defined.
Copy restored data to a new JupyterLab instance
To copy restored data from the PersistentVolumeClaim
resource of the JupyterLab instance to a new JupyterLab instance, follow these steps:
- To get the permissions that you need to copy restored data, ask your Organization IAM Admin to grant you the User Cluster Developer (user-cluster-developer) role.
- Use the GDC console to create the JupyterLab instance that you want to copy the restored data to.
- Get the name of the JupyterLab instance Pod resource:

      kubectl get pods -l notebook-name=INSTANCE_NAME -n PROJECT_NAMESPACE

  Replace the following:
  - INSTANCE_NAME: the name of the JupyterLab instance you created in the previous step.
  - PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
- Get the name of the image that the JupyterLab instance is running:

      kubectl get pods POD_NAME -n PROJECT_NAMESPACE -o jsonpath="{.spec.containers[0].image}"

  Replace the following:
  - POD_NAME: the name of the JupyterLab instance Pod resource you obtained in the previous step.
  - PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
- Find the name of the PersistentVolumeClaim resource that was restored in your user cluster:

      kubectl get pvc -l app.kubernetes.io/part-of=vtxwb,app.kubernetes.io/component=storage,app.kubernetes.io/instance=RESTORED_INSTANCE_NAME -n PROJECT_NAMESPACE

  Replace the following:
  - RESTORED_INSTANCE_NAME: the name of the JupyterLab instance that was restored.
  - PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
- Create a YAML file named vtxwb-data.yaml with the following content:

      apiVersion: v1
      kind: Pod
      metadata:
        name: vtxwb-data
        namespace: PROJECT_NAMESPACE
        labels:
          aiplatform.gdc.goog/service-type: workbench
      spec:
        containers:
        - args:
          - sleep infinity
          command:
          - bash
          - -c
          image: IMAGE_NAME
          imagePullPolicy: IfNotPresent
          name: vtxwb-data
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
            requests:
              cpu: "1"
              memory: 1Gi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /home/jovyan
            name: restore-data
          workingDir: /home/jovyan
        volumes:
        - name: restore-data
          persistentVolumeClaim:
            claimName: RESTORED_PVC_NAME

  Replace the following:
  - PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
  - IMAGE_NAME: the name of the container image that the JupyterLab instance is running, which you obtained in an earlier step.
  - RESTORED_PVC_NAME: the name of the restored PersistentVolumeClaim resource that you obtained in the previous step.
- Create a new Pod for your restored PersistentVolumeClaim resource:

      kubectl apply -f ./vtxwb-data.yaml --kubeconfig USER_CLUSTER_KUBECONFIG

  Replace USER_CLUSTER_KUBECONFIG with the path of the kubeconfig file in the user cluster.
- Wait for the vtxwb-data pod to reach the RUNNING state.
- Copy your restored data to the new JupyterLab instance:

      kubectl cp PROJECT_NAMESPACE/vtxwb-data:/home/jovyan ./restore --kubeconfig USER_CLUSTER_KUBECONFIG
      kubectl cp ./restore PROJECT_NAMESPACE/POD_NAME:/home/jovyan/restore --kubeconfig USER_CLUSTER_KUBECONFIG
      rm -r ./restore

  Replace the following:
  - PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
  - USER_CLUSTER_KUBECONFIG: the path of the kubeconfig file in the user cluster.
  - POD_NAME: the name of the JupyterLab instance Pod resource you obtained in an earlier step.
  After copying the data, your restored data is available in the /home/jovyan/restore directory of the new JupyterLab instance.
- Delete the Pod resource that you created to access your restored data:

      kubectl delete pod vtxwb-data -n PROJECT_NAMESPACE --kubeconfig USER_CLUSTER_KUBECONFIG

  Replace the following:
  - PROJECT_NAMESPACE: the name of the project namespace where you created the JupyterLab instance.
  - USER_CLUSTER_KUBECONFIG: the path of the kubeconfig file in the user cluster.
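
The lookup and wait steps in this procedure can be wrapped in small helper functions so that their output can be captured in shell variables. The following is a sketch under stated assumptions, not a supported tool: all function names are hypothetical, each lookup assumes that exactly one resource matches and reads the first item, and `kubectl` is assumed to pick up the user cluster through the `KUBECONFIG` environment variable instead of the `--kubeconfig` flag.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers around the kubectl commands in this
# procedure. Export KUBECONFIG=USER_CLUSTER_KUBECONFIG before calling them.

# Print the name of the first Pod that backs a JupyterLab instance.
# Args: INSTANCE_NAME PROJECT_NAMESPACE
notebook_pod_name() {
  kubectl get pods -l "notebook-name=$1" -n "$2" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Print the image of the first container in a Pod.
# Args: POD_NAME PROJECT_NAMESPACE
notebook_pod_image() {
  kubectl get pods "$1" -n "$2" -o jsonpath='{.spec.containers[0].image}'
}

# Print the name of the first restored PersistentVolumeClaim.
# Args: RESTORED_INSTANCE_NAME PROJECT_NAMESPACE
restored_pvc_name() {
  kubectl get pvc -n "$2" \
    -l "app.kubernetes.io/part-of=vtxwb,app.kubernetes.io/component=storage,app.kubernetes.io/instance=$1" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Block until the vtxwb-data Pod reports the Ready condition,
# or fail when the timeout (default 300s) expires.
# Args: PROJECT_NAMESPACE [TIMEOUT]
wait_for_data_pod() {
  kubectl wait --for=condition=Ready pod/vtxwb-data -n "$1" \
    --timeout="${2:-300s}"
}
```

For example, `POD_NAME="$(notebook_pod_name INSTANCE_NAME PROJECT_NAMESPACE)"` stores the Pod name for the later `kubectl cp` command. Waiting on the Ready condition is slightly stricter than checking for the running state, because it also requires the container to be ready.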