Exporting streaming PVs to permanent storage

During migration, Migrate for Anthos provisions short-term storage for streaming from your source VM disks. In order for your workload to run independently of the source VM disks, you'll need to export its storage to a standalone Persistent Volume (PV).

Before doing this, test your applications to confirm that they behave as designed. Once those tests pass while your migrated VMs are still using streaming storage, you're ready to export to permanent storage.

In this topic's example, you will finalize the migration to GKE by moving your workload's PVs to Compute Engine Persistent Disk using the Migrate for Anthos storage exporter. Note that you can move to any GKE-compatible Persistent Volume that meets your requirements.

For more about the parts of migration that precede this, see Getting started with Migrate for Anthos.

Calculate destination disk size

You'll need to estimate the destination disk size required by the workload you're migrating. With that estimate, you can allocate a destination disk of the appropriate size (such as a Compute Engine persistent disk).

You can calculate the amount of disk space currently in use (even across multiple source disks) by using a provided script, as described in the following steps.

  1. From the command line, run the following kubectl command to execute the provided disk usage script inside the migrated pod:

    kubectl exec pod-name -- /code/disk_usage.py

    For example:

    kubectl exec liveapp1-0 -- /code/disk_usage.py

    You should see output such as the following:

    /dev/mapper/mpathb-part1 on /rootfs (ext4) - Total Size: 3GB Used: 1GB
    /dev/mapper/data__velos_8283ce1c-home on /rootfs/home (ext4) - Total Size: 9GB Used: 550MB
    /dev/mapper/data__velos_8283ce1c-log on /rootfs/var/log (ext4) - Total Size: 4GB Used: 282MB
    Total used: 2GB
  2. Be sure to add a margin to the total to accommodate future growth. The right margin depends on your needs; adding 15-25% is recommended, but you might want to add more. Just as with disks on VMs, resizing later can be a disruptive operation, so building in a margin now helps you avoid that disruption.

  3. Use your disk size calculation (total used, plus margin) in the following sections when configuring storage in a Persistent Volume Claim.
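
For example, using the sample output above and a 25% margin, 2GB of used space becomes 2.5GB; rounded up to a whole gigabyte, you would request at least 3G of storage in the Persistent Volume Claim. A minimal shell sketch of that arithmetic (the 2GB figure and the USED_GB variable are placeholders for this example):

# Hypothetical sizing: total used space reported by the script, plus a 25% margin,
# rounded up to the next whole gigabyte for the PVC storage request.
USED_GB=2
echo "$(( (USED_GB * 125 + 99) / 100 ))G"   # prints 3G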

Create your YAML to configure for migration

Create the configuration YAML to define the source and target for storage export, as well as the export action itself. You'll apply the configuration in a later step.

  1. Define the Storage Class and Persistent Volume Claim that will create the destination Compute Engine persistent disk.

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: [STORAGE_CLASS_NAME]
    provisioner: kubernetes.io/gce-pd
    parameters:
      type: pd-ssd
      replication-type: none
    
    ---
    
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      # Replace this with a name of your choice for the target Persistent Volume Claim
      name: [TARGET_PVC_NAME]
    spec:
      storageClassName: [STORAGE_CLASS_NAME]
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          # Replace this with the quantity you'll need in the target volume, such as
          # 20G. You can use the included script to make this calculation (see the
          # section earlier in this topic).
          storage: [TARGET_STORAGE]
    

    where:

    • [STORAGE_CLASS_NAME] is a name of your choice for the Storage Class.
    • [TARGET_PVC_NAME] is a name of your choice for the target Persistent Volume Claim.
    • [TARGET_STORAGE] is the target volume size (total used plus margin, as calculated earlier).
  2. Add a ConfigMap definition to specify the locations and types of data to copy.

    dataFilter is a Migrate for Anthos field through which you can filter the paths and files copied from the source PVC. In this example, prefixing each listed path or file with a minus sign specifies that the item should be excluded from the copy. For example, the following excludes the /etc/fstab file:

    "- /etc/fstab"
    

    Generally, it's safer to export more data than you need than to filter out something your workload requires.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: [CONFIGMAP_NAME]
    data:
      config: |-
        appSpec:
          dataFilter:
            - "- *.swp"
            - "- /etc/fstab"
            - "- /boot/"
            - "- /tmp/*"
    

    where:

    • [CONFIGMAP_NAME] is a name of your choice for the config map.
  3. Add a Job object to define the export action that will copy data from the source PVC to the target PVC.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: [JOB_NAME]
    spec:
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
            anthos-migrate.gcr.io/action: export
            anthos-migrate.gcr.io/source-type: streaming-disk
            anthos-migrate.gcr.io/source-pvc: [SOURCE_PVC_NAME]
            anthos-migrate.gcr.io/target-pvc: [TARGET_PVC_NAME]
            anthos-migrate.gcr.io/config: [CONFIGMAP_NAME]
        spec:
          restartPolicy: OnFailure
          containers:
          - name: exporter-sample
            image: anthos-migrate.gcr.io/v2k-export:v1.0.1
    

    where:

    • [JOB_NAME] is a name of your choice for the job.
    • export is the action to take.
    • streaming-disk is the source type -- here, Migrate for Anthos storage used during migration streaming.
    • [SOURCE_PVC_NAME] is the PVC of your existing disk.
    • [TARGET_PVC_NAME] is the name you chose for the target PVC.
    • [CONFIGMAP_NAME] is the name you chose for the config map.
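
If you save the Storage Class, Persistent Volume Claim, ConfigMap, and Job definitions together in one file (shown here as export.yaml, a placeholder name), you can check that the manifests parse correctly before running the export. A minimal sketch:

# Validate the combined manifests client-side without creating any resources.
# (Older kubectl versions use --dry-run instead of --dry-run=client.)
kubectl apply --dry-run=client -f export.yaml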

Delete the StatefulSet holding existing volumes

After creating your YAML, delete the StatefulSet that holds the existing volumes. Don't worry: you'll recreate it after the export completes.

kubectl delete statefulset statefulset-name
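
Deleting the StatefulSet does not delete the Persistent Volume Claims it references, so the streaming PVC remains available as the source for the export job. If you'd like to confirm that the claims are still present, you can list them:

# Both the streaming (source) PVC and the new target PVC should still be listed.
kubectl get pvc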

Apply the YAML and run the export

Apply the YAML to run the export jobs.

kubectl apply -f job-yaml

GKE will run the jobs. To check export progress, use kubectl to get the name of the exporter job's pod, and then get that pod's logs.

kubectl get pod
NAME                                READY   STATUS   RESTARTS  AGE
exporter-sample-suitecrm-app-h8d3k  1/1     Running  0         51s
suitecrm-app-0                      1/1     Running  0         34m

In this example, the exporter pod's name is exporter-sample-suitecrm-app-h8d3k. Use this name to get a log showing the progress of the export. In particular, the log shows the last file copied (usr/share/info/FileName/) and the number of bytes copied (859MB of 981MB).

kubectl logs exporter-sample-suitecrm-app-h8d3k
...
D0923 15:53:10.00000    10 hclog.py:68] [util] - TAIL: 'usr/share/info/FileName/'
D0923 15:53:10.00000    10 hclog.py:68] [util] - PROGRESS: 859MB / 981MB
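
If you prefer to block until the export finishes rather than polling the logs, here is a minimal sketch using the job name from this example (adjust the timeout to suit the amount of data being copied):

# Wait for the exporter Job to report completion, up to 30 minutes in this example.
kubectl wait --for=condition=complete --timeout=30m job/exporter-sample-suitecrm-app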

Once the jobs are marked as SUCCESSFUL, delete the exporter jobs and reconfigure the StatefulSet to reference the exported PV.

If you would like to relaunch a job with the same name, you'll first need to delete it by running kubectl delete jobs.batch job-name; for example, kubectl delete jobs.batch exporter-sample-suitecrm-app. Alternatively, you can rename the job in your YAML.

Delete the exporter job

After the exporter job has completed, delete it. This releases the PersistentVolumeClaim so that you can reattach it to your StatefulSet.

kubectl delete job [EXPORTER_JOB]

For example, if your job's name is exporter-sample-suitecrm-app, run:

kubectl delete job exporter-sample-suitecrm-app
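
Deleting the Job also removes its pod. If you want to confirm that no exporter pods remain before recreating the StatefulSet, you can check with:

# No exporter-sample-* pods should be listed once the job and its pod are gone.
kubectl get pods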

Run containers with exported storage

Here, you'll recreate your application's StatefulSet to use the exported storage.

  1. Define the StatefulSet with YAML.

    kind: StatefulSet
    apiVersion: apps/v1
    metadata:
      name: [STATEFULSET_NAME]
    spec:
      serviceName: "[SERVICE_NAME]"
      replicas: 1
      selector:
        matchLabels:
          app: [STATEFULSET_NAME]
      template:
        metadata:
          labels:
            app: [STATEFULSET_NAME]
          annotations:
            anthos-migrate.gcr.io/action: run
            anthos-migrate.gcr.io/source-type: exported
            anthos-migrate.gcr.io/source-pvc: [TARGET_PVC_NAME]
        spec:
          containers:
          - name: [STATEFULSET_NAME]
            image: anthos-migrate.gcr.io/v2k-run:v1.0.1
    

    where:

    • [STATEFULSET_NAME] is a name of your choice.
    • [SERVICE_NAME] is the name of the Service that governs the StatefulSet.
    • run is the action to take.
    • exported is the source type.
    • [TARGET_PVC_NAME] is the name you chose for the target Persistent Volume Claim.
  2. Apply the YAML to launch your containers:

    kubectl apply -f your-yaml
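
After applying the YAML, you can verify that the StatefulSet's pod starts and reaches the Running state before moving on. For example:

# Watch the pod come up; press Ctrl+C once it reports Running.
kubectl get pods -w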
    

Delete streaming storage PVCs

The Persistent Volume Claim that streamed storage to the container is no longer needed. You can delete it.

Run the following command, replacing source-pvc with the name of the streaming PVC: the value you specified for [SOURCE_PVC_NAME] in the export Job's anthos-migrate.gcr.io/source-pvc annotation.

kubectl delete pvc source-pvc
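
If you want to confirm the cleanup, the streaming claim should no longer be listed, and your workload's pod should still be running on the exported storage:

# Only the exported target PVC should remain for this workload.
kubectl get pvc
kubectl get pods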

After this step, your workload is ready to run independently of both the source VM and Migrate for Anthos components.