Transfer data

Data transfers can occur between the following:

  1. Persistent Volume Claim (PVC) and object storage
  2. Object storage and object storage (within GDC)

Object storage on GDC is S3-compatible and is referred to as the s3 type in Kubernetes YAML files.

Types of data sources/destinations

  1. Object storage (referred to as 's3'): Object storage present on GDC
  2. Local storage (referred to as 'local'): Storage on attached PVCs

Copying from object storage to object storage

Ensure you have the following prerequisites:

  • An S3 endpoint with read permissions for the source, and an S3 endpoint with write permissions for the destination.
  • If your credentials don't have bucket creation permission, the transfer fails when the destination bucket does not exist. In that case, make sure that the destination bucket exists before you start the transfer.
  • Privileges to create Jobs and to create or read Secrets inside your cluster or namespace. See the following example for permissions.

Create a job

To create a job, work through these steps:

  1. Create a namespace:

    apiVersion: v1
    kind: Namespace
      name: transfer-ns
  2. Create Secrets that hold the source and destination credentials:

    apiVersion: v1
    kind: Secret
    metadata:
      name: src-secret
      namespace: transfer-ns
    data:
      access-key-id: NkFDTUg3WDBCVDlQMVpZMU5MWjU= # Base64-encoded access key ID
      access-key: VkRkeWJsbFgzb2FZanMvOVpnSi83SU5YUjk3Y0Q2TUdxZ2d4Q3dpdw== # Base64-encoded secret access key
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: dst-secret
      namespace: transfer-ns
    data:
      access-key-id: NkFDTUg3WDBCVDlQMVpZMU5MWjU= # Base64-encoded access key ID
      access-key: VkRkeWJsbFgzb2FZanMvOVpnSi83SU5YUjk3Y0Q2TUdxZ2d4Q3dpdw== # Base64-encoded secret access key

    These are the same credentials that you obtained in the object storage section.

  3. Create a service account (SA) for your transfer, and then use a Role and a RoleBinding to grant the account permission to read Secrets. You don't need to add these permissions if your default namespace SA or custom SA already has them.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: transfer-service-account
      namespace: transfer-ns
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: read-secrets-role
      namespace: transfer-ns
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["get", "watch", "list"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-secrets-rolebinding
      namespace: transfer-ns
    subjects:
    - kind: ServiceAccount
      name: transfer-service-account
      namespace: transfer-ns
    roleRef:
      kind: Role
      name: read-secrets-role
      apiGroup: rbac.authorization.k8s.io
  4. Obtain the CA certificates for your object storage systems from your AO/PA, and store them in Secrets:

    apiVersion: v1
    kind: Secret
    metadata:
      name: src-cert
      namespace: transfer-ns
    data:
      ca.crt: SRC_CA_CERTIFICATE # Base64-encoded source CA certificate
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: dst-cert
      namespace: transfer-ns
    data:
      ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURBekNDQWV1Z0F3SUJBZ0lSQUtoaEJXWWo3VGZlUUZWUWo0U0RpckV3RFFZSktvWklodmNOQVFFTEJRQXcKR3pFWk1CY0dBMVVFQXhNUVltOXZkSE4wY21Gd0xYZGxZaTFqWVRBZUZ3MHlNekF6TURZeU16TTROVEJhRncweQpNekEyTURReU16TTROVEJhTUJzeEdUQVhCZ05WQkFNVEVHSnZiM1J6ZEhKaGNDMTNaV0l0WTJFd2dnRWlNQTBHCkNTcUdTSWIzRFFF== # Base64-encoded destination CA certificate. It can be the same as or different from the source certificate.
  5. Optional: Create a LoggingTarget to see transfer-service logs in Loki.

    kind: LoggingTarget
    metadata:
      namespace: transfer-ns # Same namespace as your transfer job
      name: logtarg1
    spec:
      # Choose a matching pattern that identifies the pods for this job
      # Optional
      # Relationship between different selectors: AND
      selector:
        # Choose the pod name prefix(es) to consider for this job
        # The observability platform scrapes all pods
        # whose names start with the specified prefix(es)
        # Must contain [a-z0-9-] characters only
        # Relationship between different list elements: OR
        matchPodNames:
          - transfer-job # Choose the prefix that matches your transfer job name
      serviceName: transfer-service
  6. Create the job:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: transfer-job
      namespace: transfer-ns
    spec:
      template:
        spec:
          serviceAccountName: transfer-service-account # Service account created earlier
          containers:
            - name: storage-transfer-pod
              image: storage-transfer
              imagePullPolicy: Always # Always pull the latest image
              command:
                - /storage-transfer
                - '--src_endpoint=' # Your source endpoint here
                - '--dst_endpoint=' # Your destination endpoint here
                - '--src_path=aecvd-bucket1' # Use the fully qualified name
                - '--dst_path=aklow-bucket2' # Use the fully qualified name
                - '--src_credentials=transfer-ns/src-secret' # Created earlier
                - '--dst_credentials=transfer-ns/dst-secret' # Created earlier
                - '--src_ca_certificate_reference=transfer-ns/src-cert' # Created earlier
                - '--dst_ca_certificate_reference=transfer-ns/dst-cert' # Created earlier
                - '--src_type=s3'
                - '--dst_type=s3'
                - '--bandwidth_limit=10M' # Optional; bytes per second, in the form '10K', '100M', or '1G'
          restartPolicy: OnFailure # Restart the Pod if the transfer fails
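Before you apply the manifests from the steps above, a few pre-flight checks can catch common mistakes. The following is a minimal Bash sketch, not part of the transfer tool: the function names (sa_can_read_secrets, make_ca_secret, valid_pod_prefix, valid_bandwidth_limit) are hypothetical, it assumes kubectl is configured for your cluster, and the bandwidth check only encodes the form given in the sample comment ('10K', '100M', '1G'); the tool itself may accept other values.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for the steps above; none are part of the tool.

# Step 3: check that a service account can read Secrets by impersonating it.
# kubectl auth can-i prints yes/no; your own user needs impersonation rights.
sa_can_read_secrets() {
  local namespace="$1" sa="$2"
  kubectl auth can-i get secrets --namespace "$namespace" \
    --as "system:serviceaccount:${namespace}:${sa}"
}

# Step 4: print a Secret manifest that stores a PEM CA certificate file
# under the ca.crt key, base64-encoded on a single line.
make_ca_secret() {
  local name="$1" namespace="$2" cert_file="$3"
  if [ ! -r "$cert_file" ]; then
    echo "cannot read certificate file: $cert_file" >&2
    return 1
  fi
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
  namespace: ${namespace}
data:
  ca.crt: $(base64 < "$cert_file" | tr -d '\n')
EOF
}

# Step 5: a LoggingTarget pod name prefix may contain only [a-z0-9-].
# The explicit character list avoids locale-dependent a-z ranges.
valid_pod_prefix() {
  case "$1" in
    "" | *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;
  esac
  return 0
}

# Step 6: check a --bandwidth_limit value against the documented form:
# one or more digits followed by K, M, or G.
valid_bandwidth_limit() {
  local value="$1"
  local digits="${value%?}"          # everything except the last character
  local suffix="${value#"$digits"}"  # the last character
  case "$suffix" in
    K | M | G) ;;
    *) return 1 ;;
  esac
  case "$digits" in
    "" | *[!0123456789]*) return 1 ;;
  esac
  return 0
}

# Examples (ca.pem is a placeholder for the file from your AO/PA):
#   sa_can_read_secrets transfer-ns transfer-service-account
#   make_ca_secret dst-cert transfer-ns ca.pem | kubectl apply -f -
#   valid_bandwidth_limit 10M && echo "bandwidth limit looks valid"
```

The Job's Pods are named after the Job with a random suffix, which is why a prefix such as transfer-job matches them in the LoggingTarget.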

Monitor your data transfer

After you create the Job, you can monitor its status with kubectl commands such as kubectl describe. To verify the transfer, list the objects in your destination bucket and confirm that your data is there. The data transfer tool works regardless of where the endpoints involved in the transfer are located.

Run the following:

kubectl describe job transfer-job -n transfer-ns

The preceding command tells you the status of the job.
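If you prefer to block until the transfer finishes instead of polling with kubectl describe, kubectl wait can watch the Job's Complete condition. This is a minimal sketch; wait_for_job is a hypothetical helper name and the 600s default timeout is an arbitrary assumption. Because a failed Job never reaches Complete, the wait runs until the timeout in that case, after which the helper prints the Job's conditions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for a Job to complete, or print its conditions.
# Usage: wait_for_job JOB NAMESPACE [TIMEOUT]
wait_for_job() {
  local job="$1" namespace="$2" timeout="${3:-600s}"
  if kubectl wait --for=condition=complete "job/${job}" \
      --namespace "$namespace" --timeout "$timeout"; then
    return 0
  fi
  # The Job did not complete in time; show why (for example, Failed=True).
  kubectl get "job/${job}" --namespace "$namespace" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
  return 1
}

# Example: wait_for_job transfer-job transfer-ns 30m
```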

The Job creates a Pod that transfers the data. Get the name of the Pod, and then check its logs for any errors that occurred during the transfer.

To view pod logs, run the following:

kubectl logs transfer-job-<pod_id_suffix_obtained_from_describe_operation_on_job> -n transfer-ns
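The Job controller labels each Pod it creates with job-name=<job name>, so you can look up the Pods without copying the suffix from the describe output. This is a minimal sketch; job_logs is a hypothetical helper name, and it prints the logs of every Pod the Job created, which matters when restartPolicy: OnFailure or retries have produced more than one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the logs of every Pod created by a Job.
# Relies on the job-name label that the Job controller sets on its Pods.
job_logs() {
  local job="$1" namespace="$2" pod
  for pod in $(kubectl get pods --namespace "$namespace" -l "job-name=${job}" \
      -o jsonpath='{.items[*].metadata.name}'); do
    echo "=== ${pod}"
    kubectl logs "$pod" --namespace "$namespace"
  done
}

# Example: job_logs transfer-job transfer-ns
```

For a quick look at a single Pod, kubectl logs job/transfer-job -n transfer-ns also works; kubectl picks one of the Job's Pods.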

The logs of a successful Job look similar to the following:

DEBUG : Starting main for transfer
I0607 21:34:39.183106       1 transfer.go:103]  "msg"="Starting transfer "  "destination"="sample-bucket" "source"="/data"
2023/06/07 21:34:39 NOTICE: Bandwidth limit set to {100Mi 100Mi}
I0607 21:34:49.238901       1 transfer.go:305]  "msg"="Job finished polling "  "Finished"=true "Number of Attempts"=2 "Success"=true
I0607 21:34:49.239675       1 transfer.go:153]  "msg"="Transfer completed."  "AvgSpeed"="10 KB/s" "Bytes Moved"="10.0 kB" "Errors"=0 "Files Moved"=10 "FilesComparedAtSourceAndDest"=3 "Time since beginning of transfer"="1.0s"

The logs show the average transfer speed (which is not the same as the bandwidth used), the bytes moved, the number of files with errors, and the number of files moved.
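To check the result in a script rather than by reading the logs, you can extract the summary fields from the "Transfer completed." line. This minimal sketch assumes the "Key"=value layout shown in the sample log above, which may differ between tool versions; transfer_summary is a hypothetical helper name. It returns non-zero when no summary line is found or when Errors is not 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the last "Transfer completed." log line.
# Reads logs on stdin. Returns 2 if no summary is found, 1 if Errors != 0.
transfer_summary() {
  local line errors files bytes
  line="$(grep '"msg"="Transfer completed."' | tail -n 1)"
  if [ -z "$line" ]; then
    echo "no transfer summary found" >&2
    return 2
  fi
  # Numeric fields are unquoted ("Errors"=0); Bytes Moved is quoted.
  errors="$(printf '%s\n' "$line" | sed -n 's/.*"Errors"=\([0-9][0-9]*\).*/\1/p')"
  files="$(printf '%s\n' "$line" | sed -n 's/.*"Files Moved"=\([0-9][0-9]*\).*/\1/p')"
  bytes="$(printf '%s\n' "$line" | sed -n 's/.*"Bytes Moved"="\([^"]*\)".*/\1/p')"
  echo "files=${files} bytes=${bytes} errors=${errors}"
  [ "$errors" = 0 ] || return 1
}

# Example: kubectl logs job/transfer-job -n transfer-ns | transfer_summary
```

For the successful sample log above, this prints files=10 bytes=10.0 kB errors=0 and returns 0.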

Copy block storage to object storage

Ensure that you meet the following prerequisites:

  • An S3 endpoint, and an S3 access key ID and secret access key with at least WRITE permissions on the dedicated bucket that you want to transfer data to.
  • A working cluster with connectivity to the S3 endpoint.
  • Privileges to create Jobs and Secrets inside your cluster.
  • For replication of block storage, a Pod with an attached PersistentVolumeClaim (PVC) that you want to back up to object storage, and privileges to inspect running Jobs and PVCs.
  • For replication of block storage, a window during which nothing writes to the PersistentVolume (PV).
  • For the restoration of block storage from an object storage endpoint, privileges to allocate a PV with sufficient capacity.

To replicate a PV to object storage, the volume must be attached to an existing Pod, and the Pod must not perform any writes during the transfer. To avoid detaching the mounted PV from the Pod, the transfer Job runs on the same node as the Pod and uses a hostPath mount to expose the volume from the node's disk. To prepare for the transfer, first find the node on which the Pod is running, along with additional metadata such as the Pod UID and PVC type, so that you can reference the appropriate path on the node. Substitute this metadata into the sample YAML file in the following section.

Collect metadata

To collect the metadata required to create the data transfer Job, work through these steps:

  1. Find the node on which the Pod is scheduled:

    kubectl get pod POD_NAME -o jsonpath='{.spec.nodeName}'

    Record the output of this command as the NODE_NAME to use in the data transfer Job YAML file.

  2. Find the Pod UID:

    kubectl get pod POD_NAME -o 'jsonpath={.metadata.uid}'

    Record the output of this command as the POD_UID to use in the data transfer Job YAML file.

  3. Find the name of the PV that is bound to the PVC (www-web-0 in this example):

    kubectl get pvc www-web-0 -o 'jsonpath={.spec.volumeName}'

    Record the output of this command as the PVC_NAME to use in the data transfer Job YAML file.

  4. Find the PVC storage provisioner:

    kubectl get pvc www-web-0 -o jsonpath='{.metadata.annotations.volume\.v1\.kubernetes\.io\/storage-provisioner}'

    Record the output of this command as the PROVISIONER_TYPE to use in the data transfer Job YAML file.
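The four lookups above can be run in one go. This is a minimal sketch, assuming the Pod and the PVC are in your current kubectl namespace; collect_transfer_metadata is a hypothetical helper that prints KEY=value lines you can copy into the Job YAML, and web-0 in the example is an assumed Pod name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the metadata lookups and print KEY=value lines.
# PVC_NAME holds .spec.volumeName, which is the name of the bound PV.
# Usage: collect_transfer_metadata POD_NAME PVC_CLAIM_NAME
collect_transfer_metadata() {
  local pod="$1" claim="$2"
  printf 'NODE_NAME=%s\n' "$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}')"
  printf 'POD_UID=%s\n' "$(kubectl get pod "$pod" -o 'jsonpath={.metadata.uid}')"
  printf 'PVC_NAME=%s\n' "$(kubectl get pvc "$claim" -o 'jsonpath={.spec.volumeName}')"
  printf 'PROVISIONER_TYPE=%s\n' "$(kubectl get pvc "$claim" \
    -o jsonpath='{.metadata.annotations.volume\.v1\.kubernetes\.io\/storage-provisioner}')"
}

# Example: collect_transfer_metadata web-0 www-web-0
```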

Create secrets

To replicate files to object storage across clusters, you must first create the Secrets inside your Kubernetes cluster. The Secret data must use the expected keys (access-key-id and access-key) for the tool to read the credentials.

The following example creates the Secrets in an existing namespace named transfer:

apiVersion: v1
kind: Secret
metadata:
  name: src-secret
  namespace: transfer
data:
  access-key-id: c3JjLWtleQ== # echo -n src-key| base64 -w0
  access-key: c3JjLXNlY3JldA== # echo -n src-secret| base64 -w0
---
apiVersion: v1
kind: Secret
metadata:
  name: dst-secret
  namespace: transfer
data:
  access-key-id: ZHN0LWtleQ== # echo -n dst-key| base64 -w0
  access-key: ZHN0LXNlY3JldA== # echo -n dst-secret| base64 -w0
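Instead of encoding each value by hand, a small helper can print the whole Secret. This is a minimal sketch; b64 and make_s3_secret are hypothetical names. It pipes through tr rather than using -w0, because -w0 is a GNU coreutils option that not every base64 supports.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the transfer tool.

# Base64-encode a string with no trailing newline and no line wrapping.
# printf avoids the newline that plain echo adds; tr strips any wrapping.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Print a Secret manifest with the access-key-id and access-key keys.
# Usage: make_s3_secret NAME NAMESPACE ACCESS_KEY_ID SECRET_ACCESS_KEY
make_s3_secret() {
  local name="$1" namespace="$2" key_id="$3" secret_key="$4"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
  namespace: ${namespace}
data:
  access-key-id: $(b64 "$key_id")
  access-key: $(b64 "$secret_key")
EOF
}

# Example: make_s3_secret dst-secret transfer "$DST_KEY_ID" "$DST_SECRET_KEY" | kubectl apply -f -
```

With the sample values above, make_s3_secret src-secret transfer src-key src-secret produces the same data as the src-secret manifest. Alternatively, kubectl create secret generic dst-secret -n transfer --from-literal=access-key-id=KEY_ID --from-literal=access-key=SECRET_KEY performs the encoding for you.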

Create the Job

With the data that you collected in the previous section, create a Job with the data transfer tool. The data transfer Job has a hostPath mount referencing the path for the PV of interest, and a nodeSelector for the relevant node.

The following is an example of a data transfer Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: transfer-job
  namespace: transfer
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: NODE_NAME # This label usually matches the node name
      serviceAccountName: data-transfer-sa
      containers:
      - name: storage-transfer-pod
        image: storage-transfer
        command:
        - /storage-transfer
        - --dst_endpoint=
        - --src_path=/pvc-data
        - --dst_path=transfer-dst-bucket
        - --dst_credentials=transfer/dst-secret
        - --src_type=local
        - --dst_type=s3
        volumeMounts:
        - mountPath: /pvc-data
          name: pvc-volume
      volumes:
      - name: pvc-volume
        hostPath:
          path: /var/lib/kubelet/pods/POD_UID/volumes/PROVISIONER_TYPE/PVC_NAME
      restartPolicy: Never
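To fill in the recorded values, you can substitute the placeholders with sed. This is a minimal sketch; render_transfer_job is a hypothetical helper that reads the Job YAML on standard input. It uses | as the sed delimiter because provisioner names can contain slashes, and it rejects empty values or values containing |, &, or a backslash, which sed would otherwise treat specially.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace NODE_NAME, POD_UID, PROVISIONER_TYPE, and
# PVC_NAME in the Job YAML read from stdin, and print the result.
# Usage: render_transfer_job NODE_NAME POD_UID PROVISIONER_TYPE PVC_NAME
render_transfer_job() {
  local node_name="$1" pod_uid="$2" provisioner_type="$3" pvc_name="$4"
  local value
  for value in "$node_name" "$pod_uid" "$provisioner_type" "$pvc_name"; do
    case "$value" in
      "" | *"|"* | *"&"* | *"\\"*)
        echo "empty or unsupported value: '$value'" >&2
        return 1 ;;
    esac
  done
  sed -e "s|NODE_NAME|${node_name}|g" \
      -e "s|POD_UID|${pod_uid}|g" \
      -e "s|PROVISIONER_TYPE|${provisioner_type}|g" \
      -e "s|PVC_NAME|${pvc_name}|g"
}

# Example (transfer-job.yaml is a placeholder file name):
#   render_transfer_job "$NODE" "$UID_" "$PROV" "$PV" < transfer-job.yaml | kubectl apply -f -
```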

As with the S3 data transfer, you must create a Secret containing the access keys for the destination endpoint in the Kubernetes cluster, and the data transfer Job must run with a service account with adequate privileges to read the Secret from the API server. Monitor the status of the transfer with standard kubectl commands operating on the Job.

Consider the following details when transferring block storage to object storage:

  • By default, symbolic links are followed and replicated to object storage as deep copies rather than shallow copies. On restoration, the original symbolic links are lost.
  • As with object storage replication, cloning into a subdirectory of the bucket is destructive. Ensure that the bucket is available exclusively for your volume.
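Because symbolic links don't survive the round trip, you might record them before backing up the volume so that you can recreate them with ln -s after a restore. This is a minimal sketch; list_symlinks is a hypothetical helper, and it assumes a shell with find and readlink that can see the volume's files, for example inside the Pod that mounts the PVC.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every symbolic link under a directory together
# with its target, as "LINK -> TARGET". find does not follow links by default.
list_symlinks() {
  local dir="$1"
  find "$dir" -type l -exec sh -c 'printf "%s -> %s\n" "$1" "$(readlink "$1")"' _ {} \;
}

# Example: list_symlinks /data > symlinks-before-backup.txt
```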

Restore from object storage to block storage

Allocate a PV

To restore block storage from an object storage endpoint, follow these steps:

  1. Allocate a persistent volume to target in the restore. Use a PVC to allocate the volume, as shown in the following example. Create the PVC in the same namespace as the restore Job, because a Pod can only mount PVCs from its own namespace:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: restore-pvc
      namespace: transfer # Same namespace as the restore Job
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "default"
      resources:
        requests:
          storage: 1Gi # Needs sufficient capacity for the full restoration
  2. Check the status of the PVC:

    kubectl get pvc restore-pvc -n transfer

    After the PVC is in the Bound state, the Pod that rehydrates it can mount it.

  3. If a StatefulSet eventually consumes the PV, your PVC names must match the PVCs that the StatefulSet renders, so that the Pods that the StatefulSet creates consume the hydrated volumes. The following example shows the volume claim templates in a StatefulSet named ss:

    volumeClaimTemplates:
    - metadata:
        name: pvc-name
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "default"
        resources:
          requests:
            storage: 1Gi
  4. Pre-allocate PVCs with names such as pvc-name-ss-0 and pvc-name-ss-1 to ensure that the resultant Pods consume the pre-allocated volumes. Kubernetes names each StatefulSet PVC <claim template name>-<StatefulSet name>-<ordinal>.
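The PVC pre-allocation and the Bound check in the steps above can be scripted. This is a minimal sketch with hypothetical helper names: sts_pvcs prints PVC manifests named <claim template name>-<StatefulSet name>-<ordinal>, reusing the access mode, storage class, and size from the volumeClaimTemplates sample, and wait_for_pvc_bound uses kubectl wait --for=jsonpath, which requires a kubectl release that supports it (1.23 or later). If your storage class uses the WaitForFirstConsumer binding mode, a PVC stays Pending until a Pod uses it, so the wait would time out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for pre-allocating and checking restore PVCs.

# Print REPLICAS PVC manifests that match a StatefulSet's naming scheme.
# Usage: sts_pvcs TEMPLATE_NAME STATEFULSET_NAME REPLICAS NAMESPACE
sts_pvcs() {
  local template="$1" sts="$2" replicas="$3" namespace="$4" i=0
  while [ "$i" -lt "$replicas" ]; do
    if [ "$i" -gt 0 ]; then printf '%s\n' '---'; fi
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${template}-${sts}-${i}
  namespace: ${namespace}
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: "default"
  resources:
    requests:
      storage: 1Gi
EOF
    i=$((i + 1))
  done
}

# Wait until a PVC reports phase Bound, or until the timeout expires.
# Usage: wait_for_pvc_bound PVC NAMESPACE [TIMEOUT]
wait_for_pvc_bound() {
  local pvc="$1" namespace="$2" timeout="${3:-120s}"
  kubectl wait --for=jsonpath='{.status.phase}'=Bound "pvc/${pvc}" \
    --namespace "$namespace" --timeout "$timeout"
}

# Examples (NAMESPACE is a placeholder):
#   sts_pvcs pvc-name ss 2 NAMESPACE | kubectl apply -f -
#   wait_for_pvc_bound restore-pvc NAMESPACE
```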

Hydrate the PV

After the PVC is bound to a PV, start the Job to populate the PV:

apiVersion: batch/v1
kind: Job
metadata:
  name: transfer-job
  namespace: transfer
spec:
  template:
    spec:
      serviceAccountName: data-transfer-sa
      volumes:
      - name: data-transfer-restore-volume
        persistentVolumeClaim:
          claimName: restore-pvc
      containers:
      - name: storage-transfer-pod
        image: storage-transfer
        command:
        - /storage-transfer
        - --src_endpoint=
        - --src_path=/your-src-bucket
        - --src_credentials=transfer/src-secret
        - --dst_path=/restore-pv-mnt-path
        - --src_type=s3
        - --dst_type=local
        volumeMounts:
        - mountPath: /restore-pv-mnt-path
          name: data-transfer-restore-volume
      restartPolicy: Never # Jobs require Never or OnFailure

After the Job has finished running, the data from the object storage bucket populates the volume. A separate Pod can consume the data by using the same standard mechanisms for mounting a volume.