Transfer data

Data transfers can occur between the following:

  1. Persistent Volume Claim (PVC) and object storage
  2. Object storage and object storage (within GDC)

Object storage on GDC is S3-compatible and is referred to as the s3 type in Kubernetes YAML files.

Types of data sources/destinations

  1. Object storage (referred to as 's3'): object storage present on GDC
  2. Local storage (referred to as 'local'): storage on attached PVCs

Copy from object storage to object storage

Ensure you have the following prerequisites:

  • An S3 endpoint with read permissions for the source, and an S3 endpoint with write permissions for the destination.
  • If your credentials do not grant bucket creation permission, the transfer fails when the destination bucket does not exist. In that case, ensure that the destination bucket exists before you start the transfer.
  • Privileges to create Jobs and to create or read Secrets inside your cluster or namespace. See the role and role binding in the following steps for an example of the required permissions.

Create a job

To create a job, work through these steps (a sketch for applying the resulting manifests follows the list):

  1. Create a namespace:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: transfer-ns
    
  2. Create credentials:

    ---
    
    apiVersion: v1
    kind: Secret
    metadata:
      name: src-secret
      namespace: transfer-ns
    data:
      access-key-id: NkFDTUg3WDBCVDlQMVpZMU5MWjU= # base64-encoded access key ID
      access-key: VkRkeWJsbFgzb2FZanMvOVpnSi83SU5YUjk3Y0Q2TUdxZ2d4Q3dpdw== # base64-encoded secret access key
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: dst-secret
      namespace: transfer-ns
    data:
      access-key-id: NkFDTUg3WDBCVDlQMVpZMU5MWjU= # base64-encoded access key ID
      access-key: VkRkeWJsbFgzb2FZanMvOVpnSi83SU5YUjk3Y0Q2TUdxZ2d4Q3dpdw== # base64-encoded secret access key
    ---
    

    These credentials are the same ones that you obtained in the object storage section.

  3. Create a service account (SA) for your transfer to use, and then add permissions to the account to read Secrets by using roles and role bindings. You do not need to add the permissions if your default namespace SA or a custom SA already has them.

    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: transfer-service-account
      namespace: transfer-ns
    ---
    
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: read-secrets-role
      namespace: transfer-ns
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["get", "watch", "list"]
    
    ---
    
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-secrets-rolebinding
      namespace: transfer-ns
    subjects:
    - kind: ServiceAccount
      name: transfer-service-account
      namespace: transfer-ns
    roleRef:
      kind: Role
      name: read-secrets-role
      apiGroup: rbac.authorization.k8s.io
    
    ---
    
  4. Obtain the CA certificates for your object storage systems and store them in Secrets. You can obtain these certificates from your AO/PA.

    ---
    
    apiVersion: v1
    kind: Secret
    metadata:
      name: src-cert
      namespace: transfer-ns
    data:
      ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURBekNDQWV1Z0F3SUJBZ0lSQUpHM2psOFZhTU85a1FteGdXUFl3N3d3RFFZSktvWklodmNOQVFFTEJRQXcKR3pFWk1CY0dBMVVFQXhNUVltOXZkSE4wY21Gd0xYZGxZaTFqWVRBZUZ3MHlNekF5TVRVd01USXlNakZhRncweQpNekExTVRZd01USXlNakZhTUJzeEdUQVhCZ05WQkFNVEVHSnZiM1J6ZEhKaGNDMTNaV0l0WTJFd2dnRWlNQTBHCkNTcUdTSWI== # base64-encoded certificate
    
    ---
    
    apiVersion: v1
    kind: Secret
    metadata:
      name: dst-cert
      namespace: transfer-ns
    data:
      ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURBekNDQWV1Z0F3SUJBZ0lSQUtoaEJXWWo3VGZlUUZWUWo0U0RpckV3RFFZSktvWklodmNOQVFFTEJRQXcKR3pFWk1CY0dBMVVFQXhNUVltOXZkSE4wY21Gd0xYZGxZaTFqWVRBZUZ3MHlNekF6TURZeU16TTROVEJhRncweQpNekEyTURReU16TTROVEJhTUJzeEdUQVhCZ05WQkFNVEVHSnZiM1J6ZEhKaGNDMTNaV0l0WTJFd2dnRWlNQTBHCkNTcUdTSWIzRFFF== # base64-encoded certificate; can be the same as or different from the source certificate
    
    ---
    
    
  5. Optional: Create a LoggingTarget to see transfer-service logs in Loki.

    apiVersion: logging.gdc.goog/v1
    kind: LoggingTarget
    metadata:
      namespace: transfer-ns # Same namespace as your transfer job
      name: logtarg1
    spec:
      # Choose matching pattern that identifies pods for this job
      # Optional
      # Relationship between different selectors: AND
      selector:
    
        # Choose pod name prefix(es) to consider for this job
        # Observability platform will scrape all pods
        # where names start with specified prefix(es)
        # Should contain [a-z0-9-] characters only
        # Relationship between different list elements: OR
        matchPodNames:
          - transfer-job # Choose the prefix here that matches your transfer job name
      serviceName: transfer-service
    
  6. Create the job:

    ---
    
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: transfer-job
      namespace: transfer-ns
    spec:
      template:
        spec:
          serviceAccountName: transfer-service-account # Service account created earlier
          containers:
            - name: storage-transfer-pod
              image: gcr.io/private-cloud-staging/storage-transfer:latest
              imagePullPolicy: Always # Always pull the latest image
              command:
                - /storage-transfer
              args:
                - '--src_endpoint=objectstorage.zone1.google.gdch.test' # Your source endpoint here
                - '--dst_endpoint=objectstorage.zone1.google.gdch.test' # Your destination endpoint here
                - '--src_path=aecvd-bucket1' # Use the fully qualified name
                - '--dst_path=aklow-bucket2' # Use the fully qualified name
                - '--src_credentials=transfer-ns/src-secret' # Created earlier
                - '--dst_credentials=transfer-ns/dst-secret' # Created earlier
                - '--dst_ca_certificate_reference=transfer-ns/dst-cert' # Created earlier
                - '--src_ca_certificate_reference=transfer-ns/src-cert' # Created earlier
                - '--src_type=s3'
                - '--dst_type=s3'
                - '--bandwidth_limit=10M' # Optional; of the form '10K', '100M', '1G' (bytes per second)
          restartPolicy: OnFailure # Restart on failure
    ---
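
After you create all of the preceding manifests, apply them to your cluster. The following commands are a minimal sketch; the filenames are illustrative and assume that you saved each manifest to its own file. If you need to base64-encode a value yourself, such as a CA certificate, a command like base64 -w0 ca.crt produces the encoded string.

kubectl apply -f namespace.yaml
kubectl apply -f secrets.yaml
kubectl apply -f service-account.yaml
kubectl apply -f certificates.yaml
kubectl apply -f logging-target.yaml
kubectl apply -f transfer-job.yaml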
    

Monitor your data transfer

After you instantiate the Job, you can monitor its status by using kubectl commands, such as kubectl describe. To verify the transfer, list the objects inside your destination bucket to validate that your data transferred. The data transfer tool is agnostic to the location of the endpoints involved in the transfer.

Run the following:

kubectl describe job transfer-job -n transfer-ns

The preceding command tells you the status of the job.

The Job creates a Pod to transfer the data. You can get the name of the Pod from the describe output and look at its logs to check for errors during the transfer.

To view pod logs, run the following:

kubectl logs transfer-job-<pod_id_suffix_obtained_from_describe_operation_on_job> -n transfer-ns

Successful job logs:

DEBUG : Starting main for transfer
I0607 21:34:39.183106       1 transfer.go:103]  "msg"="Starting transfer "  "destination"="sample-bucket" "source"="/data"
2023/06/07 21:34:39 NOTICE: Bandwidth limit set to {100Mi 100Mi}
I0607 21:34:49.238901       1 transfer.go:305]  "msg"="Job finished polling "  "Finished"=true "Number of Attempts"=2 "Success"=true
I0607 21:34:49.239675       1 transfer.go:153]  "msg"="Transfer completed."  "AvgSpeed"="10 KB/s" "Bytes Moved"="10.0 kB" "Errors"=0 "Files Moved"=10 "FilesComparedAtSourceAndDest"=3 "Time since beginning of transfer"="1.0s"

Viewing the logs lets you see the data transfer speed (which is not the same as the bandwidth used), the bytes moved, the number of errored files, and the number of files moved.

Copy block storage to object storage

Ensure that you meet the following prerequisites:

  • An S3 endpoint with an S3 key ID and a secret access key that has at least WRITE permissions on the dedicated bucket that you want to transfer data to.
  • A working cluster with connectivity to the S3 endpoint.
  • Privileges to create Jobs and Secrets inside your cluster.
  • For replication of block storage, a Pod with an attached PersistentVolumeClaim (PVC) that you want to back up to object storage, and privileges to inspect running Jobs and PVCs.
  • For replication of the block storage, a window during which no writes take place to the PersistentVolume (PV).
  • For the restoration of block storage from an object storage endpoint, privileges to allocate a PV with sufficient capacity.

To replicate a PV to object storage, the volume must be attached to an existing Pod. During the transfer window, the Pod must not perform any writes. To avoid detaching the mounted PV from the Pod, the data transfer process runs the transfer Job on the same machine as the Pod and uses a hostPath mount to expose the volume on the disk. In preparation for the transfer, you must first find the node on which the Pod is running, along with additional metadata such as the Pod UID and PVC type, to reference the appropriate path on the node. You must substitute this metadata into the sample YAML file outlined in the following section.

Collect metadata

To collect the metadata required to create the data transfer Job, work through these steps (a consolidated sketch follows the list):

  1. Find the Node that has the scheduled Pod:

    kubectl get pod POD_NAME -o jsonpath='{.spec.nodeName}'
    

    Record the output of this command as the NODE_NAME to use in the data transfer Job YAML file.

  2. Find the Pod UID:

    kubectl get pod POD_NAME -o 'jsonpath={.metadata.uid}'
    

    Record the output of this command as the POD_UID to use in the data transfer Job YAML file.

  3. Find the PVC name:

    kubectl get pvc www-web-0 -o 'jsonpath={.spec.volumeName}'
    

    Record the output of this command as the PVC_NAME to use in the data transfer Job YAML file.

  4. Find the PVC storage provisioner:

    kubectl get pvc www-web-0 -o jsonpath='{.metadata.annotations.volume\.v1\.kubernetes\.io\/storage-provisioner}'
    

    Record the output of this command as the PROVISIONER_TYPE to use in the data transfer Job YAML file.
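
If you prefer, you can capture all four values in one pass. The following is an illustrative shell sketch; it assumes your Pod is named POD_NAME and your PVC is named www-web-0, as in the preceding steps, and prints the hostPath that the transfer Job mounts:

NODE_NAME=$(kubectl get pod POD_NAME -o jsonpath='{.spec.nodeName}')
POD_UID=$(kubectl get pod POD_NAME -o 'jsonpath={.metadata.uid}')
PVC_NAME=$(kubectl get pvc www-web-0 -o 'jsonpath={.spec.volumeName}')
PROVISIONER_TYPE=$(kubectl get pvc www-web-0 -o jsonpath='{.metadata.annotations.volume\.v1\.kubernetes\.io\/storage-provisioner}')
# The path that the transfer Job exposes through its hostPath mount:
echo "/var/lib/kubelet/pods/${POD_UID}/volumes/${PROVISIONER_TYPE}/${PVC_NAME}"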

Create secrets

To replicate files to object storage across clusters, you must first instantiate the Secrets inside your Kubernetes cluster. You must use matching keys for the Secret data so that the tool can pull the credentials.

To perform the transfer in an existing namespace, see the following example of creating Secrets in a transfer namespace:

apiVersion: v1
kind: Secret
metadata:
  name: src-secret
  namespace: transfer
data:
  access-key-id: c3JjLWtleQ== # echo -n src-key| base64 -w0
  access-key: c3JjLXNlY3JldA== # echo -n src-secret| base64 -w0
---
apiVersion: v1
kind: Secret
metadata:
  name: dst-secret
  namespace: transfer
data:
  access-key-id: ZHN0LWtleQ== # echo -n dst-key| base64 -w0
  access-key: ZHN0LXNlY3JldA== # echo -n dst-secret| base64 -w0
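
Alternatively, you can have kubectl do the base64 encoding for you. The following sketch creates equivalent Secrets from literals, assuming the same names, values, and namespace as the preceding manifest:

kubectl create secret generic src-secret -n transfer \
  --from-literal=access-key-id=src-key \
  --from-literal=access-key=src-secret
kubectl create secret generic dst-secret -n transfer \
  --from-literal=access-key-id=dst-key \
  --from-literal=access-key=dst-secret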

Create the Job

With the data that you collected in the previous section, create a Job with the data transfer tool. The data transfer Job has a hostPath mount referencing the path for the PV of interest, and a nodeSelector for the relevant node.

The following is an example of a data transfer Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: transfer-job
  namespace: transfer
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: NODE_NAME
      serviceAccountName: data-transfer-sa
      containers:
      - name: storage-transfer-pod
        image: storage-transfer
        command:
        - /storage-transfer
        args:
        - --dst_endpoint=https://your-dst-endpoint.com
        - --src_path=/pvc-data
        - --dst_path=transfer-dst-bucket
        - --dst_credentials=transfer/dst-secret
        - --src_type=local
        - --dst_type=s3
        volumeMounts:
        - mountPath: /pvc-data
          name: pvc-volume
      volumes:
      - name: pvc-volume
        hostPath:
          path: /var/lib/kubelet/pods/POD_UID/volumes/PROVISIONER_TYPE/PVC_NAME
      restartPolicy: Never

As with the S3 data transfer, you must create a Secret containing the access keys for the destination endpoint in the Kubernetes cluster, and the data transfer Job must run with a service account with adequate privileges to read the Secret from the API server. Monitor the status of the transfer with standard kubectl commands operating on the Job.
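
For example, the following commands show the Job status and the transfer Pod's logs; they assume the Job name and namespace from the preceding manifest:

kubectl get job transfer-job -n transfer
kubectl logs job/transfer-job -n transfer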

Consider the following details when transferring block storage to object storage:

  • By default, symbolic links are followed and replicated to the object storage, producing a deep rather than a shallow copy. Restoring from object storage destroys the symlinks.
  • As with object storage replication, cloning into a subdirectory of the bucket is destructive. Ensure that the bucket is available exclusively for your volume.

Restore from object storage to block storage

Allocate a PV

To restore block storage from an object storage endpoint, follow these steps:

  1. Allocate a persistent volume to target in the restore. Use a PVC to allocate the volume, as shown in the following example:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: restore-pvc
      namespace: restore-ns
    spec:
      storageClassName: "default"
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi # Need sufficient capacity for full restoration.
    
  2. Check the status of the PVC:

    kubectl get pvc restore-pvc -n restore-ns
    

    After the PVC is in the Bound state, it is ready to be consumed by the Pod that rehydrates it.

  3. If a StatefulSet eventually consumes the PV, you must match the PVC names that the StatefulSet renders so that the Pods it produces consume the hydrated volumes. The following example shows the volume claim templates in a StatefulSet named ss.

      volumeClaimTemplates:
      - metadata:
          name: pvc-name
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: "default"
          resources:
            requests:
              storage: 1Gi
    
  4. Pre-allocate PVCs with names such as pvc-name-ss-0 and pvc-name-ss-1 to ensure that the resultant Pods consume the pre-allocated volumes, as shown in the sketch after this list.
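
For example, a pre-allocated claim for the first replica of ss might look like the following sketch; the name follows the Kubernetes convention <template-name>-<statefulset-name>-<ordinal>, and the namespace is assumed from the earlier example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-name-ss-0 # <template-name>-<statefulset-name>-<ordinal>
  namespace: restore-ns # Assumed namespace from the earlier example
spec:
  storageClassName: "default"
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi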

Hydrate the PV

After the PVC is bound to a PV, start the Job to populate the PV:

apiVersion: batch/v1
kind: Job
metadata:
  name: transfer-job
  namespace: restore-ns # Must be the same namespace as the PVC being hydrated
spec:
  template:
    spec:
      serviceAccountName: data-transfer-sa
      volumes:
      - name: data-transfer-restore-volume
        persistentVolumeClaim:
          claimName: restore-pvc
      containers:
      - name: storage-transfer-pod
        image: storage-transfer
        command:
        - /storage-transfer
        args:
        - --src_endpoint=https://your-src-endpoint.com
        - --src_path=/your-src-bucket
        - --src_credentials=transfer/src-secret
        - --dst_path=/restore-pv-mnt-path
        - --src_type=s3
        - --dst_type=local
        volumeMounts:
        - mountPath: /restore-pv-mnt-path
          name: data-transfer-restore-volume
      restartPolicy: Never # Job Pods must use Never or OnFailure

After the Job has finished running, the data from the object storage bucket populates the volume. A separate Pod can consume the data by using the same standard mechanisms for mounting a volume.
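
For example, a Pod along the following lines could inspect the restored data; the Pod name and image are illustrative, while the namespace and claim name match the preceding examples:

apiVersion: v1
kind: Pod
metadata:
  name: restore-reader # Illustrative name
  namespace: restore-ns
spec:
  containers:
  - name: reader
    image: busybox # Illustrative image
    command: ["ls", "-R", "/data"] # List the restored files
    volumeMounts:
    - mountPath: /data
      name: restored-volume
  volumes:
  - name: restored-volume
    persistentVolumeClaim:
      claimName: restore-pvc
  restartPolicy: Never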