Data transfers can occur between the following:
- Persistent Volume Claim (PVC) and object storage
- Object storage and object storage (within GDC)
Object storage on GDC is S3-compatible and is referred to as the s3 type in Kubernetes YAML files.
Types of data sources/destinations
- Object storage (referred to as 's3'): object storage present on GDC
- Local storage (referred to as 'local'): storage on attached PVCs
Copying from object storage to object storage
Ensure you have the following prerequisites:
- An S3 endpoint with read permissions for the source, and an S3 endpoint with write permissions for the destination.
- If your credentials do not have permission to create buckets, the transfer fails when the destination bucket does not exist. In that case, ensure that the destination bucket exists before you start the transfer.
- Privileges to create Jobs and create or read Secrets inside your cluster or namespace. See the following example for permissions.
Create a job
To create a job, work through these steps:
Create a namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: transfer-ns
Create credentials:
---
apiVersion: v1
kind: Secret
metadata:
  name: src-secret
  namespace: transfer-ns
data:
  access-key-id: NkFDTUg3WDBCVDlQMVpZMU5MWjU= # base 64 encoded version of key
  access-key: VkRkeWJsbFgzb2FZanMvOVpnSi83SU5YUjk3Y0Q2TUdxZ2d4Q3dpdw== # base 64 encoded version of secret key
---
apiVersion: v1
kind: Secret
metadata:
  name: dst-secret
  namespace: transfer-ns
data:
  access-key-id: NkFDTUg3WDBCVDlQMVpZMU5MWjU= # base 64 encoded version of key
  access-key: VkRkeWJsbFgzb2FZanMvOVpnSi83SU5YUjk3Y0Q2TUdxZ2d4Q3dpdw== # base 64 encoded version of secret key
---
These credentials are the same ones that you obtained in the object storage section.
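If you need to produce the base64-encoded values yourself, the following is a minimal sketch; the key values shown are placeholders, not real credentials:

# Encode the access key ID and secret access key for the Secret data fields.
echo -n "ACCESS_KEY_ID" | base64 -w0      # value for access-key-id
echo -n "SECRET_ACCESS_KEY" | base64 -w0  # value for access-key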
Create a service account (SA) that your transfer uses, and then add permissions to the account to read Secrets using roles and role bindings. You do not need to add permissions if your default namespace SA or a custom SA already has these permissions.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: transfer-service-account
  namespace: transfer-ns
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-secrets-role
  namespace: transfer-ns
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-secrets-rolebinding
  namespace: transfer-ns
subjects:
  - kind: ServiceAccount
    name: transfer-service-account
    namespace: transfer-ns
roleRef:
  kind: Role
  name: read-secrets-role
  apiGroup: rbac.authorization.k8s.io
---
Obtain the CA certificates for your object storage systems. You can obtain these certificates from your AO/PA.
---
apiVersion: v1
kind: Secret
metadata:
  name: src-cert
  namespace: transfer-ns
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURBekNDQWV1Z0F3SUJBZ0lSQUpHM2psOFZhTU85a1FteGdXUFl3N3d3RFFZSktvWklodmNOQVFFTEJRQXcKR3pFWk1CY0dBMVVFQXhNUVltOXZkSE4wY21Gd0xYZGxZaTFqWVRBZUZ3MHlNekF5TVRVd01USXlNakZhRncweQpNekExTVRZd01USXlNakZhTUJzeEdUQVhCZ05WQkFNVEVHSnZiM1J6ZEhKaGNDMTNaV0l0WTJFd2dnRWlNQTBHCkNTcUdTSWI== # base 64 encoded version of certificate
---
apiVersion: v1
kind: Secret
metadata:
  name: dst-cert
  namespace: transfer-ns
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURBekNDQWV1Z0F3SUJBZ0lSQUtoaEJXWWo3VGZlUUZWUWo0U0RpckV3RFFZSktvWklodmNOQVFFTEJRQXcKR3pFWk1CY0dBMVVFQXhNUVltOXZkSE4wY21Gd0xYZGxZaTFqWVRBZUZ3MHlNekF6TURZeU16TTROVEJhRncweQpNekEyTURReU16TTROVEJhTUJzeEdUQVhCZ05WQkFNVEVHSnZiM1J6ZEhKaGNDMTNaV0l0WTJFd2dnRWlNQTBHCkNTcUdTSWIzRFFF== # base 64 encoded version of certificate. Can be the same as or different from the source certificate.
---
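As an alternative to writing the preceding manifests by hand, you can create the certificate Secrets directly from the PEM files. This is a minimal sketch; the file paths are placeholders:

# Create the source and destination CA certificate Secrets from local PEM files.
kubectl create secret generic src-cert -n transfer-ns --from-file=ca.crt=/path/to/src-ca.crt
kubectl create secret generic dst-cert -n transfer-ns --from-file=ca.crt=/path/to/dst-ca.crt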
Optional: Create a LoggingTarget to see transfer-service logs in Loki.
apiVersion: logging.gdc.goog/v1
kind: LoggingTarget
metadata:
  namespace: transfer-ns # Same namespace as your transfer job
  name: logtarg1
spec:
  # Choose matching pattern that identifies pods for this job
  # Optional
  # Relationship between different selectors: AND
  selector:
    # Choose pod name prefix(es) to consider for this job
    # Observability platform will scrape all pods
    # where names start with specified prefix(es)
    # Should contain [a-z0-9-] characters only
    # Relationship between different list elements: OR
    matchPodNames:
      - transfer-job # Choose the prefix here that matches your transfer job name
  serviceName: transfer-service
Create the job:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: transfer-job
  namespace: transfer-ns
spec:
  template:
    spec:
      serviceAccountName: transfer-service-account # service account created earlier
      containers:
        - name: storage-transfer-pod
          image: gcr.io/private-cloud-staging/storage-transfer:latest
          imagePullPolicy: Always # always pull the latest image
          command:
            - /storage-transfer
          args:
            - '--src_endpoint=objectstorage.zone1.google.gdch.test' # your source endpoint here
            - '--dst_endpoint=objectstorage.zone1.google.gdch.test' # your destination endpoint here
            - '--src_path=aecvd-bucket1' # use the fully qualified name
            - '--dst_path=aklow-bucket2' # use the fully qualified name
            - '--src_credentials=transfer-ns/src-secret' # created earlier
            - '--dst_credentials=transfer-ns/dst-secret' # created earlier
            - '--dst_ca_certificate_reference=transfer-ns/dst-cert' # created earlier
            - '--src_ca_certificate_reference=transfer-ns/src-cert' # created earlier
            - '--src_type=s3'
            - '--dst_type=s3'
            - '--bandwidth_limit=10M' # optional; of the form '10K', '100M', '1G' bytes per second
      restartPolicy: OnFailure # restarts on failure
---
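After you have prepared the manifests, apply them to the cluster. The following is a minimal sketch that assumes you saved all of the preceding resources in a single file named transfer.yaml (a hypothetical file name):

# Create the namespace, Secrets, RBAC resources, and transfer Job.
kubectl apply -f transfer.yaml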
Monitor your data transfer
After you instantiate the Job, you can monitor its status using kubectl commands, such as kubectl describe. To verify the transfer, list the objects in your destination bucket to validate that your data transferred. The data transfer tool is agnostic to the location of the endpoints involved in the transfer.
Run the following:
kubectl describe job transfer-job -n transfer-ns
The preceding command tells you the status of the job.
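If you prefer to block until the Job finishes rather than polling, the following is a minimal sketch using kubectl wait; the timeout value is an assumption that you should adjust to your data size:

# Wait until the transfer Job reports the Complete condition.
kubectl wait --for=condition=complete job/transfer-job -n transfer-ns --timeout=60m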
The Job creates a Pod that performs the transfer. You can get the name of the Pod and look at its logs to check for any errors during the transfer.
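For example, one way to find the Pod is to filter on the job-name label that Kubernetes adds to Pods created by a Job:

# List the Pods created by the transfer Job.
kubectl get pods -n transfer-ns -l job-name=transfer-job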
To view pod logs, run the following:
kubectl logs transfer-job-<pod_id_suffix_obtained_from_describe_operation_on_job> -n transfer-ns
Successful job logs:
DEBUG : Starting main for transfer
I0607 21:34:39.183106 1 transfer.go:103] "msg"="Starting transfer " "destination"="sample-bucket" "source"="/data"
2023/06/07 21:34:39 NOTICE: Bandwidth limit set to {100Mi 100Mi}
I0607 21:34:49.238901 1 transfer.go:305] "msg"="Job finished polling " "Finished"=true "Number of Attempts"=2 "Success"=true
I0607 21:34:49.239675 1 transfer.go:153] "msg"="Transfer completed." "AvgSpeed"="10 KB/s" "Bytes Moved"="10.0 kB" "Errors"=0 "Files Moved"=10 "FilesComparedAtSourceAndDest"=3 "Time since beginning of transfer"="1.0s"
The logs show the data transfer speed (which is not the same as the bandwidth used), the bytes moved, the number of errored files, and the number of files moved.
Copy block storage to object storage
Ensure that you meet the following prerequisites:
- An S3 endpoint with an S3 key ID and secret access key with at least WRITE permissions to the dedicated bucket that you want to transfer data to.
- A working cluster with connectivity to the S3 endpoint.
- Privileges to create Jobs and Secrets inside your cluster.
- For replication of block storage, a Pod with an attached PersistentVolumeClaim (PVC) that you want to back up to object storage, and privileges to inspect running Jobs and PVCs.
- For replication of block storage, a window during which no writes take place to the PersistentVolume (PV).
- For restoration of block storage from an object storage endpoint, privileges to allocate a PV with sufficient capacity.
To replicate a PV to object storage, you must attach a volume to an existing Pod. During the window of the transfer, the Pod must not perform any writes. To avoid detaching the mounted PV from the Job, the data transfer process works by running the transfer Job on the same machine as the Pod, and using a hostPath mount to expose the volume on the disk. In preparation for the transfer, you must first find the node on which the Pod is running, and additional metadata such as the Pod UID and PVC type to reference the appropriate path on the Node. You must substitute this metadata into the sample YAML file outlined in the following section.
Collect metadata
To collect the metadata required to create the data transfer Job, work through these steps:
Find the Node that has the scheduled Pod:
kubectl get pod POD_NAME -o jsonpath='{.spec.nodeName}'
Record the output of this command as the NODE_NAME to use in the data transfer Job YAML file.
Find the Pod UID:
kubectl get pod POD_NAME -o 'jsonpath={.metadata.uid}'
Record the output of this command as the POD_UID to use in the data transfer Job YAML file.
Find the volume name bound to the PVC (in this example, the PVC is named www-web-0):
kubectl get pvc www-web-0 -o 'jsonpath={.spec.volumeName}'
Record the output of this command as the PVC_NAME to use in the data transfer Job YAML file.
Find the PVC storage provisioner:
kubectl get pvc www-web-0 -o jsonpath='{.metadata.annotations.volume\.v1\.kubernetes\.io\/storage-provisioner}'
Record the output of this command as the PROVISIONER_TYPE to use in the data transfer Job YAML file.
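The following is a minimal sketch that collects all four values in one pass and prints the resulting hostPath; POD_NAME and the example PVC name www-web-0 are placeholders that you substitute with your own:

# Gather the metadata needed by the data transfer Job.
NODE_NAME=$(kubectl get pod POD_NAME -o jsonpath='{.spec.nodeName}')
POD_UID=$(kubectl get pod POD_NAME -o 'jsonpath={.metadata.uid}')
PVC_NAME=$(kubectl get pvc www-web-0 -o 'jsonpath={.spec.volumeName}')
PROVISIONER_TYPE=$(kubectl get pvc www-web-0 -o jsonpath='{.metadata.annotations.volume\.v1\.kubernetes\.io\/storage-provisioner}')
# Print the node and the hostPath that the transfer Job mounts.
echo "Node: ${NODE_NAME}"
echo "Host path: /var/lib/kubelet/pods/${POD_UID}/volumes/${PROVISIONER_TYPE}/${PVC_NAME}"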
Create secrets
To replicate data to object storage across clusters, you must first instantiate the Secrets inside your Kubernetes cluster. You must use matching keys for the Secret data so that the tool can pull the credentials.

To perform the transfer in an existing namespace, see the following example of creating Secrets in a transfer namespace:
apiVersion: v1
kind: Secret
metadata:
  name: src-secret
  namespace: transfer
data:
  access-key-id: c3JjLWtleQ== # echo -n src-key| base64 -w0
  access-key: c3JjLXNlY3JldA== # echo -n src-secret| base64 -w0
---
apiVersion: v1
kind: Secret
metadata:
  name: dst-secret
  namespace: transfer
data:
  access-key-id: ZHN0LWtleQ== # echo -n dst-key| base64 -w0
  access-key: ZHN0LXNlY3JldA== # echo -n dst-secret| base64 -w0
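Alternatively, the following is a minimal sketch of creating the same Secrets imperatively with kubectl, assuming the transfer namespace already exists and using the example key values from the comments above:

# Create the source and destination credential Secrets without writing manifests.
kubectl create secret generic src-secret -n transfer \
  --from-literal=access-key-id=src-key --from-literal=access-key=src-secret
kubectl create secret generic dst-secret -n transfer \
  --from-literal=access-key-id=dst-key --from-literal=access-key=dst-secret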
Create the Job
With the data that you collected in the previous section, create a Job with the data transfer tool. The data transfer Job has a hostPath mount referencing the path for the PV of interest, and a nodeSelector for the relevant node.
The following is an example of a data transfer Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: transfer-job
  namespace: transfer
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: NODE_NAME # the node where the source Pod is running
      serviceAccountName: data-transfer-sa
      containers:
        - name: storage-transfer-pod
          image: storage-transfer
          command:
            - /storage-transfer
          args:
            - --dst_endpoint=https://your-dst-endpoint.com
            - --src_path=/pvc-data
            - --dst_path=transfer-dst-bucket
            - --dst_credentials=transfer/dst-secret
            - --src_type=local
            - --dst_type=s3
          volumeMounts:
            - mountPath: /pvc-data
              name: pvc-volume
      volumes:
        - name: pvc-volume
          hostPath:
            path: /var/lib/kubelet/pods/POD_UID/volumes/PROVISIONER_TYPE/PVC_NAME
      restartPolicy: Never
As with the S3 data transfer, you must create a Secret containing the access keys for the destination endpoint in the Kubernetes cluster, and the data transfer Job must run with a service account with adequate privileges to read the Secret from the API server. Monitor the status of the transfer with standard kubectl commands operating on the Job.
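For example, a minimal monitoring sketch for this Job, assuming it runs in the transfer namespace as shown:

# Check the Job status and read the transfer logs.
kubectl describe job transfer-job -n transfer
kubectl logs job/transfer-job -n transfer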
Consider the following details when transferring block storage to object storage:
- By default, symbolic links are followed and replicated to object storage, but a deep copy is performed rather than a shallow one. Upon restoration, the symbolic links are destroyed.
- As with object storage replication, cloning into a subdirectory of the bucket is destructive. Ensure that the bucket is available exclusively for your volume.
Restore from object storage to block storage
Allocate a PV
To restore block storage from an object storage endpoint, follow these steps:
Allocate a persistent volume to target in the restore. Use a PVC to allocate the volume, as shown in the following example:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc
  namespace: restore-ns
spec:
  storageClassName: "default"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi # Needs sufficient capacity for a full restoration.
Check the status of the PVC:
kubectl get pvc restore-pvc -n restore-ns
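Optionally, the following is a minimal sketch that blocks until the claim is bound instead of polling; the jsonpath form of kubectl wait requires a reasonably recent kubectl, and the timeout is an assumption:

# Block until the PVC reports the Bound phase.
kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/restore-pvc -n restore-ns --timeout=5m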
After the PVC is in a Bound state, it is ready to be consumed by the Pod that hydrates it.

If a StatefulSet eventually consumes the PV, you must match the rendered StatefulSet PVCs. The Pods that the StatefulSet produces consume the hydrated volumes. The following example shows volume claim templates in a StatefulSet named ss:

volumeClaimTemplates:
  - metadata:
      name: pvc-name
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "default"
      resources:
        requests:
          storage: 1Gi

Pre-allocate PVCs with names such as pvc-name-ss-0 and pvc-name-ss-1 to ensure that the resultant Pods consume the pre-allocated volumes. A sketch of such a claim follows.
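The following is a minimal sketch of the pre-allocated claim for the first replica, assuming the StatefulSet ss and the volume claim template pvc-name shown above; the restore-ns namespace is an assumption:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-name-ss-0 # the claim name that replica 0 of the StatefulSet expects
  namespace: restore-ns
spec:
  storageClassName: "default"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi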
Hydrate the PV
After the PVC is bound to a PV, start the Job to populate the PV:
apiVersion: batch/v1
kind: Job
metadata:
  name: transfer-job
  namespace: transfer
spec:
  template:
    spec:
      serviceAccountName: data-transfer-sa
      volumes:
        - name: data-transfer-restore-volume
          persistentVolumeClaim:
            claimName: restore-pvc # the PVC must be in the same namespace as this Job
      containers:
        - name: storage-transfer-pod
          image: storage-transfer
          command:
            - /storage-transfer
          args:
            - --src_endpoint=https://your-src-endpoint.com
            - --src_path=/your-src-bucket
            - --src_credentials=transfer/src-secret
            - --dst_path=/restore-pv-mnt-path
            - --src_type=s3
            - --dst_type=local
          volumeMounts:
            - mountPath: /restore-pv-mnt-path
              name: data-transfer-restore-volume
      restartPolicy: Never
After the Job has finished running, the data from the object storage bucket
populates the volume. A separate Pod can consume the data by using
the same standard mechanisms for mounting a volume.
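For example, a minimal sketch of a Pod that mounts the restored volume for inspection; the image, command, and mount path are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: restore-consumer
  namespace: restore-ns
spec:
  containers:
    - name: consumer
      image: busybox
      command: ["sh", "-c", "ls -R /restored-data && sleep 3600"] # list the restored files
      volumeMounts:
        - mountPath: /restored-data
          name: restored-volume
  volumes:
    - name: restored-volume
      persistentVolumeClaim:
        claimName: restore-pvc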