The Google Distributed Cloud (GDC) air-gapped appliance device transfers arbitrary data to and from a Google Distributed Cloud air-gapped environment. Transfers can be started manually or scheduled to run automatically at a set interval.
Example transfers:
- Download software updates or updated customer workloads
- Upload customer data, device metrics, or device security, audit, and operations logs
- Back up data snapshots
The storage-transfer tool performs the transfers and is distributed as a container image for running on the cluster.
Data sources
The storage-transfer tool supports several source and destination types to fit the operating conditions of the GDC air-gapped appliance. S3-compatible APIs can access both externally exposed and internal storage targets. The tool also supports local file system and Cloud Storage sources.
The operator is responsible for maintaining control of access keys and any other credentials, secrets, or sensitive data required for authentication to connect GDC air-gapped appliance to external networks. The operator is also responsible for the configuration of the external network.
Refer to create storage buckets for creating and accessing external storage.
Local storage
Local storage is contained in the pod's container environment and includes the temporary file system or mounted volumes. The ServiceAccount bound to the pod must have access to all mount targets when mounting volumes.
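As a sketch, a local source points --src_path at a path inside the container, which is usually a mounted volume; the mount path and volume name below are placeholders (see the full Job example later on this page):
args:
- --src_type=local
- --src_path=/src # Path inside the container
volumeMounts:
- mountPath: /src # Volume mounted at the path that --src_path references
  name: data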
S3 storage
Network-available storage is accessible through the S3-compatible API. The service can be either external or exposed only within the cluster network. You must provide an accessible URL and standardized credentials mounted by using a Kubernetes Secret.
Data defined in multi-node and object storage is accessed through the S3 API. See the relevant sections for setting up multi-node storage and object storage within GDC air-gapped appliance.
Cloud storage
You must provide an accessible URL and standardized credentials mounted by using a Secret.
If accessing a Cloud Storage bucket with uniform bucket-level access, you must set the --bucket_policy_only flag to true.
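For example, a sketch of the destination-side arguments for a Cloud Storage bucket with uniform bucket-level access might look like the following; the bucket path and Secret name are placeholders:
args:
- --dst_type=gcs
- --dst_path=/FULLY_QUALIFIED_BUCKET_NAME/BUCKET_PATH
- --dst_credentials=NAMESPACE/GCS_CREDENTIAL_SECRET_NAME # Secret described in the Credentials section
- --bucket_policy_only=true # Required for buckets with uniform bucket-level access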
Credentials
A Kubernetes Secret is required to use the storage-transfer service with S3 or Cloud Storage (GCS) source or destination definitions. Credentials can be provided with a remote service account or a user account.
When using Secrets in a Job or CronJob definition, the JobSpec must be attached to a Kubernetes ServiceAccount that has access to those Secrets.
Create a ServiceAccount for the transfer to use, and then grant it permission to read and write Secrets by using roles and role bindings. You can skip creating a ServiceAccount if your default namespace ServiceAccount or a custom ServiceAccount already has the required permissions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: transfer-service-account
  namespace: NAMESPACE
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-secrets-role
  namespace: NAMESPACE
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-secrets-rolebinding
  namespace: NAMESPACE
subjects:
- kind: ServiceAccount
  name: transfer-service-account
  namespace: NAMESPACE
roleRef:
  kind: Role
  name: read-secrets-role
  apiGroup: rbac.authorization.k8s.io
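If you saved the manifest above to a file, apply it with kubectl; the file name here is a placeholder:
kubectl apply -f transfer-rbac.yaml
Then reference the ServiceAccount from the pod template of your Job or CronJob by setting serviceAccountName, as shown in the CronJob example later on this page.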
Remote service accounts
To get Cloud Storage service account credentials for a transfer, see https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account. These credentials must be stored in a Secret in the service-account-key field.
Here is an example:
apiVersion: v1
data:
  service-account-key: BASE_64_ENCODED_VERSION_OF_CREDENTIAL_FILE_CONTENTS
kind: Secret
metadata:
  name: gcs-secret
  namespace: NAMESPACE
type: Opaque
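Alternatively, a Secret with the same shape can be created directly from a downloaded key file with kubectl, which base64-encodes the file contents for you; the key file name is a placeholder:
kubectl create secret generic gcs-secret -n NAMESPACE \
  --from-file=service-account-key=SERVICE_ACCOUNT_KEY_FILE.json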
User accounts
You can use a user account for authentication with S3-compatible buckets, not
Cloud Storage buckets. You must specify the --src_type
or --dst_type
argument
as s3
.
kubectl create secret -n NAMESPACE generic S3_CREDENTIAL_SECRET_NAME \
--from-literal=access-key-id=ACCESS_KEY_ID
--from-literal=access-key=ACCESS_KEY
Replace the following:
- NAMESPACE: the name of the namespace in which you create the Job definition.
- S3_CREDENTIAL_SECRET_NAME: the name of the Secret you are creating.
- ACCESS_KEY_ID: the value found in the Access Key field in the Google Cloud console. When configuring for Object Storage, this is called the access-key-id.
- ACCESS_KEY: the value found in the Secret field in the Google Cloud console. When configuring for Object Storage, this is the secret-key or Secret.
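The resulting Secret is then referenced in the transfer arguments in the NAMESPACE/SECRET_NAME format. For example, a sketch of an S3 source definition that uses it; the endpoint and bucket path are placeholders:
args:
- --src_type=s3
- --src_endpoint=https://your-s3-endpoint.example.com
- --src_path=/example-bucket
- --src_credentials=NAMESPACE/S3_CREDENTIAL_SECRET_NAME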
Certificates
Provide certificates for validation in the job with a Kubernetes Secret
containing a ca.crt
data key.
apiVersion: v1
kind: Secret
metadata:
  name: SRC_CERTIFICATE_SECRET_NAME
  namespace: NAMESPACE
data:
  ca.crt: BASE_64_ENCODED_SOURCE_CERTIFICATE
---
apiVersion: v1
kind: Secret
metadata:
  name: DST_CERTIFICATE_SECRET_NAME
  namespace: NAMESPACE
data:
  ca.crt: BASE_64_ENCODED_DESTINATION_CERTIFICATE # Can be the same as or different from the source certificate.
Certificates are provided to the tool by reference with the --src_ca_certificate_reference and --dst_ca_certificate_reference arguments in the format NAMESPACE/SECRET_NAME. For example:
...
containers:
- name: storage-transfer-pod
  image: gcr.io/private-cloud-staging/storage-transfer:latest
  command:
  - /storage-transfer
  args:
  ...
  - --src_ca_certificate_reference=NAMESPACE/SRC_CERTIFICATE_SECRET_NAME
  - --dst_ca_certificate_reference=NAMESPACE/DST_CERTIFICATE_SECRET_NAME
Optional: Define a LoggingTarget to see logs in Loki
By default, logs from Jobs are viewable only through the Kubernetes resources and are not available in the observability stack. To make them viewable in Loki, define a LoggingTarget.
apiVersion: logging.gdc.goog/v1alpha1
kind: LoggingTarget
metadata:
  namespace: NAMESPACE # Same namespace as your transfer job
  name: logtarg1
spec:
  # Choose matching pattern that identifies pods for this job
  # Optional
  # Relationship between different selectors: AND
  selector:
    # Choose pod name prefix(es) to consider for this job
    # Observability platform will scrape all pods
    # where names start with specified prefix(es)
    # Should contain [a-z0-9-] characters only
    # Relationship between different list elements: OR
    matchPodNames:
    - data-transfer-job # Choose the prefix here that matches your transfer job name
  serviceName: transfer-service
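Apply the LoggingTarget in the same namespace as the transfer job; the file name here is a placeholder. Logs from pods whose names start with the configured prefix then become available in the observability stack:
kubectl apply -f logging-target.yaml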
Define a built-in Job
Users manage their own Job resources. For a one-time data transfer, define a Job. The Job creates a Pod that runs the storage-transfer container.
An example Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: data-transfer-job
  namespace: NAMESPACE
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: storage-transfer-pod
        image: gcr.io/private-cloud-staging/storage-transfer:latest
        command:
        - /storage-transfer
        args:
        - --src_path=/src
        - --src_type=local
        - --dst_endpoint=https://your-dst-endpoint.com
        - --dst_credentials=NAMESPACE/CREDENTIAL_SECRET_NAME
        - --dst_path=/FULLY_QUALIFIED_BUCKET_NAME/BUCKET_PATH
        - --dst_ca_certificate_reference=NAMESPACE/DST_CERTIFICATE_SECRET_NAME
        - --dst_type=gcs
        - --bucket_policy_only=true
        - --bandwidth_limit=10M # Optional: bandwidth limit of the form '10K', '100M', or '1G' bytes per second
        volumeMounts:
        - mountPath: /src
          name: data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: data-transfer-source
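A typical way to run and monitor this Job is with standard kubectl commands; the manifest file name here is a placeholder:
kubectl apply -f data-transfer-job.yaml
kubectl wait --for=condition=complete --timeout=2h -n NAMESPACE job/data-transfer-job
kubectl logs -n NAMESPACE job/data-transfer-job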
Define a built-in CronJob
Users manage their own CronJob resources. A CronJob allows regularly scheduled, automatic data transfers.
An example CronJob that performs an automated data transfer:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: data-transfer-cronjob
  namespace: NAMESPACE
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: data-transfer-sa
          containers:
          - name: storage-transfer-pod
            image: gcr.io/private-cloud-staging/storage-transfer:latest
            command:
            - /storage-transfer
            args:
            - --src_path=LOCAL_PATH
            - --src_type=local
            - --dst_endpoint=https://your-dst-endpoint.com
            - --dst_credentials=NAMESPACE/CREDENTIAL_SECRET_NAME
            - --dst_path=/FULLY_QUALIFIED_BUCKET_NAME/BUCKET_PATH
            - --dst_type=gcs
            - --bucket_policy_only=true
            volumeMounts:
            - mountPath: LOCAL_PATH
              name: source
          restartPolicy: Never
          volumes:
          - name: source
            persistentVolumeClaim:
              claimName: data-transfer-source
Google recommends setting concurrencyPolicy to Forbid to prevent data contention.
The CronJob, Secret, and PersistentVolumeClaim must be in the same namespace.
Prioritize data jobs
You can set priority on data jobs in several ways that are not mutually exclusive. For lower-priority transfers, you can set less frequent job schedules in the CronJob definition.
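For example, standard cron syntax in the schedule field lets a lower-priority transfer run less often than the every-minute schedule shown earlier; this schedule is illustrative only:
spec:
  schedule: "0 2 * * *" # Run the transfer once per day at 02:00
  concurrencyPolicy: Forbid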
Jobs can also be ordered by using init containers (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/), which always run sequentially in the order they are defined; each init container must complete successfully before the next one starts. Use init containers to give higher priority to one transfer, or manage data contention by defining two or more init containers with mirrored source and destination definitions.
An example jobTemplate that achieves an ordered data transfer:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ordered-data-transfer-cronjob
  namespace: NAMESPACE
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          # The main container runs only after all init containers complete successfully.
          - name: job-complete
            image: whalesay
            command: ["sh", "-c", "echo Job Completed."]
          initContainers:
          - name: a-to-b
            image: gcr.io/private-cloud-staging/storage-transfer:latest
            command: [/storage-transfer]
            args:
            - --src_type=s3
            - --src_endpoint=ENDPOINT_A
            - --src_path=/example-bucket
            - --src_credentials=NAMESPACE/CREDENTIAL_SECRET_NAME
            - --dst_type=s3
            - --dst_endpoint=ENDPOINT_B
            - --dst_credentials=NAMESPACE/CREDENTIAL_SECRET_NAME
            - --dst_path=/example-bucket
          - name: b-to-a
            image: gcr.io/private-cloud-staging/storage-transfer:latest
            command: [/storage-transfer]
            args:
            - --src_type=s3
            - --src_endpoint=ENDPOINT_B
            - --src_credentials=NAMESPACE/CREDENTIAL_SECRET_NAME
            - --src_path=/example-bucket
            - --dst_type=s3
            - --dst_endpoint=ENDPOINT_A
            - --dst_credentials=NAMESPACE/CREDENTIAL_SECRET_NAME
            - --dst_path=/example-bucket
The a-to-b container runs before b-to-a. This example achieves both a bisync and job ordering.