Manage object storage

Storage bucket naming guidelines

Bucket names must adhere to the following naming conventions:

  • Be unique within the project. The project appends a unique prefix to the bucket name, which ensures that there are no clashes within the organization. In the unlikely event of a prefix and bucket name clash across organizations, bucket creation fails with a bucket name in use error.
  • Have at least one and no more than 57 characters.
  • Refrain from including any personally identifiable information (PII).
  • Be DNS-compliant.
  • Start with a letter and contain only letters, numbers, and hyphens.

Install the s3cmd CLI tool

s3cmd is a command-line tool for managing object storage.

  1. To download the tool, navigate to the directory where you extracted the GDC bundle.
  2. Run the following commands to extract the s3cmd image, s3cmd.tar.tar.gz, to an empty temporary directory:

    tmpdir=$(mktemp -d)
    
    gdcloud artifacts extract oci/ $tmpdir \
      --image-name=$(gdcloud artifacts tree oci | grep s3cmd.tar | sed 's/^.* //')
    
  3. Use scp to copy the tar file to a client machine where you use s3cmd for object operations, and then extract and install the image as described in the following sections.
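
     For example, the following command copies the archive to the /tmp/gpc-system-tar-files/ directory that the next section assumes; the user and host names are placeholders:

    scp $tmpdir/s3cmd.tar.tar.gz USER@CLIENT_MACHINE:/tmp/gpc-system-tar-files/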

Choose one of the following installation methods to install the s3cmd tool:

Install from the tar file

  1. To unpack the archive and install the s3cmd package, run the following commands. You must have the Python distutils module to install the package. The module is often part of the core Python package or you can install it using your package manager.

    tar xvf /tmp/gpc-system-tar-files/s3cmd.tar.tar.gz
    cd /tmp/gpc-system-tar-files/s3cmd
    sudo python3 setup.py install
    
  2. Optional: Clean up the downloaded files:

    rm /tmp/gpc-system-tar-files/s3cmd.tar.tar.gz
    rm -r /tmp/gpc-system-tar-files/s3cmd
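
To verify that the s3cmd tool installed correctly, check its version:

    s3cmd --version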
    

Install with the Docker image

  1. To install the s3cmd image, run the following commands:

    docker load --input s3cmd-docker.tar
    export S3CFG=/EMPTY_FOLDER_PATH/
    alias s3cmd="docker run -it --net=host --mount=type=bind,source=$S3CFG,target=/g/ s3cmd-docker:latest -c /g/s3cfg"
    
  2. Optional: Clean up the downloaded files:

    rm s3cmd-docker.tar
    
  3. Add the export and alias commands to your .bashrc file so that they persist after you restart the client.
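
     For example, append lines like the following to ~/.bashrc; the folder path is a placeholder:

    export S3CFG=/EMPTY_FOLDER_PATH/
    alias s3cmd="docker run -it --net=host --mount=type=bind,source=$S3CFG,target=/g/ s3cmd-docker:latest -c /g/s3cfg"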

Configure the s3cmd tool

Use the s3cmd tool for object-based operations.

Run the s3cmd --configure command and specify the following:

  1. Access Key: Enter the access key obtained from the secret, as described in Get bucket access credentials.
  2. Secret Key: Enter the secret key obtained from the secret, as described in Get bucket access credentials.
  3. Default Region: Press Enter to accept the default.
  4. S3 Endpoint: Enter the endpoint your Infrastructure Operator (IO) provides.
  5. For DNS-style bucket naming, enter s3://%(bucket).
  6. Optional: Enter an encryption password to protect files in transit.
  7. In Path to GPG, enter /usr/bin/gpg.
  8. Enter Yes to use the HTTPS protocol.
  9. Press Enter to skip entering the proxy server name.
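
After the configuration finishes, s3cmd writes your settings to a configuration file, ~/.s3cfg by default. The following is an illustrative excerpt, using the example credentials from Get bucket access credentials and a placeholder endpoint:

    access_key = 0HX3O0YC2J722EJLPJEO
    secret_key = Rjt1TeySxJhBIRanigT00m2YsB/FRUzwjGBnaXbT
    host_base = OBJECT_STORAGE_ENDPOINT
    host_bucket = s3://%(bucket)
    use_https = True
    gpg_command = /usr/bin/gpg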

Create storage buckets

Before you begin

A project namespace manages bucket resources in the root admin cluster. You must have a project to create a bucket; to create one, see Create a project. You also need the appropriate bucket permissions to perform the following operations; see Grant bucket access.

Create a bucket

To create a bucket, apply a bucket specification to your project namespace:

    kubectl apply -f bucket.yaml

The following is an example of a bucket specification:

    apiVersion: object.gdc.goog/v1alpha1
    kind: Bucket
    metadata:
      name: BUCKET_NAME
      namespace: NAMESPACE_NAME
    spec:
      description: DESCRIPTION
      storageClass: standard-rwo
      bucketPolicy:
        lockingPolicy:
          defaultObjectRetentionDays: RETENTION_DAY_COUNT

For more details, see the Bucket API reference.

List storage buckets

To list all the buckets that you have access to in a given object storage tenant, complete the following steps:

  1. Run one of the following commands to list buckets, either across all namespaces or within a specific project namespace:

    kubectl get buckets --all-namespaces
    kubectl get buckets --namespace NAMESPACE_NAME
    

Delete storage buckets

You can delete storage buckets by using the CLI. Buckets must be empty before you can delete them.

  1. Use the kubectl get or kubectl describe command in the View bucket configuration section to get the fully qualified bucket name.

  2. If the bucket is not empty, empty the bucket:

    s3cmd rm --recursive --force s3://FULLY_QUALIFIED_BUCKET_NAME
    
  3. Delete the empty bucket:

    kubectl delete buckets/BUCKET_NAME --namespace NAMESPACE_NAME
    

View bucket configuration

Use either command to view the configuration details for a bucket:

    kubectl describe buckets/BUCKET_NAME --namespace NAMESPACE_NAME
    kubectl get buckets/BUCKET_NAME --namespace NAMESPACE_NAME -o yaml

Set an object retention period

By default, you can delete objects at any time. Enable object locking with a retention period to prevent all objects in the bucket from being deleted for the specified number of days. You cannot delete a bucket until all object retention periods have elapsed and you delete all objects.

You must enable object locking when creating the bucket. You cannot enable or disable object locking after you create a bucket. However, you can modify the default object retention period.

You can create a bucket with or without enabling object locking. If you've enabled object locking, specifying a default retention period is optional.

To modify the retention period, update the Bucket.spec.bucketPolicy.lockingPolicy.defaultObjectRetentionDays field in the Bucket resource.

The following is an example of updating the field in the Bucket resource:

    apiVersion: object.gdc.goog/v1alpha1
    kind: Bucket
    metadata:
      name: BUCKET_NAME
      namespace: NAMESPACE_NAME
    spec:
      description: "This bucket has a default retention period specified."
      storageClass: standard-rwo
      bucketPolicy:
        lockingPolicy:
          defaultObjectRetentionDays: RETENTION_DAY_COUNT
    ---
    apiVersion: object.gdc.goog/v1alpha1
    kind: Bucket
    metadata:
      name: BUCKET_NAME
      namespace: NAMESPACE_NAME
    spec:
      description: "This bucket enables object locking but does not specify a default retention period."
      storageClass: standard-rwo
      bucketPolicy:
        lockingPolicy: {}
    ---
    apiVersion: object.gdc.goog/v1alpha1
    kind: Bucket
    metadata:
      name: BUCKET_NAME
      namespace: NAMESPACE_NAME
    spec:
      description: "This bucket does not have locking or retention enabled."
      storageClass: standard-rwo

Any updates to the retention period apply to objects created in the bucket after the update. For pre-existing objects, the retention period does not change.

When object locking is enabled, overwriting an object instead adds a new version of the object, and you can retrieve both versions. For details on how to list object versions, see ListObjectVersions in the Amazon Web Services documentation: https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html
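
If the AWS CLI is configured with the same credentials and endpoint (an assumption; this guide otherwise uses s3cmd), a call like the following lists the versions of objects in a bucket. The endpoint is a placeholder:

    aws s3api list-object-versions \
        --endpoint-url https://OBJECT_STORAGE_ENDPOINT \
        --bucket FULLY_QUALIFIED_BUCKET_NAME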

To create a write-once, read-many (WORM) bucket, refer to the WORM Bucket section.

Grant bucket access

You can provide bucket access to other users or service accounts by creating and applying RoleBindings with predefined roles.

Predefined roles

  • project-bucket-object-viewer: This role lets a user list all buckets in the project, list the objects in those buckets, and read objects and object metadata. It does not grant write operations on objects, such as uploading, overwriting, or deleting.

  • project-bucket-object-admin: This role lets a user list all buckets in the project and perform read and write operations on objects, such as uploading, overwriting, or deleting.

  • project-bucket-admin: This role lets users manage all buckets in the given namespace, as well as all the objects in those buckets.

To see a complete list of the permissions granted for these roles, see the preset role permissions section.

To get the permissions that you need to create project role bindings, ask your Project IAM Admin to grant you the Project IAM Admin (project-iam-admin) role.

The following is an example of creating a RoleBinding for granting access to a user and a service account:

  1. Create a YAML file on your system, such as rolebinding-object-admin-all-buckets.yaml.

     apiVersion: rbac.authorization.k8s.io/v1
     kind: RoleBinding
     metadata:
       namespace: NAMESPACE_NAME
       name: readwrite-all-buckets
     roleRef:
       kind: Role
       name: project-bucket-object-admin
       apiGroup: rbac.authorization.k8s.io
     subjects:
     - kind: ServiceAccount
       namespace: NAMESPACE_NAME
       name: SA_NAME
     - kind: User
       name: bob@example.com  # Could be bob or bob@example.com based on your organization settings.
       apiGroup: rbac.authorization.k8s.io
    
  2. Apply the YAML file:

    kubectl apply -f rolebinding-object-admin-all-buckets.yaml
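
To confirm that the binding was created, describe it in the project namespace:

    kubectl describe rolebinding readwrite-all-buckets --namespace NAMESPACE_NAME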
    

Get bucket access credentials

When you grant access to a bucket, the access credentials are created in a Secret.

The format of the secret name is object-storage-key-SUBJECT_TYPE-SUBJECT_HASH.

  • Values for SUBJECT_TYPE are the following:
    • user: the user.
    • sa: the ServiceAccount.
  • SUBJECT_HASH is the base32-encoded SHA256 hash of the subject name.

As an example, the user bob@foo.com has the secret named:

object-storage-key-user-oy6jdqd6bxfoqcecn2ozv6utepr5bgh355vfku7th5pmejqubdja
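
As a sketch of how this hash can be derived (an assumption: a SHA-256 digest encoded as lowercase, unpadded base32; verify against your environment), you can compute it in a shell with GNU coreutils and xxd:

    printf '%s' "bob@foo.com" \
        | sha256sum | cut -d' ' -f1 \
        | xxd -r -p | base32 \
        | tr -d '=' | tr '[:upper:]' '[:lower:]'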

Access the user secret

For a user subject, the Secret is in the object-storage-access-keys namespace in the root admin cluster.

  1. Find the secret name:

    kubectl auth can-i --list --namespace object-storage-access-keys | grep object-storage-key-
    

    You receive an output similar to the following:

    secrets        []        [object-storage-key-nl-user-oy6jdqd6bxfoqcecn2ozv6utepr5bgh355vfku7th5pmejqubdja,object-storage-key-std-user-oy6jdqd6bxfoqcecn2ozv6utepr5bgh355vfku7th5pmejqubdja]        [get]
    
  2. Get the contents of the corresponding Secret to access buckets:

    kubectl get -o yaml --namespace object-storage-access-keys secret \
        object-storage-key-rm-user-oy6jdqd6bxfoqcecn2ozv6utepr5bgh355vfku7th5pmejqubdja
    

    You receive an output similar to the following:

    data:
      access-key-id: MEhYM08wWUMySjcyMkVKTFBKRU8=
      create-time: MjAyMi0wNy0yMiAwMTowODo1OS40MTQyMTE3MDMgKzAwMDAgVVRDIG09KzE5OTAuMzQ3OTE2MTc3
      secret-access-key: Ump0MVRleVN4SmhCSVJhbmlnVDAwbTJZc0IvRlJVendqR0JuYVhiVA==
    
  3. Decode the access key ID and secret:

    echo "MEhYM08wWUMySjcyMkVKTFBKRU8=" | base64 -d \
        && echo \
        && echo "Ump0MVRleVN4SmhCSVJhbmlnVDAwbTJZc0IvRlJVendqR0JuYVhiVA==" | base64 -d
    

    You receive an output similar to the following:

    0HX3O0YC2J722EJLPJEO
    Rjt1TeySxJhBIRanigT00m2YsB/FRUzwjGBnaXbT
    
  4. Follow the Configure the s3cmd tool section with the resulting information.

Access the service account secret

For a service account (SA) subject, the Secret is in the same namespace as the bucket. To find the name, run:

  kubectl get --namespace NAMESPACE_NAME secrets \
      -o=jsonpath='{.items[?(@.metadata.annotations.object\.gdc\.goog/subject=="SA_NAME")].metadata.name}'

You receive an output similar to the following:

  object-storage-key-rm-sa-mng3olp3vsynhswzasowzu3jgzct2ert72pjp6wsbzqhdwckwzbq

You can reference the Secret in your pod as environment variables (https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables) or files (https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-files-from-a-pod).
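
For example, the following minimal sketch exposes the keys to a container as environment variables; the Pod name and image are placeholders, and the data keys access-key-id and secret-access-key match the secret format shown earlier:

    apiVersion: v1
    kind: Pod
    metadata:
      name: s3-client
      namespace: NAMESPACE_NAME
    spec:
      containers:
      - name: app
        image: APP_IMAGE
        env:
        - name: ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: object-storage-key-rm-sa-mng3olp3vsynhswzasowzu3jgzct2ert72pjp6wsbzqhdwckwzbq
              key: access-key-id
        - name: SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: object-storage-key-rm-sa-mng3olp3vsynhswzasowzu3jgzct2ert72pjp6wsbzqhdwckwzbq
              key: secret-access-key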

Preset role permissions

project-bucket-object-viewer permissions

This role grants permissions to get and list objects and objects' metadata in the bucket.

The project-bucket-object-viewer role has the following permissions:

  • Bucket API permissions:

    1. Get
    2. List
    3. Watch
  • S3 object storage permissions:

    1. GetObject
    2. GetObjectAcl
    3. GetObjectVersion
    4. ListBucket
    5. ListBucketVersions
    6. ListBucketMultipartUploads
    7. ListMultipartUploadParts

project-bucket-object-admin permissions

This role grants permissions to put and delete objects, object versions, and tags in the bucket. It also grants all permissions in project-bucket-object-viewer.

The project-bucket-object-admin role has the following object storage permissions:

  • S3 Object storage permissions:

    1. AbortMultipartUpload
    2. DeleteObject
    3. DeleteObjectVersion
    4. PutObject
    5. RestoreObject

project-bucket-admin permissions

This role grants permissions to create, update, or delete Bucket resources in the project namespace. It also grants all permissions in project-bucket-object-admin.

The project-bucket-admin role has the following permissions:

  • Bucket API permissions:

    1. Create
    2. Update
    3. Delete

Create a WORM Bucket

A WORM bucket ensures that objects cannot be overwritten and that they are retained for a minimum period of time. Audit logging is an example use case for a WORM bucket.

Take the following steps to create a WORM bucket:

  1. Set a retention period when creating the bucket. For example, the following bucket has a retention period of 365 days.

    apiVersion: object.gdc.goog/v1alpha1
    kind: Bucket
    metadata:
      name: foo-logging-bucket
      namespace: foo-service
    spec:
      description: "Audit logs for foo"
      storageClass: standard-rwo
      bucketPolicy:
        lockingPolicy:
          defaultObjectRetentionDays: 365
    
  2. Grant the project-bucket-object-viewer role to all users who need read-only access:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      namespace: foo-service
      name: object-readonly-access
    roleRef:
      kind: Role
      name: project-bucket-object-viewer
      apiGroup: rbac.authorization.k8s.io
    subjects:
    - kind: ServiceAccount
      namespace: foo-service
      name: foo-log-processor
    - kind: User
      name: bob@example.com
      apiGroup: rbac.authorization.k8s.io
    
  3. Grant the project-bucket-object-admin role to users who need to write content to the bucket:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      namespace: foo-service
      name: object-write-access
    roleRef:
      kind: Role
      name: project-bucket-object-admin
      apiGroup: rbac.authorization.k8s.io
    subjects:
    - kind: ServiceAccount
      namespace: foo-service
      name: foo-service-account
    

Restore from object storage to file system on block storage

Allocate a persistent volume

To restore files from an object storage endpoint, follow these steps:

  1. Allocate a persistent volume (PV) to target in the restore. Use a persistent volume claim (PVC) to allocate the volume, as shown in the following example:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: restore-pvc
      namespace: restore-ns
    spec:
      storageClassName: standard-rwo
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi # Need sufficient capacity for full restoration.
    
  2. Check the status of the PVC:

    kubectl get pvc restore-pvc -n restore-ns
    

    After the PVC reaches the Bound state, the Pod that rehydrates the volume can consume it.

  3. If a StatefulSet eventually consumes the PV, you must match the PVC names that the StatefulSet renders. The Pods that the StatefulSet produces consume the hydrated volumes. The following example shows the volume claim templates in a StatefulSet named ss.

      volumeClaimTemplates:
      - metadata:
          name: pvc-name
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: standard-rwo
          resources:
            requests:
              storage: 1Gi
    
  4. Pre-allocate PVCs with names such as pvc-name-ss-0 and pvc-name-ss-1, following the StatefulSet naming convention <template-name>-<statefulset-name>-<ordinal>, to ensure that the resulting Pods consume the pre-allocated volumes, as shown in the following sketch.
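
     The following is a sketch of a pre-allocated PVC for the first replica, reusing the claim settings from the volume claim template above; the namespace is illustrative:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc-name-ss-0  # <template-name>-<statefulset-name>-<ordinal>
      namespace: restore-ns
    spec:
      storageClassName: standard-rwo
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi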

Hydrate the Persistent Volume (PV)

After the PVC is bound to a PV, start the Job to populate the PV:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: transfer-job
      namespace: transfer
    spec:
      template:
        spec:
          serviceAccountName: data-transfer-sa
          restartPolicy: Never  # A Job requires Never or OnFailure.
          volumes:
          - name: data-transfer-restore-volume
            persistentVolumeClaim:
              claimName: restore-pvc
          containers:
          - name: storage-transfer-pod
            image: gcr.io/private-cloud-staging/storage-transfer:latest
            command: ["/storage-transfer"]
            args:
            - --src_endpoint=https://your-src-endpoint.com
            - --src_path=/your-src-bucket
            - --src_credentials=transfer/src-secret
            - --dst_path=/restore-pv-mnt-path
            - --src_type=s3
            - --dst_type=local
            volumeMounts:
            - mountPath: /restore-pv-mnt-path
              name: data-transfer-restore-volume
After the Job has finished running, the data from the object storage bucket populates the volume. A separate pod can consume the data by using the same standard mechanisms for mounting a volume.
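
To block until the transfer completes before starting a consumer, you can wait on the Job's completion condition:

    kubectl wait --for=condition=complete job/transfer-job --namespace transfer --timeout=30m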