Mount Cloud Storage buckets as persistent volumes


This guide shows you how to use Kubernetes persistent volumes backed by your Cloud Storage buckets to manage storage resources for your Kubernetes Pods on Google Kubernetes Engine (GKE). Consider using this storage option if you are already familiar with PersistentVolumes and want consistency with your existing deployments that rely on this resource type.

This guide is for Platform admins and operators who want to simplify storage management for their GKE applications.

Before reading this page, ensure you're familiar with Kubernetes persistent volumes, Kubernetes Pods, and Cloud Storage buckets.

If you want a streamlined Pod-based interface that requires no previous experience with Kubernetes persistent volumes, see Mount Cloud Storage buckets as CSI ephemeral volumes.

Before you begin

Make sure you have completed these prerequisites: enable the Cloud Storage FUSE CSI driver on your cluster, and configure access to the Cloud Storage buckets that you want to mount.

How persistent volumes for Cloud Storage buckets work

With static provisioning, you create one or more PersistentVolume objects containing the details of the underlying storage system. Pods in your clusters can then consume the storage through PersistentVolumeClaims.

Using a persistent volume backed by a Cloud Storage bucket involves these operations:

  1. Storage definition: You define a PersistentVolume in your GKE cluster, including the CSI driver to use and any required parameters. For the Cloud Storage FUSE CSI driver, you specify the bucket name and other relevant details.

    Optionally, you can fine-tune the performance of your CSI driver by using the file caching feature. File caching can boost GKE app performance by caching frequently accessed Cloud Storage files on a faster local disk.

    Additionally, you can use the parallel download feature to accelerate reading large files from Cloud Storage by using multiple workers to download a file in parallel. This feature can improve model load times, especially for reads of over 1 GB in size.

  2. Driver invocation: When a PersistentVolumeClaim requests storage matching the PersistentVolume's specification, GKE invokes the Cloud Storage FUSE CSI driver.

  3. Bucket mounting: The CSI driver mounts the bucket to the node where the requesting Pod is scheduled. This makes the bucket's contents accessible to the Pod as a directory in the Pod's local file system. To fine-tune how buckets are mounted in the file system, you can use mount options. You can also use volume attributes to configure specific behavior of the Cloud Storage FUSE CSI driver.

  4. Re-attachment: If the Pod restarts or is rescheduled to another node, the CSI driver remounts the same bucket to the new node, ensuring data accessibility.
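To make the chain of operations above concrete, the fragment below sketches how the three objects reference each other: the Pod's volume names a claim, and the claim binds to the PersistentVolume that identifies the bucket. The manifests are trimmed to the linking fields only; the object names match the full examples later in this guide.

    ```yaml
    # Sketch of the object chain: Pod -> PersistentVolumeClaim -> PersistentVolume -> bucket.
    # Trimmed to the fields that link the objects together.
    kind: PersistentVolume
    spec:
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeHandle: BUCKET_NAME          # the Cloud Storage bucket
      claimRef:
        name: gcs-fuse-csi-static-pvc      # pre-binds this PV to the claim below
    ---
    kind: PersistentVolumeClaim
    metadata:
      name: gcs-fuse-csi-static-pvc
    ---
    kind: Pod
    spec:
      volumes:
      - name: gcs-fuse-csi-static
        persistentVolumeClaim:
          claimName: gcs-fuse-csi-static-pvc   # the Pod consumes the claim
    ```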

Create a PersistentVolume

  1. Create a PersistentVolume manifest with the following specification:

    Pod

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: gcs-fuse-csi-pv
    spec:
      accessModes:
      - ReadWriteMany
      capacity:
        storage: 5Gi
      storageClassName: example-storage-class  
      mountOptions:
        - implicit-dirs
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeHandle: BUCKET_NAME
      claimRef:
        name: gcs-fuse-csi-static-pvc
        namespace: NAMESPACE  
    

    Replace the following values:

    • NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
    • BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.
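For example, a csi block that uses dynamic mounting might look like the following fragment; the rest of the PersistentVolume stays the same as in the example manifest:

    ```yaml
    # Fragment only: an underscore as the volumeHandle mounts every bucket
    # that the Pod's Kubernetes ServiceAccount can access.
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeHandle: "_"
    ```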

    The example manifest shows these required settings:

    • spec.csi.driver: use gcsfuse.csi.storage.gke.io as the CSI driver name.

    Optionally, you can adjust these variables:

    • spec.mountOptions: Pass mount options to Cloud Storage FUSE. Specify the flags in one string separated by commas, without spaces.
    • spec.csi.volumeAttributes: Pass additional volume attributes to Cloud Storage FUSE.
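As an illustration, both optional fields can be set in the same manifest. The values below are examples only: uid is one of Cloud Storage FUSE's mount options, and gcsfuseLoggingSeverity is the volume attribute mentioned in the troubleshooting section; the options and attributes available depend on your driver version.

    ```yaml
    # Fragment only: example mount options and volume attributes.
    spec:
      mountOptions:
      - implicit-dirs
      - uid=1001                           # example mount option; check your driver's supported flags
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeHandle: BUCKET_NAME
        volumeAttributes:
          gcsfuseLoggingSeverity: trace    # example volume attribute
    ```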

    Pod (file caching)

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: gcs-fuse-csi-pv
    spec:
      accessModes:
      - ReadWriteMany
      capacity:
        storage: 5Gi
      storageClassName: example-storage-class 
      mountOptions:
        - implicit-dirs
        - file-cache:max-size-mb:-1
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeHandle: BUCKET_NAME
      claimRef:
        name: gcs-fuse-csi-static-pvc
        namespace: NAMESPACE 
    

    Replace the following values:

    • NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
    • BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.

    Pod (parallel download)

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: gcs-fuse-csi-pv
    spec:
      accessModes:
      - ReadWriteMany
      capacity:
        storage: 5Gi
      storageClassName: example-storage-class 
      mountOptions:
        - implicit-dirs
        - file-cache:enable-parallel-downloads:true
        - file-cache:max-size-mb:-1
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeHandle: BUCKET_NAME
      claimRef:
        name: gcs-fuse-csi-static-pvc
        namespace: NAMESPACE 
    

    Replace the following values:

    • NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
    • BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.
  2. Apply the manifest to the cluster:

    kubectl apply -f PV_FILE_PATH
    

    Replace PV_FILE_PATH with the path to your YAML file.

Create a PersistentVolumeClaim

  1. Create a PersistentVolumeClaim manifest with the following specification:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: gcs-fuse-csi-static-pvc
      namespace: NAMESPACE
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
      storageClassName: example-storage-class
    

    Replace NAMESPACE with the Kubernetes namespace where you want to deploy your Pod.

    To bind your PersistentVolume to a PersistentVolumeClaim, check these configuration settings:

    • The spec.storageClassName fields in your PersistentVolume and PersistentVolumeClaim manifests must match. The storageClassName doesn't need to refer to an existing StorageClass object; you can use any name to bind the claim to the volume, but it can't be empty.
    • The spec.accessModes fields in your PersistentVolume and PersistentVolumeClaim manifests must match.
    • The spec.capacity.storage field in your PersistentVolume manifest must match spec.resources.requests.storage in your PersistentVolumeClaim manifest. Because Cloud Storage buckets don't have size limits, you can use any number for the capacity, but it can't be empty.
  2. Apply the manifest to the cluster:

    kubectl apply -f PVC_FILE_PATH
    

    Replace PVC_FILE_PATH with the path to your YAML file.
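Before applying, you can sanity-check the binding rules from the previous step with standard shell tools. This sketch compares trimmed copies of the two example manifests; in practice you would point the script at your own manifest files.

    ```shell
    # Sanity-check the three PV/PVC binding rules before running kubectl apply.
    # The manifests below are trimmed copies of the examples in this guide.
    pv_file=$(mktemp); pvc_file=$(mktemp)

    cat > "$pv_file" <<'EOF'
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: gcs-fuse-csi-pv
    spec:
      accessModes:
      - ReadWriteMany
      capacity:
        storage: 5Gi
      storageClassName: example-storage-class
    EOF

    cat > "$pvc_file" <<'EOF'
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: gcs-fuse-csi-static-pvc
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
      storageClassName: example-storage-class
    EOF

    # Rule 1: storageClassName must match.
    pv_sc=$(awk '/storageClassName:/ {print $2}' "$pv_file")
    pvc_sc=$(awk '/storageClassName:/ {print $2}' "$pvc_file")
    [ "$pv_sc" = "$pvc_sc" ] && echo "storageClassName matches: $pv_sc"

    # Rule 2: accessModes must match.
    pv_am=$(awk '/- ReadWrite/ {print $2}' "$pv_file")
    pvc_am=$(awk '/- ReadWrite/ {print $2}' "$pvc_file")
    [ "$pv_am" = "$pvc_am" ] && echo "accessModes match: $pv_am"

    # Rule 3: PV capacity must match the PVC storage request.
    pv_cap=$(awk '/storage:/ {print $2}' "$pv_file")
    pvc_req=$(awk '/storage:/ {print $2}' "$pvc_file")
    [ "$pv_cap" = "$pvc_req" ] && echo "capacity matches: $pv_cap"

    rm -f "$pv_file" "$pvc_file"
    ```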

Consume the volume in a Pod

  1. Create a Pod manifest with the following specification:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gcs-fuse-csi-example-static-pvc  
      namespace: NAMESPACE
      annotations:
        gke-gcsfuse/volumes: "true"
        gke-gcsfuse/ephemeral-storage-limit: "50Gi" 
    spec:
      containers:
      - image: busybox
        name: busybox
        command: ["sleep"]
        args: ["infinity"]  
        volumeMounts:
        - name: gcs-fuse-csi-static
          mountPath: /data
          readOnly: true
      serviceAccountName: KSA_NAME
      volumes:
      - name: gcs-fuse-csi-static
        persistentVolumeClaim:
          claimName: gcs-fuse-csi-static-pvc
          readOnly: true  
    

    Replace the following values:

    • NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
    • KSA_NAME: the Kubernetes ServiceAccount that you configured with access to the Cloud Storage bucket.

    The example manifest shows these required settings:

    • metadata.annotations gke-gcsfuse/volumes: set this annotation to "true" to enable the Cloud Storage FUSE CSI driver for the Pod.

    Optionally, you can adjust these variables:

    • metadata.annotations gke-gcsfuse/ephemeral-storage-limit: the ephemeral storage limit for the Cloud Storage FUSE sidecar container.
    • spec.containers[n].volumeMounts[n].readOnly: Specify true if only specific volume mounts are read-only.
    • spec.volumes[n].persistentVolumeClaim.readOnly: Specify true if all volume mounts are read-only.
  2. Apply the manifest to the cluster:

    kubectl apply -f POD_FILE_PATH
    

    Replace POD_FILE_PATH with the path to your YAML file.

Troubleshoot issues

If you need to troubleshoot Cloud Storage FUSE issues, you can set the log-severity flag to TRACE. You set the flag in the args section of the driver's container spec in the deployment YAML. This causes the gcsfuseLoggingSeverity volume attribute to be automatically set to trace.
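For example, the args change described above might look like the following fragment. The container name here is a placeholder; match it to the driver container in your deployment YAML.

    ```yaml
    # Fragment only: enabling trace logging on the CSI driver container.
    # The container name is a placeholder for illustration.
    spec:
      containers:
      - name: gcs-fuse-csi-driver
        args:
        - --log-severity=trace
    ```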

For additional troubleshooting tips, see Troubleshooting Guide in the GitHub project documentation.

What's next