About Cloud Storage FUSE CSI driver for GKE


This overview introduces the Cloud Storage FUSE CSI driver for mounting Cloud Storage buckets as local file systems in Google Kubernetes Engine (GKE). This feature is particularly useful for machine learning workloads that need to store training data, models, and checkpoints in Cloud Storage.

This overview is for developers and data scientists who want to read or write training data, inference model weights, and checkpoints in Cloud Storage from their Kubernetes applications.

Before reading this page, ensure you're familiar with Kubernetes, GKE, and Cloud Storage.

How it works

The driver uses the Container Storage Interface (CSI) standard to allow your applications running in Pods to seamlessly access Cloud Storage buckets as if they were mounted filesystems. This means you can treat your Cloud Storage buckets as a persistent and scalable data source for your Kubernetes applications without complex configuration or code changes.

The Cloud Storage FUSE CSI driver provides a fully-managed experience powered by the open source Google Cloud Storage FUSE CSI plugin. The CSI driver lets you use the Kubernetes API to consume pre-existing Cloud Storage buckets as volumes. Your applications can upload and download objects using Cloud Storage FUSE file system semantics.

Filesystem in Userspace (FUSE) is an interface used to export a file system to the Linux kernel. Cloud Storage FUSE lets you mount Cloud Storage buckets as a file system so that applications can access the objects in a bucket using common file I/O operations (for example, open, read, write, and close) rather than cloud-specific APIs.
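As a minimal sketch of what "common file I/O operations" means in practice, the following Python snippet reads and writes files under a mount path using only the standard library. The path and helper names are hypothetical, and a temporary directory stands in for the bucket mount so that the snippet runs anywhere; inside a Pod, the path would be wherever the bucket-backed volume is mounted.

```python
import os
import tempfile

# MOUNT_PATH is a hypothetical mount point for this sketch. In a Pod it would
# be the mountPath of the bucket-backed volume (for example, "/data").
# A temporary directory stands in for it here so the code runs anywhere.
MOUNT_PATH = tempfile.mkdtemp(prefix="bucket-mount-")


def write_file(relative_path: str, payload: bytes) -> str:
    """Write bytes to a file under the mount using ordinary file I/O."""
    full_path = os.path.join(MOUNT_PATH, relative_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as f:
        f.write(payload)
    return full_path


def read_file(relative_path: str) -> bytes:
    """Read a file under the mount using ordinary file I/O."""
    with open(os.path.join(MOUNT_PATH, relative_path), "rb") as f:
        return f.read()


# No Cloud Storage client library is involved: only open, read, write, close.
write_file("checkpoints/step-100.bin", b"\x00\x01\x02")
print(read_file("checkpoints/step-100.bin"))
print(os.listdir(os.path.join(MOUNT_PATH, "checkpoints")))
```

With a real mount, the same calls would operate on objects in the bucket, subject to Cloud Storage FUSE file system semantics.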

The driver natively supports the following ways for you to configure your Cloud Storage-backed volumes:

  • CSI ephemeral volumes: You specify the Cloud Storage bucket in-line with the Pod specification. Use ephemeral CSI volumes if you want a streamlined Pod-based interface that requires no previous experience with Kubernetes persistent volumes. To use this option, see Mount Cloud Storage buckets as CSI ephemeral volumes.
  • PersistentVolumes: You create a PersistentVolume resource that refers to the Cloud Storage bucket, using static provisioning. Your Pod can then reference a PersistentVolumeClaim that is bound to this PersistentVolume. Use this option if you are already familiar with PersistentVolumes and want consistency with your existing deployments that rely on this resource type. To use this option, see Mount Cloud Storage buckets as persistent volumes.
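The two options can be sketched as manifests built in Python and printed as JSON, which kubectl accepts in place of YAML. This is an illustrative sketch, not a complete setup: the driver name, the `gke-gcsfuse/volumes` annotation, and the `bucketName` and `volumeHandle` fields follow the driver's documented conventions as best understood here and should be checked against the mounting guides for your GKE version. All resource names, the image, the service account, the storage class name, and the capacity value are placeholders.

```python
import json

# Assumed driver name; verify against the GKE documentation.
CSI_DRIVER = "gcsfuse.csi.storage.gke.io"


def ephemeral_pod(bucket: str) -> dict:
    """Pod that names the bucket in-line as a CSI ephemeral volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "gcsfuse-example",  # placeholder name
            # Assumed annotation that asks GKE to inject the sidecar.
            "annotations": {"gke-gcsfuse/volumes": "true"},
        },
        "spec": {
            "serviceAccountName": "example-ksa",  # placeholder
            "containers": [{
                "name": "app",
                "image": "busybox",
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "bucket", "mountPath": "/data"}],
            }],
            "volumes": [{
                "name": "bucket",
                "csi": {
                    "driver": CSI_DRIVER,
                    "volumeAttributes": {"bucketName": bucket},
                },
            }],
        },
    }


def static_pv_and_claim(bucket: str) -> list:
    """Statically provisioned PersistentVolume plus a claim bound to it."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": "gcsfuse-pv"},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "capacity": {"storage": "5Gi"},  # placeholder; the field is required by the API
            "storageClassName": "example-storage-class",
            "csi": {"driver": CSI_DRIVER, "volumeHandle": bucket},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "gcsfuse-pvc"},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": "5Gi"}},
            "storageClassName": "example-storage-class",
            "volumeName": "gcsfuse-pv",  # binds the claim to the PV above
        },
    }
    return [pv, pvc]


print(json.dumps(ephemeral_pod("my-bucket"), indent=2))
print(json.dumps(static_pv_and_claim("my-bucket"), indent=2))
```

In the ephemeral case the bucket appears directly in the Pod specification. In the PersistentVolume case the Pod would instead reference the claim by name through a `persistentVolumeClaim` volume source.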

Use cases

The Cloud Storage FUSE CSI driver is suitable for scenarios like the following:

AI and machine learning

  • Training: You can use the Cloud Storage FUSE CSI driver to read training data and checkpoint saved models using Cloud Storage as the source of truth. For example, when training a model on GKE using PyTorch, JAX, or TensorFlow, the driver can provide access to training datasets stored in Cloud Storage buckets.
  • Inference: You can serve ML inference models that infer results from files stored in Cloud Storage. You can use the Cloud Storage FUSE CSI driver to preload model weights stored in Cloud Storage. Additionally, you can use the parallel download feature, which uses multi-threaded downloads to accelerate reading large files from Cloud Storage. This feature can improve model load times, especially for reads over 1 GB in size.

Data analytics pipelines

You can use the Cloud Storage FUSE CSI driver to streamline data processing tasks by allowing applications to directly access and analyze large datasets stored in Cloud Storage. For example, a Spark job running on GKE could use the CSI driver to process data stored in Cloud Storage without needing to download it first.

Benefits

Using the CSI driver gives you these benefits:

  • Easy to set up: The driver is deployed and managed automatically on both Standard and Autopilot clusters. Using CSI ephemeral volumes simplifies volume configuration and management because you don't need PersistentVolumeClaim and PersistentVolume objects.
  • Security: The Cloud Storage FUSE CSI driver does not need privileged access. This minimizes the risks associated with privileged access and leads to a better security posture. You can use Workload Identity Federation for GKE to manage authentication, giving you granular control over how your Pods access Cloud Storage objects.
  • Performance: The Cloud Storage FUSE CSI driver enhances performance through features like a sidecar for optimized interactions, parallel downloads for faster data access, and metadata and file caching to improve read performance and reduce latency. To learn more about these features, see Performance tuning options and features.
  • Portability and flexibility: The Cloud Storage FUSE CSI driver lets you use standard file system semantics to mount and access Cloud Storage buckets. This provides a familiar interface that improves portability for ML workloads, and avoids the need for you to make extensive code or application changes. The driver is supported on all accelerators available on GKE including GPUs and TPUs. The Cloud Storage FUSE CSI driver supports the ReadWriteMany, ReadOnlyMany, and ReadWriteOnce access modes. You can consume Cloud Storage FUSE volumes in init containers.
  • Manageability: The driver lets you run Cloud Storage FUSE under the covers without needing to install or manage it. You can also view metrics insights for Cloud Storage FUSE, including file system, Cloud Storage, and file cache usage.

Performance tuning options and features

The Cloud Storage FUSE CSI driver comes with several performance tuning options and features for optimizing how your Pods access data stored in Cloud Storage buckets.

For example, by enabling file caching and adjusting the request concurrency, you could significantly reduce the time it takes to load the training data, leading to faster training times.

  • Native sidecar: The Cloud Storage FUSE CSI driver attaches a sidecar container in your Pods to manage interactions with Cloud Storage. The sidecar handles the mounting and interaction with Cloud Storage, allowing your applications to seamlessly access data. You can fine-tune performance by configuring resources like CPU and memory for the sidecar container, or by adjusting settings related to caching and buffering. The Cloud Storage FUSE CSI driver sidecar container and Istio can coexist and run concurrently in your Pod.

  • Parallel download: Starting from GKE version 1.30.3-gke.1571000 and Cloud Storage FUSE v2.4.0, with file cache enabled, you can use the parallel download feature, which uses multi-threaded downloads to accelerate reading large files from Cloud Storage. This feature can improve model load times, especially for reads over 1 GB in size (for example, up to twice as fast when loading Llama 2 70B).

  • Metadata caching support: The Cloud Storage FUSE CSI driver enhances performance by caching file metadata, like size and modification time. The CSI driver enables this stat cache by default, which reduces latency by storing metadata locally instead of repeatedly requesting it from Cloud Storage. You can configure the cache's maximum size and how long entries stay in it. Fine-tuning the metadata cache reduces API calls to Cloud Storage, which cuts network traffic and latency for your application.

  • File caching support: You can use the Cloud Storage FUSE CSI driver with file caching to improve the read performance of applications handling small files from Cloud Storage buckets. The Cloud Storage FUSE file cache feature is a client-based read cache that allows repeated file reads to be served more quickly from cache storage of your choice. You can choose from a range of storage options for the read cache, including Local SSDs, Persistent Disk-based storage, and RAM disk, based on your price-performance needs.
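The tuning options above are typically expressed as volume attributes, mount options, and Pod annotations. The following Python sketch assembles such a configuration and enforces the one dependency stated in this section: parallel download requires the file cache to be enabled. The attribute, mount option, and annotation names used here (`fileCacheCapacity`, `metadataCacheTTLSeconds`, `file-cache:enable-parallel-downloads:true`, `gke-gcsfuse/cpu-limit`, `gke-gcsfuse/memory-limit`) are assumptions based on the driver's documented conventions and should be verified against the performance tuning guide for your GKE version. The function names and values are hypothetical.

```python
# Assumed driver name; verify against the GKE documentation.
CSI_DRIVER = "gcsfuse.csi.storage.gke.io"


def tuned_volume(bucket, file_cache_capacity=None,
                 parallel_download=False, metadata_ttl_seconds=None):
    """Build a CSI ephemeral volume entry with optional caching settings."""
    if parallel_download and file_cache_capacity is None:
        # Parallel download works on top of the file cache.
        raise ValueError("parallel download requires the file cache to be enabled")

    attributes = {"bucketName": bucket}
    mount_options = []
    if file_cache_capacity is not None:
        attributes["fileCacheCapacity"] = file_cache_capacity  # assumed key
    if parallel_download:
        # Assumed mount option name.
        mount_options.append("file-cache:enable-parallel-downloads:true")
    if metadata_ttl_seconds is not None:
        # Assumed key; volume attribute values must be strings.
        attributes["metadataCacheTTLSeconds"] = str(metadata_ttl_seconds)
    if mount_options:
        attributes["mountOptions"] = ",".join(mount_options)

    return {
        "name": "bucket",
        "csi": {"driver": CSI_DRIVER, "volumeAttributes": attributes},
    }


def sidecar_annotations(cpu_limit, memory_limit):
    """Pod annotations that size the sidecar container (assumed names)."""
    return {
        "gke-gcsfuse/volumes": "true",
        "gke-gcsfuse/cpu-limit": cpu_limit,
        "gke-gcsfuse/memory-limit": memory_limit,
    }


volume = tuned_volume("my-bucket", file_cache_capacity="10Gi",
                      parallel_download=True, metadata_ttl_seconds=600)
print(volume["csi"]["volumeAttributes"])
print(sidecar_annotations("500m", "1Gi"))
```

Keeping the dependency check in one place means a misconfiguration fails when the manifest is built rather than at mount time.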

For performance tuning best practices, refer to Optimize Cloud Storage FUSE CSI driver for GKE performance.

Limitations

The CSI driver has these limitations:

Requirements

To use the Cloud Storage FUSE CSI driver, your clusters must meet the following GKE version requirements:

To use specific features for the Cloud Storage FUSE CSI driver, you also need to meet these requirements:

Feature | GKE version requirements
Private image for sidecar container, custom write buffer volume, and sidecar container resource requests | 1.27.10-gke.1055000, 1.28.6-gke.1369000, 1.29.1-gke.1575000, or later
File cache, volume attributes | 1.27.12-gke.1190000, 1.28.8-gke.1175000, 1.29.3-gke.1093000, or later
Cloud Storage FUSE volumes in init containers | 1.29.3-gke.1093000 or later, with all nodes on GKE version 1.29 or later
Parallel download | 1.29.6-gke.1254000, 1.30.2-gke.1394000, or later
Cloud Storage FUSE metrics | 1.31.1-gke.1621000 or later
Metadata prefetch | 1.31.3-gke.1162000 or later
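One way to check a cluster version against these requirements is sketched below in Python. The version strings are copied from the table above. The interpretation of "or later" is an assumption of this sketch: a cluster on a listed minor version must be at or above that minor's listed patch, and a cluster on a minor version newer than every listed one qualifies. The function names and feature keys are hypothetical.

```python
import re

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")

# Minimum versions per feature, copied from the requirements table.
FEATURE_MINIMUMS = {
    "file_cache": ["1.27.12-gke.1190000", "1.28.8-gke.1175000", "1.29.3-gke.1093000"],
    "parallel_download": ["1.29.6-gke.1254000", "1.30.2-gke.1394000"],
    "metrics": ["1.31.1-gke.1621000"],
    "metadata_prefetch": ["1.31.3-gke.1162000"],
}


def parse_gke_version(version: str) -> tuple:
    """Parse '1.29.6-gke.1254000' into (1, 29, 6, 1254000)."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"not a GKE version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports(feature: str, cluster_version: str) -> bool:
    """Return True if the cluster version meets the feature's minimum."""
    cluster = parse_gke_version(cluster_version)
    minimums = sorted(parse_gke_version(v) for v in FEATURE_MINIMUMS[feature])
    for minimum in minimums:
        # Same major.minor: compare against that minor's listed minimum.
        if cluster[:2] == minimum[:2]:
            return cluster >= minimum
    # Unlisted minor: assumed to qualify only if newer than every listed minor.
    return cluster[:2] > minimums[-1][:2]


print(supports("parallel_download", "1.29.6-gke.1254000"))
print(supports("parallel_download", "1.28.9-gke.1000000"))
print(supports("metrics", "1.31.1-gke.1621000"))
```

Comparing parsed integer tuples avoids the ordering mistakes that plain string comparison would make, for example treating patch 12 as lower than patch 8.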

What's next