This document describes how you can use profile-based configurations to streamline adoption and enhance the performance of Cloud Storage FUSE for your artificial intelligence or machine learning (AI/ML) workloads.
To help you streamline Cloud Storage FUSE configuration for your serving, checkpointing, or training workloads, you can apply a pre-configured profile based on your workload type by using the profile field or the --profile option. A profile specifies a predefined, optimized set of Cloud Storage FUSE settings for caching, threading, and buffer sizes, which gives you high performance with minimal effort. Use the profile value aiml-training for training workloads, aiml-checkpointing for checkpointing workloads, and aiml-serving for serving workloads.
Considerations
You can set the --profile option or profile field only during a mount operation. To change the --profile option or profile field, you must remount your Cloud Storage FUSE bucket.
When you use profile-based configurations, Cloud Storage FUSE sets the metadata cache capacity and time to live (TTL) to unlimited, meaning that entries never expire and are never evicted from the metadata cache. If your virtual machine doesn't have enough memory, you might experience Out of Memory (OOM) errors. Therefore, we recommend reviewing your memory capacity before you apply profile-based configurations. OOM errors are more likely to occur on machines with less than one TiB of memory.
When you specify configuration values by using profiles, detected high-performance machine types, a gcsfuse command, or a Cloud Storage FUSE configuration file, the methods take precedence in the following order, where each method overrides the methods listed after it:
1. Values set explicitly in a gcsfuse command or a Cloud Storage FUSE configuration file.
2. Values set as the argument to the --profile option in a gcsfuse command or in the profile field of a Cloud Storage FUSE configuration file.
3. Automated configuration values set when Cloud Storage FUSE detects that a high-performance machine type is being used. For more information, see Automated configuration values.
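As a sketch of how this precedence can help with the memory consideration above, the following configuration file applies the training profile but sets the metadata cache fields explicitly, so those values override the profile's unlimited TTL and cache sizes. The specific numbers (a one-hour TTL and 1024 MiB per cache) are illustrative assumptions, not recommendations; size them for your own machine and dataset.

```yaml
# Apply the training profile's predefined settings.
profile: aiml-training

# Explicit values take precedence over the profile's unlimited metadata cache.
# The numbers are illustrative only; size them for your machine and dataset.
metadata-cache:
  ttl-secs: 3600                # expire cached metadata after one hour
  stat-cache-max-size-mb: 1024  # cap the stat cache at 1024 MiB
  type-cache-max-size-mb: 1024  # cap the type cache at 1024 MiB
```

The profile still supplies every setting that this file doesn't set itself, such as implicit-dirs and the negative-lookup TTL.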
Cloud Storage FUSE CSI volumes in Google Kubernetes Engine Pods don't support the profile field or --profile option.
File caching can't be enabled by using profile-based configurations, because file caching requires Cloud Storage FUSE configuration fields and CLI options, such as the location of the cache directory, that can't be generalized across machines. To enable file caching for serving, training, or checkpointing workloads, you must configure the file caching options or fields explicitly.
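As a sketch of configuring file caching explicitly alongside a profile, the following configuration file applies the serving profile and turns on the file cache by setting a cache directory. The path /mnt/disks/local-ssd/gcsfuse-cache is hypothetical; point it at fast local storage on your machine. Once the file cache is on, file-cache settings that a profile applies, such as cache-file-for-range-read, can also take effect.

```yaml
# Apply the serving profile's predefined settings.
profile: aiml-serving

# Profiles don't enable the file cache; setting cache-dir turns it on.
# Hypothetical path: use fast local storage on your machine, such as a Local SSD.
cache-dir: /mnt/disks/local-ssd/gcsfuse-cache

file-cache:
  max-size-mb: -1   # -1 lets the cache use all available space in cache-dir
```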
Apply profile-based configurations for training workloads
The training-specific profile optimizes performance for high-throughput reads of large datasets and helps prevent Cloud GPU and Cloud TPU hardware from waiting for data.
To apply the training-specific profile, set the profile field to aiml-training in a Cloud Storage FUSE configuration file, or pass --profile=aiml-training to the Cloud Storage FUSE CLI. The following configurations are then applied:
# Create implicit directories locally when accessed:
- implicit-dirs
# Disable caching for lookups of files or directories that don't exist:
- metadata-cache:negative-ttl-secs:0
# Never expire cached metadata (file attributes and types):
- metadata-cache:ttl-secs:-1
# Allow unlimited size for the file attribute (stat) cache:
- metadata-cache:stat-cache-max-size-mb:-1
# Allow unlimited size for the file/directory type cache:
- metadata-cache:type-cache-max-size-mb:-1
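The entries above name nested configuration-file fields, assuming the usual reading where each colon separates a section, a field, and its value (and a bare name such as implicit-dirs means the option is turned on). As a sketch of that mapping, here are the same training settings written out as they would appear in a YAML configuration file if you set them yourself. Setting profile: aiml-training already applies them, so you would typically write out only a field you want to change, which then overrides the profile's value.

```yaml
# Training-profile settings from the list above, written as explicit fields.
implicit-dirs: true

metadata-cache:
  negative-ttl-secs: 0          # don't cache lookups of missing files or directories
  ttl-secs: -1                  # never expire cached metadata
  stat-cache-max-size-mb: -1    # unlimited stat cache size
  type-cache-max-size-mb: -1    # unlimited type cache size
```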
Apply profile-based configurations for checkpointing workloads
The checkpointing-specific profile optimizes performance for high-throughput writes of large files, which reduces the time it takes to save multi-gigabyte checkpoints and minimizes training pauses.
To apply the checkpointing-specific profile, set the profile field to aiml-checkpointing in a Cloud Storage FUSE configuration file, or pass --profile=aiml-checkpointing to the Cloud Storage FUSE CLI. The following configurations are then applied:
# Create implicit directories locally when accessed:
- implicit-dirs
# Disable caching for lookups of files or directories that don't exist:
- metadata-cache:negative-ttl-secs:0
# Never expire cached metadata (file attributes and types):
- metadata-cache:ttl-secs:-1
# Allow unlimited size for the file attribute (stat) cache:
- metadata-cache:stat-cache-max-size-mb:-1
# Allow unlimited size for the file/directory type cache:
- metadata-cache:type-cache-max-size-mb:-1
# When file caching is enabled, cache the entire file even if the first read starts at a non-zero offset:
- file-cache:cache-file-for-range-read:true
# Allow renaming directories with fewer than 200,000 descendants in buckets without hierarchical namespace (non-HNS):
- file-system:rename-dir-limit:200000
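A minimal configuration file for the checkpointing profile might look like the following sketch. The file name config.yaml and the BUCKET_NAME and MOUNT_POINT placeholders are assumptions for illustration. You would pass the file at mount time, for example with gcsfuse --config-file=config.yaml BUCKET_NAME MOUNT_POINT, or pass --profile=aiml-checkpointing on the command line instead of using a file.

```yaml
# config.yaml (hypothetical name): applies the checkpointing profile.
# The profile is read only at mount time; remount to change it.
profile: aiml-checkpointing
```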
Apply profile-based configurations for serving workloads
The serving-specific profile optimizes performance for serving workloads by improving data access and caching.
To apply the serving-specific profile, set the profile field to aiml-serving in a Cloud Storage FUSE configuration file, or pass --profile=aiml-serving to the Cloud Storage FUSE CLI. The following configurations are then applied:
# Create implicit directories locally when accessed:
- implicit-dirs
# Disable caching for lookups of files or directories that don't exist:
- metadata-cache:negative-ttl-secs:0
# Never expire cached metadata (file attributes and types):
- metadata-cache:ttl-secs:-1
# Allow unlimited size for the file attribute (stat) cache:
- metadata-cache:stat-cache-max-size-mb:-1
# Allow unlimited size for the file/directory type cache:
- metadata-cache:type-cache-max-size-mb:-1
# When file caching is enabled, cache the entire file even if the first read starts at a non-zero offset:
- file-cache:cache-file-for-range-read:true
# Cache directory listings in the kernel indefinitely to speed up listing, because the file system hierarchy is expected to be read-only:
- file-system:kernel-list-cache-ttl-secs:-1
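If the objects under a serving mount can change while it is mounted, for example when a new model version is uploaded under the same prefix, an indefinite kernel list cache could keep returning stale directory listings. Because explicitly set values take precedence over profile values, one way to handle this, sketched below, is to keep the serving profile and set a finite TTL yourself. The 60-second value is an illustrative assumption.

```yaml
# Keep the serving profile's other settings.
profile: aiml-serving

# Explicit value overrides the profile's indefinite (-1) kernel list cache TTL.
file-system:
  kernel-list-cache-ttl-secs: 60   # cached listings expire after 60 seconds (illustrative)
```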
What's next
Learn about automated configuration values for high-performance machine types.
Learn how you can optimize performance with pre-configured GKE YAML files.