Cloud Storage FUSE provides four types of optional caching to help increase the performance of data retrieval: file caching, stat caching, type caching, and list caching.
File caching overview
The Cloud Storage FUSE file cache is a client-based read cache that serves repeat file reads from a faster cache storage of your choice.
Benefits of file caching
- Improved performance: file caching improves latency and throughput by serving reads directly from the cache media. Small and random I/O operations can be significantly faster when served from the cache.
- Use existing capacity: file caching can use existing provisioned machine capacity for your cache directory without incurring charges for additional storage. This includes Local SSDs that come bundled with Cloud GPUs machine types such as a2-ultragpu and a3-highgpu, Persistent Disk (the boot disk used by each VM), or in-memory /tmpfs.
- Reduced charges: cache hits are served locally and don't incur Cloud Storage operation or network charges.
- Improved total cost of ownership for AI and ML training: file caching increases Cloud GPUs and Cloud TPU utilization by loading data faster, which reduces time to training and provides a greater price-performance ratio for AI and ML training workloads.
Enable and configure the file cache
The file cache is disabled by default and can be enabled and configured using a Cloud Storage FUSE configuration file. You can control caching behavior using the following fields:
- max-size-mb: controls the maximum capacity in your cache directory that cached data can occupy. By default, the max-size-mb field is set to let cached data grow until it occupies all the available capacity in your cache directory.
- cache-dir: specifies a directory for storing file cache data. Specifying a cache directory is a prerequisite for enabling the file cache.
- ttl-secs: determines when cached data becomes stale and needs to be refreshed from Cloud Storage. By default, the ttl-secs field is set to expire and refresh from Cloud Storage after 60 seconds. We recommend increasing this value. To learn how to control cache data invalidation, see Configuring cache data invalidation. For more information about the eviction of cached data, see Eviction.
- enable-parallel-downloads: accelerates read performance for files larger than 1 GB, including first-time reads, by using multiple workers to download a file in parallel, with the file cache directory as a prefetch buffer. We recommend enabling parallel downloads for serving and checkpoint restore operations. For more information on enabling and configuring parallel downloads, see Configure parallel downloads.
Random and partial reads
If the first read operation on a file starts from the beginning of the file, at offset 0, the Cloud Storage FUSE file cache ingests the entire file into the cache, even if you're only reading a small range. This lets subsequent random or partial reads of the same object be served directly from the cache.

If a file's first read operation starts anywhere other than offset 0, Cloud Storage FUSE doesn't trigger an asynchronous full-file fetch by default. To change this behavior so that Cloud Storage FUSE ingests a file into the cache upon an initial random read, set the cache-file-for-range-read flag to true. We recommend enabling the cache-file-for-range-read flag if many different random or partial read operations are performed on the same object.
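As a sketch, assuming the file-cache section of the configuration file, this behavior can be enabled with:

```yaml
file-cache:
  cache-file-for-range-read: true   # ingest the full file even when the first read is random
```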
Eviction
The eviction of cached metadata and data is based on a least recently used (LRU) algorithm that begins once the space threshold configured by the max-size-mb limit is reached. If an entry expires based on its TTL, a Get metadata call is first made to Cloud Storage and is subject to network latencies. Because data and metadata are managed separately, you might see one evicted or invalidated but not the other.
Persistence
Cloud Storage FUSE caches aren't persisted across unmounts and restarts. For file caching, while the metadata entries needed to serve files from the cache are evicted on unmounts and restarts, data in the file cache might still be present in the cache directory. You should delete data in the file cache directory after unmounts or restarts.
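For example, leftover cache data can be cleared after unmounting with a command like the following sketch. The cache path is a hypothetical cache-dir value; substitute your own:

```shell
# Sketch: clear leftover file-cache data after unmounting.
CACHE_DIR="${CACHE_DIR:-/tmp/gcsfuse-cache}"   # hypothetical cache-dir value
mkdir -p "$CACHE_DIR"                 # stand-in for an existing cache directory
touch "$CACHE_DIR/stale-object.bin"   # stand-in for leftover cached data
# The :? guard aborts instead of expanding to "/" if CACHE_DIR is empty.
rm -rf "${CACHE_DIR:?}"/*
```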
Security
When you enable caching, Cloud Storage FUSE uses the cache directory you specified with the cache-dir field as the underlying directory where files from your Cloud Storage bucket are persisted in an unencrypted format. Any user or process that has access to this cache directory can access these files. We recommend restricting access to this directory.
Direct or multiple access to the file cache
Using a process other than Cloud Storage FUSE to access or modify a file in the cache directory can lead to data corruption. Cloud Storage FUSE caches are specific to each Cloud Storage FUSE running process with no awareness across different Cloud Storage FUSE processes running on the same or different machines. Therefore, we don't recommend using the same cache directory for different Cloud Storage FUSE processes.
If multiple Cloud Storage FUSE processes need to run on the same machine, give each process its own cache directory, or use one of the following methods to ensure your data doesn't get corrupted:
- Mount all buckets with a shared cache: use dynamic mounting to mount all buckets you have access to in a single process with a shared cache. To learn more, see Cloud Storage FUSE dynamic mounting.
- Enable caching on a specific bucket: enable caching on only a specified bucket using static mounting. To learn more, see Cloud Storage FUSE static mounting.
- Cache only a specific folder or directory: mount and cache only a specific bucket-level folder instead of an entire bucket. To learn more, see Mount a directory within a bucket.
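The per-process cache directory approach can be sketched as follows. Bucket names and paths are hypothetical; each mount gets a configuration file pointing at its own cache directory:

```shell
# Sketch: generate a separate config file per mount so that two gcsfuse
# processes on the same machine never share a cache directory.
for name in bucket-a bucket-b; do
  cat > "cache-${name}.yaml" <<EOF
cache-dir: /var/cache/gcsfuse/${name}
file-cache:
  max-size-mb: -1
EOF
done
# Then mount each bucket with its own config, for example:
#   gcsfuse --config-file=cache-bucket-a.yaml bucket-a /mnt/bucket-a
#   gcsfuse --config-file=cache-bucket-b.yaml bucket-b /mnt/bucket-b
```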
Stat caching overview
The Cloud Storage FUSE stat cache is a cache for object metadata that improves performance for operations specific to file attributes such as size, modification time, or permissions. Using the stat cache improves latency by performing these operations from cached data instead of sending a stat object request to Cloud Storage. To learn more about stat caching, see the Semantics documentation on GitHub.
The stat cache is enabled by default and can be configured using a Cloud Storage FUSE configuration file. The maximum size of the cache is controlled by the stat-cache-max-size-mb field, which has a default value of 32 (32 MB). The TTL of the cache is controlled by the ttl-secs field, which has a default value of 60 (60 seconds).
Best practices
For stat caching, we recommend using the default value of 32 for the stat-cache-max-size-mb field if your workload involves up to 20,000 files. If your workload is larger than 20,000 files, increase the stat-cache-max-size-mb value by 10 for every additional 6,000 files, which is around 1,500 bytes per file. stat-cache-max-size-mb is a mount-level limit, and actual memory usage might be lower than the value you specify. Alternatively, you can set stat-cache-max-size-mb to -1 to let the stat cache use as much memory as needed.
Type caching overview
The Cloud Storage FUSE type cache is a metadata cache that accelerates performance for metadata operations specific to file or directory existence. Using the type cache improves latency by storing this information locally, which reduces the number of requests made to Cloud Storage to check whether a file or directory exists. To learn more about type caching, see the Semantics documentation on GitHub.
The type cache is enabled by default and can be configured using a Cloud Storage FUSE configuration file. The maximum size of the cache is controlled by the type-cache-max-size-mb field, which has a default value of 4 (4 MB). The TTL of the cache is controlled by the ttl-secs field, which has a default value of 60 (60 seconds).
Best practices
For type caching, we recommend using the default value of 4 for the type-cache-max-size-mb field if the largest directory in the bucket you're mounting contains 20,000 files or fewer. If the largest directory contains more than 20,000 files, increase the type-cache-max-size-mb value by 1 for every additional 5,000 files, which is around 200 bytes per file. type-cache-max-size-mb is a mount-level limit, and actual memory usage might be lower than the value you specify. Alternatively, you can set the type-cache-max-size-mb value to -1 to let the type cache use as much memory as needed.
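The sizing rules for both metadata caches can be expressed as a small helper. This is a sketch that rounds partial increments up; the function names are our own, not part of any gcsfuse tooling:

```python
import math

def stat_cache_mb(total_files: int) -> int:
    """Recommended stat-cache-max-size-mb: the default of 32 covers up to
    20,000 files; add 10 MB for every additional 6,000 files (~1,500 bytes/file)."""
    if total_files <= 20_000:
        return 32
    return 32 + 10 * math.ceil((total_files - 20_000) / 6_000)

def type_cache_mb(max_files_per_dir: int) -> int:
    """Recommended type-cache-max-size-mb: the default of 4 covers directories
    with up to 20,000 files; add 1 MB for every additional 5,000 files (~200 bytes/file)."""
    if max_files_per_dir <= 20_000:
        return 4
    return 4 + math.ceil((max_files_per_dir - 20_000) / 5_000)

print(stat_cache_mb(50_000))   # 32 + 10 * 5 = 82
print(type_cache_mb(45_000))   # 4 + 5 = 9
```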
List caching overview
The Cloud Storage FUSE list cache stores directory and file list (ls) responses and improves list operation speeds. List caching is especially useful for workloads that repeat full directory listings as part of execution, such as AI/ML training runs.
The list cache is kept in memory in the page cache, which is controlled by the kernel based on memory availability, as opposed to the stat and type caches, which are kept in your machine's memory and controlled by Cloud Storage FUSE.
Enable list caching
The list cache is disabled by default. You can enable list caching using the kernel-list-cache-ttl-secs field with one of the following values:
- A positive value, which represents the time to live (TTL) in seconds to keep the directory list response in the kernel's page cache.
- A value of -1, to bypass entry expiration and return the list response from the cache when it's available.
To enable and configure list caching, see the Cloud Storage FUSE configuration file.
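As a sketch, assuming the file-system section of the configuration file, a five-minute list cache could be enabled with:

```yaml
file-system:
  kernel-list-cache-ttl-secs: 300   # keep directory list responses for 5 minutes
```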
Configure cache invalidation
The following sections describe how to configure cache invalidation for all cache types.
File, stat, and type cache invalidation
For file, stat, and type caches, the ttl-secs field specifies the TTL in seconds for how long cached metadata is used, from when it's fetched from Cloud Storage to when it expires and needs to be refreshed. You can configure ttl-secs in a Cloud Storage FUSE configuration file. The ttl-secs field is set to 60 by default.
When you specify a value for ttl-secs that's greater than 0, the cached metadata remains valid only for the amount of time you specified. For file caching, we recommend increasing the ttl-secs value based on the expected time between repeat reads while balancing your consistency needs. Based on the importance and frequency of the data changing, set the ttl-secs value as high as your workload allows. When a metadata entry becomes invalid, subsequent reads are queried from Cloud Storage.
In addition to accepting values that represent a specific TTL in seconds before your cached metadata expires and needs to be refreshed, you can use the following values to specify how your file is read:
- ttl-secs value of 0: ensures the file with the most up-to-date data is read by issuing a GET metadata call to Cloud Storage that checks the file being served, ensuring the cache is consistent. If the file in the cache is up to date, it's served directly from the cache. Specifying a value of 0 can lead to reduced performance because a call must always be made to Cloud Storage to check the metadata first. If the file is in the cache and hasn't changed, the file is served from the cache with consistency after the GET metadata call.
- ttl-secs value of -1: ensures the file is always read from the cache if it's available, without checking for consistency. Serving files without checking for consistency can return inconsistent data, and should only be used temporarily for workloads that run on non-changing data. For example, a value of -1 is useful for machine learning training, where the same data is read across multiple epochs without changes.
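For example, a training job that reads an immutable dataset might set (a sketch, assuming the metadata-cache section layout):

```yaml
metadata-cache:
  ttl-secs: -1   # never expire; use only while the underlying data doesn't change
```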
List cache invalidation
List cache invalidation is set by specifying a value greater than 0 using the kernel-list-cache-ttl-secs field. The directory list response is kept in the kernel's page cache and remains valid for the amount of time you specified. By default, the list cache is disabled and set to a value of 0. When you specify a value of -1, Cloud Storage FUSE disables list cache expiration and returns the list response from the cache when it's available.
Read path for cached data
The Cloud Storage FUSE cache accelerates repeat reads after data has been ingested into the cache. Both first-time reads and cache misses go directly to Cloud Storage and are subject to normal Cloud Storage network latencies. To improve first-time read performance, see Improve first-time reads.
Considerations
Enabling file caching, stat caching, type caching, or list caching can increase performance but reduce consistency, which usually occurs when you access the same bucket using multiple clients with a high change rate. To reduce the impact on consistency, we recommend mounting buckets as read-only. To learn more about caching behavior, see Cloud Storage FUSE semantics documentation on GitHub.
If a file cache entry hasn't yet expired based on its TTL and the file is in the cache, the entire operation is served from the local client cache without any request being issued to Cloud Storage.
If a file cache entry has expired based on its TTL, a Get metadata call is first made to Cloud Storage, and if the file isn't in the cache, the file is retrieved from Cloud Storage. Both operations are subject to network latencies. If the metadata entry has been invalidated, but the file is in the cache, and its object generation has not changed, the file is served from the cache only after the Get metadata call is made to check if the data is valid.
If a Cloud Storage FUSE client modifies a cached file or its metadata, then the file is immediately invalidated and consistency is ensured in the following read by the same client. However, if different clients access the same file or its metadata, and its entries are cached, then the cached version of the file or metadata is read and not the updated version until the file is invalidated by that specific client's TTL setting.
To avoid cache thrashing, ensure that your entire dataset fits into the cache capacity. Also, consider the maximum capacity and performance that your cache media can provide. If you hit the provisioned cache's maximum performance, capacity limit, or both, it's beneficial to read directly from Cloud Storage, which has much higher limits than Cloud Storage FUSE.
What's next
Learn to use and configure file caching.
Read more about how to improve read and write performance.