[Cloud Storage](/storage/docs/introduction)

**Overview**: A highly scalable, highly durable, low-cost object store. It's suitable for storing the vast datasets required for training and model checkpoints, as well as for hosting the final trained models. Cloud Storage with Cloud Storage FUSE is the recommended storage solution for most AI and ML use cases because it lets you scale your data storage more cost-efficiently than file system services.

- Supports large-scale training data (up to exabytes) for GPU and TPU clusters.
- Supports high throughput (1.25 TB/s of bandwidth or greater). To maximize your throughput in Cloud Storage, [request more bandwidth](/storage/docs/bandwidth-usage#increase).
- Through integration with [Cloud Storage FUSE](/storage/docs/gcs-fuse), Cloud Storage buckets can be mounted as local file systems. The [Cloud Storage FUSE CSI driver](/kubernetes-engine/docs/concepts/cloud-storage-fuse-csi-driver) also lets you mount buckets as local file systems in Google Kubernetes Engine (GKE) for scaled AI and ML workloads.
- Use [Anywhere Cache](/storage/docs/anywhere-cache) to colocate storage in the same zone as your compute workloads, providing higher throughput (up to 2.5 TB/s), lower latency, and location flexibility when used with a multi-region bucket.
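Once a bucket is mounted with Cloud Storage FUSE (for example, `gcsfuse my-training-bucket /mnt/gcs` — the bucket name and mount point here are hypothetical), training code can read objects with ordinary file I/O rather than an object-store client. A minimal sketch:

```python
import os

def iter_training_files(mount_dir, suffix=".tfrecord"):
    """Yield paths to training files under a FUSE-mounted bucket.

    `mount_dir` is assumed to be a directory where a bucket has been
    mounted (e.g. with `gcsfuse BUCKET MOUNT_DIR`); to ordinary file
    APIs it looks like any local directory tree.
    """
    for root, _dirs, files in os.walk(mount_dir):
        for name in sorted(files):
            if name.endswith(suffix):
                yield os.path.join(root, name)

# Usage sketch (any directory works; a FUSE mount behaves the same):
# for path in iter_training_files("/mnt/gcs/dataset"):
#     with open(path, "rb") as f:
#         process(f.read())
```

This is the main appeal of the FUSE integration: existing data-loading code that expects a file system keeps working, while the data itself scales as objects in Cloud Storage.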
[Google Cloud Managed Lustre](/managed-lustre/docs/overview)

**Overview**: A high-performance, fully managed parallel file system optimized for AI and high performance computing (HPC) applications. Suited for environments in which multiple compute nodes need fast, consistent access to shared data for simulations, modeling, and analysis.

- Scales to 8 PB of capacity and up to 1 TB/s of throughput.
- Supports thousands of IOPS per TiB.
- Delivers ultra-low, sub-millisecond latency.
- Provides full POSIX support, which enables out-of-the-box migration of on-premises AI workloads to Google Cloud.
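Full POSIX semantics matter for patterns like atomic checkpointing, where a checkpoint is written to a temporary file and then published with a `rename`, which POSIX guarantees is atomic. The sketch below illustrates the pattern under that assumption; the function and file names are illustrative, not taken from the Managed Lustre documentation:

```python
import os

def save_checkpoint_atomic(state: bytes, path: str) -> None:
    """Write checkpoint bytes, then atomically publish via rename.

    On a POSIX file system (local disk, a parallel file system such
    as Managed Lustre), rename() is atomic: readers see either the
    old checkpoint or the new one, never a partial write.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(state)
        f.flush()
        os.fsync(f.fileno())  # ensure bytes reach stable storage
    os.rename(tmp_path, path)  # atomic publish on POSIX

def load_checkpoint(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()
```

Code like this runs unchanged when moved from an on-premises POSIX file system to a Managed Lustre mount, which is the out-of-the-box migration path the bullet above refers to.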
This document describes use cases and recommendations for storage services in artificial intelligence (AI) and machine learning (ML) workloads.

Storage use cases

Storage services might be used in the following AI and ML workloads:

- Preparing and loading data for training
- Loading model weights for inference
- Saving and restoring model checkpoints
- Loading VM images
- Logging data
- Home directories
- Loading application libraries, packages, and dependencies

Storage recommendations

The following storage solutions are recommended for optimizing AI and ML system performance:

| **Storage service** | **Recommended for** | **Not recommended for** |
|---|---|---|
| [Cloud Storage](/storage/docs/introduction) | Cost efficiency; data processing and preparation; model training and inference; saving and restoring model checkpoints | Applications that require full POSIX compliance; home directories |
| [Google Cloud Managed Lustre](/managed-lustre/docs/overview) | Migrating AI and ML workloads to the cloud; model simulations; model training and inference; saving and restoring model checkpoints; workloads with frequent small reads and writes; home directories | Workloads that need more than 8 PB of data |

For more information, see [Optimize AI and ML workloads with Cloud Storage FUSE](/architecture/optimize-ai-ml-workloads-cloud-storage-fuse) and [Optimize AI and ML workloads with Google Cloud Managed Lustre](/architecture/optimize-ai-ml-workloads-managed-lustre).

What's next

- For more detailed information about the storage options for AI and ML workloads, including training, checkpointing, and serving, see [Design storage for AI and ML workloads in Google Cloud](/architecture/ai-ml/storage-for-ai-ml).

Last updated 2025-09-04 UTC.