Parallelstore is now GA, fueling the next generation of AI and HPC workloads

October 4, 2024

Barak Epstein

Sr Product Manager

Chinmayee Rathi

Product Manager

Organizations use artificial intelligence (AI) and high-performance computing (HPC) applications to process massive datasets, run complex simulations, and train generative models with billions of parameters for use cases as diverse as LLMs, genomic analysis, quantitative analysis, and real-time sports analytics. These workloads place heavy demands on their storage systems: throughput and I/O performance must scale while latencies stay below a millisecond, even when thousands of clients are concurrently reading and writing the same shared files.

To power these next-generation AI and HPC workloads, we announced Parallelstore at Google Cloud Next 2024, and today, we are excited to announce that it is now generally available. Built on the Distributed Asynchronous Object Storage (DAOS) architecture, Parallelstore combines fully distributed metadata with a key-value store architecture to deliver high throughput and IOPS.

Read on to learn how Parallelstore serves the needs of complex AI and HPC workloads, allowing you to maximize goodput and GPU/TPU utilization, programmatically move data in and out of Parallelstore, and provision Google Kubernetes Engine and Compute Engine resources.

Maximize goodput and GPU/TPU utilization

To overcome the performance limitations of traditional parallel file systems, Parallelstore uses a distributed metadata management system and a key-value store architecture. Parallelstore’s high-throughput parallel data access minimizes latency and I/O bottlenecks, and allows it to saturate the network bandwidth of individual compute clients. This efficient data delivery maximizes goodput to GPUs and TPUs, a critical factor for optimizing AI workload costs. Parallelstore can also provide continuous read/write access to thousands of VMs, GPUs and TPUs, satisfying modest-to-massive AI and HPC workload requirements.

At the maximum deployment size of 100 TiB, Parallelstore scales to ~115 GiB/s of throughput, ~3 million read IOPS, and ~1 million write IOPS, with a low latency of ~0.3 ms. This means that Parallelstore is also a good platform for small files and random, distributed access across a large number of clients. For AI use cases, Parallelstore’s performance with small files and metadata operations enables up to 3.9x faster training times and up to 3.7x higher training throughput compared to native ML framework data loaders, as measured by Google Cloud benchmarking.
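To get a rough feel for what a smaller deployment might deliver, the 100 TiB figure above can be turned into a back-of-envelope estimate. The sketch below assumes that throughput scales linearly with capacity, which this post does not state (it gives only the 100 TiB data point). The function name and constants are illustrative, so treat the result as an order-of-magnitude guide rather than a published specification.

```python
# Back-of-envelope throughput estimate from the figures quoted in this post.
# ASSUMPTION: throughput scales linearly with provisioned capacity; the post
# only gives the data point for the 100 TiB maximum deployment.

MAX_CAPACITY_TIB = 100.0        # maximum deployment size quoted in the post
MAX_THROUGHPUT_GIB_S = 115.0    # approximate throughput at that size


def estimated_throughput_gib_s(capacity_tib: float) -> float:
    """Estimate throughput in GiB/s for a deployment of capacity_tib TiB."""
    if not 0 < capacity_tib <= MAX_CAPACITY_TIB:
        raise ValueError("capacity must be greater than 0 and at most 100 TiB")
    return MAX_THROUGHPUT_GIB_S * capacity_tib / MAX_CAPACITY_TIB


# Example: a 12,000 GiB instance (about 11.7 TiB).
capacity_tib = 12000 / 1024
print(f"{estimated_throughput_gib_s(capacity_tib):.1f} GiB/s")  # prints "13.5 GiB/s"
```

Under this assumption, each provisioned TiB contributes about 1.15 GiB/s, so capacity and throughput have to be sized together.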

Programmatically move data in and out of Parallelstore

Many AI and HPC workloads store data in Cloud Storage for data preparation or archiving. You can use Parallelstore’s integrated import/export API to automate moving that data into Parallelstore for processing. With the API, you can ingest massive datasets from Cloud Storage into Parallelstore at ~20 GB/s for files larger than 32 MB, and at ~5,000 files per second for files under 32 MB.
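The two import rates above suggest a simple way to estimate how long a dataset might take to load. The following sketch is an illustration under stated assumptions, not a description of how the import service schedules work. It treats the quoted rates as sustained, counts large-file and small-file transfers as sequential (a conservative choice), uses decimal units (1 MB = 10^6 bytes), and counts a file of exactly 32 MB as large. The function name and constants are hypothetical.

```python
# Rough import-time estimate from the rates quoted in this post.
# ASSUMPTIONS: the quoted rates are sustained, large and small files are
# transferred one group after the other, and units are decimal.

LARGE_FILE_BYTES_PER_SEC = 20e9          # ~20 GB/s for files larger than 32 MB
SMALL_FILES_PER_SEC = 5_000              # ~5,000 files/s for files under 32 MB
SMALL_FILE_THRESHOLD_BYTES = 32 * 10**6  # 32 MB (exactly 32 MB counts as large)


def estimate_import_seconds(file_sizes_bytes) -> float:
    """Estimate seconds needed to import files of the given sizes (in bytes)."""
    large_bytes = 0
    small_count = 0
    for size in file_sizes_bytes:
        if size < SMALL_FILE_THRESHOLD_BYTES:
            small_count += 1       # small files are limited by files per second
        else:
            large_bytes += size    # large files are limited by bytes per second
    return (large_bytes / LARGE_FILE_BYTES_PER_SEC
            + small_count / SMALL_FILES_PER_SEC)


# Example: 1,000 files of 10 GB each plus 1,000,000 files of 1 MB each.
sizes = [10 * 10**9] * 1_000 + [10**6] * 1_000_000
print(estimate_import_seconds(sizes))  # prints 700.0
```

In this example the 10 TB of large files accounts for about 500 seconds and the million small files for about 200 seconds, which shows how small-file counts can take up a large share of import time even when they hold little data.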

Figure 1: Parallelstore Import gcloud API

When an AI training job or HPC workload is complete, you can export results programmatically to Cloud Storage for further assessment or longer-term storage. You can also automate data transfers via the API, minimizing manual intervention and streamlining data pipelines.

Figure 2: Parallelstore Export gcloud API

Programmatically provision GKE resources through the CSI driver

It’s easy to efficiently manage high-performance storage for containerized workloads through Parallelstore’s GKE CSI driver. You can dynamically provision and manage Parallelstore file systems as persistent volumes, or access existing Parallelstore instances from Kubernetes workloads, directly within your GKE clusters using familiar Kubernetes APIs. This reduces the need to learn and manage a separate storage system, so you can focus on optimizing resources and lowering TCO.

Figure 3: Example of the Parallelstore CSI Driver creating a Storage Class

In the coming months, you’ll be able to preload data from Cloud Storage via the fully managed GKE Volume Populator, which automates the preloading of data from Cloud Storage directly into Parallelstore during the PersistentVolumeClaim provisioning process. This helps ensure your training data is readily available, so you can minimize idle compute-resource time and maximize GPU and TPU utilization.

Programmatically provision Compute Engine resources with the Cluster Toolkit

It’s easy to deploy Parallelstore instances for Compute Engine with the support of the Cluster Toolkit. Formerly known as Cloud HPC Toolkit, Cluster Toolkit is open-source software for deploying HPC and AI workloads. It provisions compute, network, and storage resources for your cluster or workload following best practices. You can get started with Cluster Toolkit today by adding the Parallelstore module to your blueprint with only a four-line change; we also provide starter blueprints for your convenience. In addition to the Cluster Toolkit, there are Terraform templates for deploying Parallelstore, so you can handle operations and provisioning through code and minimize manual operational overhead.

resource "google_parallelstore_instance" "instance" {
  instance_id            = "instance"
  location               = "us-central1-a"
  description            = "test instance"
  capacity_gib           = 12000
  network                = google_compute_network.network.name
  file_stripe_level      = "FILE_STRIPE_LEVEL_MIN"
  directory_stripe_level = "DIRECTORY_STRIPE_LEVEL_MIN"
  labels = {
    test = "value"
  }
  provider   = google-beta
  depends_on = [google_service_networking_connection.default]
}

resource "google_compute_network" "network" {
  name                    = "network"
  auto_create_subnetworks = true
  mtu                     = 8896
  provider                = google-beta
}

# Create an IP address
resource "google_compute_global_address" "private_ip_alloc" {
  name          = "address"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 24
  network       = google_compute_network.network.id
  provider      = google-beta
}

# Create a private connection
resource "google_service_networking_connection" "default" {
  network                 = google_compute_network.network.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_ip_alloc.name]
  provider                = google-beta
}

Figure 4: Configure and manage Parallelstore using Terraform

Real-world impact: Respo.vision sees more with Parallelstore

Respo.vision, a leader in sports video analytics, is using Parallelstore to accelerate an upgrade from 4K to 8K video for its real-time system. By using Parallelstore as the transport layer, Respo.vision captures and labels granular data markers, delivering actionable insights to coaches, scouts, and fans. With Parallelstore, Respo.vision avoided costly infrastructure investments to manage surges of high-performance video processing, all while maintaining low compute latency.

“Our goal was to process 8K video streams at 25 frames per second to deliver richer quality sports analytical data to our customers, and Parallelstore exceeded expectations by effortlessly handling the required volume and delivering an impressive read latency of 0.3 ms. The integration into our system was remarkably smooth and thanks to its distributed nature, Parallelstore has significantly enhanced our system's scalability and resilience.” - Wojtek Rosinski, CTO, Respo.vision

HPC and AI usage is growing rapidly. With its combination of innovative architecture, performance, and integration with Cloud Storage, GKE, and Compute Engine, Parallelstore is the storage solution you need to keep demanding GPUs, TPUs, and workloads supplied with data. To learn more about Parallelstore, check out the documentation, and reach out to your sales team for more information.
