Parallelstore overview

Parallelstore is available by invitation only. If you'd like to request access to Parallelstore in your Google Cloud project, contact your sales representative.

Parallelstore is a fully managed, low-latency distributed file system designed to meet the demands of high-performance computing (HPC) and data-intensive applications.

Parallelstore is ideal for use cases where multiple clients need concurrent access to shared files with data integrity.

Parallelstore supports the POSIX standard, which ensures compatibility with a wide range of existing applications and tools and simplifies migration and integration.
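
Because the file system is POSIX-compliant, existing code can read and write files on a mounted instance using ordinary file system calls. The following minimal sketch assumes a hypothetical mount point of /mnt/parallelstore.

```python
# Minimal sketch: standard POSIX file I/O against a mounted Parallelstore
# instance. The mount point /mnt/parallelstore is a hypothetical example.
import os

MOUNT_POINT = "/mnt/parallelstore"

# Ordinary directory creation, writes, and metadata calls work unchanged.
path = os.path.join(MOUNT_POINT, "results", "run-001.csv")
os.makedirs(os.path.dirname(path), exist_ok=True)

with open(path, "w") as f:
    f.write("step,loss\n")
    f.write("1,0.25\n")

# POSIX metadata operations behave as they would on a local file system.
info = os.stat(path)
print(f"{path}: {info.st_size} bytes, mode {oct(info.st_mode)}")
```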

Parallelstore instances can be mounted to Compute Engine VMs or Google Kubernetes Engine clusters. The Parallelstore CSI driver enables customers to use Kubernetes APIs to access the file system as volumes for their stateful workloads.
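
As a sketch of the Kubernetes path, the following example uses the official Kubernetes Python client to request a Parallelstore-backed volume through a PersistentVolumeClaim. The StorageClass name parallelstore-class and the namespace are assumptions; use the StorageClass configured for the Parallelstore CSI driver in your cluster.

```python
# Sketch: request a Parallelstore-backed volume through the Kubernetes API.
# The StorageClass name "parallelstore-class" is an assumed example.
from kubernetes import client, config

config.load_kube_config()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="training-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],            # shared access across pods
        storage_class_name="parallelstore-class",  # assumed StorageClass name
        resources=client.V1ResourceRequirements(
            requests={"storage": "12Ti"}           # minimum usable capacity
        ),
    ),
)

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```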

Batch data transfers between Parallelstore and Cloud Storage are available from the command line and through the REST API.
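
As a rough illustration of the REST path, the sketch below starts an import from Cloud Storage into a Parallelstore instance with Python. The :importData method name, request fields, and resource names shown here are assumptions made for illustration; consult the Parallelstore REST reference for the exact schema.

```python
# Sketch: start a batch import from Cloud Storage into a Parallelstore
# instance over REST. Method name, request fields, and resource names below
# are assumptions for illustration only.
import google.auth
import google.auth.transport.requests
import requests

credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Assumed instance resource path and method name.
instance = f"projects/{project}/locations/us-central1-a/instances/my-instance"
url = f"https://parallelstore.googleapis.com/v1/{instance}:importData"

body = {
    "sourceGcsBucket": {"uri": "gs://my-bucket/training-data"},  # assumed field
    "destinationParallelstore": {"path": "/training-data"},      # assumed field
}

resp = requests.post(
    url,
    json=body,
    headers={"Authorization": f"Bearer {credentials.token}"},
)
resp.raise_for_status()
print(resp.json())  # the call returns a long-running operation to poll
```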

Specifications

  • Parallelstore is a "scratch" file system: data is stored on local SSD with 2+1 erasure coding, and the mean time to data loss (MTTDL) ranges from 2 to 16 months depending on instance capacity. See the Performance table for details.

  • Usable capacity can be configured from 12 TiB to 100 TiB.

  • Supported in multiple regions.

Performance

Expected performance from Parallelstore is shown in the following table.

Metric                                            Result
Write throughput                                  0.5 GiBps per TiB
Read throughput                                   1.15 GiBps per TiB
Read IOPS                                         30,000 IOPS per TiB
Write IOPS                                        10,000 IOPS per TiB
4K read latency                                   0.3 ms
Number of client processes supported              4,000
Transfer speed (Parallelstore <-> Cloud Storage)  Large files (> 32 MB): 20 GBps
                                                  Small files (<= 32 MB): 5,000 files per second
Mean time to data loss (MTTDL)                    100 TiB capacity: 2 months
                                                  48 TiB capacity: 4 months
                                                  12 TiB capacity: 16 months

These numbers are measured using 256 client connections to a single instance. Latency is measured from a single client. Directory and file striping settings are optimized for each metric.
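
Because the throughput and IOPS figures are expressed per TiB, expected aggregate performance scales with provisioned capacity. The following sketch works through the arithmetic for a 100 TiB instance; actual results depend on workload and striping configuration.

```python
# Worked example: scale the per-TiB figures from the table to a full instance.
CAPACITY_TIB = 100

read_throughput_gibps = 1.15 * CAPACITY_TIB   # 115 GiBps
write_throughput_gibps = 0.5 * CAPACITY_TIB   # 50 GiBps
read_iops = 30_000 * CAPACITY_TIB             # 3,000,000 IOPS
write_iops = 10_000 * CAPACITY_TIB            # 1,000,000 IOPS

print(f"Read throughput:  {read_throughput_gibps:.0f} GiBps")
print(f"Write throughput: {write_throughput_gibps:.0f} GiBps")
print(f"Read IOPS:        {read_iops:,}")
print(f"Write IOPS:       {write_iops:,}")
```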

Use cases

  • High-performance computing: Parallelstore excels in HPC environments where multiple compute nodes need fast and consistent access to shared data for simulations, modeling, and analysis.

  • Machine learning: Parallelstore can handle the large datasets and high throughput requirements of machine learning workloads, enabling efficient training and inference.

Pricing

See the Pricing page for details.