About Hyperdisk for GKE


Google Cloud Hyperdisk is a network block storage option offered on GKE. You can use this storage option in your GKE clusters in much the same way as Compute Engine Persistent Disk volumes, with added flexibility to tune performance for your workload. Compared to Persistent Disk storage, Hyperdisk provides substantially higher maximum input/output operations per second (IOPS) and throughput. Unlike Persistent Disk volumes, where performance is shared across all volumes attached to a node, Hyperdisk lets you specify and tune the performance level of each volume individually.

You can choose from the following Hyperdisk options on GKE. All of these options are available in both the Autopilot and Standard operation modes:

  • Hyperdisk Balanced: The best fit for most workloads. This is a good option for deploying most enterprise and line-of-business apps, as well as databases and web servers.
  • Hyperdisk Throughput: Optimized for cost-efficient high throughput. This is a good option if your use case targets scale-out analytics (for example, Hadoop or Kafka) and throughput-oriented, cost-sensitive workloads.
  • Hyperdisk Extreme: Optimized for IOPS performance. This is a good option if you are deploying high-performance workloads, such as database management systems.
  • Hyperdisk ML: Optimized for AI/ML training and inference workloads that need to load model weights quickly. Use this option for AI/ML workloads that have high peak read throughput requirements. This is the best option to reduce idleness of GPU/TPU resources when loading data.
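
In a GKE StorageClass, you select one of these options through the Compute Engine Persistent Disk CSI driver's type parameter. The values below follow Compute Engine disk type naming; the StorageClass sketches later on this page each use one of them:

type: hyperdisk-balanced     # Hyperdisk Balanced
type: hyperdisk-throughput   # Hyperdisk Throughput
type: hyperdisk-extreme      # Hyperdisk Extreme
type: hyperdisk-ml           # Hyperdisk ML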

Benefits

  • With Hyperdisk, you get more predictable performance for the stateful workloads that you deploy.
  • With Hyperdisk, you can provision, manage, and scale your stateful workloads on GKE without the cost and complexity of managing an on-premises storage area network (SAN).
  • Hyperdisk storage capacity is partitioned and made available to GKE nodes as individual volumes. Hyperdisk volumes are decoupled from nodes, enabling you to attach, detach, and move volumes between nodes. Data stored in Hyperdisk volumes persists across node reboots and deletions. You can also attach multiple Hyperdisk volumes to a single GKE node.

Pricing

You are billed for the total provisioned capacity of your Hyperdisk volumes until you delete them. You are charged per GiB per month. Additionally, you are billed for the following:

  • Hyperdisk Balanced charges a monthly rate for the provisioned IOPS and provisioned throughput (in MiBps) in excess of the baseline values of 3,000 IOPS and 140 MiBps throughput.
  • Hyperdisk Extreme charges a monthly rate based on the provisioned IOPS.
  • Hyperdisk Throughput charges a monthly rate based on the provisioned throughput (in MiBps).
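
For example, a Hyperdisk Balanced volume provisioned with 5,000 IOPS and 240 MiBps is billed for its capacity plus the 2,000 IOPS and 100 MiBps that exceed the baseline, while a volume provisioned at exactly 3,000 IOPS and 140 MiBps is billed for capacity only.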

For pricing information, refer to Disk pricing in the Compute Engine documentation.

Limitations

  • After volume creation, you can modify only the following settings, and only through the Compute Engine API (see the example after this list):
    • Throughput: Hyperdisk Throughput and Hyperdisk Balanced volumes
    • IOPS: Hyperdisk Extreme and Hyperdisk Balanced volumes
  • You can only attach Hyperdisk volumes to specific instance types; Read-Only attachments are not supported.
  • Hyperdisk ML-specific limitations:
    • Hyperdisk ML volumes can't be used as boot disks.
    • Hyperdisk ML volumes can't be used in multi-writer mode.
    • Hyperdisk ML volumes don't support Storage Pools.
  • See the Restrictions and Limitations section in the Compute Engine documentation for additional information.
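
For example, you can adjust the modifiable settings on an existing volume with the gcloud CLI, which calls the Compute Engine API. DISK_NAME and ZONE are placeholders, and which flags apply depends on the Hyperdisk type:

gcloud compute disks update DISK_NAME \
    --zone=ZONE \
    --provisioned-iops=5000 \
    --provisioned-throughput=250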

Hyperdisk and Autopilot Compute Classes

If you want to use Hyperdisk on Autopilot clusters that use Compute Classes, make sure that your node's machine type is supported both by Hyperdisk and by the Compute Class.

The following example shows how you can specify the nodeSelector property to control Pod scheduling on Autopilot clusters with the Performance Compute Class when using Hyperdisk Balanced.

nodeSelector:
  cloud.google.com/compute-class: "Performance"
  cloud.google.com/machine-family: "c3"
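
In context, these selectors belong under the Pod's spec. The following minimal sketch uses placeholder Pod, container, and image names:

apiVersion: v1
kind: Pod
metadata:
  name: performance-pod                          # placeholder name
spec:
  nodeSelector:
    cloud.google.com/compute-class: "Performance"
    cloud.google.com/machine-family: "c3"
  containers:
  - name: app                                    # placeholder container
    image: nginx                                 # placeholder image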

For more information, see Choose Compute Classes for Autopilot Pods.

Plan the performance level for your Hyperdisk volumes

Use the following considerations to plan the right level of performance for your Hyperdisk volumes.

Hyperdisk Balanced

With Hyperdisk Balanced, you can provision capacity separately from throughput and IOPS. To provision throughput or IOPS, you select the level for a given volume. Individual volumes have full performance isolation: each volume can use all of the throughput or IOPS provisioned for it. However, the throughput or IOPS is ultimately limited by per-instance limits on the VM instance to which your volumes are attached. To learn more about these limits, see About Google Cloud Hyperdisk in the Compute Engine documentation.

Both read and write operations count against the throughput and IOPS limit provisioned for a Hyperdisk Balanced volume. The throughput or IOPS provisioned and the maximum limits apply to the combined total of read and write operations.

If the total throughput or IOPS provisioned for one or more Hyperdisk volumes exceeds the total throughput or IOPS available at the VM instance level, the performance is limited to the instance performance level.
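
For illustration, the following StorageClass sketch provisions Hyperdisk Balanced volumes with explicit IOPS and throughput through the Compute Engine Persistent Disk CSI driver. The StorageClass name and the specific values are placeholders; verify the exact parameter names supported by your GKE version:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: balanced-storage                        # placeholder name
provisioner: pd.csi.storage.gke.io
parameters:
  type: hyperdisk-balanced
  provisioned-throughput-on-create: "250Mi"     # throughput in MiBps
  provisioned-iops-on-create: "7000"            # IOPS level
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true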

Hyperdisk Throughput

With Hyperdisk Throughput, you can provision capacity separately from throughput. To provision throughput, you select the level for a given volume. Individual volumes have full throughput isolation—each gets the throughput provisioned to it. However, the throughput is ultimately capped by per-instance limits on the VM instance to which your volumes are attached. To learn more about these limits, see About Google Cloud Hyperdisk in the Compute Engine documentation.

Both read and write operations count against the throughput limit provisioned for a Hyperdisk Throughput volume. The throughput provisioned and the maximum limits apply to the combined total of read and write throughput.

When defining a StorageClass, throughput provisioned for Hyperdisk Throughput volumes must follow these rules:

  • At least 10 MiBps per TiB of capacity, and no more than 90 MiBps per TiB of capacity, depending on the machine type.
  • At most 600 MiBps per volume, depending on the machine type.
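
For example, under these rules a 2 TiB volume can be provisioned with between 20 MiBps and 180 MiBps. The following StorageClass sketch assumes a 2 TiB volume; the name and values are placeholders:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: throughput-storage                      # placeholder name
provisioner: pd.csi.storage.gke.io
parameters:
  type: hyperdisk-throughput
  provisioned-throughput-on-create: "180Mi"     # 90 MiBps per TiB for a 2 TiB volume
volumeBindingMode: WaitForFirstConsumer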

If the total throughput provisioned for one or more Hyperdisk Throughput volumes exceeds the total throughput available at the VM instance level, the throughput is limited to the instance throughput level.

Hyperdisk Extreme

With Hyperdisk Extreme, you can provision capacity separately from the IOPS level. To provision the IOPS level, you specify the IOPS limit for a given volume. Individual volumes have full IOPS level isolation—each gets the IOPS level provisioned to it. However, the IOPS is ultimately capped by per-instance limits on the VM instance to which your volumes are attached. To learn more about these limits, see About Google Cloud Hyperdisk in the Compute Engine documentation.

Both read and write operations count against the IOPS limit provisioned for a Hyperdisk Extreme volume. The IOPS provisioned, and the maximum limits listed in this document, apply to the total of read and write IOPS.

When defining a StorageClass, IOPS provisioned for Hyperdisk Extreme volumes must be no more than 350,000 IOPS, depending on the machine type.
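
As a sketch, a Hyperdisk Extreme StorageClass needs only the IOPS parameter; the name and IOPS value are placeholders and must stay within the limits of your machine type:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: extreme-storage                         # placeholder name
provisioner: pd.csi.storage.gke.io
parameters:
  type: hyperdisk-extreme
  provisioned-iops-on-create: "50000"           # must not exceed 350,000 or the machine type limit
volumeBindingMode: WaitForFirstConsumer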

If the total IOPS provisioned for one or more Hyperdisk Extreme volumes exceeds the total IOPS available at the VM instance level, the performance is limited to the instance IOPS level. If multiple Hyperdisk and Persistent Disk volumes attached to the same VM request IOPS at the same time and the VM limits are reached, each volume gets an IOPS level proportional to its share of the total IOPS provisioned across all attached Hyperdisk Extreme volumes.

Hyperdisk ML

With Hyperdisk ML, you can provision capacity separately from performance. To provision performance, you select the throughput level for a given volume. Individual volumes have full performance isolation—each gets the performance provisioned to it.

When one volume is attached to multiple instances, the provisioned throughput is dynamically distributed across those instances. However, the throughput is ultimately capped by per-instance limits on the VM instance to which your volumes are attached.

Both read and write operations count against the throughput limits provisioned for a Hyperdisk ML volume when in READ-WRITE-SINGLE mode. The throughput provisioned and the maximum limits apply to the total of read and write throughput.

Throughput provisioned for Hyperdisk ML volumes must follow these rules:

  • Minimum: the greater of (0.12 MBps * disk size in GiB) or 400 MBps.
  • Maximum: (1,600 MBps * disk size in GiB), but not more than 1.2 TBps.
  • If the volume is attached to more than 20 instances in READ-ONLY-MANY mode, then the throughput value must be at least (100 MBps * number of attached instances).
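
For example, a 10,000 GiB Hyperdisk ML volume must be provisioned with at least max(0.12 * 10,000, 400) = 1,200 MBps, while a 1,000 GiB volume falls back to the 400 MBps floor because 0.12 * 1,000 = 120 MBps. If a volume is attached to 30 instances in READ-ONLY-MANY mode, it must be provisioned with at least 100 * 30 = 3,000 MBps.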

If the total throughput provisioned for one or more Hyperdisk ML volumes exceeds the total throughput available at the instance level, performance is limited to the instance-level performance.

What's next