This guide covers how to simplify and accelerate the loading of AI/ML model weights on Google Kubernetes Engine (GKE) using Hyperdisk ML. The Compute Engine Persistent Disk CSI driver is the primary way for you to access Hyperdisk ML storage with GKE clusters.
Overview
Hyperdisk ML is a high-performance storage solution that can be used to scale out your applications. It provides high aggregate throughput to many virtual machines concurrently, making it ideal if you want to run AI/ML workloads that need access to large amounts of data.
When you use Hyperdisk ML in read-only-many mode, it can accelerate the loading of model weights by up to 11.9x relative to loading directly from a model registry. This acceleration is made possible by the Google Cloud Hyperdisk architecture, which can scale to 2,500 concurrent nodes at 1.2 TB/s. This lets you reduce load times and Pod over-provisioning for your AI/ML inference workloads.
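As a rough worked example of what these figures mean for a single node, the following Python sketch divides the quoted aggregate throughput evenly across the quoted node count. The even split is an assumption for illustration only: real per-node throughput is also bounded by each VM's own disk throughput limits, and the 20 GB model size in the usage line is a hypothetical input, not a measured Gemma size.

```python
def per_node_throughput_mb_s(aggregate_tb_s: float, nodes: int) -> float:
    """Average per-node share of an aggregate throughput, in MB/s (decimal units)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return aggregate_tb_s * 1_000_000 / nodes  # 1 TB/s = 1,000,000 MB/s


def load_time_s(model_gb: float, throughput_mb_s: float) -> float:
    """Seconds to read model_gb gigabytes at a sustained throughput_mb_s."""
    return model_gb * 1_000 / throughput_mb_s


if __name__ == "__main__":
    share = per_node_throughput_mb_s(1.2, 2_500)
    print(f"even share per node: {share:.0f} MB/s")  # even share per node: 480 MB/s
    # Hypothetical 20 GB of weights read at that share:
    print(f"load time: {load_time_s(20, share):.1f} s")
```

An even split is the worst case for the aggregate figure; with fewer concurrent readers, each node is more likely to be limited by its own VM limits than by its share of the pool.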
The high-level steps to create and use Hyperdisk ML are as follows:
- Pre-cache or hydrate data in a Persistent Disk disk image: Load Hyperdisk ML volumes with data from an external data source (for example, Gemma weights loaded from Cloud Storage) that can be used for serving. The Persistent Disk for the disk image must be compatible with Google Cloud Hyperdisk.
- Create a Hyperdisk ML volume using a pre-existing Google Cloud Hyperdisk: Create a Kubernetes volume that references the Hyperdisk ML volume loaded with data. Optionally, you can create multi-zone storage classes to ensure your data is available in all zones where your Pods run.
- Create a Kubernetes Deployment to consume the Hyperdisk ML volume: Reference the Hyperdisk ML volume with accelerated data loading for your applications to consume.
Multi-zone Hyperdisk ML volumes
Hyperdisk ML disks are only available in a single zone. Optionally, you can use the Hyperdisk ML multi-zone feature to dynamically link multiple zonal disks that contain the same content in a single logical PersistentVolumeClaim and PersistentVolume. Zonal disks referenced by the multi-zone feature must be located in the same region. For example, if your regional cluster is created in us-central1, the multi-zone disks must be located in the same region (for example, us-central1-a, us-central1-b).
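The same-region rule can be checked before you create a multi-zone StorageClass. This Python sketch assumes the usual Compute Engine naming convention, where a zone name is its region name plus a single-letter suffix (us-central1-a belongs to us-central1); `region_of` and `check_same_region` are hypothetical helper names.

```python
def region_of(zone: str) -> str:
    """Return the region of a zone name, e.g. "us-central1-a" -> "us-central1"."""
    region, sep, suffix = zone.rpartition("-")
    # Assumption: zone = <region>-<single letter>, the usual Compute Engine naming.
    if not sep or not region or len(suffix) != 1 or not suffix.isalpha():
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def check_same_region(zones: list[str], cluster_region: str) -> None:
    """Raise ValueError if any zone lies outside the cluster's region."""
    outside = [z for z in zones if region_of(z) != cluster_region]
    if outside:
        raise ValueError(f"zones outside {cluster_region}: {outside}")


if __name__ == "__main__":
    check_same_region(["us-central1-a", "us-central1-b"], "us-central1")  # passes silently
    try:
        check_same_region(["us-central1-a", "us-east4-a"], "us-central1")
    except ValueError as err:
        print(err)  # zones outside us-central1: ['us-east4-a']
```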
A common use case for AI/ML inference is to run Pods across zones for improved accelerator availability and cost efficiency with Spot VMs. Because Hyperdisk ML is zonal, if your inference server runs many Pods across zones, GKE automatically clones the disks across zones to ensure that your data follows your application.
Multi-zone Hyperdisk ML volumes have the following limitations:
- Volume resize and volume snapshot operations are not supported.
- Multi-zone Hyperdisk ML volumes are only supported in read-only mode.
- When using pre-existing disks with a multi-zone Hyperdisk ML volume, GKE does not perform checks to validate that the disk content across zones is the same. If any of the disks contain diverging content, make sure your application takes potential inconsistency between zones into account.
To learn more, see Create a multi-zone ReadOnlyMany Hyperdisk ML volume from a VolumeSnapshot.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
- Set your default region and zone to one of the supported values.
- Ensure your Google Cloud project has sufficient quota to create the necessary nodes in this guide. The example code for GKE cluster and Kubernetes resource creation requires the following minimum quota in the region of your choice: 88 C3 CPUs and 8 NVIDIA L4 GPUs.
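These quota figures are consistent with the Standard cluster commands in this guide if --node-locations lists two zones. That zone count is an assumption; the guide leaves ZONES to you. Because --num-nodes applies once per zone, the totals scale with the number of zones, as this Python sketch shows (`NodePool` and `quota_needed` are hypothetical names):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    machine_family: str   # quota bucket, e.g. "c3" or "g2"
    vcpus_per_node: int
    gpus_per_node: int
    nodes_per_zone: int   # --num-nodes is applied once per zone in --node-locations


def quota_needed(pools: list[NodePool], zones: int) -> tuple[dict[str, int], int]:
    """Return (vCPUs per machine family, total GPUs) for a cluster spread over `zones` zones."""
    cpus: dict[str, int] = {}
    gpus = 0
    for pool in pools:
        nodes = pool.nodes_per_zone * zones
        cpus[pool.machine_family] = cpus.get(pool.machine_family, 0) + nodes * pool.vcpus_per_node
        gpus += nodes * pool.gpus_per_node
    return cpus, gpus


# Node pools from the Standard cluster commands in this guide.
POOLS = [
    NodePool("c3", vcpus_per_node=44, gpus_per_node=0, nodes_per_zone=1),  # c3-standard-44
    NodePool("g2", vcpus_per_node=24, gpus_per_node=2, nodes_per_zone=2),  # g2-standard-24, 2x L4
]

if __name__ == "__main__":
    cpus, gpus = quota_needed(POOLS, zones=2)  # assumption: two zones in --node-locations
    print(f"C3 vCPUs: {cpus['c3']}, L4 GPUs: {gpus}")  # C3 vCPUs: 88, L4 GPUs: 8
```

With a single zone, the same pools need half as much: 44 C3 vCPUs and 4 L4 GPUs.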
Requirements
To use Hyperdisk ML volumes in GKE, your clusters must meet the following requirements:
- Use Linux clusters running GKE version 1.30.2-gke.1394000 or later. If you use a release channel, ensure that the channel offers this minimum GKE version or later.
- Make sure that the Compute Engine Persistent Disk CSI driver is enabled. The Compute Engine Persistent Disk driver is enabled by default on new Autopilot and Standard clusters and cannot be disabled or edited when using Autopilot. If you need to enable the Compute Engine Persistent Disk CSI driver on an existing cluster, see Enabling the Compute Engine Persistent Disk CSI Driver on an existing cluster.
- If you want to tune the readahead value, use GKE version 1.29.2-gke.1217000 or later.
- If you want to use the multi-zone dynamically provisioned feature, use GKE version 1.30.2-gke.1394000 or later.
- Hyperdisk ML is only supported on certain node types and zones. To learn more, see About Google Cloud Hyperdisk in the Compute Engine documentation.
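To check a cluster version against the minimums listed above, you can compare the versions component by component. This Python sketch assumes versions of the form MAJOR.MINOR.PATCH-gke.N, such as the value in a cluster's currentMasterVersion field from gcloud container clusters describe. The feature keys in `MINIMUM_VERSIONS` are hypothetical labels for the requirements in this list.

```python
import re

# Minimum GKE versions quoted in the requirements above.
MINIMUM_VERSIONS = {
    "hyperdisk_ml": "1.30.2-gke.1394000",
    "readahead_tuning": "1.29.2-gke.1217000",
    "multi_zone": "1.30.2-gke.1394000",
}

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version: str) -> tuple[int, ...]:
    """Turn '1.30.2-gke.1394000' into (1, 30, 2, 1394000) for ordered comparison."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports(version: str, feature: str) -> bool:
    """True if `version` is at or above the minimum listed for `feature`."""
    return parse_gke_version(version) >= parse_gke_version(MINIMUM_VERSIONS[feature])


if __name__ == "__main__":
    for v in ("1.29.5-gke.1000", "1.30.2-gke.1394000"):
        print(v, {f: supports(v, f) for f in MINIMUM_VERSIONS})
```

Comparing integer tuples avoids the trap of string comparison, where "1.9" would sort after "1.30".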
Get access to the model
To get access to the Gemma models for deployment to GKE, you must first sign the license consent agreement and then generate a Hugging Face access token.
Sign the license consent agreement
You must sign the consent agreement to use Gemma. Follow these instructions:
- Access the model consent page on Kaggle.com.
- Verify consent using your Hugging Face account.
- Accept the model terms.
Generate an access token
To access the model through Hugging Face, you'll need a Hugging Face token.
Follow these steps to generate a new token if you don't have one already:
- Click Your Profile > Settings > Access Tokens.
- Select New Token.
- Specify a Name of your choice and a Role of at least Read.
- Select Generate a token.
- Copy the generated token to your clipboard.
Create a GKE cluster
You can serve LLMs on GPUs in a GKE Autopilot or Standard cluster. We recommend that you use an Autopilot cluster for a fully managed Kubernetes experience. To choose the GKE mode of operation that's the best fit for your workloads, see Choose a GKE mode of operation.
Autopilot
In Cloud Shell, run the following command:
gcloud container clusters create-auto hdml-gpu-l4 \
    --project=PROJECT \
    --region=REGION \
    --release-channel=rapid \
    --cluster-version=1.30.2-gke.1394000
Replace the following values:
- PROJECT: the Google Cloud project ID.
- REGION: a region that supports the accelerator type you want to use, for example, us-east4 for L4 GPUs.
GKE creates an Autopilot cluster with CPU and GPU nodes as requested by the deployed workloads.
Configure kubectl to communicate with your cluster:
gcloud container clusters get-credentials hdml-gpu-l4 \
    --region=REGION
Standard
In Cloud Shell, run the following command to create a Standard cluster and node pools:
gcloud container clusters create hdml-gpu-l4 \
    --location=REGION \
    --num-nodes=1 \
    --machine-type=c3-standard-44 \
    --release-channel=rapid \
    --cluster-version=CLUSTER_VERSION \
    --node-locations=ZONES \
    --project=PROJECT

gcloud container node-pools create gpupool \
    --accelerator type=nvidia-l4,count=2,gpu-driver-version=latest \
    --location=REGION \
    --project=PROJECT \
    --node-locations=ZONES \
    --cluster=hdml-gpu-l4 \
    --machine-type=g2-standard-24 \
    --num-nodes=2
Replace the following values:
- CLUSTER_VERSION: the version of your GKE cluster (for example, 1.30.2-gke.1394000).
- REGION: the compute region for the cluster control plane. The region must support the accelerator you want to use, for example, us-east4 for L4 GPUs. Check which regions have L4 GPUs available.
- ZONES: the zones in which nodes are created. You can specify as many zones as needed for your cluster. All zones must be in the same region as the cluster's control plane, specified by the --location flag. For zonal clusters, --node-locations must contain the cluster's primary zone.
- PROJECT: the Google Cloud project ID.
The cluster creation might take several minutes.
Configure kubectl to communicate with your cluster:
gcloud container clusters get-credentials hdml-gpu-l4 \
    --location=REGION
Pre-cache data to a Persistent Disk disk image
To use Hyperdisk ML, you pre-cache data in a disk image, and create a Hyperdisk ML volume for read access by your workload in GKE. This approach (also called data hydration) ensures that your data is available when your workload needs it.
To copy the data from an external source (in this example, the Gemma weights on Hugging Face) and pre-cache it in a Persistent Disk disk image, follow these steps:
Create a StorageClass that supports Hyperdisk ML
Save the following StorageClass manifest in a file named hyperdisk-ml.yaml.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml
parameters:
  type: hyperdisk-ml
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: false
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
Create the StorageClass by running this command:
kubectl create -f hyperdisk-ml.yaml
Create a ReadWriteOnce (RWO) PersistentVolumeClaim
Save the following PersistentVolumeClaim manifest in a file named producer-pvc.yaml. You'll use the StorageClass you created earlier. Make sure that your disk has sufficient capacity to store your data.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: producer-pvc
spec:
  storageClassName: hyperdisk-ml
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 300Gi
Create the PersistentVolumeClaim by running this command:
kubectl create -f producer-pvc.yaml
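One way to pick the storage request is to measure a local copy of the data and round up with some headroom for file system overhead. This Python sketch does that; the 20% headroom default and the helper names are hypothetical choices, not a GKE recommendation.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Total size of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def pvc_storage_request(data_bytes: int, headroom_pct: int = 20) -> str:
    """Round data size plus a headroom percentage up to whole Gi, e.g. '300Gi'."""
    needed = data_bytes * (100 + headroom_pct)   # scaled by 100 to stay in integers
    gib = -(-needed // (100 * 2**30))            # ceiling division
    return f"{max(gib, 1)}Gi"
```

For example, pvc_storage_request(directory_size_bytes("./gemma-7b")) returns a value you could put in spec.resources.requests.storage, where ./gemma-7b is a hypothetical local download of the model. Integer arithmetic keeps the rounding exact, so a value that lands exactly on a Gi boundary is not bumped up by floating-point error.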
Create a Kubernetes Job to populate the mounted Google Cloud Hyperdisk volume
This section shows an example of creating a Kubernetes Job that provisions a disk and downloads the Gemma 7B instruction-tuned model from Hugging Face onto the mounted Google Cloud Hyperdisk volume.
To access the Gemma LLM that the examples in this guide use, create a Kubernetes Secret that contains the Hugging Face token:
kubectl create secret generic hf-secret \
    --from-literal=hf_api_token=HF_TOKEN \
    --dry-run=client -o yaml | kubectl apply -f -
Replace HF_TOKEN with the Hugging Face token you generated earlier.
Save the following example manifest as producer-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: producer-job
spec:
  template:  # Template for the Pods the Job will create
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/compute-class
                operator: In
                values:
                - "Performance"
            - matchExpressions:
              - key: cloud.google.com/machine-family
                operator: In
                values:
                - "c3"
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - "ZONE"
      containers:
      - name: copy
        resources:
          requests:
            cpu: "32"
          limits:
            cpu: "32"
        image: huggingface/downloader:0.17.3
        command: [ "huggingface-cli" ]
        args:
        - download
        - google/gemma-1.1-7b-it
        - --local-dir=/data/gemma-7b
        - --local-dir-use-symlinks=False
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-secret
              key: hf_api_token
        volumeMounts:
        - mountPath: "/data"
          name: volume
      restartPolicy: Never
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: producer-pvc
  parallelism: 1    # Run 1 Pod concurrently
  completions: 1    # Once 1 Pod completes successfully, the Job is done
  backoffLimit: 4   # Max retries on failure
Replace ZONE with the compute zone where you want the Hyperdisk to be created. If you're using it with the Deployment example, ensure it is a zone that has G2 machine capacity.
Create the Job by running this command:
kubectl apply -f producer-job.yaml
It might take a few minutes for the Job to finish copying data to the Hyperdisk ML volume. When the Job finishes, its status is marked "Complete".
To check the progress of your Job status, run the following command:
kubectl get job producer-job
Once the Job is complete, you can clean up the Job by running this command:
kubectl delete job producer-job
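If you script the data hydration step, you can poll the Job status instead of checking it by hand. The shell equivalent is kubectl wait --for=condition=complete job/producer-job. This Python sketch reads the Complete and Failed conditions from the Job's status.conditions. It assumes kubectl is installed and configured for the cluster, and the helper names and default timeouts are hypothetical.

```python
import json
import subprocess
import time


def job_state(job: dict) -> str:
    """Map a Job object's status.conditions to "Complete", "Failed", or "InProgress"."""
    for condition in job.get("status", {}).get("conditions") or []:
        if condition.get("status") == "True" and condition.get("type") in ("Complete", "Failed"):
            return condition["type"]
    return "InProgress"


def get_job(name: str, namespace: str = "default") -> dict:
    """Fetch the Job as JSON through kubectl (assumes kubectl targets the right cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "job", name, "--namespace", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def wait_for_job(name: str, timeout_s: float = 1800, poll_s: float = 15) -> str:
    """Poll until the Job reaches Complete or Failed, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = job_state(get_job(name))
        if state != "InProgress":
            return state
        time.sleep(poll_s)
    raise TimeoutError(f"Job {name} did not finish within {timeout_s} s")
```

Calling wait_for_job("producer-job") before kubectl delete job producer-job avoids deleting the Job while the download is still running.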
Create a ReadOnlyMany Hyperdisk ML volume from a pre-existing Google Cloud Hyperdisk
This section covers the steps for creating a ReadOnlyMany (ROM) PersistentVolume and PersistentVolumeClaim pair from a pre-existing Google Cloud Hyperdisk volume. To learn more, see Using pre-existing persistent disks as PersistentVolumes.
In GKE version 1.30.2-gke.1394000 and later, GKE automatically converts the access mode of a READ_WRITE_SINGLE Google Cloud Hyperdisk volume to READ_ONLY_MANY.
If you are using a pre-existing Google Cloud Hyperdisk volume on an earlier version of GKE, you must modify the access mode manually by running the following command:
gcloud compute disks update HDML_DISK_NAME \
    --zone=ZONE \
    --access-mode=READ_ONLY_MANY
Replace the following values:
- HDML_DISK_NAME: the name of your Hyperdisk ML volume.
- ZONE: the compute zone where the pre-existing Google Cloud Hyperdisk volume is created.
Create a PersistentVolume and PersistentVolumeClaim pair, referencing the disk you previously populated.
Save the following manifest as hdml-static-pv.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hdml-static-pv
spec:
  storageClassName: "hyperdisk-ml"
  capacity:
    storage: 300Gi
  accessModes:
  - ReadOnlyMany
  claimRef:
    namespace: default
    name: hdml-static-pvc
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/PROJECT/zones/ZONE/disks/DISK_NAME
    fsType: ext4
    readOnly: true
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.gke.io/zone
          operator: In
          values:
          - ZONE
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: default
  name: hdml-static-pvc
spec:
  storageClassName: "hyperdisk-ml"
  volumeName: hdml-static-pv
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 300Gi
Replace the following values:
- PROJECT: the project where your GKE cluster is created.
- ZONE: the zone where the pre-existing Google Cloud Hyperdisk volume is created.
- DISK_NAME: the name of the pre-existing Google Cloud Hyperdisk volume.
Create the PersistentVolume and PersistentVolumeClaim resources by running this command:
kubectl apply -f hdml-static-pv.yaml
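In hdml-static-pv.yaml, ZONE appears twice: once in csi.volumeHandle and once in the PersistentVolume's nodeAffinity. If they disagree, Pods would likely be steered to a zone where the disk does not exist. This Python sketch checks that the two agree. It works on the manifest loaded as a dict, for example from kubectl get pv hdml-static-pv -o json, and all helper names are hypothetical.

```python
import re

_HANDLE_RE = re.compile(r"^projects/([^/]+)/zones/([^/]+)/disks/([^/]+)$")


def volume_handle(project: str, zone: str, disk: str) -> str:
    """Build a zonal csi.volumeHandle in the format used by hdml-static-pv.yaml."""
    return f"projects/{project}/zones/{zone}/disks/{disk}"


def handle_zone(handle: str) -> str:
    """Extract the zone segment from a zonal volumeHandle."""
    match = _HANDLE_RE.match(handle)
    if match is None:
        raise ValueError(f"not a zonal disk handle: {handle!r}")
    return match[2]


def affinity_zones(pv: dict) -> set[str]:
    """Collect topology.gke.io/zone values from the PV's required nodeAffinity."""
    zones: set[str] = set()
    required = pv["spec"].get("nodeAffinity", {}).get("required", {})
    for term in required.get("nodeSelectorTerms", []):
        for expr in term.get("matchExpressions", []):
            if expr.get("key") == "topology.gke.io/zone":
                zones.update(expr.get("values", []))
    return zones


def zones_consistent(pv: dict) -> bool:
    """True if the disk's zone appears among the PV's nodeAffinity zones."""
    return handle_zone(pv["spec"]["csi"]["volumeHandle"]) in affinity_zones(pv)
```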
Create a multi-zone ReadOnlyMany Hyperdisk ML volume from a VolumeSnapshot
This section covers the steps for creating a multi-zone Hyperdisk ML volume in ReadOnlyMany access mode. You use a VolumeSnapshot for a pre-existing Persistent Disk disk image. To learn more, see Back up Persistent Disk storage using volume snapshots.
To create the multi-zone Hyperdisk ML volume, follow these steps:
Create a VolumeSnapshot of your disk
Save the following manifest as a file called disk-image-vsc.yaml:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: disk-image-vsc
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
parameters:
  snapshot-type: images
Create the VolumeSnapshotClass by running the following command:
kubectl apply -f disk-image-vsc.yaml
Save the following manifest as a file called my-snapshot.yaml. You'll reference the PersistentVolumeClaim you created earlier in Create a ReadWriteOnce (RWO) PersistentVolumeClaim.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot
spec:
  volumeSnapshotClassName: disk-image-vsc
  source:
    persistentVolumeClaimName: producer-pvc
Create the VolumeSnapshot by running the following command:
kubectl apply -f my-snapshot.yaml
Wait for the VolumeSnapshot to be marked ready to use by running the following command:
kubectl wait --for=jsonpath='{.status.readyToUse}'=true \
    --timeout=300s volumesnapshot my-snapshot
Create a multi-zone StorageClass
If you want copies of your data to be accessible in more than one zone, specify the enable-multi-zone-provisioning parameter in your StorageClass, which creates disks in the zones you specified in the allowedTopologies field.
To create the StorageClass, follow these steps:
Save the following manifest as a file called hyperdisk-ml-multi-zone.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hyperdisk-ml-multi-zone
parameters:
  type: hyperdisk-ml
  provisioned-throughput-on-create: "2400Mi"
  enable-multi-zone-provisioning: "true"
provisioner: pd.csi.storage.gke.io
allowVolumeExpansion: false
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - ZONE_1
    - ZONE_2
Replace ZONE_1, ZONE_2, ..., ZONE_N with the zones where your storage can be accessed.
This example sets the volumeBindingMode to Immediate, allowing GKE to provision the PersistentVolumeClaim prior to any consumer referencing it.
Create the StorageClass by running the following command:
kubectl apply -f hyperdisk-ml-multi-zone.yaml
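If your zone list changes between environments, you could generate this StorageClass instead of editing ZONE_1 through ZONE_N by hand. This Python sketch builds the same object as hyperdisk-ml-multi-zone.yaml and prints it as JSON, which kubectl apply -f - also accepts. The script name make_sc.py and the function name are hypothetical.

```python
import json
import sys


def multi_zone_storage_class(zones: list[str], name: str = "hyperdisk-ml-multi-zone",
                             throughput: str = "2400Mi") -> dict:
    """Build the same StorageClass as hyperdisk-ml-multi-zone.yaml for the given zones."""
    if not zones:
        raise ValueError("at least one zone is required")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "parameters": {
            "type": "hyperdisk-ml",
            "provisioned-throughput-on-create": throughput,
            "enable-multi-zone-provisioning": "true",
        },
        "provisioner": "pd.csi.storage.gke.io",
        "allowVolumeExpansion": False,
        "reclaimPolicy": "Delete",
        "volumeBindingMode": "Immediate",
        "allowedTopologies": [
            {"matchLabelExpressions": [
                {"key": "topology.gke.io/zone", "values": list(zones)},
            ]},
        ],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 make_sc.py us-central1-a us-central1-b | kubectl apply -f -
    print(json.dumps(multi_zone_storage_class(sys.argv[1:]), indent=2))
```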
Create a PersistentVolumeClaim that uses the multi-zone StorageClass
The next step is to create a PersistentVolumeClaim that references the StorageClass.
GKE uses the content of the specified disk image to automatically provision a Hyperdisk ML volume in each zone listed in the StorageClass's allowedTopologies field.
To create the PersistentVolumeClaim, follow these steps:
Save the following manifest as a file called hdml-consumer-pvc.yaml:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hdml-consumer-pvc
spec:
  dataSource:
    name: my-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadOnlyMany
  storageClassName: hyperdisk-ml-multi-zone
  resources:
    requests:
      storage: 300Gi
Create the PersistentVolumeClaim by running the following command:
kubectl apply -f hdml-consumer-pvc.yaml
Create a Deployment to consume the Hyperdisk ML volume
When using Pods with PersistentVolumes, we recommend that you use a workload controller (such as a Deployment or StatefulSet).
If you want to use a pre-existing PersistentVolume in ReadOnlyMany mode with a Deployment, refer to Use persistent disks with multiple readers.
To create and test your Deployment, follow these steps:
Save the following example manifest as vllm-gemma-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-gemma-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gemma-server
  template:
    metadata:
      labels:
        app: gemma-server
        ai.gke.io/model: gemma-7b
        ai.gke.io/inference-server: vllm
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - gemma-server
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: inference-server
        image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:latest
        resources:
          requests:
            cpu: "2"
            memory: "25Gi"
            ephemeral-storage: "25Gi"
            nvidia.com/gpu: 2
          limits:
            cpu: "2"
            memory: "25Gi"
            ephemeral-storage: "25Gi"
            nvidia.com/gpu: 2
        command: ["python3", "-m", "vllm.entrypoints.api_server"]
        args:
        - --model=$(MODEL_ID)
        - --tensor-parallel-size=2
        env:
        - name: MODEL_ID
          value: /models/gemma-7b
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
        - mountPath: /models
          name: gemma-7b
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      - name: gemma-7b
        persistentVolumeClaim:
          claimName: CLAIM_NAME
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
---
apiVersion: v1
kind: Service
metadata:
  name: llm-service
spec:
  selector:
    app: gemma-server
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000
Replace CLAIM_NAME with one of these values:
- hdml-static-pvc: if you are using a Hyperdisk ML volume from an existing Google Cloud Hyperdisk.
- hdml-consumer-pvc: if you are using a Hyperdisk ML volume from a VolumeSnapshot disk image.
Create the Deployment and Service by running the following command:
kubectl apply -f vllm-gemma-deployment.yaml
Run the following command to wait for the inference server to be available:
kubectl wait --for=condition=Available --timeout=700s deployment/vllm-gemma-deployment
To test that your vLLM server is up and running, follow these steps:
Run the following command to set up port forwarding to the model:
kubectl port-forward service/llm-service 8000:8000
Run a curl command to send a request to the model:
USER_PROMPT="I'm new to coding. If you could only recommend one programming language to start with, what would it be and why?"

curl -X POST http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -d @- <<EOF
{
    "prompt": "<start_of_turn>user\n${USER_PROMPT}<end_of_turn>\n",
    "temperature": 0.90,
    "top_p": 1.0,
    "max_tokens": 128
}
EOF
The following output shows an example of the model response:
{"predictions":["Prompt:\n<start_of_turn>user\nI'm new to coding. If you could only recommend one programming language to start with, what would it be and why?<end_of_turn>\nOutput:\nPython is often recommended for beginners due to its clear, readable syntax, simple data types, and extensive libraries.\n\n**Reasons why Python is a great language for beginners:**\n\n* **Easy to read:** Python's syntax is straightforward and uses natural language conventions, making it easier for beginners to understand the code.\n* **Simple data types:** Python has basic data types like integers, strings, and lists that are easy to grasp and manipulate.\n* **Extensive libraries:** Python has a vast collection of well-documented libraries covering various tasks, allowing beginners to build projects without reinventing the wheel.\n* **Large supportive community:**"]}
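You can also send the curl request from Python using only the standard library. This is a sketch that assumes the kubectl port-forward step is still running. It reads the "predictions" key because that is the shape of the sample response above, not because of a documented contract, and the helper names are hypothetical.

```python
import json
import urllib.request


def gemma_prompt(user_prompt: str) -> str:
    """Wrap a user message in the same Gemma turn markers as the curl example."""
    return f"<start_of_turn>user\n{user_prompt}<end_of_turn>\n"


def build_request(user_prompt: str, url: str = "http://localhost:8000/generate",
                  temperature: float = 0.90, top_p: float = 1.0,
                  max_tokens: int = 128) -> urllib.request.Request:
    """Build the same JSON POST request that the curl command sends."""
    body = json.dumps({
        "prompt": gemma_prompt(user_prompt),
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def generate(user_prompt: str, timeout_s: float = 120) -> list[str]:
    """Send the request and return the 'predictions' list from the JSON response."""
    with urllib.request.urlopen(build_request(user_prompt), timeout=timeout_s) as resp:
        return json.load(resp)["predictions"]
```

With the port-forward active, print(generate("I'm new to coding. Which language should I learn first?")[0]) prints the first prediction string.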
Tune the readahead value
If you have workloads that perform sequential I/O, they might benefit from tuning the readahead value. This typically applies to inference or training workloads that need to load AI/ML model weights into memory. Most workloads with sequential I/O see a performance improvement with a readahead value of 1024 KB or higher.
You can specify this option through the read_ahead_kb mount option when you statically provision a new PersistentVolume or when you modify an existing dynamically provisioned PersistentVolume.
The following example shows how you can tune the readahead value to 4096 KB.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: DISK_NAME
spec:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 300Gi
  csi:
    driver: pd.csi.storage.gke.io
    fsType: ext4
    readOnly: true
    volumeHandle: projects/PROJECT/zones/ZONE/disks/DISK_NAME
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.gke.io/zone
          operator: In
          values:
          - ZONE
  storageClassName: hyperdisk-ml
  mountOptions:
  - read_ahead_kb=4096
Replace the following values:
- DISK_NAME: the name of the pre-existing Google Cloud Hyperdisk volume.
- PROJECT: the project where the pre-existing Google Cloud Hyperdisk volume is created.
- ZONE: the zone where the pre-existing Google Cloud Hyperdisk volume is created.
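When you modify an existing PersistentVolume, you only need to change its mountOptions list. This Python sketch returns a copy of a PersistentVolume dict, such as the output of kubectl get pv DISK_NAME -o json, with exactly one read_ahead_kb entry. It keeps any other mount options. The function name is hypothetical, and you should review the result before applying it.

```python
import copy


def with_read_ahead(pv: dict, kb: int = 4096) -> dict:
    """Return a copy of a PersistentVolume dict with a single read_ahead_kb mount option."""
    if kb <= 0:
        raise ValueError("read_ahead_kb must be positive")
    patched = copy.deepcopy(pv)  # leave the caller's object untouched
    spec = patched.setdefault("spec", {})
    # Drop any existing read_ahead_kb setting, keep every other mount option.
    options = [o for o in spec.get("mountOptions", []) if not o.startswith("read_ahead_kb=")]
    options.append(f"read_ahead_kb={kb}")
    spec["mountOptions"] = options
    return patched
```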
Test and benchmark your Hyperdisk ML volume performance
This section shows how you can use Flexible I/O Tester (FIO) to benchmark the performance of your Hyperdisk ML volumes for reading pre-existing data. You can use these metrics to evaluate your volume's performance for specific workloads and configurations.
Save the following example manifest as benchmark-job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark-job
spec:
  template:  # Template for the Pods the Job will create
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/compute-class
                operator: In
                values:
                - "Performance"
            - matchExpressions:
              - key: cloud.google.com/machine-family
                operator: In
                values:
                - "c3"
      containers:
      - name: fio
        resources:
          requests:
            cpu: "32"
        image: litmuschaos/fio
        args:
        - fio
        - --filename
        - /models/gemma-7b/model-00001-of-00004.safetensors:/models/gemma-7b/model-00002-of-00004.safetensors:/models/gemma-7b/model-00003-of-00004.safetensors:/models/gemma-7b/model-00004-of-00004.safetensors:/models/gemma-7b/model-00004-of-00004.safetensors
        - --direct=1
        - --rw=read
        - --readonly
        - --bs=4096k
        - --ioengine=libaio
        - --iodepth=8
        - --runtime=60
        - --numjobs=1
        - --name=read_benchmark
        volumeMounts:
        - mountPath: "/models"
          name: volume
      restartPolicy: Never
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: CLAIM_NAME
  parallelism: 1    # Run 1 Pod concurrently
  completions: 1    # Once 1 Pod completes successfully, the Job is done
  backoffLimit: 1   # Max retries on failure
Replace CLAIM_NAME with the name of your PersistentVolumeClaim (for example, hdml-static-pvc).
Create the Job by running the following command:
kubectl apply -f benchmark-job.yaml
Use kubectl logs to view the output of the fio tool:
kubectl logs job/benchmark-job -f
The output looks similar to the following:
read_benchmark: (g=0): rw=read, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=8
fio-2.2.10
Starting 1 process

read_benchmark: (groupid=0, jobs=1): err= 0: pid=32: Fri Jul 12 21:29:32 2024
  read : io=18300MB, bw=2407.3MB/s, iops=601, runt= 7602msec
    slat (usec): min=86, max=1614, avg=111.17, stdev=64.46
    clat (msec): min=2, max=33, avg=13.17, stdev= 1.08
    lat (msec): min=2, max=33, avg=13.28, stdev= 1.06
    clat percentiles (usec):
     | 1.00th=[11072], 5.00th=[12352], 10.00th=[12608], 20.00th=[12736],
     | 30.00th=[12992], 40.00th=[13120], 50.00th=[13248], 60.00th=[13376],
     | 70.00th=[13504], 80.00th=[13632], 90.00th=[13888], 95.00th=[14016],
     | 99.00th=[14400], 99.50th=[15296], 99.90th=[22144], 99.95th=[25728],
     | 99.99th=[33024]
    bw (MB /s): min= 2395, max= 2514, per=100.00%, avg=2409.79, stdev=29.34
    lat (msec) : 4=0.39%, 10=0.31%, 20=99.15%, 50=0.15%
  cpu : usr=0.28%, sys=8.08%, ctx=4555, majf=0, minf=8203
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=99.8%, 16=0.0%, 32=0.0%, >=64=0.0%
    submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    issued : total=r=4575/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
    latency : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: io=18300MB, aggrb=2407.3MB/s, minb=2407.3MB/s, maxb=2407.3MB/s, mint=7602msec, maxt=7602msec

Disk stats (read/write):
  nvme0n2: ios=71239/0, merge=0/0, ticks=868737/0, in_queue=868737, util=98.72%
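The headline numbers can be cross-checked by hand. With --bs=4096k, each read is 4 MB, so the 4,575 issued reads account for 4,575 × 4 MB = 18,300 MB. Over the 7,602 ms runtime, that works out to 18,300 / 7.602 ≈ 2,407 MB/s and 4,575 / 7.602 ≈ 601.8 IOPS, which fio reports as bw=2407.3MB/s and iops=601. This Python sketch recomputes both from the summary line. The regular expression targets the fio 2.x output format shown above and is an assumption; newer fio versions format this line differently.

```python
import re

# Matches the per-job summary line, e.g.
# "read : io=18300MB, bw=2407.3MB/s, iops=601, runt= 7602msec"
_READ_RE = re.compile(r"io=(\d+)MB, bw=([\d.]+)MB/s, iops=(\d+), runt=\s*(\d+)msec")


def check_fio_read(line: str, block_mb: int = 4) -> dict:
    """Recompute bandwidth and IOPS from total I/O and runtime in an fio 2.x read line."""
    match = _READ_RE.search(line)
    if match is None:
        raise ValueError("no fio read summary found")
    io_mb, runt_ms = int(match[1]), int(match[4])
    seconds = runt_ms / 1000
    return {
        "reported_bw_mb_s": float(match[2]),
        "computed_bw_mb_s": io_mb / seconds,
        "reported_iops": int(match[3]),
        "computed_iops": io_mb / block_mb / seconds,
    }


if __name__ == "__main__":
    sample = "read : io=18300MB, bw=2407.3MB/s, iops=601, runt= 7602msec"
    for key, value in check_fio_read(sample).items():
        print(f"{key}: {value:.1f}")
```

If the computed and reported values diverge noticeably, the line was probably not parsed from the job you expected.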
Monitor throughput or IOPS on a Hyperdisk ML volume
To monitor the provisioned performance of your Hyperdisk ML volume, see Analyze provisioned IOPS and throughput in the Compute Engine documentation.
To update the provisioned throughput or IOPS of an existing Hyperdisk ML volume, or to learn about additional Google Cloud Hyperdisk parameters you can specify in your StorageClass, refer to Scale your storage performance using Google Cloud Hyperdisk.
Troubleshooting
This section provides troubleshooting guidance to resolve issues with Hyperdisk ML volumes on GKE.
The disk access mode cannot be updated
The following error occurs when a Hyperdisk ML volume is already attached to, and in use by, a node in ReadWriteOnce access mode.
AttachVolume.Attach failed for volume ... Failed to update access mode:
failed to set access mode for zonal volume ...
'Access mode cannot be updated when the disk is attached to instance(s).'., invalidResourceUsage
GKE automatically updates the Hyperdisk ML volume's accessMode from READ_WRITE_SINGLE to READ_ONLY_MANY when it is used by a ReadOnlyMany access mode PersistentVolume. This update is done when the disk is attached to a new node.
To resolve this issue, delete all Pods that are referencing the disk using a PersistentVolume in ReadWriteOnce mode. Wait for the disk to be detached, and then re-create the workload that consumes the PersistentVolume in ReadOnlyMany mode.
The disk cannot be attached with READ_WRITE mode
The following error indicates that GKE attempted to attach a Hyperdisk ML volume in READ_ONLY_MANY access mode to a GKE node using ReadWriteOnce access mode.
AttachVolume.Attach failed for volume ...
Failed to Attach: failed cloud service attach disk call ...
The disk cannot be attached with READ_WRITE mode., badRequest
GKE automatically updates the Hyperdisk ML volume's accessMode from READ_WRITE_SINGLE to READ_ONLY_MANY when it is used by a ReadOnlyMany access mode PersistentVolume. However, GKE won't automatically update the access mode from READ_ONLY_MANY to READ_WRITE_SINGLE. This is a safety mechanism to ensure that multi-zone disks are not written to by accident, as this could result in diverging content between multi-zone disks.
To resolve this issue, we recommend that you follow the Pre-cache data to a Persistent Disk disk image workflow if you need updated content. If you need more control over the Hyperdisk ML volume's access mode and other settings, see Modify the settings for a Google Cloud Hyperdisk volume.
Quota exceeded - Insufficient throughput quota
The following error indicates that there was insufficient Hyperdisk ML throughput quota at the time of disk provisioning.
failed to provision volume with StorageClass ... failed (QUOTA_EXCEEDED): Quota 'HDML_TOTAL_THROUGHPUT' exceeded
To resolve this issue, see Disk Quotas to learn more about Hyperdisk quota and how to increase the disk quota in your project.
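When sizing a quota increase for multi-zone volumes, a rough estimate helps. This Python sketch rests on two assumptions that you should verify against the Disk Quotas documentation: each zonal disk that a multi-zone StorageClass creates is provisioned with the full provisioned-throughput-on-create value, and every one of those disks counts against the regional HDML_TOTAL_THROUGHPUT quota. The helper names are hypothetical, and the sketch only handles values written with the Mi suffix, as in the example StorageClass.

```python
import re


def mib_per_s(quantity: str) -> int:
    """Parse a StorageClass throughput value such as '2400Mi' into an integer MiB/s."""
    match = re.fullmatch(r"(\d+)Mi", quantity.strip())
    if match is None:
        raise ValueError(f"expected a value like '2400Mi', got {quantity!r}")
    return int(match[1])


def multi_zone_throughput_mib_s(per_disk: str, zones: int) -> int:
    """Estimated throughput requested by a multi-zone volume: one disk per zone (assumption)."""
    if zones <= 0:
        raise ValueError("zones must be positive")
    return mib_per_s(per_disk) * zones


if __name__ == "__main__":
    # The example StorageClass requests 2400Mi per disk; two zones are assumed here.
    print(multi_zone_throughput_mib_s("2400Mi", 2))
```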
For additional troubleshooting guidance, refer to Scale your storage performance with Google Cloud Hyperdisk.
What's next
- Learn how to migrate Persistent Disk volumes to Hyperdisk.
- Read more about the Persistent Disk CSI driver on GitHub.