Troubleshooting storage in GKE

Autopilot Standard

Issues with storage in Google Kubernetes Engine (GKE) clusters can manifest in various ways, from performance bottlenecks and volume mounting failures to errors when using specific disk types with certain machine types. These problems can affect application statefulness, data persistence, and overall workload health.

Use this document to resolve common issues affecting storage functionality in your clusters. Find guidance on troubleshooting problems related to volume provisioning and attachment, data access and performance, and storage capacity management.

This information is important for both Platform admins and operators who manage cluster infrastructure and storage, and for Application developers whose workloads rely on persistent storage. For more information about the common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Error 400: Cannot attach RePD to an optimized VM

Regional persistent disks are restricted from being used with memory-optimized machines or compute-optimized machines.

If a regional persistent disk is not a hard requirement, consider using a non-regional persistent disk storage class. If a regional persistent disk is a hard requirement, consider scheduling strategies such as taints and tolerations to ensure that the Pods that need regional persistent disks are scheduled on a node pool that does not use memory-optimized or compute-optimized machines.
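As a rough planning aid, the restriction above can be expressed as a check on the machine series of a node pool. The following Python sketch is illustrative only: the series prefixes are an assumption (a non-exhaustive list of Compute Engine memory-optimized and compute-optimized series), and `can_attach_regional_pd` is a hypothetical helper name, not a GKE API.

```python
# Assumed, non-exhaustive machine series prefixes. Verify them against the
# current Compute Engine documentation before relying on this check.
MEMORY_OPTIMIZED_SERIES = {"m1", "m2", "m3"}
COMPUTE_OPTIMIZED_SERIES = {"c2", "c2d", "h3"}


def machine_series(machine_type):
    """Return the series prefix, for example 'c2' for 'c2-standard-8'."""
    return machine_type.split("-", 1)[0].lower()


def can_attach_regional_pd(machine_type):
    """Hypothetical helper: False for machine types in an optimized series."""
    optimized = MEMORY_OPTIMIZED_SERIES | COMPUTE_OPTIMIZED_SERIES
    return machine_series(machine_type) not in optimized


for mt in ("e2-standard-4", "c2-standard-8", "m2-ultramem-208"):
    print(mt, can_attach_regional_pd(mt))
```

A check like this might help decide which node pools to taint so that Pods using regional persistent disks are kept away from them.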

Troubleshooting issues with disk performance

The performance of the boot disk is important because the boot disk for GKE nodes is not only used for the operating system but also for the following:

- Container images.
- The parts of the container filesystem that are not mounted as a volume (that is, the overlay filesystem), which often include directories like /tmp.
- Disk-backed emptyDir volumes, unless the node uses local SSD.

Disk performance is shared for all disks of the same disk type on a node. For example, if you have a 100 GB pd-standard boot disk and a 100 GB pd-standard PersistentVolume with lots of activity, the performance of the boot disk is that of a 200 GB disk. Also, if there is a lot of activity on the PersistentVolume, this impacts the performance of the boot disk as well.
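To make the arithmetic in this example concrete, the following Python sketch computes the limits shared by a set of same-type disks from their combined size. The per-GB rates are assumptions (0.75 read IOPS and 1.5 write IOPS per GB for pd-standard). Check the current disk type documentation for actual values. Per-instance caps are ignored here.

```python
# Assumed pd-standard rates per GB of provisioned capacity.
READ_IOPS_PER_GB = 0.75
WRITE_IOPS_PER_GB = 1.5


def shared_limits(disk_sizes_gb):
    """Limits shared by all disks of one type on a node, from their total size."""
    total_gb = sum(disk_sizes_gb)
    return {
        "total_gb": total_gb,
        "read_iops": total_gb * READ_IOPS_PER_GB,
        "write_iops": total_gb * WRITE_IOPS_PER_GB,
    }


# A 100 GB boot disk plus a 100 GB PersistentVolume, both pd-standard.
limits = shared_limits([100, 100])
print(limits)  # {'total_gb': 200, 'read_iops': 150.0, 'write_iops': 300.0}
```

Because the budget is shared, a busy PersistentVolume can consume most of these IOPS and leave little for the boot disk.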

If you encounter messages similar to the following on your nodes, these could be symptoms of low disk performance:

INFO: task dockerd:2314 blocked for more than 300 seconds.

fs: disk usage and inodes count on following dirs took 13.572074343s

PLEG is not healthy: pleg was last seen active 6m46.842473987s ago; threshold is 3m0s
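If you have exported node logs, a small script can flag these symptoms. The following Python sketch is one possible approach: the regular expressions are derived only from the three example messages above, and the names `SYMPTOM_PATTERNS` and `find_symptoms` are hypothetical.

```python
import re

# Patterns derived from the example messages in this section.
SYMPTOM_PATTERNS = {
    "blocked_task": re.compile(r"task \S+ blocked for more than \d+ seconds"),
    "slow_fs_stats": re.compile(r"disk usage and inodes count on following dirs took"),
    "pleg_unhealthy": re.compile(r"PLEG is not healthy"),
}


def find_symptoms(log_lines):
    """Return (symptom_name, line) pairs for lines matching a known pattern."""
    hits = []
    for line in log_lines:
        for name, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line))
    return hits


sample = [
    "INFO: task dockerd:2314 blocked for more than 300 seconds.",
    "fs: disk usage and inodes count on following dirs took 13.572074343s",
    "PLEG is not healthy: pleg was last seen active 6m46.842473987s ago; threshold is 3m0s",
    "Started container app",
]
for name, line in find_symptoms(sample):
    print(name, "->", line)
```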

To help resolve such issues, review the following:

- Ensure you have consulted the Storage disk type comparisons and chosen a persistent disk type to suit your needs.
- This issue often occurs for nodes that use standard persistent disks with a size of less than 200 GB. Consider increasing the size of your disks or switching to SSDs, especially for clusters used in production.
- Consider enabling local SSD for ephemeral storage on your node pools. This is particularly effective if you have containers that frequently use emptyDir volumes.

Mounting a volume stops responding due to the `fsGroup` setting

One issue that can cause PersistentVolume mounting to fail is a Pod that is configured with the fsGroup setting. Normally, mounts automatically retry and the mount failure resolves itself. However, if the PersistentVolume has a large number of files, kubelet attempts to change ownership of each file on the filesystem, which can increase volume mount latency. When this happens, you might see an error similar to the following:

Unable to attach or mount volumes for pod; skipping pod ... timed out waiting for the condition

To confirm if a failed mount error is due to the fsGroup setting, you can check the logs for the Pod. If the issue is related to the fsGroup setting, you see the following log entry:

Setting volume ownership for /var/lib/kubelet/pods/POD_UUID and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699

If the PersistentVolume does not mount within a few minutes, try the following steps to resolve this issue:

- Reduce the number of files in the volume.
- Stop using the fsGroup setting.
- Change the application's fsGroupChangePolicy to OnRootMismatch.
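The fsGroupChangePolicy field is part of the Pod's securityContext. As a minimal sketch, the following Python code builds a patch body that sets it on a workload's Pod template. `build_fsgroup_policy_patch` is a hypothetical helper, and the `spec.template.spec` path assumes a controller such as a Deployment or StatefulSet rather than a bare Pod.

```python
import json


def build_fsgroup_policy_patch(policy="OnRootMismatch"):
    """Build a patch body that sets fsGroupChangePolicy on a Pod template."""
    # Kubernetes accepts "OnRootMismatch" and "Always" for this field.
    if policy not in ("OnRootMismatch", "Always"):
        raise ValueError(f"unsupported fsGroupChangePolicy: {policy}")
    return {
        "spec": {
            "template": {
                "spec": {"securityContext": {"fsGroupChangePolicy": policy}}
            }
        }
    }


patch = build_fsgroup_policy_patch()
print(json.dumps(patch))
```

The printed JSON could be supplied to `kubectl patch` with the `-p` flag. With OnRootMismatch, kubelet changes ownership and permissions only when those of the volume's root directory don't match the expected values, which avoids walking every file on each mount.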

Slow disk operations cause Pod creation failures

For more information, refer to containerd issue #4604.

Affected GKE node versions: 1.18, 1.19, 1.20.0 to 1.20.15-gke.2100, 1.21.0 to 1.21.9-gke.2000, 1.21.10 to 1.21.10-gke.100, 1.22.0 to 1.22.6-gke.2000, 1.22.7 to 1.22.7-gke.100, 1.23.0 to 1.23.3-gke.700, 1.23.4 to 1.23.4-gke.100

The following example errors might be displayed in the k8s_node container-runtime logs:

Error: failed to reserve container name "container-name-abcd-ef12345678-91011_default_12131415-1234-5678-1234-12345789012_0": name "container-name-abcd-ef12345678-91011_default_12131415-1234-5678-1234-12345789012_0" is reserved for "1234567812345678123456781234567812345678123456781234567812345678"

Mitigation

- If Pods are failing, consider using restartPolicy: Always or restartPolicy: OnFailure in your PodSpec.
- Increase the boot disk IOPS (for example, upgrade the disk type or increase the disk size).

Fix

This issue is fixed in containerd 1.6.0 and later. GKE versions with this fix are 1.20.15-gke.2100+, 1.21.9-gke.2000+, 1.21.10-gke.100+, 1.22.6-gke.2000+, 1.22.7-gke.100+, 1.23.3-gke.1700+, and 1.23.4-gke.100+.
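To check a node version against this list, you could compare its patch release and -gke.N build number with the listed thresholds. The following Python sketch is a hedged illustration: the thresholds are copied from the fix list above, `has_containerd_fix` is a hypothetical helper, and it assumes that any patch release or minor version newer than the listed ones also contains the fix.

```python
import re

# First fixed "-gke.N" build per patch release, copied from the fix list above.
# Keys are minor versions of 1.x; values map patch release to build number.
FIXED_BUILDS = {
    20: {15: 2100},
    21: {9: 2000, 10: 100},
    22: {6: 2000, 7: 100},
    23: {3: 1700, 4: 100},
}


def parse_gke_version(version):
    """Split '1.21.9-gke.2000' into (1, 21, 9, 2000)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version}")
    return tuple(int(part) for part in match.groups())


def has_containerd_fix(version):
    """Return True if the node version is at or past a listed fixed build."""
    major, minor, patch, build = parse_gke_version(version)
    if (major, minor) > (1, 23):
        return True  # Assumption: later minor versions include the fix.
    if (major, minor) < (1, 20):
        return False  # 1.18 and 1.19 are listed as affected, with no fixed build.
    thresholds = FIXED_BUILDS[minor]
    if patch in thresholds:
        return build >= thresholds[patch]
    # Assumption: patch releases newer than every listed one include the fix.
    return patch > max(thresholds)


for v in ("1.21.9-gke.1999", "1.21.9-gke.2000", "1.22.5-gke.100"):
    print(v, has_containerd_fix(v))
```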

Volume expansion changes not reflecting in the container file system

When performing volume expansion, always make sure to update the PersistentVolumeClaim. Changing a PersistentVolume directly can result in volume expansion not happening. This could lead to one of the following scenarios:

- If you modify a PersistentVolume object directly, both the PersistentVolume and PersistentVolumeClaim values are updated to the new value, but the file system in the container is not resized and still uses the old volume size.
- If you modify a PersistentVolume object directly and then update the PersistentVolumeClaim by setting its status.capacity field to a new size, the PersistentVolume can change while the PersistentVolumeClaim and the container file system do not.

To resolve this issue, complete the following steps:

- Leave the modified PersistentVolume object as it is.
- Edit the PersistentVolumeClaim object and set spec.resources.requests.storage to a value that is higher than the value that was used in the PersistentVolume.
- Verify that the PersistentVolume is resized to the new value.

After these changes, the kubelet should automatically resize the PersistentVolume, the PersistentVolumeClaim, and the container file system.

Verify that the changes are reflected in the Pod:

kubectl exec POD_NAME -- /bin/bash -c "df -h"

Replace POD_NAME with the name of the Pod that uses the PersistentVolumeClaim.
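The PersistentVolumeClaim edit in the resolution steps can also be prepared programmatically. The following Python sketch builds a patch body for spec.resources.requests.storage and refuses sizes that are not larger than the current PersistentVolume capacity. The helper names are hypothetical, and the quantity parser handles only a subset of Kubernetes quantity suffixes.

```python
import re

# Subset of Kubernetes quantity suffixes handled by this sketch.
UNIT_BYTES = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def quantity_to_bytes(quantity):
    """Convert a quantity such as '100Gi' to bytes (whole numbers only)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity}")
    number, unit = match.groups()
    return int(number) * UNIT_BYTES[unit or ""]


def build_pvc_resize_patch(pv_capacity, new_size):
    """Build a PersistentVolumeClaim patch that requests a larger size."""
    if quantity_to_bytes(new_size) <= quantity_to_bytes(pv_capacity):
        raise ValueError("new size must be larger than the PersistentVolume capacity")
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


print(build_pvc_resize_patch("100Gi", "200Gi"))
```

The resulting body, serialized as JSON, could be applied with `kubectl patch pvc PVC_NAME -p`, where PVC_NAME is a placeholder for your PersistentVolumeClaim.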

The selected machine type should have local SSD(s)

You might encounter the following error when creating a cluster or a node pool that uses Local SSD:

The selected machine type (c3-standard-22-lssd) has a fixed number of local SSD(s): 4. The EphemeralStorageLocalSsdConfig's count field should be left unset or set to 4, but was set to 1.

In the error message, you might see LocalNvmeSsdBlockConfig instead of EphemeralStorageLocalSsdConfig depending on which you specified.

This error occurs when the number of Local SSD disks specified does not match the number of Local SSD disks included with the machine type.

To resolve this issue, specify a number of Local SSD disks that matches the machine type that you want. For third generation machine series, you must omit the Local SSD count flag and the correct value will be configured automatically.
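The rule behind this error can be sketched as a simple validation. The following Python code is an illustration, not GKE's implementation: `FIXED_LOCAL_SSD_COUNT` is a hypothetical lookup table whose single entry comes from the example error message, and `check_local_ssd_count` is a hypothetical helper.

```python
# Hypothetical lookup table. The single entry comes from the example error above.
FIXED_LOCAL_SSD_COUNT = {"c3-standard-22-lssd": 4}


def check_local_ssd_count(machine_type, requested_count=None):
    """Return the effective Local SSD count, or raise if the request conflicts."""
    fixed = FIXED_LOCAL_SSD_COUNT.get(machine_type)
    if fixed is None:
        return requested_count  # No fixed count known to this sketch.
    if requested_count is None or requested_count == fixed:
        return fixed  # Unset or matching counts are accepted.
    raise ValueError(
        f"{machine_type} has a fixed number of local SSDs ({fixed}), "
        f"but count was set to {requested_count}"
    )


print(check_local_ssd_count("c3-standard-22-lssd"))     # count left unset
print(check_local_ssd_count("c3-standard-22-lssd", 4))  # count matches
```

Requesting a count of 1 for the same machine type raises an error in this sketch, which mirrors the message shown earlier in this section.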

Hyperdisk Storage Pools: Cluster or node pool creation fails

You might encounter the ZONE_RESOURCE_POOL_EXHAUSTED error or similar Compute Engine resource errors when trying to provision Hyperdisk Balanced disks as your node's boot or attached disks in a Hyperdisk Storage Pool.

This happens when you're trying to create a GKE cluster or node pool in a zone that's running low on resources, for example:

- The zone might not have enough Hyperdisk Balanced disks available.
- The zone might not have enough capacity to create nodes of the machine type that you specified, such as c3-standard-4.

To resolve this issue:

- Select a new zone within the same region with enough capacity for your chosen machine type and where Hyperdisk Balanced Storage Pools are available.
- Delete the existing storage pool and recreate it in the new zone. This is because storage pools are zonal resources.
- Create your cluster or node pool in the new zone.

What's next

If you can't find a solution to your problem in the documentation, see Get support for further help, including advice on the following topics:
- Opening a support case by contacting Cloud Customer Care.
- Getting support from the community by asking questions on Stack Overflow and using the google-kubernetes-engine tag to search for similar issues. You can also join the #kubernetes-engine Slack channel for more community support.
- Opening bugs or feature requests by using the public issue tracker.