Problem
Pods running on a node might be evicted because the node is under disk pressure.
Environment
- Google Kubernetes Engine
Solution
Confirm whether the disk was actually under heavy use. To do so:
- Find the evicted pod.
- Check the node on which the pod was running, in the Node section.
- Go to the node summary page. A Disk Pressure: False message means the disk is OK. If it shows True, run the following commands on the node:
- To verify disk utilization, focusing on the /mnt/stateful_partition partition:
df -h
- To check inodes availability:
df -i
- To query the kubelet summary endpoint and check whether the information is consistent with the results of the other commands:
curl localhost:10255/stats/summary
- To show bind mounts:
findmnt
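The disk and inode checks above can be combined into a quick pass/fail test against the 85% level mentioned below; a minimal sketch, assuming a POSIX shell on the node (the partition path is a parameter so the same script can be tried elsewhere):

```shell
#!/bin/sh
# Check disk and inode usage of a partition against the 85% level.
# Defaults to /mnt/stateful_partition, the writable partition on a GKE node.
PART="${1:-/mnt/stateful_partition}"

# df -P prints POSIX-format output; field 5 of the second line is "Use%"
# (or "IUse%" with -i). Strip the trailing "%" to get a bare number.
DISK_USED=$(df -P "$PART" 2>/dev/null | awk 'NR==2 {gsub("%","",$5); print $5}')
INODES_USED=$(df -P -i "$PART" 2>/dev/null | awk 'NR==2 {gsub("%","",$5); print $5}')

# Some filesystems report "-" for inode usage; treat anything non-numeric as 0.
case "$DISK_USED"   in ''|*[!0-9]*) DISK_USED=0 ;;   esac
case "$INODES_USED" in ''|*[!0-9]*) INODES_USED=0 ;; esac

echo "disk: ${DISK_USED}%  inodes: ${INODES_USED}%"
if [ "$DISK_USED" -ge 85 ] || [ "$INODES_USED" -ge 85 ]; then
  echo "disk pressure likely"
else
  echo "disk pressure unlikely"
fi
```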
- If it shows False, the pressure may have already subsided; check the kubelet logs instead by adding the following to the Logs Explorer query builder. The matching entries also record the disk usage at the time of the eviction, which should be above 85% (the threshold that confirms the node was under disk pressure):
("disk" OR "storage" OR "ephemeral" OR "evict")
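The same query can also be run from the command line rather than the Logs Explorer UI; a hedged sketch using gcloud (the resource type and log name follow the usual GKE node logging conventions, and the freshness and limit values are illustrative):

```shell
# Build the same filter used in the query builder, scoped to kubelet logs
# on GKE nodes, then read the last week of matching entries.
FILTER='resource.type="k8s_node" AND logName:"kubelet" AND ("disk" OR "storage" OR "ephemeral" OR "evict")'
gcloud logging read "$FILTER" --freshness=7d --limit=20
```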
- While checking the logs, you can see which filesystem confirmed the eviction (nodefs or imagefs).
- For Nodefs:
- If nodefs is triggering evictions, kubelet sorts Pods based on their nodefs usage: local volumes plus the logs of all their containers.
- For Imagefs:
- If imagefs is triggering evictions, kubelet sorts Pods based on the writable-layer usage of all their containers. (When you create a new container, a new writable layer is added on top of the underlying image layers; this is often called the container layer. All changes made to the running container, such as writing new files, modifying existing files, and deleting files, are written to this thin writable layer.)
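To see which Pods would sort highest, the same kubelet summary endpoint used earlier can be queried for per-Pod ephemeral-storage usage; a minimal sketch, assuming jq is installed on the node (the field names follow the kubelet Summary API):

```shell
# List Pods by ephemeral-storage bytes used, largest first.
curl -s localhost:10255/stats/summary \
  | jq -r '.pods[] | [(.["ephemeral-storage"].usedBytes // 0), .podRef.name] | @tsv' \
  | sort -rn \
  | head
```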
Cause
This happens when the node's underlying disk is under heavy usage, on either the nodefs or the imagefs filesystem. To read more in depth, refer to Node-pressure eviction.
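The nodefs and imagefs thresholds that trigger this pressure are configurable on the kubelet; a hedged sketch of how to inspect them on a node (the defaults quoted in the comments are the upstream Kubernetes ones, and GKE may override them):

```shell
# Upstream kubelet defaults for hard eviction thresholds, as a kubelet flag:
#   --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15%
# On the node, check which thresholds the running kubelet was started with:
ps -ww -o args= -C kubelet | grep -o 'eviction[^ ]*' \
  || echo "no eviction flags found (defaults apply)"
```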