Persistent volumes with persistent disks

This page provides an overview of PersistentVolumes and PersistentVolumeClaims in Kubernetes, and their use with Google Kubernetes Engine (GKE). This page focuses on storage backed by Compute Engine persistent disks.

Overview

PersistentVolume resources are used to manage durable storage in a cluster. In GKE, PersistentVolumes are typically backed by Compute Engine persistent disks. PersistentVolumes can also be used with other storage types like NFS. Filestore is an NFS solution on Google Cloud. To learn how to set up a Filestore instance as an NFS PV solution for your GKE clusters, see Accessing file shares from Google Kubernetes Engine clusters in the Filestore documentation. Refer to the Kubernetes documentation for an extensive overview of PersistentVolumes in general.

Unlike Volumes, the PersistentVolume lifecycle is managed by Kubernetes. PersistentVolumes can be dynamically provisioned; you do not have to manually create and delete the backing storage.

PersistentVolumes are cluster resources that exist independently of Pods. This means that the disk and data represented by a PersistentVolume continue to exist as the cluster changes and as Pods are deleted and recreated. PersistentVolume resources can be provisioned dynamically through PersistentVolumeClaims, or they can be explicitly created by a cluster administrator.

A PersistentVolumeClaim is a request for and claim to a PersistentVolume resource. PersistentVolumeClaim objects request a specific size, access mode, and StorageClass for the PersistentVolume. If a PersistentVolume that satisfies the request exists or can be provisioned, the PersistentVolumeClaim is bound to that PersistentVolume.

Pods use claims as Volumes. The cluster inspects the claim to find the bound Volume and mounts that Volume for the Pod.

Portability is another advantage of using PersistentVolumes and PersistentVolumeClaims. You can easily use the same Pod specification across different clusters and environments because PersistentVolume is an interface to the actual backing storage.

StorageClasses

Volume implementations such as gcePersistentDisk are configured through StorageClass resources. GKE creates a default StorageClass for you that uses the standard persistent disk type (pd-standard) with the ext4 filesystem. The default StorageClass is used when a PersistentVolumeClaim doesn't specify a StorageClassName. You can replace the provided default StorageClass with your own.

If you are using a cluster with Windows node pools, you must create a StorageClass and specify a StorageClassName in the PersistentVolumeClaim because the default fstype (ext4) is not supported with Windows. If you are using a Compute Engine persistent disk, you must use NTFS as the file system type.
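
For example, the following is a minimal sketch of a StorageClass for a Windows node pool, assuming a standard persistent disk; the name windows-storageclass is a hypothetical placeholder:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: windows-storageclass  # hypothetical name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  fstype: NTFS  # file system type required for Windows Pods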

You can create your own StorageClass resources to describe different classes of storage. For example, classes might map to quality-of-service levels, or to backup policies. This concept is sometimes called "profiles" in other storage systems.
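
For example, a minimal sketch of a StorageClass backed by SSD persistent disks, suitable for a higher quality-of-service tier, might look like the following; the name ssd-storageclass is a hypothetical placeholder:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-storageclass  # hypothetical name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd  # SSD persistent disks for higher performance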

Dynamically provisioning PersistentVolumes

Most of the time, you don't need to directly configure PersistentVolume objects or create Compute Engine persistent disks. Instead, you can create a PersistentVolumeClaim and Kubernetes automatically provisions a persistent disk for you.

The following manifest describes a request for a 30 gibibyte (GiB) disk whose access mode allows it to be mounted as read-write by a single node:

# pvc-demo.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi

When you create this PersistentVolumeClaim with kubectl apply -f pvc-demo.yaml, Kubernetes dynamically creates a corresponding PersistentVolume object. The following example shows the PersistentVolume that Kubernetes created:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-cd3fd5e9-695a-11ea-a3da-42010a800003
  uid: ced478c1-695a-11ea-a3da-42010a800003
  annotations:
    kubernetes.io/createdby: gce-pd-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/gce-pd
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 30Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pvc-demo
    uid: cd3fd5e9-695a-11ea-a3da-42010a800003
  gcePersistentDisk:
    fsType: ext4
    pdName: gke-cluster-1-pvc-cd3fd5e9-695a-11ea-a3da-42010a800003
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - us-central1-c
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - us-central1
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
  volumeMode: Filesystem
status:
  phase: Bound

Assuming that you haven't replaced the GKE default storage class, this PersistentVolume is backed by a new, empty Compute Engine persistent disk. You use this disk in a Pod by using the claim as a volume. When you delete a claim, the corresponding PersistentVolume object and the provisioned Compute Engine persistent disk are also deleted.
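
For example, the following Pod manifest is a minimal sketch that consumes the pvc-demo claim shown earlier; the Pod name, container image, and mount path are hypothetical placeholders:

# pod-demo.yaml (hypothetical file name)
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
spec:
  containers:
  - name: web-server
    image: nginx
    volumeMounts:
    - name: pvc-demo-vol
      mountPath: /usr/share/nginx/html
  volumes:
  - name: pvc-demo-vol
    persistentVolumeClaim:
      claimName: pvc-demo  # the claim created from pvc-demo.yaml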

To prevent deletion of dynamically provisioned persistent disks, set the reclaim policy of the PersistentVolume resource, or its StorageClass resource, to Retain. In this case, you are charged for the persistent disk for as long as it exists, even if there is no PersistentVolumeClaim consuming it. For examples of how to change the reclaim policy, see Change the Reclaim Policy of a PersistentVolume and StorageClass Resource.
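
For example, the following is a minimal sketch of a StorageClass whose dynamically provisioned disks are retained; the name standard-retain is a hypothetical placeholder:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-retain  # hypothetical name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
reclaimPolicy: Retain  # PersistentVolumes created from this class are not deleted with their claims

For an existing PersistentVolume, you can change the policy in place, for example with kubectl patch pv PV_NAME -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}', where PV_NAME is the name of your PersistentVolume.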

Access modes

PersistentVolumes support the following access modes:

  • ReadWriteOnce: The volume can be mounted as read-write by a single node.
  • ReadOnlyMany: The volume can be mounted read-only by many nodes.
  • ReadWriteMany: The volume can be mounted as read-write by many nodes. PersistentVolumes that are backed by Compute Engine persistent disks don't support this access mode.

Using Compute Engine persistent disks as ReadOnlyMany

ReadWriteOnce is the most common use case for persistent disks and works as the default access mode for most applications. Compute Engine persistent disks also support ReadOnlyMany mode so that many applications or many replicas of the same application can consume the same disk at the same time. An example use case is serving static content across multiple replicas.

Refer to this article for instructions on creating persistent disks for multiple readers.
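
As a rough sketch, a PersistentVolume and PersistentVolumeClaim for a preexisting, pre-populated disk consumed in ReadOnlyMany mode might look like the following; the resource names and the disk name my-test-disk are hypothetical placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: readonly-pv  # hypothetical name
spec:
  storageClassName: ""
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: my-test-disk  # hypothetical preexisting disk with data
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: readonly-pvc  # hypothetical name
spec:
  storageClassName: ""
  volumeName: readonly-pv
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi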

Using preexisting persistent disks as PersistentVolumes

Dynamically provisioned PersistentVolumes are empty when they are created. If you have an existing Compute Engine persistent disk populated with data, you can introduce it to your cluster by manually creating a corresponding PersistentVolume resource. The persistent disk must be in the same zone as the cluster nodes.

Refer to this example of how to create a PersistentVolume backed by a preexisting persistent disk.
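
As a rough sketch, assuming a preexisting disk named existing-disk (hypothetical), the PersistentVolume might look like the following; the capacity must match the size of the underlying disk:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo  # hypothetical name
spec:
  storageClassName: ""
  capacity:
    storage: 500Gi  # must match the size of the existing disk
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: existing-disk  # name of the preexisting Compute Engine disk
    fsType: ext4

A PersistentVolumeClaim can then bind to this volume explicitly by setting spec.volumeName to pv-demo.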

Deployments vs. StatefulSets

You can use PersistentVolumeClaims in higher-level controllers such as Deployments, or volumeClaimTemplates in StatefulSets.

Deployments are designed for stateless applications, so all replicas of a Deployment share the same PersistentVolumeClaim. Because the replica Pods are identical to each other, only volumes with the ReadOnlyMany or ReadWriteMany access modes can work in this setting.

Even Deployments with one replica using a ReadWriteOnce volume are not recommended. This is because the default Deployment strategy creates a second Pod before bringing down the first Pod when a Pod is recreated. The Deployment can fail in a deadlock: the second Pod can't start because the ReadWriteOnce volume is already in use, and the first Pod isn't removed because the second Pod has not yet started. Instead, use a StatefulSet with ReadWriteOnce volumes.

StatefulSets are the recommended method of deploying stateful applications that require a unique volume per replica. By using StatefulSets with PersistentVolumeClaim templates, you can have applications that scale up automatically with a unique PersistentVolumeClaim associated with each replica Pod.
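
For example, the following is a minimal sketch of a StatefulSet with a PersistentVolumeClaim template; the names, image, replica count, and storage size are hypothetical placeholders:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web  # hypothetical name
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

Each replica (web-0, web-1, web-2) gets its own PersistentVolumeClaim (data-web-0, data-web-1, data-web-2), and therefore its own persistent disk.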

Regional persistent disks

Regional persistent disks are multi-zonal resources that replicate data between two zones in the same region, and can be used similarly to zonal persistent disks. In the event of a zonal outage, or if cluster nodes in one zone become unschedulable, Kubernetes can fail over workloads using the volume to the other zone. You can use regional persistent disks to build highly available solutions for stateful workloads on GKE. You must ensure that both the primary and failover zones are configured with enough resource capacity to run the workload.

Regional SSD persistent disks are an option for applications such as databases that require both high availability and high performance. For more details, see Block storage performance comparison.

As with zonal persistent disks, regional persistent disks can be dynamically provisioned as needed or manually provisioned in advance by the cluster administrator. To learn how to add regional persistent disks, see Provisioning regional persistent disks.
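
As a rough sketch, a StorageClass for dynamically provisioning regional persistent disks might look like the following; the name and the zones are hypothetical placeholders:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regionalpd-storageclass  # hypothetical name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd  # replicate data between two zones
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a
    - us-central1-b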

Zones in persistent disks

Zonal persistent disks are zonal resources and regional persistent disks are multi-zonal resources. When you add persistent storage to your cluster, unless a zone is specified, GKE assigns the disk to a single zone. GKE chooses the zone at random. Once a persistent disk is provisioned, any Pods referencing the disk are scheduled to the same zone as the disk.

If you dynamically provision a persistent disk in your cluster, we recommend you set the WaitForFirstConsumer volume binding mode on your StorageClass. This setting instructs Kubernetes to provision a persistent disk in the same zone that the Pod gets scheduled to. It respects Pod scheduling constraints such as anti-affinity and node selectors. Anti-affinity on zones allows StatefulSet Pods to be spread across zones along with the corresponding disks.

Following is an example StorageClass for provisioning zonal persistent disks that sets WaitForFirstConsumer:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  fstype: ext4
volumeBindingMode: WaitForFirstConsumer

For an example using regional persistent disks, see Provisioning regional persistent disks.

What's next