This guide describes how to connect to an existing Parallelstore instance by using the GKE Parallelstore CSI driver with static provisioning. This lets you access existing fully managed Parallelstore instances as volumes for your stateful workloads, in a controlled and predictable way.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Parallelstore API and the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
- See the CSI driver overview for limitations and requirements.
- Create a Parallelstore instance if you haven't done so already.
- Configure a VPC network.
- If you want to use a GKE Standard cluster, make sure to enable the CSI driver. A command you can use to check whether the driver is already enabled is shown after this list.
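If you're not sure whether the driver is already enabled on an existing Standard cluster, you can inspect the cluster's add-on configuration. The following command is a sketch: the addonsConfig.parallelstoreCsiDriverConfig.enabled field path is an assumption based on the GKE API's add-on configuration and might vary across gcloud versions.

gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --format="value(addonsConfig.parallelstoreCsiDriverConfig.enabled)"

If the driver is enabled, the command prints True.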
Access an existing Parallelstore instance using the Parallelstore CSI driver
If you have already provisioned a Parallelstore instance within the same network as your GKE cluster, you can follow these instructions to statically provision a PersistentVolume that refers to your instance.
The following sections describe the typical process for accessing an existing Parallelstore instance using the Parallelstore CSI driver:
- Create a PersistentVolume that refers to the Parallelstore instance.
- Use a PersistentVolumeClaim to access the volume.
- (Optional) Configure resources for the sidecar container.
- Create a workload that consumes the volume.
Create a PersistentVolume
This section shows an example of how you can create a PersistentVolume that references an existing Parallelstore instance.
Run the following command to locate your Parallelstore instance.
gcloud beta parallelstore instances list \
    --project=PROJECT_ID \
    --location=LOCATION
Replace the following:
- PROJECT_ID: the Google Cloud project ID.
- LOCATION: the Compute Engine zone containing the cluster. You must specify a supported zone for the Parallelstore CSI driver.
The output should look similar to the following. Make sure to note the Parallelstore instance name and its access points before you proceed to the next step.
NAME                                                                                            CAPACITY  DESCRIPTION  CREATE_TIME                     UPDATE_TIME                     STATE   NETWORK  RESERVED_IP_RANGE  ACCESS_POINTS
projects/my-project/locations/us-central1-a/instances/pvc-eff1ed02-a8ed-48d2-9902-bd70a2d60563  12000                  2024-03-06T19:18:26.036463730Z  2024-03-06T19:24:44.561441556Z  ACTIVE                              10.51.110.2,10.51.110.4,10.51.110.3
Save the following manifest in a file named parallelstore-pv.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: parallelstore-pv
spec:
  storageClassName: "STORAGECLASS_NAME"
  capacity:
    storage: STORAGE_SIZE
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
  csi:
    driver: parallelstore.csi.storage.gke.io
    volumeHandle: "PROJECT_ID/LOCATION/INSTANCE_NAME/default-pool/default-container"
    volumeAttributes:
      accessPoints: ACCESS_POINTS
      network: NETWORK_NAME
  claimRef:
    name: parallelstore-pvc
    namespace: default
Replace the following:
- PROJECT_ID: the Google Cloud project ID.
- LOCATION: the zonal location of your Parallelstore instance. You must specify a supported zone for the Parallelstore CSI driver.
- INSTANCE_NAME: the name of your Parallelstore instance. An example of a valid volumeHandle value is "my-project/us-central1-a/pvc-eff1ed02-a8ed-48d2-9902-bd70a2d60563/default-pool/default-container".
- ACCESS_POINTS: the access points of your Parallelstore instance; for example, 10.51.110.2,10.51.110.4,10.51.110.3.
- NETWORK_NAME: the VPC network where your Parallelstore instance can be accessed.
- STORAGECLASS_NAME: the name of your StorageClass. This can be an empty string, but must match the specification in your PersistentVolumeClaim.
- STORAGE_SIZE: the storage size; for example, 12000Gi.
For the full list of fields supported in the PersistentVolume object, refer to the Parallelstore CSI reference documentation.
Create the PersistentVolume by running this command:
kubectl apply -f parallelstore-pv.yaml
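To confirm that the PersistentVolume was created, you can optionally list it with kubectl; parallelstore-pv is the name used in the example manifest:

kubectl get pv parallelstore-pv

The STATUS column shows Available until a PersistentVolumeClaim binds to the volume.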
Use a PersistentVolumeClaim to access the volume
You can create a PersistentVolumeClaim resource that references the Parallelstore CSI driver's StorageClass.
The following manifest shows an example of how to create a PersistentVolumeClaim in ReadWriteMany access mode that references the StorageClass specified in your PersistentVolume.
Save the following manifest in a file named parallelstore-pvc.yaml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: parallelstore-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: STORAGECLASS_NAME
  resources:
    requests:
      storage: STORAGE_SIZE
Replace the following:
- STORAGECLASS_NAME: the name of your StorageClass. It must match the specification in your PersistentVolume.
- STORAGE_SIZE: the storage size; for example, 12000Gi. It must match the specification in your PersistentVolume.
Create the PersistentVolumeClaim by running this command:
kubectl create -f parallelstore-pvc.yaml
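You can then verify that the claim bound to the volume you created earlier. Because the PersistentVolume's claimRef pre-binds it to parallelstore-pvc in the default namespace, the claim should reach the Bound phase without dynamic provisioning:

kubectl get pvc parallelstore-pvc --namespace=default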
(Optional) Configure resources for the sidecar container
When you create a workload Pod that uses Parallelstore-backed volumes, the CSI driver determines whether your volume is based on Parallelstore instances.
If the driver detects that your volume is Parallelstore-based, or if you specify the annotation gke-parallelstore/volumes: "true", the CSI driver automatically injects a sidecar container named gke-parallelstore-sidecar into your Pod. This sidecar container mounts the Parallelstore instance to your workload.
By default, the sidecar container is configured with the following resource requests, with resource limits unset:
- 250m CPU
- 512 MiB memory
- 10 MiB ephemeral storage
To overwrite these values, you can optionally specify the gke-parallelstore/[cpu-request|memory-request|ephemeral-storage-request|cpu-limit|memory-limit|ephemeral-storage-limit] annotations, as shown in the following example:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    gke-parallelstore/volumes: "true"
    gke-parallelstore/cpu-request: 500m
    gke-parallelstore/memory-request: 1Gi
    gke-parallelstore/ephemeral-storage-request: 500Mi
    gke-parallelstore/cpu-limit: 1000m
    gke-parallelstore/memory-limit: 2Gi
    gke-parallelstore/ephemeral-storage-limit: 1Gi
Use the following considerations when deciding the amount of resources to allocate:
- If one of the request or limit values is set and the other is unset, GKE sets both to the same specified value.
- Allocate more CPU to the sidecar container if your workloads need higher throughput. Insufficient CPU will cause I/O throttling.
- You can use the value "0" to unset any resource limits on Standard clusters; for example, gke-parallelstore/memory-limit: "0" removes the memory limit for the sidecar container. This is useful when you cannot decide on the amount of resources gke-parallelstore-sidecar needs for your workloads, and want to let the sidecar consume all the available resources on a node.
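To confirm that the sidecar was injected into a running Pod, you can list the Pod's containers. Depending on your GKE version, the sidecar might appear as a regular container or as a native sidecar (init) container, so the following sketch inspects both lists:

kubectl get pod POD_NAME -o jsonpath='{.spec.containers[*].name} {.spec.initContainers[*].name}'

Look for gke-parallelstore-sidecar in the output.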
Create a workload that consumes the volume
This section shows an example of how to create a Pod that consumes the PersistentVolumeClaim resource you created earlier.
Multiple Pods can share the same PersistentVolumeClaim resource.
Save the following manifest in a file named my-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: parallelstore-volume
      mountPath: /data
  volumes:
  - name: parallelstore-volume
    persistentVolumeClaim:
      claimName: parallelstore-pvc
Run the following command to apply the manifest to the cluster:
kubectl apply -f my-pod.yaml
The Pod waits until the PersistentVolumeClaim is bound and the volume is mounted before it starts running. This operation might take several minutes to complete.
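To verify that the Parallelstore instance is mounted inside the Pod, you can inspect the mount point; /data is the mountPath used in the example manifest:

kubectl exec my-pod -- df -h /data

The output should list a file system mounted at /data.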
Manage the Parallelstore CSI driver
This section covers how you can enable and disable the Parallelstore CSI driver, if needed.
Enable the Parallelstore CSI driver on a new cluster
To enable the Parallelstore CSI driver when creating a new Standard cluster, run the following command with the Google Cloud CLI:
gcloud container clusters create CLUSTER_NAME \
--location=LOCATION \
--network=NETWORK_NAME \
--addons=ParallelstoreCsiDriver \
--cluster-version=VERSION
Replace the following:
- CLUSTER_NAME: the name of your cluster.
- LOCATION: the Compute Engine zone containing the cluster. You must specify a supported zone for the Parallelstore CSI driver.
- NETWORK_NAME: name of the VPC network you created in Configure a VPC network.
- VERSION: the GKE version number. You must specify a supported version to use this feature, such as GKE version 1.29 or later. Alternatively, you can use the --release-channel flag and specify a release channel.
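For example, with illustrative placeholder values (my-cluster, us-central1-a, and my-network are examples, not recommendations), the command might look like the following:

gcloud container clusters create my-cluster \
    --location=us-central1-a \
    --network=my-network \
    --addons=ParallelstoreCsiDriver \
    --cluster-version=1.29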
Enable the Parallelstore CSI driver on an existing cluster
To enable the driver on an existing GKE Standard cluster, run the following command with the Google Cloud CLI:
gcloud container clusters update CLUSTER_NAME \
--location=LOCATION \
--update-addons=ParallelstoreCsiDriver=ENABLED
Replace the following:
- CLUSTER_NAME: the name of your cluster.
- LOCATION: the Compute Engine zone containing the cluster. You must specify a supported zone for the Parallelstore CSI driver.
Make sure that your GKE cluster is running in the same VPC network that you set up in Configure a VPC network. To verify the VPC network for a GKE cluster, you can check in the Google Cloud console, or run the following command:

gcloud container clusters describe CLUSTER_NAME \
    --format="value(networkConfig.network)" \
    --location=LOCATION
Disable the Parallelstore CSI driver
You can disable the Parallelstore CSI driver on an existing Autopilot or Standard cluster by using the Google Cloud CLI.
gcloud container clusters update CLUSTER_NAME \
--location=LOCATION \
--update-addons=ParallelstoreCsiDriver=DISABLED
Replace the following:
- CLUSTER_NAME: the name of your cluster.
- LOCATION: the Compute Engine zone containing the cluster. You must specify a supported zone for the Parallelstore CSI driver.
Use fsGroup with Parallelstore volumes
The Parallelstore CSI driver supports changing the group ownership of the root-level directory of the mounted file system to match a user-requested fsGroup specified in the Pod's SecurityContext. This feature is only supported in GKE clusters running version 1.29.5 or later, or version 1.30.1 or later.
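For example, the following Pod manifest is a minimal sketch that sets fsGroup through the standard Kubernetes securityContext field; the GID 1000 is an arbitrary example value:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  securityContext:
    fsGroup: 1000
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: parallelstore-volume
      mountPath: /data
  volumes:
  - name: parallelstore-volume
    persistentVolumeClaim:
      claimName: parallelstore-pvc

When the Pod starts, the root-level directory of the volume mounted at /data is owned by the group you specified.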
Troubleshooting
For troubleshooting guidance, refer to the Troubleshooting page in the Parallelstore documentation.
What's next
- Explore the Parallelstore CSI reference documentation.
- Learn how to use the Parallelstore interception library to improve workload performance.
- Learn how to transfer data to Parallelstore from Cloud Storage.
- Learn how to use the GKE Volume Populator to automate data transfer from a Cloud Storage bucket source storage to a destination PersistentVolumeClaim backed by a Parallelstore instance.
- Try the tutorial to train a TensorFlow model with Keras on GKE.