Transfer data from Cloud Storage during dynamic provisioning using GKE Volume Populator


GKE Volume Populator is available by invitation only. If you'd like to request access to GKE Volume Populator in your Google Cloud project, contact your sales representative.

GKE Volume Populator lets you preload data from source storage into a destination PersistentVolumeClaim during dynamic provisioning. You don't need to run additional scripts or CLI commands to transfer the data manually. The feature automates and streamlines the data transfer process by using the Kubernetes Volume Populator feature. It provides seamless data portability, so you can swap storage types to benefit from price or performance optimizations.

Use this feature if you need to transfer large amounts of data from Cloud Storage buckets to a PersistentVolumeClaim backed by another Google Cloud storage type (such as Parallelstore).

You primarily interact with GKE Volume Populator through the gcloud CLI and the kubectl CLI. GKE Volume Populator is supported on both Autopilot and Standard clusters. You don't need to enable GKE Volume Populator; it's a GKE-managed component that's enabled by default.

Benefits

  • If you want to take advantage of the performance of a managed parallel file system, but your data is stored in Cloud Storage, you can use GKE Volume Populator to simplify data transfer.
  • GKE Volume Populator provides data portability, so you can move data between storage types as your needs change.
  • GKE Volume Populator supports IAM-based authentication so you can transfer data while maintaining fine-grained access control.

Data transfer from source data storage and creation of PV for destination storage using the GKE Volume Populator

The diagram shows how data flows from the source storage to the destination storage, and the creation of the PersistentVolume for the destination storage using GKE Volume Populator.

Limitations

  • GKE Volume Populator only supports Cloud Storage buckets as the source storage and Parallelstore instances as the destination storage type.
  • GKE Volume Populator only supports StorageClass resources that have their volumeBindingMode set to Immediate.
  • The GCPDataSource custom resource must be in the same namespace as your Kubernetes workload. Volumes with cross-namespace data sources are not supported.
  • GKE Volume Populator only supports Workload Identity Federation for GKE binding of IAM service accounts to a Kubernetes service account. Granting IAM permissions to the Kubernetes service account directly is not supported.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Parallelstore API and the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Requirements

To use GKE Volume Populator, your clusters must meet the following requirements:

  • Use GKE cluster version 1.31.1-gke.1729000 or later.
  • Have the Parallelstore CSI driver enabled. GKE enables the CSI driver for you by default on new and existing GKE Autopilot clusters. On new and existing Standard clusters, you'll need to enable the CSI driver.
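You can check an existing cluster against the minimum version above by comparing version strings in the shell. The following is a minimal sketch that assumes GNU `sort -V` is available for version-aware ordering. The function name `version_ge` is hypothetical. You would pass it the cluster's `currentMasterVersion` as reported by `gcloud container clusters describe`.

```shell
#!/usr/bin/env bash
# Minimum GKE version required by GKE Volume Populator (from the requirements above).
MIN_GKE_VERSION="1.31.1-gke.1729000"

# Hypothetical helper: succeeds (exit status 0) if version $1 >= version $2.
# `sort -V` orders version strings; if $2 sorts first (or ties), $1 is >= $2.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (replace the placeholders):
#   v=$(gcloud container clusters describe CLUSTER_NAME \
#         --location=CLUSTER_LOCATION --format="value(currentMasterVersion)")
#   version_ge "$v" "$MIN_GKE_VERSION" && echo "supported" || echo "upgrade needed"
```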

Prepare your environment

This section covers the steps to create your GKE clusters and set up the necessary permissions to use GKE Volume Populator.

Set up your VPC network

You must specify the same Virtual Private Cloud (VPC) network when you create the Parallelstore instance and the client Compute Engine VMs or GKE clusters. To let the VPC network connect privately to Google Cloud services without exposing traffic to the public internet, you need to perform a one-time configuration of private services access (PSA), if you have not already done so.

To configure PSA, follow these steps:

  1. Grant yourself the Compute Network Admin (roles/compute.networkAdmin) IAM role so that you can set up network peering for your project.

    To grant the role, run the following command:

    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member="user:EMAIL_ADDRESS" \
        --role=roles/compute.networkAdmin
    

    Replace EMAIL_ADDRESS with your email address.

  2. Enable service networking:

    gcloud services enable servicenetworking.googleapis.com
    
  3. Create a VPC network:

    gcloud compute networks create NETWORK_NAME \
      --subnet-mode=auto \
      --mtu=8896 \
      --project=PROJECT_ID
    

    Replace the following:

    • NETWORK_NAME: the name of the VPC network where you will create your Parallelstore instance.
    • PROJECT_ID: your Google Cloud project ID.
  4. Create an IP range.

    Private services access requires an IP address range (CIDR block) with a prefix length of at least /24 (256 addresses). Parallelstore reserves 64 addresses per instance, so you can reuse this IP range with other services or other Parallelstore instances if needed.

    gcloud compute addresses create IP_RANGE_NAME \
      --global \
      --purpose=VPC_PEERING \
      --prefix-length=24 \
      --description="Parallelstore VPC Peering" \
      --network=NETWORK_NAME \
      --project=PROJECT_ID
    

    Replace IP_RANGE_NAME with a name for the IP address range.

  5. Set an environment variable with the CIDR range associated with the range you created in the previous step:

    CIDR_RANGE=$(
      gcloud compute addresses describe IP_RANGE_NAME \
        --global  \
        --format="value[separator=/](address, prefixLength)" \
        --project=PROJECT_ID \
    )
    
  6. Create a firewall rule to allow TCP traffic from the IP range you created:

    gcloud compute firewall-rules create FIREWALL_NAME \
      --allow=tcp \
      --network=NETWORK_NAME \
      --source-ranges=$CIDR_RANGE \
      --project=PROJECT_ID
    

    Replace FIREWALL_NAME with a name for the firewall rule that allows TCP traffic from the IP range you created.

  7. Connect the peering:

    gcloud services vpc-peerings connect \
      --network=NETWORK_NAME \
      --ranges=IP_RANGE_NAME \
      --project=PROJECT_ID \
      --service=servicenetworking.googleapis.com
    

If you encounter issues while setting up the VPC network, check the Parallelstore troubleshooting guide.
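Before you use CIDR_RANGE in the firewall rule, you might want to confirm two things: that step 5 actually produced an ADDRESS/PREFIX value (an empty variable would leave --source-ranges without a usable value), and how many Parallelstore instances the range can hold at 64 addresses each. The following bash sketch does both. The function names are hypothetical, the 64-address figure comes from step 4, and only the shape and prefix length of the block are checked.

```shell
#!/usr/bin/env bash
# Addresses Parallelstore reserves per instance (from step 4 above).
ADDRESSES_PER_INSTANCE=64

# Hypothetical helper: print the number of addresses in an IPv4 CIDR block
# such as 10.0.0.0/24, or fail if the argument is not in ADDRESS/PREFIX form.
# Octet values are not range-checked; only the shape and prefix length are.
cidr_capacity() {
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}/([0-9]{1,2})$' prefix
  if [[ ! "$1" =~ $re ]]; then
    echo "not a valid IPv4 CIDR block: '$1'" >&2
    return 1
  fi
  prefix=$(( 10#${BASH_REMATCH[2]} ))   # force base 10 so "08" is not read as octal
  if (( prefix > 32 )); then
    echo "prefix length out of range: '$1'" >&2
    return 1
  fi
  echo $(( 1 << (32 - prefix) ))
}

# Hypothetical helper: how many Parallelstore instances fit in the block.
parallelstore_instances_in() {
  local n
  n=$(cidr_capacity "$1") || return 1
  echo $(( n / ADDRESSES_PER_INSTANCE ))
}

# Usage after step 5:
#   cidr_capacity "$CIDR_RANGE" && parallelstore_instances_in "$CIDR_RANGE"
```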

Create your GKE cluster

We recommend that you use an Autopilot cluster for a fully managed Kubernetes experience. To choose the GKE mode of operation that's the best fit for your workload needs, see Choose a GKE mode of operation.

Autopilot

To create a GKE cluster using Autopilot, run the following command:

gcloud container clusters create-auto CLUSTER_NAME  \
    --network=NETWORK_NAME  \
    --cluster-version=CLUSTER_VERSION \
    --location=CLUSTER_LOCATION

GKE enables Workload Identity Federation for GKE and the Parallelstore CSI Driver by default in Autopilot clusters.

Replace the following values:

  • CLUSTER_NAME: the name of your cluster.
  • CLUSTER_VERSION: the GKE version number. You must specify 1.31.1-gke.1729000 or later.
  • NETWORK_NAME: the name of the VPC network you created for the Parallelstore instance. To learn more, see Configure a VPC network.
  • CLUSTER_LOCATION: the region where you want to create your cluster. For best performance, we recommend that you create the cluster in a supported Parallelstore location. If you create your cluster in a location that Parallelstore doesn't support, you must specify a custom topology that uses a supported Parallelstore location when you create the Parallelstore StorageClass; otherwise, provisioning fails.

Standard

Create a Standard cluster with the Parallelstore CSI Driver and Workload Identity Federation for GKE enabled using the following command:

gcloud container clusters create CLUSTER_NAME \
    --addons=ParallelstoreCsiDriver \
    --cluster-version=CLUSTER_VERSION \
    --workload-pool=PROJECT_ID.svc.id.goog \
    --network=NETWORK_NAME \
    --location=CLUSTER_LOCATION

Replace the following values:

  • CLUSTER_NAME: the name of your cluster.
  • CLUSTER_VERSION: the GKE version number. You must specify 1.31.1-gke.1729000 or later.
  • PROJECT_ID: your Google Cloud project ID.
  • NETWORK_NAME: the name of the VPC network you created for the Parallelstore instance. To learn more, see Configure a VPC network.
  • CLUSTER_LOCATION: the region or zone where you want to create your cluster. For best performance, we recommend that you create the cluster in a supported Parallelstore location. If you create your cluster in a location that Parallelstore doesn't support, you must specify a custom topology that uses a supported Parallelstore location when you create the Parallelstore StorageClass; otherwise, provisioning fails.
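For an existing Standard cluster, you can confirm both prerequisites by reading them back from the cluster description. The following bash sketch is illustrative only. `check_cluster_prereqs` is a hypothetical name. The sketch assumes that `gcloud container clusters describe` exposes the addon state as addonsConfig.parallelstoreCsiDriverConfig.enabled and the workload pool as workloadIdentityConfig.workloadPool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a cluster has the Parallelstore CSI driver
# and Workload Identity Federation for GKE enabled.
# Assumption: the field paths below match the `gcloud container clusters
# describe` output for your gcloud version.
check_cluster_prereqs() {
  local cluster="$1" location="$2" csi pool
  csi=$(gcloud container clusters describe "$cluster" --location="$location" \
    --format="value(addonsConfig.parallelstoreCsiDriverConfig.enabled)") || return 1
  pool=$(gcloud container clusters describe "$cluster" --location="$location" \
    --format="value(workloadIdentityConfig.workloadPool)") || return 1

  # Accept either capitalization of the boolean printed by gcloud.
  case "$csi" in
    True|true) ;;
    *) echo "Parallelstore CSI driver is not enabled on ${cluster}" >&2; return 1 ;;
  esac
  if [ -z "$pool" ]; then
    echo "Workload Identity Federation for GKE is not enabled on ${cluster}" >&2
    return 1
  fi
  echo "OK: ${cluster} uses workload pool ${pool}"
}

# Usage:
#   check_cluster_prereqs CLUSTER_NAME CLUSTER_LOCATION
```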

Set up necessary permissions

To transfer data from a Cloud Storage bucket, you need to set up permissions for Workload Identity Federation for GKE.

  1. Create a Kubernetes namespace:

    kubectl create namespace NAMESPACE
    

    Replace NAMESPACE with the namespace where your workloads will run.

  2. Create a Kubernetes service account.

    kubectl create serviceaccount KSA_NAME \
        --namespace=NAMESPACE
    

    Replace KSA_NAME with the name of the Kubernetes service account that your Pod uses to authenticate to Google Cloud APIs.

  3. Create an IAM service account. You can also use any existing IAM service account in any project in your organization:

    gcloud iam service-accounts create IAM_SA_NAME \
        --project=PROJECT_ID
    

    Replace the following:

    • IAM_SA_NAME: the name for your IAM service account.
    • PROJECT_ID: your Google Cloud project ID.
  4. Grant your IAM service account the role roles/storage.objectViewer so that it can access your Cloud Storage bucket:

    gcloud storage buckets \
        add-iam-policy-binding gs://GCS_BUCKET \
        --member "serviceAccount:IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
        --role "roles/storage.objectViewer"
    

    Replace GCS_BUCKET with your Cloud Storage bucket name.

  5. Create the IAM allow policy that gives the Kubernetes service account access to impersonate the IAM service account:

    gcloud iam service-accounts \
        add-iam-policy-binding IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com \
        --role roles/iam.workloadIdentityUser \
        --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
    
  6. Annotate the Kubernetes service account so that GKE sees the link between the service accounts.

    kubectl annotate serviceaccount KSA_NAME \
        --namespace NAMESPACE \
        iam.gke.io/gcp-service-account=IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com
    
  7. Create the Parallelstore service identity:

    gcloud beta services identity create \
        --service=parallelstore.googleapis.com \
        --project=PROJECT_ID
    
  8. Grant the Parallelstore service identity the role roles/iam.serviceAccountTokenCreator to allow it to impersonate the IAM service account. Set the PROJECT_NUMBER environment variable so you can use it in subsequent steps.

    export PROJECT_NUMBER=$(gcloud projects describe PROJECT_ID --format="value(projectNumber)")
    gcloud iam service-accounts \
        add-iam-policy-binding "IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
        --member=serviceAccount:"service-${PROJECT_NUMBER?}@gcp-sa-parallelstore.iam.gserviceaccount.com" \
        --role=roles/iam.serviceAccountTokenCreator
    

    The PROJECT_NUMBER value is the automatically generated unique identifier for your project. To find this value, refer to Creating and managing projects.

  9. Grant the Parallelstore service identity the role roles/iam.serviceAccountUser to allow it to access all the resources that the IAM service account can access:

    gcloud iam service-accounts \
        add-iam-policy-binding "IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
        --member=serviceAccount:"service-${PROJECT_NUMBER?}@gcp-sa-parallelstore.iam.gserviceaccount.com" \
        --role=roles/iam.serviceAccountUser
    
  10. Grant the GKE service identity the role roles/iam.serviceAccountUser to allow it to access all the resources that the IAM service account can access. This step is not required if the GKE cluster and the IAM service account are in the same project.

    gcloud iam service-accounts \
        add-iam-policy-binding "IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
        --member=serviceAccount:"service-${PROJECT_NUMBER?}@container-engine-robot.iam.gserviceaccount.com" \
        --role=roles/iam.serviceAccountUser
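The commands in this section spell the same identifiers several times, so a typo in one binding is easy to miss. The following bash sketch builds each identifier in one place. The helper names are hypothetical, and the string formats are copied from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that build the identifiers used in the steps above,
# so that each one is spelled in a single place.

# IAM service account email: IAM_SA_NAME, PROJECT_ID
iam_sa_email() { printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"; }

# Workload Identity Federation member: PROJECT_ID, NAMESPACE, KSA_NAME
wi_member() { printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"; }

# Parallelstore service identity member: PROJECT_NUMBER
parallelstore_member() {
  printf 'serviceAccount:service-%s@gcp-sa-parallelstore.iam.gserviceaccount.com\n' "$1"
}

# GKE service identity member: PROJECT_NUMBER
gke_member() {
  printf 'serviceAccount:service-%s@container-engine-robot.iam.gserviceaccount.com\n' "$1"
}

# Usage (the Workload Identity binding, rewritten with the helpers):
#   gcloud iam service-accounts add-iam-policy-binding \
#     "$(iam_sa_email IAM_SA_NAME PROJECT_ID)" \
#     --role=roles/iam.workloadIdentityUser \
#     --member="$(wi_member PROJECT_ID NAMESPACE KSA_NAME)"
```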
    

Create a Parallelstore volume with preloaded data

The following sections describe the typical process for creating a Parallelstore volume with data preloaded from a Cloud Storage bucket, using the GKE Volume Populator.

  1. Create a GCPDataSource resource.
  2. Create a Parallelstore StorageClass.
  3. Create a PersistentVolumeClaim to access the volume.
  4. Verify that the PersistentVolumeClaim provisioning completed.
  5. (Optional) View the data transfer progress.
  6. Create a workload that consumes the volume.

Create a GCPDataSource resource

To use GKE Volume Populator, create a GCPDataSource custom resource. This resource defines the source storage properties to use for volume population.

  1. Save the following manifest in a file named gcpdatasource.yaml.

    apiVersion: datalayer.gke.io/v1
    kind: GCPDataSource
    metadata:
      name: GCP_DATA_SOURCE
      namespace: NAMESPACE
    spec:
      cloudStorage:
        serviceAccountName: KSA_NAME
        uri: gs://GCS_BUCKET/
    

    Replace the following values:

    • GCP_DATA_SOURCE: the name of the GCPDataSource custom resource that holds a reference to your Cloud Storage bucket. See the GCPDataSource CRD reference for more details.
    • NAMESPACE: the namespace where your workloads run. The GCPDataSource must be in the same namespace as your workload.
    • KSA_NAME: the name of the Kubernetes service account that your Pod uses to authenticate to Google Cloud APIs. The cloudStorage.serviceAccountName value should be the Kubernetes service account you set up for Workload Identity Federation for GKE in the Set up necessary permissions step.
    • GCS_BUCKET: your Cloud Storage bucket name. Alternatively, you can specify gs://GCS_BUCKET/PATH_INSIDE_BUCKET/ for the uri field.
  2. Create the GCPDataSource resource by running this command:

    kubectl apply -f gcpdatasource.yaml
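If you create several data sources, you can generate the manifest from its placeholders instead of editing the file by hand. The following bash sketch prints the same GCPDataSource manifest with the values filled in. `render_gcpdatasource` is a hypothetical name. Both URI forms shown above end with a slash, so the sketch appends one when it is missing; the guide does not say whether the field strictly requires it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a GCPDataSource manifest with the placeholders
# filled in. Arguments: name, namespace, Kubernetes service account, gs:// URI.
render_gcpdatasource() {
  local name="$1" namespace="$2" ksa="$3" uri="$4"
  case "$uri" in
    gs://?*) ;;
    *) echo "uri must start with gs://, got '${uri}'" >&2; return 1 ;;
  esac
  # Match the documented URI forms, which both end with "/".
  [ "${uri%/}" = "$uri" ] && uri="${uri}/"
  cat <<EOF
apiVersion: datalayer.gke.io/v1
kind: GCPDataSource
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  cloudStorage:
    serviceAccountName: ${ksa}
    uri: ${uri}
EOF
}

# Usage:
#   render_gcpdatasource GCP_DATA_SOURCE NAMESPACE KSA_NAME gs://GCS_BUCKET/ \
#     | kubectl apply -f -
```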
    

Create a Parallelstore StorageClass

Create a StorageClass to direct the Parallelstore CSI driver to provision Parallelstore instances in the same region as your GKE cluster. This ensures optimal I/O performance.

  1. Save the following manifest as parallelstore-class.yaml. Make sure that the volumeBindingMode field in the StorageClass definition is set to Immediate.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: parallelstore-class
    provisioner: parallelstore.csi.storage.gke.io
    volumeBindingMode: Immediate
    reclaimPolicy: Delete
    
  2. Create the StorageClass by running this command:

    kubectl apply -f parallelstore-class.yaml
    

If you want to create a custom StorageClass with a specific topology, refer to the Parallelstore CSI guide.
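GKE Volume Populator only supports StorageClass resources whose volumeBindingMode is Immediate. You might therefore want to check an existing StorageClass before you reference it. The following is a minimal bash sketch that assumes kubectl is configured for your cluster. `storageclass_is_immediate` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the StorageClass uses Immediate binding,
# which GKE Volume Populator requires.
storageclass_is_immediate() {
  local mode
  mode=$(kubectl get storageclass "$1" -o jsonpath='{.volumeBindingMode}') || return 1
  if [ "$mode" != "Immediate" ]; then
    echo "StorageClass $1 has volumeBindingMode '${mode}', expected Immediate" >&2
    return 1
  fi
}

# Usage:
#   storageclass_is_immediate parallelstore-class && echo "OK"
```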

Create a PersistentVolumeClaim to access the volume

The following manifest file shows an example of how to create a PersistentVolumeClaim in ReadWriteMany access mode that references the StorageClass you created earlier.

  1. Save the following manifest in a file named volume-populator-pvc.yaml:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: PVC_NAME
      namespace: NAMESPACE
    spec:
      accessModes:
      - ReadWriteMany
      storageClassName: parallelstore-class
      resources:
        requests:
          storage: 12Gi
      dataSourceRef:
        apiGroup: datalayer.gke.io
        kind: GCPDataSource
        name: GCP_DATA_SOURCE
    

    Replace the following values:

    • PVC_NAME: the name of the PersistentVolumeClaim where you want to transfer your data. The PersistentVolumeClaim must be backed by a Parallelstore instance.
    • NAMESPACE: the namespace where your workloads run. Use the same namespace as your workload and your GCPDataSource.
    • GCP_DATA_SOURCE: the name of the GCPDataSource custom resource that holds a reference to your Cloud Storage bucket. See the GCPDataSource CRD reference for more details.
  2. Create the PersistentVolumeClaim by running the following command:

    kubectl apply -f volume-populator-pvc.yaml
    

GKE won't schedule the workload Pod until the PersistentVolumeClaim provisioning is complete. To check on your data transfer progress, see View the data transfer progress. If you encounter errors during provisioning, refer to Troubleshooting.
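Because the Pod waits for the claim, a script that deploys the workload may want to block until the PersistentVolumeClaim is Bound. The following bash sketch polls the claim's status.phase with kubectl. `wait_for_pvc_bound` is a hypothetical name, and the default attempt count and interval are arbitrary. Recent kubectl versions also accept `kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/PVC_NAME -n NAMESPACE`, which you might prefer if your kubectl supports it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a PersistentVolumeClaim until its phase is Bound.
# Arguments: PVC name, namespace, max attempts (default 60),
# seconds between attempts (default 30).
wait_for_pvc_bound() {
  local pvc="$1" ns="$2" attempts="${3:-60}" interval="${4:-30}" i phase
  for (( i = 1; i <= attempts; i++ )); do
    phase=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.status.phase}')
    if [ "$phase" = "Bound" ]; then
      echo "${ns}/${pvc} is Bound"
      return 0
    fi
    echo "attempt ${i}/${attempts}: phase '${phase}', waiting ${interval}s" >&2
    sleep "$interval"
  done
  return 1
}

# Usage:
#   wait_for_pvc_bound PVC_NAME NAMESPACE && kubectl apply -f pod.yaml
```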

Verify that the PersistentVolumeClaim provisioning completed

GKE Volume Populator uses a temporary PersistentVolumeClaim in the gke-managed-volumepopulator namespace for volume provisioning.

The temporary PersistentVolumeClaim is essentially a stand-in for your PersistentVolumeClaim while the data is still in transit (waiting to be fully loaded). Its name has the format prime-YOUR_PVC_UID, where YOUR_PVC_UID is the UID of your PersistentVolumeClaim.

To check its status:

  1. Run the following commands:

    PVC_UID=$(kubectl get pvc PVC_NAME -n NAMESPACE -o yaml | grep uid | awk '{print $2}')
    
    TEMP_PVC=prime-$PVC_UID
    
    echo $TEMP_PVC
    
    kubectl describe pvc ${TEMP_PVC?} -n gke-managed-volumepopulator
    

    If the output is empty, this means the temporary PersistentVolumeClaim was not created. In that case, refer to the Troubleshooting section.

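    If provisioning is successful, the output is similar to the following. The output can include ProvisioningFailed warnings that were logged while the Parallelstore instance was still being created. Look for the ProvisioningSucceeded event: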

    Warning  ProvisioningFailed     9m12s                   parallelstore.csi.storage.gke.io_gke-10fedd76bae2494db688-2237-793f-vm_5f284e53-b25c-46bb-b231-49e894cbba6c  failed to provision volume with StorageClass "parallelstore-class": rpc error: code = DeadlineExceeded desc = context deadline exceeded
    Warning  ProvisioningFailed     3m41s (x11 over 9m11s)  parallelstore.csi.storage.gke.io_gke-10fedd76bae2494db688-2237-793f-vm_5f284e53-b25c-46bb-b231-49e894cbba6c  failed to provision volume with StorageClass "parallelstore-class": rpc error: code = DeadlineExceeded desc = Volume pvc-808e41a4-b688-4afe-9131-162fe5d672ec not ready, current state: CREATING
    Normal   ExternalProvisioning   3m10s (x43 over 13m)    persistentvolume-controller                                                                                  Waiting for a volume to be created either by the external provisioner 'parallelstore.csi.storage.gke.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
    Normal  Provisioning  8s (x13 over 10m)  "xxx"  External provisioner is provisioning volume for claim "xxx"
    Normal  ProvisioningSucceeded  7s  "xxx"  Successfully provisioned volume "xxx"
    
  2. Check if the Parallelstore instance creation has started.

    gcloud beta parallelstore instances list \
        --project=PROJECT_ID \
        --location=-
    

    The output is similar to the following. Verify that your volume is in the CREATING state. When the Parallelstore instance creation is finished, the state will change to ACTIVE.

    "projects/PROJECT_ID/locations/<my-location>/<my-volume>"  12000  2024-10-09T17:59:42.582857261Z  2024-10-09T17:59:42.582857261Z  CREATING  projects/PROJECT_ID/global/NETWORK_NAME
    

If provisioning failed, refer to the Parallelstore troubleshooting guide for additional guidance.
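The temporary claim name is derived the same way throughout this guide, so you might wrap it in a function. The following bash sketch is hypothetical. It reads metadata.uid with a kubectl JSONPath query instead of the `grep uid | awk` pipeline used above, which avoids matching unrelated lines. It then checks the temporary claim's events for ProvisioningSucceeded. Kubernetes events expire after a retention period, so an older claim might no longer show the event.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the temporary PersistentVolumeClaim
# (prime-<UID of your PVC>) that GKE Volume Populator creates.
temp_pvc_for() {
  local uid
  uid=$(kubectl get pvc "$1" -n "$2" -o jsonpath='{.metadata.uid}') || return 1
  [ -n "$uid" ] || return 1
  printf 'prime-%s\n' "$uid"
}

# Hypothetical helper: succeed if the temporary PVC shows a ProvisioningSucceeded event.
temp_pvc_provisioned() {
  kubectl describe pvc "$1" -n gke-managed-volumepopulator | grep -q 'ProvisioningSucceeded'
}

# Usage:
#   TEMP_PVC=$(temp_pvc_for PVC_NAME NAMESPACE) && temp_pvc_provisioned "$TEMP_PVC"
```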

(Optional) View the data transfer progress

This section shows how you can track the progress of your data transfers from a Cloud Storage bucket to a Parallelstore volume. You can do this to monitor the status of your transfer and ensure that your data is copied successfully. You should also follow these steps if your PersistentVolumeClaim binding operation is taking too long.

  1. Verify the status of your PersistentVolumeClaim by running the following command:

    kubectl describe pvc PVC_NAME -n NAMESPACE
    
  2. Check the PersistentVolumeClaim events message to find the data transfer progress. GKE logs the messages about once per minute. The output is similar to the following:

    Reason                          Message
    ------                          -------
    PopulateOperationStartSuccess   Populate operation started
    PopulateOperationStartSuccess   Populate operation started
    Provisioning                    External provisioner is provisioning volume for claim "my-namespace/my-pvc"
    Provisioning                    Assuming an external populator will provision the volume
    ExternalProvisioning            Waiting for a volume to be created either by the external provisioner 'parallelstore.csi.storage.gke.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
    PopulateOperationStartSuccess   Populate operation started
    PopulatorPVCCreationProgress    objects found 7, objects copied 7, objects skipped 0. bytes found 1000020010, bytes copied 1000020010, bytes skipped 0
    PopulateOperationFinished       Populate operation finished
    PopulatorFinished               Populator finished
    

The populate operation can take some time to start, and how long the transfer takes depends on the size of your data. If you don't see any data transfer progress after several minutes, refer to the Troubleshooting section.
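The PopulatorPVCCreationProgress message reports raw byte counts. To turn it into a percentage, you can parse those counts. The following bash sketch is hypothetical. It assumes the message keeps the "bytes found N" and "bytes copied N" wording shown in the sample output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a PopulatorPVCCreationProgress message into a
# percentage of bytes copied. Field names are taken from the sample event above.
transfer_progress() {
  local msg="$1" found copied
  found=$(printf '%s\n' "$msg" | sed -n 's/.*bytes found \([0-9][0-9]*\).*/\1/p')
  copied=$(printf '%s\n' "$msg" | sed -n 's/.*bytes copied \([0-9][0-9]*\).*/\1/p')
  if [ -z "$found" ] || [ -z "$copied" ] || [ "$found" -eq 0 ]; then
    echo "no byte counts found" >&2
    return 1
  fi
  echo "$(( copied * 100 / found ))%"
}

# Usage: report the most recent progress event for your claim.
#   transfer_progress "$(kubectl describe pvc PVC_NAME -n NAMESPACE \
#     | grep PopulatorPVCCreationProgress | tail -n 1)"
```

For the sample event above, the function prints 100%.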

Create a workload that consumes the volume

This section shows an example of how to create a Pod that consumes the PersistentVolumeClaim resource you created earlier.

  1. Save the following YAML manifest for your Pod as pod.yaml.

    apiVersion: v1
    kind: Pod
    metadata:
      name: POD_NAME
      namespace: NAMESPACE
    spec:
      volumes:
      - name: parallelstore-volume
        persistentVolumeClaim:
          claimName: PVC_NAME
      containers:
      - image: nginx
        name: nginx
        volumeMounts:
        - name: parallelstore-volume
          mountPath: /mnt/data
    

    Replace the following values:

    • POD_NAME: the name of the Pod that runs your workload.
    • NAMESPACE: the namespace where your workloads run. Use the same namespace as the PersistentVolumeClaim.
    • PVC_NAME: the name of the PersistentVolumeClaim where you want to transfer your data. The PersistentVolumeClaim must be backed by a Parallelstore instance.
  2. Run the following command to apply the manifest to the cluster:

    kubectl apply -f pod.yaml
    
  3. Check the status of your Pod and wait until its status is Running. The PersistentVolumeClaim must be bound before the workload can run.

    kubectl describe pod POD_NAME -n NAMESPACE
    
  4. Verify that the files were successfully transferred and can be accessed by your workload.

    kubectl exec -it POD_NAME -n NAMESPACE -c nginx -- /bin/sh
    

    Change to the /mnt/data directory and run ls:

    cd /mnt/data
    ls
    

    The output should list all the files that exist in your Cloud Storage bucket URI.
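Instead of comparing the two listings by eye, you can diff the bucket listing against the mounted directory. The following bash sketch is hypothetical; `missing_from_mount` is not part of any tool. It expects the output of `gcloud storage ls` for the bucket URI and the output of a non-interactive `kubectl exec ... ls /mnt/data`. It compares only top-level names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print top-level names that appear in the bucket listing
# but not in the mounted directory listing.
#   $1: the bucket URI set in the GCPDataSource (for example gs://GCS_BUCKET/)
#   $2: output of: gcloud storage ls "$1"
#   $3: output of: kubectl exec POD_NAME -n NAMESPACE -c nginx -- ls /mnt/data
missing_from_mount() {
  local prefix="${1%/}/"
  # Strip the URI prefix and any trailing "/" (folders), drop blanks, then
  # keep lines that only the bucket listing has.
  comm -23 \
    <(printf '%s\n' "$2" | sed -e "s|^${prefix}||" -e 's|/$||' -e '/^$/d' | sort) \
    <(printf '%s\n' "$3" | sed -e '/^$/d' | sort)
}

# Usage:
#   missing=$(missing_from_mount gs://GCS_BUCKET/ \
#     "$(gcloud storage ls gs://GCS_BUCKET/)" \
#     "$(kubectl exec POD_NAME -n NAMESPACE -c nginx -- ls /mnt/data)")
#   [ -z "$missing" ] && echo "all files present" || printf 'missing:\n%s\n' "$missing"
```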

Delete a PersistentVolumeClaim during dynamic provisioning

If you need to delete your PersistentVolumeClaim while data is still being transferred during dynamic provisioning, you have two options: graceful deletion and forced deletion.

Graceful deletion requires less effort, but it can take longer and doesn't account for user misconfigurations that prevent the data transfer from completing. Forced deletion is a faster alternative that gives you more flexibility and control; use it when you need to quickly restart or correct misconfigurations.

Graceful deletion

Use this deletion option to ensure that the data transfer process is completed before GKE deletes the associated resources.

  1. Delete the workload Pod if it exists, by running this command:

    kubectl delete pod POD_NAME -n NAMESPACE
    
  2. Find the name of the temporary PersistentVolumeClaim:

    PVC_UID=$(kubectl get pvc PVC_NAME -n NAMESPACE -o yaml | grep uid | awk '{print $2}')
    TEMP_PVC=prime-$PVC_UID
    
    echo $TEMP_PVC
    
  3. Find the name of the PersistentVolume:

    PV_NAME=$(kubectl describe pvc ${TEMP_PVC?} -n gke-managed-volumepopulator | grep "Volume:" | awk '{print $2}')
    
    echo ${PV_NAME?}
    

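    If the output is empty, the PersistentVolume has not been created yet. In that case, remove |${PV_NAME?} from the grep pattern in step 5 (an empty alternative matches every line) and skip step 6.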

  4. Delete your PersistentVolumeClaim by running this command. The finalizer will block your deletion operation. Press Ctrl+C, then move on to the next step.

    kubectl delete pvc PVC_NAME -n NAMESPACE
    

    Wait for data transfer to complete. GKE will eventually delete the PersistentVolumeClaim, PersistentVolume, and Parallelstore instance.

  5. Check that the temporary PersistentVolumeClaim, PersistentVolumeClaim, and PersistentVolume resources are deleted:

    kubectl get pvc,pv -A | grep -E "${TEMP_PVC?}|PVC_NAME|${PV_NAME?}"
    
  6. Check that the Parallelstore instance is deleted. The Parallelstore instance will share the same name as the PersistentVolume. You don't need to run this command if you confirmed in Step 3 that the PersistentVolume was not created.

    gcloud beta parallelstore instances list \
        --project=PROJECT_ID \
        --location=- | grep ${PV_NAME?}
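The deletion check above builds a `grep -E` alternation from names that might be empty. If PV_NAME is empty, the pattern ends in an empty alternative, which matches every line. The following bash sketch builds the pattern from the non-empty names only. `names_pattern` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the non-empty arguments into a grep -E alternation.
# Empty names are skipped because an empty alternative (for example "a|b|")
# matches every line.
names_pattern() {
  local IFS='|' n
  local -a parts=()
  for n in "$@"; do
    [ -n "$n" ] && parts+=("$n")
  done
  [ "${#parts[@]}" -gt 0 ] || return 1
  printf '%s\n' "${parts[*]}"
}

# Usage: no output from grep means the resources are gone.
#   pattern=$(names_pattern "$TEMP_PVC" PVC_NAME "$PV_NAME")
#   kubectl get pvc,pv -A | grep -E "$pattern" || echo "all deleted"
```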
    

Forced deletion

Use this deletion option when you need to delete a PersistentVolumeClaim and its associated resources before the data transfer process is complete. This might be necessary in situations where the data transfer is taking too long or has encountered errors, or if you need to reclaim resources quickly.

  1. Delete the workload Pod if it exists:

    kubectl delete pod POD_NAME -n NAMESPACE
    
  2. Find the name of the temporary PersistentVolumeClaim and set the environment variable for later steps:

    PVC_UID=$(kubectl get pvc PVC_NAME -n NAMESPACE -o yaml | grep uid | awk '{print $2}')
    
    TEMP_PVC=prime-$PVC_UID
    
    echo $TEMP_PVC
    
  3. Update the PersistentVolume reclaim policy to Delete. This ensures that the PersistentVolume, along with the underlying storage, is automatically deleted when the associated PersistentVolumeClaim is deleted.

    Skip the kubectl patch command if any of the following apply:

    • You don't want to delete the PersistentVolume or the underlying storage.
    • Your current reclaim policy is Retain and you want to keep the underlying storage. Clean up the PersistentVolume and storage instance manually as needed.
    • The echo $PV_NAME command outputs an empty string, which means that the PersistentVolume has not been created yet.

    PV_NAME=$(kubectl describe pvc $TEMP_PVC -n gke-managed-volumepopulator | grep "Volume:" | awk '{print $2}')
    
    echo $PV_NAME
    
    kubectl patch pv $PV_NAME -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
    
  4. Delete the PersistentVolumeClaim by running this command. The finalizer will block your deletion operation. Press Ctrl+C, then move on to the next step.

    kubectl delete pvc PVC_NAME -n NAMESPACE
    
  5. Remove the datalayer.gke.io/populate-target-protection finalizer from your PersistentVolumeClaim. This step is needed after deleting the PersistentVolumeClaim; otherwise, gke-volume-populator adds the finalizer back to the PersistentVolumeClaim.

    kubectl get pvc PVC_NAME -n NAMESPACE -o=json | \
    jq '.metadata.finalizers = null' | kubectl apply -f -
    
  6. Delete the temporary PersistentVolumeClaim in the gke-managed-volumepopulator namespace.

    kubectl delete pvc $TEMP_PVC -n gke-managed-volumepopulator
    
  7. Check that the temporary PersistentVolumeClaim, PersistentVolumeClaim, and PersistentVolume resources are deleted. If PV_NAME is empty, remove |${PV_NAME?} from the pattern, because an empty alternative matches every line:

    kubectl get pvc,pv -A | grep -E "${TEMP_PVC?}|PVC_NAME|${PV_NAME?}"
    
  8. Check that the Parallelstore instance is deleted. The Parallelstore instance will share the same name as the PersistentVolume. You don't need to run this command if you confirmed in Step 3 that the PersistentVolume was not created.

    gcloud beta parallelstore instances list \
        --project=PROJECT_ID \
        --location=- | grep ${PV_NAME?}
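If you repeat forced deletion often (for example, while correcting a misconfigured GCPDataSource), you could script the steps above. The following bash sketch is hypothetical and differs from this guide in three places:

- It uses `kubectl delete --wait=false` instead of pressing Ctrl+C.
- It clears finalizers with a JSON merge patch (`kubectl patch --type=merge`) instead of the jq round-trip.
- It reads names with JSONPath queries instead of grep and awk.

The sketch always sets the reclaim policy to Delete. Review each command before you run it against a real cluster, because it deletes resources.

```shell
#!/usr/bin/env bash
# Hypothetical helper that scripts the forced-deletion steps above.
# Arguments: PVC name, namespace, and (optionally) the workload Pod name.
# It always switches the PersistentVolume reclaim policy to Delete, so do not
# use it if you want to keep the underlying Parallelstore instance.
force_delete_populated_pvc() {
  local pvc="$1" ns="$2" pod="${3:-}" uid temp_pvc pv_name

  # Delete the workload Pod, if one was given.
  if [ -n "$pod" ]; then
    kubectl delete pod "$pod" -n "$ns" --ignore-not-found
  fi

  # Derive the temporary PVC name from your PVC's UID.
  uid=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.metadata.uid}')
  [ -n "$uid" ] || { echo "could not read UID of ${ns}/${pvc}" >&2; return 1; }
  temp_pvc="prime-${uid}"

  # If a PV is already bound to the temporary PVC, make it deletable.
  pv_name=$(kubectl get pvc "$temp_pvc" -n gke-managed-volumepopulator \
    -o jsonpath='{.spec.volumeName}' 2>/dev/null)
  if [ -n "$pv_name" ]; then
    kubectl patch pv "$pv_name" -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
  fi

  # Request deletion without blocking on the finalizer, then clear the finalizer.
  kubectl delete pvc "$pvc" -n "$ns" --wait=false
  kubectl patch pvc "$pvc" -n "$ns" --type=merge -p '{"metadata":{"finalizers":null}}'

  # Delete the temporary PVC.
  kubectl delete pvc "$temp_pvc" -n gke-managed-volumepopulator --ignore-not-found
}

# Usage:
#   force_delete_populated_pvc PVC_NAME NAMESPACE POD_NAME
```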
    

Troubleshooting

This section shows you how to resolve issues related to GKE Volume Populator.

Before proceeding, run the following command to check for PersistentVolumeClaim event warnings:

kubectl describe pvc PVC_NAME -n NAMESPACE

Error: An internal error has occurred

The following error indicates that an internal Parallelstore API error has occurred:

Warning  PopulateOperationStartError  gkevolumepopulator-populator  Failed to start populate operation: populate data for PVC "xxx". Import data failed, error: rpc error: code = Internal desc = An internal error has occurred ("xxx")

To resolve this issue, follow these steps to gather information for support:

  1. Run the following commands to get the name of the temporary PersistentVolumeClaim, replacing placeholders with the actual names:

    PVC_UID=$(kubectl get pvc PVC_NAME -n NAMESPACE -o yaml | grep uid | awk '{print $2}')
    
    TEMP_PVC=prime-${PVC_UID?}
    
    echo ${TEMP_PVC?}
    
  2. Run the following command to get the volume name:

    PV_NAME=$(kubectl describe pvc ${TEMP_PVC?} -n gke-managed-volumepopulator | grep "Volume:" | awk '{print $2}')
    
  3. Contact the support team with the error message, your project name, and the volume name.

Permission issues

If you encounter errors like the following during volume population, it indicates GKE encountered a permissions problem:

  • Cloud Storage bucket doesn't exist: PopulateOperationStartError with code = PermissionDenied.
  • Missing permissions on the Cloud Storage bucket or service accounts: PopulateOperationFailed with code: "xxx" message: "Verify if bucket "xxx" exists and grant access".
  • Service account not found: PopulateOperationStartError with code = Unauthenticated.

To resolve these, double-check the following:

  • Cloud Storage bucket access: Verify that the bucket exists and that the IAM service account has the roles/storage.objectViewer role.
  • Service accounts: Confirm both the Kubernetes service account and the IAM service account exist and are correctly linked.
  • Parallelstore service account: Ensure it exists and has the necessary permissions (roles/iam.serviceAccountTokenCreator and roles/iam.serviceAccountUser on the IAM account).

For detailed steps and verification commands, refer to Set up necessary permissions. If errors persist, contact support with the error message, your project name, and the Cloud Storage bucket name.
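To confirm the service account bindings without reading the full policy, you can filter it to one role at a time. The following bash sketch is hypothetical. It assumes that `gcloud iam service-accounts get-iam-policy` accepts the common `--flatten`, `--filter`, and `--format` flags, in the same way as the widely used `gcloud projects get-iam-policy` pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the members bound to ROLE ($2) on the IAM
# service account SA_EMAIL ($1), one per line.
sa_role_members() {
  gcloud iam service-accounts get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.role:$2" \
    --format="value(bindings.members)"
}

# Hypothetical helper: succeed if MEMBER ($3) holds ROLE ($2) on SA_EMAIL ($1).
sa_has_binding() {
  sa_role_members "$1" "$2" | grep -Fxq "$3"
}

# Usage: check the bindings from "Set up necessary permissions".
#   SA=IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com
#   PS="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-parallelstore.iam.gserviceaccount.com"
#   sa_has_binding "$SA" roles/iam.workloadIdentityUser \
#     "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]" || echo "missing WI binding"
#   sa_has_binding "$SA" roles/iam.serviceAccountTokenCreator "$PS" || echo "missing token creator"
#   sa_has_binding "$SA" roles/iam.serviceAccountUser "$PS" || echo "missing SA user"
```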

Invalid argument errors

If you encounter InvalidArgument errors, you've likely provided incorrect values in either the GCPDataSource or the PersistentVolumeClaim. The error log identifies the exact fields that contain invalid data. Check your Cloud Storage bucket URI and other relevant fields for accuracy.
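For the bucket URI, you can run a quick local sanity check before you apply the GCPDataSource. The following bash sketch is hypothetical. It checks only the gs:// prefix and the basic bucket naming rule: 3 to 63 characters of lowercase letters, digits, dashes, underscores, and dots, starting and ending with a letter or digit. Longer dotted bucket names are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: quick local check of a Cloud Storage URI before it goes
# into the GCPDataSource uri field. Checks only the gs:// prefix and the basic
# 3-63 character bucket naming rule; an optional /path suffix is allowed.
valid_gcs_uri() {
  local re='^gs://[[:lower:][:digit:]][[:lower:][:digit:]._-]{1,61}[[:lower:][:digit:]](/.*)?$'
  [[ "$1" =~ $re ]]
}

# Usage:
#   valid_gcs_uri "gs://GCS_BUCKET/" || echo "check the uri field"
```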

What's next