
Access existing Managed Lustre instances on GKE using the Managed Lustre CSI driver

Autopilot Standard

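This guide describes how to connect to an existing Managed Lustre instance by using the Managed Lustre CSI driver. This lets you access existing Managed Lustre instances as volumes for your stateful workloads in a controlled and predictable way.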

Before you begin

Before you start, make sure that you have performed the following tasks:

Enable the Google Cloud Managed Lustre API and the Google Kubernetes Engine API.


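If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, run the gcloud components update command to get the latest version. Earlier gcloud CLI versions might not support the commands in this document.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set compute/zone instead. Setting a default location avoids gcloud CLI errors like the following: One of [--zone, --region] must be supplied: Please specify location. If the location of your cluster differs from the default that you set, you might need to specify the location in some commands.

For limitations and requirements, see the [CSI driver overview].
Make sure to enable the Managed Lustre CSI driver. It is disabled by default in Standard and Autopilot clusters.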

Set up environment variables

Set up the following environment variables:

export CLUSTER_NAME=CLUSTER_NAME
export PROJECT_ID=PROJECT_ID
export NETWORK_NAME=LUSTRE_NETWORK
export LOCATION=ZONE

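Replace the following:

CLUSTER_NAME: the name of your cluster.
PROJECT_ID: your Google Cloud project ID.
LUSTRE_NETWORK: the shared Virtual Private Cloud network where both the GKE cluster and the Managed Lustre instance reside.
ZONE: the geographical zone of your GKE cluster, for example us-central1-a.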
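Later commands fail with confusing errors when one of these variables is empty. The following is a minimal sketch of a pre-flight check, assuming a POSIX shell. The function name check_lustre_env is hypothetical and is not part of the gcloud CLI.

```shell
# Hypothetical helper (not part of gcloud): report every guide variable
# that is unset or empty, and return non-zero if any is missing.
check_lustre_env() {
  missing=0
  for name in CLUSTER_NAME PROJECT_ID NETWORK_NAME LOCATION; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call: print a reminder instead of aborting the shell.
check_lustre_env || echo "Set the variables above before continuing."
```

The check only detects empty values. It does not verify that the cluster, project, or network actually exists.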

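Configure the Managed Lustre CSI driver

This section describes how to enable and disable the Managed Lustre CSI driver, if needed.

Enable the Managed Lustre CSI driver on a new GKE cluster

To enable the Managed Lustre CSI driver when you create a new GKE cluster, follow these steps: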

Autopilot

gcloud container clusters create-auto "${CLUSTER_NAME}" \
    --location=${LOCATION} \
    --network="${NETWORK_NAME}" \
    --cluster-version=1.33.2-gke.1111000 \
    --enable-lustre-csi-driver \
    --enable-legacy-lustre-port

Standard

gcloud container clusters create "${CLUSTER_NAME}" \
    --location=${LOCATION} \
    --network="${NETWORK_NAME}" \
    --cluster-version=1.33.2-gke.1111000 \
    --addons=LustreCsiDriver \
    --enable-legacy-lustre-port

Enable the Managed Lustre CSI driver on an existing GKE cluster

To enable the Managed Lustre CSI driver on an existing GKE cluster, use the following command:

gcloud container clusters update ${CLUSTER_NAME} \
    --location=${LOCATION} \
    --update-addons=LustreCsiDriver=ENABLED \
    --enable-legacy-lustre-port

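After the Managed Lustre CSI driver is enabled in your cluster, you might notice that your nodes were recreated and that CPU nodes appear to be using a GPU image in the Google Cloud console or CLI output. For example: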

config:
  imageType: COS_CONTAINERD
  nodeImageConfig:
    image: gke-1330-gke1552000-cos-121-18867-90-4-c-nvda

This behavior is expected. The GPU image is reused on CPU nodes to securely install the Managed Lustre kernel modules. You are not charged extra for GPU usage.

Disable the Managed Lustre CSI driver

You can disable the Managed Lustre CSI driver on an existing GKE cluster by using the Google Cloud CLI.

gcloud container clusters update ${CLUSTER_NAME} \
    --location=${LOCATION} \
    --update-addons=LustreCsiDriver=DISABLED

After the CSI driver is disabled, GKE automatically recreates your nodes and uninstalls the Managed Lustre kernel modules from them.

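Access an existing Managed Lustre instance using the Managed Lustre CSI driver

If you already provisioned a Managed Lustre instance in the same network as your GKE cluster, you can follow these instructions to statically provision a PersistentVolume that refers to your instance.

The following sections describe the typical process for accessing an existing Managed Lustre instance by using the Managed Lustre CSI driver:

Create a PersistentVolume that refers to the Managed Lustre instance.
Use a PersistentVolumeClaim to access the volume.
Create a workload that consumes the volume.

Create a PersistentVolume

To locate your Managed Lustre instance, run the following command.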

gcloud lustre instances list \
    --project=${PROJECT_ID} \
    --location=${LOCATION}

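The output should look similar to the following. Before you proceed to the next step, note the name, filesystem, and mountPoint fields of the Managed Lustre instance.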

capacityGib: '9000'
createTime: '2025-04-28T22:42:11.140825450Z'
filesystem: testlfs
gkeSupportEnabled: true
mountPoint: 10.90.1.4@tcp:/testlfs
name: projects/my-project/locations/us-central1-a/instances/my-lustre
network: projects/my-project/global/networks/default
perUnitStorageThroughput: '1000'
state: ACTIVE
updateTime: '2025-04-28T22:51:41.559098631Z'

Save the following manifest in a file named lustre-pv.yaml:
```
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lustre-pv
spec:
  storageClassName: "STORAGE_CLASS_NAME"
  capacity:
    storage: 9000Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
  claimRef:
    namespace: default
    name: lustre-pvc
  csi:
    driver: lustre.csi.storage.gke.io
    volumeHandle: "PROJECT_ID/LOCATION/INSTANCE_NAME"
    volumeAttributes:
      ip: IP_ADDRESS
      filesystem: FILESYSTEM
```
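Replace the following:
- storageClassName: the name of your StorageClass. The value can be an empty string, but it must match the specification of your PersistentVolumeClaim.
- volumeHandle: the identifier for this volume.
  - PROJECT_ID: the Google Cloud project ID.
  - LOCATION: the zonal location of your Lustre instance. You must specify a zone that the Managed Lustre CSI driver supports.
  - INSTANCE_NAME: the name of your Lustre instance.
- ip: the IP address of your Lustre instance. You can get this value from the mountPoint field in the output of the previous command.
- filesystem: the file system name of your Managed Lustre instance.
For the full list of fields supported in the PersistentVolume object, see the Managed Lustre CSI driver reference documentation.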
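The manifest values can be derived from the gcloud output shown earlier. The following sketch uses the sample name and mountPoint values from that output and assumes they keep the same shape (IP@tcp:/FILESYSTEM and projects/PROJECT/locations/ZONE/instances/NAME). The variable names are illustrative only.

```shell
# Sample values copied from the example gcloud output in this guide.
INSTANCE_RESOURCE="projects/my-project/locations/us-central1-a/instances/my-lustre"
MOUNT_POINT="10.90.1.4@tcp:/testlfs"

# ip: everything before the first "@" in mountPoint.
IP_ADDRESS="${MOUNT_POINT%%@*}"

# filesystem: everything after the last "/" in mountPoint.
FILESYSTEM="${MOUNT_POINT##*/}"

# volumeHandle: PROJECT_ID/LOCATION/INSTANCE_NAME, which are path
# segments 2, 4, and 6 of the instance resource name.
VOLUME_HANDLE="$(echo "$INSTANCE_RESOURCE" | awk -F/ '{print $2 "/" $4 "/" $6}')"

echo "ip=${IP_ADDRESS}"
echo "filesystem=${FILESYSTEM}"
echo "volumeHandle=${VOLUME_HANDLE}"
```

With the sample values, this prints ip=10.90.1.4, filesystem=testlfs, and volumeHandle=my-project/us-central1-a/my-lustre.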
Run the following command to create the PersistentVolume:
```
kubectl apply -f lustre-pv.yaml
```

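Use a PersistentVolumeClaim to access the volume

You can create a PersistentVolumeClaim resource that references the StorageClass of the Managed Lustre CSI driver.

The following manifest shows an example PersistentVolumeClaim in ReadWriteMany access mode that binds, through volumeName, to the PersistentVolume you created earlier.

Save the following manifest in a file named lustre-pvc.yaml: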

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: lustre-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: "STORAGE_CLASS_NAME"
  volumeName: lustre-pv
  resources:
    requests:
      storage: STORAGE_SIZE

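Replace STORAGE_SIZE with the storage size, for example 9000Gi. It must match the specification in your PersistentVolume. Use the same STORAGE_CLASS_NAME value that you set in the PersistentVolume.

Run the following command to create the PersistentVolumeClaim: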
```
kubectl create -f lustre-pvc.yaml
```

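Create a workload that consumes the volume

This section shows how to create a Pod that consumes the PersistentVolumeClaim resource you created earlier.

Multiple Pods can share the same PersistentVolumeClaim resource.

Save the following manifest in a file named my-pod.yaml: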

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
      - name: lustre-volume
        mountPath: /data
  volumes:
  - name: lustre-volume
    persistentVolumeClaim:
      claimName: lustre-pvc

Run the following command to apply the manifest to the cluster:
```
kubectl apply -f my-pod.yaml
```
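The Pod waits until GKE provisions the PersistentVolumeClaim before it starts running. This operation might take a few minutes to complete.
Verify that the Pod is running by using the following command: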
```
kubectl get pods
```
It might take a few minutes for the Pod to reach the Running state.

The output is similar to the following:
```
NAME           READY   STATUS    RESTARTS   AGE
my-pod         1/1     Running   0          11s
```
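Instead of polling kubectl get pods, you can block until the Pod is ready and then confirm the mount. The following is a sketch only: the helper name verify_lustre_mount is hypothetical, and it assumes the container image provides the df utility and that the volume is mounted at /data as in the manifest above.

```shell
# Hypothetical helper: wait for a Pod to become Ready, then show the
# file system mounted at the given path inside its first container.
verify_lustre_mount() {
  pod="${1:-my-pod}"
  mount_path="${2:-/data}"
  # kubectl wait returns non-zero if the condition is not met in time,
  # in which case the df check is skipped.
  kubectl wait --for=condition=Ready "pod/${pod}" --timeout=300s &&
    kubectl exec "${pod}" -- df -h "${mount_path}"
}
```

Call it as verify_lustre_mount my-pod /data. The 300-second timeout is an arbitrary choice. Adjust it to your environment.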

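Use fsGroup with Managed Lustre volumes

You can change the group ownership of the root-level directory of the mounted file system to match a user-requested fsGroup specified in the Pod's SecurityContext.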
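As an illustration, the following sketch writes a Pod manifest that requests an fsGroup through the standard Kubernetes spec.securityContext.fsGroup field. The file name my-pod-fsgroup.yaml, the Pod name, and the group ID 3003 are arbitrary assumptions. The claim name lustre-pvc matches the PersistentVolumeClaim used elsewhere in this guide.

```shell
# Write a hypothetical Pod manifest that requests fsGroup 3003.
# The group ID is an arbitrary example value.
cat > my-pod-fsgroup.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-fsgroup
spec:
  securityContext:
    fsGroup: 3003
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
      - name: lustre-volume
        mountPath: /data
  volumes:
  - name: lustre-volume
    persistentVolumeClaim:
      claimName: lustre-pvc
EOF

# Apply the manifest once you have reviewed it:
# kubectl apply -f my-pod-fsgroup.yaml
```

After the Pod starts, running ls -ld /data inside the container should show the requested group as the owner of the mount root.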

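Troubleshooting

For troubleshooting guidance, see the troubleshooting page in the Managed Lustre documentation.

Clean up

To avoid incurring charges to your Google Cloud account, delete the storage resources you created in this guide.

Delete the Pod and the PersistentVolumeClaim.

Note: If you created the PersistentVolume with the Retain persistentVolumeReclaimPolicy, deleting the PersistentVolumeClaim does not remove the PersistentVolume or the underlying Managed Lustre instance.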
```
kubectl delete pod my-pod
kubectl delete pvc lustre-pvc
```

Check the PersistentVolume status. After you delete the Pod and the PersistentVolumeClaim, the PersistentVolume should report a Released state:

kubectl get pv

The output is similar to the following:

NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                 STORAGECLASS   REASON   AGE
lustre-pv   9000Gi     RWX            Retain           Released   default/lustre-pvc                            2m28s

Reuse the PersistentVolume. To reuse the PersistentVolume, remove the claim reference (claimRef):

kubectl patch pv lustre-pv --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'

The PersistentVolume should now report an Available state, which indicates that it is ready to be bound to a new PersistentVolumeClaim. Check the PersistentVolume status:

kubectl get pv

The output is similar to the following:

NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
lustre-pv   9000Gi     RWX            Retain           Available                                   19m
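The release step above can be guarded so that claimRef is removed only when the volume is actually Released. The following is a minimal sketch. The helper names pv_phase and release_pv are hypothetical, and the sketch relies on the standard status.phase field of a PersistentVolume.

```shell
# Hypothetical helper: print the phase of a PersistentVolume,
# for example Bound, Released, or Available.
pv_phase() {
  kubectl get pv "${1:-lustre-pv}" -o jsonpath='{.status.phase}'
}

# Hypothetical helper: remove claimRef only when the volume is Released,
# so that a volume still bound to a claim is left untouched.
release_pv() {
  pv="${1:-lustre-pv}"
  if [ "$(pv_phase "$pv")" = "Released" ]; then
    kubectl patch pv "$pv" --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'
  else
    echo "PersistentVolume $pv is not Released; leaving claimRef in place." >&2
    return 1
  fi
}
```

Call it as release_pv lustre-pv. A non-zero return value means that the patch was skipped.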

If you no longer need the PersistentVolume, delete it:
```
kubectl delete pv lustre-pv
```
Deleting the PersistentVolume does not remove the underlying Managed Lustre instance.

What's next

Explore the Managed Lustre documentation.