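Version 1.4. This version is no longer supported, as described in the Anthos version support policy. To get the latest patches and updates for security vulnerabilities, exposures, and issues affecting Anthos clusters on VMware (GKE On-Prem), upgrade to a supported version.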

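Creating and managing node pools

Starting with GKE On-Prem 1.3, you can create a group of nodes in a user cluster that all share the same configuration by defining a node pool in that cluster's configuration file. You can then manage that node pool separately, without affecting any other nodes in the cluster. Learn more about node pools.

You can define one or more node pools in the configuration file of any user cluster. Creating a node pool creates additional nodes in the user cluster. You manage node pools (create, update, and delete them in a user cluster) by modifying the nodePools section of the configuration file and deploying those changes to the existing cluster with the gkectl update cluster command. Note that deleting a node pool removes the associated nodes immediately, regardless of whether any of those nodes is running a workload.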

Example node pool:

```
nodePools:
  - name: pool-1
    cpus: 4
    memoryMB: 8192
    replicas: 5
```

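Tip for new installations: Create your first user cluster and define your node pools in that cluster. Then use that cluster's configuration file to create additional user clusters with the same node pool settings.

Before you begin

Support:
- Only user clusters at version 1.3.0 or later are supported.
- Node pools in admin clusters are not supported.
- The gkectl update cluster command currently supports only node pool management. All other changes in the configuration file are ignored.
- Although the nodes in a node pool can be managed separately from other nodes, the nodes of a cluster cannot be upgraded separately. When you upgrade a cluster, all of its nodes are upgraded.
Resources:
- You can deploy a change to only a node pool's replicas without interrupting the workloads on its nodes.
  
  Important: If you deploy any other node pool configuration change, the nodes in that node pool are recreated. You must make sure that no such node pool is running a workload that must not be interrupted.
- When you deploy node pool changes, unneeded nodes are deleted only after the required nodes have been created or updated. One drawback of this policy is that, even if the total number of nodes is the same before and after the update, more resources (for example, IP addresses) might be needed during the update. You must verify that enough IP addresses are available for this peak usage.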
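
The rules above (a replicas-only change is non-disruptive, any other change recreates nodes, and old nodes are removed only after new ones exist) can be sketched as a small pre-deployment check. This is an illustrative sketch, not part of gkectl: the function names `classify_changes` and `worst_case_peak_nodes` are hypothetical, node pools are represented as plain Python dicts mirroring the nodePools entries, and the peak estimate assumes the worst case in which every new or recreated node exists before any old node is deleted.

```python
from typing import Dict, List

Pool = Dict[str, object]


def classify_changes(old: List[Pool], new: List[Pool]) -> Dict[str, List[str]]:
    """Group node pool names by the effect the documented rules predict."""
    old_by_name = {p["name"]: p for p in old}
    new_by_name = {p["name"]: p for p in new}
    result: Dict[str, List[str]] = {
        "created": [], "deleted": [], "resized": [], "recreated": [], "unchanged": [],
    }
    for name, pool in new_by_name.items():
        if name not in old_by_name:
            # A renamed pool also lands here (and its old name under "deleted").
            result["created"].append(name)
            continue
        previous = old_by_name[name]
        # Compare every attribute except replicas.
        before = {k: v for k, v in previous.items() if k != "replicas"}
        after = {k: v for k, v in pool.items() if k != "replicas"}
        if before != after:
            result["recreated"].append(name)   # disruptive: nodes are recreated
        elif previous.get("replicas") != pool.get("replicas"):
            result["resized"].append(name)     # non-disruptive: replicas only
        else:
            result["unchanged"].append(name)
    result["deleted"] = [n for n in old_by_name if n not in new_by_name]
    return result


def worst_case_peak_nodes(old: List[Pool], new: List[Pool]) -> int:
    """Upper bound on simultaneous nodes (and IP addresses) during the update.

    Assumption: all added or recreated nodes exist before any old node is removed.
    """
    changes = classify_changes(old, new)
    old_by_name = {p["name"]: p for p in old}
    peak = sum(int(p["replicas"]) for p in old)
    for pool in new:
        name, replicas = pool["name"], int(pool["replicas"])
        if name in changes["created"] or name in changes["recreated"]:
            peak += replicas
        elif name in changes["resized"]:
            peak += max(0, replicas - int(old_by_name[name]["replicas"]))
    return peak


old_pools = [
    {"name": "pool-1", "cpus": 4, "memoryMB": 8192, "replicas": 5},
    {"name": "pool-2", "cpus": 8, "memoryMB": 16384, "replicas": 3},
]
new_pools = [
    {"name": "pool-1", "cpus": 4, "memoryMB": 8192, "replicas": 7},   # replicas only
    {"name": "pool-2", "cpus": 8, "memoryMB": 32768, "replicas": 3},  # memory changed
    {"name": "pool-3", "cpus": 4, "memoryMB": 8192, "replicas": 2},   # new pool
]
print(classify_changes(old_pools, new_pools))
print(worst_case_peak_nodes(old_pools, new_pools))
```

Comparing the peak estimate with the number of free IP addresses gives a conservative answer; the real rollout may need fewer addresses than this bound.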

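Creating and updating node pools

You manage a node pool by modifying and deploying the user cluster's configuration file. You can create and deploy one or more node pools in a user cluster.

To create or update a node pool:

In an editor, open the configuration file of the user cluster in which you want to create or update node pools.
In the nodePools section of the user cluster configuration file, define one or more node pools:
1. Configure the minimum required node pool attributes. You must specify the following attributes for each node pool:
  - nodePools.name: Specifies a unique name for the node pool. Updating this attribute recreates the nodes. Example: - name: pool-1
  - nodePools.cpus: Specifies the number of CPUs allocated to each worker node in the pool. Updating this attribute recreates the nodes. Example: cpus: 4
  - nodePools.memoryMB: Specifies how much memory, in megabytes, is allocated to each worker node in the pool. Updating this attribute recreates the nodes. Example: memoryMB: 8192
  - nodePools.replicas: Specifies the total number of worker nodes in the pool. The user cluster uses the nodes across all of its pools to run workloads. You can update this attribute without affecting any nodes or running workloads. Example: replicas: 5
  Note that although some of the nodePools attributes are the same as those of workernode (DHCP | static IP) in the old configuration file, the workernode section is still required in every user cluster's old configuration file. You cannot remove the workernode section or replace it with nodePools. The new user cluster configuration file no longer has a workernode section. You must define at least one node pool for a user cluster and make sure that there are enough un-tainted nodes to replace the default workernode pool of the old configuration file.
  
  For example: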
```
nodePools:
- name: pool-1
  cpus: 4
  memoryMB: 8192
  replicas: 5
```
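  To see a user cluster configuration file with multiple node pools, see Examples.
2. Configure optional node pool attributes. You can add labels and taints to your node pool configuration to steer node workloads. You can also define which vSphere datastore a node pool uses.
  - nodePools.labels: Specifies one or more key : value pairs that uniquely identify your node pool. The key and value must each begin with a letter or number, can contain letters, numbers, hyphens, dots, and underscores, and can be up to 63 characters long.
    
    For detailed configuration information, see Labels.
    
    Important: You cannot specify the following keys for a label because they are reserved for use by GKE On-Prem: kubernetes.io, k8s.io, and googleapis.com.
    
    Example: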
```
labels:
  key1: value1
  key2: value2
```
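  - nodePools.taints: Specifies a key, value, and effect to define taints for your node pool. These taints correspond to the tolerations that you configure for your Pods.
    
    The key is required and the value is optional. Both must begin with a letter or number, can contain letters, numbers, hyphens, dots, and underscores, and can be up to 253 characters long. Optionally, you can prefix a key with a DNS subdomain followed by a /. For example: example.com/my-app.
    
    Valid effect values are NoSchedule, PreferNoSchedule, and NoExecute.
    
    For detailed configuration information, see Taints.
    
    Example: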
```
taints:
  - key: key1
    value: value1
    effect: NoSchedule
```
  - nodePools.vsphere.datastore: Specifies the vSphere datastore in which each node in the pool is created. This overrides the default vSphere datastore of the user cluster.
    
    Example:
```
vsphere:
  datastore: datastore_name
```
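For a configuration example with multiple node pools, see Examples.
Use the gkectl update cluster command to deploy your changes to the user cluster.

Note: gkectl update cluster supports only node pool management. Only changes in the nodePools section are deployed. All other changes in the configuration file are ignored.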
```
gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [USER_CLUSTER_CONFIG_FILE] --dry-run --yes
```
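where:
- [ADMIN_CLUSTER_KUBECONFIG]: Specifies the kubeconfig file of your admin cluster.
- [USER_CLUSTER_CONFIG_FILE]: Specifies the configuration file of your user cluster.
- --dry-run: Optional flag. Add this flag to only view the changes. No changes are deployed to the user cluster; omit the flag to deploy them.
- --yes: Optional flag. Add this flag to run the command silently. The prompt that asks you to confirm that you want to proceed is disabled.
If you abort the command prematurely, you can run the same command again to complete the operation and deploy your changes to the user cluster.

If you need to revert your changes, you must revert them in the configuration file and then redeploy the file to the user cluster.
Verify that your changes succeeded by inspecting all of the nodes. Run the following command to list all of the nodes in the user cluster: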
```
kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -o wide
```
where [USER_CLUSTER_KUBECONFIG] is the kubeconfig file of your user cluster.
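
Before deploying, the label and taint rules from the optional attributes in step 2 can be checked locally. The following is a minimal sketch under stated assumptions, not the validation that gkectl itself performs: `check_pool` and its helpers are hypothetical names, a node pool is a plain dict mirroring one nodePools entry, the reserved keys are treated as domains (so `example.k8s.io` is also rejected), and the 253-character limit is applied to the part of a taint key after the optional DNS subdomain prefix, which is not itself validated here.

```python
import re
from typing import List

# Must begin with a letter or number; may contain letters, numbers, '-', '.', '_'.
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
RESERVED = ("kubernetes.io", "k8s.io", "googleapis.com")
EFFECTS = ("NoSchedule", "PreferNoSchedule", "NoExecute")


def _valid_name(text: str, max_len: int) -> bool:
    return len(text) <= max_len and _NAME.fullmatch(text) is not None


def _is_reserved(key: str) -> bool:
    # Assumption: a reserved key matches either exactly or as a parent domain.
    domain = key.split("/", 1)[0]
    return any(domain == r or domain.endswith("." + r) for r in RESERVED)


def check_pool(pool: dict) -> List[str]:
    """Return human-readable problems found in one node pool definition."""
    errors: List[str] = []
    for key, value in (pool.get("labels") or {}).items():
        if _is_reserved(key):
            errors.append(f"label key {key!r} is reserved")
        elif not _valid_name(key, 63) or not _valid_name(str(value), 63):
            errors.append(f"label {key!r}: {value!r} is malformed")
    for taint in pool.get("taints") or []:
        key = taint.get("key", "")
        name = key.split("/", 1)[-1]  # drop the optional DNS subdomain prefix
        value = taint.get("value")    # optional
        if not _valid_name(name, 253):
            errors.append(f"taint key {key!r} is missing or malformed")
        if value is not None and not _valid_name(str(value), 253):
            errors.append(f"taint value {value!r} is malformed")
        if taint.get("effect") not in EFFECTS:
            errors.append(f"taint {key!r} has invalid effect {taint.get('effect')!r}")
    return errors


good_pool = {
    "name": "pool-4", "cpus": 8, "memoryMB": 16384, "replicas": 3,
    "taints": [{"key": "my_key", "value": "my_value1", "effect": "NoExecute"}],
    "labels": {"environment": "test"},
}
bad_pool = {
    "name": "pool-x", "cpus": 4, "memoryMB": 8192, "replicas": 1,
    "taints": [{"key": "example.com/my-app", "effect": "Never"}],
    "labels": {"kubernetes.io": "x"},
}
print(check_pool(good_pool))
print(check_pool(bad_pool))
```

An empty list means the pool passes these checks only; gkectl update cluster with --dry-run remains the authoritative preview of what would be deployed.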

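Deleting a node pool

To delete a node pool from a user cluster:

Remove its definition from the nodePools section of the user cluster configuration file.
Make sure that no workloads are running on the affected nodes.
Deploy your changes by running the gkectl update cluster command: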
```
gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [USER_CLUSTER_CONFIG_FILE] --dry-run --yes
```
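where:
- [ADMIN_CLUSTER_KUBECONFIG]: Specifies the kubeconfig file of your admin cluster.
- [USER_CLUSTER_CONFIG_FILE]: Specifies the configuration file of your user cluster.
- --dry-run: Optional flag. Add this flag to only view the changes. No changes are deployed to the user cluster; omit the flag to deploy them.
- --yes: Optional flag. Add this flag to run the command silently. The prompt that asks you to confirm that you want to proceed is disabled.
Verify that your changes succeeded by inspecting all of the nodes. Run the following command to list all of the nodes in the user cluster: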
```
kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -o wide
```
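where [USER_CLUSTER_KUBECONFIG] is the kubeconfig file of your user cluster.

Examples

The following example configuration has four node pools, each with different attributes:

pool-1: only the minimum required attributes are specified
pool-2: includes a vSphere datastore
pool-3: includes taints and labels
pool-4: includes all attributes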

```
apiVersion: v1
kind: UserCluster
...
# (Required) List of node pools. The total un-tainted replicas across all node pools
# must be greater than or equal to 3
nodePools:
- name: pool-1
  cpus: 4
  memoryMB: 8192
  replicas: 5
- name: pool-2
  cpus: 8
  memoryMB: 16384
  replicas: 3
  vsphere:
    datastore: my_datastore
- name: pool-3
  cpus: 4
  memoryMB: 8192
  replicas: 5
  taints:
    - key: "example-key"
      effect: NoSchedule
  labels:
    environment: production
    app: nginx
- name: pool-4
  cpus: 8
  memoryMB: 16384
  replicas: 3
  taints:
    - key: "my_key"
      value: my_value1
      effect: NoExecute
  labels:
    environment: test
  vsphere:
    datastore: my_datastore
...
```
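
The comment in this example requires at least 3 un-tainted replicas across all node pools. As a rough illustration, the check can be reproduced on the four pools above. The helper name `untainted_replicas` is hypothetical, and the pools are written as Python dicts rather than parsed from the YAML file (parsing would need a YAML library, which this sketch avoids).

```python
from typing import List


def untainted_replicas(node_pools: List[dict]) -> int:
    """Sum the replicas of every pool that defines no taints."""
    return sum(int(p.get("replicas", 0)) for p in node_pools if not p.get("taints"))


# The four pools from the example configuration, reduced to the relevant fields.
example_pools = [
    {"name": "pool-1", "replicas": 5},
    {"name": "pool-2", "replicas": 3},
    {"name": "pool-3", "replicas": 5,
     "taints": [{"key": "example-key", "effect": "NoSchedule"}]},
    {"name": "pool-4", "replicas": 3,
     "taints": [{"key": "my_key", "value": "my_value1", "effect": "NoExecute"}]},
]

total = untainted_replicas(example_pools)
print(total, "un-tainted replicas; requirement met:", total >= 3)
```

Only pool-1 and pool-2 count toward the minimum here, because pool-3 and pool-4 both carry taints.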

Generated template:

apiVersion: v1
kind: UserCluster
# (Required) A unique name for this cluster
name: ""
# (Required) GKE on-prem version (example: 1.3.0-gke.16)
gkeOnPremVersion: ""
# # (Optional) vCenter configuration (default: inherit from the admin cluster)
# vCenter:
#   resourcePool: ""
#   datastore: ""
#   # Provide the path to vCenter CA certificate pub key for SSL verification
#   caCertPath: ""
#   # The credentials and address to connect to vCenter
#   credentials:
#     username: ""
#     password: ""
# (Required) Network configuration; vCenter section is optional and inherits from
# the admin cluster if not specified
network:
  ipMode:
    # (Required) Define what IP mode to use ("dhcp" or "static")
    type: dhcp
    # # (Required when using "static" mode) The absolute or relative path to the yaml file
    # # to use for static IP allocation
    # ipBlockFilePath: ""
  # (Required) The Kubernetes service CIDR range for the cluster. Must not overlap
  # with the pod CIDR range
  serviceCIDR: 10.96.0.0/12
  # (Required) The Kubernetes pod CIDR range for the cluster. Must not overlap with
  # the service CIDR range
  podCIDR: 192.168.0.0/16
  vCenter:
    # vSphere network name
    networkName: ""
# (Required) Load balancer configuration
loadBalancer:
  # (Required) The VIPs to use for load balancing
  vips:
    # Used to connect to the Kubernetes API
    controlPlaneVIP: ""
    # Shared by all services for ingress traffic
    ingressVIP: ""
  # (Required) Which load balancer to use "F5BigIP" "Seesaw" or "ManualLB". Uncomment
  # the corresponding field below to provide the detailed spec
  kind: Seesaw
  # # (Required when using "ManualLB" kind) Specify pre-defined nodeports
  # manualLB:
  #   ingressHTTPNodePort: 30243
  #   ingressHTTPSNodePort: 30879
  #   controlPlaneNodePort: 30562
  #   addonsNodePort: 0
  # # (Required when using "F5BigIP" kind) Specify the already-existing partition and
  # # credentials
  # f5BigIP:
  #   address: ""
  #   credentials:
  #     username: ""
  #     password: ""
  #   partition: ""
  #   # # (Optional) Specify a pool name if using SNAT
  #   # snatPoolName: ""
  # (Required when using "Seesaw" kind) Specify the Seesaw configs
  seesaw:
    # (Required) The absolute or relative path to the yaml file to use for IP allocation
    # for LB VMs. Must contain one or two IPs.
    ipBlockFilePath: ""
    # (Required) The Virtual Router IDentifier of VRRP for the Seesaw group. Must
    # be between 1-255 and unique in a VLAN.
    vrid: 0
    # (Required) The IP announced by the control plane of Seesaw group
    masterIP: ""
    # (Required) The number CPUs per machine
    cpus: 4
    # (Required) Memory size in MB per machine
    memoryMB: 8192
    # (Optional) Network that the LB interface of Seesaw runs in (default: cluster
    # network)
    vCenter:
      # vSphere network name
      networkName: ""
    # (Optional) Run two LB VMs to achieve high availability (default: false)
    enableHA: false
# (Optional) User cluster control plane nodes must have either 1 or 3 replicas (default:
# 4 CPUs; 16384 MB memory; 1 replica)
masterNode:
  cpus: 4
  memoryMB: 8192
  # How many machines of this type to deploy
  replicas: 1
# (Required) List of node pools. The total un-tainted replicas across all node pools
# must be greater than or equal to 3
nodePools:
- name: pool-1
  # # Labels to apply to Kubernetes Node objects
  # labels: {}
  # # Taints to apply to Kubernetes Node objects
  # taints:
  # - key: ""
  #   value: ""
  #   effect: ""
  cpus: 4
  memoryMB: 8192
  # How many machines of this type to deploy
  replicas: 3
# Spread nodes across at least three physical hosts (requires at least three hosts)
antiAffinityGroups:
  # Set to false to disable DRS rule creation
  enabled: true
# # (Optional): Configure additional authentication
# authentication:
#   # (Optional) Configure OIDC authentication
#   oidc:
#     issuerURL: ""
#     kubectlRedirectURL: ""
#     clientID: ""
#     clientSecret: ""
#     username: ""
#     usernamePrefix: ""
#     group: ""
#     groupPrefix: ""
#     scopes: ""
#     extraParams: ""
#     # Set value to string "true" or "false"
#     deployCloudConsoleProxy: ""
#     # # The absolute or relative path to the CA file (optional)
#     # caPath: ""
#   # (Optional) Provide an additional serving certificate for the API server
#   sni:
#     certPath: ""
#     keyPath: ""
# (Optional) Specify which GCP project to connect your logs and metrics to
stackdriver:
  projectID: ""
  # A GCP region where you would like to store logs and metrics for this cluster.
  clusterLocation: ""
  enableVPC: false
  # The absolute or relative path to the key file for a GCP service account used to
  # send logs and metrics from the cluster
  serviceAccountKeyPath: ""
# (Optional) Specify which GCP project to connect your GKE clusters to
gkeConnect:
  projectID: ""
  # The absolute or relative path to the key file for a GCP service account used to
  # register the cluster
  registerServiceAccountKeyPath: ""
  # The absolute or relative path to the key file for a GCP service account used by
  # the GKE connect agent
  agentServiceAccountKeyPath: ""
# (Optional) Specify Cloud Run configuration
cloudRun:
  enabled: false
# # (Optional/Alpha) Configure the GKE usage metering feature
# usageMetering:
#   bigQueryProjectID: ""
#   # The ID of the BigQuery Dataset in which the usage metering data will be stored
#   bigQueryDatasetID: ""
#   # The absolute or relative path to the key file for a GCP service account used by
#   # gke-usage-metering to report to BigQuery
#   bigQueryServiceAccountKeyPath: ""
#   # Whether or not to enable consumption-based metering
#   enableConsumptionMetering: false
# # (Optional/Alpha) Configure kubernetes apiserver audit logging
# cloudAuditLogging:
#   projectid: ""
#   # A GCP region where you would like to store audit logs for this cluster.
#   clusterlocation: ""
#   # The absolute or relative path to the key file for a GCP service account used to
#   # send audit logs from the cluster
#   serviceaccountkeypath: ""

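Troubleshooting

In general, the gkectl update cluster command provides details when it fails. If the command succeeded and you do not see the nodes, you can troubleshoot with the Diagnosing cluster issues guide.
The cluster might have insufficient resources, such as a shortage of available IP addresses during node pool creation or update. See the Resizing a user cluster topic for details about how to verify that IP addresses are available.
You can also review the general Troubleshooting guide.
Stuck at Creating node MachineDeployment(s) in user cluster…

Creating or updating the node pools in a user cluster can take some time. However, if the wait time is extremely long and you suspect that something might have failed, you can run the following commands:
1. Run kubectl get nodes to obtain the state of your nodes.
2. For any nodes that are not ready, run kubectl describe node [node_name] to obtain details.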
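
The two steps above can also be scripted. The sketch below is one possible approach, not an official tool: it assumes kubectl is on the PATH, reads the JSON form of the node list (kubectl get nodes -o json), and reports nodes whose Ready condition is not "True". The function names `fetch_nodes` and `not_ready_nodes` are hypothetical.

```python
import json
import subprocess
from typing import List


def fetch_nodes(kubeconfig: str) -> dict:
    """Return the parsed output of `kubectl get nodes -o json` for a user cluster."""
    completed = subprocess.run(
        ["kubectl", "--kubeconfig", kubeconfig, "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def not_ready_nodes(node_list: dict) -> List[str]:
    """Names of nodes that do not report a Ready condition with status "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
        if not ready:
            names.append(node["metadata"]["name"])
    return names


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:  # usage: python check_nodes.py [USER_CLUSTER_KUBECONFIG]
        for name in not_ready_nodes(fetch_nodes(sys.argv[1])):
            print("not ready:", name, "-> run: kubectl describe node", name)
```

Each reported name can then be passed to kubectl describe node, as in step 2.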