Starting with Anthos clusters on VMware (GKE On-Prem) version 1.3, you can create a group of nodes that all have the same configuration by defining a node pool in a user cluster's configuration file. You can then manage that node pool separately, without affecting any of the other nodes in the cluster. Learn more about node pools.
One or more node pools can be defined in the configuration file of any user cluster. Creating a node pool creates additional nodes in the user cluster. Node pool management, including creating, updating, and deleting the node pools in a user cluster, is done by modifying the nodePools section of the configuration file and deploying those changes to the existing cluster with the gkectl update cluster command. Note that deleting a node pool causes the affected nodes to be removed immediately, regardless of whether any of those nodes are running workloads.
An example node pool:
nodePools:
- name: pool-1
cpus: 4
memoryMB: 8192
replicas: 5
Tip for new installations: Create your first user cluster and define your node pools in that cluster. Then use that cluster's configuration file to create additional user clusters with the same node pool settings.
Before you begin
You can deploy changes to a node pool's replicas without interrupting the workloads on its nodes.
Important: If you deploy any other node pool configuration change, the nodes in that node pool are re-created. You must make sure that no such node pool is running a workload that must not be interrupted.
When you deploy node pool changes, unneeded nodes are deleted only after the required nodes have been created or updated. One implication of this policy is that an update may temporarily require additional resources, such as IP addresses, even if the total number of nodes is the same before and after the update.
Suppose a node pool will have N nodes at the end of an update. Then you must have at least N+1 IP addresses available for the nodes in that pool. This means that if you are resizing a cluster by adding nodes to one or more pools, you must have at least one more IP address than the total number of nodes in all of the cluster's node pools at the end of the resize.
For more information, see Verify that enough IP addresses are available.
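As a quick illustration of the N+1 rule above, the peak IP requirement during an update can be sketched as follows (the pool sizes are illustrative):

```python
def min_ips_required(pool_replicas):
    """Minimum IP addresses needed while a node pool update is in progress.

    New nodes are created before unneeded ones are deleted, so at peak
    you need at least one address beyond the final total node count.
    """
    return sum(pool_replicas) + 1

# Illustrative example: three pools sized 5, 3, and 3 after the update.
print(min_ips_required([5, 3, 3]))  # 12
```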
Creating and updating node pools
You manage node pools by modifying and deploying your user cluster's configuration file. You can create and deploy one or more node pools in a user cluster.
To create or update node pools:
In an editor, open the configuration file of the user cluster in which you want to create or update node pools.
Define one or more node pools in the nodePools section of the user cluster configuration file:
Configure the minimum required node pool attributes. You must specify the following attributes for each node pool:
nodePools.name: a unique name for the node pool. Updating this attribute re-creates the nodes. Example: - name: pool-1
nodePools.cpus: the number of CPUs allocated to each worker node in the pool. Updating this attribute re-creates the nodes. Example: cpus: 4
nodePools.memoryMB: the amount of memory, in megabytes, allocated to each worker node of the user cluster. Updating this attribute re-creates the nodes. Example: memoryMB: 8192
nodePools.replicas: the total number of worker nodes in the pool. The user cluster uses the nodes across all pools to run workloads. You can update this attribute without affecting any nodes or running workloads. Example: replicas: 5
Note that while some nodePools attributes are the same as workernode (DHCP | Static IP) in the old configuration file, the workernode section is still required in the old configuration file of every user cluster. You cannot remove the workernode section or replace it with nodePools. In the new user cluster configuration file, there is no workernode section. You must define at least one node pool for a user cluster, and make sure that there are enough untainted nodes to replace the default workernode pool of the old configuration file.
For example:
nodePools:
- name: pool-1
cpus: 4
memoryMB: 8192
replicas: 5
To see a user cluster configuration file with multiple node pools, see the Example section.
Configure optional node pool attributes. You can add labels and taints to your node pool configuration to steer node workloads. You can also define the vSphere datastore used by a node pool.
nodePools.labels: one or more key: value pairs that uniquely identify your node pools. The key and value must begin with a letter or a number, may contain letters, numbers, hyphens, dots, and underscores, and may be at most 63 characters long.
For detailed configuration information, see Labels.
Important: You cannot specify the following keys for labels, because they are reserved for use by Anthos clusters on VMware: kubernetes.io, k8s.io, and googleapis.com.
For example:
labels:
key1: value1
key2: value2
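The character rules above (begin with a letter or number; letters, numbers, hyphens, dots, and underscores; at most 63 characters; no reserved keys) can be checked with a minimal sketch:

```python
import re

# A label key or value must start with a letter or number, may contain
# letters, numbers, hyphens, dots, and underscores, and is limited to
# 63 characters, per the rules described above.
LABEL_PART = re.compile(r'^[A-Za-z0-9][A-Za-z0-9._-]{0,62}$')

# Keys reserved for use by Anthos clusters on VMware.
RESERVED_KEYS = ("kubernetes.io", "k8s.io", "googleapis.com")

def is_valid_label(key, value):
    if key in RESERVED_KEYS:
        return False
    return bool(LABEL_PART.match(key)) and bool(LABEL_PART.match(value))

print(is_valid_label("key1", "value1"))      # True
print(is_valid_label("kubernetes.io", "x"))  # False (reserved key)
```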
nodePools.taints: a key, value, and effect that define taints for the node pool. These taints correspond with the tolerations that you configure for your pods.
The key is required and the value is optional. Both must begin with a letter or a number, may contain letters, numbers, hyphens, dots, and underscores, and may be at most 253 characters long. Optionally, you can prefix the key with a DNS subdomain followed by a /. For example: example.com/my-app.
Valid effect values are NoSchedule, PreferNoSchedule, and NoExecute.
For detailed configuration information, see Taints.
For example:
taints:
- key: key1
value: value1
effect: NoSchedule
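These node taints only take effect together with matching Pod tolerations. A minimal sketch of a Pod that tolerates the example taint above (the Pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo   # illustrative name
spec:
  containers:
  - name: app
    image: nginx          # illustrative image
  tolerations:
  # Matches the key1=value1:NoSchedule taint defined on the node pool.
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"
```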
nodePools.bootDiskSizeGB: the size, in gigabytes, of the boot disk allocated to each worker node in the pool. This attribute is available starting with Anthos clusters on VMware version 1.5.0.
For example:
bootDiskSizeGB: 40
nodePools.vsphere.datastore: the vSphere datastore on which each node in the pool will be created. This overrides the user cluster's default vSphere datastore.
Example:
vsphere:
datastore: datastore_name
To see configuration examples with multiple node pools, see the Example section.
Deploy the changes to your user cluster with the gkectl update cluster command.
gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [USER_CLUSTER_CONFIG_FILE] [--dry-run] [--yes]
where:
- [ADMIN_CLUSTER_KUBECONFIG]: the kubeconfig file of your admin cluster.
- [USER_CLUSTER_CONFIG_FILE]: the configuration file of your user cluster.
- --dry-run: optional flag. Add this flag to view the change only; no changes are deployed to the user cluster.
- --yes: optional flag. Add this flag to run the command silently; the prompt confirming that you want to proceed is disabled.
If you aborted the command prematurely, you can run the same command again to complete the operation and deploy the changes to your user cluster.
If you need to revert the changes, you must revert them in the configuration file and then redeploy those changes to your user cluster.
Verify that the changes are successful by inspecting all nodes. Run the following command to list all of the nodes in the user cluster:
kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -o wide
where [USER_CLUSTER_KUBECONFIG] is the kubeconfig file of your user cluster.
Deleting a node pool
To delete a node pool from a user cluster:
Remove its definition from the nodePools section of the user cluster configuration file.
Make sure that there are no workloads running on the affected nodes.
Deploy the changes by running the gkectl update cluster command:
gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [USER_CLUSTER_CONFIG_FILE] [--dry-run] [--yes]
where:
- [ADMIN_CLUSTER_KUBECONFIG]: the kubeconfig file of your admin cluster.
- [USER_CLUSTER_CONFIG_FILE]: the configuration file of your user cluster.
- --dry-run: optional flag. Add this flag to view the change only; no changes are deployed to the user cluster.
- --yes: optional flag. Add this flag to run the command silently; the prompt confirming that you want to proceed is disabled.
Verify that the changes are successful by inspecting all nodes. Run the following command to list all of the nodes in the user cluster:
kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -o wide
where [USER_CLUSTER_KUBECONFIG] is the kubeconfig file of your user cluster.
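To make sure the affected nodes are free of workloads before you deploy the removal, you can cordon and drain them first. A sketch using standard kubectl commands (the node name is illustrative):

```shell
# Cordon the node so no new pods are scheduled on it, then evict its
# workloads; pool-1-node-0 is an illustrative node name. Repeat for
# each node in the pool being deleted.
kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] cordon pool-1-node-0
kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] drain pool-1-node-0 --ignore-daemonsets
```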
Example
In the following example configuration, there are five node pools, each with different attributes:
pool-1: only the minimum required attributes are specified
pool-2: includes a vSphere datastore
pool-3: includes bootDiskSizeGB
pool-4: includes taints and labels
pool-5: includes all attributes
apiVersion: v1
kind: UserCluster
...
# (Required) List of node pools. The total un-tainted replicas across all node pools
# must be greater than or equal to 3
nodePools:
- name: pool-1
cpus: 4
memoryMB: 8192
replicas: 5
- name: pool-2
cpus: 8
memoryMB: 16384
replicas: 3
vsphere:
datastore: my_datastore
- name: pool-3
cpus: 8
memoryMB: 8192
replicas: 3
bootDiskSizeGB: 40
- name: pool-4
cpus: 4
memoryMB: 8192
replicas: 5
taints:
- key: "example-key"
effect: NoSchedule
labels:
environment: production
app: nginx
- name: pool-5
cpus: 8
memoryMB: 16384
replicas: 3
taints:
- key: "my_key"
value: my_value1
effect: NoExecute
labels:
environment: test
vsphere:
datastore: my_datastore
bootDiskSizeGB: 60
...
The generated template:
apiVersion: v1
kind: UserCluster
# (Required) A unique name for this cluster
name: ""
# (Required) Anthos clusters on VMware (GKE on-prem) version (example: 1.3.0-gke.16)
gkeOnPremVersion: ""
# # (Optional) vCenter configuration (default: inherit from the admin cluster)
# vCenter:
# # Resource pool to use. Specify [VSPHERE_CLUSTER_NAME]/Resources to use the default
# # resource pool
# resourcePool: ""
# datastore: ""
# # Provide the path to vCenter CA certificate pub key for SSL verification
# caCertPath: ""
# # The credentials to connect to vCenter
# credentials:
# # reference to external credentials file
# fileRef:
# # read credentials from this file
# path: ""
# # entry in the credential file
# entry: ""
# (Required) Network configuration; vCenter section is optional and inherits from
# the admin cluster if not specified
network:
# # (Optional) This section overrides ipBlockFile values. Use with ipType "static" mode.
# # Used for seesaw nodes as well
# hostConfig:
# # List of DNS servers
# dnsServers:
# - ""
# # List of NTP servers
# ntpServers:
# - ""
# # # List of DNS search domains
# # searchDomainsForDNS:
# # - ""
ipMode:
# (Required) Define what IP mode to use ("dhcp" or "static")
type: dhcp
# # (Required when using "static" mode) The absolute or relative path to the yaml file
# # to use for static IP allocation. Hostconfig part will be overwritten by network.hostconfig
# # if specified
# ipBlockFilePath: ""
# (Required) The Kubernetes service CIDR range for the cluster. Must not overlap
# with the pod CIDR range
serviceCIDR: 10.96.0.0/12
# (Required) The Kubernetes pod CIDR range for the cluster. Must not overlap with
# the service CIDR range
podCIDR: 192.168.0.0/16
vCenter:
# vSphere network name
networkName: ""
# (Required) Load balancer configuration
loadBalancer:
# (Required) The VIPs to use for load balancing
vips:
# Used to connect to the Kubernetes API
controlPlaneVIP: ""
# Shared by all services for ingress traffic
ingressVIP: ""
# (Required) Which load balancer to use "F5BigIP" "Seesaw" or "ManualLB". Uncomment
# the corresponding field below to provide the detailed spec
kind: Seesaw
# # (Required when using "ManualLB" kind) Specify pre-defined nodeports
# manualLB:
# # NodePort for ingress service's http (only needed for user cluster)
# ingressHTTPNodePort: 30243
# # NodePort for ingress service's https (only needed for user cluster)
# ingressHTTPSNodePort: 30879
# # NodePort for control plane service
# controlPlaneNodePort: 30562
# # NodePort for addon service (only needed for admin cluster)
# addonsNodePort: 0
# # (Required when using "F5BigIP" kind) Specify the already-existing partition and
# # credentials
# f5BigIP:
# address: ""
# credentials:
# # reference to external credentials file
# fileRef:
# # read credentials from this file
# path: ""
# # entry in the credential file
# entry: ""
# partition: ""
# # # (Optional) Specify a pool name if using SNAT
# # snatPoolName: ""
# (Required when using "Seesaw" kind) Specify the Seesaw configs
seesaw:
# (Required) The absolute or relative path to the yaml file to use for IP allocation
# for LB VMs. Must contain one or two IPs. Hostconfig part will be overwritten
# by network.hostconfig if specified.
ipBlockFilePath: ""
# (Required) The Virtual Router IDentifier of VRRP for the Seesaw group. Must
# be between 1-255 and unique in a VLAN.
vrid: 0
# (Required) The IP announced by the master of Seesaw group
masterIP: ""
      # (Required) The number of CPUs per machine
cpus: 4
# (Required) Memory size in MB per machine
memoryMB: 3072
# (Optional) Network that the LB interface of Seesaw runs in (default: cluster
# network)
vCenter:
# vSphere network name
networkName: ""
# (Optional) Run two LB VMs to achieve high availability (default: false)
enableHA: false
# # (Optional/Preview) Enable dataplane v2
# enableDataplaneV2: false
# # (Optional) Storage specification for the cluster
# storage:
# # Whether to disable vSphere CSI components deployment. The feature is enabled by
# # default.
# vSphereCSIDisabled: false
# (Optional) User cluster master nodes must have either 1 or 3 replicas (default:
# 4 CPUs; 16384 MB memory; 1 replica)
masterNode:
cpus: 4
memoryMB: 8192
# How many machines of this type to deploy
replicas: 1
# (Required) List of node pools. The total un-tainted replicas across all node pools
# must be greater than or equal to 3
nodePools:
- name: pool-1
cpus: 4
memoryMB: 8192
# How many machines of this type to deploy
replicas: 3
# # (Optional) boot disk size; must be at least 40 (default: 40)
# bootDiskSizeGB: 40
# # Labels to apply to Kubernetes Node objects
# labels: {}
# # Taints to apply to Kubernetes Node objects
# taints:
# - key: ""
# value: ""
# effect: ""
# vsphere:
# # (Optional) vSphere datastore the node pool will be created on (default: vCenter.datastore)
# datastore: ""
# Spread nodes across at least three physical hosts (requires at least three hosts)
antiAffinityGroups:
# Set to false to disable DRS rule creation
enabled: true
# # (Optional): Configure additional authentication
# authentication:
# # (Optional) Configure OIDC authentication
# oidc:
# # URL for OIDC Provider.
# issuerURL: ""
# # (Optional) Default is http://kubectl.redirect.invalid
# kubectlRedirectURL: ""
# # ID for OIDC client application.
# clientID: ""
# # (Optional) Secret for OIDC client application.
# clientSecret: ""
# username: ""
# # (Optional) Prefix prepended to username claims.
# usernamePrefix: ""
# # (Optional) JWT claim to use as group name.
# group: ""
# # (Optional) Prefix prepended to group claims.
# groupPrefix: ""
# # (Optional) Additional scopes to send to OIDC provider as comma separated list.
# # Default is "openid".
# scopes: ""
# # (Optional) Additional key-value parameters to send to OIDC provider as comma
# # separated list.
# extraParams: ""
# # (Optional) Set value to string "true" or "false". Default is false.
# deployCloudConsoleProxy: ""
# # # (Optional) The absolute or relative path to the CA file
# # caPath: ""
# # (Optional) Provide an additional serving certificate for the API server
# sni:
# certPath: ""
# keyPath: ""
# # (Optional/Preview) Configure LDAP authentication
# ldap:
# # Name of LDAP provider.
# name: ""
# # Hostname or IP of the LDAP provider.
# host: ""
# # (Optional) Only support "insecure" for now
# connectionType: insecure
# # # (Optional) The absolute or relative path to the CA file
# # caPath: ""
# user:
# # Location in LDAP directory where user entries exist.
# baseDN: ""
# # (Optional) Name of the attribute that precedes the username in a DN. Default
# # is "CN".
# userAttribute: ""
# # (Optional) Name of the attribute that records a user's group membership. Default
# # is "memberOf".
# memberAttribute: ""
# (Optional) Specify which GCP project to connect your logs and metrics to
stackdriver:
projectID: ""
# A GCP region where you would like to store logs and metrics for this cluster.
clusterLocation: ""
enableVPC: false
# The absolute or relative path to the key file for a GCP service account used to
# send logs and metrics from the cluster
serviceAccountKeyPath: ""
# (Optional/Preview) Disable vsphere resource metrics collection from vcenter. True
# by default
disableVsphereResourceMetrics: true
# (Optional) Specify which GCP project to connect your GKE clusters to
gkeConnect:
projectID: ""
# The absolute or relative path to the key file for a GCP service account used to
# register the cluster
registerServiceAccountKeyPath: ""
# The absolute or relative path to the key file for a GCP service account used by
# the GKE connect agent
agentServiceAccountKeyPath: ""
# (Optional) Specify Cloud Run configuration
cloudRun:
enabled: false
# # (Optional/Alpha) Configure the GKE usage metering feature
# usageMetering:
# bigQueryProjectID: ""
# # The ID of the BigQuery Dataset in which the usage metering data will be stored
# bigQueryDatasetID: ""
# # The absolute or relative path to the key file for a GCP service account used by
# # gke-usage-metering to report to BigQuery
# bigQueryServiceAccountKeyPath: ""
# # Whether or not to enable consumption-based metering
# enableConsumptionMetering: false
# # (Optional/Alpha) Configure kubernetes apiserver audit logging
# cloudAuditLogging:
# projectID: ""
# # A GCP region where you would like to store audit logs for this cluster.
# clusterLocation: ""
# # The absolute or relative path to the key file for a GCP service account used to
# # send audit logs from the cluster
# serviceAccountKeyPath: ""
# # (Optional/Preview) Enable auto repair for the cluster
# autoRepair:
# # Whether to enable auto repair feature. The feature is disabled by default.
# enabled: false
Troubleshooting
In general, the gkectl update cluster command provides details when it fails. If the command succeeded but you don't see the nodes, you can troubleshoot with the Diagnosing cluster issues guide.
It is possible that the cluster has insufficient resources, such as a lack of available IP addresses during node pool creation or update. See the Resizing a user cluster topic for details about verifying that IP addresses are available.
You can also review the general Troubleshooting guide.
Stuck at Creating node MachineDeployment(s) in user cluster….
Creating or updating node pools in a user cluster can take a while. However, if the wait is extremely long and you suspect that something might have failed, you can run the following commands:
- Run kubectl get nodes to obtain the state of your nodes.
- For any nodes that are not ready, run kubectl describe node [node_name] to obtain details.