This page describes how to resize a GKE On-Prem user cluster. Resizing a user cluster means adding nodes to or removing nodes from that cluster. Removing nodes from a cluster should release the IP addresses those nodes used, making them available for use by other nodes. Adding nodes requires that IP addresses are available for those nodes.
You resize a user cluster by changing the replicas field of the cluster's MachineDeployment configuration. You can patch the configuration from the command line using kubectl patch.
For information on maximum and minimum limits for user clusters, refer to Quotas and limits.
Verify that enough IP addresses are available
If you are adding nodes to a cluster, be sure that the cluster has enough IP addresses. How you verify this depends on whether the cluster uses DHCP or static IPs.
DHCP
If the cluster uses DHCP, check that the DHCP server in the network in which the nodes are created has enough IP addresses. There should be more IP addresses than there are nodes running in the user cluster.
Static IPs
If the cluster uses static IPs, check that you've allocated enough IP addresses in the cluster:
kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
  -n [USER_CLUSTER_NAME] [USER_CLUSTER_NAME] -o yaml
where:
- --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] tells kubectl to use the admin cluster's kubeconfig, which is used to view and/or change user cluster configurations.
- -n [USER_CLUSTER_NAME] tells kubectl to look in a namespace named after the user cluster.
- [USER_CLUSTER_NAME] tells kubectl which user cluster you're running the command against.
- -o yaml displays the user cluster's configuration.
In the command's output, look for the reservedAddresses field. There should be more IP addresses in the field than there are nodes running in the user cluster.
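The count of reserved addresses can be read off the YAML by hand, but a small filter makes the comparison less error-prone. The following is a minimal sketch, not part of GKE On-Prem: count_reserved_ips is a hypothetical helper name, and it assumes that the only ip: keys in the output are the ones under reservedAddresses, one per static IP block.

```shell
# Count the static IP blocks in a cluster configuration read from stdin.
# Assumption: every "ip:" key in the input belongs to a reservedAddresses
# block. count_reserved_ips is a hypothetical helper, not a kubectl feature.
count_reserved_ips() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the
  # function usable in scripts that run with "set -e".
  grep -cE '^[[:space:]]*(-[[:space:]]+)?ip:[[:space:]]' || true
}

# Usage (not run here):
# kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
#   -n [USER_CLUSTER_NAME] [USER_CLUSTER_NAME] -o yaml | count_reserved_ips
```

Compare the printed number with the number of nodes you intend to run after the resize.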
If you need to add more addresses to the reservedAddresses field, perform the following steps:
Open the user cluster's configuration file for editing:
kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] edit cluster [USER_CLUSTER_NAME] \
  -n [USER_CLUSTER_NAME] --validate=false
The cluster configuration is opened in your shell's default editor.
Add as many additional static IP blocks as required. An IP block is composed of gateway, hostname, ip, and netmask fields.
Below is an example reservedAddresses field with four static IP blocks:
...
networkSpec:
  dns:
  - 172.x.x.x
  ntp: 129.x.x.x
  reservedAddresses:
  - gateway: 100.x.x.x
    hostname: host-1
    ip: 100.x.x.x
    netmask: x
  - gateway: 100.x.x.x
    hostname: host-2
    ip: 100.x.x.x
    netmask: x
  - gateway: 100.x.x.x
    hostname: host-3
    ip: 100.x.x.x
    netmask: x
  - gateway: 100.x.x.x
    hostname: host-4
    ip: 100.x.x.x
    netmask: x
...
Before you begin
Export a KUBECONFIG environment variable pointing to the kubeconfig of the user cluster that you want to resize:
export KUBECONFIG=[USER_CLUSTER_KUBECONFIG]
Resizing a user cluster
You resize a cluster by editing the user cluster's MachineDeployment resource. To find the name of the user cluster's MachineDeployment resource, run the following command:
kubectl get machinedeployments
The user cluster's MachineDeployment includes the user cluster's name.
To resize the user cluster, you need to patch the cluster's MachineDeployment configuration. You change the value of the configuration's replicas field, which indicates how many nodes the cluster should run:
kubectl patch machinedeployment [MACHINE_DEPLOYMENT_NAME] -p "{\"spec\": {\"replicas\": [INT] }}" --type=merge
where [INT] is the number of nodes you want the user cluster to run.
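Escaping the JSON by hand is easy to get wrong, so it can help to generate the patch body. The sketch below is an illustration only: build_replicas_patch is a hypothetical helper name, and it assumes a POSIX-compatible shell. It rejects anything that is not a plain non-negative integer before printing the merge patch.

```shell
# Print a merge-patch body that sets spec.replicas to the given node count.
# build_replicas_patch is a hypothetical helper, not part of kubectl.
build_replicas_patch() {
  case "$1" in
    # Reject empty input, non-digits, and leading zeros (invalid in JSON).
    ''|*[!0-9]*|0?*)
      echo "replica count must be a non-negative integer: '$1'" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"replicas":%s}}\n' "$1"
}

# Usage (not run here):
# kubectl patch machinedeployment [MACHINE_DEPLOYMENT_NAME] \
#   -p "$(build_replicas_patch 5)" --type=merge
```

Validating the count first means a typo fails locally instead of sending a malformed patch to the cluster.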
Verify resize
To verify that the resize was successful, run:
kubectl get nodes
kubectl describe machinedeployments [MACHINE_DEPLOYMENT_NAME] | grep Replicas
The number of nodes you chose should be reflected in these commands' output.
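For a scripted check, the node list can be reduced to a single number. This is a rough sketch under stated assumptions: count_ready_nodes is a hypothetical helper name, the input is the output of kubectl get nodes --no-headers, and the node status is in the second column. Nodes reported as Ready,SchedulingDisabled are counted as well, because they are still members of the cluster.

```shell
# Count nodes whose STATUS column reports Ready, reading the output of
# `kubectl get nodes --no-headers` from stdin.
# count_ready_nodes is a hypothetical helper name.
count_ready_nodes() {
  # "n + 0" prints 0 rather than an empty string when no node matches.
  awk '$2 == "Ready" || $2 ~ /^Ready,/ { n++ } END { print n + 0 }'
}

# Usage (not run here):
# kubectl get nodes --no-headers | count_ready_nodes
```

New nodes can take some time to become Ready, so a lower number immediately after a resize does not necessarily mean that the resize failed.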
Troubleshooting
For more information, refer to Troubleshooting.
Resizing a user cluster fails
- Symptoms
A resize operation on a user cluster fails.
- Potential causes
Several factors could cause resize operations to fail.
- Resolution
If a resize fails, follow these steps:
Check the cluster's MachineDeployment status to see if there are any events or error messages:
kubectl describe machinedeployments [MACHINE_DEPLOYMENT_NAME]
Check if there are errors on the newly-created Machines:
kubectl describe machine [MACHINE_NAME]
Error: "no addresses can be allocated"
- Symptoms
After resizing a user cluster, kubectl describe machine [MACHINE_NAME] displays the following error:
Events:
  Type     Reason  Age                From                    Message
  ----     ------  ----               ----                    -------
  Warning  Failed  9s (x13 over 56s)  machineipam-controller  ipam: no addresses can be allocated
- Potential causes
There aren't enough IP addresses available for the user cluster.
- Resolution
Allocate more IP addresses for the cluster. Then, delete the affected Machine:
kubectl delete machine [MACHINE_NAME]
If the cluster is configured correctly, a replacement Machine is created with an IP address.
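When several Machines were created by the resize, it may be quicker to scan all of them for this event than to describe each one by hand. The following is a sketch only: has_ipam_error is a hypothetical helper name, and the loop in the comment assumes that kubectl get machines -o name lists the Machines in the current namespace.

```shell
# Succeed if the `kubectl describe machine` output on stdin contains the
# IPAM allocation failure shown in the Symptoms above.
# has_ipam_error is a hypothetical helper name.
has_ipam_error() {
  grep -q 'ipam: no addresses can be allocated'
}

# Usage (not run here): print every Machine that reports the error.
# for m in $(kubectl get machines -o name); do
#   kubectl describe "$m" | has_ipam_error && echo "$m"
# done
```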
Sufficient number of IP addresses allocated, but Machine fails to register with cluster
- Symptoms
The network has enough addresses allocated, but the Machine still fails to register with the user cluster.
- Potential causes
There might be an IP conflict. The IP might be taken by another Machine or by your load balancer.
- Resolution
Check that the affected Machine's IP address is not taken. If there is a conflict, you need to resolve the conflict in your environment.
New nodes created but not healthy
- Symptoms
New nodes don't register themselves to the user cluster control plane when using manual load balancing mode.
- Potential causes
In-node Ingress validation might be enabled, which blocks the nodes' boot process.
- Resolution
To disable the validation, run:
kubectl patch machinedeployment [MACHINE_DEPLOYMENT_NAME] -p '{"spec":{"template":{"spec":{"providerSpec":{"value":{"machineVariables":{"net_validation_ports": null}}}}}}}' --type=merge
Diagnosing cluster issues using gkectl
Use gkectl diagnose commands to identify cluster issues and share cluster information with Google. See Diagnosing cluster issues.
Default logging behavior
For gkectl and gkeadm it is sufficient to use the default logging settings:
- By default, log entries are saved as follows:
  - For gkectl, the default log file is /home/ubuntu/.config/gke-on-prem/logs/gkectl-$(date).log, and the file is symlinked with the logs/gkectl-$(date).log file in the local directory where you run gkectl.
  - For gkeadm, the default log file is logs/gkeadm-$(date).log in the local directory where you run gkeadm.
- All log entries are saved in the log file, even if they are not printed in the terminal (when --alsologtostderr is false).
- The -v5 verbosity level (default) covers all the log entries needed by the support team.
- The log file also contains the command executed and the failure message.
We recommend that you send the log file to the support team when you need help.
Specifying a non-default location for the log file
To specify a non-default location for the gkectl log file, use the --log_file flag. The log file that you specify will not be symlinked with the local directory.
To specify a non-default location for the gkeadm log file, use the --log_file flag.
Locating Cluster API logs in the admin cluster
If a VM fails to start after the admin control plane has started, you can try debugging this by inspecting the Cluster API controllers' logs in the admin cluster:
Find the name of the Cluster API controllers Pod in the kube-system namespace, where [ADMIN_CLUSTER_KUBECONFIG] is the path to the admin cluster's kubeconfig file:
kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system get pods | grep clusterapi-controllers
Open the Pod's logs, where [POD_NAME] is the name of the Pod. Optionally, use grep or a similar tool to search for errors:
kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs [POD_NAME] vsphere-controller-manager
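The two steps above can be chained so that the Pod name does not have to be copied by hand. This is a minimal sketch: clusterapi_pod_name is a hypothetical helper name, and it assumes that the Pod name is the first column of the kubectl get pods output and starts with clusterapi-controllers.

```shell
# Print the name of the first Pod whose name starts with
# "clusterapi-controllers", reading `kubectl get pods` output from stdin.
# clusterapi_pod_name is a hypothetical helper name.
clusterapi_pod_name() {
  awk '$1 ~ /^clusterapi-controllers/ { print $1; exit }'
}

# Usage (not run here):
# POD_NAME="$(kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
#   -n kube-system get pods | clusterapi_pod_name)"
# kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system \
#   logs "$POD_NAME" vsphere-controller-manager | grep -i error
```

If the helper prints nothing, no matching Pod was found, and the logs command should not be run with an empty name.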