This page describes how to resize a GKE on-prem user cluster. Resizing a user cluster means adding nodes to or removing nodes from that cluster. Removing nodes from a cluster should release that cluster's IP addresses, making them available for use by other nodes. Adding nodes requires that IP addresses are available for those nodes.
You resize a user cluster by changing the replicas fields in the nodePools section of your configuration file and deploying those changes to your existing cluster with the gkectl update cluster command.
For information on maximum and minimum limits for user clusters, refer to Quotas and limits.
For information on managing node pools with gkectl update cluster, refer to creating and managing node pools.
Verify that enough IP addresses are available
If you are adding additional nodes to a cluster, be sure that the cluster has enough IP addresses. Verifying that you have enough IP addresses depends on whether the cluster uses a DHCP server or static IPs.
DHCP
If the cluster uses DHCP, check that the DHCP server in the network in which the nodes are created has enough IP addresses. There should be more IP addresses than there are nodes running in the user cluster.
Static IPs
If the cluster uses static IPs, gkectl update cluster first verifies whether you've allocated enough IP addresses in the cluster. If you haven't, the error message tells you how many extra IP addresses the update operation needs.
If you need to add more IP addresses to the user cluster, perform the following steps:
Open the user cluster's hostconfig file for editing.
To view the addresses reserved for the user cluster, run:
kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n [USER_CLUSTER_NAME] [USER_CLUSTER_NAME] -o yaml
where:
- --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] tells kubectl to use the admin cluster's kubeconfig, which is used to view and change user cluster configurations.
- -n [USER_CLUSTER_NAME] tells kubectl to look in a namespace named after the user cluster.
- [USER_CLUSTER_NAME] tells kubectl which user cluster you're running the command against.
- -o yaml displays the user cluster's configuration.
If any of the addresses reserved for the user cluster are included in the hostconfig file, add them to the corresponding block based on netmask and gateway. Add as many additional static IP addresses to the corresponding block as required, and then run gkectl update cluster.
Below is an example hostconfig file with four static IP addresses in a single block:
hostconfig:
  dns: 172.16.255.1
  tod: 216.239.35.0
blocks:
  - netmask: 255.255.248.0
    gateway: 21.0.135.254
    ips:
      - ip: 21.0.133.41
        hostname: user-node-1
      - ip: 21.0.133.50
        hostname: user-node-2
      - ip: 21.0.133.56
        hostname: user-node-3
      - ip: 21.0.133.47
        hostname: user-node-4
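Before running gkectl update cluster, you can roughly check whether a hostconfig file has enough addresses for the node count you plan to run. The helpers below are a hypothetical sketch and are not part of gkectl. They assume each "- ip: <address>" entry sits on its own line, as in a normal multi-line YAML hostconfig file. gkectl update cluster performs its own check, and that check remains the authoritative one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of gkectl). They assume a multi-line YAML
# hostconfig file with one "- ip: <address>" entry per line.

# count_hostconfig_ips FILE
# Prints how many "ip:" entries FILE contains (prints 0 if there are none).
count_hostconfig_ips() {
  grep -cE '^[[:space:]]*-?[[:space:]]*ip:[[:space:]]' "$1"
}

# check_ip_capacity FILE NODES
# Compares the number of addresses in FILE with the number of nodes you plan
# to run (for example, the sum of replicas across your node pools).
# Returns 1 and reports the shortfall if FILE has too few addresses.
check_ip_capacity() {
  local file="$1" wanted="$2" have
  have=$(count_hostconfig_ips "$file")
  if (( have < wanted )); then
    echo "need $((wanted - have)) more IP address(es): have $have, want $wanted"
    return 1
  fi
  echo "ok: $have IP address(es) for $wanted node(s)"
}
```

For example, check_ip_capacity user-hostconfig.yaml 5 (the file name is hypothetical) reports whether that file lists at least five addresses.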
Resizing a user cluster
Starting from 1.5.0, you resize a cluster by changing the replicas fields in the nodePools section of your configuration file and deploying those changes to your existing cluster with the gkectl update cluster command.
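If you script the edit to the replicas field, a minimal sketch could look like the following. The set_pool_replicas helper is hypothetical. It assumes each node pool entry has a name: line that appears before its replicas: line, and it prints the edited configuration instead of changing the file in place.

```shell
#!/usr/bin/env bash
# set_pool_replicas FILE POOL COUNT
# Hypothetical helper: prints FILE with the "replicas:" value of node pool POOL
# changed to COUNT. Assumes each pool entry has a "name:" line that comes
# before its "replicas:" line, as in:
#   nodePools:
#   - name: "pool-1"
#     replicas: 3
# Writes to stdout; the original file is left unchanged.
set_pool_replicas() {
  local file="$1" pool="$2" count="$3"
  awk -v pool="$pool" -v count="$count" '
    # Track whether we are inside the requested pool entry.
    /^[ \t]*-?[ \t]*name:/ {
      name = $0
      sub(/^[^:]*:[ \t]*/, "", name)   # drop everything up to "name:"
      gsub(/"/, "", name)              # drop surrounding quotes
      in_pool = (name == pool)
    }
    # Rewrite the first replicas line of that pool, keeping its indentation.
    in_pool && /^[ \t]*replicas:/ {
      sub(/replicas:.*/, "replicas: " count)
      in_pool = 0
    }
    { print }
  ' "$file"
}
```

A possible usage, with hypothetical file and pool names:
set_pool_replicas user-cluster.yaml pool-1 5 > user-cluster.new.yaml && mv user-cluster.new.yaml user-cluster.yaml
gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config user-cluster.yaml
The --kubeconfig and --config flags here are an assumption about your gkectl version. Confirm them with gkectl update cluster --help.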
Verify resize
To verify that the resize was successful, run:
kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes
kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] describe machinedeployments [NODE_POOL_NAME] | grep Replicas
where [USER_CLUSTER_KUBECONFIG] is the kubeconfig file of your user cluster and [NODE_POOL_NAME] is the name of the node pool you resized.
The output of these commands should reflect the number of nodes you chose.
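To compare the node count with your new replicas total at a glance, you can count Ready nodes in the kubectl get nodes output. This is a minimal sketch that assumes the default table output, where a header row is followed by one row per node and STATUS is the second column.

```shell
#!/usr/bin/env bash
# count_ready_nodes
# Reads "kubectl get nodes" output on stdin and prints how many nodes report
# a STATUS of exactly "Ready". Assumes the default table output: a header row,
# then one row per node with STATUS in the second column.
# Nodes shown as "NotReady" or "Ready,SchedulingDisabled" are not counted.
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}
```

Usage: kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes | count_ready_nodes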
Troubleshooting
For more information, refer to Troubleshooting.
Resizing a user cluster fails
- Symptoms
A resize operation on a user cluster fails.
- Potential causes
Several factors could cause resize operations to fail.
- Resolution
If a resize fails, follow these steps:
Check the cluster's MachineDeployment status to see if there are any events or error messages:
kubectl describe machinedeployments [MACHINE_DEPLOYMENT_NAME]
Check whether the newly created Machines report errors:
kubectl describe machine [MACHINE_NAME]
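The describe output can be long. To see only the warnings, you can filter the Events section. This is a minimal sketch that assumes the usual kubectl describe layout, where an "Events:" line is followed by a table whose first column is the event Type.

```shell
#!/usr/bin/env bash
# machine_warnings
# Reads "kubectl describe machine ..." (or "describe machinedeployments ...")
# output on stdin and prints only the Warning rows of its Events section.
machine_warnings() {
  awk '
    /^Events:/ { in_events = 1; next }   # start of the Events table
    in_events && $1 == "Warning"         # print Warning rows only
  '
}
```

Usage: kubectl describe machine [MACHINE_NAME] | machine_warnings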
Error: "no addresses can be allocated"
- Symptoms
After resizing a user cluster, kubectl describe machine [MACHINE_NAME] displays the following error:
Events:
  Type     Reason  Age                From                    Message
  ----     ------  ----               ----                    -------
  Warning  Failed  9s (x13 over 56s)  machineipam-controller  ipam: no addresses can be allocated
- Potential causes
There aren't enough IP addresses available for the user cluster.
- Resolution
Allocate more IP addresses for the cluster. Then, delete the affected Machine:
kubectl delete machine [MACHINE_NAME]
If the cluster is configured correctly, a replacement Machine is created with an IP address.
Sufficient number of IP addresses allocated, but Machine fails to register with cluster
- Symptoms
The network has enough addresses allocated, but the Machine still fails to register with the user cluster.
- Possible causes
There might be an IP conflict. The IP might be taken by another Machine or by your load balancer.
- Resolution
Check that the affected Machine's IP address is not already in use. If there is a conflict, you need to resolve it in your environment.
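One quick check is to look for duplicate addresses among the hostconfig entries and your load balancer addresses. The find_ip_conflicts helper below is a hypothetical sketch. It only sees the addresses you give it, so it cannot detect a conflict with some other host on the network.

```shell
#!/usr/bin/env bash
# find_ip_conflicts HOSTCONFIG [IP ...]
# Hypothetical helper: prints every address that appears more than once across
# the "ip:" entries of HOSTCONFIG plus any extra addresses you pass (for
# example, your load balancer's VIPs). It only sees the addresses you give it;
# it cannot detect a conflict with some other host on the network.
find_ip_conflicts() {
  local file="$1"; shift
  {
    # Extract the address from lines such as "    - ip: 21.0.133.41".
    sed -nE 's/^[[:space:]]*-?[[:space:]]*ip:[[:space:]]*([0-9.]+).*/\1/p' "$file"
    # Add any extra addresses given on the command line.
    if (( $# > 0 )); then printf '%s\n' "$@"; fi
  } | sort | uniq -d
}
```

Empty output means no duplicates were found among the addresses checked.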
Diagnosing cluster issues using gkectl
Use gkectl diagnose commands to identify cluster issues and share cluster information with Google. See Diagnosing cluster issues.
Default logging behavior
For gkectl and gkeadm, it is sufficient to use the default logging settings:
- By default, log entries are saved as follows:
  - For gkectl, the default log file is /home/ubuntu/.config/gke-on-prem/logs/gkectl-$(date).log, and the file is symlinked with the logs/gkectl-$(date).log file in the local directory where you run gkectl.
  - For gkeadm, the default log file is logs/gkeadm-$(date).log in the local directory where you run gkeadm.
- All log entries are saved in the log file, even if they are not printed in the terminal (when --alsologtostderr is false).
- The -v5 verbosity level (default) covers all the log entries needed by the support team.
- The log file also contains the command executed and the failure message.
We recommend that you send the log file to the support team when you need help.
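To find the file to send, you can pick the most recently modified log in the local logs directory. This is a minimal sketch that assumes the default logs/<tool>-*.log naming described above. It does not find logs written to a location you chose with --log_file.

```shell
#!/usr/bin/env bash
# latest_log [DIR] [TOOL]
# Prints the most recently modified TOOL-*.log file in DIR (default: the
# logs/ directory under the current directory, where gkectl and gkeadm place
# their logs by default). Prints nothing if there is no match.
latest_log() {
  local dir="${1:-logs}" tool="${2:-gkectl}"
  ls -t "$dir"/"$tool"-*.log 2>/dev/null | head -n 1
}
```

For example, latest_log logs gkeadm prints the newest gkeadm log in ./logs.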
Specifying a non-default location for the log file
To specify a non-default location for the gkectl log file, use the --log_file flag. The log file that you specify is not symlinked with the local directory.
To specify a non-default location for the gkeadm log file, use the --log_file flag.
Locating Cluster API logs in the admin cluster
If a VM fails to start after the admin control plane has started, you can try debugging this by inspecting the Cluster API controllers' logs in the admin cluster:
Find the name of the Cluster API controllers Pod in the kube-system namespace, where [ADMIN_CLUSTER_KUBECONFIG] is the path to the admin cluster's kubeconfig file:
kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system get pods | grep clusterapi-controllers
Open the Pod's logs, where [POD_NAME] is the name of the Pod. Optionally, use grep or a similar tool to search for errors:
kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs [POD_NAME] vsphere-controller-manager
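The two steps above can be chained so that you don't have to copy the Pod name by hand. This is a minimal sketch that assumes the default kubectl get pods table output, where the Pod name is in the first column.

```shell
#!/usr/bin/env bash
# clusterapi_pod_name
# Reads "kubectl ... -n kube-system get pods" output on stdin and prints the
# name of the first Pod whose name contains "clusterapi-controllers".
clusterapi_pod_name() {
  awk '$1 ~ /clusterapi-controllers/ { print $1; exit }'
}
```

A possible usage:
POD_NAME=$(kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system get pods | clusterapi_pod_name)
kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs "$POD_NAME" vsphere-controller-manager | grep -i error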
Update command fails
When you run gkectl update to update a cluster, you might see the following error message:
Failed to update the cluster: failed to begin updating user cluster "CLUSTER_NAME": timed out waiting for the condition
To check the state of the user cluster, run:
kubectl get --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] onpremusercluster
In the output, check whether cluster [CLUSTER_NAME] is READY=true.