Resizing a user cluster

This page describes how to resize a GKE on-prem user cluster. Resizing a user cluster means adding or removing nodes from that cluster. Removing nodes from a cluster releases that cluster's IP addresses, making them available for use by other nodes. Adding nodes requires that IP addresses be available for those nodes.

You resize a user cluster by changing the replicas fields in the nodePools section of your configuration file and deploying those changes to your existing cluster with the gkectl update cluster command.

For information on maximum and minimum limits for user clusters, refer to Quotas and limits.

For information on managing node pools with gkectl update cluster, refer to Creating and managing node pools.

Verify that enough IP addresses are available

If you are adding nodes to a cluster, make sure that the cluster has enough IP addresses available for them. How you verify this depends on whether the cluster uses a DHCP server or static IPs.

DHCP

If the cluster uses DHCP, check that the DHCP server in the network in which the nodes are created has enough IP addresses. There should be more IP addresses than there are nodes running in the user cluster.

Static IPs

If the cluster uses static IPs, gkectl update cluster first verifies that you have allocated enough IP addresses for the cluster. If you have not, the error message tells you how many additional IP addresses the update operation needs.

If you need to add more IP addresses to the user cluster, perform the following steps:

  1. Open the user cluster's hostconfig file for editing.

  2. To view the addresses reserved for a user cluster:

    kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
    -n [USER_CLUSTER_NAME] [USER_CLUSTER_NAME] -o yaml

where:

  • [ADMIN_CLUSTER_KUBECONFIG] tells kubectl to use the admin cluster's kubeconfig, which is used to view and/or change user cluster configurations.
  • -n [USER_CLUSTER_NAME] tells kubectl to look in a namespace named after the user cluster.
  • [USER_CLUSTER_NAME] tells kubectl which user cluster you're running the command against, and -o yaml displays that cluster's configuration.
  3. If any of the addresses reserved for the user cluster are not included in the hostconfig file, add them to the corresponding block based on netmask and gateway.

  4. Add as many additional static IP addresses to the corresponding block as required, and then run gkectl update cluster.

Below is an example hostconfig file with one static IP block that contains four static IP addresses:

hostconfig:
  dns: 172.16.255.1
  tod: 216.239.35.0
blocks:
- netmask: 255.255.248.0
  gateway: 21.0.135.254
  ips:
    - ip: 21.0.133.41
      hostname: user-node-1
    - ip: 21.0.133.50
      hostname: user-node-2
    - ip: 21.0.133.56
      hostname: user-node-3
    - ip: 21.0.133.47
      hostname: user-node-4
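
For example, to make room for two more nodes you could append two entries to the block's ips list and then run gkectl update cluster. The addresses and hostnames below are placeholders; use unallocated addresses from your own network:

  ips:
    ...
    - ip: 21.0.133.60
      hostname: user-node-5
    - ip: 21.0.133.61
      hostname: user-node-6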

Resizing a user cluster

Starting with version 1.5.0, you resize a cluster by changing the replicas field of the relevant node pool in the nodePools section of your user cluster configuration file and then deploying that change to your existing cluster with the gkectl update cluster command.
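
For example, to grow a node pool from 3 to 5 nodes, you would set its replicas value to 5. The pool name and machine sizes below are illustrative, and [USER_CLUSTER_CONFIG] stands for the path to your user cluster configuration file:

nodePools:
- name: pool-1
  cpus: 4
  memoryMB: 8192
  replicas: 5

Then deploy the change:

gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [USER_CLUSTER_CONFIG]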

Verify resize

To verify that the resize was successful, run:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes
kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] describe machinedeployments [NODE_POOL_NAME] | grep Replicas

where [USER_CLUSTER_KUBECONFIG] is the path to your user cluster's kubeconfig file. The number of nodes you chose should be reflected in the output of these commands.
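
For example, for a node pool scaled to 5 replicas, kubectl get nodes should show the expected total number of worker nodes, and the grep should return Replicas lines whose values are 5. The exact field names depend on your Cluster API version, but the output looks roughly like this:

Replicas:               5
Available Replicas:     5
Ready Replicas:         5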

Troubleshooting

For more information, refer to Troubleshooting.

Resizing a user cluster fails

Symptoms

A resize operation on a user cluster fails.

Potential causes

Several factors could cause resize operations to fail.

Resolution

If a resize fails, follow these steps:

  1. Check the cluster's MachineDeployment status to see if there are any events or error messages:

    kubectl describe machinedeployments [MACHINE_DEPLOYMENT_NAME]
  2. Check if there are errors on the newly created Machines:

    kubectl describe machine [MACHINE_NAME]

Error: "no addresses can be allocated"

Symptoms

After resizing a user cluster, kubectl describe machine [MACHINE_NAME] displays the following error:

Events:
   Type     Reason  Age                From                    Message
   ----     ------  ----               ----                    -------
   Warning  Failed  9s (x13 over 56s)  machineipam-controller  ipam: no addresses can be allocated
   
Potential causes

There aren't enough IP addresses available for the user cluster.

Resolution

Allocate more IP addresses for the cluster. Then, delete the affected Machine:

kubectl delete machine [MACHINE_NAME]

If the cluster is configured correctly, a replacement Machine is created with an IP address.
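
To confirm that a replacement Machine was created and received an address, you can list the Machines again. This is a sketch that assumes the same kubectl context and namespace as the delete command above; adjust the kubeconfig and namespace flags for your environment if needed:

kubectl get machines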

Sufficient number of IP addresses allocated, but Machine fails to register with cluster

Symptoms

The network has enough IP addresses allocated, but the Machine still fails to register with the user cluster.

Potential causes

There might be an IP conflict. The IP might be taken by another Machine or by your load balancer.

Resolution

Check whether the affected Machine's IP address is already in use. If there is a conflict, resolve the conflict in your environment.
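
One way to check, sketched below with a placeholder address, is to probe the address from a host on the same network and see whether anything already answers at it:

# Does anything respond at the address assigned to the Machine?
ping -c 3 21.0.133.56

# Check the ARP cache to see which MAC address, if any, currently owns it.
arp -n 21.0.133.56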

Diagnosing cluster issues using gkectl

Use gkectl diagnose commands to identify cluster issues and share cluster information with Google. See Diagnosing cluster issues.

Default logging behavior

For gkectl and gkeadm, it is sufficient to use the default logging settings:

  • By default, log entries are saved as follows:

    • For gkectl, the default log file is /home/ubuntu/.config/gke-on-prem/logs/gkectl-$(date).log, and the file is symlinked with the logs/gkectl-$(date).log file in the local directory where you run gkectl.
    • For gkeadm, the default log file is logs/gkeadm-$(date).log in the local directory where you run gkeadm.
  • All log entries are saved in the log file, even if they are not printed in the terminal (when --alsologtostderr is false).
  • The -v5 verbosity level (default) covers all the log entries needed by the support team.
  • The log file also contains the command executed and the failure message.

We recommend that you send the log file to the support team when you need help.

Specifying a non-default location for the log file

To specify a non-default location for the gkectl log file, use the --log_file flag. The log file that you specify will not be symlinked with the local directory.

To specify a non-default location for the gkeadm log file, use the --log_file flag.
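
For example, the following runs a cluster update and writes the gkectl log to a custom path (the path is only a placeholder):

gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [USER_CLUSTER_CONFIG] --log_file /path/to/gkectl-update.log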

Locating Cluster API logs in the admin cluster

If a VM fails to start after the admin control plane has started, you can try debugging this by inspecting the Cluster API controllers' logs in the admin cluster:

  1. Find the name of the Cluster API controllers Pod in the kube-system namespace, where [ADMIN_CLUSTER_KUBECONFIG] is the path to the admin cluster's kubeconfig file:

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system get pods | grep clusterapi-controllers
  2. Open the Pod's logs, where [POD_NAME] is the name of the Pod. Optionally, use grep or a similar tool to search for errors, as in the example after these steps:

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs [POD_NAME] vsphere-controller-manager
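
For example, to narrow the output to error-level entries:

kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs [POD_NAME] vsphere-controller-manager | grep -i error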

Update command fails

When you run gkectl update to update a cluster, you might see the following error message:

Failed to update the cluster: failed to begin updating user cluster "CLUSTER_NAME": timed out waiting for the condition

To examine this error message further, run this command:

kubectl get --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] onpremusercluster

If you get the following output, and the update has taken effect, you can ignore the error message:

cluster [CLUSTER_NAME] is READY=true