Resizing a user cluster

This page describes how to resize a GKE On-Prem user cluster. Resizing a user cluster means adding nodes to or removing nodes from that cluster. Removing nodes from a cluster should release those nodes' IP addresses, making them available for use by other nodes. Adding nodes requires that IP addresses are available for those nodes.

You resize a user cluster by changing the replicas field of the cluster's MachineDeployment configuration. You can patch the configuration from the command line using kubectl patch.

For information on maximum and minimum limits for user clusters, refer to Quotas and limits.

Verify that enough IP addresses are available

If you are adding nodes to a cluster, make sure that the cluster has enough IP addresses. How you verify this depends on whether the cluster uses DHCP or static IPs.

DHCP

If the cluster uses DHCP, check that the DHCP server in the network in which the nodes are created has enough IP addresses. There should be more IP addresses than there are nodes running in the user cluster.

Static IPs

If the cluster uses static IPs, check that you've allocated enough IP addresses in the cluster:

kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
-n [USER_CLUSTER_NAME] [USER_CLUSTER_NAME] -o yaml

where:

  • --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] tells kubectl to use the admin cluster's kubeconfig, which is used to view or change user cluster configurations.
  • -n [USER_CLUSTER_NAME] tells kubectl to look in a namespace named after the user cluster.
  • [USER_CLUSTER_NAME] tells kubectl which user cluster you're running the command against.
  • -o yaml displays the user cluster's configuration in YAML format.

In the command's output, look for the reservedAddresses field. There should be more IP addresses in the field than there are nodes running in the user cluster.
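
The count described above can be sketched as a small shell helper. This is only an illustration: count_reserved_addresses is a hypothetical function name, not part of kubectl or GKE On-Prem, and it assumes that each reserved address appears as a list item with an ip key under reservedAddresses, as in the example later in this section.

```shell
# Hypothetical helper: reads cluster YAML on stdin and prints the number of
# "ip:" entries found under the reservedAddresses key.
count_reserved_addresses() {
  awk '
    /^ *reservedAddresses:/ {
      in_block = 1
      match($0, /^ */); indent = RLENGTH   # remember the indentation of the key
      next
    }
    in_block {
      if ($0 ~ /^ *$/) next                # skip blank lines
      match($0, /^ */)
      # A non-list line at the same or lower indentation ends the section
      if (RLENGTH <= indent && $0 !~ /^ *-/) { in_block = 0; next }
      if ($0 ~ /^ *(- +)?ip:/) count++
    }
    END { print count + 0 }
  '
}

# Example usage (requires kubectl and access to the admin cluster):
#   kubectl get cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] \
#     -n [USER_CLUSTER_NAME] [USER_CLUSTER_NAME] -o yaml | count_reserved_addresses
```

Compare the printed number with the number of nodes you want the user cluster to run after the resize.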

If you need to add more addresses to the reservedAddresses field, perform the following steps:

  1. Open the user cluster's configuration file for editing:

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] edit cluster [USER_CLUSTER_NAME] \
    -n [USER_CLUSTER_NAME] --validate=false
    

    The cluster configuration is opened in your shell's default editor.

  2. Add as many additional static IP blocks as required. An IP block is composed of gateway, hostname, ip, and netmask fields.

Below is an example reservedAddresses field that contains four static IP blocks:

...
networkSpec:
  dns:
  - 172.x.x.x
  ntp: 129.x.x.x
  reservedAddresses:
  - gateway: 100.x.x.x
    hostname: host-1
    ip: 100.x.x.x
    netmask: x
  - gateway: 100.x.x.x
    hostname: host-2
    ip: 100.x.x.x
    netmask: x
  - gateway: 100.x.x.x
    hostname: host-3
    ip: 100.x.x.x
    netmask: x
  - gateway: 100.x.x.x
    hostname: host-4
    ip: 100.x.x.x
    netmask: x
...

Before you begin

Export a KUBECONFIG environment variable pointing to the kubeconfig of the user cluster that you want to resize:

export KUBECONFIG=[USER_CLUSTER_KUBECONFIG]

Resizing a user cluster

You resize a cluster by editing the user cluster's MachineDeployment resource. To find the name of the user cluster's MachineDeployment resource, run the following command:

kubectl get machinedeployments

The name of the user cluster's MachineDeployment includes the user cluster's name.

To resize the user cluster, you need to patch the cluster's MachineDeployment configuration. You change the value of the configuration's replicas field, which indicates how many nodes the cluster should run:

kubectl patch machinedeployment [MACHINE_DEPLOYMENT_NAME] -p "{\"spec\": {\"replicas\": [INT] }}" --type=merge

where [INT] is the number of nodes you want the user cluster to run.
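
Escaping the JSON patch by hand is easy to get wrong. A minimal sketch of one way to build it is shown here; build_replicas_patch is a hypothetical helper name, not a kubectl feature, and the validation rules are an assumption about what counts as a sensible replica count.

```shell
# Hypothetical helper: prints a JSON merge patch that sets spec.replicas.
build_replicas_patch() {
  # Reject empty values, anything containing a non-digit, and leading zeros
  # (leading zeros are not valid in JSON numbers).
  case "$1" in
    ''|*[!0-9]*|0[0-9]*)
      echo "replicas must be a non-negative integer, got: '$1'" >&2
      return 1
      ;;
  esac
  printf '{"spec": {"replicas": %s}}\n' "$1"
}

# Example usage (requires kubectl and the KUBECONFIG variable exported earlier):
#   kubectl patch machinedeployment [MACHINE_DEPLOYMENT_NAME] \
#     -p "$(build_replicas_patch 5)" --type=merge
```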

Verify resize

To verify that the resize was successful, run:

kubectl get nodes
kubectl describe machinedeployments [MACHINE_DEPLOYMENT_NAME] | grep Replicas

The number of nodes you chose should be reflected in these commands' output.
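
Counting nodes by eye gets tedious on larger clusters. The following sketch counts nodes whose status is Ready; count_ready_nodes is a hypothetical helper name, and it assumes the default column layout of kubectl get nodes, where the second column is STATUS.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints how many nodes report a Ready status.
count_ready_nodes() {
  # STATUS is "Ready", "NotReady", or a comma-separated list such as
  # "Ready,SchedulingDisabled"; cordoned but Ready nodes are counted too.
  awk '$2 == "Ready" || $2 ~ /^Ready,/ { n++ } END { print n + 0 }'
}

# Example usage (requires kubectl and the KUBECONFIG variable exported earlier):
#   kubectl get nodes --no-headers | count_ready_nodes
```

New nodes can take some time to become Ready, so a lower count shortly after a resize does not necessarily indicate a failure.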

Troubleshooting

For more information, refer to Troubleshooting.

Resizing a user cluster fails

Symptoms

A resize operation on a user cluster fails.

Potential causes

Several factors could cause resize operations to fail.

Resolution

If a resize fails, follow these steps:

  1. Check the cluster's MachineDeployment status to see if there are any events or error messages:

    kubectl describe machinedeployments [MACHINE_DEPLOYMENT_NAME]
  2. Check if there are errors on the newly-created Machines:

    kubectl describe machine [MACHINE_NAME]

Error: "no addresses can be allocated"

Symptoms

After resizing a user cluster, kubectl describe machine [MACHINE_NAME] displays the following error:

Events:
   Type     Reason  Age                From                    Message
   ----     ------  ----               ----                    -------
   Warning  Failed  9s (x13 over 56s)  machineipam-controller  ipam: no addresses can be allocated
   
Potential causes

There aren't enough IP addresses available for the user cluster.

Resolution

Allocate more IP addresses for the cluster. Then, delete the affected Machine:

kubectl delete machine [MACHINE_NAME]

If the cluster is configured correctly, a replacement Machine is created with an IP address.
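
To check several Machines for this error, the message can be matched in a script. A minimal sketch follows; has_ipam_exhaustion is a hypothetical helper name, and it assumes that the event message reads exactly as shown in the symptoms above.

```shell
# Hypothetical helper: reads `kubectl describe machine` output on stdin and
# succeeds (exit status 0) if the IP allocation error is present.
has_ipam_exhaustion() {
  grep -q 'ipam: no addresses can be allocated'
}

# Example usage (requires kubectl and the KUBECONFIG variable exported earlier):
#   kubectl describe machine [MACHINE_NAME] | has_ipam_exhaustion \
#     && echo "[MACHINE_NAME] could not be allocated an IP address"
```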

Sufficient number of IP addresses allocated, but Machine fails to register with cluster

Symptoms

The network has enough IP addresses allocated, but the Machine still fails to register with the user cluster.

Potential causes

There might be an IP conflict. The IP might be taken by another Machine or by your load balancer.

Resolution

Check that the affected Machine's IP address is not taken. If there is a conflict, you need to resolve the conflict in your environment.
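
One way to look for a conflict is to collect the IP addresses you know to be in use (for example, the cluster's reserved addresses and your load balancer's addresses) and check the list for duplicates. The sketch below assumes you can gather such a list yourself; find_duplicate_ips is a hypothetical helper name, and it cannot detect addresses used by devices that are missing from the list.

```shell
# Hypothetical helper: reads one IP address per line on stdin and prints each
# address that appears more than once.
find_duplicate_ips() {
  grep -v '^[[:space:]]*$' | sort | uniq -d
}

# Example usage with a hand-assembled list of addresses:
#   printf '%s\n' 10.0.0.5 10.0.0.6 10.0.0.5 | find_duplicate_ips
```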

New nodes created but not healthy

Symptoms

New nodes don't register with the user cluster control plane when the cluster uses manual load balancing mode.

Potential causes

In-node Ingress validation might be enabled, which blocks the nodes' boot process.

Resolution

To disable the validation, run:

kubectl patch machinedeployment [MACHINE_DEPLOYMENT_NAME] -p '{"spec":{"template":{"spec":{"providerSpec":{"value":{"machineVariables":{"net_validation_ports": null}}}}}}}' --type=merge

Diagnosing cluster issues using gkectl

Use gkectl diagnose commands to identify cluster issues and share cluster information with Google. See Diagnosing cluster issues.

Default logging behavior

For gkectl and gkeadm, the default logging settings are sufficient:

  • By default, log entries are saved as follows:

    • For gkectl, the default log file is /home/ubuntu/.config/gke-on-prem/logs/gkectl-$(date).log, and the file is symlinked with the logs/gkectl-$(date).log file in the local directory where you run gkectl.
    • For gkeadm, the default log file is logs/gkeadm-$(date).log in the local directory where you run gkeadm.
  • All log entries are saved in the log file, even if they are not printed in the terminal (when --alsologtostderr is false).
  • The -v5 verbosity level (default) covers all the log entries needed by the support team.
  • The log file also contains the command executed and the failure message.

We recommend that you send the log file to the support team when you need help.
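
To find the log file to send, you can list the most recently modified log in the logs directory. This is a sketch under the assumption that log files follow the gkectl-*.log and gkeadm-*.log naming described above; latest_log is a hypothetical helper name.

```shell
# Hypothetical helper: prints the most recently modified log file for a tool.
#   $1: tool name (gkectl or gkeadm)
#   $2: log directory (defaults to ./logs)
latest_log() {
  # ls -t sorts by modification time, newest first; prints nothing if no
  # matching file exists.
  ls -t "${2:-logs}/$1"-*.log 2>/dev/null | head -n 1
}

# Example usage, run from the directory where you ran gkectl:
#   latest_log gkectl
```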

Specifying a non-default location for the log file

To specify a non-default location for the gkectl log file, use the --log_file flag. The log file that you specify will not be symlinked with the local directory.

To specify a non-default location for the gkeadm log file, use the --log_file flag.

Locating Cluster API logs in the admin cluster

If a VM fails to start after the admin control plane has started, you can debug the failure by inspecting the Cluster API controllers' logs in the admin cluster:

  1. Find the name of the Cluster API controllers Pod in the kube-system namespace, where [ADMIN_CLUSTER_KUBECONFIG] is the path to the admin cluster's kubeconfig file:

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system get pods | grep clusterapi-controllers
  2. Open the Pod's logs, where [POD_NAME] is the name of the Pod. Optionally, use grep or a similar tool to search for errors:

    kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system logs [POD_NAME] vsphere-controller-manager
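
The optional search for errors in step 2 can be sketched as a simple filter. The pattern is an assumption about how failures are worded in the logs, so adjust it as needed; filter_error_lines is a hypothetical helper name.

```shell
# Hypothetical helper: reads log lines on stdin and prints only those that
# mention an error or failure (case-insensitive).
filter_error_lines() {
  grep -iE 'error|fail'
}

# Example usage (requires kubectl and access to the admin cluster):
#   kubectl --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] -n kube-system \
#     logs [POD_NAME] vsphere-controller-manager | filter_error_lines
```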