Creating and managing node pools

Starting in GKE on-prem version 1.3, you can create a node pool to define a group of nodes within a user cluster that all have the same configuration. You can then manage that pool of nodes separately from the other nodes in the cluster. Learn more about node pools.

You define one or more node pools in the configuration file of each user cluster, and you create, update, and delete node pools by editing that file. Creating a node pool creates additional nodes in the cluster. After you define node pools in your configuration file, you deploy the changes to the cluster with the gkectl update cluster command. The changes that you deploy are performed immediately in the user cluster. For example, if you remove a node pool from a cluster, its nodes are deleted immediately, regardless of whether any of them are running a workload.

Example node pool, as defined under usercluster in the configuration file:

nodepools:
  - name: pool-1
    cpus: 4
    memorymb: 8192
    replicas: 5

Tip for new installations: Create your first user cluster and define your node pools in that cluster. Then use that cluster's configuration file to create additional user clusters with the same node pool settings.

Before you begin

  • Support:

    • Only user clusters version 1.3.0 or later are supported.

    • Node pools in admin clusters are unsupported.

    • The gkectl update cluster command currently supports only node pool management. All other changes that exist in the configuration file are ignored.

    • While the nodes in a node pool can be managed separately from a cluster's other nodes, they cannot be upgraded separately. All nodes are upgraded when you upgrade your clusters.

  • Resources:

    • Only changes to a node pool's replicas value can be deployed without interrupting a node's workloads (see the sketch after this list).

      Important: If you deploy any other node pool configuration change, the nodes in the node pool are recreated. You must ensure each node pool is not running a workload that should not be disrupted.

    • When you deploy your node pool changes, a temporary node might be created. You must verify that an IP address is available for that temporary node.
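
For example, a replicas-only change to pool-1 from the earlier example, such as the following, is deployed without recreating nodes, while a change to cpus or memorymb in the same pool recreates them (the values shown are illustrative):

nodepools:
  - name: pool-1
    cpus: 4         # unchanged; editing this value recreates the pool's nodes
    memorymb: 8192  # unchanged; editing this value recreates the pool's nodes
    replicas: 6     # changed from 5; deployed without disrupting workloads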

Creating and updating node pools

You manage node pools by modifying and deploying your user cluster's configuration file. You can create and deploy one or more node pools in each user cluster.

To create or update node pools:

  1. In an editor, open the configuration file of the user cluster in which you want to create or update node pools.

  2. Define one or more node pools in the nodepools section under usercluster in the configuration file:

    1. Configure the minimum required node pool attributes. You must specify the following attributes for each node pool:

      • usercluster.nodepools.name: Specifies a unique name for the node pool. Updating this attribute recreates the pool's nodes. Example: name: pool-1

      • usercluster.nodepools.cpus: Specifies how many CPUs are allocated to each worker node in the node pool. Updating this attribute recreates the pool's nodes. Example: cpus: 4

      • usercluster.nodepools.memorymb: Specifies how much memory, in megabytes, is allocated to each worker node in the node pool. Updating this attribute recreates the pool's nodes. Example: memorymb: 8192

      • usercluster.nodepools.replicas: Specifies the number of worker nodes in the node pool that the user cluster uses to run workloads. You can update this attribute without recreating the pool's nodes or interrupting running workloads. Example: replicas: 5

      Note that while some of the nodepools attributes are the same as those in the workernode section, the workernode section is required for every user cluster. You can't remove workernode or replace it with nodepools.

      Example:

      nodepools:
        - name: pool-1
          cpus: 4
          memorymb: 8192
          replicas: 5
      

      See Examples for a configuration example with multiple node pools.

    2. Configure optional node pool attributes. You can add labels and taints to your node pool configuration to steer workloads toward or away from the pool's nodes. You can also define which vSphere Datastore is used by your node pool.

      • usercluster.nodepools.labels: Specifies one or more key: value pairs to uniquely identify your node pools. The key and value must begin with a letter or number, and can contain letters, numbers, hyphens, dots, and underscores, up to 63 characters each.

        For detailed configuration information, see labels.

        Important: You cannot specify the following keys for a label because they are reserved for use by GKE on-prem: kubernetes.io, k8s.io, and googleapis.com.

        Example:

        labels:
          key1: value1
          key2: value2
        
      • usercluster.nodepools.taints: Specifies a key, value, and effect to define taints for your node pools. These taints correspond to the tolerations that you configure for your pods.

        The key is required, and the value is optional. Both must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores, up to 253 characters. Optionally, you can prefix a key with a DNS subdomain followed by a /. For example: example.com/my-app.

        Valid effect values are: NoSchedule, PreferNoSchedule, or NoExecute.

        For detailed configuration information, see taints.

        Example:

        taints:
          - key: key1
            value: value1
            effect: NoSchedule
        
      • usercluster.nodepools.vsphere.datastore: Specifies the vSphere Datastore that the node pool uses. This setting overrides the default vSphere Datastore of the user cluster.

        Example:

        vsphere:
          datastore: datastore_name
        

    See Examples for a configuration example with multiple node pools, followed by a sketch of a Pod that schedules onto one of those pools.

  3. Use the gkectl update cluster command to deploy your changes to the user cluster.

    Note: The gkectl update cluster command supports only node pool management. Only changes to the nodepools section are deployed. All other changes in your configuration file are ignored.

    gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config config.yaml --dry-run --yes
    
    where:
    • [ADMIN_CLUSTER_KUBECONFIG]: Specifies the kubeconfig file of your admin cluster.
    • config.yaml: Specifies the edited configuration file of the user cluster. You might have chosen a different name for this file.
    • --dry-run: Optional flag. Add this flag to view the change only. No changes are deployed to the user cluster.
    • --yes: Optional flag. Add this flag to run the command silently. The prompt to verify that you want to proceed is disabled.

    If the command is interrupted before it completes, you can run the same command again to complete the operation and deploy your changes to the user cluster.

    If you need to revert your changes, revert them in the configuration file and then redeploy the configuration to your user cluster.

  4. Verify that the changes are successful by inspecting all the nodes. Run the following command to list all of the nodes in the user cluster:

    kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -o wide
    

    where [USER_CLUSTER_KUBECONFIG] is the kubeconfig file of your user cluster.
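
    If you configured labels on a node pool, you can also confirm that the new nodes carry those labels and can be targeted for scheduling. For example, assuming the key1: value1 label from the earlier labels example is applied to the pool's nodes, the following command lists only that pool's nodes:

    kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -l key1=value1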

Deleting node pools

To remove node pools from a user cluster:

  1. Remove all of the nodepools settings from the configuration file of your user cluster. If there are multiple node pools, remove the settings for all of them.

  2. Ensure that no workloads are running on the affected nodes; all of those nodes will be deleted. One way to check for running workloads is shown after this procedure.

  3. Deploy your changes by running the gkectl update cluster command:

    gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config config.yaml --dry-run --yes
    
    where:
    • [ADMIN_CLUSTER_KUBECONFIG]: Specifies the kubeconfig file of your admin cluster.
    • config.yaml: Specifies the edited configuration file of the user cluster. You might have chosen a different name for this file.
    • --dry-run: Optional flag. Add this flag to view the change only. No changes are deployed to the user cluster.
    • --yes: Optional flag. Add this flag to run the command silently. The prompt to verify that you want to proceed is disabled.
  4. Verify that the changes are successful by inspecting all the nodes. Run the following command to list all of the nodes in the user cluster:

    kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -o wide
    

    where [USER_CLUSTER_KUBECONFIG] is the kubeconfig file of your user cluster.
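
To check for running workloads before you deploy the deletion, you can list all Pods together with the node that each Pod is scheduled on. This is a standard kubectl check; the -o wide output includes a NODE column:

kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get pods --all-namespaces -o wide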

Examples

In the following example configuration, there are four node pools, each with different attributes:

  • pool-1: only the minimum required attributes are specified
  • pool-2: includes vSphere Datastore
  • pool-3: includes taints and labels
  • pool-4: includes all attributes
...
usercluster:
  ...
  workernode:
    cpus: 4
    memorymb: 8192
    replicas: 3
  # (Optional) Node pools with customizable labels, taints, etc.
  nodepools:
    - name: pool-1
      cpus: 4
      memorymb: 8192
      replicas: 5
    - name: pool-2
      cpus: 8
      memorymb: 16384
      replicas: 3
      vsphere:
        datastore: my_datastore
    - name: pool-3
      cpus: 4
      memorymb: 8192
      replicas: 5
      taints:
        - key: "example-key"
          effect: NoSchedule
      labels:
        environment: production
        app: nginx
    - name: pool-4
      cpus: 8
      memorymb: 16384
      replicas: 3
      taints:
        - key: "my_key"
          value: my_value1
          effect: NoExecute
      labels:
        environment: test
      vsphere:
        datastore: my_datastore
...

View a complete example of the user cluster configuration file.
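
To see how a workload interacts with these settings, consider pool-3 above. Assuming the pool's labels and taint are applied to its nodes, a Pod similar to the following minimal sketch selects those nodes with a nodeSelector and tolerates the NoSchedule taint; the Pod name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-on-pool-3      # placeholder name
spec:
  containers:
    - name: nginx
      image: nginx:1.19      # placeholder image
  # Schedule only onto nodes that carry pool-3's labels.
  nodeSelector:
    environment: production
    app: nginx
  # Tolerate pool-3's NoSchedule taint so the Pod can be scheduled there.
  tolerations:
    - key: "example-key"
      operator: "Exists"
      effect: "NoSchedule"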

Troubleshooting

  • In general, the gkectl update cluster command provides specifics when it fails. If the command succeeds but you don't see the expected nodes, you can troubleshoot with the Diagnosing cluster issues guide.

  • Node pool creation or update can fail because of insufficient cluster resources, such as a lack of available IP addresses. See the Resizing a user cluster topic for details about verifying that IP addresses are available.

  • You can also review the general Troubleshooting guide.

  • Won't proceed past Creating node MachineDeployment(s) in user cluster….

    It can take a while to create or update the node pools in your user cluster. However, if the wait time is extremely long and you suspect that something might have failed, you can run the following commands:

    1. Run kubectl get nodes to obtain the state of your nodes.
    2. For any nodes that are not ready, run kubectl describe node [NODE_NAME] to obtain details.
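
    For example, using the kubeconfig file of your user cluster:

    kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes

    kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] describe node [NODE_NAME]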