Creating and managing node pools

Starting in Google Distributed Cloud version 1.3, you can create a group of nodes in your user cluster that all have the same configuration by defining a node pool in the configuration file of that cluster. You can then manage that pool of nodes separately without affecting any of the other nodes in the cluster. Learn more about node pools.

You can define one or more node pools in the configuration file of any user cluster. Creating a node pool creates additional nodes in the user cluster. You manage node pools, including creating, updating, and deleting them, by modifying the nodePools section of your configuration file and deploying those changes to your existing cluster with the gkectl update cluster command. Note that deleting a node pool immediately removes the related nodes, regardless of whether any of those nodes are running a workload.

Example node pool:

nodePools:
  - name: pool-1
    cpus: 4
    memoryMB: 8192
    replicas: 5

Tip for new installations: Create your first user cluster and define your node pools in that cluster. Then use that cluster's configuration file to create additional user clusters with the same node pool settings.

Before you begin

  • Support:

    • Only user clusters version 1.3.0 or later are supported.

    • Node pools in admin clusters are unsupported.

    • The gkectl update cluster command currently has full support for updating node pools and adding static IPs. It also supports enabling cloud audit logging and enabling / disabling auto repair. All other changes that exist in the configuration file are ignored.

    • While the nodes in a node pool can be managed separately from other nodes, the nodes of any cluster cannot be separately upgraded. All nodes are upgraded when you upgrade your clusters.

  • Resources:

    • Only changes to a node pool's replicas can be deployed without interrupting the workloads on its nodes.

      Important: If you deploy any other node pool configuration change, the nodes in the node pool are recreated. Before you deploy, make sure that no such node pool is running a workload that must not be disrupted.

    • When you deploy your node pool changes, unwanted nodes are deleted after the desired ones are created or updated. One implication of this policy is that even if the total number of nodes remains the same before and after an update, more resources (for example, IP addresses) may be required during the update.

      Suppose a node pool will have N nodes at the end of an update. Then you must have at least N + 1 IP addresses available for nodes in that pool. This means that if you are resizing a cluster by adding nodes to one or more pools, you must have at least one more IP address than the total number of nodes that will be in all of the cluster's node pools at the end of the resizing. For more information, see Verify that enough IP addresses are available.
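
As a worked example of the IP address rule above, the following Python sketch computes the minimum number of node IP addresses needed for a resize. The function name min_ip_addresses and the pool sizes are hypothetical; the only rule encoded is the "total nodes plus one" requirement stated above.

```python
def min_ip_addresses(final_replicas):
    """Return the minimum number of node IP addresses needed for a resize.

    final_replicas maps each node pool name to the number of nodes the pool
    will have at the end of the update. One extra address is needed because
    new nodes are created before unwanted ones are deleted.
    """
    return sum(final_replicas.values()) + 1


# Hypothetical pool sizes after the update.
pools = {"pool-1": 5, "pool-2": 3}
print(min_ip_addresses(pools))  # 9
```

With 8 nodes planned across the two pools, at least 9 addresses must be available before you deploy the change.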

Creating and updating node pools

You manage a node pool by modifying and deploying your user cluster's configuration file. You can create and deploy one or more node pools in a user cluster.

To create or update node pools:

  1. In an editor, open the configuration file of the user cluster in which you want to create or update node pools.

  2. Define one or more node pools in the nodePools section of the user cluster configuration file:

    1. Configure the minimum required node pool attributes. You must specify the following attributes for each node pool:

      • nodePools.name: Specifies a unique name for the node pool. Updating this attribute recreates the node. Example: - name: pool-1

      • nodePools.cpus: Specifies how many CPUs are allocated to each worker node in the pool. Updating this attribute recreates the node. Example: cpus: 4

      • nodePools.memoryMB: Specifies how much memory, in megabytes, is allocated to each worker node in the pool. Updating this attribute recreates the node. Example: memoryMB: 8192

      • nodePools.replicas: Specifies the total number of worker nodes in the pool. The user cluster uses nodes across all the pools to run workloads. You can update this attribute without affecting any nodes or running workloads. Example: replicas: 5

      Note that while some of the nodePools attributes are the same as those in the workernode section (DHCP | Static IP) of the old configuration file, the workernode section is still required in the old configuration file of every user cluster. You can't remove the workernode section or replace it with nodePools. The new user cluster configuration file has no workernode section. In that case, you must define at least one node pool for the user cluster and ensure that there are enough un-tainted nodes to replace the default workernode pool of the old configuration file.

      Example:

      nodePools:
      - name: pool-1
        cpus: 4
        memoryMB: 8192
        replicas: 5
      

      See Examples for an example user cluster configuration file with multiple node pools.

    2. Configure optional node pool attributes. You can add labels and taints to your node pool configuration to steer node workloads. You can also define which vSphere Datastore is used by your node pool.

      • nodePools.labels: Specifies one or more key: value pairs to uniquely identify your node pools. Each key and value must begin with a letter or number, and can contain letters, numbers, hyphens, dots, and underscores, up to 63 characters each.

        For detailed configuration information, see labels.

        Important: You cannot specify the following keys for a label because they are reserved for use by Google Distributed Cloud: kubernetes.io, k8s.io, and googleapis.com.

        Example:

        labels:
          key1: value1
          key2: value2
        
      • nodePools.taints: Specifies a key, value, and effect to define taints for your node pools. These taints correspond with the tolerations that you configure for your pods.

        The key is required and value is optional. Both must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores, up to 253 characters. Optionally, you can prefix a key with a DNS subdomain followed by a /. For example: example.com/my-app.

        Valid effect values are: NoSchedule, PreferNoSchedule, or NoExecute.

        For detailed configuration information, see taints.

        Example:

        taints:
          - key: key1
            value: value1
            effect: NoSchedule
        
      • nodePools.bootDiskSizeGB: Specifies the size, in gigabytes, of the boot disk allocated to each worker node in the pool. This attribute is available starting in Google Distributed Cloud version 1.5.0.

        Example:

        bootDiskSizeGB: 40
        
      • nodePools.vsphere.datastore: Specifies the vSphere Datastore on which each node in the pool is created. This overrides the default vSphere Datastore of the user cluster.

        Example:

        vsphere:
          datastore: datastore_name
        

    See Examples for a configuration example with multiple node pools.

  3. Use the gkectl update cluster command to deploy your changes to the user cluster.

    gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [USER_CLUSTER_CONFIG_FILE] --dry-run --yes
    
    where:
    • [ADMIN_CLUSTER_KUBECONFIG]: Specifies the kubeconfig file of your admin cluster.
    • [USER_CLUSTER_CONFIG_FILE]: Specifies the configuration file of your user cluster.
    • --dry-run: Optional flag. Add this flag to view the change only. No changes are deployed to the user cluster.
    • --yes: Optional flag. Add this flag to run the command silently. The prompt to verify that you want to proceed is disabled.

    If you abort the command before it finishes, you can run the same command again to complete the operation and deploy your changes to the user cluster.

    If you need to revert your changes, you must revert your changes in the configuration file and then redeploy those changes to your user cluster.

  4. Verify that the changes are successful by inspecting all the nodes. Run the following command to list all of the nodes in the user cluster:

    kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -o wide
    

    where [USER_CLUSTER_KUBECONFIG] is the kubeconfig file of your user cluster.
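
The label and taint rules from step 2 can be checked locally before you deploy. The following Python sketch is one possible way to do that, not part of gkectl. The helper names label_errors and taint_errors are hypothetical, and two points are assumptions: any label key that contains a reserved name is treated as reserved, and for taint keys only the part after an optional "prefix/" is checked against the character and length rule.

```python
import re

# Character rule from the text: begins with a letter or number, then letters,
# numbers, hyphens, dots, or underscores.
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
RESERVED = ("kubernetes.io", "k8s.io", "googleapis.com")
EFFECTS = ("NoSchedule", "PreferNoSchedule", "NoExecute")


def _valid_name(text, max_len):
    """Check the character rule and a maximum length."""
    return len(text) <= max_len and _NAME.fullmatch(text) is not None


def label_errors(labels):
    """Return a list of problems found in a labels mapping (str -> str)."""
    errors = []
    for key, value in labels.items():
        if not _valid_name(key, 63):
            errors.append(f"label key {key!r} is malformed")
        if not _valid_name(value, 63):
            errors.append(f"label value {value!r} is malformed")
        # Assumption: a key containing a reserved name counts as reserved.
        if any(name in key for name in RESERVED):
            errors.append(f"label key {key!r} uses a reserved name")
    return errors


def taint_errors(taints):
    """Return a list of problems found in a list of taint mappings."""
    errors = []
    for taint in taints:
        key = taint.get("key", "")
        # Assumption: the optional DNS subdomain prefix is not validated here.
        name = key.rsplit("/", 1)[-1]
        if not _valid_name(name, 253):
            errors.append(f"taint key {key!r} is missing or malformed")
        value = taint.get("value")
        if value is not None and not _valid_name(value, 253):
            errors.append(f"taint value {value!r} is malformed")
        if taint.get("effect") not in EFFECTS:
            errors.append(f"taint effect {taint.get('effect')!r} is not valid")
    return errors


print(label_errors({"key1": "value1", "kubernetes.io": "x"}))
print(taint_errors([{"key": "example.com/my-app", "effect": "NoSchedule"}]))
```

An empty list means that no problem was found under these assumptions; gkectl update cluster remains the authority on what is accepted.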

Deleting a node pool

To delete a node pool from a user cluster:

  1. Remove its definition from the nodePools section of the user cluster configuration file.

  2. Ensure that there are no workloads running on the affected nodes.

  3. Deploy your changes by running the gkectl update cluster command:

    gkectl update cluster --kubeconfig [ADMIN_CLUSTER_KUBECONFIG] --config [USER_CLUSTER_CONFIG_FILE] --dry-run --yes
    
    where:
    • [ADMIN_CLUSTER_KUBECONFIG]: Specifies the kubeconfig file of your admin cluster.
    • [USER_CLUSTER_CONFIG_FILE]: Specifies the configuration file of your user cluster.
    • --dry-run: Optional flag. Add this flag to view the change only. No changes are deployed to the user cluster.
    • --yes: Optional flag. Add this flag to run the command silently. The prompt to verify that you want to proceed is disabled.

  4. Verify that the changes are successful by inspecting all the nodes. Run the following command to list all of the nodes in the user cluster:

    kubectl --kubeconfig [USER_CLUSTER_KUBECONFIG] get nodes -o wide
    

    where [USER_CLUSTER_KUBECONFIG] is the kubeconfig file of your user cluster.

Examples

In the following example configuration, there are five node pools, each with different attributes:

  • pool-1: only the minimum required attributes are specified
  • pool-2: includes vSphere Datastore
  • pool-3: includes bootDiskSizeGB
  • pool-4: includes taints and labels
  • pool-5: includes all attributes

apiVersion: v1
kind: UserCluster
...
# (Required) List of node pools. The total un-tainted replicas across all node pools
# must be greater than or equal to 3
nodePools:
- name: pool-1
  cpus: 4
  memoryMB: 8192
  replicas: 5
- name: pool-2
  cpus: 8
  memoryMB: 16384
  replicas: 3
  vsphere:
    datastore: my_datastore
- name: pool-3
  cpus: 8
  memoryMB: 8192
  replicas: 3
  bootDiskSizeGB: 40
- name: pool-4
  cpus: 4
  memoryMB: 8192
  replicas: 5
  taints:
    - key: "example-key"
      effect: NoSchedule
  labels:
    environment: production
    app: nginx
- name: pool-5
  cpus: 8
  memoryMB: 16384
  replicas: 3
  taints:
    - key: "my_key"
      value: my_value1
      effect: NoExecute
  labels:
    environment: test
  vsphere:
    datastore: my_datastore
  bootDiskSizeGB: 60
...
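
The comment in the example requires at least 3 un-tainted replicas across all node pools. The following Python sketch checks that rule for the five pools above. The function name untainted_replicas is hypothetical, and the pools are written as plain Python dictionaries that mirror only the relevant fields of the example, so no YAML parser is needed.

```python
def untainted_replicas(node_pools):
    """Sum the replicas of every node pool that defines no taints."""
    return sum(pool["replicas"] for pool in node_pools if not pool.get("taints"))


# Relevant fields of the example configuration above.
node_pools = [
    {"name": "pool-1", "replicas": 5},
    {"name": "pool-2", "replicas": 3},
    {"name": "pool-3", "replicas": 3},
    {"name": "pool-4", "replicas": 5,
     "taints": [{"key": "example-key", "effect": "NoSchedule"}]},
    {"name": "pool-5", "replicas": 3,
     "taints": [{"key": "my_key", "value": "my_value1", "effect": "NoExecute"}]},
]

total = untainted_replicas(node_pools)
print(total, total >= 3)  # 11 True
```

Only pool-1, pool-2, and pool-3 count toward the total, because pool-4 and pool-5 define taints.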

Troubleshooting

  • In general, the gkectl update cluster command provides specifics when it fails. If the command succeeds but you don't see the nodes, you can troubleshoot with the Diagnosing cluster issues guide.

  • The cluster might have insufficient resources, such as too few available IP addresses, during node pool creation or update. See the Resizing a user cluster topic for details about verifying that IP addresses are available.

  • You can also review the general Troubleshooting guide.

  • The command won't proceed past Creating node MachineDeployment(s) in user cluster….

    It can take a while to create or update the node pools in your user cluster. However, if the wait time is extremely long and you suspect that something might have failed, you can run the following commands:

    1. Run kubectl get nodes to obtain the state of your nodes.
    2. For any nodes that are not ready, run kubectl describe node [node_name] to obtain details.
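
The two steps above can be combined by scanning the kubectl get nodes output for nodes that are not ready. The following Python sketch assumes the default table layout, in which the first two columns are NAME and STATUS. The function name not_ready_nodes is hypothetical, and the sample output, node names, and versions are made up for illustration.

```python
def not_ready_nodes(get_nodes_output):
    """Return the names of nodes whose STATUS column does not include Ready."""
    names = []
    lines = get_nodes_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        name, status = fields[0], fields[1]
        # STATUS can list several conditions, for example
        # "Ready,SchedulingDisabled".
        if "Ready" not in status.split(","):
            names.append(name)
    return names


# Made-up sample of `kubectl get nodes` output.
sample = """\
NAME       STATUS     ROLES    AGE   VERSION
node-a     Ready      <none>   10m   v1.16.8
node-b     NotReady   <none>   2m    v1.16.8
"""
print(not_ready_nodes(sample))  # ['node-b']
```

Each returned name can then be passed to kubectl describe node to obtain details.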