Migrate your workloads to other machine types

This tutorial demonstrates how to migrate workloads running on a Google Kubernetes Engine (GKE) cluster to a new set of nodes within the same cluster without incurring downtime for your application. Such a migration can be useful if you want to migrate your workloads to nodes with a different machine type.

Background

A node pool is a subset of machines that all have the same configuration, including machine type (CPU and memory) and authorization scopes. Node pools represent a subset of nodes within a cluster; a container cluster can contain one or more node pools.

When you need to change the machine profile of your cluster's Compute Engine instances, you can create a new node pool and then migrate your workloads over to the new node pool.

To migrate your workloads without incurring downtime, you need to:

  • Mark the existing node pool as unschedulable.
  • Drain the workloads running on the existing node pool.
  • Delete the existing node pool.

Kubernetes, which is the cluster orchestration system of GKE clusters, automatically reschedules the evicted Pods to the new node pool as it drains the existing node pool.

Objectives

  • Create a GKE cluster.
  • Deploy the sample web application to the cluster.
  • Create a new node pool.
  • Migrate Pods to the new node pool without incurring downtime.

Costs

This tutorial uses the following billable components of Google Cloud:

  • GKE

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.

Before you begin

Take the following steps to enable the Kubernetes Engine API:
  1. Visit the Kubernetes Engine page in the Google Cloud console.
  2. Create or select a project.
  3. Wait for the API and related services to be enabled. This can take several minutes.
  4. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
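
If you already have the gcloud CLI installed (see the tools list that follows), you can also enable the Kubernetes Engine API from the command line instead of the console:

gcloud services enable container.googleapis.com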

Install the following command-line tools used in this tutorial:

  • gcloud is used to create and delete Kubernetes Engine clusters. gcloud is included in the Google Cloud CLI.
  • kubectl is used to manage Kubernetes, the cluster orchestration system used by Kubernetes Engine. You can install kubectl using gcloud:
    gcloud components install kubectl
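
After installation, you can verify that kubectl is available by printing its client version:

kubectl version --client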

Clone the sample code from GitHub:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
cd kubernetes-engine-samples/migrating-node-pool

Set defaults for the gcloud command-line tool

To save time typing your project ID and Compute Engine zone options in the gcloud command-line tool, you can set the defaults:

gcloud config set project project-id
gcloud config set compute/zone compute-zone
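
Replace project-id with your project ID and compute-zone with your Compute Engine zone. For example, with a hypothetical project ID of my-gke-project and the us-central1-b zone, the commands would look like this:

gcloud config set project my-gke-project
gcloud config set compute/zone us-central1-b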

Creating a GKE cluster

The first step is to create a container cluster to run application workloads. The following command creates a new cluster with five nodes of the default machine type (e2-medium):

gcloud container clusters create migration-tutorial --num-nodes=5
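
The create command typically configures kubectl credentials for the new cluster automatically. If kubectl cannot reach the cluster (for example, from another shell), you can fetch the credentials explicitly:

gcloud container clusters get-credentials migration-tutorial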

Running a replicated application deployment

The following manifest describes a six-replica Deployment of the sample web application container image:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
      - name: hello-app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0

To deploy this manifest, run:

kubectl apply -f node-pools-deployment.yaml

You can retrieve the list of Pods started by the Deployment by running:

kubectl get pods

Output:

NAME                   READY     STATUS    RESTARTS   AGE
web-2212180648-80q72   1/1       Running   0          10m
web-2212180648-jwj0j   1/1       Running   0          10m
web-2212180648-pf67q   1/1       Running   0          10m
web-2212180648-pqz73   1/1       Running   0          10m
web-2212180648-rrd3b   1/1       Running   0          10m
web-2212180648-v3b18   1/1       Running   0          10m
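
You can also confirm that the Deployment reports all six replicas as ready:

kubectl get deployment web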

Creating a node pool with a larger machine type

By default, GKE creates a node pool named default-pool for every new cluster:

gcloud container node-pools list --cluster migration-tutorial

Output:

NAME          MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
default-pool  e2-medium      100           1.16.13-gke.401

To introduce instances with a different configuration, such as a different machine type or different authorization scopes, you need to create a new node pool.

The following command creates a new node pool named larger-pool with five high memory instances of the e2-highmem-2 machine type:

gcloud container node-pools create larger-pool \
  --cluster=migration-tutorial \
  --machine-type=e2-highmem-2 \
  --num-nodes=5
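
If you want to double-check the configuration of the new node pool, such as its machine type, you can describe it:

gcloud container node-pools describe larger-pool --cluster migration-tutorial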

Your container cluster should now have two node pools:

gcloud container node-pools list --cluster migration-tutorial

Output:

NAME          MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
default-pool  e2-medium      100           1.16.13-gke.401
larger-pool   e2-highmem-2   100           1.16.13-gke.401

You can see the instances of the new node pool added to your GKE cluster:

kubectl get nodes

Output:

NAME                                                STATUS    AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q   Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-0ng4   Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-k6jm   Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-lkrv   Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-p9j4   Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk    Ready     4m        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2    Ready     4m        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0    Ready     4m        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q    Ready     4m        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p    Ready     4m        v1.16.13-gke.401
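
You can also list only the nodes of a specific node pool by filtering on the cloud.google.com/gke-nodepool label that GKE applies to each node:

kubectl get nodes -l cloud.google.com/gke-nodepool=larger-pool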

Migrating the workloads

After you create a new node pool, your workloads are still running on the default-pool. Kubernetes does not reschedule Pods as long as they are running and available.

Run the following command to see which node the Pods are running on (see the NODE column):

kubectl get pods -o=wide

Output:

NAME                          READY     STATUS    IP         NODE
web-2212180648-80q72          1/1       Running   10.8.3.4   gke-migration-tutorial-default-pool-56e3af9a-k6jm
web-2212180648-jwj0j          1/1       Running   10.8.2.5   gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-pf67q          1/1       Running   10.8.4.4   gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-pqz73          1/1       Running   10.8.2.6   gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-rrd3b          1/1       Running   10.8.4.3   gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-v3b18          1/1       Running   10.8.1.4   gke-migration-tutorial-default-pool-56e3af9a-p9j4

To migrate these Pods to the new node pool, you must do the following:

  1. Cordon the existing node pool: This operation marks the nodes in the existing node pool (default-pool) as unschedulable. Kubernetes stops scheduling new Pods to these nodes once you mark them as unschedulable.

  2. Drain the existing node pool: This operation evicts the workloads running on the nodes of the existing node pool (default-pool) gracefully.

The preceding steps cause the Pods running in your existing node pool to gracefully terminate, and Kubernetes reschedules them onto other available nodes. In this case, the only available nodes are in the larger-pool node pool.

To make sure Kubernetes terminates your applications gracefully, your containers should handle the SIGTERM signal. You can use it to cleanly close active client connections and commit or abort database transactions. In your Pod manifest, you can use the spec.terminationGracePeriodSeconds field to specify how long Kubernetes waits before stopping the containers in a Pod; the default is 30 seconds. You can read more about Pod termination in the Kubernetes documentation.
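
For example, the following fragment sketches how the Pod template of the sample web Deployment shown earlier could specify a longer grace period; the 60-second value is illustrative only, not a recommendation from this tutorial:

    spec:
      # Give containers up to 60 seconds to shut down after SIGTERM (illustrative value).
      terminationGracePeriodSeconds: 60
      containers:
      - name: hello-app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0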

You can cordon and drain nodes using the kubectl cordon and kubectl drain commands.

First, get a list of nodes in default-pool:

kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool

Then, run the kubectl cordon NODE command, substituting NODE with each name returned by the previous command. The following shell command iterates over each node in default-pool and marks it as unschedulable:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl cordon "$node";
done

Similarly, drain each node by evicting its Pods with an allotted graceful termination period of 10 seconds:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl drain --force --ignore-daemonsets --delete-emptydir-data --grace-period=10 "$node";
done
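
While the drain is in progress, you can watch Kubernetes reschedule the evicted Pods onto the larger-pool nodes from another terminal:

kubectl get pods -o wide --watch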

Once this command completes, you should see that the default-pool nodes have SchedulingDisabled status in the node list:

kubectl get nodes

Output:

NAME                                                STATUS                     AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q   Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-0ng4   Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-k6jm   Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-lkrv   Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-p9j4   Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk    Ready                      1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2    Ready                      1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0    Ready                      1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q    Ready                      1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p    Ready                      1h        v1.16.13-gke.401

Additionally, you should see that the Pods are now running on the larger-pool nodes:

kubectl get pods -o=wide

Output:

NAME                   READY     STATUS    IP         NODE
web-2212180648-3n9hz   1/1       Running   10.8.9.4   gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-88q1c   1/1       Running   10.8.7.4   gke-migration-tutorial-larger-pool-b8ec62a6-2rhk
web-2212180648-dlmjc   1/1       Running   10.8.9.3   gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-hcv46   1/1       Running   10.8.5.4   gke-migration-tutorial-larger-pool-b8ec62a6-hs6p
web-2212180648-n0nht   1/1       Running   10.8.6.4   gke-migration-tutorial-larger-pool-b8ec62a6-7fl0
web-2212180648-s51jb   1/1       Running   10.8.8.4   gke-migration-tutorial-larger-pool-b8ec62a6-4bb2

Deleting the old node pool

Once Kubernetes reschedules all Pods in the web Deployment to the larger-pool, it is safe to delete the default-pool because it is no longer necessary. Run the following command to delete the default-pool:

gcloud container node-pools delete default-pool --cluster migration-tutorial

Once this operation completes, your container cluster should have a single node pool, which is the larger-pool:

gcloud container node-pools list --cluster migration-tutorial

Output:

NAME          MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
larger-pool   e2-highmem-2   100           1.16.13-gke.401

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

  • Delete the container cluster: This step deletes resources that make up the container cluster, such as the compute instances, disks, and network resources.

    gcloud container clusters delete migration-tutorial
    

What's next