Migrating workloads to different machine types

This tutorial demonstrates how to migrate workloads running on a GKE cluster to a new set of nodes without incurring downtime for your application. Such a migration can be useful if you want to move your workloads to nodes with a different machine type.

Background

A node pool is a subset of machines that all have the same configuration, including machine type (CPU and memory) and authorization scopes. Node pools represent a subset of nodes within a cluster; a container cluster can contain one or more node pools.

When you need to change the machine profile of the Compute Engine instances backing your cluster, you can create a new node pool and then migrate your workloads over to the new node pool.

To migrate your workloads without incurring downtime, you need to:

  • Mark the existing node pool as unschedulable.
  • Drain the workloads running on the existing node pool.
  • Delete the existing node pool.

Kubernetes, which is the cluster orchestration system of GKE clusters, automatically reschedules the evicted Pods to the new node pool as it drains the existing node pool.

Before you begin

Take the following steps to enable the Kubernetes Engine API:
  1. Visit the Kubernetes Engine page in the Google Cloud Platform Console.
  2. Create or select a project.
  3. Wait for the API and related services to be enabled. This can take several minutes.
  4. Make sure that billing is enabled for your project.

Install the following command-line tools used in this tutorial:

  • gcloud is used to create and delete Kubernetes Engine clusters. gcloud is included in the Google Cloud SDK.
  • kubectl is used to manage Kubernetes, the cluster orchestration system used by Kubernetes Engine. You can install kubectl using gcloud:
    gcloud components install kubectl

Set defaults for the gcloud command-line tool

To save time typing your project ID and Compute Engine zone options in the gcloud command-line tool, you can set the defaults:
gcloud config set project PROJECT_ID
gcloud config set compute/zone us-central1-b
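
To double-check the values you just set, you can list your active gcloud configuration:

gcloud config list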

Step 1: Create a GKE cluster

The first step is to create a container cluster to run application workloads. The following command creates a new cluster with five nodes of the default machine type (n1-standard-1):

gcloud container clusters create migration-tutorial --num-nodes=5
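
Cluster creation can take a few minutes. Once it completes, you can verify that the cluster is running and has the expected number of nodes:

gcloud container clusters list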

Step 2: Run a replicated application deployment

The following command creates a Deployment with six replicas of the sample web application container image:

kubectl run web --image=gcr.io/google-samples/hello-app:1.0 \
  --replicas=6 --limits='cpu=100m,memory=80Mi'
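
If your version of kubectl no longer supports the --replicas flag on kubectl run, you can create an equivalent Deployment from a manifest instead. The following is a minimal sketch that assumes a cluster version supporting the apps/v1 Deployment API; the run: web label mirrors the label that kubectl run would apply:

# Create a Deployment equivalent to the kubectl run command above
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      run: web
  template:
    metadata:
      labels:
        run: web
    spec:
      containers:
      - name: web
        image: gcr.io/google-samples/hello-app:1.0
        resources:
          limits:
            cpu: 100m
            memory: 80Mi
EOF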

You can retrieve the list of the Pods started by running:

kubectl get pods

Output:

NAME                   READY     STATUS    RESTARTS   AGE
web-2212180648-80q72   1/1       Running   0          10m
web-2212180648-jwj0j   1/1       Running   0          10m
web-2212180648-pf67q   1/1       Running   0          10m
web-2212180648-pqz73   1/1       Running   0          10m
web-2212180648-rrd3b   1/1       Running   0          10m
web-2212180648-v3b18   1/1       Running   0          10m

Step 3: Create a node pool with a larger machine type

By default, GKE creates a node pool named default-pool for every new cluster:

gcloud container node-pools list --cluster migration-tutorial

Output:

NAME          MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
default-pool  n1-standard-1  100           1.5.7

To introduce instances with a different configuration, such as a different machine type or different authorization scopes, you need to create a new node pool.

The following command creates a new node pool named larger-pool with five high-memory instances of the n1-highmem-2 machine type (a larger machine type than the GKE default, n1-standard-1):

gcloud container node-pools create larger-pool --cluster=migration-tutorial \
  --machine-type=n1-highmem-2 --num-nodes=5
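
If you want to inspect the new pool's configuration in more detail (for example, to confirm the machine type and disk size), you can describe it:

gcloud container node-pools describe larger-pool --cluster=migration-tutorial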

Your container cluster should now have two node pools:

gcloud container node-pools list --cluster migration-tutorial

Output:

NAME          MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
default-pool  n1-standard-1  100           1.5.7
larger-pool   n1-highmem-2   100           1.5.7

You can see the instances of the new node pool added to your GKE cluster:

kubectl get nodes

Output:

NAME                                                STATUS    AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-0ng4   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-k6jm   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-lkrv   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-p9j4   Ready     40m       v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p    Ready     4m        v1.5.7

Step 4: Migrate the workloads

After you create a new node pool, your workloads are still running on the default-pool. Kubernetes does not reschedule Pods as long as they are running and available.

Run the following command to see which nodes the Pods are running on (see the NODE column):

kubectl get pods -o=wide

Output:

NAME                          READY     STATUS    IP         NODE
web-2212180648-80q72          1/1       Running   10.8.3.4   gke-migration-tutorial-default-pool-56e3af9a-k6jm
web-2212180648-jwj0j          1/1       Running   10.8.2.5   gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-pf67q          1/1       Running   10.8.4.4   gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-pqz73          1/1       Running   10.8.2.6   gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-rrd3b          1/1       Running   10.8.4.3   gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-v3b18          1/1       Running   10.8.1.4   gke-migration-tutorial-default-pool-56e3af9a-p9j4

To migrate these Pods to the new node pool, you must perform the following steps:

  1. Cordon the existing node pool: This operation marks the nodes in the existing node pool (default-pool) as unschedulable. Kubernetes stops scheduling new Pods to these nodes once you mark them as unschedulable.

  2. Drain the existing node pool: This operation evicts the workloads running on the nodes of the existing node pool (default-pool) gracefully.

The above steps cause Pods running in your existing node pool to gracefully terminate, and Kubernetes reschedules them onto other available nodes. In this case the only available nodes are the ones in the larger-pool created in Step 3.

To make sure Kubernetes terminates your applications gracefully, your containers should handle the SIGTERM signal. This can be used to close active connections to clients and commit or abort database transactions in a clean way. In your Pod manifest, you can use the spec.terminationGracePeriodSeconds field to specify how long Kubernetes must wait before killing the containers in the Pod. This defaults to 30 seconds. You can read more about Pod termination in the Kubernetes documentation.
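
For example, if your application needs more than the default 30 seconds to shut down cleanly, you could raise the grace period on the web Deployment with a patch. This is only a sketch; the value of 60 seconds is an arbitrary example:

kubectl patch deployment web -p '{"spec": {"template": {"spec": {"terminationGracePeriodSeconds": 60}}}}'

Note that changing the Pod template triggers a rolling update of the Deployment.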

First, cordon the nodes in the default-pool. You can run the following command to get a list of nodes in this node pool:

kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool

Then cordon each node by running kubectl cordon NODE (substituting NODE with each name returned by the previous command). Alternatively, the following shell loop iterates over the nodes and marks each one as unschedulable:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl cordon "$node";
done

Output:

node "gke-migration-tutorial-default-pool-56e3af9a-059q" cordoned
node "gke-migration-tutorial-default-pool-56e3af9a-0ng4" cordoned
node "gke-migration-tutorial-default-pool-56e3af9a-k6jm" cordoned
node "gke-migration-tutorial-default-pool-56e3af9a-lkrv" cordoned
node "gke-migration-tutorial-default-pool-56e3af9a-p9j4" cordoned

Now you should see that the default-pool nodes have SchedulingDisabled status in the node list:

kubectl get nodes

Output:

NAME                                                STATUS                     AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-0ng4   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-k6jm   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-lkrv   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-p9j4   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p    Ready                      1h        v1.5.7
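
If you need to back out of the migration at this point (before draining), you can mark the default-pool nodes as schedulable again with kubectl uncordon:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl uncordon "$node";
done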

Next, drain the Pods on each node gracefully. To perform the drain, use the kubectl drain command, which evicts the Pods running on the node.

You can run kubectl drain --force NODE for each node, substituting NODE with the same names you passed to the kubectl cordon command.

The following shell loop iterates over each node in default-pool and drains it, giving evicted Pods an allotted graceful termination period of 10 seconds:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node";
done

Once this command completes, you should see that the Pods are now running on the larger-pool nodes:

kubectl get pods -o=wide

Output:

NAME                   READY     STATUS    IP         NODE
web-2212180648-3n9hz   1/1       Running   10.8.9.4   gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-88q1c   1/1       Running   10.8.7.4   gke-migration-tutorial-larger-pool-b8ec62a6-2rhk
web-2212180648-dlmjc   1/1       Running   10.8.9.3   gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-hcv46   1/1       Running   10.8.5.4   gke-migration-tutorial-larger-pool-b8ec62a6-hs6p
web-2212180648-n0nht   1/1       Running   10.8.6.4   gke-migration-tutorial-larger-pool-b8ec62a6-7fl0
web-2212180648-s51jb   1/1       Running   10.8.8.4   gke-migration-tutorial-larger-pool-b8ec62a6-4bb2

Step 5: Delete the old node pool

After Kubernetes reschedules all Pods in the web Deployment onto the larger-pool, it is safe to delete the default-pool, because it is no longer needed. Run the following command to delete the default-pool:

gcloud container node-pools delete default-pool --cluster migration-tutorial

Once this operation completes, you should have a single node pool for your container cluster, which is the larger-pool:

gcloud container node-pools list --cluster migration-tutorial

Output:

NAME          MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
larger-pool   n1-highmem-2   100           1.5.7
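
You can also confirm from the Kubernetes side that only the larger-pool nodes remain registered with the cluster:

kubectl get nodes -l cloud.google.com/gke-nodepool=larger-pool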

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

  • Delete the container cluster: This step deletes the resources that make up the container cluster, such as the compute instances, disks, and network resources.

gcloud container clusters delete migration-tutorial
