Migrating workloads to different machine types

This tutorial demonstrates how to migrate workloads running on a Container Engine cluster to a new set of nodes without incurring downtime for your application. Such a migration can be useful if you want to migrate your workloads to nodes with a different machine type.


A node pool is a subset of machines that all have the same configuration, including machine type (CPU and memory) and authorization scopes. Node pools represent a subset of nodes within a cluster; a container cluster can contain one or more node pools.

When you need to change the machine profile of your Container Engine cluster, you can create a new node pool and then migrate your workloads over to the new node pool.

To migrate your workloads without incurring downtime, you need to:

  • Mark the existing node pool as unschedulable.
  • Drain the workloads running on the existing node pool.
  • Delete the existing node pool.

Kubernetes, which is the cluster orchestration system of Container Engine clusters, automatically reschedules the evicted Pods to the new node pool as it drains the existing node pool.

Before you begin

Take the following steps to enable the Google Container Engine API:
  1. Visit the Container Engine page in the Google Cloud Platform Console.
  2. Create or select a project.
  3. Wait for the API and related services to be enabled. This can take several minutes.
  4. Enable billing for your project.


Install the following command-line tools used in this tutorial:

  • gcloud is used to create and delete Container Engine clusters. gcloud is included in the Google Cloud SDK.
  • kubectl is used to manage Kubernetes, the cluster orchestration system used by Container Engine. You can install kubectl using gcloud:
    gcloud components install kubectl

Set defaults for the gcloud command-line tool

To save time typing your project ID and Compute Engine zone options in the gcloud command-line tool, you can set default configuration values by running the following commands:
$ gcloud config set project PROJECT_ID
$ gcloud config set compute/zone us-central1-b

Step 1: Create a Container Engine cluster

The first step is to create a container cluster to run a sample load-balanced web application deployment. The following command creates a new cluster with five nodes of the default machine type (n1-standard-1):

 gcloud container clusters create migration-tutorial --num-nodes=5

Step 2: Run a replicated web server deployment

The next step is to create a web application Deployment. The following command creates a six-replica Deployment of the nginx web server running on port 80:

 kubectl run web --image=nginx:1.13 --replicas=6 --port 80 --limits='cpu=100m,memory=80Mi'

You can retrieve the list of the Pods started by running:

$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
web-2212180648-80q72   1/1       Running   0          10m
web-2212180648-jwj0j   1/1       Running   0          10m
web-2212180648-pf67q   1/1       Running   0          10m
web-2212180648-pqz73   1/1       Running   0          10m
web-2212180648-rrd3b   1/1       Running   0          10m
web-2212180648-v3b18   1/1       Running   0          10m

Expose this Deployment to the Internet by creating a Service with LoadBalancer type:

 kubectl expose deployment/web --type=LoadBalancer

Container Engine creates a Load Balancer for your application; this might take several minutes. You can run the following command to find out the external IP address for the web Service:

$ kubectl get services
NAME         CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
kubernetes                  <none>          443/TCP        8m
web                                         80:30415/TCP   3m

You can point your browser to visit this IP address and verify that the application is working correctly.
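Because the load balancer can take several minutes to provision, you might prefer to poll for the address instead of re-running the command by hand. The following is a minimal sketch, not part of the original tutorial: the function name wait_for_external_ip, the retry count, and the delay are all hypothetical choices, and it assumes the Service reports its address under status.loadBalancer.ingress[0].ip.

```shell
# Hypothetical helper: poll until a Service has an external IP, then print it.
# Usage: wait_for_external_ip SERVICE [MAX_ATTEMPTS] [DELAY_SECONDS]
wait_for_external_ip() {
  local service="$1"
  local attempts="${2:-60}"
  local delay="${3:-5}"
  local i=0
  local ip=""
  while [ "$i" -lt "$attempts" ]; do
    # The jsonpath expression yields an empty string until the load
    # balancer has been provisioned.
    ip="$(kubectl get service "$service" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null || true)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for an external IP on service $service" >&2
  return 1
}
```

For the Service in this tutorial, you would call it as wait_for_external_ip web.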

Step 3: Create a node pool with a larger machine type

When you create a cluster, Container Engine creates a pool named default-pool:

$ gcloud container node-pools list --cluster migration-tutorial
default-pool  n1-standard-1  100           1.5.7

To introduce instances with a different configuration, such as a different machine type or different authorization scopes, you need to create a new node pool.

The following command creates a new node pool named larger-pool with five high-memory instances of the n1-highmem-2 machine type (a larger machine type than the Container Engine default, n1-standard-1):

gcloud container node-pools create larger-pool --cluster migration-tutorial --machine-type=n1-highmem-2 --num-nodes=5

Your container cluster should now have two node pools:

$ gcloud container node-pools list --cluster migration-tutorial
default-pool  n1-standard-1  100           1.5.7
larger-pool   n1-highmem-2   100           1.5.7

You can see the instances of the new node pool added to your Container Engine cluster:

$ kubectl get nodes
NAME                                                STATUS    AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-0ng4   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-k6jm   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-lkrv   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-p9j4   Ready     40m       v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p    Ready     4m        v1.5.7
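Before you start moving workloads, it can help to confirm that every node in the new pool reports Ready. The sketch below is an illustration only: the function name ready_nodes_in_pool is hypothetical, and it assumes that STATUS is the second column of the kubectl get nodes output, as in the listing above.

```shell
# Hypothetical helper: count the nodes of a node pool that report Ready.
# Usage: ready_nodes_in_pool POOL_NAME
ready_nodes_in_pool() {
  local pool="$1"
  # Column 2 is STATUS. A cordoned node shows "Ready,SchedulingDisabled"
  # and is deliberately not counted here.
  kubectl get nodes -l "cloud.google.com/gke-nodepool=$pool" --no-headers \
    | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

For this tutorial, ready_nodes_in_pool larger-pool should print 5 once the new pool is fully available.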

Step 4: Migrate the workloads

After you create a new node pool, your workloads are still running on the default-pool. Kubernetes does not reschedule Pods as long as they are running and available.

Run the following command to see which node each Pod is running on (see the NODE column):

$ kubectl get pods -o=wide
NAME                          READY     STATUS    IP         NODE
web-2212180648-80q72          1/1       Running              gke-migration-tutorial-default-pool-56e3af9a-k6jm
web-2212180648-jwj0j          1/1       Running              gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-pf67q          1/1       Running              gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-pqz73          1/1       Running              gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-rrd3b          1/1       Running              gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-v3b18          1/1       Running              gke-migration-tutorial-default-pool-56e3af9a-p9j4
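If you have many Pods, a per-node summary can be easier to read than the full listing. The following sketch (the function name pods_per_node is hypothetical) reads each Pod's spec.nodeName and counts the Pods in the current namespace per node:

```shell
# Hypothetical helper: print how many Pods run on each node.
pods_per_node() {
  # Emit one node name per Pod, then count identical lines.
  kubectl get pods \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
    | sort | uniq -c
}
```

Each output line contains a Pod count followed by a node name.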

To migrate these Pods to the new node pool, you must perform the following steps:

  1. Cordon the existing node pool: This operation marks the nodes in the existing node pool (default-pool) as unschedulable. Kubernetes stops scheduling new Pods to these nodes once you mark them as unschedulable.

  2. Drain the existing node pool: This operation evicts the workloads running on the nodes of the existing node pool (default-pool) gracefully.

The above steps cause Pods running in your existing node pool to gracefully terminate, and Kubernetes reschedules them onto other available nodes. In this case the only available nodes are the ones in the larger-pool created in Step 3.

To make sure Kubernetes terminates your applications gracefully, your containers should handle the SIGTERM signal. A SIGTERM handler can close active client connections and commit or abort database transactions cleanly. In your Pod manifest, you can use the spec.terminationGracePeriodSeconds field to specify how long Kubernetes must wait before killing the containers in the Pod. This defaults to 30 seconds. You can read more about Pod termination in the Kubernetes documentation.
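As an illustration of SIGTERM handling, the following sketch shows one way a shell entrypoint could forward the signal to the process it starts and wait for that process to stop. It is not taken from the original tutorial: the names run_with_graceful_shutdown and on_term are hypothetical, and a real application would replace the placeholder comment with its own cleanup, which must finish within the termination grace period.

```shell
# Hypothetical wrapper: run a command as a child process and, on SIGTERM,
# forward the signal and wait for the child before exiting.
child=""

on_term() {
  echo "SIGTERM received, stopping child process"
  # Placeholder: close client connections, commit or abort transactions.
  if [ -n "$child" ]; then
    kill -TERM "$child" 2>/dev/null || true
    wait "$child" 2>/dev/null || true
  fi
  exit 0
}

# Usage: run_with_graceful_shutdown COMMAND [ARGS...]
run_with_graceful_shutdown() {
  trap on_term TERM
  "$@" &
  child=$!
  # wait returns early when a trapped signal arrives, so on_term can run.
  wait "$child"
}
```

A container entrypoint script could end with a call such as run_with_graceful_shutdown my-server, where my-server stands in for your own server command.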

First, cordon the nodes in the default-pool. You can run the following command to get a list of nodes in this node pool:

kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool

Then cordon each node by running a kubectl cordon <NODE> command (substitute <NODE> with the names from the previous command). The following command iterates over each node and marks it unschedulable:

$ for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do kubectl cordon "$node"; done

node "gke-migration-tutorial-default-pool-56e3af9a-059q" cordoned
node "gke-migration-tutorial-default-pool-56e3af9a-0ng4" cordoned
node "gke-migration-tutorial-default-pool-56e3af9a-k6jm" cordoned
node "gke-migration-tutorial-default-pool-56e3af9a-lkrv" cordoned
node "gke-migration-tutorial-default-pool-56e3af9a-p9j4" cordoned

Now you should see that the default-pool nodes have SchedulingDisabled status in the node list:

$ kubectl get nodes
NAME                                                STATUS                     AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-0ng4   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-k6jm   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-lkrv   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-p9j4   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p    Ready                      1h        v1.5.7

Next, drain Pods on each node gracefully. To perform the drain, use the kubectl drain command, which evicts Pods on each node.

You can run kubectl drain --force <NODE> by substituting <NODE> with the same list of names passed to the kubectl cordon command.

The following command iterates over each node in default-pool and drains it:

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do kubectl drain --force "$node"; done

Once this command completes, you should see that the Pods are now running on the larger-pool nodes:

$ kubectl get pods -o=wide
NAME                   READY     STATUS    IP         NODE
web-2212180648-3n9hz   1/1       Running              gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-88q1c   1/1       Running              gke-migration-tutorial-larger-pool-b8ec62a6-2rhk
web-2212180648-dlmjc   1/1       Running              gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-hcv46   1/1       Running              gke-migration-tutorial-larger-pool-b8ec62a6-hs6p
web-2212180648-n0nht   1/1       Running              gke-migration-tutorial-larger-pool-b8ec62a6-7fl0
web-2212180648-s51jb   1/1       Running              gke-migration-tutorial-larger-pool-b8ec62a6-4bb2

Visit the external IP address of the web Service to see if the application serves the requests correctly from the new node pool.
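To check availability from the command line instead of a browser, you could send a series of requests and count the ones that fail. The sketch below is an assumption-laden example: the function name count_failed_requests is hypothetical, any response other than HTTP 200 is treated as a failure, and the 5-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: send COUNT requests to URL and print how many did
# not return HTTP 200.
# Usage: count_failed_requests URL [COUNT] [DELAY_SECONDS]
count_failed_requests() {
  local url="$1"
  local count="${2:-10}"
  local delay="${3:-1}"
  local failures=0
  local i=0
  local code
  while [ "$i" -lt "$count" ]; do
    # -s hides progress output, -o discards the body, -w prints only the
    # status code. curl prints 000 and exits non-zero if it cannot connect.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    [ "$code" = "200" ] || failures=$((failures + 1))
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$failures"
}
```

Running it in a second terminal while the drain is in progress, for example count_failed_requests http://EXTERNAL_IP/ 120 with EXTERNAL_IP replaced by the address of the web Service, gives a rough indication of whether requests were dropped during the migration.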

Step 5: Delete the old node pool

Once Kubernetes has rescheduled all Pods in the web Deployment to the larger-pool, it is safe to delete the default-pool, which is no longer necessary. Run the following command to delete the default-pool:

 gcloud container node-pools delete default-pool --cluster migration-tutorial
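Before running the delete command, you may want to confirm that no Pods are left on the old pool. The following sketch is one possible check, under these assumptions: the function name count_pods_on_pool is hypothetical, your kubectl version supports the --field-selector flag, and only Pods in the current namespace matter (system Pods in other namespaces are ignored).

```shell
# Hypothetical helper: count Pods in the current namespace that are
# scheduled on the nodes of a given node pool.
# Usage: count_pods_on_pool POOL_NAME
count_pods_on_pool() {
  local pool="$1"
  local total=0
  local node
  local n
  for node in $(kubectl get nodes -l "cloud.google.com/gke-nodepool=$pool" \
      -o jsonpath='{.items[*].metadata.name}'); do
    # One output line per Pod on this node; no lines if the node is empty.
    n="$(kubectl get pods --field-selector "spec.nodeName=$node" \
      --no-headers 2>/dev/null | wc -l)"
    total=$((total + n))
  done
  echo "$total"
}
```

If count_pods_on_pool default-pool prints 0, none of your Pods in the current namespace remain on the old pool.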

Once this operation completes, you should have a single node pool for your container cluster, which is the larger-pool:

$ gcloud container node-pools list --cluster migration-tutorial
larger-pool   n1-highmem-2   100           1.5.7

Step 6: Cleanup

After completing this tutorial, follow these steps to remove the following resources and prevent unwanted charges to your account:

  1. Delete the service: This step will deallocate the Cloud Load Balancer created for your service:

    kubectl delete service web
  2. Wait for the Load Balancer provisioned for the web service to be deleted: The load balancer is deleted asynchronously in the background when you run kubectl delete. Wait until the load balancer is deleted by watching the output of the following command:

    gcloud compute forwarding-rules list
  3. Delete the container cluster: This step will delete the resources that make up the container cluster, such as the compute instances, disks and network resources.

    gcloud container clusters delete migration-tutorial
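The wait in step 2 can also be scripted. The sketch below polls until the forwarding-rules list is empty. It assumes that your project contains no forwarding rules other than the one created for the web Service; the function name wait_for_forwarding_rules_gone, the retry count, and the delay are hypothetical choices.

```shell
# Hypothetical helper: poll until no forwarding rules remain in the project.
# Usage: wait_for_forwarding_rules_gone [MAX_ATTEMPTS] [DELAY_SECONDS]
wait_for_forwarding_rules_gone() {
  local attempts="${1:-60}"
  local delay="${2:-5}"
  local i=0
  local rules
  while [ "$i" -lt "$attempts" ]; do
    # Succeed only if gcloud itself succeeded and printed no rule names.
    if rules="$(gcloud compute forwarding-rules list --format='value(name)')" \
        && [ -z "$rules" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "forwarding rules still present after $attempts attempts" >&2
  return 1
}
```

You would run it between deleting the Service and deleting the cluster.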
