This tutorial demonstrates how to migrate workloads running on a Google Kubernetes Engine (GKE) cluster to a new set of nodes within the same cluster without incurring downtime for your application. Such a migration can be useful if you want to migrate your workloads to nodes with a different machine type.
Background
A node pool is a subset of machines that all have the same configuration, including machine type (CPU and memory) and authorization scopes. Node pools represent a subset of nodes within a cluster; a container cluster can contain one or more node pools.
When you need to change the machine profile of your cluster's Compute Engine nodes, you can create a new node pool and then migrate your workloads over to it.
To migrate your workloads without incurring downtime, you need to:
- Mark the existing node pool as unschedulable.
- Drain the workloads running on the existing node pool.
- Delete the existing node pool.
Kubernetes, which is the cluster orchestration system of GKE clusters, automatically reschedules the evicted Pods to the new node pool as it drains the existing node pool.
Objectives
- Create a GKE cluster.
- Deploy the sample web application to the cluster.
- Create a new node pool.
- Migrate Pods to the new node pool without incurring downtime.
Costs
This tutorial uses billable components of Google Cloud, including Google Kubernetes Engine.
To generate a cost estimate based on your projected usage,
use the pricing calculator.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.
Before you begin
Take the following steps to enable the Kubernetes Engine API:
- Visit the Kubernetes Engine page in the Google Cloud console.
- Create or select a project.
- Wait for the API and related services to be enabled. This can take several minutes.
- Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
Install the following command-line tools used in this tutorial:
- gcloud is used to create and delete Kubernetes Engine clusters. gcloud is included in the gcloud CLI.
- kubectl is used to manage Kubernetes, the cluster orchestration system used by Kubernetes Engine. You can install kubectl using gcloud:
gcloud components install kubectl
Clone the sample code from GitHub:
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
cd kubernetes-engine-samples/migrating-node-pool
Set defaults for the gcloud command-line tool
To save time typing your project ID and Compute Engine zone options in the gcloud command-line tool, you can set the defaults:
gcloud config set project project-id
gcloud config set compute/zone compute-zone
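If you want to confirm the values you just set, you can print the active gcloud configuration (an optional check, not required for the tutorial):
gcloud config list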
Creating a GKE cluster
The first step is to create a container cluster to run application workloads.
The following command creates a new cluster with five nodes of the default machine type (e2-medium):
gcloud container clusters create migration-tutorial --num-nodes=5
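Cluster creation can take a few minutes. Once it completes, you can confirm that the cluster exists and that kubectl is pointing at it; cluster creation configures kubectl credentials by default, so the second command should list the new nodes:
gcloud container clusters list
kubectl get nodes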
Running a replicated application deployment
The node-pools-deployment.yaml manifest describes a six-replica Deployment of the sample web application container image.
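The manifest is included in the sample repository you cloned earlier. If you did not clone the repository, the following sketch writes an approximately equivalent file; the container image, labels, and port shown here are assumptions for illustration, not necessarily the exact contents of the sample file:
cat > node-pools-deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6          # six replicas, as described above
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: hello-app
        # Assumed sample image; substitute the image used by your own workload.
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        ports:
        - containerPort: 8080
EOF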
To deploy this manifest, run:
kubectl apply -f node-pools-deployment.yaml
You can retrieve the list of Pods started by running:
kubectl get pods
Output:
NAME                   READY     STATUS    RESTARTS   AGE
web-2212180648-80q72   1/1       Running   0          10m
web-2212180648-jwj0j   1/1       Running   0          10m
web-2212180648-pf67q   1/1       Running   0          10m
web-2212180648-pqz73   1/1       Running   0          10m
web-2212180648-rrd3b   1/1       Running   0          10m
web-2212180648-v3b18   1/1       Running   0          10m
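If the Pods are not all in the Running state yet, you can wait for the Deployment to finish rolling out; the Deployment name web is inferred from the Pod names above:
kubectl rollout status deployment/web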
Creating a node pool with a larger machine type
By default, GKE creates a node pool named default-pool for every new cluster:
gcloud container node-pools list --cluster migration-tutorial
Output:
NAME           MACHINE_TYPE   DISK_SIZE_GB   NODE_VERSION
default-pool   e2-medium      100            1.16.13-gke.401
To introduce instances with a different configuration, such as a different machine type or different authentication scopes, you need to create a new node pool.
The following command creates a new node pool named larger-pool with five high-memory instances of the e2-highmem-2 machine type:
gcloud container node-pools create larger-pool \
  --cluster=migration-tutorial \
  --machine-type=e2-highmem-2 \
  --num-nodes=5
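If you want to inspect the new pool's configuration, such as its machine type and node count, you can describe it (an optional check):
gcloud container node-pools describe larger-pool --cluster migration-tutorial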
Your container cluster should now have two node pools:
gcloud container node-pools list --cluster migration-tutorial
Output:
NAME           MACHINE_TYPE   DISK_SIZE_GB   NODE_VERSION
default-pool   e2-medium      100            1.16.13-gke.401
larger-pool    e2-highmem-2   100            1.16.13-gke.401
You can see the instances of the new node pool added to your GKE cluster:
kubectl get nodes
Output:
NAME                                                 STATUS    AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q    Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-0ng4    Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-k6jm    Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-lkrv    Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-p9j4    Ready     40m       v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk     Ready     4m        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2     Ready     4m        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0     Ready     4m        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q     Ready     4m        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p     Ready     4m        v1.16.13-gke.401
Migrating the workloads
After you create a new node pool, your workloads are still running on the
default-pool
. Kubernetes does not reschedule Pods as long as they are running
and available.
Run the following command to see which node the Pods are running on (see the
NODE
column):
kubectl get pods -o=wide
Output:
NAME                   READY     STATUS    IP         NODE
web-2212180648-80q72   1/1       Running   10.8.3.4   gke-migration-tutorial-default-pool-56e3af9a-k6jm
web-2212180648-jwj0j   1/1       Running   10.8.2.5   gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-pf67q   1/1       Running   10.8.4.4   gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-pqz73   1/1       Running   10.8.2.6   gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-rrd3b   1/1       Running   10.8.4.3   gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-v3b18   1/1       Running   10.8.1.4   gke-migration-tutorial-default-pool-56e3af9a-p9j4
To migrate these Pods to the new node pool, you must do the following:
- Cordon the existing node pool: This operation marks the nodes in the existing node pool (default-pool) as unschedulable. Kubernetes stops scheduling new Pods to these nodes once you mark them as unschedulable.
- Drain the existing node pool: This operation gracefully evicts the workloads running on the nodes of the existing node pool (default-pool).
The preceding steps cause Pods running in your existing node pool to gracefully terminate, and Kubernetes reschedules them onto other available nodes. In this case, the only available nodes are in the larger-pool node pool.
To make sure Kubernetes terminates your applications gracefully, your containers should handle the SIGTERM signal. You can use it to close active client connections and to commit or abort database transactions cleanly. In your Pod manifest, you can use the spec.terminationGracePeriodSeconds field to specify how long Kubernetes must wait before stopping the containers in the Pod. This defaults to 30 seconds. You can read more about Pod termination in the Kubernetes documentation.
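For example, if your containers need more than the default 30 seconds to shut down cleanly, you could raise the grace period on the running Deployment with a patch like the following; the value of 60 seconds is only an illustration, not a recommendation from this tutorial:
kubectl patch deployment web --patch '{"spec": {"template": {"spec": {"terminationGracePeriodSeconds": 60}}}}'
Note that patching the Pod template triggers a rolling update, so the Deployment replaces its Pods with new ones that use the updated grace period.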
You can cordon and drain nodes using the kubectl drain
command.
First, get a list of nodes in default-pool
:
kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool
Then, you can run a kubectl drain --force NODE command for each node (substitute NODE with a name from the previous command). The following shell loop iterates over each node in default-pool, marks it as unschedulable, and drains it by evicting Pods with an allotted graceful termination period of 10 seconds:
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl drain --force --ignore-daemonsets --delete-emptydir-data --grace-period=10 "$node";
done
Once this command completes, you should see that the default-pool
nodes have SchedulingDisabled
status in the node list:
kubectl get nodes
Output:
NAME                                                 STATUS                     AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q    Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-0ng4    Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-k6jm    Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-lkrv    Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-default-pool-56e3af9a-p9j4    Ready,SchedulingDisabled   1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk     Ready                      1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2     Ready                      1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0     Ready                      1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q     Ready                      1h        v1.16.13-gke.401
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p     Ready                      1h        v1.16.13-gke.401
Additionally, you should see that the Pods are now running on the
larger-pool
nodes:
kubectl get pods -o=wide
Output:
NAME                   READY     STATUS    IP         NODE
web-2212180648-3n9hz   1/1       Running   10.8.9.4   gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-88q1c   1/1       Running   10.8.7.4   gke-migration-tutorial-larger-pool-b8ec62a6-2rhk
web-2212180648-dlmjc   1/1       Running   10.8.9.3   gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-hcv46   1/1       Running   10.8.5.4   gke-migration-tutorial-larger-pool-b8ec62a6-hs6p
web-2212180648-n0nht   1/1       Running   10.8.6.4   gke-migration-tutorial-larger-pool-b8ec62a6-7fl0
web-2212180648-s51jb   1/1       Running   10.8.8.4   gke-migration-tutorial-larger-pool-b8ec62a6-4bb2
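As a final check before deleting the old node pool, you can confirm that no Pods are left on the default-pool nodes; once the migration is complete, the following filter should produce no output:
kubectl get pods -o=wide | grep default-pool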
Deleting the old node pool
Once Kubernetes reschedules all Pods in the web Deployment to the larger-pool, it is safe to delete the default-pool, as it is no longer necessary.
Run the following command to delete the default-pool
:
gcloud container node-pools delete default-pool --cluster migration-tutorial
Once this operation completes, you should have a single node pool for your
container cluster, which is the larger-pool
:
gcloud container node-pools list --cluster migration-tutorial
Output:
NAME          MACHINE_TYPE   DISK_SIZE_GB   NODE_VERSION
larger-pool   e2-highmem-2   100            1.16.13-gke.401
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the container cluster: This step deletes resources that make up the container cluster, such as the compute instances, disks, and network resources.
gcloud container clusters delete migration-tutorial
What's next
Explore other Kubernetes Engine tutorials.
Explore reference architectures, diagrams, tutorials, and best practices about Google Cloud in the Cloud Architecture Center.