This tutorial demonstrates how to migrate workloads running on a GKE cluster to a new set of nodes within the same cluster without incurring downtime for your application. Such a migration is useful, for example, when you want to move your workloads to nodes with a different machine type.
Background
A node pool is a subset of machines that all have the same configuration, including machine type (CPU and memory) and authorization scopes. Node pools represent a subset of nodes within a cluster; a container cluster can contain one or more node pools.
When you need to change the machine profile of your cluster's Compute Engine instances, you can create a new node pool and then migrate your workloads over to the new node pool.
To migrate your workloads without incurring downtime, you need to:
- Mark the existing node pool as unschedulable.
- Drain the workloads running on the existing node pool.
- Delete the existing node pool.
Kubernetes, which is the cluster orchestration system of GKE clusters, automatically reschedules the evicted Pods to the new node pool as it drains the existing node pool.
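In shell terms, the core of that procedure comes down to a short sequence of commands, previewed here and covered in detail in the steps below (the cluster and pool names are the ones used throughout this tutorial):

for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl cordon "$node"                      # mark each old node as unschedulable
done
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do
  kubectl drain --ignore-daemonsets "$node"   # evict Pods so they are rescheduled onto the new pool
done
gcloud container node-pools delete default-pool --cluster migration-tutorial   # remove the old node pool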
Before you begin
Take the following steps to enable the Kubernetes Engine API:
- Visit the Kubernetes Engine page in the Google Cloud Console.
- Create or select a project.
- Wait for the API and related services to be enabled. This can take several minutes.
- Make sure that billing is enabled for your Google Cloud project. Learn how to confirm billing is enabled for your project.

Install the following command-line tools used in this tutorial:
- gcloud is used to create and delete Kubernetes Engine clusters. gcloud is included in the Google Cloud SDK.
- kubectl is used to manage Kubernetes, the cluster orchestration system used by Kubernetes Engine. You can install kubectl using gcloud:

gcloud components install kubectl
Set defaults for the gcloud command-line tool
To save time typing your project ID and Compute Engine zone options in the gcloud command-line tool, you can set the defaults:
gcloud config set project [PROJECT_ID]
gcloud config set compute/zone [COMPUTE_ENGINE_ZONE]
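For example, assuming a hypothetical project ID of my-gke-project and the us-central1-b zone, you could set and verify the defaults like this:

gcloud config set project my-gke-project
gcloud config set compute/zone us-central1-b
gcloud config list    # confirm the active project and default zone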
Step 1: Create a GKE cluster
The first step is to create a container cluster to run application workloads.
The following command creates a new cluster with five nodes of the default machine type (n1-standard-1):
gcloud container clusters create migration-tutorial --num-nodes=5
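If kubectl is not already pointed at this cluster, you can fetch the cluster credentials and confirm the cluster is up with standard gcloud commands:

gcloud container clusters get-credentials migration-tutorial   # configure kubectl to talk to the new cluster
gcloud container clusters list                                 # the cluster should be listed as RUNNING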
Step 2: Run a replicated application deployment
The following command creates a Deployment with six replicas of the sample web application container image:
kubectl run web --image=gcr.io/google-samples/hello-app:1.0 \
    --replicas=6 --limits='cpu=100m,memory=80Mi'
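Note that on newer kubectl releases, kubectl run creates a single Pod rather than a Deployment and no longer accepts --replicas. A rough equivalent on current versions (a sketch, not part of the original tutorial) is:

kubectl create deployment web --image=gcr.io/google-samples/hello-app:1.0 --replicas=6   # create the Deployment
kubectl set resources deployment web --limits=cpu=100m,memory=80Mi                       # apply the same resource limits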
You can retrieve the list of the Pods started by running:
kubectl get pods
Output:
NAME                   READY     STATUS    RESTARTS   AGE
web-2212180648-80q72   1/1       Running   0          10m
web-2212180648-jwj0j   1/1       Running   0          10m
web-2212180648-pf67q   1/1       Running   0          10m
web-2212180648-pqz73   1/1       Running   0          10m
web-2212180648-rrd3b   1/1       Running   0          10m
web-2212180648-v3b18   1/1       Running   0          10m
Step 3: Create a node pool with a larger machine type
By default, GKE creates a node pool named default-pool for every new cluster:
gcloud container node-pools list --cluster migration-tutorial
Output:
NAME           MACHINE_TYPE    DISK_SIZE_GB   NODE_VERSION
default-pool   n1-standard-1   100            1.5.7
To introduce instances with a different configuration, such as a different machine type or different authentication scopes, you need to create a new node pool.
The following command creates a new node pool named larger-pool with five high-memory instances of the n1-highmem-2 machine type (a larger machine type than the GKE default n1-standard-1):
gcloud container node-pools create larger-pool \
    --cluster=migration-tutorial \
    --machine-type=n1-highmem-2 \
    --num-nodes=5
Your container cluster should now have two node pools:
gcloud container node-pools list --cluster migration-tutorial
Output:
NAME           MACHINE_TYPE    DISK_SIZE_GB   NODE_VERSION
default-pool   n1-standard-1   100            1.5.7
larger-pool    n1-highmem-2    100            1.5.7
You can see the instances of the new node pool added to your GKE cluster:
kubectl get nodes
Output:
NAME                                                STATUS    AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-0ng4   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-k6jm   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-lkrv   Ready     40m       v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-p9j4   Ready     40m       v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q    Ready     4m        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p    Ready     4m        v1.5.7
Step 4: Migrate the workloads
After you create a new node pool, your workloads are still running on the default-pool. Kubernetes does not reschedule Pods as long as they are running and available.
Run the following command to see which node the pods are running on (see the NODE column):
kubectl get pods -o=wide
Output:
NAME                   READY     STATUS    IP         NODE
web-2212180648-80q72   1/1       Running   10.8.3.4   gke-migration-tutorial-default-pool-56e3af9a-k6jm
web-2212180648-jwj0j   1/1       Running   10.8.2.5   gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-pf67q   1/1       Running   10.8.4.4   gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-pqz73   1/1       Running   10.8.2.6   gke-migration-tutorial-default-pool-56e3af9a-0ng4
web-2212180648-rrd3b   1/1       Running   10.8.4.3   gke-migration-tutorial-default-pool-56e3af9a-lkrv
web-2212180648-v3b18   1/1       Running   10.8.1.4   gke-migration-tutorial-default-pool-56e3af9a-p9j4
To migrate these Pods to the new node pool, you must perform the following steps:
- Cordon the existing node pool: This operation marks the nodes in the existing node pool (default-pool) as unschedulable. Kubernetes stops scheduling new Pods to these nodes once you mark them as unschedulable.
- Drain the existing node pool: This operation evicts the workloads running on the nodes of the existing node pool (default-pool) gracefully.
The above steps cause Pods running in your existing node pool to gracefully terminate, and Kubernetes reschedules them onto other available nodes. In this case, the only available nodes are the ones in the larger-pool created in Step 3.
To make sure Kubernetes terminates your applications gracefully, your containers should handle the SIGTERM signal. This can be used to close active connections to clients and commit or abort database transactions in a clean way. In your Pod manifest, you can use the spec.terminationGracePeriodSeconds field to specify how long Kubernetes must wait before killing the containers in the Pod. This defaults to 30 seconds. You can read more about Pod termination in the Kubernetes documentation.
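For illustration (a minimal sketch, not part of the original tutorial), you could raise the grace period of the web Deployment created earlier to 60 seconds with a strategic merge patch:

kubectl patch deployment web -p '{"spec": {"template": {"spec": {"terminationGracePeriodSeconds": 60}}}}'

Changing this field updates the Pod template and triggers a rolling update of the Deployment, so it is best done before you start draining nodes.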
First, cordon the nodes in the default-pool. You can run the following command to get a list of nodes in this node pool:
kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool
Then cordon each node by running a kubectl cordon NODE command (substitute NODE with the names from the previous command). The following command iterates over each node and marks it as unschedulable:
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do kubectl cordon "$node"; done
Output:
node "gke-migration-tutorial-default-pool-56e3af9a-059q" cordoned node "gke-migration-tutorial-default-pool-56e3af9a-0ng4" cordoned node "gke-migration-tutorial-default-pool-56e3af9a-k6jm" cordoned node "gke-migration-tutorial-default-pool-56e3af9a-lkrv" cordoned node "gke-migration-tutorial-default-pool-56e3af9a-p9j4" cordoned
Now you should see that the default-pool nodes have SchedulingDisabled status in the node list:
kubectl get nodes
Output:
NAME                                                STATUS                     AGE       VERSION
gke-migration-tutorial-default-pool-56e3af9a-059q   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-0ng4   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-k6jm   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-lkrv   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-default-pool-56e3af9a-p9j4   Ready,SchedulingDisabled   1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-2rhk    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-4bb2    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-7fl0    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-cx9q    Ready                      1h        v1.5.7
gke-migration-tutorial-larger-pool-b8ec62a6-hs6p    Ready                      1h        v1.5.7
Next, drain Pods on each node gracefully. To perform the drain, use the kubectl drain command, which evicts the Pods running on a node. You can run kubectl drain --force NODE, substituting NODE with the same list of names passed to the kubectl cordon command.
The following shell command iterates over each node in default-pool and drains it by evicting Pods with an allotted graceful termination period of 10 seconds:
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool -o=name); do kubectl drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node"; done
Once this command completes, you should see that the Pods are now running on the larger-pool nodes:
kubectl get pods -o=wide
Output:
NAME                   READY     STATUS    IP         NODE
web-2212180648-3n9hz   1/1       Running   10.8.9.4   gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-88q1c   1/1       Running   10.8.7.4   gke-migration-tutorial-larger-pool-b8ec62a6-2rhk
web-2212180648-dlmjc   1/1       Running   10.8.9.3   gke-migration-tutorial-larger-pool-b8ec62a6-cx9q
web-2212180648-hcv46   1/1       Running   10.8.5.4   gke-migration-tutorial-larger-pool-b8ec62a6-hs6p
web-2212180648-n0nht   1/1       Running   10.8.6.4   gke-migration-tutorial-larger-pool-b8ec62a6-7fl0
web-2212180648-s51jb   1/1       Running   10.8.8.4   gke-migration-tutorial-larger-pool-b8ec62a6-4bb2
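Before removing the old pool, it is worth confirming that the Deployment is back at full strength. A quick check (not part of the original tutorial) is:

kubectl get deployment web    # all six replicas should be reported as available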
Step 5: Delete the old node pool
Once Kubernetes reschedules all Pods in the web Deployment to the larger-pool, it is safe to delete the default-pool, as it is no longer necessary.
Run the following command to delete the default-pool:
gcloud container node-pools delete default-pool --cluster migration-tutorial
Once this operation completes, you should have a single node pool for your container cluster, which is the larger-pool:
gcloud container node-pools list --cluster migration-tutorial
Output:
NAME          MACHINE_TYPE   DISK_SIZE_GB   NODE_VERSION
larger-pool   n1-highmem-2   100            1.5.7
Cleaning up
To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:
Delete the container cluster: This step deletes the resources that make up the container cluster, such as the compute instances, disks, and network resources.
gcloud container clusters delete migration-tutorial
What's next
Explore other Kubernetes Engine tutorials.
Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.