Upgrading a GKE cluster running a stateful workload

This tutorial provides recommended practices for creating a stateful application and upgrading the Google Kubernetes Engine (GKE) cluster that's running it. The examples deploy a Redis application, but the same concepts apply to other types of stateful applications deployed on GKE.

Glossary

Terms used in this tutorial:

  • Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker.
  • A Redis Cluster is a distributed implementation of Redis that provides high availability. Redis Clusters consist of leader nodes and follower nodes.
  • A stateful application can remember information about its state each time that it runs. Redis is a popular in-memory database for stateful applications.

Objectives

This tutorial covers the following steps:

  • In GKE, create a Redis Cluster with three leaders and three followers on top of three GKE nodes.
  • Deploy a Redis client application. The application counts the number of requests to a website.
  • Upgrade your cluster with surge upgrade.
  • Test the application for workload and state disruption during the upgrade.

The following diagram shows you an overview of the cluster architecture you create by completing these objectives:

Architecture diagram

Costs

This tutorial uses billable components of Google Cloud, including:

  • GKE

Use the pricing calculator to generate a cost estimate based on your projected usage. New Google Cloud users might be eligible for a free trial.

Before you begin

Take the following steps to enable the Kubernetes Engine API:
  1. Visit the Kubernetes Engine page in the Google Cloud Console.
  2. Create or select a project.
  3. Wait for the API and related services to be enabled. This can take several minutes.
  4. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

Install the following command-line tools used in this tutorial:

  • gcloud is used to create and delete Kubernetes Engine clusters. gcloud is included in the Google Cloud SDK.
  • kubectl is used to manage Kubernetes, the cluster orchestration system used by Kubernetes Engine. You can install kubectl using gcloud:
    gcloud components install kubectl

Creating a GKE cluster

In this section, you create a cluster in GKE with three nodes and verify that the cluster is functioning.

Setting defaults for the gcloud command-line tool

To save time typing your project ID and Compute Engine zone options in the gcloud command-line tool, set the following defaults:

gcloud config set project PROJECT-ID
gcloud config set compute/zone COMPUTE-ZONE

Creating a GKE cluster

To create your GKE cluster, complete the following steps:

  1. Create a cluster named redis-test that has three nodes:

    gcloud container clusters create redis-test \
        --num-nodes=3
    

    Once the cluster is created, you should see output similar to the following example:

    NAME        LOCATION      MASTER_VERSION  MASTER_IP     MACHINE_TYPE   NODE_VERSION    NUM_NODES  STATUS
    redis-test  COMPUTE-ZONE  1.15.12-gke.20  35.232.77.38  n1-standard-1  1.15.12-gke.20  3          RUNNING
    
  2. Configure kubectl to communicate with the cluster:

    gcloud container clusters get-credentials redis-test
    
  3. Verify that your cluster is running:

    kubectl get nodes
    

    You should see output similar to the following example:

    NAME                                        STATUS   ROLES    AGE     VERSION
    gke-redis-test-default-pool-c4e4225c-mw3w   Ready    <none>   2m1s    v1.15.12-gke.20
    gke-redis-test-default-pool-c4e4225c-pv51   Ready    <none>   2m1s    v1.15.12-gke.20
    gke-redis-test-default-pool-c4e4225c-whl5   Ready    <none>   2m1s    v1.15.12-gke.20
    

Creating a Redis Cluster on GKE

In this section, you create a Redis Cluster on top of the GKE cluster you created in the previous section by creating a ConfigMap, a StatefulSet, and a headless Service.

  1. Clone the sample manifests:

    git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
    cd kubernetes-engine-samples/hello-app-redis/manifests
    
  2. A ConfigMap named redis-configmap.yaml stores the Redis configuration.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: redis-cluster
    data:
      redis.conf:  |+
        cluster-enabled yes
        cluster-node-timeout 15000
        cluster-config-file /data/nodes.conf
        appendonly yes
        protected-mode no
        dir /data
        port 6379
    
    

    To learn more about the Redis parameters in this ConfigMap, see the Redis Cluster configuration parameters section in the Redis Cluster tutorial.

  3. Deploy the ConfigMap by running the following command:

    kubectl apply -f redis-configmap.yaml
    
  4. The StatefulSet manifest, redis-cluster.yaml, has the following key fields:

    • The replicas field is set to 6 so that each of the three GKE nodes runs two Pods: one of the three Redis leaders and one of the three followers.
    • The volumeClaimTemplates field provides stable storage using PersistentVolumes.
    • The affinity field creates a Pod anti-affinity rule to spread Pods across the Kubernetes nodes.
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: redis
    spec:
      serviceName: "redis-service"
      replicas: 6
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
            appCluster: redis-cluster
        spec:
          terminationGracePeriodSeconds: 20
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - redis
                  topologyKey: kubernetes.io/hostname
          containers:
          - name: redis
            image: "redis"
            command:
              - "redis-server"
            args:
              - "/conf/redis.conf"
              - "--protected-mode"
              - "no"
            resources:
              requests:
                cpu: "100m"
                memory: "100Mi"
            ports:
                - name: redis
                  containerPort: 6379
                  protocol: "TCP"
                - name: cluster
                  containerPort: 16379
                  protocol: "TCP"
            volumeMounts:
            - name: conf
              mountPath: /conf
              readOnly: false
            - name: data
              mountPath: /data
              readOnly: false
          volumes:
          - name: conf
            configMap:
              name: redis-cluster
              defaultMode: 0755
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 1Gi
    
    
  5. Deploy the StatefulSet by running the following command:

    kubectl apply -f redis-cluster.yaml
    
  6. The headless Service defined in redis-service.yaml lets the Redis nodes connect to each other. To create a headless Service, set the clusterIP field to None.

    apiVersion: v1
    kind: Service
    metadata:
      name: redis-cluster
    spec:
      clusterIP: None
      ports:
      - name: redis-port
        port: 6379
        protocol: TCP
        targetPort: 6379
      selector:
        app: redis
        appCluster: redis-cluster
      sessionAffinity: None
      type: ClusterIP
    
    
  7. Deploy the Service by running the following command:

    kubectl apply -f redis-service.yaml
    
  8. Wait approximately two minutes and verify all the Pods are running by using the following command:

    kubectl get pods
    

    You should see output similar to the following example:

    NAME      READY   STATUS              RESTARTS   AGE
    redis-0   1/1     Running             0          2m29s
    redis-1   1/1     Running             0          2m8s
    redis-2   1/1     Running             0          107s
    redis-3   1/1     Running             0          85s
    redis-4   1/1     Running             0          54s
    redis-5   1/1     Running             0          23s
    
  9. Verify the persistent volumes were created by running the following command:

    kubectl get pv
    

    You should see output similar to the following example:

    NAME       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
    pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-5   standard                75s
    pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-1   standard                2m59s
    pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-3   standard                2m16s
    pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-2   standard                2m38s
    pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-0   standard                3m20s
    pvc-HASH   1Gi        RWO            Delete           Bound    default/data-redis-4   standard                104s
    

    In this output, HASH represents a hash that is appended to each persistent volume name.
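You can also list the PersistentVolumeClaims that the volumeClaimTemplates field created. This optional check is not part of the sample repository; each claim follows the naming pattern data-POD-NAME, so you should see claims data-redis-0 through data-redis-5 in the Bound state:

```shell
# Optional check: list the claims created by volumeClaimTemplates.
kubectl get pvc
```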

Assigning roles to your Redis Cluster

Once the configuration is complete, you assign roles to your Redis Cluster.

To assign the roles, complete the following steps:

  1. Set the first three Redis nodes as leaders and the last three Redis nodes as followers:

    1. Retrieve and make a copy of your Pod IP addresses:

      kubectl get pods -l app=redis -o jsonpath='{range.items[*]}{.status.podIP} {end}'
      
    2. Assign the leader and follower roles by pasting each of the Pod IP addresses into the following command and type yes when prompted:

      kubectl exec -it redis-0 -- redis-cli --cluster create --cluster-replicas 1 \
      POD-IP-1:6379 POD-IP-2:6379 POD-IP-3:6379 \
      POD-IP-4:6379 POD-IP-5:6379 POD-IP-6:6379
      
  2. Verify the Redis Cluster is running:

    kubectl exec -it redis-0 -- redis-cli cluster info
    

    You should see output similar to the following example:

    cluster_state:ok
    # ...other output...
    
  3. Log in to a Redis node to check its role. For example, to verify that redis-0 has a leader role, run the following command:

    kubectl exec -it redis-0 -- redis-cli role
    

    You should see output similar to the following example:

    1) "master"
    2) (integer) 574
    3) 1) 1) "10.28.2.3"
           2) "6379"
           3) "574"
    
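As a convenience, the steps above can be combined: the same jsonpath query can build the address list that redis-cli expects, and a small loop can report every node's role at once. The following is a sketch, not part of the sample repository; it assumes the app=redis label, the redis-0 through redis-5 Pod names, and port 6379 from the manifests above:

```shell
# Build the <ip>:<port> list from the Pod IPs and create the cluster in one
# step (redis-cli still prompts you to type yes).
IPS=$(kubectl get pods -l app=redis -o jsonpath='{range.items[*]}{.status.podIP}:6379 {end}')
kubectl exec -it redis-0 -- redis-cli --cluster create --cluster-replicas 1 ${IPS}

# Print the role of every Redis node; expect three "master" and three "slave".
for i in 0 1 2 3 4 5; do
  echo -n "redis-$i: "
  kubectl exec redis-$i -- redis-cli role | head -n 1
done
```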

Creating a Redis client application

In this section, you create an application named hello-app-redis that uses Redis as a cache database, counts the number of requests it receives, and prints out the number on a website. If the Redis Service works, the number keeps increasing.

You can download the image directly from gcr.io/google-samples/hello-app-redis:1.0 in the Google Cloud Console.

To pull the image, run the following command:

docker pull gcr.io/google-samples/hello-app-redis:1.0

To learn more about building images, see Build the container image.

Deploying the Redis client application to GKE

To deploy your application to the GKE cluster you created, you need a Deployment to define your application.

To create the Deployment, complete the following steps:

  1. The file named app-deployment.yaml contains the details for the application.

    # Copyright 2020 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: hello-web
      name: hello-web
      namespace: default
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: hello-web
      template:
        metadata:
          labels:
            app: hello-web
        spec:
          # Pod anti affinity config START
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - hello-web
                topologyKey: kubernetes.io/hostname
          # Pod anti affinity config END
          containers:
          - image: gcr.io/google-samples/hello-app-redis:1.0  # change to the image name you built
            name: hello-app
            # Readiness probe config START
            readinessProbe:
              failureThreshold: 1
              httpGet:
                path: /healthz
                port: 8080
                scheme: HTTP
              initialDelaySeconds: 1
              periodSeconds: 1
              successThreshold: 1
              timeoutSeconds: 1
    

    To learn more about the probes and Pod affinity rules used in this Deployment, see GKE best practices: Designing and building highly available clusters.

  2. Apply the Deployment by running the following command:

    kubectl apply -f app-deployment.yaml
    

    Make sure that you run this command in the same directory that contains app-deployment.yaml.

  3. Expose the application through a load balancer:

    kubectl expose deployment hello-web \
        --type=LoadBalancer \
        --port 80 \
        --target-port 8080
    
  4. Wait approximately one minute and retrieve the application's external IP address by running the following command:

    kubectl get service
    

    From the output, copy the value listed in hello-web's EXTERNAL-IP column:

    NAME             TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)              AGE
    hello-web        LoadBalancer   10.13.10.55   EXTERNAL_IP   80:30703/TCP         166m
    
  5. Verify the application is working by pasting the EXTERNAL_IP into your web browser. You should see output similar to the following example:

    I have been hit [1] times since deployment!
    

    Take note of the visit number. You need it in the Testing workload disruption section.

  6. Set a variable for the EXTERNAL_IP you just copied. You use this value when you create scripts to test your application in the next section:

    export IP=EXTERNAL_IP
    

Upgrading the GKE cluster and testing workload disruption

In the following sections, you upgrade your GKE cluster and observe what happens by using scripts you create.

Testing your application

In this section, you use two scripts: one that sends requests to your application and one that measures the success rate of those requests. You use these scripts to measure what happens when you upgrade your cluster.

To create the scripts:

  1. Change to the directory containing the scripts:

    cd
    cd kubernetes-engine-samples/hello-app-redis/scripts
    
  2. The script named generate_load.sh sends a configurable number of queries per second (QPS) to your application. The script appends each HTTP response code to a file named output in the current directory. The output file is used by the script you create in the next step.

    #!/bin/bash
    # Copyright 2020 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    # Usage: generate_load.sh <IP> <QPS>
    
    IP=$1
    QPS=$2
    
    while true
      do for N in $(seq 1 $QPS)
        do curl -I -m 5 -s -w "%{http_code}\n" -o /dev/null http://${IP}/ >> output &
        done
      sleep 1
    done
    
  3. A second script named print_error_rate.sh calculates the success rate based on the output generated by generate_load.sh.

    #!/bin/bash
    # Copyright 2020 Google LLC
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    # Usage: watch ./print_error_rate.sh
    
    TOTAL=$(cat output | wc -l);
    SUCCESS=$(grep "200" output |  wc -l);
    ERROR1=$(grep "000" output |  wc -l)
    ERROR2=$(grep "503" output |  wc -l)
    ERROR3=$(grep "500" output |  wc -l)
    SUCCESS_RATE=$(($SUCCESS * 100 / TOTAL))
    ERROR_RATE=$(($ERROR1 * 100 / TOTAL))
    ERROR_RATE_2=$(($ERROR2 * 100 / TOTAL))
    ERROR_RATE_3=$(($ERROR3 * 100 / TOTAL))
    echo "Success rate: $SUCCESS/$TOTAL (${SUCCESS_RATE}%)"
    echo "App network Error rate: $ERROR1/$TOTAL (${ERROR_RATE}%)"
    echo "Resource Error rate: $ERROR2/$TOTAL (${ERROR_RATE_2}%)"
    echo "Redis Error rate: $ERROR3/$TOTAL (${ERROR_RATE_3}%)"
    
  4. Give yourself permission to run the scripts:

    chmod u+x generate_load.sh print_error_rate.sh
    
  5. Set a variable for the number of QPS. The generate_load.sh script uses this value, together with the IP variable you set for EXTERNAL_IP. We recommend a value of 40.

    export QPS=40
    
  6. Run the generate_load.sh script to start sending QPS:

    ./generate_load.sh $IP $QPS 2>&1
    
  7. Leave the generate_load.sh script running and open a new terminal. In the new terminal, run the print_error_rate.sh script to check the error rate:

    watch ./print_error_rate.sh
    

    You should see a 100% success rate and 0% error rates while the requests are being made.

  8. Leave both scripts running and open a third terminal in preparation for the next section.
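Before moving on, it can help to see the counting logic from print_error_rate.sh in isolation. The following sketch reproduces the success-rate arithmetic against a small sample output file; the response codes below are made up for illustration and don't come from a real load test:

```shell
# Recreate the counting logic from print_error_rate.sh with sample data.
cd "$(mktemp -d)"
printf '200\n200\n200\n000\n' > output   # three successes, one network error

TOTAL=$(cat output | wc -l)
SUCCESS=$(grep "200" output | wc -l)
SUCCESS_RATE=$(($SUCCESS * 100 / TOTAL))
echo "Success rate: $SUCCESS/$TOTAL (${SUCCESS_RATE}%)"
# prints: Success rate: 3/4 (75%)
```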

Upgrading your cluster

In this section, you upgrade your GKE cluster while the test scripts continue to run:

  1. In the terminal you just opened, define the settings for the upgrade using surge upgrade:

    gcloud container node-pools update default-pool \
      --max-surge-upgrade=1 \
      --max-unavailable-upgrade=0 \
      --cluster=redis-test
    

    With this configuration (maxSurge=1 and maxUnavailable=0), GKE adds one surge node before draining an existing node, so only one node is upgraded at a time and the node pool keeps its full capacity throughout the upgrade. This setting minimizes Pod disruption while the upgrade progresses conservatively.

  2. Determine which GKE version the redis-test cluster is using:

    V=$(gcloud container clusters describe redis-test --format="value(currentMasterVersion)")
    echo $V
    

    You should see output similar to the following example: 1.15.12-gke.20.

  3. Retrieve a list of available Kubernetes versions:

    gcloud container get-server-config
    
  4. In the list of versions, locate the validMasterVersions: section and look for the redis-test cluster version you retrieved in the previous step. To avoid version skew, copy the version from the list that is immediately above your current version.

  5. Upgrade the cluster's control plane to the version you selected and type y when prompted:

    gcloud container clusters upgrade redis-test \
        --master \
        --cluster-version VERSION
    

    Replace VERSION with the version you selected from the list in the previous step.

    The control plane upgrade takes several minutes.

  6. Upgrade the cluster's nodes to the version you selected and type y when prompted:

    gcloud container clusters upgrade redis-test \
        --cluster-version=VERSION \
        --node-pool=default-pool
    

    Replace VERSION with the version you selected from the list.
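After the node-pool upgrade completes, you can optionally confirm both the surge settings from step 1 and the new node version in one command. This check is not part of the original flow; the upgradeSettings and version fields are from the GKE node pool API:

```shell
# Confirm the surge settings and the upgraded node-pool version.
gcloud container node-pools describe default-pool \
    --cluster=redis-test \
    --format="value(upgradeSettings.maxSurge, upgradeSettings.maxUnavailable, version)"
```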

Testing workload disruption

In this section, you observe how the upgrade affected your application's workload and state.

  1. Return to the terminal window running ./print_error_rate.sh and observe how the success rate changed during the upgrade. You should notice a slight decrease in the success rate and a slight increase in the app network error rate as the nodes are taken down to be upgraded.

    In the Success rate field, you'll see how many visits were successfully made to the website. Take a note of this value.

  2. Stop both scripts from running by entering CTRL+C in the relevant terminals.

  3. Return to the website for your application by entering its IP address (this is the EXTERNAL_IP you copied during the Deploying to GKE section) into your browser.

  4. Observe the visit number for your application. The number you see should equal:

    ORIGINAL_VISIT_NUMBER + SUCCESSFUL_VISIT_NUMBER

    where ORIGINAL_VISIT_NUMBER is the number you recorded in the final step of Deploying to GKE and SUCCESSFUL_VISIT_NUMBER is the value you recorded in the first step of this section.
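The expected count can be sketched with shell arithmetic. The numbers below are hypothetical placeholders; substitute the two values you actually recorded:

```shell
# Hypothetical values: replace with the numbers you recorded.
ORIGINAL_VISIT_NUMBER=1
SUCCESSFUL_VISIT_NUMBER=2400
echo "Expected visit number: $((ORIGINAL_VISIT_NUMBER + SUCCESSFUL_VISIT_NUMBER))"
# prints: Expected visit number: 2401
```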

Cleaning up

After you've finished the Upgrading a stateful workload tutorial, you can clean up the resources that you created on Google Cloud so they won't take up quota and you won't be billed for them in the future. The following sections describe how to delete or turn off these resources.

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Cloud Console, go to the Manage resources page.

    Go to the Manage resources page

  2. In the project list, select the project that you want to delete and then click Delete .
  3. In the dialog, type the project ID and then click Shut down to delete the project.

Deleting clusters

To delete the cluster you created for this tutorial, run the following command:

gcloud container clusters delete redis-test

What's next

  • Try out other Google Cloud features for yourself. Have a look at our tutorials.