Upgrading Anthos Service Mesh on GKE

This guide explains how to upgrade Anthos Service Mesh from version 1.4.5+ to version 1.5.4 on Google Kubernetes Engine.

Redeploying the Anthos Service Mesh control plane components takes about 5 to 10 minutes to complete. Additionally, you need to inject new sidecar proxies in all of your workloads so they are updated with the current Anthos Service Mesh version. The time it takes to update the sidecar proxies depends on many factors, such as the number of pods, the number of nodes, deployment scaling settings, pod disruption budgets, and other configuration settings. A rough estimate of the time that it takes to update the sidecar proxies is 100 pods per minute.
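
For planning purposes, the rough rate quoted above can be turned into a quick calculation. The following is a minimal sketch, not part of the Anthos Service Mesh tooling: the function name estimate_sidecar_minutes is hypothetical, and the result is only as accurate as the 100 pods per minute estimate.

```shell
# Hypothetical helper: estimate the minutes needed to update sidecar proxies,
# assuming the guide's rough rate of 100 pods per minute.
estimate_sidecar_minutes() {
  pods="$1"
  rate=100
  # Integer ceiling division: a partial minute is rounded up.
  echo $(( (pods + rate - 1) / rate ))
}

# Example: a cluster with 250 injected pods.
estimate_sidecar_minutes 250   # prints 3
```

Factors such as pod disruption budgets and deployment scaling settings can make the real time longer, so treat the number as a ballpark figure only.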

Overview of the upgrade

This section outlines the steps that you take to upgrade Anthos Service Mesh.

Prepare

  1. Review the Supported features and this upgrade guide to become familiar with the features and the upgrade process.

  2. If you enabled optional features when you installed the previous version of Anthos Service Mesh by adding --set values flags to the istioctl manifest apply command line, you need to use the same flags when you run istioctl manifest apply to install 1.5.4.

  3. If you enabled optional features when you installed the previous version of Anthos Service Mesh by adding the -f flag to the istioctl manifest apply command line to specify a YAML file, you need to convert the YAML from the IstioControlPlane API to the IstioOperator API, and specify the updated YAML when you run istioctl manifest apply to install 1.5.4.

  4. If you are upgrading Anthos Service Mesh on a private cluster, you must add a firewall rule to open port 15017 if you want to use automatic sidecar injection. If you don't add the firewall rule and automatic sidecar injection is enabled, you get an error when you deploy workloads. For details on adding a firewall rule, see Adding firewall rules for specific use cases.

  5. Schedule downtime. Upgrading can take up to 1 hour, depending on the scale of the cluster.

Upgrade

  1. Follow the steps in this guide to prepare for installing Anthos Service Mesh.

  2. Upgrade Anthos Service Mesh.

  3. Validate the installation.

  4. Update sidecar proxies.

  5. Test your application to verify that the workloads are working correctly.

Pruning 1.4 resources

If you installed Anthos Service Mesh 1.4 using the alpha version of the Anthos CLI, you might need to prune the 1.4 resources before upgrading to version 1.5.4.

Check whether the Anthos Service Mesh resources have operator.istio.io/component labels:

kubectl get all -n istio-system --selector operator.istio.io/component

If the command returns No resources found in istio-system namespace, use the following commands to prune the Anthos Service Mesh 1.4 resources.

kubectl delete deploy -n istio-system promsd
kubectl delete all -n istio-system --selector 'app in (galley, istio-ingressgateway, istio-nodeagent, sidecarInjectorWebhook, promsd, pilot)'
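
The check and the prune commands above can be combined so that pruning only happens when no labeled resources exist. This is a sketch under stated assumptions: the function name prune_asm_14_if_unlabeled is hypothetical, and it relies on kubectl get with -o name printing nothing on standard output when no resource matches the selector.

```shell
# Sketch only: wraps the check and prune commands from this section.
# The function name prune_asm_14_if_unlabeled is hypothetical.
prune_asm_14_if_unlabeled() {
  # -o name prints one resource name per line; an empty result means that
  # no resource carries the operator.istio.io/component label.
  if ! labeled="$(kubectl get all -n istio-system \
      --selector operator.istio.io/component -o name 2>/dev/null)"; then
    echo "kubectl get failed; not pruning." >&2
    return 1
  fi

  if [ -n "$labeled" ]; then
    echo "Labeled resources found; no pruning needed."
    return 0
  fi

  echo "No labeled resources found; pruning Anthos Service Mesh 1.4 resources."
  kubectl delete deploy -n istio-system promsd
  kubectl delete all -n istio-system \
    --selector 'app in (galley, istio-ingressgateway, istio-nodeagent, sidecarInjectorWebhook, promsd, pilot)'
}

# Usage (not run here):
#   prune_asm_14_if_unlabeled
```

Because the delete commands are destructive, you might prefer to run the check by hand first and call the function only after you have reviewed its output.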

Setting project and cluster defaults

  1. Get the project ID of the project that the cluster was created in:

    gcloud

    gcloud projects list

    Console

    1. In the Cloud Console, go to the Dashboard page:

      Go to the Dashboard page

    2. Click the Select from drop-down list at the top of the page. In the Select from window that appears, select your project. The project ID is displayed on the project Dashboard Project info card.

  2. Create an environment variable for the project ID:

    export PROJECT_ID=YOUR_PROJECT_ID
  3. Set the default project ID for the gcloud command-line tool:

    gcloud config set project ${PROJECT_ID}
    
  4. Create the following environment variables:

    • Set the cluster name:

      export CLUSTER_NAME=YOUR_CLUSTER_NAME
    • Set the CLUSTER_LOCATION to either your cluster zone or cluster region:

      export CLUSTER_LOCATION=YOUR_ZONE_OR_REGION
  5. Set the default zone or region for the gcloud command-line tool.

    • If you have a single-zone cluster, set the default zone:

      gcloud config set compute/zone ${CLUSTER_LOCATION}
    • If you have a regional cluster, set the default region:

      gcloud config set compute/region ${CLUSTER_LOCATION}
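
If you script this step, the choice between compute/zone and compute/region can be derived from the value of CLUSTER_LOCATION. The following is a minimal sketch that assumes the usual naming pattern, where a zone name ends in a hyphen and a single letter (for example, us-central1-a) and a region name does not (for example, us-central1). The function names location_kind and default_location_command are hypothetical.

```shell
# Hypothetical helper: guess whether a location is a zone or a region.
# Assumption: zone names end in "-<letter>", region names end in a digit.
location_kind() {
  case "$1" in
    *-[a-z]) echo "zone" ;;
    *)       echo "region" ;;
  esac
}

# Prints the matching gcloud command instead of running it, so that you can
# review it first.
default_location_command() {
  echo "gcloud config set compute/$(location_kind "$1") $1"
}

# Usage (not run here):
#   default_location_command "${CLUSTER_LOCATION}"
```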

Setting credentials and permissions

  1. Get authentication credentials to interact with the cluster:
    gcloud container clusters get-credentials ${CLUSTER_NAME}
  2. Grant cluster admin permissions to the current user. You need these permissions to create the necessary role-based access control (RBAC) rules for Anthos Service Mesh:
    kubectl create clusterrolebinding cluster-admin-binding \
      --clusterrole=cluster-admin \
      --user="$(gcloud config get-value core/account)"

    If you see the "cluster-admin-binding" already exists error, you can safely ignore it and continue with the existing cluster-admin-binding.
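
In a script, you can avoid the "already exists" error by checking for the binding before creating it. This is a sketch rather than a required step; the function name ensure_cluster_admin_binding is hypothetical, and the kubectl and gcloud commands are the ones shown above.

```shell
# Sketch: create cluster-admin-binding only if it does not already exist.
# The function name ensure_cluster_admin_binding is hypothetical.
ensure_cluster_admin_binding() {
  if kubectl get clusterrolebinding cluster-admin-binding >/dev/null 2>&1; then
    echo "cluster-admin-binding already exists; reusing it."
    return 0
  fi
  kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole=cluster-admin \
    --user="$(gcloud config get-value core/account)"
}

# Usage (not run here):
#   ensure_cluster_admin_binding
```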

Downloading the installation file

    Linux

  1. Download the Anthos Service Mesh installation file to your current working directory:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.5.4-asm.2-linux.tar.gz
  2. Download the signature file and use openssl to verify the signature:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.5.4-asm.2-linux.tar.gz.1.sig
    openssl dgst -verify - -signature istio-1.5.4-asm.2-linux.tar.gz.1.sig istio-1.5.4-asm.2-linux.tar.gz <<'EOF'
    -----BEGIN PUBLIC KEY-----
    MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWZrGCUaJJr1H8a36sG4UUoXvlXvZ
    wQfk16sxprI2gOJ2vFFggdq3ixF2h4qNBt0kI7ciDhgpwS8t+/960IsIgw==
    -----END PUBLIC KEY-----
    EOF

    The expected output is: Verified OK

    Mac OS

  1. Download the Anthos Service Mesh installation file to your current working directory:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.5.4-asm.2-osx.tar.gz
  2. Download the signature file and use openssl to verify the signature:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.5.4-asm.2-osx.tar.gz.1.sig
    openssl dgst -sha256 -verify /dev/stdin -signature istio-1.5.4-asm.2-osx.tar.gz.1.sig istio-1.5.4-asm.2-osx.tar.gz <<'EOF'
    -----BEGIN PUBLIC KEY-----
    MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWZrGCUaJJr1H8a36sG4UUoXvlXvZ
    wQfk16sxprI2gOJ2vFFggdq3ixF2h4qNBt0kI7ciDhgpwS8t+/960IsIgw==
    -----END PUBLIC KEY-----
    EOF

    The expected output is: Verified OK

    Windows

  1. Download the Anthos Service Mesh installation file to your current working directory:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.5.4-asm.2-win.zip
  2. Download the signature file and use openssl to verify the signature:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.5.4-asm.2-win.zip.1.sig
    openssl dgst -verify - -signature istio-1.5.4-asm.2-win.zip.1.sig istio-1.5.4-asm.2-win.zip <<'EOF'
    -----BEGIN PUBLIC KEY-----
    MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWZrGCUaJJr1H8a36sG4UUoXvlXvZ
    wQfk16sxprI2gOJ2vFFggdq3ixF2h4qNBt0kI7ciDhgpwS8t+/960IsIgw==
    -----END PUBLIC KEY-----
    EOF

    The expected output is: Verified OK

  3. Extract the contents of the file to any location on your file system. For example, to extract the contents to the current working directory:
    tar xzf istio-1.5.4-asm.2-linux.tar.gz

    The command creates an installation directory in your current working directory named istio-1.5.4-asm.2 that contains:

    • Sample applications in samples
    • The following tools in the bin directory:
      • istioctl: You use istioctl to install Anthos Service Mesh.
      • asmctl: You use asmctl to help validate your security configuration after installing Anthos Service Mesh. (Currently, asmctl isn't supported on GKE on-prem.)

  4. Ensure that you're in the Anthos Service Mesh installation's root directory.
    cd istio-1.5.4-asm.2
  5. For convenience, add the tools in the /bin directory to your PATH:
    export PATH=$PWD/bin:$PATH
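
The per-platform file names in this section differ only in their suffix, so a script can select the right archive from the operating system name. The following is a minimal sketch; the function names asm_archive_for_os and asm_download_urls are hypothetical, and treating every system other than Linux and macOS (Darwin) as Windows is an assumption made for illustration.

```shell
# Hypothetical helper: map `uname -s` output to the archive names used in
# this section. Assumption: anything other than Linux or Darwin (macOS) is
# treated as Windows.
asm_archive_for_os() {
  os="${1:-$(uname -s)}"
  case "$os" in
    Linux)  echo "istio-1.5.4-asm.2-linux.tar.gz" ;;
    Darwin) echo "istio-1.5.4-asm.2-osx.tar.gz" ;;
    *)      echo "istio-1.5.4-asm.2-win.zip" ;;
  esac
}

# Prints the download URLs for the archive and its signature file.
asm_download_urls() {
  archive="$(asm_archive_for_os "$1")"
  base="https://storage.googleapis.com/gke-release/asm"
  echo "${base}/${archive}"
  echo "${base}/${archive}.1.sig"
}

# Usage (not run here):
#   asm_download_urls | while read -r url; do curl -LO "$url"; done
```

The signature still needs to be verified with openssl as described in the steps for your platform.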

Preparing resource configuration files

When you run the istioctl manifest apply command to upgrade Anthos Service Mesh, you specify -f istio-operator.yaml on the command line. This file contains information about your project and cluster that is needed to enable the Mesh telemetry and Mesh security features. You need to download the istio-operator.yaml and other resource configuration files and set the project and cluster information.

To prepare the resource configuration files:

  1. If you haven't already, install kpt:

    gcloud components install kpt
    
  2. Optionally, create a new directory for the Anthos Service Mesh package resource configuration files. If you plan to set up more than one cluster, you might want to use the cluster name as the directory name.

  3. Change to the directory where you want to download the Anthos Service Mesh package.

  4. Download the Anthos Service Mesh package to the current working directory:

    kpt pkg get \
    https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@release-1.5-asm .
    

  5. Set the cluster name:

      kpt cfg set asm gcloud.container.cluster ${CLUSTER_NAME}

  6. Optionally, customize the resource configuration files by using the kpt setters. By default, these setters use the defaults for gcloud config. If you didn't set the gcloud config defaults, or if you want to change the values, run the following setters:

    • Set the project ID:

      kpt cfg set asm gcloud.core.project ${PROJECT_ID}
    • Set the default zone or region:

      kpt cfg set asm gcloud.compute.location ${CLUSTER_LOCATION}
  7. Optionally, you can check in the resource configuration files to your own source control system, such as Cloud Source Repositories, so that you can track changes to the files.

Upgrading Anthos Service Mesh

This section explains how to upgrade Anthos Service Mesh and enable:

  • The Supported default features listed on the Supported features page.
  • Anthos Service Mesh certificate authority (Mesh CA).
  • The telemetry data pipeline that powers the Anthos Service Mesh dashboards in the Google Cloud Console.

For information on enabling the Supported optional features, see Enabling optional features.

To upgrade Anthos Service Mesh:

Choose one of the following commands to configure Anthos Service Mesh in PERMISSIVE mutual TLS (mTLS) authentication mode or STRICT mTLS mode.

PERMISSIVE mTLS

istioctl manifest apply --set profile=asm \
  -f asm/cluster/istio-operator.yaml

STRICT mTLS

istioctl manifest apply --set profile=asm \
  -f asm/cluster/istio-operator.yaml \
  --set values.global.mtls.enabled=true
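
If you drive the upgrade from a script, it can help to build the command from the chosen mTLS mode and review it before running it. This is a sketch only: the function name asm_apply_command is hypothetical, and the flags are exactly the ones shown in the two commands above.

```shell
# Sketch: print the upgrade command for a given mTLS mode without running it.
# The function name asm_apply_command is hypothetical.
asm_apply_command() {
  mode="${1:-permissive}"
  cmd="istioctl manifest apply --set profile=asm -f asm/cluster/istio-operator.yaml"
  case "$mode" in
    permissive) echo "$cmd" ;;
    strict)     echo "$cmd --set values.global.mtls.enabled=true" ;;
    *)          echo "unknown mTLS mode: $mode" >&2; return 1 ;;
  esac
}

# Usage (not run here):
#   asm_apply_command strict
```

Printing the command rather than executing it keeps the choice of mode visible in logs and leaves the final decision to run it with you.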

Check the control plane components

Upgrading requires reinstalling the control plane components, which takes about 5 to 10 minutes to complete. The old control plane components are terminated and then deleted as the new components are installed. You can check the progress by looking at the value in the AGE column of the workloads.

kubectl get pod -n istio-system

Example output:

NAME                                     READY   STATUS        RESTARTS   AGE
istio-ingressgateway-5bfdf7c586-v6wxx    2/2     Terminating   0          25m
istio-ingressgateway-7b598c5557-b88md    2/2     Running       0          5m44s
istiod-78cdbbbdb-d7tps                   1/1     Running       0          5m16s
promsd-576b8db4d6-lqf64                  2/2     Running       1          5m26s

In this example, there are two instances of istio-ingressgateway. The instance with 25m in the AGE column is being terminated. All the other components are newly installed.
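
Instead of rereading the output by hand, you can count how many pods are still terminating. The following is a minimal sketch that parses the default kubectl get pod table, assuming that STATUS is the third column as in the example output above. The function name count_terminating is hypothetical.

```shell
# Hypothetical helper: count pods whose STATUS column is Terminating.
# Reads `kubectl get pod` output on standard input and skips the header row.
count_terminating() {
  awk 'NR > 1 && $3 == "Terminating" { n++ } END { print n + 0 }'
}

# Usage (not run here): wait until the old control plane pods are gone.
#   while [ "$(kubectl get pod -n istio-system | count_terminating)" -gt 0 ]; do
#     sleep 10
#   done
```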

Validating the installation

We recommend that you use the asmctl analysis tool to validate the basic configuration of your project, cluster, and workloads. If an asmctl test fails, asmctl recommends solutions, if possible. The asmctl validate command runs basic tests that check:

  1. That the APIs required by Anthos Service Mesh are enabled on the project.
  2. That the Istio-Ingressgateway is properly configured to call Mesh CA.
  3. The general health of Istiod and Istio-Ingressgateway.

If you run the asmctl validate command with the optional --with-testing-workloads flag, in addition to the basic tests, asmctl runs security tests that check:

  1. That mutual TLS (mTLS) communication is configured properly.
  2. That Mesh CA can issue certificates.

To run the security tests, asmctl deploys workloads on your cluster in a test namespace, runs the mTLS communication tests, outputs the results, and deletes the test namespace.

To run asmctl:

  1. Ensure that gcloud application-default credentials are set:

     gcloud auth application-default login
    
  2. If you haven't already, get authentication credentials to interact with the cluster:

     gcloud container clusters get-credentials ${CLUSTER_NAME}
    
  3. To run both the basic and security tests (assuming istio-1.5.4-asm.2/bin is in your PATH):

    asmctl validate --with-testing-workloads
    

    On success, the command responds with output similar to the following:

    [asmctl version 0.3.0]
    Using Kubernetes context: example-project_us-central1-example-cluster
    To change the context, use the --context flag
    Validating enabled APIs
    OK
    Validating ingressgateway configuration
    OK
    Validating istio system
    OK
    Validating sample traffic
    Launching example services...
    Sent traffic to example service http code: 200
    verified mTLS configuration
    OK
    Validating issued certs
    OK
    

Updating sidecar proxies

Any workloads that were running on your cluster before you upgraded Anthos Service Mesh need to have the sidecar proxy injected or updated so they have the current Anthos Service Mesh version.

With automatic sidecar injection, you can update the sidecars for existing pods with a pod restart. How you restart pods depends on whether they were created as part of a Deployment.

  1. If you used a Deployment, restart the Deployment, which restarts all Pods with sidecars:

    kubectl rollout restart deployment YOUR_DEPLOYMENT -n YOUR_NAMESPACE

    If you didn't use a Deployment, delete the Pods. Pods that are managed by a controller, such as a ReplicaSet or StatefulSet, are automatically recreated with sidecars:

    kubectl delete pod -n YOUR_NAMESPACE --all
  2. Check that all the Pods in the namespace have sidecars injected:

    kubectl get pod -n YOUR_NAMESPACE

    In the following example output from the previous command, notice that the READY column indicates there are two containers for each of your workloads: the primary container and the container for the sidecar proxy.

    NAME                    READY   STATUS    RESTARTS   AGE
    YOUR_WORKLOAD           2/2     Running   0          20s
    ...
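
For namespaces with many workloads, scanning the READY column by eye is error-prone. The following is a minimal sketch that lists the pods reporting fewer than two containers, assuming the default kubectl get pod table with READY in the second column. The function name pods_without_sidecar is hypothetical, and the check is only a heuristic: a pod that runs two or more application containers passes it even if no sidecar was injected.

```shell
# Hypothetical helper: list pods that report fewer than two containers.
# Reads `kubectl get pod` output on standard input and skips the header row.
pods_without_sidecar() {
  awk 'NR > 1 { split($2, ready, "/"); if (ready[2] + 0 < 2) print $1 }'
}

# Usage (not run here):
#   kubectl get pod -n YOUR_NAMESPACE | pods_without_sidecar
```

An empty result suggests that every pod in the namespace has at least two containers, which is consistent with the sidecar proxy being present.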