Upgrading Anthos Service Mesh on GKE

This guide explains how to upgrade Anthos Service Mesh from version 1.4.5+ to version 1.4.10 on Google Kubernetes Engine. If you want to upgrade to Anthos Service Mesh 1.5, see the 1.5 version of Upgrading Anthos Service Mesh on GKE.

Redeploying the Anthos Service Mesh control plane components takes about 5 to 10 minutes to complete. Additionally, you need to inject new sidecar proxies in all of your workloads so they are updated with the current Anthos Service Mesh version. The time it takes to update the sidecar proxies depends on many factors, such as the number of pods, the number of nodes, deployment scaling settings, pod disruption budgets, and other configuration settings. A rough estimate of the time that it takes to update the sidecar proxies is 100 pods per minute.
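To make the 100 pods per minute figure concrete, the following sketch estimates the sidecar update time for a given pod count. The function name estimate_sidecar_minutes and the sample count of 1250 pods are hypothetical, and the real duration depends on the factors listed above.

```shell
# Rough estimate only: assumes the ~100 pods per minute rate quoted above.
# estimate_sidecar_minutes is a hypothetical helper, not part of any tool.
estimate_sidecar_minutes() {
  pods=$1
  rate=100
  # Round up so that a partial batch still counts as a full minute.
  echo $(( (pods + rate - 1) / rate ))
}

# Placeholder pod count for illustration.
estimate_sidecar_minutes 1250
```

You could pass in a count such as the output of kubectl get pods --all-namespaces --no-headers | wc -l, keeping in mind that this counts all pods rather than only those with sidecars.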

Preparing for the upgrade

This section outlines the steps that you take to prepare for upgrading Anthos Service Mesh.

  1. Review the Supported features and this upgrade guide to become familiar with the features and the upgrade process.

  2. Review your authorization policies to see if they need to be updated.

  3. If you enabled optional features when you installed the previous version of Anthos Service Mesh by adding --set values flags to the istioctl manifest apply command line, you must use the same flags when you run istioctl manifest apply to upgrade to 1.4.10.

  4. If you enabled optional features when you installed the previous version of Anthos Service Mesh by adding the -f flag to the istioctl manifest apply command line to specify a YAML file, you must specify the same file (or a file with the same content) when you run istioctl manifest apply to upgrade to 1.4.10.

  5. Schedule downtime. Upgrading can take up to 1 hour, depending on the scale of the cluster. This estimate doesn't include the time that you need to redeploy workloads to update sidecar proxies.

Setting project and cluster defaults

  1. Get the project ID of the project that the cluster was created in:

    gcloud

    gcloud projects list

    Console

    1. In the Google Cloud console, go to the Dashboard page:

      Go to the Dashboard page

    2. Click the Select from drop-down list at the top of the page. In the Select from window that appears, select your project. The project ID is displayed on the project Dashboard Project info card.

  2. Create an environment variable for the project ID:

    export PROJECT_ID=YOUR_PROJECT_ID
  3. Create an environment variable for the project number:

    export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")
  4. Set the default project ID for the Google Cloud CLI:

    gcloud config set project ${PROJECT_ID}
    
  5. Create the following environment variables:

    • Set the cluster name:

      export CLUSTER_NAME=YOUR_CLUSTER_NAME
    • Set the CLUSTER_LOCATION to either your cluster zone or cluster region:

      export CLUSTER_LOCATION=YOUR_ZONE_OR_REGION
    • Set the workload pool.

      export WORKLOAD_POOL=${PROJECT_ID}.svc.id.goog
    • Set the mesh ID.

      export MESH_ID="proj-${PROJECT_NUMBER}"
  6. Set the default zone or region for the Google Cloud CLI.

    • If you have a single-zone cluster, set the default zone:

      gcloud config set compute/zone ${CLUSTER_LOCATION}
    • If you have a regional cluster, set the default region:

      gcloud config set compute/region ${CLUSTER_LOCATION}
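Because later commands rely on these variables, it can help to confirm that none of them is empty and that CLUSTER_LOCATION is the kind of value you expect. The following is a minimal sketch. Both function names are hypothetical, and the zone or region check assumes the usual naming pattern in which a region looks like us-central1 and a zone looks like us-central1-a.

```shell
# Hypothetical helpers; neither is part of gcloud or Anthos Service Mesh.

# Print the name of each listed variable that is unset or empty.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Classify a location as a zone or a region, assuming the usual naming
# pattern: regions look like us-central1, zones look like us-central1-a.
location_kind() {
  case "$1" in
    *-*-*) echo zone ;;
    *-*)   echo region ;;
    *)     echo unknown ;;
  esac
}

# Example usage after you complete the steps above:
#   missing_vars PROJECT_ID PROJECT_NUMBER CLUSTER_NAME CLUSTER_LOCATION WORKLOAD_POOL MESH_ID
#   location_kind "${CLUSTER_LOCATION}"
```

If missing_vars prints nothing, every listed variable has a value.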

Setting credentials and permissions

  1. Get authentication credentials to interact with the cluster:
    gcloud container clusters get-credentials ${CLUSTER_NAME}
  2. Grant cluster admin permissions to the current user. You need these permissions to create the necessary role-based access control (RBAC) rules for Anthos Service Mesh:
    kubectl create clusterrolebinding cluster-admin-binding \
      --clusterrole=cluster-admin \
      --user="$(gcloud config get-value core/account)"

    If you see the "cluster-admin-binding" already exists error, you can safely ignore it and continue with the existing cluster-admin-binding.
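If you prefer to avoid the already exists error, one option is to check for the binding first. The following sketch wraps the same commands in a hypothetical function named ensure_cluster_admin_binding. It assumes that kubectl and gcloud are already configured for the cluster.

```shell
# Hypothetical wrapper: create the binding only if it doesn't already exist.
ensure_cluster_admin_binding() {
  if kubectl get clusterrolebinding cluster-admin-binding >/dev/null 2>&1; then
    echo "cluster-admin-binding already exists; leaving it unchanged"
  else
    kubectl create clusterrolebinding cluster-admin-binding \
      --clusterrole=cluster-admin \
      --user="$(gcloud config get-value core/account)"
  fi
}

# Usage, once kubectl and gcloud are configured for the cluster:
#   ensure_cluster_admin_binding
```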

Downloading the installation file

    Linux

  1. Download the Anthos Service Mesh installation file to your current working directory:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.4.10-asm.18-linux.tar.gz
  2. Download the signature file and use openssl to verify the signature:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.4.10-asm.18-linux.tar.gz.1.sig
    openssl dgst -verify - -signature istio-1.4.10-asm.18-linux.tar.gz.1.sig istio-1.4.10-asm.18-linux.tar.gz <<'EOF'
    -----BEGIN PUBLIC KEY-----
    MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWZrGCUaJJr1H8a36sG4UUoXvlXvZ
    wQfk16sxprI2gOJ2vFFggdq3ixF2h4qNBt0kI7ciDhgpwS8t+/960IsIgw==
    -----END PUBLIC KEY-----
    EOF

    The expected output is: Verified OK

    Mac OS

  1. Download the Anthos Service Mesh installation file to your current working directory:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.4.10-asm.18-osx.tar.gz
  2. Download the signature file and use openssl to verify the signature:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.4.10-asm.18-osx.tar.gz.1.sig
    openssl dgst -sha256 -verify /dev/stdin -signature istio-1.4.10-asm.18-osx.tar.gz.1.sig istio-1.4.10-asm.18-osx.tar.gz <<'EOF'
    -----BEGIN PUBLIC KEY-----
    MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWZrGCUaJJr1H8a36sG4UUoXvlXvZ
    wQfk16sxprI2gOJ2vFFggdq3ixF2h4qNBt0kI7ciDhgpwS8t+/960IsIgw==
    -----END PUBLIC KEY-----
    EOF

    The expected output is: Verified OK

    Windows

  1. Download the Anthos Service Mesh installation file to your current working directory:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.4.10-asm.18-win.zip
  2. Download the signature file and use openssl to verify the signature:
    curl -LO https://storage.googleapis.com/gke-release/asm/istio-1.4.10-asm.18-win.zip.1.sig
    openssl dgst -verify - -signature istio-1.4.10-asm.18-win.zip.1.sig istio-1.4.10-asm.18-win.zip <<'EOF'
    -----BEGIN PUBLIC KEY-----
    MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWZrGCUaJJr1H8a36sG4UUoXvlXvZ
    wQfk16sxprI2gOJ2vFFggdq3ixF2h4qNBt0kI7ciDhgpwS8t+/960IsIgw==
    -----END PUBLIC KEY-----
    EOF

    The expected output is: Verified OK

  3. Extract the contents of the file to any location on your file system. For example, to extract the Linux archive to the current working directory:
    tar xzf istio-1.4.10-asm.18-linux.tar.gz

    The command creates an installation directory in your current working directory named istio-1.4.10-asm.18 that contains:

    • Sample applications in samples
    • The following tools in the bin directory:
      • istioctl: You use istioctl to install Anthos Service Mesh.
      • asmctl: You use asmctl to help validate your security configuration after installing Anthos Service Mesh. (Currently, asmctl isn't supported on GKE on VMware.)

  4. Ensure that you're in the Anthos Service Mesh installation's root directory.
    cd istio-1.4.10-asm.18
  5. For convenience, add the tools in the /bin directory to your PATH:
    export PATH=$PWD/bin:$PATH
    export PATH=$PWD/bin:$PATH
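If you script the download, you can pick the archive from the operating system name instead of hard-coding it. The following sketch uses a hypothetical function, asm_archive_name, that maps the output of uname -s to the file names listed above. Treating MINGW, MSYS, and CYGWIN shells as Windows is an assumption about your environment.

```shell
# Hypothetical helper: map an OS name (as printed by `uname -s`) to the
# archive file names used in the download steps above.
asm_archive_name() {
  case "$1" in
    Linux)                echo "istio-1.4.10-asm.18-linux.tar.gz" ;;
    Darwin)               echo "istio-1.4.10-asm.18-osx.tar.gz" ;;
    MINGW*|MSYS*|CYGWIN*) echo "istio-1.4.10-asm.18-win.zip" ;;
    *)                    echo "unrecognized OS: $1" >&2; return 1 ;;
  esac
}

# Print the download URL for the current machine without downloading anything.
if archive="$(asm_archive_name "$(uname -s)")"; then
  echo "https://storage.googleapis.com/gke-release/asm/${archive}"
fi
```

The sketch only prints the URL. You still download the file and verify its signature as described in the steps above.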

Upgrading Anthos Service Mesh

This section explains how to upgrade Anthos Service Mesh and enable:

  • The Supported default features listed on the Supported features page.
  • Anthos Service Mesh certificate authority (Mesh CA).
  • The telemetry data pipeline that powers the Anthos Service Mesh dashboard in the Google Cloud console.

For information on enabling the Supported optional features, see Enabling optional features.

To upgrade Anthos Service Mesh:

Choose one of the following commands to configure Anthos Service Mesh in PERMISSIVE mutual TLS (mTLS) authentication mode or STRICT mTLS mode.

PERMISSIVE mTLS

istioctl manifest apply --set profile=asm \
  --set values.global.trustDomain=${WORKLOAD_POOL} \
  --set values.global.sds.token.aud=${WORKLOAD_POOL} \
  --set values.nodeagent.env.GKE_CLUSTER_URL=https://container.googleapis.com/v1/projects/${PROJECT_ID}/locations/${CLUSTER_LOCATION}/clusters/${CLUSTER_NAME} \
  --set values.global.meshID=${MESH_ID} \
  --set values.global.proxy.env.GCP_METADATA="${PROJECT_ID}|${PROJECT_NUMBER}|${CLUSTER_NAME}|${CLUSTER_LOCATION}"

STRICT mTLS

istioctl manifest apply --set profile=asm \
  --set values.global.trustDomain=${WORKLOAD_POOL} \
  --set values.global.sds.token.aud=${WORKLOAD_POOL} \
  --set values.nodeagent.env.GKE_CLUSTER_URL=https://container.googleapis.com/v1/projects/${PROJECT_ID}/locations/${CLUSTER_LOCATION}/clusters/${CLUSTER_NAME} \
  --set values.global.meshID=${MESH_ID} \
  --set values.global.proxy.env.GCP_METADATA="${PROJECT_ID}|${PROJECT_NUMBER}|${CLUSTER_NAME}|${CLUSTER_LOCATION}" \
  --set values.global.mtls.enabled=true
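Both commands depend on environment variables, and an unset variable expands to an empty value without any warning. As a precaution, you might print the resolved flags and review them before you run istioctl. The following sketch uses a hypothetical function, asm_upgrade_flags, that only echoes the flags shown above and doesn't call istioctl.

```shell
# Hypothetical helper: print the --set flags for the chosen mTLS mode,
# one per line, so that you can review the expanded values.
# It only echoes text; it doesn't call istioctl.
asm_upgrade_flags() {
  mode=$1
  cat <<EOF
--set profile=asm
--set values.global.trustDomain=${WORKLOAD_POOL}
--set values.global.sds.token.aud=${WORKLOAD_POOL}
--set values.nodeagent.env.GKE_CLUSTER_URL=https://container.googleapis.com/v1/projects/${PROJECT_ID}/locations/${CLUSTER_LOCATION}/clusters/${CLUSTER_NAME}
--set values.global.meshID=${MESH_ID}
--set values.global.proxy.env.GCP_METADATA=${PROJECT_ID}|${PROJECT_NUMBER}|${CLUSTER_NAME}|${CLUSTER_LOCATION}
EOF
  # STRICT mode adds one flag to the PERMISSIVE set.
  if [ "$mode" = "STRICT" ]; then
    echo "--set values.global.mtls.enabled=true"
  fi
}

# Usage, after you set the environment variables:
#   asm_upgrade_flags PERMISSIVE
#   asm_upgrade_flags STRICT
```

A flag that ends in an equals sign, or a cluster URL with an empty path segment, indicates a variable that wasn't set.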

Check the control plane components

Upgrading requires reinstalling the control plane components, which takes about 5 to 10 minutes to complete. The old control plane components are terminated and then deleted as the new components are installed. You can check the progress by looking at the value in the AGE column of the workloads.

kubectl get pod -n istio-system
NAME                                     READY   STATUS        RESTARTS   AGE
istio-galley-76d684bf9-jwz65             2/2     Running       0          5m36s
istio-ingressgateway-5bfdf7c586-v6wxx    2/2     Terminating   0          25m
istio-ingressgateway-7b598c5557-b88md    2/2     Running       0          5m44s
istio-nodeagent-bnjg7                    1/1     Running       0          5m16s
istio-nodeagent-cps2j                    1/1     Running       0          5m10s
istio-nodeagent-f4x46                    1/1     Running       0          5m26s
istio-nodeagent-jbl5x                    1/1     Running       0          5m38s
istio-pilot-5dc4bc4dbf-ds5dh             2/2     Running       0          5m30s
istio-pilot-74665549c5-7r6qh             2/2     Terminating   0          25m
istio-sidecar-injector-7ddff4b99-b76l7   1/1     Running       0          5m36s
promsd-6d4d5b7c5c-dgnd7                  2/2     Running       0          5m30s

In this example, there are two instances of istio-ingressgateway and istio-pilot. The instances with 25m in the AGE column are being terminated. All the other components are newly installed.
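Instead of reading the list by eye, you can count the pods that are still being terminated. The following sketch defines a hypothetical function, count_terminating, that reads kubectl get pod output and assumes that STATUS is the third column, as in the example above.

```shell
# Hypothetical helper: count pods whose STATUS column reads Terminating.
# Reads `kubectl get pod` output on stdin; STATUS is assumed to be column 3.
count_terminating() {
  awk 'NR > 1 && $3 == "Terminating" { n++ } END { print n + 0 }'
}

# Usage against a live cluster:
#   kubectl get pod -n istio-system | count_terminating
# To poll until the old components are gone:
#   until [ "$(kubectl get pod -n istio-system | count_terminating)" -eq 0 ]; do sleep 10; done
```

For the example output above, the function prints 2.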

Validating the upgrade

We recommend that you use the asmctl analysis tool to validate the basic configuration of your project, cluster, and workloads. If an asmctl test fails, asmctl recommends solutions, if possible. The asmctl validate command runs basic tests that check:

  1. That the APIs required by Anthos Service Mesh are enabled on the project.
  2. That the Istio-Ingressgateway is properly configured to call Mesh CA.
  3. The general health of Istiod and Istio-Ingressgateway.

If you run the asmctl validate command with the optional --with-testing-workloads flag, in addition to the basic tests, asmctl runs security tests that check:

  1. Mutual TLS (mTLS) communication is configured properly.
  2. Mesh CA can issue certificates.

To run the security tests, asmctl deploys workloads on your cluster in a test namespace, runs the mTLS communication tests, outputs the results, and deletes the test namespace.

To run asmctl:

  1. Ensure that gcloud application-default credentials are set:

     gcloud auth application-default login
    
  2. If you haven't already, get authentication credentials to interact with the cluster:

     gcloud container clusters get-credentials ${CLUSTER_NAME}
    
  3. To run both the basic and security tests (assuming istio-1.4.10-asm.18/bin is in your PATH):

    asmctl validate --with-testing-workloads
    

    On success, the command responds with output similar to the following:

    [asmctl version 0.3.0]
    Using Kubernetes context: example-project_us-central1-example-cluster
    To change the context, use the --context flag
    Validating enabled APIs
    OK
    Validating ingressgateway configuration
    OK
    Validating istio system
    OK
    Validating sample traffic
    Launching example services...
    Sent traffic to example service http code: 200
    verified mTLS configuration
    OK
    Validating issued certs
    OK
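
    If you save the asmctl output in a script, you might want to flag any check that didn't report OK. The following sketch defines a hypothetical function, asmctl_failed_sections. It assumes the layout shown above, in which each Validating line is eventually followed by an OK line. The format of failure output isn't documented here, so treat the result as a hint only.

```shell
# Hypothetical helper: list each "Validating ..." section of asmctl output
# that isn't followed by an OK line before the next section starts.
# Assumes the output layout shown above; failure output may differ.
asmctl_failed_sections() {
  awk '
    $1 == "Validating" {
      if (section != "" && !ok) print section
      section = $0
      ok = 0
      next
    }
    NF == 1 && $1 == "OK" { ok = 1 }
    END { if (section != "" && !ok) print section }
  '
}

# Usage:
#   asmctl validate --with-testing-workloads | asmctl_failed_sections
```

    For the successful output above, the function prints nothing.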
    

Updating sidecar proxies

Any workloads that were running on your cluster before you upgraded Anthos Service Mesh need to have the sidecar proxy injected or updated so they have the current Anthos Service Mesh version.

With automatic sidecar injection, you can update the sidecars for existing pods with a pod restart. How you restart pods depends on whether they were created as part of a Deployment.

  1. If you used a Deployment, restart the Deployment, which restarts all Pods with sidecars:

    kubectl rollout restart deployment YOUR_DEPLOYMENT -n YOUR_NAMESPACE

    If you didn't use a Deployment, delete the Pods, and they are automatically recreated with sidecars:

    kubectl delete pod -n YOUR_NAMESPACE --all
  2. Check that all the Pods in the namespace have sidecars injected:

    kubectl get pod -n YOUR_NAMESPACE

    In the following example output from the previous command, notice that the READY column indicates there are two containers for each of your workloads: the primary container and the container for the sidecar proxy.

    NAME                    READY   STATUS    RESTARTS   AGE
    YOUR_WORKLOAD           2/2     Running   0          20s
    ...
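
In a namespace with many pods, you can screen the READY column automatically. The following sketch defines a hypothetical function, pods_without_sidecar, that reads kubectl get pod output and assumes that READY is the second column, as in the example above.

```shell
# Hypothetical helper: print pods whose READY column shows fewer than two
# containers in total, which suggests that no sidecar was injected.
# Reads `kubectl get pod` output on stdin; READY is assumed to be column 2.
pods_without_sidecar() {
  awk 'NR > 1 { split($2, ready, "/"); if (ready[2] + 0 < 2) print $1 }'
}

# Usage against a live cluster:
#   kubectl get pod -n YOUR_NAMESPACE | pods_without_sidecar
```

A pod that runs two application containers and no sidecar would pass this check, so use it as a quick screen and not as proof of injection.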
    

Updating your authorization policies

If you are upgrading from open source Istio 1.4.x or from an earlier version of Anthos Service Mesh and have existing authorization policies, you might need to update them. See Updating your authorization policies for more information.