Upgrading Anthos Service Mesh to the latest version

This page explains how to run the install_asm script to upgrade Anthos Service Mesh from 1.7.3+ or a 1.8 patch release to version 1.8.6 on a GKE cluster. The mesh can contain one or more clusters, all in the same Google Cloud project.

Anthos Service Mesh installations 1.7 and higher support skip-version upgrades. To upgrade directly to version 1.10, see Upgrading Anthos Service Mesh to the latest version.

This page is for upgrading Anthos Service Mesh. For information on running the script for new installations and migrations from Istio, see the following:

Before you begin

Before you begin the upgrade, make sure that you have:

The script requires that you have the required permissions, or that you include either the --enable_all or the --enable_gcp_iam_roles flag so that the script can grant the permissions for you. Similarly, to allow the script to enable the required APIs and update your cluster, specify the --enable_all flag or the more granular enablement flags.

Preparing for the upgrade

If you customized the previous installation, you need the same customizations when you upgrade to a new Anthos Service Mesh version or migrate from Istio. If you customized the installation by adding the --set values flag to istioctl install, you must add those settings to an IstioOperator YAML file, referred to as an overlay file. You specify the overlay file by using the --custom_overlay option with the filename when you run the script. The script passes the overlay file to istioctl install.
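
As a minimal sketch of that conversion, suppose the previous installation used --set meshConfig.accessLogFile=/dev/stdout. That customization is hypothetical and chosen only for illustration, as is the file name overlay.yaml. An equivalent overlay file could be created like this:

```shell
# Write an IstioOperator overlay that carries the same setting as the
# hypothetical flag: --set meshConfig.accessLogFile=/dev/stdout
# The file name overlay.yaml is an assumption; any name works.
cat > overlay.yaml <<'EOF'
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    accessLogFile: /dev/stdout
EOF
```

You would then pass the file to the script with --custom_overlay overlay.yaml.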

The script follows the revision upgrade process (referred to as "canary" upgrades in the Istio documentation). With a revision-based upgrade, the script installs a new revision of the control plane alongside the existing control plane. When installing the new version, the script includes a revision label that identifies the new control plane.

You then migrate to the new version by setting the same revision label on your workloads and performing a rolling restart to re-inject the proxies so that they use the new Anthos Service Mesh version and configuration. With this approach, you can monitor the effect of the upgrade on a small percentage of your workloads. After testing your application, you can migrate all traffic to the new version. This approach is much safer than doing an in-place upgrade where new control plane components replace the previous version.
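
The migration of a single namespace can be sketched as a small shell function. This is an illustration, not part of the script: the names run and migrate_namespace and the DRY_RUN variable are assumptions, while the kubectl commands are the ones used later on this page.

```shell
# run: execute a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# migrate_namespace NAMESPACE REVISION
# Labels the namespace with the new revision, removes the istio-injection
# label, and restarts the Deployments so that the proxies are re-injected.
migrate_namespace() {
  ns="$1"
  rev="$2"
  if [ -z "$ns" ] || [ -z "$rev" ]; then
    echo "usage: migrate_namespace NAMESPACE REVISION" >&2
    return 1
  fi
  run kubectl label namespace "$ns" "istio.io/rev=$rev" istio-injection- --overwrite || return 1
  run kubectl rollout restart deployment -n "$ns" || return 1
  # Lists Pods that carry the new revision label; Pods still being
  # replaced by the rollout may not appear yet.
  run kubectl get pods -n "$ns" -l "istio.io/rev=$rev"
}
```

Setting DRY_RUN=1 before calling migrate_namespace prints the three commands without touching the cluster, which can be useful for reviewing them first.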

Upgrading Anthos Service Mesh

  1. Set the options and specify the flags to run the script. You always include the following options: project_id, cluster_name, cluster_location, and mode.

    The following section provides typical examples for running the script. For a complete description of the script's arguments, see Options and flags.

  2. To complete setting up Anthos Service Mesh, you need to enable automatic sidecar injection and deploy or redeploy workloads.
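
The required options from step 1 can be checked before the script runs. The following is only a sketch: the helper name upgrade_command is hypothetical, and it prints the command line for review rather than running it.

```shell
# upgrade_command PROJECT_ID CLUSTER_NAME CLUSTER_LOCATION
# Prints an install_asm upgrade command line, or fails if any of the
# required values is missing or empty.
upgrade_command() {
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: upgrade_command PROJECT_ID CLUSTER_NAME CLUSTER_LOCATION" >&2
    return 1
  fi
  printf './install_asm --project_id %s --cluster_name %s --cluster_location %s --mode upgrade\n' \
    "$1" "$2" "$3"
}
```

For example, upgrade_command my-project cluster-1 us-central1-c prints a command with all four required options filled in. The project, cluster, and location values here are placeholders.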

Examples

This section shows examples of running the script for upgrades and some additional arguments that you might find useful. See the navigation bar on the right for a list of the examples.

Only validate

The following example shows running the script with the --only_validate option. With this option, the script doesn't make any changes to your project or cluster, and it doesn't install Anthos Service Mesh. When you specify --only_validate, the script fails if you include any of the --enable_* flags.

The script validates that:

  • Your environment has the required tools.
  • You have the required permission on the specified project.
  • The cluster meets the minimum requirements.
  • The project has all the required Google APIs enabled.

By default, the script downloads and extracts the installation file and downloads the asm configuration package from GitHub to a temp directory. Before exiting, the script outputs a message that provides the name of the temp directory. You can specify an existing directory for the downloads with the --output_dir DIR_PATH option. The --output_dir option makes it convenient for you to use the istioctl command-line tool if you need it. Additionally, the configuration files to enable optional features are included in the asm/istio/options directory.

Run the following command to validate your configuration and download the installation file and asm package to the DIR_PATH directory:

./install_asm \
  --project_id PROJECT_ID \
  --cluster_name CLUSTER_NAME \
  --cluster_location CLUSTER_LOCATION \
  --mode upgrade \
  --output_dir DIR_PATH \
  --only_validate

On success, the script outputs the following:

install_asm: Setting up necessary files...
install_asm: Creating temp directory...
install_asm: Generating a new kubeconfig...
install_asm: Checking installation tool dependencies...
install_asm: Downloading ASM..
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 57.0M  100 57.0M    0     0  30.6M      0  0:00:01  0:00:01 --:--:-- 30.6M
install_asm: Downloading ASM kpt package...
fetching package /asm from https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages to asm
install_asm: Checking for project PROJECT_ID...
install_asm: Confirming cluster information...
install_asm: Confirming node pool requirements...
install_asm: Fetching/writing GCP credentials to kubeconfig file...
Fetching cluster endpoint and auth data.
kubeconfig entry generated for cluster-1.
install_asm: Checking Istio installations...
install_asm: Checking required APIs...
install_asm: Successfully validated all requirements to install ASM from this computer.

If one of the tests fails the validation, the script outputs an error message. For example, if your project doesn't have all of the required Google APIs enabled, you see the following error:

ERROR: One or more APIs are not enabled. Please enable them and retry, or run
the script with the '--enable_gcp_apis' flag to allow the script to enable them
on your behalf.

If you got an error message about needing to run the script with an enablement flag, you can include the specific flag from the error message or the --enable_all flag when running the script without --only_validate. If you prefer, you can update your project and cluster yourself before running the script as described in the Setting up your project and Setting up your cluster sections of the Multi-project installation guide.
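
The flag named in such an error message can be extracted mechanically. The sketch below assumes that the message spells the flag out as --enable_ followed by lowercase letters and underscores, as in the example error above. The function name suggested_flags is hypothetical.

```shell
# suggested_flags: read script output on stdin and print every --enable_*
# flag that the messages mention, one per line, without duplicates.
suggested_flags() {
  # "--" ends option parsing so the pattern is not mistaken for an option.
  grep -o -- '--enable_[a-z_]*' | sort -u
}
```

You could pipe the combined output of a validation run into suggested_flags (redirecting stderr to stdout with 2>&1) to see which enablement flags the script proposes.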

Upgrading

The following command runs the script to upgrade. The script doesn't allow you to change to another CA, so you don't need to include the ca option.

./install_asm \
  --project_id PROJECT_ID \
  --cluster_name CLUSTER_NAME \
  --cluster_location CLUSTER_LOCATION \
  --mode upgrade \
  --option revisioned-istio-ingressgateway \
  --enable_all
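
One conservative way to combine validation and upgrade is to run the validation first and start the upgrade only if it succeeds. This sketch assumes that install_asm exits with a nonzero status when validation fails. The function name validate_then_upgrade and the INSTALL_ASM variable are hypothetical.

```shell
# Path to the script; set INSTALL_ASM to use a different location.
INSTALL_ASM="${INSTALL_ASM:-./install_asm}"

# validate_then_upgrade PROJECT_ID CLUSTER_NAME CLUSTER_LOCATION
validate_then_upgrade() {
  # The validation run must not include any of the --enable_* flags.
  "$INSTALL_ASM" \
    --project_id "$1" \
    --cluster_name "$2" \
    --cluster_location "$3" \
    --mode upgrade \
    --only_validate || {
      echo "Validation failed; not upgrading." >&2
      return 1
    }
  # Same command as the upgrade example on this page.
  "$INSTALL_ASM" \
    --project_id "$1" \
    --cluster_name "$2" \
    --cluster_location "$3" \
    --mode upgrade \
    --option revisioned-istio-ingressgateway \
    --enable_all
}
```

Because validation runs without enablement flags, this wrapper stops whenever the project or cluster still needs changes, which leaves the decision to add --enable_all with you.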

Upgrading with an overlay file

An overlay file is a YAML file containing an IstioOperator custom resource (CR) that you pass to install_asm to configure the control plane. You can override the default control plane configuration and enable an optional feature by passing the YAML file to install_asm. You can layer on more overlays, and each overlay file overrides the configuration on the previous layers.

If you specify more than one CR in a YAML file, install_asm splits the file into multiple temporary YAML files, one for each CR. The script splits the CRs into separate files because istioctl install only applies the first CR in a YAML file containing more than one CR.
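
To illustrate the splitting, the following sketch breaks a multi-document YAML file into one file per document with awk. It is not the script's actual implementation. The function name split_crs and the cr-N.yaml naming are assumptions.

```shell
# split_crs FILE OUT_DIR
# Writes each YAML document in FILE (documents are separated by a line
# containing only "---") to OUT_DIR/cr-N.yaml and prints how many were written.
split_crs() {
  mkdir -p "$2" || return 1
  awk -v dir="$2" '
    # A separator line ends the current document, if one is open.
    /^---[ \t]*$/ { if (in_doc) { close(file); in_doc = 0 }; next }
    {
      if (!in_doc) {
        if ($0 ~ /^[ \t]*$/) next      # skip blank lines between documents
        n++
        file = dir "/cr-" n ".yaml"
        in_doc = 1
      }
      print > file
    }
    END { print n + 0 }
  ' "$1"
}
```

Each resulting file then holds a single CR, which avoids the istioctl install behavior of applying only the first CR in a file.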

The following example does an upgrade and includes an overlay file to customize the control plane configuration. In the following command, change OVERLAY_FILE to the name of the YAML file.

./install_asm \
  --project_id PROJECT_ID \
  --cluster_name CLUSTER_NAME \
  --cluster_location CLUSTER_LOCATION \
  --mode upgrade \
  --enable_all \
  --option revisioned-istio-ingressgateway \
  --custom_overlay OVERLAY_FILE

Upgrading with an option

The following example does an upgrade and includes the egressgateways.yaml file from the anthos-service-mesh package, which enables an egress gateway. Note that you don't include the .yaml extension. The script fetches the file for you so you don't have to download the asm package first.

./install_asm \
  --project_id PROJECT_ID \
  --cluster_name CLUSTER_NAME \
  --cluster_location CLUSTER_LOCATION \
  --mode upgrade \
  --enable_all \
  --option revisioned-istio-ingressgateway \
  --option egressgateways

You can use --option to enable an optional feature. If you need to make modifications to any of the files in the asm/istio/options directory in the asm package, download the asm package, make your changes, and include the file using --custom_overlay.

To download the asm package to the current working directory so you can make modifications to the files:

kpt pkg get \
https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@release-1.8-asm asm

If you ran the Only validate example where you specified the --output_dir option, then the configuration files are in the specified output directory under asm/istio/options.
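
The customization workflow can be sketched as follows, assuming the asm package is already on disk. The function name customize_option and the destination file name are hypothetical. Only the asm/istio/options layout and the --custom_overlay option come from this page.

```shell
# customize_option ASM_DIR OPTION DEST
# Copies ASM_DIR/istio/options/OPTION.yaml to DEST so that you can edit the
# copy, then prints the argument to pass to install_asm in place of --option.
customize_option() {
  src="$1/istio/options/$2.yaml"
  if [ ! -f "$src" ]; then
    echo "No such option file: $src" >&2
    return 1
  fi
  cp "$src" "$3" && echo "--custom_overlay $3"
}
```

For example, customize_option asm egressgateways my-egressgateways.yaml copies the egress gateway option file so that you can edit the copy and leave the downloaded package unchanged.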

Deploying and redeploying workloads

Your installation isn't complete until you enable automatic sidecar proxy injection (auto-injection). Migrations from OSS Istio and upgrades follow the revision-based upgrade process (referred to as "canary upgrades" in the Istio documentation). With a revision-based upgrade, the new version of the control plane is installed alongside the existing control plane. You then move some of your workloads to the new version, which lets you monitor the effect of the upgrade with a small percentage of the workloads before migrating all of the traffic to the new version.

The script sets a revision label in the format istio.io/rev=asm-186-8 on istiod. To enable auto-injection, add a matching revision label to your namespace(s). The revision label is used by the sidecar injector webhook to associate injected sidecars with a particular istiod revision. After adding the label, restart the Pods in the namespace for sidecars to be injected.

The script also creates a revisioned istio-ingressgateway Deployment. This allows you to control when you switch to the new version.
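
The revision labels on istiod can be pulled out of the kubectl output with a short filter. This sketch assumes the six-column layout of the example output shown in the steps below, with REV as the last column. The function name istiod_revisions is hypothetical.

```shell
# istiod_revisions: read the output of
#   kubectl get pod -n istio-system -L istio.io/rev
# on stdin and print each distinct REV value found on istiod Pods.
istiod_revisions() {
  # NR > 1 skips the header; NF >= 6 skips rows with an empty REV column.
  awk 'NR > 1 && $1 ~ /^istiod/ && NF >= 6 { print $6 }' | sort -u
}
```

During an upgrade, the filter is expected to print two values: the old revision and the new one.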

  1. Get the revision label that is on istiod and the istio-ingressgateway.

    kubectl get pod -n istio-system -L istio.io/rev
    

    The output from the command is similar to the following.

    NAME                                             READY   STATUS    RESTARTS   AGE   REV
    istio-ingressgateway-65d884685d-6hrdk            1/1     Running   0          67m
    istio-ingressgateway-65d884685d-94wgz            1/1     Running   0          67m
    istio-ingressgateway-asm-186-8-8b5fc8767-gk6hb   1/1     Running   0          5s    asm-186-8
    istio-ingressgateway-asm-186-8-8b5fc8767-hn4w2   1/1     Running   0          20s   asm-186-8
    istiod-asm-178-10-67998f4b55-lrzpz               1/1     Running   0          68m   asm-178-10
    istiod-asm-178-10-67998f4b55-r76kr               1/1     Running   0          68m   asm-178-10
    istiod-asm-186-8-5cd96f88f6-n7tj9                1/1     Running   0          27s   asm-186-8
    istiod-asm-186-8-5cd96f88f6-wm68b                1/1     Running   0          27s   asm-186-8

    1. Note whether you have both the old and new versions of the istio-ingressgateway.

      • If you included the revisioned-istio-ingressgateway option when you upgraded, a canary upgrade of the istio-ingressgateway was done. In this case, your output shows both the old and new versions of the istio-ingressgateway.

      • If you didn't include revisioned-istio-ingressgateway when you upgraded, an in-place upgrade of the istio-ingressgateway was done. In this case, your output shows only the new version.

    2. In the output, under the REV column, note the value of the revision label for the new version. In this example, the value is asm-186-8.

    3. Also note the value in the revision label for the old istiod version. You need this to delete the old version of istiod when you finish moving workloads to the new version. In the example output, the value of the revision label for the old version is asm-178-10.

  2. If you have both the old and new versions of the istio-ingressgateway, switch the istio-ingressgateway to the new revision. In the following command, change REVISION to the value that matches the revision label of the new version.

    kubectl patch service -n istio-system istio-ingressgateway --type='json' -p='[{"op": "replace", "path": "/spec/selector/service.istio.io~1canonical-revision", "value": "REVISION"}]'

    Expected output: service/istio-ingressgateway patched

  3. Add the revision label to a namespace and remove the istio-injection label (if it exists). In the following command, change REVISION to the value that matches the new revision of istiod.

    kubectl label namespace NAMESPACE istio.io/rev=REVISION istio-injection- --overwrite

    If you see "istio-injection not found" in the output, you can ignore it. That means that the namespace didn't previously have the istio-injection label. Because auto-injection fails if a namespace has both the istio-injection and the revision label, all kubectl label commands in the Anthos Service Mesh documentation include removing the istio-injection label.

  4. Restart the Pods to trigger re-injection.

    kubectl rollout restart deployment -n NAMESPACE

  5. Verify that your Pods are configured to point to the new version of istiod.

    kubectl get pods -n NAMESPACE -l istio.io/rev=REVISION

  6. Test your application to verify that the workloads are working correctly.

  7. If you have workloads in other namespaces, repeat the steps to label the namespace and restart Pods.

  8. If you are satisfied that your application is working as expected, continue with the steps to transition to the new version of istiod. If there's an issue with your application, follow the steps to roll back.

  9. Run the following command again to confirm whether you have both the old and new versions of the istio-ingressgateway or only the new version. This determines how you handle transitioning to the new version of the istio-ingressgateway or rolling back to the old version.

    kubectl get pod -n istio-system -L istio.io/rev
    

    Complete the transition

    If you are satisfied that your application is working as expected, remove the old control plane to complete the transition to the new version.

    1. Change to the directory where the files from the anthos-service-mesh GitHub repository are located.

    2. Configure the validating webhook to use the new control plane.

      kubectl apply -f asm/istio/istiod-service.yaml
      
    3. If you have both the old and new versions of the istio-ingressgateway, delete the old istio-ingressgateway Deployment. The command that you run depends on whether you are migrating from Istio or upgrading from a previous version of Anthos Service Mesh:

      Migrate

      If you migrated from Istio, the old istio-ingressgateway doesn't have a revision label.

      kubectl delete deploy/istio-ingressgateway -n istio-system
      

      Upgrade

      If you upgraded from a previous Anthos Service Mesh version, in the following command, replace OLD_REVISION with the revision label for the previous version of the istio-ingressgateway.

      kubectl delete deploy -l app=istio-ingressgateway,istio.io/rev=OLD_REVISION -n istio-system --ignore-not-found=true
      
    4. Delete the old version of istiod. The command that you use depends on whether you are migrating from Istio or upgrading from a previous version of Anthos Service Mesh.

      Migrate

      If you migrated from Istio, the old istiod doesn't have a revision label.

      kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod -n istio-system --ignore-not-found=true
      

      Upgrade

      If you upgraded from a previous Anthos Service Mesh version, in the following command, make sure that OLD_REVISION matches the revision label for the previous version of istiod.

      kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod-OLD_REVISION -n istio-system --ignore-not-found=true
      
    5. Remove the old version of the IstioOperator configuration.

      kubectl delete IstioOperator installed-state-OLD_REVISION -n istio-system
      

      The expected output is similar to the following:

      istiooperator.install.istio.io "installed-state-OLD_REVISION" deleted
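
For the upgrade case, the three delete commands in steps 3 through 5 can be grouped into one function. This is a sketch, not part of the script: the names run and remove_old_revision and the DRY_RUN variable are assumptions. It does not cover step 2, so the validating webhook must already be reconfigured before you run it.

```shell
# run: execute a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# remove_old_revision OLD_REVISION
# Deletes the old istio-ingressgateway Deployment, the old istiod resources,
# and the old IstioOperator configuration, in that order.
remove_old_revision() {
  old="$1"
  if [ -z "$old" ]; then
    echo "usage: remove_old_revision OLD_REVISION" >&2
    return 1
  fi
  run kubectl delete deploy -l "app=istio-ingressgateway,istio.io/rev=$old" \
    -n istio-system --ignore-not-found=true || return 1
  run kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget \
    "istiod-$old" -n istio-system --ignore-not-found=true || return 1
  run kubectl delete IstioOperator "installed-state-$old" -n istio-system
}
```

Setting DRY_RUN=1 and calling remove_old_revision asm-178-10 prints the commands for the example revision so that you can check the resource names before deleting anything.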

    Rollback

    If you encountered an issue when testing your application with the new version of istiod, follow these steps to roll back to the previous version:

    1. Switch back to the old version of the istio-ingressgateway. The command that you use depends on whether you have both the old and new versions of the istio-ingressgateway or only the new version.

      • If you have both the old and new versions of the istio-ingressgateway, run the kubectl patch service command and replace OLD_REVISION with the old revision.

        kubectl patch service -n istio-system istio-ingressgateway --type='json' -p='[{"op": "replace", "path": "/spec/selector/service.istio.io~1canonical-revision", "value": "OLD_REVISION"}]'
        
      • If you only have the new version of the istio-ingressgateway, run the kubectl rollout undo command.

        kubectl -n istio-system rollout undo deploy istio-ingressgateway
        
    2. Relabel your namespace to enable auto-injection with the previous version of istiod. The command that you use depends on whether you used a revision label or istio-injection=enabled with the previous version.

      • If you used a revision label for auto-injection:

        kubectl label namespace NAMESPACE istio.io/rev=OLD_REVISION --overwrite
        
      • If you used istio-injection=enabled:

        kubectl label namespace NAMESPACE istio.io/rev- istio-injection=enabled --overwrite
        

      Expected output:

      namespace/NAMESPACE labeled
    3. Confirm that the revision label on the namespace matches the revision label on the previous version of istiod:

      kubectl get ns NAMESPACE --show-labels
      
    4. Restart the Pods to trigger re-injection so the proxies have the previous version:

      kubectl rollout restart deployment -n NAMESPACE
      
    5. If you have both the old and new versions of the istio-ingressgateway, remove the new istio-ingressgateway Deployment. Make sure that the value of REVISION in the following command is correct.

      kubectl delete deploy -l app=istio-ingressgateway,istio.io/rev=REVISION -n istio-system --ignore-not-found=true
      
    6. Remove the new version of istiod. Make sure that the value of REVISION in the following command is correct.

      kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod-REVISION -n istio-system --ignore-not-found=true
      
    7. Remove the new version of the IstioOperator configuration.

      kubectl delete IstioOperator installed-state-REVISION -n istio-system
      

      Expected output is similar to the following:

      istiooperator.install.istio.io "installed-state-REVISION" deleted
    8. If you didn't include the --disable_canonical_service flag, the script enabled the Canonical Service controller. We recommend that you leave it enabled, but if you need to disable it, see Enabling and disabling the Canonical Service controller.
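
The namespace part of the rollback (steps 2 through 4) can be sketched as a function that picks the relabel command depending on whether an old revision is supplied. The names run and rollback_namespace and the DRY_RUN variable are assumptions. The kubectl commands are the ones from the rollback steps.

```shell
# run: execute a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# rollback_namespace NAMESPACE [OLD_REVISION]
# With OLD_REVISION: relabel the namespace with the old revision label.
# Without it: remove the revision label and restore istio-injection=enabled.
rollback_namespace() {
  ns="$1"
  old="${2:-}"
  if [ -z "$ns" ]; then
    echo "usage: rollback_namespace NAMESPACE [OLD_REVISION]" >&2
    return 1
  fi
  if [ -n "$old" ]; then
    run kubectl label namespace "$ns" "istio.io/rev=$old" --overwrite || return 1
  else
    run kubectl label namespace "$ns" istio.io/rev- istio-injection=enabled --overwrite || return 1
  fi
  # Show the labels so that you can confirm them, then re-inject the proxies.
  run kubectl get ns "$ns" --show-labels || return 1
  run kubectl rollout restart deployment -n "$ns"
}
```

The gateway switch and the removal of the new control plane (steps 1 and 5 through 7) are left out of the function on purpose, because they apply to the whole mesh rather than to one namespace.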

Viewing the Anthos Service Mesh dashboards

After you have workloads deployed on your cluster with the sidecar proxies injected, you can explore the Anthos Service Mesh pages in the Google Cloud console to see all of the observability features that Anthos Service Mesh offers. Note that it takes about one or two minutes for telemetry data to be displayed in the Google Cloud console after you deploy workloads.

Access to Anthos Service Mesh in the Google Cloud console is controlled by Identity and Access Management (IAM). To access the Anthos Service Mesh pages, a Project Owner must grant users the Project Editor or Viewer role, or the more restrictive roles described in Controlling access to Anthos Service Mesh in the Google Cloud console.

  1. In the Google Cloud console, go to Anthos Service Mesh.

  2. Select the Google Cloud project from the drop-down list on the menu bar.

  3. If you have more than one service mesh, select the mesh from the Service Mesh drop-down list.

To learn more, see Exploring Anthos Service Mesh in the Google Cloud console.

In addition to the Anthos Service Mesh pages, metrics related to your services (such as the number of requests received by a particular service) are sent to Cloud Monitoring, where they appear in the Metrics Explorer.

To view metrics:

  1. In the Google Cloud console, go to the Monitoring page:

  2. Select Resources > Metrics Explorer.

For a full list of metrics, see Istio metrics in the Cloud Monitoring documentation.

What's next