Anthos Service Mesh control plane revisions

This page describes how control plane revisions work and why they are valuable for safe service mesh upgrades and rollbacks. Up until version 1.6.8, the default installation process for Anthos Service Mesh didn't use control plane revisions. Adopting revisions might require some changes to your installation procedures, but we highly recommend it because revisions let you perform canary upgrades of the control plane and roll back easily.

Service mesh fundamentals

Anthos Service Mesh installation consists of two major parts, which are usually automated using the install_asm script. First, you use the istioctl command-line tool and IstioOperator YAML files to install the control plane and its configuration. The control plane (also referred to as istiod) consists of a set of system services that are responsible for managing mesh configuration. Next, you deploy a special sidecar proxy throughout your environment that intercepts network communication to and from each workload. The proxies communicate with the control plane to get their configuration, which allows you to direct and control traffic (data plane traffic) around your mesh without making any changes to your workloads.

To deploy the proxies, you use a process called automatic sidecar injection (auto-injection) to run a proxy as an additional sidecar container in each of your workload Pods. You don't need to modify the Kubernetes manifests that you use to deploy your workloads, but you do need to add a label to your namespaces and restart the Pods.

Prior to Anthos Service Mesh 1.6, you upgraded by installing a new version of the control plane which immediately replaced the old version. This procedure is known as an in-place upgrade, and it is risky because if there are failures, rolling back can be difficult. To re-inject the proxies and have them communicate with the new control plane version, you had to restart all workloads in all of your namespaces. Depending on the number of workloads and namespaces in your mesh, the entire upgrade process could take an hour or more. In-place upgrades can lead to downtime and should be scheduled in maintenance windows.

Use revisions to upgrade your mesh safely

The ability to control traffic is one of the principal benefits of using a service mesh. For example, you can gradually shift traffic to a new version of an application when you first deploy it to production. If you detect problems during the upgrade, you can shift traffic back to the original version, providing a simple and low risk means of rolling back. This procedure is known as a canary release, and it greatly reduces the risk associated with new deployments.

Similarly, you can minimize the risk associated with upgrading the service mesh itself. Anthos Service Mesh 1.6 and later supports canary upgrades by using control plane revisions. With a canary upgrade, you install a new and separate control plane and configuration alongside the existing control plane. The installer assigns a string called a revision to identify the new control plane. At first, the sidecar proxies continue to receive configuration from the previous version of the control plane. You gradually associate workloads with the new control plane by labeling their namespaces with the new control plane revision. Once you have labeled a namespace with the new revision, you restart the workload Pods so that new sidecars are injected, and they receive their configuration from the new control plane. If there are problems, you can easily roll back by associating the workloads with the original control plane.

How does auto-injection work?

Auto-injection uses a Kubernetes feature called admission control. A mutating admission webhook is registered to watch for newly created Pods. The webhook is configured with a namespace selector so that it only matches Pods that are being deployed to namespaces that have a particular label. When a Pod matches, the webhook consults an injection service provided by istiod to obtain a new, mutated configuration for the Pod, which contains the containers and volumes needed to run the sidecar.

The sidecar injector flow consists of the following steps:

  1. The webhook configuration is created during installation and registers the webhook with the Kubernetes API server.
  2. The Kubernetes API server watches for Pod deployments in namespaces that match the webhook namespaceSelector.
  3. A namespace is labeled so that it will be matched by the namespaceSelector.
  4. Pods deployed to the namespace trigger the webhook.
  5. The inject service provided by istiod mutates the Pod specifications to inject the sidecar.
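
The mutation in step 5 can be illustrated with a minimal sketch of the kind of response a mutating admission webhook returns. This is not the istiod implementation: the function name, the example UID, and the placeholder image are assumptions for illustration, and the real injector also adds init containers, volumes, and other settings. The sketch only shows the general Kubernetes mechanism, in which the webhook answers an AdmissionReview request with a base64-encoded JSON Patch.

```python
import base64
import json


def build_injection_response(admission_review, sidecar):
    """Build an AdmissionReview response that appends one container to a Pod.

    Hypothetical helper for illustration; not part of istiod.
    """
    request = admission_review["request"]
    # JSON Patch (RFC 6902): the "-" index appends to the containers array.
    patch = [{"op": "add", "path": "/spec/containers/-", "value": sidecar}]
    encoded = base64.b64encode(json.dumps(patch).encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],  # the response must echo the request UID
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": encoded,
        },
    }


# Example request for a newly created Pod (all values are made up).
example_review = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "example-uid",
        "operation": "CREATE",
        "object": {
            "kind": "Pod",
            "spec": {"containers": [{"name": "app", "image": "example.invalid/app:1.0"}]},
        },
    },
}
# The image is a placeholder, not a real proxy image.
example_sidecar = {"name": "istio-proxy", "image": "example.invalid/proxy:placeholder"}

example_response = build_injection_response(example_review, example_sidecar)
decoded_patch = json.loads(base64.b64decode(example_response["response"]["patch"]))
print(json.dumps(decoded_patch, indent=2))
```

The API server applies the returned patch to the Pod specification before the Pod is persisted, which is why the workload manifests themselves never need to mention the sidecar.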

What is a revision?

The label used for auto-injection is like any other user-defined Kubernetes label. A label is essentially a key-value pair that can be used to support the concept of tagging. Labels are widely used for tagging and for revisions; examples include Git tags, Docker tags, and Knative revisions.

Up until Anthos Service Mesh version 1.6.8, the default installation procedures followed the convention of configuring the namespace selector in the webhook to use the label istio-injection=enabled.

The current Anthos Service Mesh installation process lets you tag the installed control plane with a revision string, both as a revision argument to istioctl commands and as a field in the IstioOperator custom resource. The installer labels every control plane object with the revision, including the istiod Service and Deployment. The revision becomes part of the service name, for example, istiod-asm-173-6.istio-system.

The corresponding label key for namespaces is istio.io/rev and the value is typically set to indicate the version of the mesh. For example, a control plane with revision asm-173-6 selects Pods in namespaces with the label istio.io/rev=asm-173-6 and injects sidecars.

The canary upgrade process

Revision labels make it possible to perform canary upgrades and easy rollbacks of the control plane.

The following steps describe how the process works:

  1. Start with an existing Anthos Service Mesh or open source Istio installation. It doesn't matter whether the namespaces are using a revision label or the istio-injection=enabled label.
  2. Use a revision string when you install the new version of the control plane. Because of the revision string, the new control plane is installed alongside the existing version. The new installation includes a new webhook configuration with a namespaceSelector configured to watch for namespaces with that specific revision label.
  3. You migrate sidecar proxies to the new control plane by removing the old label from the namespace, adding the new revision label, and then restarting the Pods. The webhook for the new control plane then injects sidecars into the restarted Pods. If you use revisions with Anthos Service Mesh, you must stop using the istio-injection=enabled label, because a control plane with a revision does not select Pods in namespaces that have an istio-injection label, even if the namespace also has a revision label.
  4. Carefully test the workloads associated with the upgraded control plane and either continue to roll out the upgrade or roll back to the original control plane.

After you associate Pods with the new control plane, the existing control plane and webhook remain installed. The old webhook has no effect on Pods in namespaces that have been migrated to the new control plane. You can roll back the Pods in a namespace to the original control plane by removing the new revision label, adding back the original label, and restarting the Pods. When you are certain that the upgrade is complete, you can remove the old control plane.
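
The relabel-and-restart pattern used for both migration and rollback can be sketched as a small helper that builds the kubectl commands for one namespace. The function name and the namespace "demo" are hypothetical, and the sketch only prints the commands rather than running them. It assumes that the workloads are Deployments; other workload kinds would need their own restart command.

```python
REVISION_KEY = "istio.io/rev"
LEGACY_KEY = "istio-injection"


def relabel_commands(namespace, revision):
    """Return kubectl argument lists that point a namespace at a control plane.

    Pass a revision string to use a revision-based control plane, or None to
    go back to a control plane that uses the istio-injection=enabled label.
    Hypothetical helper for illustration only.
    """
    if revision is None:
        # A trailing "-" after a label key tells kubectl to remove that label.
        labels = [REVISION_KEY + "-", LEGACY_KEY + "=enabled"]
    else:
        labels = [LEGACY_KEY + "-", REVISION_KEY + "=" + revision]
    return [
        ["kubectl", "label", "namespace", namespace] + labels + ["--overwrite"],
        # Restarting the Pods is what triggers re-injection by the webhook.
        ["kubectl", "rollout", "restart", "deployment", "-n", namespace],
    ]


# Migrate the hypothetical namespace "demo" to revision asm-173-6.
for command in relabel_commands("demo", "asm-173-6"):
    print(" ".join(command))

# Roll back to an original control plane that used istio-injection=enabled.
for command in relabel_commands("demo", None):
    print(" ".join(command))
```

If the original control plane was also installed with a revision, rolling back is the same call with the original revision string in place of None.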

For detailed steps on upgrading using revisions, see the Upgrade guides.

A closer look at a mutating webhook configuration

The best way to understand the mutating webhook for automatic sidecar injection is to inspect the configuration yourself. Use the following command:

kubectl -n istio-system get mutatingwebhookconfiguration -l app=sidecar-injector -o yaml

You should see a separate configuration for each control plane that you have installed. A namespace selector for a revision-based control plane looks like this:

  namespaceSelector:
    matchExpressions:
    - key: istio-injection
      operator: DoesNotExist
    - key: istio.io/rev
      operator: In
      values:
      - asm-173-6

The selector may vary depending on the version of Anthos Service Mesh or Istio that you are running. This selector matches namespaces with a specific revision label as long as they do not also have an istio-injection label.
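
To make the selector semantics concrete, here is a minimal sketch that evaluates matchExpressions against a namespace's labels. It is a simplified model of Kubernetes set-based label selectors, not code from Kubernetes or istiod; the function name and the label value "some-other-revision" are assumptions for illustration.

```python
def selector_matches(match_expressions, labels):
    """Return True if every expression matches the given namespace labels."""
    for expression in match_expressions:
        key = expression["key"]
        operator = expression["operator"]
        values = expression.get("values", [])
        if operator == "In":
            matched = key in labels and labels[key] in values
        elif operator == "NotIn":
            # NotIn also matches when the key is absent.
            matched = key not in labels or labels[key] not in values
        elif operator == "Exists":
            matched = key in labels
        elif operator == "DoesNotExist":
            matched = key not in labels
        else:
            raise ValueError("unsupported operator: " + operator)
        if not matched:
            return False  # all expressions are ANDed together
    return True


# The selector for the revision-based control plane shown above.
revision_selector = [
    {"key": "istio-injection", "operator": "DoesNotExist"},
    {"key": "istio.io/rev", "operator": "In", "values": ["asm-173-6"]},
]

print(selector_matches(revision_selector, {"istio.io/rev": "asm-173-6"}))
print(selector_matches(revision_selector, {"istio.io/rev": "asm-173-6", "istio-injection": "enabled"}))
print(selector_matches(revision_selector, {"istio.io/rev": "some-other-revision"}))
```

Only the first namespace matches. The second is rejected because it still has an istio-injection label, which is why the old label must be removed when you migrate a namespace to a revision.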

When a Pod is deployed to a namespace matching the selector, its Pod specification is submitted to the injector service for mutation. The injector service to be called is specified as follows:

      service:
        name: istiod-asm-173-6
        namespace: istio-system
        path: /inject
        port: 443

istiod exposes the service on port 443 at the /inject URL path.

The rules section specifies that the webhook should apply to Pod creation:

    rules:
    - apiGroups:
      - ""
      apiVersions:
      - v1
      operations:
      - CREATE
      resources:
      - pods
      scope: '*'
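
As a rough illustration of how such a rule filters requests, the following sketch checks an API request against the rule above. It is a simplified model with a hypothetical function name: it treats "*" as a wildcard and ignores subresources and the scope field.

```python
def rule_matches(rule, api_group, api_version, resource, operation):
    """Return True if a request falls under a webhook rule (simplified)."""

    def allowed(values, actual):
        # "*" in a rule field matches any value.
        return "*" in values or actual in values

    return (
        allowed(rule["apiGroups"], api_group)
        and allowed(rule["apiVersions"], api_version)
        and allowed(rule["resources"], resource)
        and allowed(rule["operations"], operation)
    )


# The rule from the webhook configuration above; "" is the core API group.
pod_create_rule = {
    "apiGroups": [""],
    "apiVersions": ["v1"],
    "operations": ["CREATE"],
    "resources": ["pods"],
}

print(rule_matches(pod_create_rule, "", "v1", "pods", "CREATE"))
print(rule_matches(pod_create_rule, "", "v1", "pods", "UPDATE"))
print(rule_matches(pod_create_rule, "apps", "v1", "deployments", "CREATE"))
```

Only the first request matches. Because the rule applies only to Pod creation, existing Pods keep their current sidecar until they are restarted.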

Summary

Although switching to revision labels on your namespaces to enable auto-injection might take some getting used to, the benefits that revision labels provide for safe canary upgrades are well worth the effort.
