Cloud Service Mesh control plane revisions

This page describes how control plane revisions work and the value of using them for safe service mesh upgrades (and rollbacks).

Service mesh installation fundamentals

At a high level, Cloud Service Mesh installation consists of two major phases:

  1. First, you use the asmcli tool to install an in-cluster control plane. The control plane consists of a set of system services that manage mesh configuration.

  2. Next, you deploy a special sidecar proxy throughout your environment that intercepts network communication to and from each workload. The proxies communicate with the control plane to get their configuration, which lets you direct and control traffic (data plane traffic) around your mesh without making any changes to your workloads.

    To deploy the proxies, you use a process called automatic sidecar injection (auto-injection) to run a proxy as an additional sidecar container in each of your workload Pods. You don't need to modify the Kubernetes manifests that you use to deploy your workloads, but you do need to add a label to your namespaces and restart the Pods.

Use revisions to upgrade your mesh safely

The ability to control traffic is one of the principal benefits of using a service mesh. For example, you can gradually shift traffic to a new version of an application when you first deploy it to production. If you detect problems during the upgrade, you can shift traffic back to the original version, which provides a low-risk means of rolling back. This procedure is known as a canary release, and it greatly reduces the risk associated with new deployments.

In a canary upgrade that uses control plane revisions, you install a new and separate control plane and configuration alongside the existing control plane. The installer assigns a string called a revision to identify the new control plane. At first, the sidecar proxies continue to receive configuration from the previous version of the control plane. You gradually associate workloads with the new control plane by labeling their namespaces or Pods with the new control plane revision. After you label a namespace or Pods with the new revision, you restart the workload Pods so that new sidecars are auto-injected and receive their configuration from the new control plane. If there are problems, you can roll back by associating the workloads with the original control plane.

How does auto-injection work?

Auto-injection uses a Kubernetes feature called admission control. A mutating admission webhook is registered to watch for newly created Pods. The webhook is configured with a namespace selector so that it only matches Pods that are being deployed to namespaces that have a particular label. When a Pod matches, the webhook consults an injection service provided by the control plane to obtain a new, mutated configuration for the Pod, which contains the containers and volumes needed to run the sidecar.

Figure: sidecar injector

  1. A webhook configuration is created during installation. The webhook is registered with the Kubernetes API server.
  2. The Kubernetes API server watches for Pod deployments in namespaces that match the webhook namespaceSelector.
  3. A namespace is labeled so that it will be matched by the namespaceSelector.
  4. Pods deployed to the namespace trigger the webhook.
  5. The injection service provided by the control plane mutates the Pod specifications to auto-inject the sidecar.
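
To check that the flow above ran for a namespace, you can list the containers in each Pod and look for the proxy. The following is a minimal sketch rather than an official procedure: the function name list_pod_containers is hypothetical, and the container name istio-proxy is an assumption based on the usual Istio default, which this page does not state.

```shell
# Hypothetical helper: print each Pod in a namespace with its container names.
# Assumption: an injected Pod shows an extra container, usually "istio-proxy".
# Depending on version and settings, the proxy might be listed elsewhere
# in the Pod specification.
list_pod_containers() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}'
}

# Example: list_pod_containers my-namespace
```

A Pod that was created before its namespace was labeled is expected to list only its own containers until it is restarted.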

What is a revision?

The label used for auto-injection is like any other user-defined Kubernetes label: a key-value pair that you attach to an object so that the object can be identified and selected. The same idea of tagging is widely used to mark versions and revisions, for example, Git tags, Docker tags, and Knative revisions.

The current Cloud Service Mesh installation process lets you label the installed control plane with a revision string. The installer labels every control plane object with the revision. The key in the key-value pair is istio.io/rev. For in-cluster control planes, the istiod Service and Deployment typically have a revision label similar to istio.io/rev=asm-1233-2, where asm-1233-2 identifies the Cloud Service Mesh version. The revision becomes part of the service name, for example: istiod-asm-1233-2.istio-system.
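
As a rough illustration of the naming described above, the following sketch shows one way to see the revision label on control plane objects and to derive the service name from a revision. The function names are hypothetical, and the query assumes that the control plane runs in the istio-system namespace, as in the example service name. Other revision-labeled objects in that namespace, such as gateways, might also appear in the output.

```shell
# Show Deployments and Services that carry a revision label.
# "-l istio.io/rev" selects objects that have the label (any value);
# "-L istio.io/rev" adds a column with the label's value.
show_control_plane_revisions() {
  kubectl -n istio-system get deployments,services -l istio.io/rev -L istio.io/rev
}

# Build the in-cluster service name for a revision such as asm-1233-2.
istiod_service_name() {
  printf 'istiod-%s.istio-system\n' "$1"
}

# Example: istiod_service_name asm-1233-2
```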

To enable auto-injection, you add a revision label to your namespaces that matches the revision label on the control plane. For example, a control plane with revision istio.io/rev=asm-1233-2 selects Pods in namespaces with the label istio.io/rev=asm-1233-2 and injects sidecars.
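
A minimal sketch of the labeling step, assuming a namespace that has no injection label yet. The function names and the namespace name in the example are hypothetical; the revision asm-1233-2 is the example value used on this page.

```shell
# Add (or update) the revision label on a namespace.
label_namespace_for_revision() {
  kubectl label namespace "$1" "istio.io/rev=$2" --overwrite
}

# Show the two injection-related labels of a namespace as extra columns.
show_injection_labels() {
  kubectl get namespace "$1" -L istio.io/rev,istio-injection
}

# Example:
#   label_namespace_for_revision my-namespace asm-1233-2
#   show_injection_labels my-namespace
```

Labeling alone does not change running Pods. As noted earlier, the Pods must be restarted before they receive a sidecar.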

The canary upgrade process

Revision labels make it possible to perform canary upgrades and rollbacks of the in-cluster control plane.

Figure: canary upgrade

The following steps describe how the process works:

  1. Start with an existing Cloud Service Mesh or open source Istio installation. It doesn't matter whether the namespaces are using a revision label or the istio-injection=enabled label.
  2. Use a revision string when you install the new version of the control plane. Because of the revision string, the new control plane is installed alongside the existing version. The new installation includes a new webhook configuration with a namespaceSelector configured to watch for namespaces with that specific revision label.
  3. Migrate sidecar proxies to the new control plane by removing the old label from the namespace, adding the new revision label, and then restarting the Pods. When the Pods restart, the webhook for the new control plane injects sidecars into them. If you use revisions with Cloud Service Mesh, you must stop using the istio-injection=enabled label: a control plane with a revision does not select Pods in namespaces that have an istio-injection label, even if a revision label is also present.
  4. Carefully test the workloads associated with the upgraded control plane and either continue to roll out the upgrade or roll back to the original control plane.

After you associate Pods with the new control plane, the existing control plane and webhook are still installed. The old webhook has no effect on Pods in namespaces that have been migrated to the new control plane. You can roll back the Pods in a namespace to the original control plane by removing the new revision label, adding back the original label, and restarting the Pods. When you are certain that the upgrade is complete, you can remove the old control plane.
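
The migration and rollback described above can be sketched as two small helpers. This is an illustration under stated assumptions, not the official upgrade procedure: the function names are hypothetical, the workloads are assumed to be Deployments (other workload types need their own restart), and the rollback helper assumes that the original control plane used the istio-injection=enabled label.

```shell
# Move a namespace to a new revision. A trailing "-" after a label key
# removes that label, so "istio-injection-" drops the old label while
# "istio.io/rev=<revision>" sets the new one.
migrate_namespace() {
  kubectl label namespace "$1" istio-injection- "istio.io/rev=$2" --overwrite &&
  kubectl rollout restart deployment -n "$1"
}

# Roll back a namespace to a control plane that selects on
# istio-injection=enabled: remove the revision label, restore the old one.
rollback_to_injection_label() {
  kubectl label namespace "$1" istio.io/rev- istio-injection=enabled --overwrite &&
  kubectl rollout restart deployment -n "$1"
}

# Example:
#   migrate_namespace my-namespace asm-1233-2
#   rollback_to_injection_label my-namespace
```

If the original control plane was itself installed with a revision, rolling back amounts to calling migrate_namespace again with the original revision string.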

A closer look at a mutating webhook configuration

To better understand the mutating webhook for automatic sidecar injection, inspect the configuration yourself. Use the following command:

kubectl -n istio-system get mutatingwebhookconfiguration -l app=sidecar-injector -o yaml

You should see a separate configuration for each control plane that you have installed. A namespace selector for a revision-based control plane looks like this:

    namespaceSelector:
      matchExpressions:
      - key: istio-injection
        operator: DoesNotExist
      - key: istio.io/rev
        operator: In
        values:
        - asm-1233-2

This selector matches namespaces that have the specific revision label, as long as they don't also have an istio-injection label. The exact selector may vary depending on the version of Cloud Service Mesh or Istio that you are running.
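
The same two conditions can be written as a kubectl label selector, which gives a quick way to see which namespaces the example selector would match. This is a sketch that mirrors only the selector shown above; the function name is hypothetical, and a selector that differs in your installed version would need a different query.

```shell
# List namespaces where istio.io/rev is in the given set AND the
# istio-injection label is absent ("!key" means "label does not exist").
# The "!" part is single-quoted to avoid history expansion in interactive shells.
namespaces_matched_by_revision() {
  kubectl get namespaces -l "istio.io/rev in ($1)"',!istio-injection'
}

# Example: namespaces_matched_by_revision asm-1233-2
```

A namespace that has both labels is absent from this list, which matches the behavior described for revision-based control planes.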

When a Pod is deployed to a namespace matching the selector, its Pod specification is submitted to the injector service for mutation. The injector service to be called is specified as follows:

    service:
      name: istiod-asm-1233-2
      namespace: istio-system
      path: /inject
      port: 443

The service is exposed by the control plane on port 443 at the /inject URL path.

The rules section specifies that the webhook applies to Pod creation, which is why existing Pods must be restarted before they receive a sidecar:

    rules:
    - apiGroups:
      - ""
      apiVersions:
      - v1
      operations:
      - CREATE
      resources:
      - pods
      scope: '*'
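
Rather than reading the full YAML, you can print only the fields discussed above for each installed webhook configuration. The following is a sketch: the function name is hypothetical, and the JSONPath assumes that the service block sits under clientConfig in each webhook entry, as it does in the admissionregistration.k8s.io/v1 API.

```shell
# For each sidecar injector webhook configuration, print its name, the
# injector service name(s), the URL path(s), and the operations in its rules.
summarize_injector_webhooks() {
  kubectl get mutatingwebhookconfiguration -l app=sidecar-injector \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.webhooks[*].clientConfig.service.name}{"\t"}{.webhooks[*].clientConfig.service.path}{"\t"}{.webhooks[*].rules[*].operations}{"\n"}{end}'
}

# Example: summarize_injector_webhooks
```

During a canary upgrade you would expect one line per installed control plane, each pointing at a different istiod service.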