Upgrading to Istio 1.6 with Operator

Starting with version 1.6, the Istio on Google Kubernetes Engine add-on uses the Istio Operator for installation and configuration. The Istio Operator follows the Kubernetes Operator pattern. The Operator lets you configure Istio by defining a Kubernetes custom resource (CR), an instance of the IstioOperator custom resource definition (CRD), for the Istio installation. The Operator then uses a controller to make changes to the installation to match the custom resource.

When you upgrade your cluster to 1.17.9-gke.6300 or higher, the Istio 1.6 Operator and control plane are installed alongside the existing 1.4.x Istio control plane. The upgrade requires user action and follows the dual control plane upgrade process (referred to as canary upgrades in the Istio documentation). With a dual control plane upgrade, you can migrate to the 1.6 version by setting a label on your workloads to point to the new control plane and performing a rolling restart.

We aren't releasing a 1.5 version of the Istio on GKE add-on. Version 1.6 is the next release after 1.4.10.

  • Upgrading the cluster to GKE versions 1.17 and higher causes the built-in ingress gateway to be unavailable for approximately 5 minutes during the upgrade process. We recommend installing and managing separate user-defined gateways to avoid this issue, as described in Adding gateways.
  • There is a known issue with the upgrade from GKE 1.16 to 1.17 and 1.18 versions earlier than R33. A fix is available in versions 1.17.12-gke.1501 or higher and 1.18.9-gke.1501 or higher. The issue causes any custom resources you created in the istio-system namespace to be deleted during the upgrade. These resources must be manually recreated. Ensure that you upgrade only to a version that includes the fix. The issue only occurs during upgrades, so new clusters are not affected.
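
One way to check a candidate GKE version against the fix versions listed above is a small shell helper. This is a minimal sketch, not part of the add-on: the function names gke_version_at_least and has_upgrade_fix are hypothetical, and the code assumes version strings of the form X.Y.Z-gke.N.

```shell
# Succeeds if GKE version $1 is at or above version $2 (both X.Y.Z-gke.N).
gke_version_at_least() {
  # Turn "1.17.12-gke.1501" into "1 17 12 1501" so fields compare numerically.
  have=$(printf '%s\n' "$1" | sed 's/-gke\./ /; s/\./ /g')
  want=$(printf '%s\n' "$2" | sed 's/-gke\./ /; s/\./ /g')
  # Word splitting is intended: 4 fields for each version, 8 in total.
  set -- $have $want
  [ "$#" -eq 8 ] || return 2
  if [ "$1" -ne "$5" ]; then [ "$1" -gt "$5" ]; return $?; fi
  if [ "$2" -ne "$6" ]; then [ "$2" -gt "$6" ]; return $?; fi
  if [ "$3" -ne "$7" ]; then [ "$3" -gt "$7" ]; return $?; fi
  [ "$4" -ge "$8" ]
}

# Succeeds if a 1.17 or 1.18 version includes the fix for the known issue.
# Returns 2 for other minor versions, which the known issue does not describe.
has_upgrade_fix() {
  case "$1" in
    1.17.*) gke_version_at_least "$1" 1.17.12-gke.1501 ;;
    1.18.*) gke_version_at_least "$1" 1.18.9-gke.1501 ;;
    *) return 2 ;;
  esac
}
```

For example, `has_upgrade_fix 1.17.12-gke.1501 && echo "fix included"` prints the message, while the same check for 1.17.9-gke.6300 fails.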

Istio Operator benefits

The Operator allows for greater installation configurability. In versions of the add-on prior to 1.6, the GKE add-on manager reconciles any changes to the Istio manifest and prevents most types of configuration changes. The Istio Operator doesn't have this limitation. The Operator generates an Istio control plane installation manifest based on the IstioOperator custom resource (CR) that you provide during installation. This CR is completely under your control and is never reconciled by the GKE add-on manager.

After you have upgraded to Istio 1.6 with the Operator, you can migrate from the Istio on GKE add-on to the open source version of Istio.

Upgrading to Istio 1.6 with Operator

You only need to do these steps once to transition to the Operator. Subsequent upgrades follow the dual control plane upgrade process.

  1. Select a GKE version that includes Istio 1.6 (1.17.7-gke.8+, 1.17.8-gke.6+) and upgrade your cluster.

    Note that the Istio on Google Kubernetes Engine add-on with Istio 1.6 installs two versions of Istio:

    • The static manifest version controlled by the add-on manager (which is active after you upgrade your cluster).

    • The 1.6 version controlled by the Operator (which is inactive until enabled). The inactive 1.6 version doesn't connect to any proxies and consumes negligible cluster resources.

    If the currently installed version of Istio differs from the version in the target static manifest, upgrading the cluster might also perform an in-place upgrade of Istio. For example, if your cluster is currently running Istio 1.4.6-gke.0 and you select GKE cluster version 1.17.7-gke.8, your Istio control plane is upgraded to 1.4.10-gke.0 (or higher) as part of the upgrade.

  2. Check that the 1.6 version of Istio is installed and running:

    kubectl get pods -n istio-system
    

    You should see an istiod pod in a Running state along with the 1.4 control plane components, for example:

    NAME                                             READY   STATUS      RESTARTS   AGE
    istio-citadel-78b9b5b589-52cqg                   1/1     Running     0          4m16s
    istio-galley-79bd448645-zxfjm                    1/1     Running     0          4m16s
    istio-ingressgateway-b4f8986c4-dbr6c             1/1     Running     0          4m27s
    istio-pilot-dc558d859-cnrqt                      2/2     Running     1          4m16s
    istio-policy-8664dd6c4-m42xs                     2/2     Running     1          4m15s
    istio-security-post-install-1.4.10-gke.4-255r6   0/1     Completed   0          4m15s
    istio-sidecar-injector-7f85d7f7c4-vt2x9          1/1     Running     0          4m15s
    istio-telemetry-69b6477c5f-4pr6v                 2/2     Running     2          4m15s
    istiod-istio-163-5fccfcf4dd-2p9c8                1/1     Running     0          3m31s
    prometheus-7d9f49d945-4nps2                      2/2     Running     0          3m17s
    promsd-696bcc5b96-ln2s8                          2/2     Running     1          4m15s
    
  3. Ensure that you have migrated any unsupported v1alpha1 security policy configuration objects. For more information, see the Istio Upgrade Notes.

  4. Disable configuration validation in the 1.4.x control plane:

    1. Edit the galley ClusterRole resource:

      kubectl edit clusterrole -n istio-system istio-galley-istio-system
      
    2. Change the '*' value to get in the following list entry:

      - apiGroups:
        - admissionregistration.k8s.io
        resources:
        - validatingwebhookconfigurations
        verbs:
        - '*' # change this to get
      

      After the update, the list entry should look like this:

      - apiGroups:
        - admissionregistration.k8s.io
        resources:
        - validatingwebhookconfigurations
        verbs:
        - get
      
    3. Delete the Galley ValidatingWebhookConfiguration:

      kubectl delete ValidatingWebhookConfiguration istio-galley -n istio-system
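
The check in step 2 can also be scripted. The following is a minimal sketch that reads `kubectl get pods` output on standard input and succeeds only when an istiod pod is Running with all of its containers ready. The function name istiod_running is hypothetical.

```shell
# Reads `kubectl get pods` table output on stdin.
# Succeeds if some pod named istiod-* has STATUS Running and a READY
# column whose two numbers match (for example 1/1).
istiod_running() {
  awk '$1 ~ /^istiod-/ && $3 == "Running" {
         split($2, ready, "/")
         if (ready[1] == ready[2]) found = 1
       }
       END { exit(found ? 0 : 1) }'
}
```

A possible invocation is `kubectl get pods -n istio-system --no-headers | istiod_running && echo "1.6 control plane is running"`.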
      

Migrate a namespace to 1.6

Installing the new version has no impact on the existing sidecar proxies. To upgrade these, you must configure them to point to the new control plane. This is controlled during sidecar injection based on the namespace label istio.io/rev.

  1. Find the exact Istio 1.6 version number:

    kubectl -n istio-system get pods -lapp=istiod --show-labels

    The output of the command is similar to the following:

    NAME                                READY   STATUS    RESTARTS   AGE   LABELS
    istiod-istio-163-5fccfcf4dd-2p9c8   1/1     Running   0          22h   app=istiod,istio.io/rev=istio-163,istio=istiod,pod-template-hash=5fccfcf4dd

    In this example, the version is: istio-163

  2. Re-label the namespace containing the workloads that you want to roll over to 1.6. The version label has to match the version of the control plane. In the following command, replace VERSION with the version from the previous command, for example: istio-163

    kubectl label namespace NAMESPACE istio-injection- istio.io/rev=VERSION --overwrite
  3. Perform a rolling restart of the selected workloads:

    kubectl rollout restart deployment DEPLOYMENT -n NAMESPACE
  4. List the pods in the namespace:

    kubectl get pods -n NAMESPACE
  5. Select one of the pods to check that the workloads have been injected with the 1.6 version of the sidecar proxy:

    kubectl describe pod -n NAMESPACE YOUR_SELECTED_POD

    The output should show the proxy container at the 1.6 version, for example:

    ...
    istio-proxy:
      Container ID:  docker://22f62020ddcc6f8e02d800b5614e02aae2d082ce991c9e3eab9846d9f2cf90f5
      Image:         gcr.io/gke-release/istio/proxyv2:1.6.3-gke.0
    ...
    
  6. Complete the migration of all namespaces to the new Istio version. Repeat steps 2 through 5 for all your workload namespaces.
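
When many namespaces need to move, the relabel and restart steps can be wrapped in a loop. The sketch below uses only the kubectl commands shown in the steps above; the function names istio_revision_from_labels and migrate_namespaces are hypothetical, and the restart covers Deployments only, so other workload types would still need their own restart.

```shell
# Reads `kubectl get pods --show-labels` output on stdin and prints the
# value of the first istio.io/rev label it finds.
istio_revision_from_labels() {
  tr ',' '\n' | sed -n 's/^.*istio\.io\/rev=//p' | head -n 1
}

# Relabels each namespace for the given revision, then restarts its Deployments.
# Usage: migrate_namespaces REVISION NAMESPACE...
migrate_namespaces() {
  rev=$1
  shift
  for ns in "$@"; do
    kubectl label namespace "$ns" istio-injection- "istio.io/rev=$rev" --overwrite || return 1
    # With no Deployment name, this restarts every Deployment in the namespace.
    kubectl rollout restart deployment -n "$ns" || return 1
  done
}
```

A possible invocation, with placeholder namespace names:

```shell
REV=$(kubectl -n istio-system get pods -lapp=istiod --show-labels | istio_revision_from_labels)
migrate_namespaces "$REV" NAMESPACE_1 NAMESPACE_2
```

After the loop finishes, the pod checks in steps 4 and 5 still apply to each namespace.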

If you plan to migrate to Anthos Service Mesh, skip to Turn down the old control plane.

To remain on the Istio add-on, complete the following steps to migrate the ingress gateways to the new Istio version.

  1. Edit the IstioOperator CR to replace the ingress gateway with a 1.6 version:

    kubectl edit istiooperators -n istio-system istio-1-6-3-gke-0
    
  2. Change the enabled setting for the gateway to true:

    ...
    spec:
      components:
        ingressGateways:
        - enabled: false # change this to true
          name: istio-ingressgateway
    
  3. Verify that the gateway pod is recreated with the new 1.6 version:

    kubectl get pods -n istio-system -l app=istio-ingressgateway -o yaml | grep image
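
The grep in the last step also prints imageID and imagePullPolicy lines, so the result has to be read by eye. As a minimal sketch, the hypothetical helper below reads the same pod YAML and succeeds only if every `image:` line carries a tag that starts with the prefix you pass in. It assumes images are referenced by tag rather than by digest.

```shell
# Reads pod YAML on stdin; $1 is the expected tag prefix, for example "1.6.".
# Fails if no image line is found or if any image tag has a different prefix.
images_match_version() {
  awk -v want="$1" '
    $1 == "image:" || ($1 == "-" && $2 == "image:") {
      seen = 1
      # The tag is the text after the last colon of the image reference.
      n = split($NF, parts, ":")
      if (n < 2 || index(parts[n], want) != 1) bad = 1
    }
    END { exit((seen && !bad) ? 0 : 1) }'
}
```

A possible invocation is `kubectl get pods -n istio-system -l app=istio-ingressgateway -o yaml | images_match_version 1.6.`.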
    

Turn down the old control plane

After all workload proxies are using the 1.6 control plane, you can turn down the old control plane by scaling the replicas of each old component to 0.

To scale down the replicas:

kubectl scale deploy -n istio-system --replicas=0 \
  istio-citadel istio-galley istio-pilot istio-policy istio-sidecar-injector istio-telemetry prometheus promsd

The remaining Istio resources can be left in the cluster without any problem.
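
Before scaling down, it can help to confirm that no namespace still carries the old istio-injection=enabled label. The sketch below is a partial check only: the function names are hypothetical, and a clean result says nothing about pods that were relabeled but never restarted, so the per-pod proxy check in the migration steps still applies.

```shell
# Lists namespaces that are still labeled for injection by the 1.4 control plane.
namespaces_on_old_injection() {
  kubectl get namespaces -l istio-injection=enabled -o name
}

# Succeeds only if no namespace still uses the old injection label.
ready_to_turn_down() {
  remaining=$(namespaces_on_old_injection) || return 2
  if [ -n "$remaining" ]; then
    printf 'Still labeled istio-injection=enabled:\n%s\n' "$remaining" >&2
    return 1
  fi
}
```

For example, `ready_to_turn_down && echo "no namespaces left on the old label"`.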

If you are remaining on Istio, you can now edit the IstioOperator CR and upgrade the Operator on your own schedule. For more information, see the Operator documentation.

Migrating to Anthos Service Mesh

Before migrating to Anthos Service Mesh, you must disable the Operator as described below. After migrating, you still configure the service mesh using the same IstioOperator CR format as the Operator, but you do this with the istioctl install command when you want to change the installed state, rather than having the Operator controller continuously watching the IstioOperator CR in the cluster.

To migrate to Anthos Service Mesh, you use a Google-provided script that handles all the details of preparing your Cloud project and cluster, and then installs Anthos Service Mesh using the Anthos Service Mesh version of istioctl install. Although we recommend migrating to Anthos Service Mesh 1.7, you can migrate to Anthos Service Mesh 1.6 using the 1.6 version of the script.

Requirements

Make sure your cluster meets the following requirements:

  • A machine type that has at least four vCPUs, such as e2-standard-4. If the machine type for your cluster doesn't have at least four vCPUs, change the machine type as described in Migrating workloads to different machine types.

  • The minimum number of nodes depends on your machine type. Anthos Service Mesh requires at least eight vCPUs. If the machine type has four vCPUs, your cluster must have at least two nodes. If the machine type has eight vCPUs, the cluster only needs one node. If you need to add nodes, see Resizing a cluster.

  • The script enables Workload Identity on your cluster. Workload Identity is the recommended method of calling Google APIs. Enabling Workload Identity changes the way calls from your workloads to Google APIs are secured, as described in Workload Identity limitations.

  • To be included in the service mesh, service ports must be named, and the name must include the port's protocol in the following syntax: name: protocol[-suffix] where the square brackets indicate an optional suffix that must start with a dash. For more information, see Naming service ports.

  • If you are installing Anthos Service Mesh on a private cluster, you must open port 15017 in the firewall to get the webhook used with automatic sidecar injection to work properly. For more information, see Opening a port on a private cluster.

  • If you have created a service perimeter in your organization, you might need to add the Mesh CA service to the perimeter. See Adding Mesh CA to a service perimeter for more information.
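
Two of the requirements above are simple enough to check mechanically. The following sketch encodes the vCPU arithmetic and the port naming syntax. Both function names are hypothetical, and the protocol list is an assumption based on the protocols that Istio documents for port naming; confirm it against Naming service ports before relying on it.

```shell
# $1 = vCPUs per node, $2 = number of nodes.
# Succeeds if each node has at least 4 vCPUs and the cluster has at least 8.
meets_vcpu_requirement() {
  [ "$1" -ge 4 ] && [ $(( $1 * $2 )) -ge 8 ]
}

# Succeeds if the port name follows protocol[-suffix].
# The protocol list below is an assumption, not taken from this page.
valid_port_name() {
  proto=${1%%-*}   # text before the first dash, or the whole name
  case "$proto" in
    grpc|http|http2|https|mongo|mysql|redis|tcp|tls|udp) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `meets_vcpu_requirement 4 2` succeeds and `meets_vcpu_requirement 4 1` fails; `valid_port_name http-web` succeeds and `valid_port_name web` fails.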

Plan the migration

To help you plan the migration, review Preparing to migrate from Istio.

Disable the Operator

To prevent the Operator from reconciling the istio-ingressgateway that Anthos Service Mesh installs, you need to disable the Operator.

To disable the Operator:

  1. Get the Operator version:

    kubectl get istiooperators -n istio-system
    

    The output is similar to the following:

    NAME                        REVISION     STATUS    AGE
    istio-1-6-11-gke-0          istio-1611   HEALTHY   12h

    In the sample output, the Operator version is istio-1-6-11-gke-0.

  2. Disable the Operator. In the following command, replace VERSION with the Operator version from the previous step:

    kubectl patch -n istio-system istiooperator VERSION -p '{"spec":{"profile":"disabled"}}' --type=merge
    

    This command blocks the Operator from making any changes in the cluster.
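
The two steps above can be chained so that the name is not copied by hand. This is a minimal sketch with hypothetical function names; it parses the table output shown in step 1 and applies the same patch as step 2.

```shell
# Reads `kubectl get istiooperators` table output on stdin and prints the
# NAME column, skipping the header row.
istiooperator_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Applies the "disabled" profile to the named IstioOperator resource.
disable_operator() {
  kubectl patch -n istio-system istiooperator "$1" --type=merge \
    -p '{"spec":{"profile":"disabled"}}'
}
```

A possible invocation:

```shell
kubectl get istiooperators -n istio-system | istiooperator_names |
while read -r name; do
  disable_operator "$name"
done
```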

Migrate to Anthos Service Mesh

This section describes how to migrate to Anthos Service Mesh using the install_asm script. We recommend that you migrate to Anthos Service Mesh 1.7, but migrating to Anthos Service Mesh 1.6 is supported.

Migrate to 1.7

  1. Install the required tools.

  2. Download the install_asm script.

  3. Review the script's options and flags.

    If you haven't customized the Istio installation, and you want to continue using Citadel as the certificate authority (CA), you can migrate to Anthos Service Mesh with the following arguments to the script:

    ./install_asm \
      --project_id PROJECT_ID \
      --cluster_name CLUSTER_NAME \
      --cluster_location CLUSTER_LOCATION \
      --mode migrate \
      --ca citadel \
      --enable_apis
    

    Anthos Service Mesh 1.7 also provides overlay files, available in GitHub, for commonly used features such as enabling an egress gateway. For more information, see Enabling optional features.

  4. To complete setting up Anthos Service Mesh, you need to enable automatic sidecar injection and deploy or redeploy workloads.

Migrate to 1.6

  1. Install the required tools.

  2. Download the install_asm script.

  3. Review the script's options and flags.

    If you haven't customized the Istio installation, and you want to continue using Citadel as the certificate authority (CA), you can migrate to Anthos Service Mesh with the following arguments to the script:

    ./install_asm \
      --project_id PROJECT_ID \
      --cluster_name CLUSTER_NAME \
      --cluster_location CLUSTER_LOCATION \
      --mode migrate \
      --ca citadel \
      --enable_apis
    
  4. To complete setting up Anthos Service Mesh, you need to enable automatic sidecar injection and deploy or redeploy workloads.

After migrating

Run the following command, and replace VERSION with the Operator version that you used previously to disable the Operator:

kubectl patch -n istio-system istiooperator VERSION -p '{"spec":{"profile":"empty"}}' --type=merge

This command re-enables the Operator with an empty profile, which causes it to remove the resources it previously installed from the cluster. This doesn't include the gateways or control plane elements installed by the install_asm script.