Configure managed Anthos Service Mesh

Overview

Managed Anthos Service Mesh is a Google-managed control plane and an optional data plane that you simply configure. Google handles reliability, upgrades, scaling, and security for you in a backward-compatible manner. This guide explains how to set up or migrate applications to managed Anthos Service Mesh in a single-cluster or multi-cluster configuration.

To learn about the supported features and limitations of managed Anthos Service Mesh, see Managed Anthos Service Mesh supported features.

Prerequisites

As a starting point, this guide assumes that you have:

Requirements

Migrations

  • Direct migrations/upgrades from in-cluster control plane to Google-managed control plane are supported only from versions 1.9+ (installed with Mesh CA).
  • Installations with Istio CA must first migrate to 1.9+ Mesh CA.
  • Indirect migrations/upgrades are supported, meaning that you can follow the standard Anthos Service Mesh upgrade paths through each version until you reach Anthos Service Mesh 1.11 with an in-cluster control plane, and then migrate to the Google-managed control plane.

Limitations

We recommend that you review the list of managed Anthos Service Mesh supported features and limitations. In particular, note the following:

  • Managed Anthos Service Mesh supports multiple GKE clusters only within a single Google Cloud project.
  • The IstioOperator API isn't supported.

  • Managed data plane limitations:

    • This Preview release of the managed data plane is available only for new deployments of the managed control plane. If you previously deployed the managed control plane, and you want to deploy the managed data plane, you must rerun the installation script as described in Apply the Google-managed control plane.

    • The managed data plane is available on the Regular and Rapid release channels.

Enable Workload Identity

If Workload Identity isn't enabled, see Enabling Workload Identity on a cluster for the required IAM permissions and the gcloud CLI command to enable it.
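
For reference, enabling Workload Identity on an existing cluster typically looks like the following. Treat this as a sketch only; the linked page covers the required IAM permissions and node pool settings:

gcloud container clusters update CLUSTER_NAME \
    --zone LOCATION \
    --project PROJECT_ID \
    --workload-pool=PROJECT_ID.svc.id.goog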

Download the installation script

  1. Download the latest version of the script that installs Anthos Service Mesh 1.11.8 to the current working directory:

    curl https://storage.googleapis.com/csm-artifacts/asm/install_asm_1.11 > install_asm
    
  2. Make the script executable:

    chmod +x install_asm
    

Configure each cluster

Use the following steps to configure managed Anthos Service Mesh for each cluster in your mesh.

Get cluster credentials

Retrieve the appropriate credentials. The following command will also point the kubectl context to the target cluster.

gcloud container clusters get-credentials CLUSTER_NAME \
    --zone LOCATION \
    --project PROJECT_ID

Apply the Google-managed control plane

Run the install_asm installation script for each cluster that will use managed Anthos Service Mesh. We recommend that you include both of the following options when you run install_asm:

  • --option cni-managed: This option enables the Istio Container Network Interface (CNI) plugin. The CNI plugin configures network traffic redirection to and from the sidecar proxies by using the CNCF CNI interface instead of a high-privilege init container.

  • --enable-registration: This flag registers the cluster to your fleet.

These options are required if you also want to deploy the Google-managed data plane. For a full list of options, see the asmcli reference page.

  ./install_asm --mode install --managed \
      -p PROJECT_ID \
      -l LOCATION \
      -n CLUSTER_NAME \
      --verbose \
      --output_dir CLUSTER_NAME \
      --enable-all \
      --enable-registration \
      --option cni-managed

The script downloads all of the files for configuring the managed control plane and installing an Istio Gateway to the directory specified by --output_dir, along with the istioctl tool and sample applications. The steps in this guide assume that you run istioctl from the root of the installation directory, where istioctl is located in the /bin subdirectory.
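
For example, assuming you passed --output_dir CLUSTER_NAME as in the preceding command, you can confirm that the downloaded istioctl binary runs from the root of that directory:

cd CLUSTER_NAME
./bin/istioctl version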

If you rerun install_asm on the same cluster, it overwrites the existing control plane configuration. Be sure to specify the same options and flags if you want the same configuration.

Note that an ingress gateway isn't automatically deployed with the control plane. Decoupling the deployment of the ingress gateway and control plane allows you to more easily manage your gateways in a production environment. If the cluster needs an ingress gateway, see Install Istio gateways.

Apply the Google-managed data plane

If you want Google to manage upgrades of the proxies, enable the Google-managed data plane. If enabled, the sidecar proxies and injected gateways are automatically upgraded in conjunction with the managed control plane.

In this Preview, the managed data plane upgrades proxies by evicting Pods that are running older versions of the proxy. The evictions are done in an orderly manner that honors the Pod disruption budget and controls the rate of change.
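
If you want to bound how many replicas of a workload can be unavailable at once while proxies are evicted, you can define a Pod disruption budget for that workload. The following is a minimal sketch assuming a hypothetical workload labeled app: my-app; adjust the selector and availability target for your own workloads:

kubectl apply -f - << EOF
apiVersion: policy/v1  # Use policy/v1beta1 on clusters older than Kubernetes 1.21.
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb           # Hypothetical name, for illustration only.
  namespace: NAMESPACE
spec:
  minAvailable: 1            # Keep at least one replica running while proxies are evicted.
  selector:
    matchLabels:
      app: my-app            # Hypothetical label; match your workload's Pod labels.
EOF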

This Preview release of the managed data plane doesn't manage the following:

  • Uninjected Pods
  • Pods that are manually injected by using istioctl kube-inject
  • Jobs
  • StatefulSets
  • DaemonSets

If you don't want to use the managed data plane, or if you don't want to enable it for all namespaces, trigger a restart of the proxies yourself to pick up the latest proxy image, as shown in the example that follows. The control plane continues to work with the existing proxies.
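
For example, to pick up the latest proxy image in a namespace whose proxies you manage yourself, restart the Deployments in that namespace:

kubectl rollout restart deployment -n NAMESPACE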

The managed data plane is available on both the Rapid and Regular release channels.

To enable the Google-managed data plane:

  1. Enable data plane management:

    kubectl annotate --overwrite namespace NAMESPACE \
    mesh.cloud.google.com/proxy='{"managed":"true"}'
    

    Alternatively, you can enable the Google-managed data plane for a specific Pod by annotating it with the same annotation; see the example after these steps. When you annotate a specific Pod, that Pod uses the Google-managed sidecar proxy and the rest of the workloads use unmanaged sidecar proxies.

  2. Repeat the previous step for each namespace where you want a managed data plane.

  3. Enable Anthos Service Mesh in the fleet:

    gcloud alpha container hub mesh enable --project=PROJECT_ID
    

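As a sketch of the per-Pod alternative mentioned in step 1, you can set the same annotation on an individual Pod. Note that an annotation set directly on a Pod is lost when the Pod is recreated unless you also set it in the workload's Pod template; POD_NAME is a placeholder:

kubectl annotate --overwrite pod POD_NAME -n NAMESPACE \
    mesh.cloud.google.com/proxy='{"managed":"true"}'
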
It could take up to ten minutes for the data plane controller to be ready to manage the proxies in the cluster. Run the following command to check the status:

if kubectl get dataplanecontrols -o custom-columns=REV:.spec.revision,STATUS:.status.state | grep rapid | grep -v none > /dev/null; then echo "Managed Data Plane is ready."; else echo "Managed Data Plane is NOT ready."; fi

When the data plane controller is ready, the command will output: Managed Data Plane is ready.

If the data plane controller doesn't become ready within ten minutes, see Managed data plane status for troubleshooting tips.

If you want to disable the Google-managed data plane and revert to managing the sidecar proxies yourself, change the annotation:

kubectl annotate --overwrite namespace NAMESPACE \
  mesh.cloud.google.com/proxy='{"managed":"false"}'

Install Istio Gateways (optional)

Anthos Service Mesh gives you the option to deploy and manage gateways as part of your service mesh. A gateway describes a load balancer operating at the edge of the mesh that receives incoming or outgoing HTTP/TCP connections. Gateways are Envoy proxies that give you fine-grained control over traffic entering and leaving the mesh.

As a best practice we recommend that you create a separate namespace for the gateways. Don't deploy the gateways to the istio-system namespace.

You can install one or more Istio Gateways in your cluster to handle typical ingress traffic by using the following steps:

  1. Choose one of these two options to set up the namespace where you will deploy the gateway in later steps.

    • Enable the namespace for injection:
      kubectl label namespace GATEWAY_NAMESPACE istio-injection- istio.io/rev=asm-managed-rapid --overwrite
      

    OR

    • Enable injection only for the gateway Pod, not for all Pods in the namespace. This command removes any injection labels from the namespace; you enable injection on the Pod itself in the next step:
      kubectl label namespace GATEWAY_NAMESPACE istio-injection- istio.io/rev-
      
  2. Create a Deployment and Service for the gateway, by using the following minimal example:

    kubectl apply -f - << EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: istio-ingressgateway
      namespace: GATEWAY_NAMESPACE
    spec:
      type: LoadBalancer
      selector:
        istio: ingressgateway
      ports:
      - port: 80
        name: http
      - port: 443
        name: https
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: istio-ingressgateway
      namespace: GATEWAY_NAMESPACE
    spec:
      selector:
        matchLabels:
          istio: ingressgateway
      template:
        metadata:
          annotations:
            # This is required to tell Anthos Service Mesh to inject the gateway with the
            # required configuration.
            inject.istio.io/templates: gateway
          labels:
            istio: ingressgateway
            istio.io/rev: asm-managed-rapid # This is required only if the namespace is not labeled.
        spec:
          containers:
          - name: istio-proxy
            image: auto # The image will automatically update each time the pod starts.
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: istio-ingressgateway-sds
      namespace: GATEWAY_NAMESPACE
    rules:
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["get", "watch", "list"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: istio-ingressgateway-sds
      namespace: GATEWAY_NAMESPACE
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: istio-ingressgateway-sds
    subjects:
    - kind: ServiceAccount
      name: default
    EOF
    
  3. After you create the deployment, verify that the new services are working correctly:

    kubectl get pod,service -n GATEWAY_NAMESPACE
    

    Verify that the output is similar to the following:

    NAME                                      READY   STATUS    RESTARTS   AGE
    pod/istio-ingressgateway-856b7c77-bdb77   1/1     Running   0          3s
    
    NAME                           TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
    service/istio-ingressgateway   LoadBalancer   10.24.5.129    34.82.157.6      80:31904/TCP   3s

You can choose to let the managed data plane controller manage the proxies for your gateways just as it does for your services. If you deployed the managed data plane, and you want to have your gateway proxies managed, label and annotate the gateway namespace as described in Apply the Google-managed data plane.
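
For example, to let the managed data plane manage the gateway proxies, apply the same annotation that you used for workload namespaces to the gateway namespace:

kubectl annotate --overwrite namespace GATEWAY_NAMESPACE \
    mesh.cloud.google.com/proxy='{"managed":"true"}'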

If you choose to manage the gateways yourself, you need to restart the Pods in the GATEWAY_NAMESPACE when a new version of Anthos Service Mesh is released, so that they pick up the new control plane configuration. Before restarting the Pods, you should confirm that the new control plane has rolled out to your cluster by checking the version of the control plane using the Metrics Explorer custom query provided in Verify control plane metrics.
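
For example, after you confirm that the new control plane version has rolled out, restart the gateway Deployments so that their proxies pick it up:

kubectl rollout restart deployment -n GATEWAY_NAMESPACE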

Configure endpoint discovery (only for multi-cluster installations)

Anthos Service Mesh managed control plane supports a single-project, single-network, multi-primary configuration, with the difference that the control plane is not installed in the cluster.

Before you continue, you should have already run the installation script on each cluster as described in the previous steps. There is no need to indicate that a cluster is a primary cluster; this is the default behavior.

For every cluster, enable endpoint discovery by completing the following steps:

  1. For every cluster, make sure that the kubectl config context points to the current cluster, and set a variable for the context name:

    export CTX=gke_PROJECT_ID_LOCATION_CLUSTER_NAME
    
  2. Enable endpoint discovery by running the following commands once for every other cluster i=1..N in the mesh:

    export CTX_i=gke_PROJECT_ID_LOCATION_i_CLUSTER_NAME_i
    
    ./bin/istioctl x create-remote-secret --context=${CTX_i} --name=CLUSTER_NAME_i | \
      kubectl apply -f - --context=${CTX}
    
  3. Verify the secret has been created by running the command:

    kubectl get secret istio-remote-secret-CLUSTER_NAME_i -n istio-system
    

    Verify the expected output:

    NAME                                   TYPE     DATA   AGE
    istio-remote-secret-CLUSTER_NAME_i   Opaque   1      44s
    
  4. If you are adding the current cluster to an existing multi-cluster mesh, let all of the other clusters discover its endpoints by creating a secret corresponding to the current cluster in all of the other clusters:

    ./bin/istioctl x create-remote-secret --context=${CTX} --name=CLUSTER_NAME | \
      kubectl apply -f - --context=${CTX_i}
    
  5. You can also verify that the secret for the current cluster exists in the other clusters:

    kubectl get secret istio-remote-secret-CLUSTER_NAME -n istio-system --context ${CTX_i}
    

    Verify the expected output:

    NAME                            TYPE     DATA   AGE
    istio-remote-secret-CLUSTER_NAME   Opaque   1      44s
    

For more details and an example with two clusters, see enable endpoint discovery.

Deploy applications

Before you deploy applications, remove any previous istio-injection labels from their namespaces, and set the istio.io/rev:asm-managed-rapid label instead:

kubectl label namespace NAMESPACE istio-injection- istio.io/rev=asm-managed-rapid --overwrite

At this point, you have successfully configured Anthos Service Mesh managed control plane. You are now ready to deploy your applications or you can deploy the Bookinfo sample application.
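
After you deploy a workload into a labeled namespace, you can spot-check that sidecar injection took place. This is a minimal check, assuming your workloads run as ordinary Deployments:

kubectl get pods -n NAMESPACE
# For a single-container application, each injected Pod reports 2/2 in the READY
# column: the application container plus the istio-proxy sidecar.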

If you deploy an application in a multi-cluster setup, replicate the Kubernetes and control plane configuration in all clusters, unless you plan to limit that particular configuration to a subset of clusters. The configuration applied to a particular cluster is the source of truth for that cluster. In addition, if the cluster also runs Anthos Service Mesh 1.7, 1.8, or higher with Mesh CA in other namespaces, verify that the application can communicate with the other applications controlled by the in-cluster control plane.

Verify control plane metrics

You can view the version of the control plane and data plane in Metrics Explorer.

To verify that your configuration works correctly:

  1. In the Google Cloud console, view the control plane metrics:

    Go to Metrics Explorer

  2. Choose your workspace and add a custom query using the following parameters:

    • Resource type: Kubernetes Container
    • Metric: Proxy Clients
    • Filter: container_name="cr-asm-managed-rapid"
    • Group By: revision label and proxy_version label
    • Aggregator: sum
    • Period: 1 minute

    When you run Anthos Service Mesh with both a Google-managed and an in-cluster control plane, you can tell the metrics apart by their container name. For example, managed metrics have container_name="cr-asm-managed", while unmanaged metrics have container_name="discovery". To display metrics from both, remove the Filter on container_name="cr-asm-managed".

  3. Verify the control plane version and proxy version by inspecting the following fields in Metrics Explorer:

    • The revision field indicates the control plane version.
    • The proxy_version field indicates the proxy version.
    • The value field indicates the number of connected proxies.

    For the current mapping of release channels to Anthos Service Mesh versions, see Anthos Service Mesh versions per channel.

Migrate applications to managed Anthos Service Mesh

Managed Anthos Service Mesh supports migration only from Anthos Service Mesh 1.9 or later installations that use Mesh CA.

To migrate to managed Anthos Service Mesh, perform the following steps:

  1. Run the script as indicated in the Apply the Google-managed control plane section.

  2. If you deployed both the Google-managed control plane and data plane:

    1. Enable data plane management:

      kubectl annotate --overwrite namespace NAMESPACE \
      mesh.cloud.google.com/proxy='{"managed":"true"}'
      
    2. Enable Anthos Service Mesh in the fleet:

      gcloud alpha container hub mesh enable --project=PROJECT_ID
      
  3. Replace the current namespace label with the istio.io/rev:asm-managed-rapid label:

    kubectl label namespace NAMESPACE istio-injection- istio.io/rev=asm-managed-rapid \
        --overwrite
    
  4. Perform a rolling upgrade of deployments in the namespace:

    kubectl rollout restart deployment -n NAMESPACE
    
  5. Test your application to verify that the workloads function correctly.

  6. If you have workloads in other namespaces, repeat the previous steps for each namespace.

  7. If you deployed the application in a multi-cluster setup, replicate the Kubernetes and Istio configuration in all clusters, unless you want to limit that configuration to a subset of clusters only. The configuration applied to a particular cluster is the source of truth for that cluster.

  8. Check that the metrics appear as expected by following the steps in Verify control plane metrics.

A cluster can have a mix of revisions, for example, Anthos Service Mesh 1.7 and 1.8 in-cluster control planes alongside the managed control plane. You can mix namespaces that use different Anthos Service Mesh control plane revisions indefinitely.
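
To see which control plane revision each namespace currently uses, you can list the namespaces along with their injection labels:

kubectl get namespaces -L istio.io/rev -L istio-injection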

If you are satisfied that your application works as expected, you can remove the in-cluster istiod after you switch all namespaces to the Google-managed control plane, or keep it as a backup; istiod automatically scales down to use fewer resources. To remove it, skip to Delete old control plane.

If you encounter problems, you can identify and resolve them by using the information in Resolving managed control plane issues, and if necessary, roll back to the previous version.

Delete old control plane

After you install and confirm that all namespaces use the Google-managed control plane, you can delete the old control plane.

kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod -n istio-system --ignore-not-found=true

If you used istioctl kube-inject instead of automatic injection, or if you installed additional gateways, first check the metrics for the old control plane and verify that the number of connected endpoints is zero before you delete it.
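
In addition to the Metrics Explorer query, a hedged command-line check is to list the proxies and the control plane instance that each one is connected to by using istioctl; any workload still attached to the in-cluster istiod appears in the output:

./bin/istioctl proxy-status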

Roll back

Perform the following steps if you need to roll back to the previous control plane version:

  1. Update workloads to be injected with the previous version of the control plane. In the following command, the revision value asm-191-1 is used only as an example. Replace the example value with the revision label of your previous control plane.

    kubectl label namespace NAMESPACE istio-injection- istio.io/rev=asm-191-1 --overwrite
    
  2. Restart the Pods to trigger re-injection so the proxies have the previous version:

    kubectl rollout restart deployment -n NAMESPACE
    

The managed control plane automatically scales to zero and does not use any resources when not in use. The mutating webhooks and provisioning remain and do not affect cluster behavior.

The gateway is now set to the asm-managed revision. To roll it back, roll back the gateway Deployment, which re-deploys the gateway pointing back to your in-cluster control plane:

kubectl -n istio-system rollout undo deploy istio-ingressgateway

Expect this output on success:

deployment.apps/istio-ingressgateway rolled back

Migrate a gateway to Google-managed control plane

  1. Create a Kubernetes Deployment for the new version of the gateway by following the steps in Install Istio Gateways (optional). You must configure the existing Kubernetes gateway Service to select both the old and the new versions by using the selector field in the Service configuration.

  2. Use the kubectl scale command to gradually scale up the new Deployment while you scale down the old Deployment, and check for any signs of service interruption or downtime; see the sketch after this list. If the migration is successful, you should reach zero old instances with no service interruption.
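
    The following sketch assumes hypothetical Deployment names istio-ingressgateway-new and istio-ingressgateway-old; substitute your own names and target replica counts:

    kubectl -n GATEWAY_NAMESPACE scale deployment istio-ingressgateway-new --replicas=3
    kubectl -n GATEWAY_NAMESPACE scale deployment istio-ingressgateway-old --replicas=1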

Uninstall

The Google-managed control plane auto-scales to zero when no namespaces are using it, so no uninstallation is required.

Troubleshooting

To identify and resolve problems when using managed control plane, see Resolving managed control plane issues.

What's next?