Configure managed Anthos Service Mesh

Overview

Managed Anthos Service Mesh is a Google-managed control plane and an optional data plane that you simply configure. Google handles their reliability, upgrades, scaling, and security for you in a backward-compatible manner. This guide explains how to set up or migrate applications to managed Anthos Service Mesh in a single-cluster or multi-cluster configuration with asmcli.

To learn about the supported features and limitations of managed Anthos Service Mesh, see Managed Anthos Service Mesh supported features.

Prerequisites

As a starting point, this guide assumes that you have:

For a faster installation, your clusters should have Workload Identity enabled. If Workload Identity isn't enabled, the installation enables it automatically.

Requirements

  • One or more clusters with a supported version of GKE, in one of the supported regions.

  • Your clusters must be registered to a fleet. You can do this separately before the installation, or as part of the installation by passing the --enable-registration and --fleet_id flags.

  • Your project must have the Service Mesh feature enabled. You can enable it as part of the installation by passing --enable-gcp-components, or by running the following command:

    gcloud beta container hub mesh enable --project=FLEET_PROJECT_ID
    

    where FLEET_PROJECT_ID is the project ID of the fleet host project.

  • Managed Anthos Service Mesh can use multiple GKE clusters in a single-project single-network or a multi-project single-network environment. If you join clusters that are not in the same project, they must be registered to the same fleet host project, and the clusters must be in a shared VPC configuration together on the same network. In addition, we recommend that you have one project to host the shared VPC, and separate service projects for creating clusters. For more information, see Setting up clusters with Shared VPC.

Limitations

We recommend that you review the list of managed Anthos Service Mesh supported features and limitations. In particular, note the following:

  • The IstioOperator API isn't supported because its main purpose is to control in-cluster components.

    You can use the migration tool included with asmcli to automatically convert the remaining IstioOperator optional features so that they are compatible with the Google-managed control plane. For more information, see Enabling optional features on managed Anthos Service Mesh and Migrate from IstioOperator.

  • Managed data plane limitations:

    • This Preview release of the managed data plane is available only for new deployments of the managed control plane. If you previously deployed the managed control plane, and you want to deploy the managed data plane, you must rerun the installation tool as described in Apply the Google-managed control plane.

    • The managed data plane is available on the Regular and Rapid release channels.

Download the installation tool

  1. Download the latest version of the tool that corresponds to your release channel to the current working directory.

    • For the Stable Release Channel:

        curl https://storage.googleapis.com/csm-artifacts/asm/asmcli_1.10 > asmcli
      
    • For the Regular Release Channel:

        curl https://storage.googleapis.com/csm-artifacts/asm/asmcli_1.11 > asmcli
      
    • For the Rapid Release Channel:

        curl https://storage.googleapis.com/csm-artifacts/asm/asmcli_1.12 > asmcli
      
  2. Make the tool executable:

    chmod +x asmcli
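If you script the download, a small helper can pick the URL from the channel name. The following bash sketch copies the channel-to-version mapping from the list above; the function name asmcli_url is hypothetical, and the version numbers go out of date as new releases ship, so check them against the current list before relying on it.

```shell
# Hypothetical helper: print the asmcli download URL for a release channel.
# The mapping (stable -> 1.10, regular -> 1.11, rapid -> 1.12) is copied from
# the download list in this guide and may be out of date.
asmcli_url() {
  local version
  case "$1" in
    stable)  version="1.10" ;;
    regular) version="1.11" ;;
    rapid)   version="1.12" ;;
    *)
      echo "unknown release channel: $1 (expected stable, regular, or rapid)" >&2
      return 1
      ;;
  esac
  echo "https://storage.googleapis.com/csm-artifacts/asm/asmcli_${version}"
}

# Example usage (CHANNEL is a placeholder variable):
#   CHANNEL=regular
#   curl -fsSL "$(asmcli_url "$CHANNEL")" -o asmcli && chmod +x asmcli
```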
    

Configure each cluster

Use the following steps to configure managed Anthos Service Mesh for each cluster in your mesh.

Get cluster credentials

Retrieve the credentials for the cluster. The following command also points the kubectl context to the target cluster.

gcloud container clusters get-credentials CLUSTER_NAME \
    --zone LOCATION \
    --project PROJECT_ID

Apply the Google-managed control plane

Run the installation tool for each cluster that will use managed Anthos Service Mesh. We recommend that you include the following options:

  • --enable-registration --fleet_id FLEET_PROJECT_ID. These two flags register the cluster to a fleet, where FLEET_PROJECT_ID is the project ID of the fleet host project. In a single-project setup, FLEET_PROJECT_ID is the same as PROJECT_ID, because the fleet host project and the cluster project are the same. In more complex configurations, such as multi-project, we recommend using a separate fleet host project.

  • --enable-all. This flag enables both required components and registration.

These options are required if you also want to deploy the Google-managed data plane. For a full list of options, see the asmcli reference page.

Before you apply the Google-managed control plane, you must select a release channel.

The asmcli tool configures the managed control plane directly, using tools and logic inside the CLI tool. Use the following instructions for your preferred certificate authority (CA).

If your organization enforces VPC Service Controls for your project, you must also pass the --use-vpcsc flag; otherwise, the installation fails the security controls. Support for VPC Service Controls is available in the Regular and Rapid channels. If you are using 1.11, you must use the experimental command.

If your cluster is a GKE Autopilot cluster, see the end of this section for additional requirements and flags to use with the asmcli command.

Certificate Authorities

Select a Certificate Authority to use for your mesh.

Mesh CA

Run the following command to install the control plane with default features and Mesh CA. Enter your values in the provided placeholders.

  ./asmcli install \
      -p PROJECT_ID \
      -l LOCATION \
      -n CLUSTER_NAME \
      --fleet_id FLEET_PROJECT_ID \
      --managed \
      --verbose \
      --output_dir CLUSTER_NAME \
      --enable-all

CA Service

  1. Follow the steps in Configure Certificate Authority Service.
  2. Run the following command to install the control plane with default features and Certificate Authority Service. Enter your values in the provided placeholders.

    ./asmcli install \
        -p PROJECT_ID \
        -l LOCATION \
        -n CLUSTER_NAME \
        --fleet_id FLEET_PROJECT_ID \
        --managed \
        --verbose \
        --output_dir CLUSTER_NAME \
        --enable-all \
        --ca gcp_cas \
        --ca_pool pool_name

The tool downloads all the files for configuring the managed control plane to the specified --output_dir, including the istioctl tool and sample applications. The steps in this guide assume that you run istioctl from the root of the installation directory, with istioctl present in its bin/ subdirectory.

If you rerun asmcli on the same cluster, it overwrites the existing control plane configuration. Be sure to specify the same options and flags if you want the same configuration.
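Because a rerun overwrites the existing configuration, one way to keep the flags identical across clusters and across reruns is to keep the command in a single function. The following bash sketch wraps the get-credentials command and the Mesh CA install command shown earlier; install_managed_asm and the ASMCLI variable are hypothetical names, and any extra arguments (for example, --ca gcp_cas --ca_pool pool_name for CA Service) are passed through to asmcli unchanged.

```shell
# Hypothetical helper: install the managed control plane on one cluster with a
# fixed flag set, so every cluster and every rerun gets the same options.
# ASMCLI defaults to ./asmcli in the current directory.
ASMCLI="${ASMCLI:-./asmcli}"

install_managed_asm() {
  if [ "$#" -lt 4 ]; then
    echo "usage: install_managed_asm PROJECT_ID LOCATION CLUSTER_NAME FLEET_PROJECT_ID [extra asmcli flags...]" >&2
    return 1
  fi
  local project_id="$1" location="$2" cluster_name="$3" fleet_project_id="$4"
  shift 4

  # Point kubectl at the target cluster (same command as in this guide).
  gcloud container clusters get-credentials "$cluster_name" \
    --zone "$location" \
    --project "$project_id" || return 1

  # Same flags as the Mesh CA example; extra flags are appended as given.
  "$ASMCLI" install \
    -p "$project_id" \
    -l "$location" \
    -n "$cluster_name" \
    --fleet_id "$fleet_project_id" \
    --managed \
    --verbose \
    --output_dir "$cluster_name" \
    --enable-all \
    "$@"
}

# Example: two clusters in one project (all names are placeholders).
#   for cluster in cluster-1 cluster-2; do
#     install_managed_asm my-project us-central1-a "$cluster" my-project || break
#   done
```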

Note that an ingress gateway isn't automatically deployed with the control plane. Decoupling the deployment of the ingress gateway and control plane allows you to more easily manage your gateways in a production environment. If the cluster needs an ingress gateway or an egress gateway, see Deploy gateways. To enable other optional features, see Enabling optional features on managed Anthos Service Mesh.

GKE Autopilot

GKE Autopilot is supported with Anthos Service Mesh only in the Regular and Rapid channels. The cluster must be in the GKE Rapid channel with version 1.21.3 or later. To fit within GKE Autopilot resource limits, the default proxy resource requests and limits are set to 500m CPU and 512 MB of memory. Autopilot clusters require the managed CNI, so you must include the --use_managed_cni flag.

./asmcli install \
    -p PROJECT_ID \
    -l LOCATION \
    -n CLUSTER_NAME \
    --managed \
    --verbose \
    --output_dir CLUSTER_NAME \
    --use_managed_cni \
    --channel rapid \
    --enable-all

Verify the control plane has been provisioned

The asmcli tool creates a ControlPlaneRevision custom resource in the cluster. This resource's status is updated when the managed control plane is provisioned or fails provisioning.

  1. Identify what release channel a namespace is using:

    kubectl get namespace NAMESPACE -o jsonpath='{.metadata.labels.istio\.io/rev}{"\n"}'
    
  2. Inspect the status of the resource:

    kubectl describe controlplanerevision RELEASE_CHANNEL -n istio-system
    

    Replace RELEASE_CHANNEL with the revision name for your channel: asm-managed (Regular), asm-managed-stable (Stable), or asm-managed-rapid (Rapid).

    The output is similar to:

    Name:         asm-managed
    
    …
    
    Status:
      Conditions:
        Last Transition Time:  2021-08-05T18:56:32Z
        Message:               The provisioning process has completed successfully
        Reason:                Provisioned
        Status:                True
        Type:                  Reconciled
        Last Transition Time:  2021-08-05T18:56:32Z
        Message:               Provisioning has finished
        Reason:                ProvisioningFinished
        Status:                True
        Type:                  ProvisioningFinished
        Last Transition Time:  2021-08-05T18:56:32Z
        Message:               Provisioning has not stalled
        Reason:                NotStalled
        Status:                False
        Type:                  Stalled
    

The Reconciled condition determines whether the managed control plane is running correctly. If it is True, the control plane is running successfully. The Stalled condition determines whether the managed control plane provisioning process has encountered an error. If Stalled is True, the Message field contains more information about the specific error. See ControlPlaneRevision Stalled Codes for more information about possible errors.
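To check the same conditions from a script rather than by reading the describe output, you can query them with kubectl JSONPath filters. This bash sketch assumes the condition types shown in the sample output (Reconciled and Stalled); the function name check_controlplane and its return codes (0 provisioned, 2 stalled, 1 otherwise) are hypothetical choices.

```shell
# Hypothetical helper: report the provisioning state of a ControlPlaneRevision.
# REVISION is asm-managed, asm-managed-stable, or asm-managed-rapid.
# Returns 0 if Reconciled is True, 2 if Stalled is True, 1 otherwise.
check_controlplane() {
  local revision="$1" reconciled stalled message
  reconciled=$(kubectl get controlplanerevision "$revision" -n istio-system \
    -o jsonpath='{.status.conditions[?(@.type=="Reconciled")].status}') || return 1
  stalled=$(kubectl get controlplanerevision "$revision" -n istio-system \
    -o jsonpath='{.status.conditions[?(@.type=="Stalled")].status}') || return 1

  if [ "$reconciled" = "True" ]; then
    echo "$revision: control plane provisioned"
    return 0
  fi
  if [ "$stalled" = "True" ]; then
    # The Stalled condition's message explains the error (see Stalled codes).
    message=$(kubectl get controlplanerevision "$revision" -n istio-system \
      -o jsonpath='{.status.conditions[?(@.type=="Stalled")].message}')
    echo "$revision: provisioning stalled: $message" >&2
    return 2
  fi
  echo "$revision: provisioning still in progress" >&2
  return 1
}
```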

Zero-touch upgrades

Once the Google-managed control plane is installed, Google will automatically upgrade it when new releases or patches become available.

You don't have to upgrade the data plane every time the control plane is upgraded. The control plane continues to work with all proxies within the support window, but upgrading the data plane is recommended to get the latest data plane features, fixes, and performance improvements. To upgrade to the latest published proxy image in your channel, either perform a rolling restart when convenient, or apply the Google-managed data plane, which does this automatically for you.

Apply the Google-managed data plane (optional)

If you want Google to manage upgrades of the proxies, enable the Google-managed data plane. If enabled, the sidecar proxies and injected gateways are automatically upgraded in conjunction with the managed control plane.

Note that the Google-managed data plane requires the Istio Container Network Interface (CNI) plugin, which is now enabled by default when you deploy the Google-managed control plane.

In the feature preview, managed data plane upgrades proxies by evicting Pods that are running older versions of the proxy. The evictions are done in an orderly manner honoring the Pod disruption budget and controlling the rate of change.

This Preview release of the managed data plane doesn't manage the following:

  • Uninjected Pods
  • Pods manually injected using istioctl kube-inject
  • Jobs
  • StatefulSets
  • DaemonSets

The managed data plane is available on both the Rapid and Regular release channels.

To enable the Google-managed data plane:

  1. Enable data plane management:

    kubectl annotate --overwrite namespace NAMESPACE \
        mesh.cloud.google.com/proxy='{"managed":"true"}'
    

    Alternatively, you can enable the Google-managed data plane for a specific Pod by annotating it with the same annotation. When you annotate a specific Pod, that Pod uses the Google-managed sidecar proxy and the rest of the workloads use the unmanaged sidecar proxies.

  2. Repeat the previous step for each namespace that you want to use the managed data plane.

It can take up to ten minutes for the data plane controller to be ready to manage the proxies in the cluster. Run the following command to check the status. The command filters on the Rapid channel revision (rapid); adjust the filter if you use a different channel.

if kubectl get dataplanecontrols -o custom-columns=REV:.spec.revision,STATUS:.status.state |
    grep rapid | grep -v none > /dev/null; then
  echo "Managed Data Plane is ready."
else
  echo "Managed Data Plane is NOT ready."
fi

When the data plane controller is ready, the command will output: Managed Data Plane is ready.

If the status for the data plane controller doesn't become ready after waiting over ten minutes, see Managed data plane status for troubleshooting tips.
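If you want to wait for the controller in a script instead of rerunning the one-line check by hand, you can wrap the same kubectl and grep logic in a polling loop. This bash sketch is a hypothetical helper: the name wait_for_managed_dataplane, the defaults of polling every 30 seconds for up to 600 seconds (the ten minutes mentioned above), and the use of the first argument as a grep pattern (rapid by default, as in the original check) are all assumptions.

```shell
# Hypothetical helper: poll dataplanecontrols until a revision matching the
# given grep pattern reports a state other than "none", or until the timeout.
# Usage: wait_for_managed_dataplane [PATTERN] [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_managed_dataplane() {
  local revision="${1:-rapid}" timeout="${2:-600}" interval="${3:-30}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Same filter as the one-line check: matching revision, state not "none".
    if kubectl get dataplanecontrols \
         -o custom-columns=REV:.spec.revision,STATUS:.status.state |
         grep -- "$revision" | grep -qv none; then
      echo "Managed Data Plane is ready."
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "Managed Data Plane is NOT ready after ${timeout}s." >&2
  return 1
}
```

A nonzero exit status after the timeout is the point at which the troubleshooting tips above apply.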

If you want to disable the Google-managed data plane and go back to managing the sidecar proxies yourself, change the annotation:

kubectl annotate --overwrite namespace NAMESPACE \
  mesh.cloud.google.com/proxy='{"managed":"false"}'
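Enabling and disabling differ only in the value of the managed field, so applying either one to several namespaces can be sketched as a single loop. The bash function set_dataplane_management is hypothetical; it runs the same kubectl annotate command shown in this section once per namespace and stops at the first failure.

```shell
# Hypothetical helper: set mesh.cloud.google.com/proxy to {"managed":"true"}
# or {"managed":"false"} on each namespace passed after the first argument.
# Usage: set_dataplane_management true|false NAMESPACE...
set_dataplane_management() {
  local managed="$1" ns
  shift
  if [ "$managed" != "true" ] && [ "$managed" != "false" ]; then
    echo "first argument must be true or false" >&2
    return 1
  fi
  for ns in "$@"; do
    kubectl annotate --overwrite namespace "$ns" \
      "mesh.cloud.google.com/proxy={\"managed\":\"${managed}\"}" || return 1
  done
}

# Example (namespace names are placeholders):
#   set_dataplane_management true frontend backend
```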

Configure endpoint discovery (only for multi-cluster installations)

Configure endpoint discovery between public clusters

If you are operating on public clusters (non-private clusters), refer instead to Configure endpoint discovery between public clusters.

Configure endpoint discovery between private clusters

When using GKE private clusters, you must configure the cluster control plane endpoint to be the public endpoint instead of the private endpoint. For details, see Configure endpoint discovery between private clusters.

For an example application with two clusters, see HelloWorld service example.

Deploy applications

Before you deploy applications, remove any previous istio-injection labels from their namespaces, and set the istio.io/rev=asm-managed-rapid label instead. If you use a different release channel, replace asm-managed-rapid with the applicable label: asm-managed for Regular or asm-managed-stable for Stable.

The revision label corresponds to a release channel:

Revision label                     Channel
istio.io/rev=asm-managed           Regular
istio.io/rev=asm-managed-rapid     Rapid
istio.io/rev=asm-managed-stable    Stable

kubectl label namespace NAMESPACE istio-injection- istio.io/rev=asm-managed-rapid --overwrite
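If you switch channels in scripts, the mapping in the table can be captured in a small lookup function. This bash sketch is hypothetical (the name revision_label is not part of any tool) and encodes only the three rows of the table.

```shell
# Hypothetical helper: map a release channel to its istio.io/rev value,
# following the revision label table in this guide.
revision_label() {
  case "$1" in
    regular) echo "asm-managed" ;;
    rapid)   echo "asm-managed-rapid" ;;
    stable)  echo "asm-managed-stable" ;;
    *)
      echo "unknown release channel: $1" >&2
      return 1
      ;;
  esac
}

# Example: label a namespace for the Regular channel (NAMESPACE is a placeholder).
#   kubectl label namespace NAMESPACE istio-injection- \
#     "istio.io/rev=$(revision_label regular)" --overwrite
```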

At this point, you have successfully configured the Anthos Service Mesh managed control plane. If you also applied the managed data plane, restart your workloads. If not, perform a rolling update. You are now ready to deploy your applications, or you can deploy the Bookinfo sample application.

If you deploy an application in a multi-cluster setup, replicate the Kubernetes and control plane configuration in all clusters, unless you plan to limit that particular config to a subset of clusters. The configuration applied to a particular cluster is the source of truth for that cluster. In addition, if the cluster also runs Anthos Service Mesh or Certificate Authority Service with Mesh CA in other namespaces, verify the application can communicate with the other applications controlled by the in-cluster control plane.

Verify control plane metrics

You can view the version of the control plane and data plane in Metrics Explorer.

To verify that your configuration works correctly:

  1. In the Cloud Console, view the control plane metrics:

    Go to Metrics Explorer

  2. Choose your workspace and add a custom query using the following parameters:

    • Resource type: Kubernetes Container
    • Metric: Proxy Clients
    • Filter: container_name="cr-asm-managed-rapid"
    • Group By: revision label and proxy_version label
    • Aggregator: sum
    • Period: 1 minute

    When you run Anthos Service Mesh with both a Google-managed and an in-cluster control plane, you can tell the metrics apart by their container name. For example, managed metrics have container_name="cr-asm-managed", while unmanaged metrics have container_name="discovery". To display metrics from both, remove the filter on container_name.

  3. Verify the control plane version and proxy version by inspecting the following fields in Metrics Explorer:

    • The revision field indicates the control plane version.
    • The proxy_version field indicates the proxy version.
    • The value field indicates the number of connected proxies.

    For the current channel to Anthos Service Mesh version mapping, see Anthos Service Mesh versions per channel.

Migrate applications to managed Anthos Service Mesh

To migrate to managed Anthos Service Mesh, perform the following steps:

  1. Run the tool as indicated in the Apply the Google-managed control plane section.

  2. (Optional) If you want to use the Google-managed data plane, enable data plane management:

    kubectl annotate --overwrite namespace NAMESPACE \
        mesh.cloud.google.com/proxy='{"managed":"true"}'
    
  3. (Optional) If you want to use the Google-managed data plane, enable Anthos Service Mesh in the fleet:

    gcloud beta container hub mesh enable --project=FLEET_PROJECT_ID
    
  4. Replace the current namespace label with the istio.io/rev=asm-managed-rapid label:

    kubectl label namespace NAMESPACE istio-injection- istio.io/rev=asm-managed-rapid \
        --overwrite
    
  5. Perform a rolling restart of the deployments in the namespace:

    kubectl rollout restart deployment -n NAMESPACE
    
  6. Test your application to verify that the workloads function correctly.

  7. If you have workloads in other namespaces, repeat the previous steps for each namespace.

  8. If you deployed the application in a multi-cluster setup, replicate the Kubernetes and Istio configuration in all clusters, unless you want to limit that configuration to a subset of clusters. The configuration applied to a particular cluster is the source of truth for that cluster.

  9. Check that the metrics appear as expected by following the steps in Verify control plane metrics.
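Steps 4 and 5 can be combined per namespace in a script. The following bash sketch relabels a namespace, restarts its deployments, and then waits for each restarted deployment with kubectl rollout status. The function name switch_namespace_revision and the 300-second timeout are assumptions. Because it takes the revision as an argument, the same sketch also fits the Roll back steps in this guide, for example with the example revision asm-191-1.

```shell
# Hypothetical helper: switch one namespace to a revision (steps 4 and 5) and
# wait for every deployment in it to finish rolling out.
# Usage: switch_namespace_revision NAMESPACE [REVISION]
# REVISION defaults to asm-managed-rapid.
switch_namespace_revision() {
  local ns="$1" revision="${2:-asm-managed-rapid}" deploy

  # Remove istio-injection and set the revision label (same as step 4).
  kubectl label namespace "$ns" istio-injection- \
    "istio.io/rev=${revision}" --overwrite || return 1

  # Restart all deployments so the sidecars are re-injected (same as step 5).
  kubectl rollout restart deployment -n "$ns" || return 1

  # Block until each deployment has rolled out, or fail after the timeout.
  for deploy in $(kubectl get deployments -n "$ns" -o name); do
    kubectl rollout status "$deploy" -n "$ns" --timeout=300s || return 1
  done
}
```

Like step 5, this restarts only Deployments; other workload types in the namespace need their own restart.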

If you are satisfied that your application works as expected, you can remove the in-cluster istiod after you switch all namespaces to the managed control plane, or keep it as a backup; istiod automatically scales down to use fewer resources. To remove it, skip to Delete old control plane.

If you encounter problems, you can identify and resolve them by using the information in Resolving managed control plane issues and if necessary, roll back to the previous version.

Delete old control plane

After you install and confirm that all namespaces use the Google-managed control plane, you can delete the old control plane.
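Before you run the delete command that follows, you can list any namespaces that still rely on automatic injection from an in-cluster control plane. This bash sketch is hypothetical: the function name list_unmigrated_namespaces is illustrative, it treats a namespace as not migrated if it has istio-injection=enabled or an istio.io/rev label other than the three managed revisions, and it exits with status 1 if it prints any namespace. It does not detect workloads injected with istioctl kube-inject, so you still need the metrics check described in this section.

```shell
# Hypothetical pre-delete check: print namespaces that still use the
# in-cluster control plane for automatic injection.
# Exit status is 1 if any such namespace exists, 0 otherwise.
list_unmigrated_namespaces() {
  # One line per namespace: name<TAB>istio.io/rev<TAB>istio-injection
  kubectl get namespaces -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.istio\.io/rev}{"\t"}{.metadata.labels.istio-injection}{"\n"}{end}' |
    awk -F'\t' '
      $3 == "enabled" { print $1; found = 1; next }
      $2 != "" && $2 !~ /^asm-managed(-rapid|-stable)?$/ { print $1; found = 1 }
      END { exit found ? 1 : 0 }'
}
```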

kubectl delete Service,Deployment,HorizontalPodAutoscaler,PodDisruptionBudget istiod -n istio-system --ignore-not-found=true

If you used istioctl kube-inject instead of automatic injection, or if you installed additional gateways, check the metrics for the control plane, and verify that the number of connected endpoints is zero.

Roll back

Perform the following steps if you need to roll back to the previous control plane version:

  1. Update workloads to be injected with the previous version of the control plane. In the following command, the revision value asm-191-1 is used only as an example. Replace the example value with the revision label of your previous control plane.

    kubectl label namespace NAMESPACE istio-injection- istio.io/rev=asm-191-1 --overwrite
    
  2. Restart the Pods to trigger re-injection so the proxies have the previous version:

    kubectl rollout restart deployment -n NAMESPACE
    

When not in use, the managed control plane automatically scales to zero and uses no resources. The mutating webhooks and provisioning remain in place but do not affect cluster behavior.

The gateway is now set to the asm-managed revision. To point the gateway back to your in-cluster control plane, either re-run the Anthos Service Mesh install command, which redeploys the gateway, or roll back the gateway deployment:

kubectl -n istio-system rollout undo deploy istio-ingressgateway

Expect this output on success:

deployment.apps/istio-ingressgateway rolled back

Uninstall

The Google-managed control plane automatically scales to zero when no namespaces are using it. For detailed steps, see Uninstall Anthos Service Mesh.

Troubleshooting

To identify and resolve problems when using the managed control plane, see Resolving managed control plane issues.

ControlPlaneRevision Stalled Codes

The Stalled condition in the ControlPlaneRevision status can become True for the following reasons.

Reason: PreconditionFailed

  • Message: Only GKE memberships are supported but ${CLUSTER_NAME} is not a GKE cluster.
    Description: The current cluster does not appear to be a GKE cluster. The managed control plane works only on GKE clusters.

  • Message: Unsupported ControlPlaneRevision name: ${NAME}
    Description: The name of the ControlPlaneRevision must be one of the following:
      • asm-managed
      • asm-managed-rapid
      • asm-managed-stable

  • Message: Unsupported ControlPlaneRevision namespace: ${NAMESPACE}
    Description: The namespace of the ControlPlaneRevision must be istio-system.

  • Message: Unsupported channel ${CHANNEL} for ControlPlaneRevision with name${NAME}. Expected ${OTHER_CHANNEL}
    Description: The name of the ControlPlaneRevision must match its channel as follows:
      • asm-managed -> regular
      • asm-managed-rapid -> rapid
      • asm-managed-stable -> stable

  • Message: Channel must not be omitted or blank
    Description: Channel is a required field on the ControlPlaneRevision. It is missing or blank on the custom resource.

  • Message: Unsupported control plane revision type: ${TYPE}
    Description: managed_service is the only allowed value for the ControlPlaneRevisionType field.

  • Message: Unsupported Kubernetes version: ${VERSION}
    Description: Kubernetes versions 1.15 and later are supported.

  • Message: Workload identity is not enabled
    Description: Enable Workload Identity on your cluster.

  • Message: Unsupported workload pool: ${POOL}
    Description: The workload pool must be of the form ${PROJECT_ID}.svc.id.goog.

  • Message: Cluster project and environ project do not match
    Description: Clusters must be part of the same project in which they are registered to the fleet.

Reason: ProvisioningFailed

  • Message: An error occurred updating cluster resources
    Description: Google was unable to update your in-cluster resources, such as CRDs and webhooks.

  • Message: MutatingWebhookConfiguration "istiod-asm-managed" contains a webhook with URL of ${EXISTING_URL} but expected ${EXPECTED_URL}
    Description: Google does not overwrite existing webhooks, to avoid breaking your installation. Update the webhook manually if this is the behavior you want.

  • Message: ValidatingWebhookConfiguration ${NAME} contains a webhook with URL of ${EXISTING_URL} but expected ${EXPECTED_URL}
    Description: Google does not overwrite existing webhooks, to avoid breaking your installation. Update the webhook manually if this is the behavior you want.

What's next?