Troubleshooting Anthos Service Mesh step-by-step

This section explains how to troubleshoot and resolve problems when using Anthos Service Mesh. If you need additional assistance, see Getting support.

Troubleshooting steps

Follow these general steps to troubleshoot Anthos Service Mesh most efficiently:

  1. Use the automated configuration validation tools.
  2. Check if you have a common problem with a known solution.
  3. Narrow the scope of the problem.
  4. Review relevant logs and information.
  5. Gather diagnostic logs and seek help.

kpt errors during installation

When you install Anthos Service Mesh using install_asm with an unsupported version of kpt, install_asm outputs the following error messages:

    2021-07-14T15:54:58.380312 install_asm_1_9_3: Downloading ASM..
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 41.7M  100 41.7M    0     0  31.1M      0  0:00:01  0:00:01 --:--:-- 31.1M
    2021-07-14T15:54:59.777425 install_asm_1_9_3: Downloading ASM kpt package...
    2021-07-14T15:54:59.805267 install_asm_1_9_3: Running: '/usr/bin/kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@release-1.9-asm asm'
    2021-07-14T15:54:59.832100 install_asm_1_9_3: -------------
    error: unknown flag: --auto-set
    2021-07-14T15:54:59.907493 install_asm_1_9_3: [WARNING]: Failed, retrying...(1 of 3)
    2021-07-14T15:55:01.936275 install_asm_1_9_3: Running: '/usr/bin/kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@release-1.9-asm asm'
    2021-07-14T15:55:01.963543 install_asm_1_9_3: -------------
    error: unknown flag: --auto-set
    2021-07-14T15:55:02.043638 install_asm_1_9_3: [WARNING]: Failed, retrying...(2 of 3)
    2021-07-14T15:55:04.074541 install_asm_1_9_3: Running: '/usr/bin/kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@release-1.9-asm asm'
    2021-07-14T15:55:04.101990 install_asm_1_9_3: -------------
    error: unknown flag: --auto-set
    2021-07-14T15:55:04.176750 install_asm_1_9_3: [WARNING]: Failed, retrying...(3 of 3)
    

If you see these errors, download the latest version of install_asm. The install_asm script must be one of the following versions or higher:

  • For Anthos Service Mesh 1.8: 1.8.6-asm.5+config1
  • For Anthos Service Mesh 1.9: 1.9.6-asm.2+config1
  • For Anthos Service Mesh 1.10: 1.10.2-asm.3+config1
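The following sketch, offered only as an illustration, checks an install_asm version string against the minimums in the list above. The function names are hypothetical and are not part of install_asm. The sketch assumes bash, a sort command that supports -V (version sort), as GNU coreutils does, and that you already know the version string of the script you downloaded.

```shell
# Hypothetical helper (not part of install_asm): print the minimum
# install_asm version for an Anthos Service Mesh minor release,
# using the values from the list above.
asm_minimum_for() {
  case "$1" in
    1.8)  echo "1.8.6-asm.5+config1" ;;
    1.9)  echo "1.9.6-asm.2+config1" ;;
    1.10) echo "1.10.2-asm.3+config1" ;;
    *)    return 1 ;;  # release not covered by the list
  esac
}

# Hypothetical helper: succeed if version $1 is at least the minimum
# for its minor release. Assumes `sort -V` is available.
asm_script_meets_minimum() {
  local version="$1" minor minimum lowest
  # Extract "MAJOR.MINOR", for example 1.9 from 1.9.6-asm.2+config1.
  minor="$(printf '%s\n' "$version" | cut -d. -f1,2)"
  minimum="$(asm_minimum_for "$minor")" || return 1
  # Version-sort both strings; the check passes when the lower of the
  # two is the minimum itself (equal versions also pass).
  lowest="$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n1)"
  [ "$lowest" = "$minimum" ]
}
```

For example, asm_script_meets_minimum 1.9.5-asm.2+config1 returns a non-zero status because 1.9.5 is older than the 1.9 minimum.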

If you use an unsupported version of kpt to download the anthos-service-mesh-packages package and then install Anthos Service Mesh with istioctl install, you see the following error messages:

    Package "asm":
    Fetching https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages@release-1.10-asm
    From https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages
     * branch            release-1.10-asm -> FETCH_HEAD
    Error: Kptfile at "https:/github.com/GoogleCloudPlatform/anthos-service-mesh-packages/asm@release-1.10-asm" has an old version ("v1alpha1") of the Kptfile schema.
    Please update the package to the latest format by following https://kpt.dev/installation/migration.
    

Anthos Service Mesh installation requires a pre-1.0 version of kpt: your shell session must use kpt version 0.39.2. To check your kpt version, run the following command:

    kpt version

The output is similar to the following:

    0.39.2

If your kpt version is 1.0 or later, see Setting up your environment to download the required version for your operating system.
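If you want a script to stop early when the wrong kpt is on your PATH, a minimal sketch such as the following can test the string that kpt version prints. The function names are hypothetical, and the sketch assumes kpt version prints a bare version such as 0.39.2, possibly with a leading v.

```shell
# Hypothetical helper: succeed if the kpt version string is a pre-1.0
# release, which the Anthos Service Mesh installation requires.
kpt_version_is_pre_1() {
  local version="${1#v}"   # tolerate an optional leading "v"
  case "$version" in
    0.*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Hypothetical helper: succeed only for kpt 0.39.2, the version this
# guide asks your shell session to use.
kpt_version_is_required() {
  [ "${1#v}" = "0.39.2" ]
}
```

For example: kpt_version_is_pre_1 "$(kpt version)" || echo "Install kpt 0.39.2 before continuing"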

Use automated validation tools

Anthos Service Mesh includes automated diagnostic and configuration validation tools that can help you identify and resolve problems, and avoid them in the future. The following sections explain how to use these tools.

istioctl analyze

The istioctl analyze diagnostic tool can detect common configuration problems. Install istioctl using these instructions.

istioctl analyze reads a cluster configuration and, if it finds a problem, provides informational messages and suggests remedies. It can run against a live cluster, a set of local configuration files, or a combination of the two, which lets you find problems before you apply changes to a cluster. For more information, see Diagnose your Configuration with istioctl analyze. For a list of the errors that istioctl analyze detects, see Configuration Analysis Messages.
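Because istioctl analyze can combine local files with the live cluster, you can gate changes on it. The following sketch runs istioctl analyze on a file and applies the file with kubectl only if the analysis passes. The function name analyze_then_apply is hypothetical. The sketch assumes istioctl and kubectl are on your PATH and configured for the target cluster, and that istioctl analyze exits with a non-zero status when it finds errors; warnings might not change the exit status, so review the printed messages as well.

```shell
# Hypothetical helper: analyze a local file against the live cluster and
# apply it only if istioctl analyze succeeds.
# Assumes istioctl analyze exits non-zero when it reports errors.
analyze_then_apply() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 2
  fi
  # Passing a file makes istioctl analyze evaluate it together with the
  # live cluster's current configuration, before anything is applied.
  if ! istioctl analyze "$file"; then
    echo "istioctl analyze reported problems; not applying $file" >&2
    return 1
  fi
  kubectl apply -f "$file"
}
```

For example, analyze_then_apply my-virtualservice.yaml, where the filename is hypothetical. To analyze local files without contacting a cluster, istioctl analyze also accepts --use-kube=false.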

Analyze a live cluster

To analyze a live cluster across all namespaces, run the following command:

    istioctl analyze -A

If istioctl analyze detects a problem with your configuration, it displays a message with information to help you resolve it, when a resolution is known. For example, if you forget to label your namespace to enable Istio sidecar injection, which is a common mistake, istioctl analyze generates the following message:

    Warn [IST0102] (Namespace default) The namespace is not enabled for Istio injection.
    Run 'kubectl label namespace default istio-injection=enabled' to enable it,
    or 'kubectl label namespace default istio-injection=disabled'
    to explicitly mark it as not needing injection
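To find every namespace that could trigger a message like IST0102, a sketch such as the following lists namespaces that have no istio-injection label and prints, without running, the kubectl label command from the message for each one. The function name is hypothetical. It relies on the Kubernetes label selector syntax '!key', which matches resources that do not have that label key, and assumes kubectl is configured for the target cluster. If your installation uses a different injection label, adjust the selector.

```shell
# Hypothetical helper: print a suggested label command for each namespace
# that has no istio-injection label. Nothing is changed in the cluster.
suggest_injection_labels() {
  # '!istio-injection' selects namespaces where the label key is absent.
  # -o name prints one "namespace/<name>" entry per line.
  kubectl get namespaces -l '!istio-injection' -o name |
    while read -r ns; do
      echo "kubectl label namespace ${ns#namespace/} istio-injection=enabled"
    done
}
```

Review the printed commands before you run them: not every namespace should receive sidecars, and for those you can set istio-injection=disabled instead, as the message suggests.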

If the problem persists, see the next section to check if your problem is already known.

Check for common problems and solutions

You can save time by checking whether your symptoms match an issue described in the following common problems and resolutions sections, grouped by Anthos Service Mesh functional area:

If this does not resolve your issue, see the next section.

Narrow the scope of the problem

Anthos Service Mesh consists of several technologies working together, which means that certain types of problems are associated with particular functional areas or components. Each of these components generates helpful logs of its own. Before you manually analyze the large volume of information they provide, narrow the scope of your troubleshooting by answering the following questions:

  • Does the issue occur in the control plane (for example, istiod) or in the data plane (for example, Envoy proxies)?
  • In which functional area are you experiencing the issue, for example networking, telemetry, or security?
  • Is traffic lost across the entire service mesh, or only in a specific deployment?
  • Does the problem appear or worsen because the service mesh cannot scale with traffic?
  • Does the issue cause latency or other performance issues?
  • Can you reproduce the issue on demand?
  • Did the problem begin after a recent configuration change in Istio, GKE, or another component?
  • Is there an increase or spike in traffic within the service mesh?
  • Does this cluster have any unusual features enabled or atypical deployments?
  • Do you observe high CPU or memory utilization? If so, what is the expected usage at scale?
  • Are there quota restrictions to consider?

View control plane status

The following commands can help you understand the status of the Anthos Service Mesh control plane:

  • kubectl get pods -n istio-system
  • kubectl describe pods -n istio-system
  • For all pods in istio-system: kubectl logs -n istio-system -l istio --all-containers
  • istioctl version
  • istioctl proxy-status
  • kubectl get configmap istio -n istio-system -o yaml && kubectl get configmap istio-sidecar-injector -n istio-system -o yaml
  • kubectl top pods -n istio-system
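When you plan to gather diagnostics and seek help, it can be convenient to capture the output of the commands above in one place. The following sketch saves each command's output to its own file in a directory. The function names and file names are hypothetical. The sketch assumes kubectl and istioctl are configured for the target cluster, passes the pods resource type to kubectl describe, and looks up the istio and istio-sidecar-injector configmaps in istio-system; adjust the names if your installation uses different ones.

```shell
# Hypothetical helper: run one command, sending stdout and stderr to a file.
# A failure is reported on stderr but does not stop the caller.
run_to_file() {
  local file="$1"
  shift
  "$@" >"$file" 2>&1 || echo "warning: '$*' failed; see $file" >&2
}

# Hypothetical helper: capture control plane status into one directory.
collect_control_plane_status() {
  local outdir="${1:-asm-diagnostics}"
  mkdir -p "$outdir" || return 1
  run_to_file "$outdir/pods.txt" kubectl get pods -n istio-system
  run_to_file "$outdir/describe-pods.txt" kubectl describe pods -n istio-system
  run_to_file "$outdir/logs.txt" \
    kubectl logs -n istio-system -l istio --all-containers
  run_to_file "$outdir/istioctl-version.txt" istioctl version
  run_to_file "$outdir/proxy-status.txt" istioctl proxy-status
  run_to_file "$outdir/configmap-istio.yaml" \
    kubectl get configmap istio -n istio-system -o yaml
  run_to_file "$outdir/configmap-istio-sidecar-injector.yaml" \
    kubectl get configmap istio-sidecar-injector -n istio-system -o yaml
  run_to_file "$outdir/top-pods.txt" kubectl top pods -n istio-system
  echo "$outdir"
}
```

For example, collect_control_plane_status asm-diagnostics writes files such as asm-diagnostics/pods.txt and asm-diagnostics/proxy-status.txt and prints the directory name. A failing command does not stop the collection; its error output is saved in that command's file.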

Use the following commands to understand the scale of the deployment:

  • kubectl get nodes
  • kubectl get services --all-namespaces
  • kubectl get pods --all-namespaces
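To turn these commands into a quick summary of deployment scale, you can count the rows they return. The following sketch is one way to do that; the function names are hypothetical, and it assumes kubectl is configured for the target cluster. It uses kubectl get --no-headers so that each output line corresponds to one resource.

```shell
# Hypothetical helper: count the resources returned by `kubectl get ARGS...`.
count_rows() {
  # --no-headers drops the header line; grep -c . counts non-empty lines.
  kubectl get "$@" --no-headers 2>/dev/null | grep -c .
}

# Hypothetical helper: print node, service, and pod counts on one line.
summarize_scale() {
  echo "nodes=$(count_rows nodes) services=$(count_rows services --all-namespaces) pods=$(count_rows pods --all-namespaces)"
}
```

summarize_scale prints a single line in the form nodes=N services=N pods=N.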

Review relevant logs and information

After you narrow the scope of the problem, you can focus on certain logs and information more effectively. To learn about the logs that Anthos Service Mesh generates and how to interpret the information they contain, see Interpreting Anthos Service Mesh logs.