Troubleshoot Cloud Service Mesh step-by-step
This section explains how to troubleshoot and resolve problems when using Cloud Service Mesh. If you need additional assistance, see Getting support.
Troubleshooting steps
Follow these general steps to troubleshoot Cloud Service Mesh:
- Use the automated configuration validation tools.
- Check if you have a common problem with a known solution.
- Narrow the scope of the problem.
- Review relevant logs and information.
- Gather diagnostic logs and seek help.
Use automated validation tools
Cloud Service Mesh includes automated diagnostic and configuration validation tools that can help you identify problems and avoid them in the future. The following sections explain how to use these tools.
istioctl analyze
The istioctl analyze diagnostic tool can detect common configuration problems. Install istioctl using these instructions.
istioctl analyze reads a cluster configuration and, if it finds a problem, provides informational messages and suggests remedies. It can run against a live cluster, a set of local configuration files, or a combination of the two, which lets you find problems before you apply changes to a cluster. For more information, see Diagnose your Configuration with istioctl analyze.
For more information about the errors that istioctl analyze detects, see Configuration Analysis Messages.
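Because istioctl analyze can combine local files with the live cluster configuration, you can use it as a gate before you apply changes. The following is a minimal bash sketch, not an official workflow. analyze_then_apply is a hypothetical helper name, and the sketch relies on istioctl analyze returning a non-zero exit code when it finds messages at or above its failure threshold (Error by default). Files passed as arguments are analyzed together with the live cluster configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: analyze manifests together with the live cluster
# configuration, and apply them only if istioctl analyze succeeds.
analyze_then_apply() {
  if [ "$#" -eq 0 ]; then
    echo "usage: analyze_then_apply FILE..." >&2
    return 2
  fi

  # With file arguments, istioctl analyze simulates applying them to the
  # live cluster. It exits non-zero when it finds messages at or above
  # --failure-threshold (Error by default).
  if ! istioctl analyze "$@"; then
    echo "istioctl analyze reported problems; not applying." >&2
    return 1
  fi

  # Build one -f flag per file for kubectl apply.
  local args=()
  local f
  for f in "$@"; do
    args+=(-f "$f")
  done
  kubectl apply "${args[@]}"
}
```

For example, analyze_then_apply my-virtualservice.yaml (a hypothetical file) applies the manifest only if no Error-level messages are reported. To analyze files without connecting to a cluster at all, run istioctl analyze --use-kube=false followed by the file names.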
Analyze a live cluster
To analyze all namespaces in a live cluster, run the following command:
istioctl analyze -A
If istioctl analyze detects a problem with your configuration, it displays a message that describes the problem and, when a remedy is known, explains how to resolve it. For example, if you make the common mistake of not labeling your namespace to enable Istio sidecar injection, istioctl analyze generates the following message:
Warn [IST0102] (Namespace default) The namespace is not enabled for Istio injection. Run 'kubectl label namespace default istio-injection=enabled' to enable it, or 'kubectl label namespace default istio-injection=disabled' to explicitly mark it as not needing injection
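When istioctl analyze reports many messages across namespaces, a count per message code can help you see which problem dominates. The following bash sketch assumes the text output format shown in the preceding message: a severity level, a bracketed IST code, then the affected resource. summarize_analyze is a hypothetical helper; if you need to process messages programmatically, the JSON output of istioctl analyze -o json is a sturdier input than parsing text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read istioctl analyze text output on stdin and print
# one line per (level, code) pair with its count, for example "Warn IST0102 2".
summarize_analyze() {
  awk '
    # Lines look like: Warn [IST0102] (Namespace default) message text
    $1 ~ /^(Error|Warning|Warn|Info)$/ && $2 ~ /^\[IST[0-9]+]$/ {
      code = substr($2, 2, length($2) - 2)   # strip the surrounding brackets
      count[$1 " " code]++
    }
    END {
      for (key in count) print key, count[key]
    }
  ' | sort
}
```

Usage: istioctl analyze -A 2>&1 | summarize_analyze. Merging stderr into the pipe is a precaution in case your istioctl version writes some messages there; lines that do not match the expected format are ignored.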
If the problem persists, see the next section to check if your problem is already known.
Check for common problems and solutions
You can save time by checking whether your symptoms match a known issue in the following sections on common problems and solutions, grouped by Cloud Service Mesh functional area:
- Installation issues
- Managed control plane issues
- Observability issues
- Off-Google Cloud deployment issues
- Proxy issues
- Resource issues
- Scaling issues
- Security issues
- Traffic management issues
- Webhook issues
- Sidecar proxy issues
If this does not resolve your issue, see the next section.
Narrow the scope of the problem
Cloud Service Mesh consists of several technologies working together, which means that certain types of problems are associated with particular functional areas or components. Each of these components generates helpful logs of its own. Before you manually analyze the large volume of information that these logs provide, narrow the scope of your troubleshooting by answering the following questions:
- Does the issue occur within the control plane or the data plane, for example in istiod or in the Envoy proxies?
- In which functional area are you experiencing the issue, for example networking, telemetry, or security?
- Is traffic lost across the entire service mesh, or only in a specific deployment?
- Does the problem appear or worsen because the service mesh can't scale to handle the traffic?
- Does the issue cause latency or other performance issues?
- Can you reproduce the issue on demand?
- Did the problem begin after a recent configuration change to Istio, GKE, or another component?
- Is there an increase or spike in traffic within the service mesh?
- Does this cluster have any unusual features enabled or atypical deployments?
- Do you observe high CPU or memory utilization? If so, what is the expected usage at scale?
- Are there quota restrictions to consider?
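Several of these questions are easier to answer from a snapshot of cluster state taken while the problem is occurring. The following bash sketch uses a hypothetical helper, mesh_triage_snapshot, that saves three views to files in a directory: recent events (to check for changes that preceded the issue), pods that are not in the Running phase (to see whether the impact is mesh-wide or local), and control plane CPU and memory usage (which, like any kubectl top call, requires cluster metrics to be available). The file names and default directory are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save a troubleshooting snapshot into a directory.
# Usage: mesh_triage_snapshot [OUTPUT_DIR]   (default: ./mesh-triage)
mesh_triage_snapshot() {
  local out_dir="${1:-./mesh-triage}"
  mkdir -p "$out_dir" || return 1

  # Recent events across all namespaces, oldest first, to spot recent changes.
  kubectl get events --all-namespaces --sort-by=.lastTimestamp \
    > "$out_dir/events.txt" 2>&1

  # Pods not in the Running phase, to judge how widespread the impact is.
  kubectl get pods --all-namespaces --field-selector='status.phase!=Running' \
    > "$out_dir/pods-not-running.txt" 2>&1

  # CPU and memory usage of the control plane pods.
  kubectl top pods -n istio-system > "$out_dir/istio-system-top.txt" 2>&1

  # Print the directory so callers can locate the snapshot.
  echo "$out_dir"
}
```

Pods that completed normally, such as finished Jobs, also appear in pods-not-running.txt, so read that file with your workloads in mind.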
View control plane status
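The following commands can help you understand the status of the Cloud Service Mesh control plane:
- List the pods in istio-system and their status:
kubectl get pods -n istio-system
- Show details and recent events for the pods in istio-system:
kubectl describe pods -n istio-system
- For all pods in istio-system that have an istio label, view logs from all containers:
kubectl logs -n istio-system -l istio --all-containers
- Show the versions of the istioctl client, the control plane, and the data plane:
istioctl version
- Show whether each Envoy proxy is synchronized with the control plane:
istioctl proxy-status
- View the mesh configuration and the sidecar injector configuration:
kubectl get configmap istio -n istio-system -o yaml && kubectl get configmap istio-sidecar-injector -n istio-system -o yaml
- Show CPU and memory usage of the control plane pods:
kubectl top pods -n istio-system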
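istioctl proxy-status lists every proxy with a sync state for each configuration type. In a large mesh, it can help to print only the proxies whose configuration is STALE, meaning that the control plane sent an update that the proxy has not acknowledged. The following bash sketch is a hypothetical helper, unsynced_proxies, that keeps the header row, prints STALE rows, and returns a non-zero exit code if it finds any. It matches on text rather than column positions because the columns can vary between istioctl versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show only proxies whose configuration is STALE.
# Extra arguments are passed through to istioctl proxy-status.
unsynced_proxies() {
  istioctl proxy-status "$@" | awk '
    NR == 1 { print; next }      # keep the header row
    /STALE/ { print; stale++ }   # control plane push not yet acknowledged
    END { if (stale) exit 1 }    # non-zero exit if any proxy is stale
  '
}
```

Because the function returns a non-zero status when stale proxies exist, you can use it in a condition, for example: unsynced_proxies || echo "some proxies are out of sync".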
Use the following commands to understand the scale of the deployment:
kubectl get nodes
kubectl get services --all-namespaces
kubectl get pods --all-namespaces
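For a quick sense of scale, you can reduce the preceding commands to counts. The following bash sketch uses a hypothetical helper, mesh_scale_summary, and passes kubectl's --no-headers flag so that each output line corresponds to one resource.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print counts of nodes, services, and pods on one line.
mesh_scale_summary() {
  local nodes services pods
  # --no-headers removes the column header, so one line equals one resource.
  nodes=$(kubectl get nodes --no-headers 2>/dev/null | wc -l)
  services=$(kubectl get services --all-namespaces --no-headers 2>/dev/null | wc -l)
  pods=$(kubectl get pods --all-namespaces --no-headers 2>/dev/null | wc -l)
  # Arithmetic expansion drops any padding that wc adds on some platforms.
  printf 'nodes=%d services=%d pods=%d\n' "$((nodes))" "$((services))" "$((pods))"
}
```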
Review relevant logs and information
After you narrow the scope of the problem, you can focus on certain logs and information more effectively. To learn about the logs that Cloud Service Mesh generates and how to interpret the information they contain, see Interpreting Cloud Service Mesh logs.
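When the scope narrows to a single workload, its Envoy sidecar logs are often a good first place to look. The following bash sketch assumes that the injected sidecar container has the default Istio name, istio-proxy. sidecar_logs is a hypothetical helper, and the one-hour default for --since is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch recent sidecar (Envoy) logs for one pod.
# Usage: sidecar_logs NAMESPACE POD [SINCE]   (SINCE defaults to 1h)
sidecar_logs() {
  local namespace="${1:-}" pod="${2:-}" since="${3:-1h}"
  if [ -z "$namespace" ] || [ -z "$pod" ]; then
    echo "usage: sidecar_logs NAMESPACE POD [SINCE]" >&2
    return 2
  fi
  # Assumes the sidecar container uses the default name istio-proxy.
  kubectl logs "$pod" -n "$namespace" -c istio-proxy --since="$since"
}
```

For example, sidecar_logs default reviews-v1-abc 30m (a hypothetical pod name) shows the last 30 minutes of that pod's sidecar logs.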