This section explains common Anthos Service Mesh problems and how to resolve them. If you need additional assistance, see Getting support.
In Anthos Service Mesh telemetry, the Envoy proxies periodically call Google Cloud's operations suite APIs to report telemetry data. The frequency depends on the type of API call:
- Logging: every ~10 seconds
- Metrics: every ~1 minute
- Edges (Context API/Topology view): incremental reports every ~1 minute, with full reports every ~10 minutes.
- Traces: determined by the sampling frequency you configure (typically, one out of every 100 requests).
The telemetry dashboards gather data from both the Context API and Google Cloud's operations suite to display the various service-focused dashboards.
Services dashboard is missing a service
The dashboard only displays HTTP(S)/gRPC services. If your service should be in the list, verify that Anthos Service Mesh telemetry identifies it as an HTTP service.
If your service remains missing, verify that a Kubernetes service configuration exists in your cluster.
Review the list of all Kubernetes services:
kubectl get services --all-namespaces
Review the list of Kubernetes services in a specific namespace:
kubectl get services -n YOUR_NAMESPACE
Missing or incorrect metrics for services
If there are missing or incorrect metrics for services in the Services dashboard, see the following sections for potential resolutions.
Verify that sidecar proxies exist and have been injected properly
The namespace might not have the label for automatic injection, or manual injection might have failed. Confirm that the pods in the namespace have at least two containers and that one of those containers is the istio-proxy container:
kubectl -n YOUR_NAMESPACE get pods
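If the istio-proxy container is missing, the namespace might not be labeled for automatic sidecar injection. As a sketch (the exact label depends on your installation, for example istio-injection=enabled or a revision label), list the labels on the namespace:
kubectl get namespace YOUR_NAMESPACE --show-labels
You can also list the containers of a specific pod to confirm that istio-proxy is among them:
kubectl -n YOUR_NAMESPACE get pod YOUR_POD_NAME -o jsonpath='{.spec.containers[*].name}'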
Verify that telemetry configuration exists
Telemetry is configured by using EnvoyFilters in the istio-system namespace. Without that configuration, Anthos Service Mesh does not report data to Google Cloud's operations suite.
Verify that the configuration for Google Cloud's operations suite (and the metadata exchange configuration) exists:
kubectl -n istio-system get envoyfilter
The expected output looks similar to the following:
NAME                     AGE
metadata-exchange-1.4    13d
metadata-exchange-1.5    13d
stackdriver-filter-1.4   13d
stackdriver-filter-1.5   13d
...
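The filter names vary by Anthos Service Mesh version. To inspect one of these filters in detail, you can dump its full specification; FILTER_NAME is a placeholder for one of the names from the previous output:
kubectl -n istio-system get envoyfilter FILTER_NAME -o yaml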
To further confirm that the filter for Google Cloud's operations suite is properly configured, gather a configuration dump from each proxy and look for the presence of the filter:
kubectl exec YOUR_POD_NAME -n YOUR_NAMESPACE -c istio-proxy -- curl localhost:15000/config_dump
In the output of the previous command, look for the filter for Google Cloud's operations suite, which looks similar to the following:
"config": { "root_id": "stackdriver_inbound", "vm_config": { "vm_id": "stackdriver_inbound", "runtime": "envoy.wasm.runtime.null", "code": { "local": { "inline_string": "envoy.wasm.null.stackdriver" } } }, "configuration": "{....}" }
Verify that Anthos Service Mesh identifies an HTTP service
Metrics will not show up in the user interface if the service port for the Kubernetes service is not named with an http- prefix. Confirm that the service has the proper names for its ports.
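For example, you can print the port names of a service to confirm the naming; YOUR_SERVICE is a placeholder for the Kubernetes service name:
kubectl -n YOUR_NAMESPACE get service YOUR_SERVICE -o jsonpath='{.spec.ports[*].name}'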
Verify the Cloud Monitoring API is enabled for the project
Confirm in the APIs & Services dashboard of the Google Cloud console that the Cloud Monitoring API is enabled. The API is enabled by default.
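You can also confirm this from the command line. The following is a sketch that assumes the Google Cloud CLI is installed and that YOUR_PROJECT_ID is your project ID:
gcloud services list --enabled --project YOUR_PROJECT_ID --filter="monitoring.googleapis.com"
If the API is not listed, you can enable it:
gcloud services enable monitoring.googleapis.com --project YOUR_PROJECT_ID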
Verify no errors reporting to the Cloud Monitoring API
In the Google Cloud console APIs & Services dashboard, open the Traffic By Response Code graph URL:
https://console.cloud.google.com/apis/api/monitoring.googleapis.com/metrics?folder=&organizationId=&project=YOUR_PROJECT_ID
If you see error messages, it might be an issue that warrants further investigation. In particular, look for a large number of 429 error messages, which indicates a potential quota issue. See the next section for troubleshooting steps.
Verify correct quota for the Cloud Monitoring API
In the Google Cloud console, open the IAM & Admin menu and verify that there is a Quotas option. You can access this page directly by using the following URL:
https://console.cloud.google.com/iam-admin/quotas?project=YOUR_PROJECT_ID
This page shows the full set of quotas for the project, where you can search for the Cloud Monitoring API.
Verify no error logs in Envoy proxies
Review the logs for the proxy in question, searching for error message instances:
kubectl -n YOUR_NAMESPACE logs YOUR_POD_NAME -c istio-proxy
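To narrow the output to likely errors, you can filter the logs, for example:
kubectl -n YOUR_NAMESPACE logs YOUR_POD_NAME -c istio-proxy | grep -i error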
However, ignore warning messages like the following, which are normal:
[warning][filter] [src/envoy/http/authn/http_filter_factory.cc:83] mTLS PERMISSIVE mode is used, connection can be either plaintext or TLS, and client cert can be omitted. Please consider to upgrade to mTLS STRICT mode for more secure configuration that only allows TLS connection with client cert. See https://istio.io/docs/tasks/security/mtls-migration/
[warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:91] gRPC config stream closed: 13
Missing or incorrect telemetry data for services
By default, Cloud Monitoring and Cloud Logging are enabled in your Google Cloud project when you install Anthos Service Mesh. To report telemetry data, each sidecar proxy that is injected into your service pods calls the Cloud Monitoring API and the Cloud Logging API. After deploying workloads, it takes about one or two minutes for telemetry data to be displayed in the Google Cloud console. Anthos Service Mesh automatically keeps the service dashboards up to date:
- For metrics, the sidecar proxies call the Cloud Monitoring API approximately every minute.
- To update the Topology graph, the sidecar proxies send incremental reports approximately every minute and full reports about every ten minutes.
- For logging, the sidecar proxies call the Cloud Logging API approximately every ten seconds.
- For tracing, you have to enable Cloud Trace. Traces are reported according to the sampling frequency that you have configured (typically, one out of every 100 requests).
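As a sketch, assuming the Google Cloud CLI is installed and YOUR_PROJECT_ID is your project ID, you can check whether the Cloud Trace API is enabled and, if necessary, enable it:
gcloud services list --enabled --project YOUR_PROJECT_ID --filter="cloudtrace.googleapis.com"
gcloud services enable cloudtrace.googleapis.com --project YOUR_PROJECT_ID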
Metrics are displayed only for HTTP services on the Anthos Service Mesh Metrics page. If you don't see any metrics, verify that all the pods in the namespace for your application's services have sidecar proxies injected:
kubectl get pods -n YOUR_NAMESPACE
In the output, notice that the READY column shows two containers for each of your workloads (for example, 2/2): the primary container and the container for the sidecar proxy.
Additionally, the Services dashboard only displays client metrics, so telemetry data may not appear if the client is not in the mesh.
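If you need to confirm that traffic originates from a client inside the mesh, one approach (a sketch; YOUR_CLIENT_POD, YOUR_SERVICE, and PORT are placeholders, and the pod must already have a sidecar injected) is to send a test request from the istio-proxy container of an injected pod, where curl is typically available:
kubectl -n YOUR_NAMESPACE exec YOUR_CLIENT_POD -c istio-proxy -- curl -s -o /dev/null -w "%{http_code}\n" http://YOUR_SERVICE:PORT/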