Resolving resource limit issues in Anthos Service Mesh
This section explains common Anthos Service Mesh problems and how to resolve them. If you need additional assistance, see Getting support.
Anthos Service Mesh resource limit problems can be caused by any of the following:
- `LimitRange` objects created in the `istio-system` namespace or in any namespace with automatic sidecar injection enabled (see the check after this list).
- User-defined limits that are set too low.
- Nodes running out of memory or other resources.
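As a first check, you can look for `LimitRange` objects in the namespaces involved. This is a minimal sketch; `my-app` is a placeholder for any namespace with automatic sidecar injection enabled:

```
# List LimitRange objects that can override sidecar resource requests.
kubectl get limitrange -n istio-system

# Repeat for each namespace with automatic sidecar injection enabled;
# my-app is a placeholder.
kubectl get limitrange -n my-app
```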
Potential symptoms of resource problems:
- Anthos Service Mesh repeatedly not receiving configuration from `istiod`, indicated by the error `Envoy proxy NOT ready`. Seeing this error a few times at startup is normal, but otherwise it is a concern.
- Networking problems with some pods or nodes that become unreachable.
- `istioctl proxy-status` showing `STALE` statuses in the output.
- `OOMKilled` messages in the logs of a node.
- High memory usage by containers, shown by `kubectl top pod POD_NAME --containers`.
- High memory usage by pods inside a node, shown by `kubectl top node my-node`.
- Envoy out of memory: `kubectl get pods` shows the status `OOMKilled` in the output.
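If needed, the checks above can be run directly. This is a minimal sketch in which `POD_NAME` and `my-node` are placeholders:

```
# Look for proxies whose configuration is out of date.
istioctl proxy-status

# Per-container memory usage for a suspect pod.
kubectl top pod POD_NAME --containers

# Resource usage on a suspect node.
kubectl top node my-node

# Pods whose status shows OOMKilled.
kubectl get pods
```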
Istio sidecars take a long time to receive configuration
Slow configuration propagation can occur due to insufficient resources allocated to `istiod` or an excessively large cluster size.
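To see whether `istiod` itself is under resource pressure, you can inspect its live usage. This sketch assumes the control-plane pods carry the standard `app=istiod` label:

```
# CPU and memory usage of the istiod pods (assumes the app=istiod label).
kubectl -n istio-system top pod -l app=istiod
```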
There are several possible solutions to this problem:
- If your monitoring tools (Prometheus, Stackdriver, and so on) show high utilization of a resource by `istiod`, increase the allocation of that resource; for example, increase the CPU or memory limit of the `istiod` deployment (a sketch of such a change follows this list). This is a temporary solution, and we recommend that you investigate methods for reducing resource consumption.
- If you encounter this issue in a large cluster or deployment, reduce the amount of configuration state pushed to each proxy by configuring `Sidecar` resources (see the example resource after this list).
- If the problem persists, try horizontally scaling `istiod` (a scaling sketch follows this list).
- If all other troubleshooting steps fail to resolve the problem, report a bug detailing your deployment and the observed problems. Follow these steps to include a CPU/memory profile in the bug report if possible, along with a detailed description of cluster size, number of pods, and number of services.
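Raising the `istiod` limits can be done with `kubectl set resources`. This is a minimal sketch, assuming an unrevisioned in-cluster control plane whose deployment is named `istiod` and whose container is named `discovery`; the values are illustrative, not recommendations:

```
# Raise the CPU and memory limits of the istiod deployment.
# A revisioned install may name the deployment differently
# (for example, istiod-asm-REVISION); adjust accordingly.
kubectl -n istio-system set resources deployment istiod \
  -c discovery --limits=cpu=1,memory=4Gi
```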
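A `Sidecar` resource restricts the egress configuration that `istiod` pushes to the proxies in a namespace. This is a minimal sketch; `my-app` is a placeholder namespace, and the hosts list must include every namespace the workloads actually reach:

```
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: my-app        # placeholder; apply one per injected namespace
spec:
  egress:
  - hosts:
    - "./*"                # services in the same namespace
    - "istio-system/*"     # the control-plane namespace
```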
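Horizontal scaling can be as simple as adding replicas, again assuming a deployment named `istiod`:

```
# Run an additional istiod replica to spread the configuration push load.
kubectl -n istio-system scale deployment istiod --replicas=2
```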