Resolving traffic management issues in Anthos Service Mesh

This section explains common Anthos Service Mesh problems and how to resolve them. If you need additional assistance, see Getting support.

API server connection errors in Istiod logs

If you see errors similar to the following, Istiod cannot contact the API server:

error k8s.io/client-go@v0.18.0/tools/cache/reflector.go:125: Failed to watch *crd.IstioSomeCustomResource…dial tcp 10.43.240.1:443: connect: connection refused

You can use the regular expression string /error.*cannot list resource/ to find this error in the logs.

This error is usually transient. If you reached the proxy logs using kubectl, the issue might already be resolved. The error is typically caused by events that make the API server temporarily unavailable, such as when an API server that is not in a high-availability configuration reboots for an upgrade or autoscaling change.
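
As a minimal sketch, the regular expression above can be applied to the Istiod logs from the command line. The istio-system namespace and the app=istiod label selector are assumptions about a typical installation, and find_apiserver_errors is a hypothetical helper name; adjust them to match your cluster.

```shell
# Hypothetical helper: search the Istiod logs for the API server error.
# Assumptions: Istiod runs in istio-system (override with $1) and its pods
# carry the label app=istiod.
find_apiserver_errors() {
  istiod_namespace="${1:-istio-system}"
  # --tail=-1 requests all log lines; with a label selector, kubectl
  # otherwise returns only the last few lines per pod.
  kubectl logs -n "$istiod_namespace" -l app=istiod --tail=-1 \
    | grep -E 'error.*cannot list resource'
}
```

Running find_apiserver_errors prints the matching log lines, if any. No output suggests that the error is not present in the logs that are currently available.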

The istio-init container crashes

This problem can occur when the pod iptables rules are not applied to the pod network namespace. This can be caused by:

  • An incomplete istio-cni installation
  • Insufficient workload pod permissions (missing CAP_NET_ADMIN permission)

If you use the Istio CNI plugin, verify that you followed the installation instructions completely. Verify that the istio-cni-node container is ready, and check its logs. If the problem persists, connect to the host node over SSH, search the node logs for nsenter commands, and check whether they report any errors.

If you don't use the Istio CNI plugin, verify that the workload pod has CAP_NET_ADMIN permission, which is automatically set by the sidecar injector.
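
The checks above can be sketched as two small shell helpers. The kube-system namespace and the k8s-app=istio-cni-node label are assumptions about where the CNI DaemonSet runs, and the function names are hypothetical. Note that the Kubernetes pod spec lists the capability as NET_ADMIN, without the CAP_ prefix.

```shell
# Hypothetical helper: show whether the istio-cni-node pods are ready.
# Assumptions: the DaemonSet runs in kube-system (override with $1) and its
# pods carry the label k8s-app=istio-cni-node.
cni_node_status() {
  cni_namespace="${1:-kube-system}"
  kubectl get pods -n "$cni_namespace" -l k8s-app=istio-cni-node -o wide
}

# Hypothetical helper: print the capabilities added to the istio-init
# container of a workload pod. $1 is the pod name, $2 its namespace.
istio_init_capabilities() {
  workload_pod="$1"
  workload_namespace="${2:-default}"
  kubectl get pod "$workload_pod" -n "$workload_namespace" \
    -o jsonpath='{.spec.initContainers[?(@.name=="istio-init")].securityContext.capabilities.add}'
}
```

For example, istio_init_capabilities POD_NAME NAMESPACE prints the added capabilities for that pod. If NET_ADMIN is missing from the output, the workload pod lacks the required permission.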

Connection refused after pod starts

When a Pod starts and gets a connection refused error when it tries to connect to an endpoint, the problem might be that the application container started before the istio-proxy container. In this case, the application container sends the request to istio-proxy, but the connection is refused because istio-proxy isn't listening on the port yet.

In this case, you can:

  • Modify your application's startup code to poll the istio-proxy health endpoint until it returns a 200 status code. The istio-proxy health endpoint is:

    http://localhost:15020/healthz/ready
    
  • Add a retry request mechanism to your application workload.
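
One way to apply the first option is a small wrapper in the container's entrypoint script. The following is a minimal sketch, assuming that curl is available in the application image. The function name wait_for_istio_proxy, the default of 30 attempts, and the one-second delay are illustrative choices, not values from this guide.

```shell
# Hypothetical helper: poll the istio-proxy health endpoint until it
# returns HTTP 200, or give up after a maximum number of attempts.
# $1: health endpoint URL (defaults to the endpoint shown above)
# $2: maximum number of attempts (assumed default: 30)
wait_for_istio_proxy() {
  health_url="${1:-http://localhost:15020/healthz/ready}"
  max_attempts="${2:-30}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Print only the HTTP status code. A refused connection makes curl
    # exit non-zero, so "|| true" keeps the loop running.
    status_code="$(curl -s -o /dev/null -w '%{http_code}' "$health_url" || true)"
    if [ "$status_code" = "200" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 1
}
```

An entrypoint script could then start the application only after the proxy is ready, for example with wait_for_istio_proxy && exec "$@".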

Listing gateways returns empty

Symptom: When you list Gateways using kubectl get gateway --all-namespaces after successfully creating an Anthos Service Mesh Gateway, the command returns No resources found.

This problem can happen on GKE 1.20 and later because the GKE Gateway controller automatically installs the GKE Gateway.networking.x-k8s.io/v1alpha1 resource in clusters. To work around the issue:

  1. Check if there are multiple gateway custom resources in the cluster:

    kubectl api-resources | grep gateway
    

    Example output:

    gateways                          gw           networking.istio.io/v1beta1            true         Gateway
    gatewayclasses                    gc           networking.x-k8s.io/v1alpha1           false        GatewayClass
    gateways                          gtw          networking.x-k8s.io/v1alpha1           true         Gateway

  2. If the list shows entries other than Gateways with the apiVersion networking.istio.io/v1beta1, use the full resource name or a distinguishable short name in the kubectl command. For example, run kubectl get gw or kubectl get gateways.networking.istio.io instead of kubectl get gateway to make sure that Istio Gateways are listed.
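
The two steps above can be sketched as shell helpers that always use fully qualified resource names, so the result does not depend on which Gateway resource kubectl resolves first. The function names are hypothetical.

```shell
# Hypothetical helper: print the fully qualified name of every resource
# type whose name contains "gateway" (for example,
# gateways.networking.istio.io).
gateway_resource_names() {
  kubectl api-resources -o name | grep gateway
}

# Hypothetical helper: list Istio Gateways in all namespaces by using the
# full resource name instead of the ambiguous "gateway".
list_istio_gateways() {
  kubectl get gateways.networking.istio.io --all-namespaces
}
```

If gateway_resource_names prints more than one gateways entry, prefer list_istio_gateways over kubectl get gateway.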

For more information on this issue, see Kubernetes Gateways and Istio Gateways.