Resolving managed Anthos Service Mesh issues

This document explains common managed Anthos Service Mesh problems and how to resolve them. These include a pod being injected with istiod.istio-system, the installation tool generating HTTP 400 errors, and cluster membership errors.

If you need additional assistance troubleshooting Anthos Service Mesh, see Getting support.

Pod is injected with istiod.istio-system

This can occur if you did not replace the istio-injection: enabled label on the namespace with the managed control plane's revision label (istio.io/rev).
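To fix this, replace the legacy label on each affected namespace with the managed revision label. The following is a minimal sketch. The helper names are hypothetical, and the default revision asm-managed is an assumption: the managed revision label depends on the release channel, for example asm-managed, asm-managed-rapid, or asm-managed-stable.

```shell
# Hypothetical helper: move a namespace from the legacy istio-injection label
# to the managed control plane's revision label.
# $1 = namespace, $2 = revision (assumed default: asm-managed; depends on channel)
relabel_namespace() {
  local namespace="$1"
  local revision="${2:-asm-managed}"
  # "istio-injection-" removes the legacy label; --overwrite replaces any
  # existing istio.io/rev value on the namespace.
  kubectl label namespace "$namespace" istio-injection- "istio.io/rev=${revision}" --overwrite
}

# Hypothetical helper: show both labels as columns so that namespaces still
# using istio-injection stand out.
list_injection_labels() {
  kubectl get namespace -L istio-injection -L istio.io/rev
}
```

For example, run relabel_namespace my-namespace. Then restart the workloads in that namespace, for example with kubectl rollout restart deployment -n my-namespace. Injection happens when a pod is created, so existing pods keep their current sidecar until they are recreated.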

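In addition, verify the mutating webhook configuration by using the following command:

kubectl get mutatingwebhookconfiguration

The output should include istiod-asm-managed, and it might also include istio-sidecar-injector:

...
istiod-asm-managed
...

If istio-sidecar-injector is listed, inspect its configuration:

kubectl get mutatingwebhookconfiguration istio-sidecar-injector -o yaml

To debug further, run the following commands:

# Request a 30-day token (2592000 seconds) for the default service account, with the istio-ca audience
export T=$(echo '{"kind":"TokenRequest","apiVersion":"authentication.k8s.io/v1","spec":{"audiences":["istio-ca"],"expirationSeconds":2592000}}' | kubectl create --raw /api/v1/namespaces/default/serviceaccounts/default/token -f - | jq -j '.status.token')

# Set WEBHOOK_NAME to the managed webhook listed by the first command,
# for example istiod-asm-managed or istiod-asmca
export WEBHOOK_NAME=istiod-asm-managed
export INJECT_URL=$(kubectl get mutatingwebhookconfiguration "$WEBHOOK_NAME" -o json | jq -r '.webhooks[0].clientConfig.url')

# Strip the inject path from the URL to get the control plane address
export ISTIOD_ADDR=$(echo "$INJECT_URL" | sed 's/inject.*//')

# Query the debug configuration endpoint
curl -v -H "Authorization: Bearer $T" "$ISTIOD_ADDR/debug/configz"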

The install tool generates HTTP 400 errors

The installation tool might generate HTTP 400 errors like the following:

HealthCheckContainerError, message: Cloud Run error: Container failed to start.
Failed to start and then listen on the port defined by the PORT environment
variable. Logs for this revision might contain more information.

The error can occur if Workload Identity is not enabled on your GKE cluster. To enable it, run the following command:

export CLUSTER_NAME=...
export PROJECT_ID=...
export LOCATION=...
# For a regional cluster, use --region instead of --zone
gcloud container clusters update "$CLUSTER_NAME" --zone "$LOCATION" \
    --workload-pool="$PROJECT_ID.svc.id.goog"
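To confirm whether Workload Identity is enabled before or after running the update, you can read the cluster's configured workload pool. This is a minimal sketch. The function name is hypothetical, and it reuses the CLUSTER_NAME, LOCATION, and PROJECT_ID variables. An empty workload pool suggests that Workload Identity is not enabled.

```shell
# Hypothetical helper: print the cluster's workload pool and return non-zero
# unless it matches PROJECT_ID.svc.id.goog.
# Uses CLUSTER_NAME, LOCATION (a zone; use --region for regional clusters) and PROJECT_ID.
check_workload_identity() {
  local pool
  pool=$(gcloud container clusters describe "$CLUSTER_NAME" --zone "$LOCATION" \
    --project "$PROJECT_ID" --format='value(workloadIdentityConfig.workloadPool)')
  if [ "$pool" = "${PROJECT_ID}.svc.id.goog" ]; then
    echo "Workload Identity enabled: $pool"
  else
    echo "Workload Identity not enabled (workload pool: '${pool}')"
    return 1
  fi
}
```

Because the function returns a non-zero status when the pool is missing, you can use it in a script guard, for example check_workload_identity || exit 1, before running the installation tool.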

Managed data plane state

The following command displays the state of the managed data plane:

gcloud alpha container fleet mesh describe --project PROJECT_ID

The following table lists all possible managed data plane states:

State | Code | Description
ACTIVE | OK | The managed data plane is running normally.
DISABLED | DISABLED | The managed data plane is in this state if no namespace is configured to use it. Follow the instructions to enable the managed data plane. Note that managed data plane status reporting is only available if you enabled the managed data plane by annotating a namespace. Annotating individual pods causes those pods to be managed, but the feature state is DISABLED if no namespaces are annotated.
FAILED_PRECONDITION | MANAGED_CONTROL_PLANE_REQUIRED | The managed data plane requires an active managed Anthos Service Mesh control plane.
PROVISIONING | PROVISIONING | The managed data plane is being provisioned. If this state persists for more than 10 minutes, an error has likely occurred and you should contact Support.
STALLED | INTERNAL_ERROR | The managed data plane is blocked from operating due to an internal error condition. If the issue persists, contact Support.
NEEDS_ATTENTION | UPGRADE_FAILURES | The managed data plane requires manual intervention to return the service to the normal state. For more information and how to resolve this issue, see NEEDS_ATTENTION state.
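To pull the state and code out of the describe output, you could filter it with a small awk script. This is a minimal sketch. The dataplane_states name is hypothetical. The YAML layout it expects is an assumption about the command's output: a dataPlaneManagement block containing a details list with a code field, followed by a state field. Check it against your own output first.

```shell
# Hypothetical helper: reads `gcloud alpha container fleet mesh describe` output
# on stdin and prints "STATE CODE" for every dataPlaneManagement block.
# Assumed layout (may differ between gcloud versions):
#   dataPlaneManagement:
#     details:
#     - code: OK
#       details: ...
#     state: ACTIVE
dataplane_states() {
  awk '
    /dataPlaneManagement:/ {
      in_dp = 1
      indent = match($0, /[^ ]/)   # column of the key, used to detect the end of the block
      code = ""
      next
    }
    in_dp {
      cur = match($0, /[^ ]/)
      if (cur <= indent) { in_dp = 0; next }   # left the block without a state line
      if ($1 == "-" && $2 == "code:") code = $3
      else if ($1 == "code:") code = $2
      if ($1 == "state:") { print $2, code; in_dp = 0 }
    }
  '
}
```

For example, gcloud alpha container fleet mesh describe --project PROJECT_ID | dataplane_states prints one line per membership, such as ACTIVE OK, that you can look up in the table above.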

NEEDS_ATTENTION state

If the gcloud alpha container fleet mesh describe command reports that the managed data plane state is NEEDS_ATTENTION with the code UPGRADE_FAILURES, the managed data plane has failed to upgrade certain workloads. The managed data plane service labels these workloads with dataplane-upgrade: failed for further analysis. You must restart these proxies manually to upgrade them. To list the pods that require attention, run the following command:

kubectl get pods --all-namespaces -l dataplane-upgrade=failed
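One way to restart the affected proxies is to delete each labeled pod so that its controller recreates it with an upgraded sidecar. The following is a minimal sketch with a hypothetical function name. It assumes that every listed pod is owned by a controller such as a Deployment, StatefulSet, or DaemonSet. A bare pod is not recreated after deletion.

```shell
# Hypothetical helper: delete every pod labeled dataplane-upgrade=failed so that
# its controller recreates it. Assumes each pod is owned by a controller.
restart_failed_upgrade_pods() {
  kubectl get pods --all-namespaces -l dataplane-upgrade=failed \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
  while read -r ns name; do
    [ -n "$name" ] || continue   # skip blank lines
    echo "Restarting pod $ns/$name"
    kubectl delete pod --namespace "$ns" "$name"
  done
}
```

Alternatively, restart the owning workload directly, for example with kubectl rollout restart deployment DEPLOYMENT_NAME -n NAMESPACE. This replaces pods gradually instead of deleting them one at a time.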

Cluster membership error (No identity provider specified)

The installation tool might fail with cluster membership errors like the following:

asmcli: [ERROR]: Cluster has memberships.hub.gke.io CRD but no identity
provider specified. Please ensure that an identity provider is available for the
registered cluster.

The error can occur if GKE Workload Identity was not enabled before you registered the cluster. You can re-register the cluster on the command line with the following command:

gcloud container hub memberships register MEMBERSHIP_NAME \
    --gke-cluster=LOCATION/CLUSTER_NAME \
    --enable-workload-identity
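Before re-registering, you could check whether the existing membership records an identity provider. This is a minimal sketch. The function name is hypothetical, and the membership name is passed as the first argument. The authority.identityProvider field path is an assumption about the membership resource, so verify it against the gcloud container hub memberships describe output for your membership.

```shell
# Hypothetical helper: print the identity provider recorded on a fleet membership.
# $1 = membership name. The authority.identityProvider field path is an assumption.
check_membership_identity() {
  local provider
  provider=$(gcloud container hub memberships describe "$1" \
    --format='value(authority.identityProvider)')
  if [ -n "$provider" ]; then
    echo "Identity provider: $provider"
  else
    echo "No identity provider on membership $1"
    return 1
  fi
}
```

An empty result corresponds to the "no identity provider specified" condition in the error, which indicates that re-registering with Workload Identity enabled is needed.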