Resolving managed Anthos Service Mesh issues


This document explains common Anthos Service Mesh problems and how to resolve them. Examples include a pod being injected with istiod.istio-system, and the installation tool generating errors such as HTTP 400 status codes and cluster membership errors.

If you need additional assistance troubleshooting Anthos Service Mesh, see Getting support.

Pod is injected with istiod.istio-system

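This can occur if you did not replace the istio-injection: enabled label on your namespaces with the revision label for your managed control plane, such as istio.io/rev=asm-managed.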
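To find namespaces that still carry the old label, you can list them with kubectl get namespaces -l istio-injection=enabled -o name. The following sketch turns that list into relabel commands. It assumes your managed control plane uses the asm-managed revision (pass asm-managed-rapid or asm-managed-stable otherwise), and relabel_cmds is a hypothetical helper name. It only prints the commands so that you can review them before running them.

```shell
#!/usr/bin/env bash
# relabel_cmds [REVISION]
# Hypothetical helper: reads the output of
#   kubectl get namespaces -l istio-injection=enabled -o name
# (one "namespace/NAME" entry per line) on stdin and prints, without running,
# a kubectl command that removes istio-injection and sets istio.io/rev.
# REVISION defaults to asm-managed (an assumption; use your own revision).
relabel_cmds() {
  local rev="${1:-asm-managed}" entry ns
  while IFS= read -r entry; do
    ns="${entry#namespace/}"   # strip the resource-type prefix printed by -o name
    [ -n "$ns" ] || continue   # skip blank lines
    printf 'kubectl label namespace %s istio-injection- istio.io/rev=%s --overwrite\n' "$ns" "$rev"
  done
}

# Example:
#   kubectl get namespaces -l istio-injection=enabled -o name | relabel_cmds asm-managed
```

Existing pods keep their current sidecar until they are recreated, so restart the workloads in each relabeled namespace afterward.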

In addition, verify the mutating webhook configurations by using the following commands:

kubectl get mutatingwebhookconfiguration

# The output should include the managed webhook:
# ...
# istiod-asm-managed
# …
# It may also include istio-sidecar-injector. If it does, inspect that webhook:
kubectl get mutatingwebhookconfiguration istio-sidecar-injector -o yaml

# Run debug commands.
# Request a token for the default service account with the istio-ca audience.
export T=$(echo '{"kind":"TokenRequest","apiVersion":"authentication.k8s.io/v1","spec":{"audiences":["istio-ca"], "expirationSeconds":2592000}}' | kubectl create --raw /api/v1/namespaces/default/serviceaccounts/default/token -f - | jq -j '.status.token')

# Read the injection URL from the managed webhook, then strip the /inject path to get the istiod address.
export INJECT_URL=$(kubectl get mutatingwebhookconfiguration istiod-asm-managed -o json | jq -r '.webhooks[0].clientConfig.url')
export ISTIOD_ADDR=$(echo "$INJECT_URL" | sed 's|/inject.*||')

# Query the istiod debug endpoint with the token.
curl -v -H "Authorization: Bearer $T" "$ISTIOD_ADDR/debug/configz"

The install tool generates HTTP 400 errors

The installation tool might generate HTTP 400 errors like the following:

HealthCheckContainerError, message: Cloud Run error: Container failed to start.
Failed to start and then listen on the port defined by the PORT environment
variable. Logs for this revision might contain more information.

This error can occur if Workload Identity is not enabled on your GKE cluster. To enable it, run the following command:

export CLUSTER_NAME=...
export PROJECT_ID=...
export LOCATION=...
# For a regional cluster, use --region instead of --zone.
gcloud container clusters update "$CLUSTER_NAME" --zone "$LOCATION" \
    --workload-pool="$PROJECT_ID.svc.id.goog"
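To confirm that the update took effect, you can compare the cluster's workload pool with the expected PROJECT_ID.svc.id.goog value. The sketch below assumes you read the pool with gcloud container clusters describe and the --format='value(workloadIdentityConfig.workloadPool)' projection; check_workload_pool is a hypothetical helper that only compares strings, so it runs without cluster access.

```shell
#!/usr/bin/env bash
# check_workload_pool ACTUAL_POOL PROJECT_ID
# Hypothetical helper: returns 0 when ACTUAL_POOL equals PROJECT_ID.svc.id.goog,
# and returns 1 with a message when the pool is empty or different.
check_workload_pool() {
  local actual="$1" project="$2"
  local expected="${project}.svc.id.goog"
  if [ -z "$actual" ]; then
    echo "Workload Identity is not enabled"
    return 1
  fi
  if [ "$actual" != "$expected" ]; then
    echo "Unexpected workload pool: $actual (expected $expected)"
    return 1
  fi
  echo "Workload Identity is enabled with pool $actual"
}

# Example (assumes a zonal cluster; use --region for a regional one):
#   check_workload_pool "$(gcloud container clusters describe "$CLUSTER_NAME" \
#       --zone "$LOCATION" --format='value(workloadIdentityConfig.workloadPool)')" "$PROJECT_ID"
```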

Managed data plane state

The following command displays the state of the managed data plane:

gcloud container fleet mesh describe --project PROJECT_ID

The following table lists all possible managed data plane states:

State | Code | Description
ACTIVE | OK | The managed data plane is running normally.
DISABLED | DISABLED | The managed data plane is in this state if no namespace or revision is configured to use it. Follow the instructions to enable managed Anthos Service Mesh via the fleet API, or enable the managed data plane after provisioning managed Anthos Service Mesh with asmcli. Note that managed data plane status reporting is available only if you enabled the managed data plane by annotating a namespace or revision. Annotating individual pods causes those pods to be managed, but the feature state is DISABLED if no namespaces or revisions are annotated.
FAILED_PRECONDITION | MANAGED_CONTROL_PLANE_REQUIRED | The managed data plane requires an active managed Anthos Service Mesh control plane.
PROVISIONING | PROVISIONING | The managed data plane is being provisioned. If this state persists for more than 10 minutes, an error has likely occurred; contact Support.
STALLED | INTERNAL_ERROR | The managed data plane is blocked from operating due to an internal error condition. If the issue persists, contact Support.
NEEDS_ATTENTION | UPGRADE_FAILURES | The managed data plane requires manual intervention to return the service to the normal state. For more information about how to resolve this issue, see NEEDS_ATTENTION state.
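If you check this state from a script, a lookup like the following can map a reported state and code to the follow-up action from the table. next_step is a hypothetical helper; it assumes you have already extracted the state and code from the gcloud container fleet mesh describe output, and its messages only paraphrase the table.

```shell
#!/usr/bin/env bash
# next_step STATE CODE
# Hypothetical helper: prints the follow-up action for a managed data plane
# STATE/CODE pair, paraphrasing the state table. Returns 1 for unknown pairs.
next_step() {
  case "$1/$2" in
    ACTIVE/OK)
      echo "No action needed: the managed data plane is running normally." ;;
    DISABLED/DISABLED)
      echo "Annotate a namespace or revision to use the managed data plane." ;;
    FAILED_PRECONDITION/MANAGED_CONTROL_PLANE_REQUIRED)
      echo "Provision an active managed control plane." ;;
    PROVISIONING/PROVISIONING)
      echo "Wait; contact Support if this persists for more than 10 minutes." ;;
    STALLED/INTERNAL_ERROR)
      echo "Contact Support if the issue persists." ;;
    NEEDS_ATTENTION/UPGRADE_FAILURES)
      echo "Restart the pods labeled dataplane-upgrade=failed." ;;
    *)
      echo "Unknown state/code: $1/$2" >&2
      return 1 ;;
  esac
}
```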

NEEDS_ATTENTION state

If the gcloud container fleet mesh describe command shows that the managed data plane is in the NEEDS_ATTENTION state with the code UPGRADE_FAILURES, the managed data plane has failed to upgrade certain workloads. The managed data plane service labels these workloads with dataplane-upgrade: failed for further analysis. You must restart their proxies manually to upgrade them. To list the pods that require attention, run the following command:

kubectl get pods --all-namespaces -l dataplane-upgrade=failed
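One way to restart the listed proxies is to delete each pod so that its controller recreates it with an upgraded proxy. This sketch assumes every listed pod is owned by a controller such as a Deployment or StatefulSet (a bare pod is not recreated after deletion). restart_cmds is a hypothetical helper that prints the delete commands instead of running them, so you can review the list first.

```shell
#!/usr/bin/env bash
# restart_cmds
# Hypothetical helper: reads "NAMESPACE NAME" pairs on stdin and prints,
# without running, a kubectl delete command for each pod.
restart_cmds() {
  local ns pod
  while read -r ns pod; do
    [ -n "$ns" ] && [ -n "$pod" ] || continue   # skip blank or malformed lines
    printf 'kubectl delete pod %s --namespace %s\n' "$pod" "$ns"
  done
}

# Example:
#   kubectl get pods --all-namespaces -l dataplane-upgrade=failed \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' \
#     | restart_cmds
```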

Cluster membership error (No identity provider specified)

The installation tool might fail with cluster membership errors like the following:

asmcli: [ERROR]: Cluster has memberships.hub.gke.io CRD but no identity
provider specified. Please ensure that an identity provider is available for the
registered cluster.

This error can occur if GKE Workload Identity was not enabled before you registered the cluster. You can re-register the cluster from the command line by using the gcloud container fleet memberships register --enable-workload-identity command.

ControlPlaneRevision Stalled Codes

The Stalled condition in a ControlPlaneRevision's status can become true for several reasons, listed in the following table.

Reason | Message | Description
PreconditionFailed | Only GKE memberships are supported but ${CLUSTER_NAME} is not a GKE cluster. | The current cluster does not appear to be a GKE cluster. The managed control plane works only on GKE clusters.
PreconditionFailed | Unsupported ControlPlaneRevision name: ${NAME} | The name of the ControlPlaneRevision must be one of asm-managed, asm-managed-rapid, or asm-managed-stable.
PreconditionFailed | Unsupported ControlPlaneRevision namespace: ${NAMESPACE} | The namespace of the ControlPlaneRevision must be istio-system.
PreconditionFailed | Unsupported channel ${CHANNEL} for ControlPlaneRevision with name ${NAME}. Expected ${OTHER_CHANNEL} | The name of the ControlPlaneRevision must match its channel: asm-managed uses regular, asm-managed-rapid uses rapid, and asm-managed-stable uses stable.
PreconditionFailed | Channel must not be omitted or blank | Channel is a required field on the ControlPlaneRevision, and it is missing or blank on the custom resource.
PreconditionFailed | Unsupported control plane revision type: ${TYPE} | managed_service is the only allowed value for the ControlPlaneRevisionType field.
PreconditionFailed | Unsupported Kubernetes version: ${VERSION} | Kubernetes versions 1.15 and later are supported.
PreconditionFailed | Workload identity is not enabled | Enable Workload Identity on your cluster.
PreconditionFailed | Unsupported workload pool: ${POOL} | The workload pool must be of the form ${PROJECT_ID}.svc.id.goog.
PreconditionFailed | Cluster project and environ project do not match | Clusters must be part of the same project in which they are registered to the fleet.
ProvisioningFailed | An error occurred updating cluster resources | Google was unable to update your in-cluster resources, such as CRDs and webhooks.
ProvisioningFailed | MutatingWebhookConfiguration "istiod-asm-managed" contains a webhook with URL of ${EXISTING_URL} but expected ${EXPECTED_URL} | Google does not overwrite existing webhooks, to avoid breaking your installation. If you want this behavior, update the webhook manually.
ProvisioningFailed | ValidatingWebhookConfiguration ${NAME} contains a webhook with URL of ${EXISTING_URL} but expected ${EXPECTED_URL} | Google does not overwrite existing webhooks, to avoid breaking your installation. If you want this behavior, update the webhook manually.
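Several PreconditionFailed rows describe rules that you can check before you apply a ControlPlaneRevision. The following sketch encodes the name, namespace, channel, and type rules from the table; check_cpr and expected_channel are hypothetical helpers, and their messages mirror the table's message templates. It assumes the caller supplies the field values, for example read from the custom resource's metadata.name, metadata.namespace, spec.channel, and spec.type.

```shell
#!/usr/bin/env bash
# expected_channel NAME
# Prints the channel that a ControlPlaneRevision NAME must use; returns 1 for unsupported names.
expected_channel() {
  case "$1" in
    asm-managed)        echo regular ;;
    asm-managed-rapid)  echo rapid ;;
    asm-managed-stable) echo stable ;;
    *)                  return 1 ;;
  esac
}

# check_cpr NAME NAMESPACE CHANNEL TYPE
# Hypothetical helper: prints the first matching PreconditionFailed message and
# returns 1, or prints OK and returns 0 when all of these rules pass.
check_cpr() {
  local name="$1" namespace="$2" channel="$3" rtype="$4" want
  if ! want="$(expected_channel "$name")"; then
    echo "Unsupported ControlPlaneRevision name: $name"
    return 1
  fi
  if [ "$namespace" != "istio-system" ]; then
    echo "Unsupported ControlPlaneRevision namespace: $namespace"
    return 1
  fi
  if [ -z "$channel" ]; then
    echo "Channel must not be omitted or blank"
    return 1
  fi
  if [ "$channel" != "$want" ]; then
    echo "Unsupported channel $channel for ControlPlaneRevision with name $name. Expected $want"
    return 1
  fi
  if [ "$rtype" != "managed_service" ]; then
    echo "Unsupported control plane revision type: $rtype"
    return 1
  fi
  echo "OK"
}
```

For example, check_cpr asm-managed-rapid istio-system rapid managed_service prints OK, while passing regular as the channel reports that rapid was expected.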