Resolving VM support issues in Anthos Service Mesh

The following steps and logs help you troubleshoot problems with Anthos Service Mesh VM support.

Debug the VM

If you see that VM instances are running but not reachable from the mesh, perform the following steps on the VM instance.

Verify the agent

  1. Check the Envoy proxy health:

    curl localhost:15000/ready -v
    
  2. Check the Envoy error log:

    less /var/log/envoy/envoy.err.log
    
  3. Check for service-proxy-agent errors:

    journalctl -u service-proxy-agent
    
  4. Check the syslog, either in the Google Cloud Observability logs for the instance or on the VM itself: /var/log/syslog on Debian and /var/log/messages on CentOS.
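
Because the syslog path differs by distribution, a small helper can pick whichever file exists before you search it. The following is a minimal sketch, not part of Anthos Service Mesh: the function name syslog_path and its optional root-directory argument (handy when inspecting a copied filesystem) are assumptions made for illustration.

```shell
# Print the first syslog file that exists: /var/log/syslog (Debian)
# or /var/log/messages (CentOS). Returns non-zero if neither exists.
# $1 (optional, hypothetical): a directory to treat as the filesystem root.
syslog_path() {
  local root="${1:-}"
  local candidate
  for candidate in /var/log/syslog /var/log/messages; do
    if [ -f "${root}${candidate}" ]; then
      printf '%s\n' "${root}${candidate}"
      return 0
    fi
  done
  return 1
}

# Example use on the VM (run manually):
#   grep -i envoy "$(syslog_path)"
```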

Verify proxy health

  1. To debug the configuration of the proxy, run the following command on the VM:

    curl localhost:15000/config_dump > config.out
    
  2. Copy config.out to a machine where istioctl is installed, and then run the following command:

    istioctl proxy-config [cluster|route|listener] --file config.out
    

Invalid token errors

You might see an error similar to the following in the Envoy error log:

E0217 17:59:17.206995798    2411 oauth2_credentials.cc:152]  Call to http server ended with error 500 [{
  "error": "invalid_target",
  "error_description": "federated token response does not have access token. {\"error\":\"invalid_grant\",\"error_description\":\"JWT expired.\"}",
  "error_uri": ""
}].

In that case, the token in /var/run/secrets/tokens/istio-token on the VM might have expired. Decode the token and confirm that the time in its exp field (seconds since the Unix epoch) has not passed:

cat /var/run/secrets/tokens/istio-token | cut -d '.' -f2 | base64 -d | python -m json.tool
{
    ...
    "azp": "...",
    "email": "example-service-account@developer.gserviceaccount.com",
    "email_verified": true,
    "exp": 1613995395,
    "google": {
        "compute_engine": {
            "instance_creation_timestamp": 1613775765,
            "instance_id": "5678",
            "instance_name": "vm-instance-03-0mqh",
            "project_id": "...",
            "project_number": 1234,
            "zone": "us-central1-c"
        }
    },
    "iat": ...,
    "iss": "https://accounts.google.com",
    "sub": "..."
}
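
The comparison of exp against the current time can also be scripted. The sketch below assumes the token is a standard three-part JWT whose payload is base64url-encoded without padding; the function names jwt_payload, jwt_exp, and token_status are hypothetical helpers, and the exp claim is extracted with sed rather than a JSON parser, so treat the result as a quick check only.

```shell
# Decode the payload (second dot-separated segment) of a JWT.
jwt_payload() {
  local seg
  # Map the base64url alphabet back to standard base64.
  seg=$(printf '%s' "$1" | cut -d '.' -f2 | tr '_-' '/+')
  # base64url drops the trailing padding; restore it for base64 -d.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Print the numeric exp claim, or nothing if it is absent.
jwt_exp() {
  jwt_payload "$1" | tr -d ' \n' | sed -n 's/.*"exp":\([0-9][0-9]*\).*/\1/p'
}

# Print "expired", "valid", or "unknown".
# $2 (optional): the time to compare against, in epoch seconds.
token_status() {
  local exp now
  exp=$(jwt_exp "$1")
  now=${2:-$(date +%s)}
  if [ -z "$exp" ]; then
    echo "unknown"
  elif [ "$exp" -le "$now" ]; then
    echo "expired"
  else
    echo "valid"
  fi
}

# Example use on the VM (run manually):
#   token_status "$(cat /var/run/secrets/tokens/istio-token)"
```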

Unsupported OS distribution warning

When you verify the agent, you might see a warning similar to the following in the service-proxy-agent log:

E0217 17:59:17.206995798    2021-04-09T21:21:29.6091Z service-proxy-agent Warn
Detected image is unsupported: [Ubuntu|Fedora|Suse]. Envoy may not work correctly.

This warning means that your Linux distribution is not supported, so the proxy might behave unexpectedly.

Debug the cluster

Use the following steps to troubleshoot problems with your cluster.

Verify auto-registration is working

  1. Check the WorkloadEntry that istiod auto-generates:

    kubectl get workloadentry -n WORKLOAD_NAMESPACE
    

    You can also confirm that the WorkloadEntry exists in the Kubernetes Object Browser.

  2. If it doesn't exist, check for errors in the istiod logs, which should be available to you in Google Cloud Observability. Alternatively, you can retrieve them directly:

    kubectl -n istio-system get pods -l app=istiod
    

    The expected output is similar to:

    NAME                                       READY   STATUS    RESTARTS   AGE
    istiod-asm-190-1-7f6699cfb-5mzxw           1/1     Running   0          5d13h
    istiod-asm-190-1-7f6699cfb-pgvpf           1/1     Running   0          5d13h
    
  3. Set the pod environment variable and use it to export the logs:

    export ISTIO_POD=istiod-asm-190-1-7f6699cfb-5mzxw
    kubectl logs -n istio-system ${ISTIO_POD} | grep -i 'auto-register\|WorkloadEntry'
    

Check the connected proxies

You can use the proxy-status command to list all connected proxies, including those for VMs:

istioctl proxy-status

The output should show connected proxies similar to:

NAME                                                    CDS        LDS        EDS        RDS          ISTIOD                               VERSION
details-v1-5f449bdbb9-bhl8d.default                     SYNCED     SYNCED     SYNCED     SYNCED       istiod-asm-190-1-7f6699cfb-5mzxw     1.9.0-asm.1
httpbin-779c54bf49-647vd.default                        SYNCED     SYNCED     SYNCED     SYNCED       istiod-asm-190-1-7f6699cfb-pgvpf     1.9.0-asm.1
istio-eastwestgateway-5b6d4ddd9d-5rzx2.istio-system     SYNCED     SYNCED     SYNCED     NOT SENT     istiod-asm-190-1-7f6699cfb-pgvpf     1.9.0-asm.1
istio-ingressgateway-66b6ddd7cb-ctb6b.istio-system      SYNCED     SYNCED     SYNCED     SYNCED       istiod-asm-190-1-7f6699cfb-pgvpf     1.9.0-asm.1
istio-ingressgateway-66b6ddd7cb-vk4bb.istio-system      SYNCED     SYNCED     SYNCED     SYNCED       istiod-asm-190-1-7f6699cfb-5mzxw     1.9.0-asm.1
vm-instance-03-39b3.496270428946                        SYNCED     SYNCED     SYNCED     SYNCED       istiod-asm-190-1-7f6699cfb-pgvpf     1.9.0
vm-instance-03-nh5k.496270428946                        SYNCED     SYNCED     SYNCED     SYNCED       istiod-asm-190-1-7f6699cfb-pgvpf     1.9.0
vm-instance-03-s4nl.496270428946                        SYNCED     SYNCED     SYNCED     SYNCED       istiod-asm-190-1-7f6699cfb-5mzxw     1.9.0

For more information about the command options, see istioctl proxy-config.
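
When the list of proxies is long, it can help to filter the proxy-status output for rows that are out of sync. The sketch below assumes the column layout shown above and looks for the STALE status, which istioctl reports when a proxy has not acknowledged a configuration update; the function name stale_proxies is a hypothetical helper.

```shell
# Read `istioctl proxy-status` output on stdin and print the NAME column
# of every proxy that has at least one STALE field.
stale_proxies() {
  awk '
    NR > 1 {                      # skip the header row
      for (i = 2; i <= NF; i++) {
        if ($i == "STALE") { print $1; break }
      }
    }
  '
}

# Example use (run manually):
#   istioctl proxy-status | stale_proxies
```

An empty result means that no proxy reported STALE. It does not by itself confirm that a particular VM is connected, so still check that the VM appears in the full list.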

Check the workload identity configuration

Check the mesh state for potential errors in your cluster. This command requires the latest gcloud version; for more information, see update to the latest.

gcloud alpha container hub mesh describe --project=PROJECT_ID

A valid configuration will have a status code of OK for the member cluster:

createTime: '2021-06-15T21:56:10.221032150Z'
featureState:
  detailsByMembership:
    projects/<your project number>/locations/global/memberships/<your cluster name>:
      code: OK
      description: Revision(s) ready for use: istiod-asm-195-2.
      updateTime: 2021-06-15T21:56:10.221032402Z
  lifecycleState: ENABLED
name: projects/<your project name>/locations/global/features/servicemesh
servicemeshFeatureSpec: {}
updateTime: '2021-06-15T21:56:10.221032402Z'

If the VM is configured incorrectly, the status code will be WARNING with additional details in the description:

createTime: '2021-06-15T22:56:10.227167202Z'
featureState:
  detailsByMembership:
    projects/<your project number>/locations/global/memberships/<your cluster name>:
      code: WARNING
      description: |-
        Revision(s) ready for use: istiod-asm-195-2.
        WorkloadGroup <namespace>/<workloadgroup name> missing ServiceAccount field, please see https://cloud.google.com/service-mesh/v1.11/docs/troubleshooting/troubleshoot-vms#verify_the_workloadgroup_is_set_up_correctly.
      servicemeshFeatureState: {}
      updateTime: '2021-06-15T22:56:00.220164192Z'
  lifecycleState: ENABLED
name: projects/<your project name>/locations/global/features/servicemesh
servicemeshFeatureSpec: {}
updateTime: '2021-06-15T22:56:10.227167402Z'
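
To see the status code for each member cluster at a glance, you can pipe the describe output through a small filter. This is a rough sketch that assumes the YAML layout shown in the examples above (a membership key followed by an indented code field); membership_codes is a hypothetical helper name, and the filter matches text rather than parsing YAML.

```shell
# Read the mesh describe output on stdin and print one line per
# membership in the form "<membership> <code>".
membership_codes() {
  awk '
    /\/memberships\// { m = $1; sub(/:$/, "", m) }   # remember the membership key
    $1 == "code:"     { print m, $2 }                # print it with its status code
  '
}

# Example use (run manually):
#   gcloud alpha container hub mesh describe --project=PROJECT_ID | membership_codes
```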

Verify the identity provider is set up correctly

Check the IdentityProvider resource fields:

 kubectl describe identityprovider

Ensure that the fields meet these requirements:

  • The serviceAccount field is set to request.auth.claims["email"]
  • The issuerUri field is set to https://accounts.google.com. Google is currently the only supported issuer.
  • The name field under metadata is set to google, which is currently the only supported provider.

    A valid IdentityProvider CR example:

    apiVersion: security.cloud.google.com/v1alpha1
    kind: IdentityProvider
    metadata:
      name: google
    spec:
      authentication:
        oidc:
          issuerUri: https://accounts.google.com
      serviceAccount: request.auth.claims["email"]
    

Verify the WorkloadGroup is set up correctly

Check the WorkloadGroup:

 kubectl get workloadgroup -n WORKLOAD_NAMESPACE

Ensure that the results meet these requirements:

  • The serviceAccount field is set to the service account that the VM instance uses, for example 373206437219-compute@developer.gserviceaccount.com.
  • The security.cloud.google.com/IdentityProvider annotation is set, for example security.cloud.google.com/IdentityProvider: google.
  • The workload group references a valid IdentityProvider, which you can verify by checking the existing identity provider:

    kubectl get identityprovider
    

    The output should be a list of existing providers like this:

     NAME     AGE
     google   39m
    

    Confirm that the provider named in the WorkloadGroup's security.cloud.google.com/IdentityProvider annotation appears in that list.

    A valid WorkloadGroup CR example:

    apiVersion: networking.istio.io/v1alpha3
    kind: WorkloadGroup
    metadata:
      name: wg-a
      namespace: foo
    spec:
      metadata:
        annotations:
          security.cloud.google.com/IdentityProvider: google
        labels:
          app: wg-a
      template:
        ports:
          grpc: 3550
          http: 8080
        serviceAccount: 373206437219-compute@developer.gserviceaccount.com
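
The two fields that the requirements above depend on can be pulled out of a WorkloadGroup manifest with a short filter. This is a minimal sketch that scans the YAML text for the annotation and the serviceAccount key; workloadgroup_summary is a hypothetical helper name, and a field reported as MISSING only means that the text was not found.

```shell
# Read a WorkloadGroup manifest (YAML) on stdin and report the identity
# provider annotation and the service account, or MISSING for either.
workloadgroup_summary() {
  awk '
    $1 == "security.cloud.google.com/IdentityProvider:" { provider = $2 }
    $1 == "serviceAccount:"                              { account = $2 }
    END {
      print "identityProvider=" (provider == "" ? "MISSING" : provider)
      print "serviceAccount=" (account == "" ? "MISSING" : account)
    }
  '
}

# Example use with the names from the example above (run manually):
#   kubectl get workloadgroup wg-a -n foo -o yaml | workloadgroup_summary
```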
    

Internal Error Found

If you receive the message Internal Error Found, see Getting support.

Istio VM troubleshooting guide

For additional troubleshooting steps, see Debugging Virtual Machines.