The following steps and logs are useful to troubleshoot problems with Anthos Service Mesh VM support.
Debug the VM
If you see that VM instances are running but not reachable from the mesh, perform the following steps on the VM instance.
Verify the agent
Check the Envoy proxy health:
curl localhost:15000/ready -v
Check the Envoy error log:
less /var/log/envoy/envoy.err.log
Check for service-proxy-agent errors:
journalctl -u service-proxy-agent
Check the syslog, either in the Google Cloud's operations suite logs for the instance or on the VM under /var/log/syslog for Debian and /var/log/messages for CentOS.
Verify proxy health
To debug the configuration of the proxy, run the following command on the VM:
curl localhost:15000/config_dump > config.out
Copy that file to a machine where istioctl is installed, and run the following command:
istioctl proxy-config [cluster|route|listener] --file config.out
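If istioctl is not at hand, a quick look at config.out can still show whether the proxy has received any configuration from the control plane. The following is a minimal sketch, assuming the file follows the Envoy v3 admin config dump layout (a top-level "configs" list whose entries carry an "@type" field). The function names are illustrative and are not part of any Anthos Service Mesh tooling.

```python
import json

# Field that holds xDS-delivered resources in each section of the dump
# (assumption: Envoy v3 admin API layout).
DYNAMIC_FIELDS = {
    "ClustersConfigDump": "dynamic_active_clusters",
    "ListenersConfigDump": "dynamic_listeners",
    "RoutesConfigDump": "dynamic_route_configs",
}


def count_dynamic_resources(dump):
    """Count dynamically delivered clusters, listeners, and routes in a config dump."""
    counts = {name: 0 for name in DYNAMIC_FIELDS}
    for section in dump.get("configs", []):
        # "@type" looks like "type.googleapis.com/envoy.admin.v3.ClustersConfigDump".
        kind = section.get("@type", "").rsplit(".", 1)[-1]
        if kind in DYNAMIC_FIELDS:
            counts[kind] += len(section.get(DYNAMIC_FIELDS[kind]) or [])
    return counts


def load_config_dump(path="config.out"):
    """Read the file produced by the curl command above."""
    with open(path) as f:
        return json.load(f)
```

Calling count_dynamic_resources(load_config_dump()) returns a dictionary of counts. If every count is zero, the proxy probably has not received configuration from istiod yet, which points back to the agent and token checks.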
Invalid token errors
You might see an error similar to the following in the Envoy error log:
E0217 17:59:17.206995798 2411 oauth2_credentials.cc:152] Call to http server ended with error 500 [{
"error": "invalid_target",
"error_description": "federated token response does not have access token. {\"error\":\"invalid_grant\",\"error_description\":\"JWT expired.\"}",
"error_uri": ""
}].
In that case, check whether the token in /var/run/secrets/tokens/istio-token on the VM has expired by confirming that the exp value (epoch seconds) has not elapsed:
cat /var/run/secrets/tokens/istio-token | cut -d '.' -f2 | base64 -d | python -m json.tool
{
    ...
    "azp": "...",
    "email": "example-service-account@developer.gserviceaccount.com",
    "email_verified": true,
    "exp": 1613995395,
    "google": {
        "compute_engine": {
            "instance_creation_timestamp": 1613775765,
            "instance_id": "5678",
            "instance_name": "vm-instance-03-0mqh",
            "project_id": "...",
            "project_number": 1234,
            "zone": "us-central1-c"
        }
    },
    "iat": ...,
    "iss": "https://accounts.google.com",
    "sub": "..."
}
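JWT segments are base64url-encoded without padding, so base64 -d can reject some tokens as invalid input. As an alternative, the following Python sketch restores the padding, decodes the payload, and compares exp with the current time. It assumes the token file holds a standard three-segment JWT. The function names are illustrative, and the signature is not verified.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Return the claims from the payload (second segment) of a JWT."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    # Restore the padding that base64url encoding strips.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token, now=None):
    """Seconds left before the exp claim elapses; negative if already expired."""
    now = time.time() if now is None else now
    return decode_jwt_payload(token)["exp"] - now
```

For example, on the VM you could call seconds_until_expiry(open("/var/run/secrets/tokens/istio-token").read()). A negative result means the token has expired.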
Unsupported OS distribution warning info
While following the steps in Verify the agent, you might see a warning message similar to the following in the service-proxy-agent log:
E0217 17:59:17.206995798 2021-04-09T21:21:29.6091Z service-proxy-agent Warn
Detected image is unsupported: [Ubuntu|Fedora|Suse]. Envoy may not work correctly.
This means your Linux distribution is not supported, which might cause the proxy to behave unexpectedly.
Debug the cluster
Use the following steps to troubleshoot problems with your cluster.
Verify auto-registration is working
Check the WorkloadEntry that istiod auto-generates:
kubectl get workloadentry -n WORKLOAD_NAMESPACE
In addition, you can check the Kubernetes Object Browser for its existence.
If it doesn't exist, check for errors in the istiod logs, which should be available to you in Google Cloud's operations suite. Alternatively, you can retrieve them directly:
kubectl -n istio-system get pods -l app=istiod
The expected output is similar to:
NAME                               READY   STATUS    RESTARTS   AGE
istiod-asm-190-1-7f6699cfb-5mzxw   1/1     Running   0          5d13h
istiod-asm-190-1-7f6699cfb-pgvpf   1/1     Running   0          5d13h
Set the pod environment variable and use it to export the logs:
export ISTIO_POD=istiod-asm-190-1-7f6699cfb-5mzxw
kubectl logs -n istio-system ${ISTIO_POD} | grep -i 'auto-register\|WorkloadEntry'
Check the connected proxies
You can use the proxy-status command to list all connected proxies, including those for VMs:
istioctl proxy-status
The output should show connected proxies similar to:
NAME                                                 CDS     LDS     EDS     RDS       ISTIOD                            VERSION
details-v1-5f449bdbb9-bhl8d.default                  SYNCED  SYNCED  SYNCED  SYNCED    istiod-asm-190-1-7f6699cfb-5mzxw  1.9.0-asm.1
httpbin-779c54bf49-647vd.default                     SYNCED  SYNCED  SYNCED  SYNCED    istiod-asm-190-1-7f6699cfb-pgvpf  1.9.0-asm.1
istio-eastwestgateway-5b6d4ddd9d-5rzx2.istio-system  SYNCED  SYNCED  SYNCED  NOT SENT  istiod-asm-190-1-7f6699cfb-pgvpf  1.9.0-asm.1
istio-ingressgateway-66b6ddd7cb-ctb6b.istio-system   SYNCED  SYNCED  SYNCED  SYNCED    istiod-asm-190-1-7f6699cfb-pgvpf  1.9.0-asm.1
istio-ingressgateway-66b6ddd7cb-vk4bb.istio-system   SYNCED  SYNCED  SYNCED  SYNCED    istiod-asm-190-1-7f6699cfb-5mzxw  1.9.0-asm.1
vm-instance-03-39b3.496270428946                     SYNCED  SYNCED  SYNCED  SYNCED    istiod-asm-190-1-7f6699cfb-pgvpf  1.9.0
vm-instance-03-nh5k.496270428946                     SYNCED  SYNCED  SYNCED  SYNCED    istiod-asm-190-1-7f6699cfb-pgvpf  1.9.0
vm-instance-03-s4nl.496270428946                     SYNCED  SYNCED  SYNCED  SYNCED    istiod-asm-190-1-7f6699cfb-5mzxw  1.9.0
For more information about the command options, see istioctl proxy-config.
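With many proxies, it can help to filter the proxy-status output down to the rows that are not fully synced. The following is a minimal sketch, assuming the column layout shown above (newer istioctl versions may print different columns). The function name is illustrative. Note that NOT SENT is not necessarily an error: it usually means istiod has nothing to send for that resource type.

```python
XDS_COLUMNS = ("CDS", "LDS", "EDS", "RDS")


def unsynced_proxies(status_output):
    """Map proxy name -> {column: state} for xDS columns that are not SYNCED."""
    lines = [ln for ln in status_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    problems = {}
    for line in lines[1:]:
        # "NOT SENT" contains a space, so join it before splitting on whitespace.
        fields = line.replace("NOT SENT", "NOT_SENT").split()
        row = dict(zip(header, fields))
        bad = {
            col: row[col].replace("_", " ")
            for col in XDS_COLUMNS
            if col in row and row[col] != "SYNCED"
        }
        if bad:
            problems[fields[0]] = bad
    return problems
```

You could feed it the text captured from istioctl proxy-status, for example through subprocess.run(..., capture_output=True, text=True).stdout. A VM proxy that is missing from the output entirely has not connected to istiod at all.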
Check the workload identity configuration
Check the mesh state for potential errors in your cluster. Note that the latest gcloud version is required. For more information, see update to the latest.
gcloud alpha container hub mesh describe --project=PROJECT_ID
A valid configuration will have a status code of OK for the member cluster:
createTime: '2021-06-15T21:56:10.221032150Z'
featureState:
  detailsByMembership:
    projects/<your project number>/locations/global/memberships/<your cluster name>:
      code: OK
      description: Revision(s) ready for use: istiod-asm-195-2.
      updateTime: 2021-06-15T21:56:10.221032402Z
  lifecycleState: ENABLED
name: projects/<your project name>/locations/global/features/servicemesh
servicemeshFeatureSpec: {}
updateTime: '2021-06-15T21:56:10.221032402Z'
If the VM is configured incorrectly, the status code will be WARNING with additional details in the description:
createTime: '2021-06-15T22:56:10.227167202Z'
featureState:
  detailsByMembership:
    projects/<your project number>/locations/global/memberships/<your cluster name>:
      code: WARNING
      description: |-
        Revision(s) ready for use: istiod-asm-195-2.
        WorkloadGroup <namespace>/<workloadgroup name> missing ServiceAccount field, please see https://cloud.google.com/service-mesh/v1.9/docs/troubleshooting/troubleshoot-vms#verify_the_workloadgroup_is_set_up_correctly.
      servicemeshFeatureState: {}
      updateTime: '2021-06-15T22:56:00.220164192Z'
  lifecycleState: ENABLED
name: projects/<your project name>/locations/global/features/servicemesh
servicemeshFeatureSpec: {}
updateTime: '2021-06-15T22:56:10.227167402Z'
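When a project has several member clusters, scanning this output by eye gets tedious. The following is a minimal sketch that pulls the status code for each membership out of the describe output using plain text matching, assuming the output has the shape shown above. The function names are illustrative, and no YAML library is required.

```python
import re

# A membership key line, for example:
#   projects/1234/locations/global/memberships/my-cluster:
MEMBERSHIP_RE = re.compile(r"^\s*(projects/.+/memberships/.+):\s*$")
CODE_RE = re.compile(r"^\s*code:\s*(\S+)\s*$")


def membership_codes(describe_output):
    """Map each membership path to the status code reported beneath it."""
    codes = {}
    current = None
    for line in describe_output.splitlines():
        membership = MEMBERSHIP_RE.match(line)
        if membership:
            current = membership.group(1)
            continue
        code = CODE_RE.match(line)
        if code and current is not None:
            codes[current] = code.group(1)
    return codes


def memberships_needing_attention(describe_output):
    """Return the memberships whose status code is anything other than OK."""
    codes = membership_codes(describe_output)
    return sorted(name for name, code in codes.items() if code != "OK")
```

For any membership this returns, read the description field in the full output for the specific problem.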
Verify the identity provider is set up correctly
Check the IdentityProvider resource fields:
kubectl describe identityprovider
Ensure that the fields meet these requirements:
- The serviceAccount field is set to request.auth.claims["email"].
- The issuerURI field is set to https://accounts.google.com (currently, Google is the only supported issuer).
- The provider name field under metadata is set to google, which is the only currently supported provider.
A valid IdentityProvider CR example:
apiVersion: security.cloud.google.com/v1alpha1
kind: IdentityProvider
metadata:
  name: google
spec:
  authentication:
    oidc:
      issuerUri: https://accounts.google.com
  serviceAccount: request.auth.claims["email"]
Verify the WorkloadGroup is set up correctly
Check the WorkloadGroup:
kubectl get workloadgroup -n WORKLOAD_NAMESPACE
Ensure that the results meet these requirements:
- The serviceAccount field is set correctly, for example 373206437219-compute@developer.gserviceaccount.com, where the account is the same as the service account used by the VM instance.
- The security.cloud.google.com/IdentityProvider annotation is set, for example security.cloud.google.com/IdentityProvider: google.
- The workload group references a valid IdentityProvider, which you can verify by listing the existing identity providers:
kubectl get identityprovider
The output should be a list of existing providers like this:
NAME     AGE
google   39m
Check whether the provider named in the security.cloud.google.com/IdentityProvider field of the WorkloadGroup exists in the list of existing providers.
A valid WorkloadGroup CR example:
apiVersion: networking.istio.io/v1alpha3
kind: WorkloadGroup
metadata:
  name: wg-a
  namespace: foo
spec:
  metadata:
    annotations:
      security.cloud.google.com/IdentityProvider: google
    labels:
      app: wg-a
  template:
    ports:
      grpc: 3550
      http: 8080
    serviceAccount: 373206437219-compute@developer.gserviceaccount.com
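The requirements above can also be checked mechanically. The following is a minimal sketch, assuming the WorkloadGroup has been fetched as JSON (for example with kubectl get workloadgroup followed by the resource name, -n WORKLOAD_NAMESPACE, and -o json) and parsed into a dictionary. The function name and its parameters are illustrative, and the field paths follow the example CR above.

```python
IDP_ANNOTATION = "security.cloud.google.com/IdentityProvider"


def check_workload_group(wg, vm_service_account, known_providers):
    """Return a list of problems found in a WorkloadGroup dict (empty if none)."""
    problems = []
    spec = wg.get("spec") or {}

    # Requirement 1: serviceAccount matches the VM instance's service account.
    template_sa = (spec.get("template") or {}).get("serviceAccount")
    if not template_sa:
        problems.append("serviceAccount is not set")
    elif template_sa != vm_service_account:
        problems.append(
            f"serviceAccount {template_sa!r} differs from the VM's {vm_service_account!r}"
        )

    # Requirements 2 and 3: the annotation is set and names an existing provider.
    annotations = (spec.get("metadata") or {}).get("annotations") or {}
    provider = annotations.get(IDP_ANNOTATION)
    if not provider:
        problems.append(f"annotation {IDP_ANNOTATION} is not set")
    elif provider not in known_providers:
        problems.append(f"IdentityProvider {provider!r} does not exist")

    return problems
```

Here known_providers would be the names from the identity provider list, such as ["google"], and vm_service_account is the service account attached to the VM instance.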
Internal Error Found
If you receive the message Internal Error Found, see Getting support.
Istio VM troubleshooting guide
For additional troubleshooting steps, see Debugging Virtual Machines.