Troubleshoot user access issues

This document provides troubleshooting guidance for user access issues in GKE Identity Service.

gcloud anthos create-login-config fails to get clientconfig

This issue occurs in one of the following cases:

  • The kubeconfig file passed to gcloud anthos create-login-config is incorrect.
  • The ClientConfig custom resource is not present in the cluster (GKE Identity Service is not installed on the cluster).

Error message

  failed to get clientconfig default in namespace kube-public
  

Solution

To resolve this issue, do the following:

  1. Make sure you have the correct kubeconfig file for your cluster.
  2. To verify that the ClientConfig custom resource exists in the cluster, run the following command:

    kubectl --kubeconfig KUBECONFIG get clientconfig default -n kube-public
    

    If the ClientConfig is not present in the cluster, then install and configure GKE Identity Service on the cluster. For more information on cluster setup options, see Setup options for clusters.
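The two checks above can be combined into a small helper. This is a minimal sketch, assuming kubectl is on your PATH; the function name check_clientconfig and the exit code 2 are illustrative choices and are not part of any GKE tooling.

```shell
# Hypothetical helper: confirm the kubeconfig file is readable, then look up
# the ClientConfig resource that gcloud anthos create-login-config needs.
check_clientconfig() {
  cc_kubeconfig="$1"
  if [ ! -r "$cc_kubeconfig" ]; then
    # Exit code 2 is an arbitrary choice for "bad kubeconfig path".
    echo "kubeconfig file not found or not readable: $cc_kubeconfig" >&2
    return 2
  fi
  # Same command as in the step above; a non-zero status here suggests that
  # the ClientConfig is missing or the cluster is unreachable.
  kubectl --kubeconfig "$cc_kubeconfig" get clientconfig default -n kube-public
}

# Usage (not executed here):
#   check_clientconfig KUBECONFIG
```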

gcloud anthos create-login-config fails because of duplicate cluster name

This issue occurs if you attempt to create login configuration for a cluster into a file that already contains a login configuration for this cluster.

Error message

  error merging with file FILENAME because FILENAME contains a
    cluster with the same name as the one read from KUBECONFIG.
  

Solution

To resolve this issue, use the --output flag to specify a new destination file.

If you do not provide --output, the login configuration data is written to a file named kubectl-anthos-config.yaml in the current directory.
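One way to avoid the name collision is to choose a destination file that does not exist yet. The following is a minimal sketch; the helper name pick_login_config_file and the numeric suffix scheme are assumptions made for illustration.

```shell
# Hypothetical helper: print a file name in the current directory that is not
# already taken, starting from the default kubectl-anthos-config.yaml.
pick_login_config_file() {
  lc_base="${1:-kubectl-anthos-config}"
  lc_candidate="${lc_base}.yaml"
  lc_n=1
  while [ -e "$lc_candidate" ]; do
    lc_candidate="${lc_base}-${lc_n}.yaml"
    lc_n=$((lc_n + 1))
  done
  printf '%s\n' "$lc_candidate"
}

# Usage (not executed here):
#   gcloud anthos create-login-config --kubeconfig KUBECONFIG \
#     --output "$(pick_login_config_file)"
```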

gcloud anthos auth login fails with proxyconnect tcp

This issue occurs when the https_proxy or HTTPS_PROXY environment variable is misconfigured. If the value includes an https:// prefix, the Go HTTP client libraries might fail when the proxy handles HTTPS connections using another protocol, such as SOCKS5.

Error message

  proxyconnect tcp: tls: first record does not look like a TLS handshake
  

Solution

To resolve this issue, modify the https_proxy and HTTPS_PROXY environment variables to omit the https:// prefix. On Windows, modify the system environment variables. For example, change the value of the https_proxy environment variable from https://webproxy.example.com:8000 to webproxy.example.com:8000.
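On Linux or macOS, the prefix can be removed in the current shell session. This is a minimal sketch; the helper name strip_proxy_scheme is hypothetical, and it only removes a leading https:// and leaves any other value untouched.

```shell
# Hypothetical helper: drop a leading "https://" from a proxy value.
strip_proxy_scheme() {
  sp_value="$1"
  sp_value="${sp_value#https://}"
  printf '%s\n' "$sp_value"
}

# Usage (not executed here); apply to whichever variables are set:
#   export https_proxy="$(strip_proxy_scheme "$https_proxy")"
#   export HTTPS_PROXY="$(strip_proxy_scheme "$HTTPS_PROXY")"
```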

Cluster access fails when using kubeconfig generated by gcloud anthos auth login

This issue occurs when the Kubernetes API server is unable to authorize the user for one of the following reasons:

  • There is an error in the configuration used to log in with the gcloud anthos auth login command.
  • The necessary RBAC policies are incorrect or missing for the user.

Error message

  Unauthorized
  

Solution

To resolve this issue, do the following:

  1. Verify the configuration used to log in.

    OIDC configuration

    The authentication.oidc section in the user cluster configuration file has group and username fields that are used to set the --oidc-group-claim and --oidc-username-claim flags in the Kubernetes API server. When the API server is presented with a user's identity token, it forwards the token to GKE Identity Service, which returns the extracted group-claim and username-claim back to the API server. The API server uses the response to verify that the corresponding group or user has the correct permissions.

    Verify that the claims set for group and user in the authentication.oidc section of the cluster configuration file are present in the ID token.

  2. Verify the applied RBAC policies.

    To learn how to set up the correct RBAC policies for GKE Identity Service, see Set up role-based access control (RBAC).

RBACs for groups not working for OIDC providers

  1. Verify that the ID token contains the group information.

    After you run the gcloud anthos auth login command to initiate the OIDC authentication flow, the ID token is stored in the kubeconfig file in the id-token field. Use jwt.io to decode the ID token and verify that it contains the user's group information as expected.

  2. If the ID token does not contain the user's group information, configure the OIDC provider to return group information, following your provider's documentation. For example, if you use Okta as your identity provider, follow the Okta documentation to configure groups in the ID token.

  3. If the ID token contains group information, verify that the key that holds the group information in the ID token matches the groupsClaim field configured under the oidc section.

    For example, if the ID token contains group information in the groups key:

    "groups" : ["group1", "group2" ...]
    

    then the value of the groupsClaim field should be groups in the oidc section.

    After modifying the configuration in the oidc section, repeat the instructions in Set up user access and Accessing clusters.
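As an alternative to pasting the token into a website, the payload of the ID token can be decoded locally to inspect its claims. This is a minimal sketch, assuming a base64 command that accepts -d (as GNU coreutils does); the function name decode_jwt_payload is hypothetical. It only decodes the payload and does not verify the token signature.

```shell
# Hypothetical helper: print the JSON payload (second segment) of a JWT.
decode_jwt_payload() {
  # Extract the payload and convert base64url characters to standard base64.
  jp_payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # JWT segments omit padding, so restore it before decoding.
  case $(( ${#jp_payload} % 4 )) in
    2) jp_payload="${jp_payload}==" ;;
    3) jp_payload="${jp_payload}=" ;;
  esac
  printf '%s' "$jp_payload" | base64 -d
}

# Usage (not executed here): copy the value of the id-token field from the
# kubeconfig file, then look for the claim named in groupsClaim:
#   decode_jwt_payload "ID_TOKEN"
```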

Troubleshoot identity providers

If you have problems using OIDC or LDAP with your GKE cluster, follow the steps in this section to troubleshoot GKE Identity Service and help determine if there's an issue with your identity provider configuration.

Enable the GKE Identity Service debug log

To help troubleshoot identity-related issues in your cluster, enable the GKE Identity Service debug log.

  1. Patch your existing cluster with kubectl patch:

    kubectl patch deployment ais \
      -n anthos-identity-service --type=json \
      -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value":"--vmodule=cloud/identity/hybrid/charon/*=LOG_LEVEL"}]' \
      --kubeconfig KUBECONFIG
    

    Replace the following:

    • LOG_LEVEL: The log verbosity level. For the most verbose logs when troubleshooting, set this value to 3.

    • KUBECONFIG: The path to your user cluster kubeconfig file.
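Because the JSON patch is quoted inline, it is easy to mistype. The sketch below builds the same patch string from a log level; the function name build_debug_patch and the numeric-only check are assumptions made for illustration.

```shell
# Hypothetical helper: print the JSON patch from the step above with the
# given log level substituted for LOG_LEVEL.
build_debug_patch() {
  dp_level="$1"
  # Reject empty or non-numeric levels (an assumption of this sketch).
  case "$dp_level" in
    ''|*[!0-9]*)
      echo "log level must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value":"--vmodule=cloud/identity/hybrid/charon/*=%s"}]\n' "$dp_level"
}

# Usage (not executed here):
#   kubectl patch deployment ais \
#     -n anthos-identity-service --type=json \
#     -p="$(build_debug_patch 3)" \
#     --kubeconfig KUBECONFIG
```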

Check the GKE Identity Service container log

Review the content of the GKE Identity Service container logs for any errors or warnings.

  1. To review the logs, use kubectl logs:

    kubectl logs -f -l k8s-app=ais \
      -n anthos-identity-service \
      --kubeconfig KUBECONFIG
    

    Replace KUBECONFIG with the path to your user cluster kubeconfig file.

Restart the GKE Identity Service pod

If the container logs show problems, restart the GKE Identity Service pod.

  1. To restart the GKE Identity Service pod, delete the existing pod. A new pod is automatically created as a replacement.

    kubectl delete pod -l k8s-app=ais \
      -n anthos-identity-service \
      --kubeconfig KUBECONFIG
    

    Replace KUBECONFIG with the path to your user cluster kubeconfig file.

Troubleshoot connectivity to identity provider

If the GKE Identity Service pod appears to be running correctly, test the connectivity to the remote identity provider.

  1. Start a busybox pod in the same namespace as the GKE Identity Service pod:

    kubectl run curl --image=radial/busyboxplus:curl \
      -n anthos-identity-service \
      --kubeconfig KUBECONFIG \
      -- sleep 3000
    

    Replace KUBECONFIG with the path to your user cluster kubeconfig file.

  2. To check whether you can fetch the discovery URL, run the curl command inside the busybox pod:

    kubectl exec pod/curl -n anthos-identity-service \
      --kubeconfig KUBECONFIG \
      -- curl ISSUER_URL
    

    Replace the following:

    • ISSUER_URL: The issuer URL of your identity provider.
    • KUBECONFIG: The path to your user cluster kubeconfig file.

    A successful response is a JSON result with the detailed identity provider endpoints.

  3. If the previous command doesn't return the expected result, contact your identity provider administrator for additional assistance.
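For OIDC providers, the OpenID Connect Discovery specification places the discovery document at the path /.well-known/openid-configuration under the issuer URL. If fetching the bare issuer URL does not return JSON, trying that path may help. This is a minimal sketch; the helper name oidc_discovery_url is hypothetical, and it does not apply to LDAP providers.

```shell
# Hypothetical helper: derive the OIDC discovery document URL from an issuer
# URL, tolerating a single trailing slash on the issuer.
oidc_discovery_url() {
  od_issuer="${1%/}"
  printf '%s/.well-known/openid-configuration\n' "$od_issuer"
}

# Usage (not executed here):
#   kubectl exec pod/curl -n anthos-identity-service \
#     --kubeconfig KUBECONFIG \
#     -- curl "$(oidc_discovery_url ISSUER_URL)"
```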

LDAP login not working for Anthos on VMware admin cluster

LDAP is currently supported only for Anthos on VMware user clusters, so LDAP login does not work on admin clusters.