When working with Google Kubernetes Engine (GKE), issues with the kubectl
command-line tool can prevent you from deploying applications or managing
cluster resources. These problems generally fall into two categories:
authentication failures, where the cluster doesn't recognize your identity, and
connectivity failures, where your tool can't reach the cluster's control plane.
Use this page to help you diagnose and resolve these issues. Find steps for
troubleshooting various authentication problems and debugging connectivity
issues between the kubectl tool and your cluster's control plane. Learn to
check that the necessary plugins are installed and configured, and review network
policy and firewall considerations for services like SSH and Konnectivity.
This information is important for anyone who uses kubectl commands to manage
applications or cluster resources on GKE. It's particularly
important for Application developers and Platform admins and operators who
rely on kubectl commands for their core daily tasks. For more information
about the common roles and example tasks that we reference in Google Cloud
content, see Common GKE user roles and
tasks.
For related information, see the following resources:
- For more information about problems not specific to GKE, see Troubleshooting kubectl in the Kubernetes documentation.
- For more information about how to use kubectl commands to diagnose problems with your clusters and workloads, see Investigate a cluster's state with kubectl.
Authentication and authorization errors
If you're experiencing errors related to authentication and authorization when
using the kubectl command-line tool, read the following sections for advice.
Error: 401 (Unauthorized)
When connecting to GKE clusters, you can get an authentication
and authorization error with HTTP status code 401 (Unauthorized). This issue
might occur when you try to run a kubectl command in your GKE
cluster from a local environment. To learn more, see
Issue: Authentication and authorization errors.
Error: Insufficient authentication scopes
When you run gcloud container clusters get-credentials, you might receive the
following error:
ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Request had insufficient authentication scopes.
This error occurs because you are attempting to access the
GKE API from a Compute Engine VM that doesn't have the
cloud-platform scope.
To resolve this error, grant the missing cloud-platform scope. For
instructions on changing the scopes on your Compute Engine VM instance, see
Creating and enabling service accounts for instances
in the Compute Engine documentation.
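As a minimal sketch, the following commands show one way to check a VM's current scopes and then grant the cloud-platform scope; VM_NAME and ZONE are placeholders, and scopes can only be changed while the instance is stopped:

# Check which access scopes the VM's service account currently has.
gcloud compute instances describe VM_NAME \
    --zone=ZONE \
    --format="value(serviceAccounts[].scopes)"

# Stop the VM, grant the cloud-platform scope, and restart it.
gcloud compute instances stop VM_NAME --zone=ZONE
gcloud compute instances set-service-account VM_NAME \
    --zone=ZONE \
    --scopes=cloud-platform
gcloud compute instances start VM_NAME --zone=ZONE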
Error: Executable gke-gcloud-auth-plugin not found
Error messages similar to the following can occur when you run kubectl
commands or use custom clients that interact with GKE:
Unable to connect to the server: getting credentials: exec: executable gke-gcloud-auth-plugin not found
It looks like you are trying to use a client-go credential plugin that is not installed.
To learn more about this feature, consult the documentation available at:
https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins
Visit cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#install_plugin to install gke-gcloud-auth-plugin.
Unable to connect to the server: getting credentials: exec: fork/exec /usr/lib/google-cloud-sdk/bin/gke-gcloud-auth-plugin: no such file or directory
To resolve the issue, install the gke-gcloud-auth-plugin as described in
Install required plugins.
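Depending on how you installed the gcloud CLI, the plugin can typically be installed with the component manager; a minimal sketch (package-manager installations use the distribution's package instead):

# Install the plugin through the gcloud component manager.
gcloud components install gke-gcloud-auth-plugin

# Verify the installation.
gke-gcloud-auth-plugin --version

# Refresh your kubeconfig entry so that kubectl uses the plugin.
gcloud container clusters get-credentials CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION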
Error: No auth provider found
The following error occurs if kubectl or custom Kubernetes clients have been
built with Kubernetes client-go version 1.26 or later:
no Auth Provider found for name "gcp"
To resolve this issue, complete the following steps:
1. Install gke-gcloud-auth-plugin as described in Install required plugins.
2. Update to the latest version of the gcloud CLI:

   gcloud components update

3. Update the kubeconfig file:

   gcloud container clusters get-credentials CLUSTER_NAME \
       --location=CONTROL_PLANE_LOCATION

   Replace the following:
   - CLUSTER_NAME: the name of your cluster.
   - CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
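To confirm the fix, you can optionally verify that the plugin resolves on your PATH and that kubectl can authenticate again; a minimal sketch (any read-only kubectl command works):

# Confirm that the plugin is installed and prints a version.
gke-gcloud-auth-plugin --version

# Run a read-only kubectl command to confirm that authentication succeeds.
kubectl get nodes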
Error: The gcp auth plugin is deprecated, use gcloud instead
You might see the following warning message after you install the
gke-gcloud-auth-plugin
and run a kubectl command against a GKE cluster:
WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.25+; use gcloud instead.
This message appears if your client version is earlier than 1.26.
To resolve this issue, tell your client to use the gke-gcloud-auth-plugin
authentication plugin instead:
1. Open your shell login script in a text editor:

   Bash

   vi ~/.bashrc

   Zsh

   vi ~/.zshrc

   If you're using PowerShell, skip this step.

2. Set the following environment variable:

   Bash

   export USE_GKE_GCLOUD_AUTH_PLUGIN=True

   Zsh

   export USE_GKE_GCLOUD_AUTH_PLUGIN=True

   PowerShell

   [Environment]::SetEnvironmentVariable('USE_GKE_GCLOUD_AUTH_PLUGIN', True, 'Machine')

3. Apply the variable in your environment:

   Bash

   source ~/.bashrc

   Zsh

   source ~/.zshrc

   PowerShell

   Exit the terminal and open a new terminal session.

4. Update the gcloud CLI:

   gcloud components update

5. Authenticate to your cluster:

   gcloud container clusters get-credentials CLUSTER_NAME \
       --location=CONTROL_PLANE_LOCATION

   Replace the following:
   - CLUSTER_NAME: the name of your cluster.
   - CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
Issue: The kubectl command isn't found
If you receive a message that the kubectl command isn't found,
reinstall the kubectl binary and set your $PATH environment variable:
1. Install the kubectl binary:

   gcloud components update kubectl

2. When the installer prompts you to modify your $PATH environment variable, enter y to proceed. Modifying this variable lets you use kubectl commands without typing their full path.

   Alternatively, add the following line to wherever your shell stores environment variables, such as ~/.bashrc (or ~/.bash_profile in macOS):

   export PATH=$PATH:/usr/local/share/google/google-cloud-sdk/bin/

3. Run the following command to load your updated file. The following example uses .bashrc:

   source ~/.bashrc

   If you're using macOS, use ~/.bash_profile instead of .bashrc.
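To confirm that the installation and PATH change worked, you can run checks similar to the following (a minimal sketch):

# Confirm that the shell can resolve the kubectl binary.
which kubectl

# Print the client version without contacting a cluster.
kubectl version --client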
Issue: kubectl commands return "connection refused" error
If kubectl commands return a "connection refused" error, then
you need to set the cluster context with the following command:
gcloud container clusters get-credentials CLUSTER_NAME \
--location=CONTROL_PLANE_LOCATION
Replace the following:
- CLUSTER_NAME: the name of your cluster.
- CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
If you're unsure of what to enter for the cluster name or location, use the following command to list your clusters:
gcloud container clusters list
Error: kubectl command timed out
If you create a cluster and then run a kubectl command against it, but the
command times out, you see an error similar to the following:
Unable to connect to the server: dial tcp IP_ADDRESS: connect: connection timed out

Unable to connect to the server: dial tcp IP_ADDRESS: i/o timeout
These errors indicate that kubectl is unable to communicate with the
cluster control plane.
To resolve this issue, verify that the correct cluster context is set and ensure connectivity to the cluster:
1. Go to $HOME/.kube/config or run the command kubectl config view to verify that the config file contains the cluster context and the external IP address of the control plane.

2. Set the cluster credentials:

   gcloud container clusters get-credentials CLUSTER_NAME \
       --location=CONTROL_PLANE_LOCATION \
       --project=PROJECT_ID

   Replace the following:
   - CLUSTER_NAME: the name of your cluster.
   - CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
   - PROJECT_ID: the ID of the project that the cluster was created in.

3. If you've enabled authorized networks in the cluster, ensure that the list of existing authorized networks includes the outgoing IP of the machine that you're connecting from. You can find your existing authorized networks in the console or by running the following command:

   gcloud container clusters describe CLUSTER_NAME \
       --location=CONTROL_PLANE_LOCATION \
       --project=PROJECT_ID \
       --format "flattened(controlPlaneEndpointsConfig.ipEndpointsConfig.authorizedNetworksConfig.cidrBlocks[])"

   If the outgoing IP of the machine isn't included in the list of authorized networks from the output of the preceding command, add it to the authorized networks (see the example commands after this list), or complete one of the following steps:
- If you're using the console, follow the directions in Can't reach control plane of a cluster with no external endpoint.
- If connecting from Cloud Shell, follow the directions in Using Cloud Shell to access a cluster with external endpoint disabled.
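If you need to add your machine's outgoing IP to the authorized networks, the following sketch shows one way to do it; note that the --master-authorized-networks flag replaces the existing list, so include every CIDR that should remain authorized:

# Find the outgoing public IP address of your machine (any "what is my IP" service works).
curl -s https://api.ipify.org

# Re-apply the authorized networks list with your IP included.
gcloud container clusters update CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --enable-master-authorized-networks \
    --master-authorized-networks=EXISTING_CIDR_1,EXISTING_CIDR_2,YOUR_OUTGOING_IP/32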
Error: kubectl commands return failed to negotiate an api version
If kubectl commands return a failed to negotiate an API version
error, then you need to ensure kubectl has authentication credentials:
gcloud auth application-default login
Issue: kubectl logs, attach, exec, or port-forward command stops responding
If the kubectl logs, attach, exec, or port-forward commands stop
responding, typically the API server is unable to communicate with the nodes.
First, check if your cluster has any nodes. If you've scaled down the number of nodes in your cluster to zero, the commands won't work. To resolve this issue, resize your cluster to have at least one node.
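As a minimal sketch, the following commands show one way to check the node count and scale a node pool back up; CLUSTER_NAME, CONTROL_PLANE_LOCATION, and NODE_POOL_NAME are placeholders for your own values:

# Check how many nodes the cluster currently has.
gcloud container clusters describe CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --format="value(currentNodeCount)"

# Scale a node pool back up to at least one node.
gcloud container clusters resize CLUSTER_NAME \
    --node-pool=NODE_POOL_NAME \
    --num-nodes=1 \
    --location=CONTROL_PLANE_LOCATION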
If your cluster has at least one node, then check whether you are using SSH or Konnectivity proxy tunnels to enable secure communication. The following sections discuss the troubleshooting steps specific to each service:
Troubleshoot SSH issues
If you're using SSH, GKE saves an SSH public key file in your Compute Engine project metadata. All Compute Engine VMs using Google-provided images regularly check their project's common metadata and their instance's metadata for SSH keys to add to the VM's list of authorized users. GKE also adds a firewall rule to your Compute Engine network for allowing SSH access from the control plane's IP address to each node in the cluster.
The following settings can cause issues with SSH communication:
- Your network's firewall rules don't allow SSH access from the control plane.

  All Compute Engine networks are created with a firewall rule called default-allow-ssh that allows SSH access from all IP addresses (requiring a valid private key). GKE also inserts an SSH rule for each public cluster of the form gke-CLUSTER_NAME-RANDOM_CHARACTERS-ssh that allows SSH access specifically from the cluster's control plane to the cluster's nodes. If neither of these rules exists, then the control plane can't open SSH tunnels.

  To verify that this is the cause of the issue, check whether your configuration has these rules.

  To resolve this issue, identify the tag that's on all of the cluster's nodes, then re-add a firewall rule that allows access to VMs with that tag from the IP address of the control plane. For example commands, see the sketch after this list.

- Your project's common metadata entry for ssh-keys is full.

  If the project's metadata entry named ssh-keys is close to its maximum size limit, then GKE isn't able to add its own SSH key for opening SSH tunnels.

  To verify that this is the issue, check the length of the list of ssh-keys. You can see your project's metadata by running the following command, optionally including the --project flag:

  gcloud compute project-info describe [--project=PROJECT_ID]

  To resolve this issue, delete some of the SSH keys that are no longer needed.

- You have set a metadata field with the key ssh-keys on the VMs in the cluster.

  The node agent on VMs prefers per-instance SSH keys to project-wide SSH keys, so if you've set any SSH keys specifically on the cluster's nodes, then the nodes don't respect the control plane's SSH key in the project metadata.

  To verify that this is the issue, run gcloud compute instances describe VM_NAME and look for an ssh-keys field in the metadata.

  To resolve this issue, delete the per-instance SSH keys from the instance metadata.
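The following commands sketch how you might verify and re-create the SSH firewall rule; the rule name, network, tag, and IP values are placeholders that you should adapt to your environment:

# List SSH-related firewall rules in your project.
gcloud compute firewall-rules list --filter="name~ssh"

# Find the network tag applied to one of the cluster's nodes.
gcloud compute instances describe NODE_VM_NAME \
    --zone=NODE_ZONE \
    --format="value(tags.items)"

# Re-create a rule that allows SSH from the control plane IP to nodes with that tag.
gcloud compute firewall-rules create gke-CLUSTER_NAME-allow-ssh \
    --network=NETWORK_NAME \
    --allow=tcp:22 \
    --source-ranges=CONTROL_PLANE_IP/32 \
    --target-tags=NODE_TAG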
Troubleshoot Konnectivity proxy issues
You can determine whether your cluster uses the Konnectivity proxy by checking for the following system Deployment:
kubectl get deployments konnectivity-agent --namespace kube-system
If your cluster uses the Konnectivity proxy, the output is similar to the following:
NAME READY UP-TO-DATE AVAILABLE AGE
konnectivity-agent 3/3 3 3 18d
After you've verified that you're using the Konnectivity proxy, make sure that the Konnectivity agents have the required firewall access and that your network policies are set up correctly.
Allow required firewall access
Check that your network's firewall rules allow access to the following ports:
- Control plane port: On cluster creation, Konnectivity agents establish connections to the control plane on port 8132. When you run a kubectl command, the API server uses this connection to communicate with the cluster. Make sure that you allow egress traffic to the cluster control plane on port 8132 (for comparison, the API server uses 443). If you have rules that deny egress access, you might need to modify the rules or create exceptions.
- kubelet port: Because Konnectivity agents are system Pods deployed on your cluster nodes, ensure that your firewall rules allow the following types of traffic:
  - Incoming traffic to your workloads at port 10250 from your Pod ranges.
  - Outgoing traffic from your Pod ranges.

If your firewall rules don't permit this type of traffic, modify your rules.
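As a sketch, firewall rules like the following would allow this traffic; the rule names, network, tags, and IP ranges are placeholders that you should adapt to your VPC:

# Allow egress from cluster nodes to the control plane on the Konnectivity port.
gcloud compute firewall-rules create allow-nodes-to-konnectivity \
    --network=NETWORK_NAME \
    --direction=EGRESS \
    --action=ALLOW \
    --rules=tcp:8132 \
    --destination-ranges=CONTROL_PLANE_IP/32 \
    --target-tags=NODE_TAG

# Allow traffic from the Pod IP range to the kubelet port on the nodes.
gcloud compute firewall-rules create allow-pods-to-kubelet \
    --network=NETWORK_NAME \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:10250 \
    --source-ranges=POD_IPV4_RANGE \
    --target-tags=NODE_TAG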
Adjust network policy
The Konnectivity proxy might have issues if your cluster's network policy does either of the following:
- Blocks ingress from the kube-system namespace to the workload namespace
- Blocks egress to the cluster control plane on port 8132
When ingress is blocked by the network policy of workload Pods, the
konnectivity-agent logs include an error message similar to the
following:
"error dialing backend" error="dial tcp POD_IP_ADDRESS:PORT: i/o timeout"
In the error message, POD_IP_ADDRESS is the IP address
of the workload Pod.
When egress is blocked by network policy, the konnectivity-agent logs
include an error message similar to the following:
"cannot connect once" err="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp CP_IP_ADDRESS:8132: i/o timeout
In the error, CP_IP_ADDRESS is the cluster control
plane's IP address.
The kubectl features that depend on these connections, such as logs, attach, exec, and port-forward, aren't required for the correct functioning of the cluster. If you prefer to keep your cluster's network locked down from all outside access, be aware that features like these won't work.
To verify that network policy ingress or egress rules are the cause of the issue, find the network policies in the affected namespace by running the following command:
kubectl get networkpolicy --namespace AFFECTED_NAMESPACE
To resolve the issue with the ingress policy, add the following to the
spec.ingress field of the network policies:
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system
    podSelector:
      matchLabels:
        k8s-app: konnectivity-agent
To resolve the issue with the egress policy, add the following to the
spec.egress field of the network policies:
egress:
- to:
  - ipBlock:
      cidr: CP_IP_ADDRESS/32
  ports:
  - protocol: TCP
    port: 8132
If your network policy uses a combination of ingress and egress rules, then consider adjusting both.
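After you edit the policies, you can apply and inspect them; a minimal sketch, assuming the edited manifest is saved as updated-network-policy.yaml and POLICY_NAME is the policy you changed:

# Apply the edited policy manifest (the filename is a placeholder).
kubectl apply -f updated-network-policy.yaml

# Confirm that the ingress and egress rules now include the Konnectivity entries.
kubectl describe networkpolicy POLICY_NAME --namespace AFFECTED_NAMESPACE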
Adjust the IP masquerade agent
The cluster control plane accepts the traffic from the Konnectivity agents if the source IP address is in the Pod IP address ranges. If you modify the configuration of ip-masq-agent to masquerade the source IP address for the traffic to the cluster control plane, the Konnectivity agents might experience connectivity errors.
To resolve the issue and to help ensure that traffic from Konnectivity agents to
the cluster control plane isn't masqueraded to the node IP address, add the
control plane IP address to the nonMasqueradeCIDRs list in the ip-masq-agent
ConfigMap:
nonMasqueradeCIDRs:
- CONTROL_PLANE_IP_ADDRESS/32
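To inspect or edit the configuration, a minimal sketch; on GKE the agent typically runs as the ip-masq-agent DaemonSet in kube-system, and it periodically re-reads its ConfigMap, so the rollout restart only forces an immediate reload:

# View the current ip-masq-agent configuration, if the ConfigMap exists.
kubectl get configmap ip-masq-agent --namespace kube-system --output yaml

# Edit the ConfigMap to add the control plane IP to nonMasqueradeCIDRs.
kubectl edit configmap ip-masq-agent --namespace kube-system

# Optionally force the agent Pods to reload the new configuration.
kubectl rollout restart daemonset ip-masq-agent --namespace kube-system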
For more information about this configuration, see IP masquerade agent.
Error: kubectl commands fail with no agent available error
When you run kubectl commands that need to connect from the GKE
control plane to a Pod—for example, kubectl exec, kubectl logs, or kubectl
port-forward—the command might fail with error messages similar to the
following:
Error from server: error dialing backend: No agent available
failed to call webhook: Post "https://WEBHOOK_SERVICE.WEBHOOK_NAMESPACE.svc:PORT/PATH?timeout=10s": No agent available
v1beta1.metrics.k8s.io failed with: failing or missing response from https://NODE_IP:10250/apis/metrics.k8s.io/v1beta1: Get "https://NODE_IP:10250/apis/metrics.k8s.io/v1beta1": No agent available
These errors indicate a problem with Konnectivity, the secure communication
tunnel between the GKE control plane and your cluster's nodes. In
particular, it means the konnectivity-server on the control plane cannot
connect to any healthy konnectivity-agent Pods in the kube-system namespace.
To resolve this issue, try the following solutions:
1. Verify the health of the konnectivity-agent Pods:

   1. Check if the konnectivity-agent Pods are running:

      kubectl get pods -n kube-system -l k8s-app=konnectivity-agent

      The output is similar to the following:

      NAME                                  READY   STATUS    RESTARTS   AGE
      konnectivity-agent-abc123def4-xsy1a   2/2     Running   0          31d
      konnectivity-agent-abc123def4-yza2b   2/2     Running   0          31d
      konnectivity-agent-abc123def4-zxb3c   2/2     Running   0          31d

      Review the value in the Status column. If the Pods have a status of Running, review the logs for connection issues. Otherwise, investigate why the Pods aren't running.

   2. Review logs for connection issues. Because the kubectl logs command depends on Konnectivity, use Logs Explorer in the Google Cloud console:

      1. In the Google Cloud console, go to Logs Explorer.
      2. In the query pane, enter the following query:

         resource.type="k8s_container"
         resource.labels.cluster_name="CLUSTER_NAME"
         resource.labels.namespace_name="kube-system"
         labels."k8s-pod/k8s-app"="konnectivity-agent"
         resource.labels.container_name="konnectivity-agent"

         Replace CLUSTER_NAME with the name of your cluster.
      3. Click Run query.
      4. Review the output. When you review the konnectivity-agent logs, look for errors indicating why the agent can't connect. Authentication or permission errors often point to a misconfigured webhook blocking token reviews. "Connection refused" or "timeout" errors typically mean that a firewall rule or network policy is blocking traffic to the control plane on TCP port 8132, or is blocking traffic between the Konnectivity agent and other nodes. Certificate errors suggest that a firewall or proxy is inspecting and interfering with the encrypted TLS traffic.

   3. Investigate why the Pods aren't running. If the Pods have a status of Pending or another non-running state, investigate the cause. The konnectivity-agent runs as a Deployment, not a DaemonSet, so the agent Pods only need to run on a subset of nodes. However, if that specific subset of nodes is unavailable, the entire service can fail.

      Common causes of a non-running Pod include the following:

      - Custom node taints that prevent a Pod from being scheduled.
      - Insufficient node resources (CPU or memory).
      - Restrictive Binary Authorization policies that block GKE system images.

      To get more details about why a specific Pod isn't running, use the kubectl describe command:

      kubectl describe pod POD_NAME -n kube-system

      Replace POD_NAME with the name of the Pod that isn't running.

2. Investigate your admission webhooks to ensure that none are blocking TokenReview API requests. The konnectivity-agent relies on service account tokens, so interference with token reviews can prevent agents from connecting. If a webhook is the cause, Konnectivity can't recover until the faulty webhook is removed or repaired.

3. Ensure that your firewall rules allow TCP egress traffic from your GKE nodes to the control plane's IP address on port 8132. This connection is required for the konnectivity-agent to reach the Konnectivity service. For more information, see Allow required firewall access.

4. Make sure that there are no network policy rules that restrict essential Konnectivity traffic. Network policy rules should allow both intra-cluster traffic (Pod-to-Pod) within the kube-system namespace and egress traffic from the konnectivity-agent Pods to the GKE control plane.
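The following commands sketch quick checks for the webhook and network policy causes described in the preceding list:

# List admission webhooks that could be intercepting TokenReview or other API requests.
kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations

# List network policies in kube-system that might restrict konnectivity-agent traffic.
kubectl get networkpolicy --namespace kube-system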
What's next
If you can't find a solution to your problem in the documentation, see Get support for further help, including advice on the following topics:
- Opening a support case by contacting Cloud Customer Care.
- Getting support from the community by asking questions on StackOverflow and using the google-kubernetes-engine tag to search for similar issues. You can also join the #kubernetes-engine Slack channel for more community support.
- Opening bugs or feature requests by using the public issue tracker.