This page lists known issues for GKE networking. It is for admins and architects who manage the lifecycle of the underlying technology infrastructure and who respond to alerts and pages when service level objectives (SLOs) aren't met or applications fail.
Identified version(s) | Fixed version(s) | Issue and workaround |
---|---|---|
1.27, 1.28, 1.29, 1.30, 1.31 | | **NEG controller stops managing endpoints when a port is removed from a Service**: When the NEG controller is configured to create a Standalone NEG for a Service and one of the configured ports is later removed from the Service, the NEG controller eventually stops managing endpoints for the NEG. In addition to Services where the user creates a Standalone NEG annotation, this also affects Services that are referenced by GKE Gateway, MCI, and GKE Multi Cluster Gateway. Workaround: When removing a port from a Service with a Standalone NEG annotation, also update the annotation to remove the port in question (see the annotation sketch after this table). |
1.28 | | **Gateway TLS configuration error**: We've identified an issue with configuring TLS for Gateways in clusters running GKE version 1.28.4-gke.1083000. This affects TLS configurations using either an SSLCertificate or a CertificateMap (see the Gateway sketch after this table). If you're upgrading a cluster with existing Gateways, updates made to the Gateway fail. For new Gateways, the load balancers aren't provisioned. This issue will be fixed in an upcoming GKE 1.28 patch version. |
1.27, 1.28, 1.29 | | **Intermittent connection establishment failures**: Clusters on control plane versions 1.26.6-gke.1900 and later might encounter intermittent connection establishment failures. The chance of failure is low, and it doesn't affect all clusters. The failures should stop completely a few days after symptom onset. |
1.27, 1.28, 1.29 | | **DNS resolution issues with Container-Optimized OS**: Workloads running on GKE clusters with Container-Optimized OS-based nodes might experience DNS resolution issues. |
1.28 | 1.28.3-gke.1090000 or later | **Network Policy drops a connection due to incorrect connection tracking lookup**: For clusters with GKE Dataplane V2 enabled, when a client Pod connects to itself using a Service or the virtual IP address of an internal passthrough Network Load Balancer, the reply packet is not identified as part of an existing connection due to an incorrect conntrack lookup in the dataplane. This means that a Network Policy that restricts ingress traffic for the Pod is incorrectly enforced on the packet. The impact of this issue depends on the number of configured Pods for the Service. For example, if the Service has 1 backend Pod, the connection always fails. If the Service has 2 backend Pods, the connection fails 50% of the time. Workaround: You can mitigate this issue by configuring the `port` and `containerPort` in the Service manifest to the same value (see the Service sketch after this table). |
1.27, 1.28 | | **Packet drops for hairpin connection flows**: For clusters with GKE Dataplane V2 enabled, when a Pod creates a TCP connection to itself using a Service, such that the Pod is both the source and destination of the connection, GKE Dataplane V2 eBPF connection tracking incorrectly tracks the connection states, leading to leaked conntrack entries. When a connection tuple (protocol, source/destination IP, and source/destination port) has been leaked, new connections using the same connection tuple might result in return packets being dropped (a manifest sketch of the hairpin scenario follows this table). Workaround: Use one of the following workarounds: |
Earlier than 1.31.0-gke.1506000 | 1.31.0-gke.1506000 and later | **Device-typed network in GKE multi-network fails with long network names**: Cluster creation fails with an error because the composed UNIX domain socket path exceeds the Linux path-length limit. Workaround: Limit the length of device-typed network object names to 41 characters or less (see the Network sketch after this table). The full path of each UNIX domain socket is composed, including the corresponding network name. Linux has a limitation on socket path lengths (under 107 bytes). After accounting for the directory, filename prefix, and the `.sock` extension, the network name is limited to a maximum of 41 characters. |
1.27, 1.28, 1.29, 1.30 | | Connectivity issues for |
1.28, 1.29, 1.30, 1.31 | | **Calico Pods not healthy on clusters with fewer than 3 total nodes and insufficient vCPU**: calico-typha and calico-node Pods can't be scheduled on clusters that meet all of the following conditions: fewer than 3 nodes in total, each node with 1 or fewer allocatable vCPUs, and network policy enabled. This is due to insufficient CPU resources. Workarounds: |
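
For the Standalone NEG issue, the `cloud.google.com/neg` annotation lists the ports that NEGs are created for, and it must stay in sync with the Service's port list. A minimal sketch, assuming a hypothetical Service that originally exposed ports 80 and 443 and is being updated to drop 443:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service  # hypothetical name
  annotations:
    # After removing the port 443 entry from spec.ports below, also
    # remove "443" from exposed_ports here; otherwise the NEG
    # controller can eventually stop managing endpoints for the NEG.
    cloud.google.com/neg: '{"exposed_ports": {"80":{}}}'
spec:
  selector:
    app: my-app  # hypothetical label
  ports:
  - name: http
    port: 80
    targetPort: 8080
  # The port 443 entry was removed from this list, so it must also be
  # removed from the exposed_ports annotation above.
```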
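
For the Gateway TLS issue, the affected configurations attach a pre-created SSLCertificate or a CertificateMap through the listener's TLS options. A minimal sketch of the pre-shared certificate variant, assuming the `networking.gke.io/pre-shared-certs` option and hypothetical resource names:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: external-https  # hypothetical name
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      options:
        # References a pre-created Compute Engine SSLCertificate. On
        # 1.28.4-gke.1083000, updates to a Gateway with this kind of
        # TLS configuration fail.
        networking.gke.io/pre-shared-certs: my-ssl-certificate  # hypothetical
```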
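
For the connection tracking lookup issue, the workaround aligns the Service port with the port the container actually listens on. A minimal sketch with hypothetical names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: self-call-svc  # hypothetical name
spec:
  selector:
    app: self-call  # hypothetical label
  ports:
  - name: http
    # Workaround: set port and targetPort (the containerPort) to the
    # same value so the reply packet matches the conntrack entry when
    # a backend Pod connects to itself through the Service.
    port: 8080
    targetPort: 8080
```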
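
The hairpin issue arises when a Pod reaches itself through its own Service. An illustrative sketch of the affected traffic pattern (not a workaround), with hypothetical names and a placeholder image:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hairpin-svc  # hypothetical name
spec:
  selector:
    app: hairpin  # selects the same Pod that makes the call
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Pod
metadata:
  name: hairpin-pod  # hypothetical name
  labels:
    app: hairpin
spec:
  containers:
  - name: app
    image: nginx  # placeholder image
    ports:
    - containerPort: 8080
    # If this container connects to http://hairpin-svc (its own
    # Service), the Pod is both source and destination of the
    # connection, and on affected versions the eBPF conntrack entry
    # for that tuple can leak.
```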
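
For the device-typed network naming issue, keep the network object's name at 41 characters or fewer. A minimal sketch, assuming the GKE multi-network `Network` custom resource (`networking.gke.io/v1`) and hypothetical names:

```yaml
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  # Keep this at 41 characters or fewer so the composed UNIX domain
  # socket path stays under the Linux limit (about 107 bytes).
  name: dpdk-device-net  # hypothetical, well under the limit
spec:
  type: Device
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: dpdk-params  # hypothetical
```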