Troubleshoot load balancing in GKE


This page shows you how to resolve issues related to load balancing in Google Kubernetes Engine (GKE) clusters using Service, Ingress, or Gateway resources.

BackendConfig not found

This error occurs when a BackendConfig for a Service port is specified in the Service annotation, but the actual BackendConfig resource could not be found.

To evaluate a Kubernetes event, run the following command:

kubectl get event

The following output indicates your BackendConfig was not found:

KIND    ... SOURCE
Ingress ... loadbalancer-controller

MESSAGE
Error during sync: error getting BackendConfig for port 80 on service "default/my-service":
no BackendConfig for service port exists

To resolve this issue, ensure you have not created the BackendConfig resource in the wrong namespace or misspelled its reference in the Service annotation.

Ingress security policy not found

After the Ingress object is created, if the security policy isn't properly associated with the LoadBalancer Service, evaluate the Kubernetes event to see if there is a configuration mistake. If your BackendConfig specifies a security policy that does not exist, a warning event is periodically emitted.

To evaluate a Kubernetes event, run the following command:

kubectl get event

The following output indicates your security policy was not found:

KIND    ... SOURCE
Ingress ... loadbalancer-controller

MESSAGE
Error during sync: The given security policy "my-policy" does not exist.

To resolve this issue, specify the correct security policy name in your BackendConfig.

NEG not found when creating an Internal Ingress resource

The following error might occur when you create an internal Ingress in GKE:

Error syncing: error running backend syncing routine: googleapi: Error 404: The resource 'projects/PROJECT_ID/zones/ZONE/networkEndpointGroups/NEG' was not found, notFound

This error occurs because Ingress for internal Application Load Balancers requires Network Endpoint Groups (NEGs) as backends.

In Shared VPC environments or clusters with Network Policies enabled, you must add the annotation cloud.google.com/neg: '{"ingress": true}' to the Service manifest.

504 Gateway Timeout: upstream request timeout

The following error might occur when you access a Service from an internal Ingress in GKE:

HTTP/1.1 504 Gateway Timeout
content-length: 24
content-type: text/plain

upsteam request timeout

This error occurs because traffic sent to internal Application Load Balancers are proxied by envoy proxies in the proxy-only subnet range.

You must create a firewall rule to allow traffic from the proxy-only subnet range, on the targetPort of the Service.

Error 400: Invalid value for field 'resource.target'

The following error might occur when you access a Service from an internal Ingress in GKE:

Error syncing:LB_NAME does not exist: googleapi: Error 400: Invalid value for field 'resource.target': 'https://www.googleapis.com/compute/v1/projects/PROJECT_NAME/regions/REGION_NAME/targetHttpProxies/LB_NAME. A reserved and active subnetwork is required in the same region and VPC as the forwarding rule.

To resolve this issue, create a proxy-only subnet.

Error during sync: error running load balancer syncing routine: loadbalancer does not exist

One of the following errors might occur when the GKE control plane upgrades or when you modify an Ingress object:

"Error during sync: error running load balancer syncing routine: loadbalancer
INGRESS_NAME does not exist: invalid ingress frontend configuration, please
check your usage of the 'kubernetes.io/ingress.allow-http' annotation."
Error during sync: error running load balancer syncing routine: loadbalancer LOAD_BALANCER_NAME does not exist:
googleapi: Error 400: Invalid value for field 'resource.IPAddress':'INGRESS_VIP'. Specified IP address is in-use and would result in a conflict., invalid

To resolve these issues, try the following:

  • Add the hosts field in the tls section of the Ingress manifest, then delete the Ingress. Wait five minutes for GKE to delete the unused Ingress resources. Then, recreate the Ingress. For more information, see The hosts field of an Ingress object.
  • Revert the changes you made to the Ingress. Then, add a certificate using an annotation or Kubernetes secret.

External Ingress produces HTTP 502 errors

Use the following guidance to troubleshoot HTTP 502 errors with external Ingress resources:

  1. Enable logs for each backend service associated with each GKE Service that is referenced by the Ingress.
  2. Use status details to identify causes for HTTP 502 responses. Status details that indicate the HTTP 502 response originated from the backend require troubleshooting within the serving Pods, not the load balancer.

Unmanaged instance groups

You might experience HTTP 502 errors with external Ingress resources if your external Ingress uses unmanaged instance group backends. This issue occurs when all of the following conditions are met:

  • The cluster has a large total number of nodes among all node pools.
  • The serving Pods for one or more Services that are referenced by the Ingress are located on only a few nodes.
  • Services referenced by the Ingress use externalTrafficPolicy: Local.

To determine if your external Ingress uses unmanaged instance group backends, do the following:

  1. Go to the Ingress page in the Google Cloud console.

    Go to Ingress

  2. Click on the name of your external Ingress.

  3. Click on the name of the Load balancer. The Load balancing details page displays.

  4. Check the table in the Backend services section to determine if your external Ingress uses NEGs or instance groups.

To resolve this issue, use one of the following solutions:

  • Use a VPC-native cluster.
  • Use externalTrafficPolicy: Cluster for each Service referenced by the external Ingress. This solution causes you to lose the original client IP address in the packet's sources.
  • Use the node.kubernetes.io/exclude-from-external-load-balancers=true annotation. Add the annotation to the nodes or node pools that do not run any serving Pod for any Service referenced by any external Ingress or LoadBalancer Service in your cluster.

Use load balancer logs to troubleshoot

You can use internal passthrough Network Load Balancer logs and external passthrough Network Load Balancer logs to troubleshoot issues with load balancers and correlate traffic from load balancers to GKE resources.

Logs are aggregated per-connection and exported in near real time. Logs are generated for each GKE node involved in the data path of a LoadBalancer Service, for both ingress and egress traffic. Log entries include additional fields for GKE resources, such as: - Cluster name - Cluster location - Service name - Service namespace - Pod name - Pod namespace

Pricing

There are no additional charges for using logs. Based on how you ingest logs, standard pricing for Cloud Logging, BigQuery, or Pub/Sub apply. Enabling logs has no effect on the performance of the load balancer.

Use diagnostic tools to troubleshoot

The check-gke-ingress diagnostic tool inspects Ingress resources for common misconfigurations. You can use the check-gke-ingress tool in the following ways:

  • Run the gcpdiag command-line tool on your cluster. Ingress results appear in the check rule gke/ERR/2023_004 section.
  • Use the check-gke-ingress tool alone or as a kubectl plugin by following the instructions in check-gke-ingress.