This guide describes how to troubleshoot configuration issues for a Google Cloud internal Application Load Balancer. Before following this guide, familiarize yourself with the following:
- Internal Application Load Balancer overview
- Proxy-only subnets
- Internal Application Load Balancer logging and monitoring
Troubleshoot common issues with Network Analyzer
Network Analyzer automatically monitors your VPC network configuration and detects both suboptimal configurations and misconfigurations. It identifies network failures, provides root cause information, and suggests possible resolutions. To learn about the different misconfiguration scenarios that are automatically detected by Network Analyzer, see Load balancer insights in the Network Analyzer documentation.
Network Analyzer is available in the Google Cloud console as a part of Network Intelligence Center.
Go to Network Analyzer
Backends have incompatible balancing modes
When creating a load balancer, you might see the error:
Validation failed for instance group INSTANCE_GROUP: backend services 1 and 2 point to the same instance group but the backends have incompatible balancing_mode. Values should be the same.
This happens when two backend services point to the same instance group but use incompatible balancing modes, for example when you use the same instance group as a backend for two different load balancers. For more information, see Balancing modes in the backend services overview.
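For example, a minimal gcloud sketch to inspect and align the balancing mode of the shared instance group (all names and values here are placeholders):

```
# Inspect the balancing mode each backend service uses for the instance group.
gcloud compute backend-services describe BACKEND_SERVICE_1 \
    --region=REGION \
    --format="yaml(backends)"

# If needed, switch the backend to a compatible balancing mode (RATE shown as
# one possible choice).
gcloud compute backend-services update-backend BACKEND_SERVICE_1 \
    --region=REGION \
    --instance-group=INSTANCE_GROUP \
    --instance-group-zone=ZONE \
    --balancing-mode=RATE \
    --max-rate-per-instance=100
```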
Load balanced traffic does not have the source address of the original client
This is expected behavior. An internal Application Load Balancer operates as an HTTP(S) reverse proxy (gateway). When a client program opens a connection to the IP address of an INTERNAL_MANAGED forwarding rule, the connection terminates at a proxy. The proxy processes the requests that arrive over that connection. For each request, the proxy selects a backend to receive the request based on the URL map and other factors. The proxy then sends the request to the selected backend. As a result, from the point of view of the backend, the source of an incoming packet is an IP address from the region's proxy-only subnet.
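To confirm which source addresses your backends see, you can list the proxy-only subnets in the load balancer's region; a sketch, assuming the subnet uses the REGIONAL_MANAGED_PROXY purpose (older deployments might use INTERNAL_HTTPS_LOAD_BALANCER):

```
# The primary range of the proxy-only subnet is the source range that backends
# see for proxied traffic.
gcloud compute networks subnets list \
    --filter="region:REGION AND purpose=REGIONAL_MANAGED_PROXY" \
    --format="table(name, region, ipCidrRange)"
```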
Requests are rejected by the load balancer
For each request, the proxy selects a backend to receive the request based on a path matcher in the load balancer's URL map. If the URL map doesn't have a path matcher defined for a request, it cannot select a backend service, so it returns an HTTP 404 (Not Found) response code.
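To review which host rules, path matchers, and default service the URL map defines, you can describe it; a sketch for a regional URL map with placeholder names:

```
# Show the URL map's routing configuration.
gcloud compute url-maps describe URL_MAP_NAME \
    --region=REGION \
    --format="yaml(defaultService, hostRules, pathMatchers)"
```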
Load balancer doesn't connect to backends
The firewalls protecting your backend servers must be configured to allow ingress traffic from the proxies in the proxy-only subnet range that you allocated in your internal Application Load Balancer's region.
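For example, a firewall rule along these lines admits proxy traffic; the source range shown is the commonly documented example proxy-only subnet range, and the rule name, network, ports, and target tag are placeholders:

```
# Allow proxy-to-backend traffic from the proxy-only subnet range.
gcloud compute firewall-rules create fw-allow-proxy-only-subnet \
    --network=NETWORK \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80,tcp:443,tcp:8080 \
    --source-ranges=10.129.0.0/23 \
    --target-tags=load-balanced-backend
```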
The proxies connect to backends using the connection settings specified by the configuration of your backend service. If these values don't match the configuration of the server(s) running on your backends, the proxy cannot forward requests to the backends.
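To compare both sides, you can inspect the backend service's settings and the instance group's named ports; a sketch with placeholder names:

```
# Protocol, named port, and timeout that the proxies use.
gcloud compute backend-services describe BACKEND_SERVICE \
    --region=REGION \
    --format="yaml(protocol, portName, timeoutSec)"

# Port number that the instance group maps to that named port.
gcloud compute instance-groups get-named-ports INSTANCE_GROUP --zone=ZONE
```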
Health check probes can't reach the backends
To verify that health check traffic reaches your backend VMs, enable health check logging and search for successful log entries.
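For example, assuming a regional HTTP health check, logging can be enabled with gcloud (names are placeholders):

```
# Enable logging on the health check used by the backend service.
gcloud compute health-checks update http HEALTH_CHECK_NAME \
    --region=REGION \
    --enable-logging
```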
Clients cannot connect to the load balancer
The proxies listen for connections on the load balancer's IP address and port configured in the forwarding rule (for example, 10.1.2.3:80), using the protocol specified in the forwarding rule (HTTP or HTTPS). If your clients can't connect, ensure that they are using the correct address, port, and protocol.
Ensure that a firewall isn't blocking traffic between your client instances and the load balanced IP address.
Ensure that the clients are in the same region as the load balancer. The internal Application Load Balancer is a regional product, so all clients (and backends) must be in the same region as the load balancer resource.
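A quick way to test is to call the forwarding rule's address and port directly from a client VM in the same region and VPC network, for example using the address from the example above:

```
# Run from a client VM in the same region and VPC network as the load balancer.
curl -v http://10.1.2.3:80/
```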
Organizational policy restriction for Shared VPC
If you are using Shared VPC and you cannot create a new internal Application Load Balancer in a particular subnet, an organization policy might be the cause. In the organization policy, add the subnet to the list of allowed subnets, or contact your organization administrator. For more information, see constraints/compute.restrictSharedVpcSubnetworks.
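To check whether the constraint is enforced and which subnets it allows, someone with organization-level permissions can view the effective policy; a sketch with a placeholder organization ID:

```
# View the effective policy for the Shared VPC subnet constraint.
gcloud resource-manager org-policies describe \
    compute.restrictSharedVpcSubnetworks \
    --effective \
    --organization=ORG_ID
```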
Load balancer doesn't distribute traffic evenly across zones
You might observe an imbalance in your internal Application Load Balancer traffic across zones, especially when utilization of your backend capacity is low (less than 10%). This imbalance can increase overall latency because traffic is sent to only a few servers in one zone.
To even out the traffic distribution across zones, you can make the following configuration changes, as shown in the example after this list:
- Use the RATE balancing mode with a low max-rate-per-instance target capacity.
- Use the LocalityLbPolicy backend traffic policy with a load balancing algorithm of LEAST_REQUEST.
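A sketch of both changes with gcloud; the names are placeholders, and the max-rate-per-instance value should reflect your actual per-instance capacity:

```
# Switch the backend to RATE balancing mode with a low per-instance target.
gcloud compute backend-services update-backend BACKEND_SERVICE \
    --region=REGION \
    --instance-group=INSTANCE_GROUP \
    --instance-group-zone=ZONE \
    --balancing-mode=RATE \
    --max-rate-per-instance=10

# Use the LEAST_REQUEST locality load balancing policy.
gcloud compute backend-services update BACKEND_SERVICE \
    --region=REGION \
    --locality-lb-policy=LEAST_REQUEST
```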
Unexplained 5xx errors
For error conditions caused by a communications issue between the load balancer proxy and its backends, the load balancer generates an HTTP 5xx status code and returns that status code to the client. Not all HTTP 5xx errors are generated by the load balancer; for example, if a backend sends an HTTP 5xx response to the load balancer, the load balancer relays that response to its client. To determine whether an HTTP 5xx response was relayed from a backend or generated by the load balancer proxy, see the proxyStatus field of the load balancer logs.
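For example, you can pull recent 5xx log entries and their proxyStatus values with gcloud; this sketch assumes the internal Application Load Balancer's log resource type, internal_http_lb_rule:

```
# List recent 5xx responses with the proxyStatus field.
gcloud logging read \
    'resource.type="internal_http_lb_rule" AND httpRequest.status>=500' \
    --limit=20 \
    --format="table(timestamp, httpRequest.status, jsonPayload.proxyStatus)"
```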
Configuration changes to the internal Application Load Balancer, such as the addition or removal of a backend service, can result in a brief period of time during which users see the HTTP status code 503. While these configuration changes propagate to Envoys globally, you see log entries where the proxyStatus field matches the connection_refused log string.
If HTTP 5xx status codes persist longer than a few minutes after you complete the load balancer configuration, take the following steps to troubleshoot HTTP 5xx responses:
- Verify that there is a firewall rule configured to allow health checks (see the example after this list). In the absence of one, load balancer logs typically have a proxyStatus matching destination_unavailable, which indicates that the load balancer considers the backend to be unavailable.
- Verify that health check traffic reaches your backend VMs. To do this, enable health check logging and search for successful log entries. For new load balancers, the lack of successful health check log entries doesn't mean that health check traffic is not reaching your backends. It might mean that the backend's initial health state has not yet changed from UNHEALTHY to a different state. You see successful health check log entries only after the health check prober receives an HTTP 200 OK response from the backend.
- Verify that the keepalive configuration parameter for the HTTP server software running on the backend instance is not less than the keepalive timeout of the load balancer, which is fixed at 10 minutes (600 seconds) and is not configurable. The load balancer generates an HTTP 5xx status code when the connection to the backend closes unexpectedly while the HTTP request is being sent or before the complete HTTP response has been received. This can happen when the keepalive configuration parameter for the web server software running on the backend instance is less than the load balancer's fixed keepalive timeout. Ensure that the keepalive timeout for the HTTP server software on each backend is set to slightly more than 10 minutes; the recommended value is 620 seconds (see the check after this list).
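As a sketch of the firewall and keepalive checks above: the source ranges shown are Google's documented health check probe ranges, while the rule name, network, target tag, and the nginx configuration path are assumptions:

```
# Allow Google Cloud health check probes to reach the backends.
gcloud compute firewall-rules create fw-allow-health-check \
    --network=NETWORK \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp \
    --source-ranges=130.211.0.0/22,35.191.0.0/16 \
    --target-tags=load-balanced-backend

# On an nginx backend (as one example of HTTP server software), confirm that
# the keepalive timeout exceeds the load balancer's fixed 600 seconds, for
# example: keepalive_timeout 620s;
grep -ri keepalive_timeout /etc/nginx/
```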
Limitations
If you are having trouble using an internal Application Load Balancer with other Google Cloud networking features, note the current compatibility limitations.