502/503 errors on external HTTP(S) Load Balancer

Problem

The external HTTP(S) Load Balancer returns 502 and 503 errors during Pod autoscaling events.

Environment

  • External HTTP(S) Load Balancer
  • Network Endpoint Groups
  • Kubernetes Engine

Solution

  1. Make the health checks more sensitive to the Pod shutting down, so that the load balancer stops routing traffic to the terminating Pod sooner. By default the health check runs once every 5 seconds with an unhealthy threshold of 2 failed checks, which still leaves a window of roughly 10 seconds of 5XX responses. Making the health check more frequent, for example once per second, narrows that window; see the BackendConfig sketch after this list. Note: this is only a mitigation and does not completely remove the problem.
  2. Alter the SIGTERM handling in the application so that the Pod keeps accepting and processing new requests after SIGTERM, up until the SIGKILL. Combined with terminationGracePeriodSeconds, the Pod can delay its shutdown for the time it takes to be considered unhealthy and only then start shutting down. The grace period on the Pod would have to equal the time it takes for the Pod to clean up resources plus 15 seconds.
  3. Use a preStop hook to introduce a delay between the endpoint being removed from the Endpoints and EndpointSlice resources (which triggers the NEG detach call) and the Pod receiving SIGTERM. A simple sleep that lasts roughly as long as it takes for the endpoint to be removed from the NEG delays the SIGTERM so that the Pod only begins shutting down after it has already been detached from the NEG.
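
For the first mitigation, the health check parameters of a GKE Service that uses container-native load balancing can be tuned through a BackendConfig resource. The following is only a sketch, assuming an HTTP health check; the resource name, request path, and port are placeholders and the values should be sized for your workload:

    apiVersion: cloud.google.com/v1
    kind: BackendConfig
    metadata:
      name: hostname-server-backendconfig
    spec:
      healthCheck:
        checkIntervalSec: 1      # probe every second instead of the 5-second default
        timeoutSec: 1
        healthyThreshold: 1
        unhealthyThreshold: 2    # two failed probes mark the endpoint unhealthy
        type: HTTP
        requestPath: /healthz    # assumed health endpoint exposed by the application
        port: 9376

The BackendConfig is then referenced from the Service with the cloud.google.com/backend-config annotation.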

PreStop hooks allow you to add a delay before the Pod terminates without modifying anything in the application. The following is a sample that adds a preStop hook:

    spec:
      containers:
      - image: k8s.gcr.io/serve_hostname:v1.4
        name: hostname-server
        ports:
        - containerPort: 9376
          protocol: TCP
        lifecycle:
          preStop:
            exec:
              # A sleep keeps the container alive until the endpoint has been detached from the NEG
              command: ["/bin/sh","-c","sleep 40"]


A Kubernetes example is documented here.

Please keep in mind that if you are using terminationGracePeriodSeconds along with a preStop hook, the terminationGracePeriodSeconds value must be greater than the delay you set in the preStop hook. If one of the Pod's containers has defined a preStop hook, the kubelet runs that hook inside the container. If the preStop hook is still running after the grace period expires, the kubelet requests a small, one-off grace period extension of 2 seconds; after that, Kubernetes does not wait for the preStop hook to finish.
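
As a rough sketch of how the two settings fit together (the 40-second sleep and 60-second grace period below are assumptions; size them to your NEG detach latency and application cleanup time):

    spec:
      terminationGracePeriodSeconds: 60   # must exceed the preStop sleep plus cleanup time
      containers:
      - image: k8s.gcr.io/serve_hostname:v1.4
        name: hostname-server
        ports:
        - containerPort: 9376
          protocol: TCP
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh","-c","sleep 40"]   # SIGTERM is delivered only after this sleep finishes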

Cause

When a Pod is being scaled down, its endpoint is removed from the Endpoints and EndpointSlice resources. This triggers the NEG controller to send a detach request for that endpoint to the Network Endpoint Group API. At the same time the endpoint is removed, the Pod is signalled (SIGTERM) to start shutting down and stops processing requests. Because of the Network Endpoint Group API latency, there is a period of time during which the load balancer still considers the Pod a valid backend even though the Pod is already shutting down. This latency results in the 502 and 503 errors. This can be confirmed from the logging, as the HTTP(S) LB log shows that the 503 was sent by the backend.
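
To confirm where the errors originate, the external HTTP(S) Load Balancer request logs can be filtered in Logs Explorer. The following filter is only a sketch based on the load balancer log format; adjust the status code and statusDetails value to what you actually observe:

    resource.type="http_load_balancer"
    httpRequest.status>=500
    jsonPayload.statusDetails="response_sent_by_backend"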