Using Container-native Load Balancing

This page explains how to use container-native load balancing in Google Kubernetes Engine.


Container-native load balancing enables HTTP(S) Google Cloud Load Balancers (GCLBs) to target Pods directly and to evenly distribute their traffic to Pods.

Container-native load balancing leverages a data model called network endpoint groups (NEGs), collections of network endpoints represented by IP-port pairs.


Container-native load balancing offers the following benefits:

Pods are first-class citizens for load balancing
kube-proxy configures nodes' iptables rules to distribute traffic to Pods. Without container-native load balancing, load balancer traffic travels to the node instance groups and is routed via iptables rules to Pods, which might or might not be on the same node. With container-native load balancing, load balancer traffic is distributed directly to the Pods that should receive it, eliminating the extra network hop. Container-native load balancing also improves health checking because it targets Pods directly.

Diagram comparing default behavior (left) with container-native load balancer behavior.
Improved network performance
Because the container-native load balancer talks directly with the Pods and connections have fewer network hops, both latency and throughput are improved.
Increased visibility
With container-native load balancing, the source IP is preserved, which makes it easier to trace traffic back to its source. You have visibility into the round-trip time (RTT) from the client to the HTTP(S) load balancer, including Stackdriver UI support. This makes troubleshooting your services at the NEG level easier.
Support for GCLB features
Container-native load balancing offers native support in Google Kubernetes Engine for several GCLB features, such as integration with GCP services like Cloud Armor, Cloud Content Delivery Network, and Cloud Identity-Aware Proxy. It also features load balancing algorithms for accurate traffic distribution.


Container-native load balancers on Google Kubernetes Engine have the following requirements:

Google Kubernetes Engine version 1.10
Container-native load balancers are available in Google Kubernetes Engine clusters running Google Kubernetes Engine version 1.10 or later.
To use container-native load balancing, clusters must be VPC-native. To learn more, refer to Creating VPC-native clusters using Alias IPs.


Container-native load balancers do not work with legacy networks.


Container-native load balancers do not support internal load balancers or network load balancers.


You are charged for the HTTP(S) load balancer provisioned by the Ingress that you create in this guide. For load balancer pricing information, refer to Load balancing and forwarding rules on the Compute Engine pricing page.

Using container-native load balancing

The following sections walk you through a container-native load balancing configuration on Google Kubernetes Engine.

Creating a VPC-native cluster

To use container-native load balancing, you must create a cluster with alias IPs enabled.

For example, the following command creates a cluster, neg-demo-cluster, with an auto-provisioned subnetwork in zone us-central1-a:

gcloud container clusters create neg-demo-cluster \
    --enable-ip-alias \
    --create-subnetwork="" \
    --network=default \
    --zone=us-central1-a
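
Because every later step depends on alias IPs, it can be worth confirming the setting before continuing. The following is a minimal sketch rather than part of the original procedure: the helper name is_vpc_native is hypothetical, and it assumes that the cluster description exposes the field ipAllocationPolicy.useIpAliases and that gcloud prints True when it is enabled.

```shell
# Hypothetical helper: succeeds if the named cluster reports alias IPs enabled.
# Assumes the field ipAllocationPolicy.useIpAliases in the cluster description.
is_vpc_native() {
  cluster="$1"
  zone="$2"
  value="$(gcloud container clusters describe "$cluster" \
      --zone "$zone" \
      --format='value(ipAllocationPolicy.useIpAliases)')"
  # Accept "True" or "true"; anything else (including empty) counts as disabled.
  case "$value" in
    [Tt]rue) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: is_vpc_native neg-demo-cluster us-central1-a && echo "alias IPs enabled"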

Creating a Deployment

Next, deploy a workload to the cluster.

The following sample Deployment, neg-demo-app, runs a single instance of a containerized HTTP server:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: neg-demo-app # Label for the Deployment
  name: neg-demo-app # Name of Deployment
spec: # Deployment's specification
  minReadySeconds: 60 # Number of seconds to wait after a Pod is created and its status is Ready
  selector:
    matchLabels:
      run: neg-demo-app
  template: # Pod template
    metadata:
      labels:
        run: neg-demo-app # Labels Pods from this Deployment
    spec: # Pod specification; each Pod created by this Deployment has this specification
      containers:
      - image: # Application to run in Deployment's Pods
        name: hostname # Container name
        ports:
        - containerPort: 9376 # Port used by containers running in these Pods
          protocol: TCP
      terminationGracePeriodSeconds: 60 # Number of seconds to wait for connections to terminate before shutting down Pods

In this Deployment, each container runs an HTTP server. The HTTP server simply returns the hostname of the application server (the name of the Pod on which the server runs) as a response.

Save this manifest as neg-demo-app.yaml, then create the Deployment by running the following command:

kubectl apply -f neg-demo-app.yaml

Creating a Service for a container-native load balancer

After you have created a Deployment, you need to group its Pods into a Service.

The following sample Service, neg-demo-svc, targets the sample Deployment you created in the previous section:

apiVersion: v1
kind: Service
metadata:
  name: neg-demo-svc # Name of Service
  annotations:
    cloud.google.com/neg: '{"ingress": true}' # Creates an NEG after an Ingress is created
spec: # Service's specification
  selector:
    run: neg-demo-app # Selects Pods labelled run: neg-demo-app
  ports:
  - port: 80 # Service's port
    protocol: TCP
    targetPort: 9376 # Should match the containerPort used by the Deployment's containers

The Service's annotation, cloud.google.com/neg: '{"ingress": true}', enables container-native load balancing. However, the load balancer is not created until you create an Ingress for the Service.

Save this manifest as neg-demo-svc.yaml, then create the Service by running the following command:

kubectl apply -f neg-demo-svc.yaml

Creating an Ingress for the Service

The following sample Ingress, neg-demo-ing, targets the Service you created:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: neg-demo-ing
spec:
  backend:
    serviceName: neg-demo-svc # Name of the Service targeted by the Ingress
    servicePort: 80 # Should match the port used by the Service

Save this manifest as neg-demo-ing.yaml, then create the Ingress by running the following command:

kubectl apply -f neg-demo-ing.yaml

When you create the Ingress, a GCLB is created in the project, and NEGs are created in each zone in which the cluster runs. The endpoints in the NEG and the endpoints of the Service are kept in sync.

Verifying the Ingress

After you have deployed a workload, grouped its Pods into a Service, and created an Ingress for the Service, you should verify that the Ingress has provisioned the container-native load balancer successfully.

To retrieve the status of the Ingress, run the following command:

kubectl describe ingress neg-demo-ing

In the command output, look for ADD and CREATE events:

Type     Reason   Age                From                     Message
----     ------   ----               ----                     -------
Normal   ADD      16m                loadbalancer-controller  default/neg-demo-ing
Normal   Service  4s                 loadbalancer-controller  default backend set to neg-demo-svc:32524
Normal   CREATE   2s                 loadbalancer-controller  ip:

Testing load balancer functionality

The following sections explain how you can test the functionality of a container-native load balancer.

Visit Ingress IP address

You can verify that the container-native load balancer is functioning by visiting the Ingress' IP address.

To get the Ingress IP address, run the following command:

kubectl get ingress neg-demo-ing

In the command output, the Ingress' IP address is displayed in the ADDRESS column. Visit the IP address in a web browser.
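
The address can take a few minutes to appear, so a script might need to poll for it. The following is a minimal sketch under stated assumptions: the helper name wait_for_ingress_ip and its retry arguments are hypothetical, and the JSONPath expression assumes that the address is published under status.loadBalancer.ingress[0].ip.

```shell
# Hypothetical helper: poll until the Ingress reports an IP address, then print it.
# Usage: wait_for_ingress_ip INGRESS_NAME [ATTEMPTS] [DELAY_SECONDS]
wait_for_ingress_ip() {
  ingress="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    ip="$(kubectl get ingress "$ingress" \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no address assigned to $ingress after $attempts attempts" >&2
  return 1
}
```

For example: IP_ADDRESS="$(wait_for_ingress_ip neg-demo-ing)" && curl -s "$IP_ADDRESS"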

Check backend service health status

You can also get the health status of the load balancer's backend service.

First, get a list of the backend services running in your project:

gcloud beta compute backend-services list

Copy the name of the backend service that includes the name of the Service, such as neg-demo-svc. Then, get the health status of the backend service:

gcloud compute backend-services get-health [BACKEND_SERVICE_NAME] --global
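
To check the result in a script rather than by eye, the output can be summarized. The following is a minimal sketch, assuming that the default get-health output contains one healthState: entry per endpoint with the value HEALTHY or UNHEALTHY; the helper name summarize_health is hypothetical.

```shell
# Hypothetical helper: read get-health output on stdin and print
# "healthy=N unhealthy=M". Succeeds only if at least one endpoint is
# HEALTHY and none is UNHEALTHY.
summarize_health() {
  awk '
    /healthState:[ \t]*HEALTHY/   { healthy++ }
    /healthState:[ \t]*UNHEALTHY/ { unhealthy++ }
    END {
      printf "healthy=%d unhealthy=%d\n", healthy, unhealthy
      if (healthy > 0 && unhealthy == 0) exit 0
      exit 1
    }
  '
}
```

For example: gcloud compute backend-services get-health [BACKEND_SERVICE_NAME] --global | summarize_health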

Verifying Ingress functionality

Another way you can test that the load balancer functions as expected is by scaling the sample Deployment, sending test requests to the Ingress, and verifying that the correct number of replicas respond.

The following command scales the neg-demo-app Deployment from one instance to two instances:

kubectl scale deployment neg-demo-app --replicas 2

Wait a few minutes for the rollout to complete. To verify that the rollout is complete, run the following command:

kubectl get deployment neg-demo-app

In the command output, verify that there are two available replicas:

neg-demo-app   2         2         2            2           26m
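
The same check can be scripted. The following is a minimal sketch, assuming that the Deployment reports its desired and available counts under spec.replicas and status.availableReplicas; the helper name replicas_available is hypothetical.

```shell
# Hypothetical helper: succeed when the Deployment's available replica count
# equals its desired replica count.
replicas_available() {
  deployment="$1"
  counts="$(kubectl get deployment "$deployment" \
      -o jsonpath='{.spec.replicas} {.status.availableReplicas}')"
  desired="${counts%% *}"    # text before the first space
  available="${counts#* }"   # text after the first space
  echo "desired=${desired:-0} available=${available:-0}"
  # A missing value prints as empty and is treated as "not yet available".
  [ -n "$desired" ] && [ -n "$available" ] && [ "$desired" -eq "$available" ]
}
```

For example: replicas_available neg-demo-app && echo "rollout complete"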

Then, run the following command to count the number of distinct responses from the load balancer:

for i in `seq 1 100`; do \
  curl --connect-timeout 1 -s [IP_ADDRESS] && echo; \
done  | sort | uniq -c

where [IP_ADDRESS] is the Ingress' IP address. You can get the Ingress' IP address from kubectl describe ingress neg-demo-ing.

In the command output, observe that the number of distinct responses is the same as the number of the replicas, indicating that all backend Pods are serving traffic:

44 neg-demo-app-7f7dfd7bc6-dcn95
56 neg-demo-app-7f7dfd7bc6-jrmzf
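
The comparison between distinct responses and replicas can also be automated. The following is a minimal sketch that reads the output of the loop above; the helper name check_responders is hypothetical.

```shell
# Hypothetical helper: read `uniq -c` output on stdin and succeed only if the
# number of distinct non-empty responses equals the expected replica count.
check_responders() {
  expected="$1"
  # Each `uniq -c` line is "<count> <response>"; lines that carry only a count
  # (empty responses) are ignored.
  distinct="$(awk 'NF >= 2 { n++ } END { print n + 0 }')"
  echo "distinct responders: $distinct (expected $expected)"
  [ "$distinct" -eq "$expected" ]
}
```

To use it, pipe the loop's output into the helper by appending | check_responders 2 after uniq -c.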

Cleaning up

After completing the tasks on this page, follow these steps to remove the resources and avoid incurring unwanted charges on your account:

Delete the cluster

To delete the cluster with gcloud, run the following command:

gcloud container clusters delete neg-demo-cluster

Alternatively, delete the cluster in GCP Console:

  1. Visit the Google Kubernetes Engine menu in GCP Console.

  2. Select neg-demo-cluster.

  3. Click Delete.


The following sections explain how to resolve common issues related to container-native load balancing.

Cannot create a cluster with alias IPs

When you attempt to create a cluster with alias IPs, you might encounter the
following error:
ResponseError: code=400, message=IP aliases cannot be used with a legacy network.
Potential causes
You encounter this error if you attempt to create a cluster with alias IPs that also uses a legacy network.

Ensure that you do not create a cluster with alias IPs and a legacy network enabled simultaneously. For more information about using alias IPs, refer to Creating VPC-native clusters using Alias IPs.

Traffic does not reach endpoints

502 errors or rejected connections.
Potential causes
New endpoints generally become reachable after attaching them to the load balancer, provided that they respond to health checks. You might encounter 502 errors or rejected connections if traffic cannot reach the endpoints.

To resolve this issue, verify that firewall rules allow incoming TCP traffic to your endpoints from the source IP ranges that Google Cloud load balancers and health checks use. To learn more, refer to Adding Health Checks in the Cloud Load Balancing documentation.

View the backend services in your project. The relevant backend service includes the name of the corresponding Google Kubernetes Engine Service:

gcloud beta compute backend-services list

Retrieve the backend health status from the backend service:

gcloud beta compute backend-services get-health [BACKEND_SERVICE_NAME]

If all backends are unhealthy, your firewall, Ingress, or Service might be misconfigured.

If some backends are unhealthy for a short period of time, network programming latency might be the cause.

If some backends do not appear in the list of backend services, programming latency might be the cause. You can verify this by running the following command, where [NEG] is the name of the backend service (NEGs and backend services share the same name):

gcloud beta compute network-endpoint-groups list-network-endpoints [NEG]

Check whether all the expected endpoints are in the NEG.

Known issues

Container-native load balancing on Google Kubernetes Engine has the following known issues in beta:

Aligning workload rollouts with endpoint propagation

When you deploy a workload to your cluster, or when you update an existing workload, the container-native load balancer can take longer to propagate new endpoints than it takes to finish the workload rollout. The sample Deployment that you deploy in this guide uses two fields to align its rollout with the propagation of endpoints: terminationGracePeriodSeconds and minReadySeconds.

terminationGracePeriodSeconds allows the Pod to shut down gracefully by waiting for connections to terminate after a Pod is scheduled for deletion.

minReadySeconds adds a latency period after a Pod is created. You specify a minimum number of seconds for which a new Pod should be in Ready status, without any of its containers crashing, for the Pod to be considered available.

You should configure your workloads' minReadySeconds and terminationGracePeriodSeconds values to be 60 seconds or higher to ensure that service is not disrupted due to workload rollouts.

terminationGracePeriodSeconds is available in all Pod specifications, and minReadySeconds is available for Deployments and DaemonSets.
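
To confirm that an existing Deployment follows this recommendation, both fields can be read back from the cluster. The following is a minimal sketch: the helper name check_rollout_timing is hypothetical, and the 60-second default threshold comes from the recommendation above.

```shell
# Hypothetical helper: succeed only if the Deployment sets both
# minReadySeconds and terminationGracePeriodSeconds to at least MIN seconds.
# Usage: check_rollout_timing DEPLOYMENT_NAME [MIN]
check_rollout_timing() {
  deployment="$1"
  min="${2:-60}"
  values="$(kubectl get deployment "$deployment" \
      -o jsonpath='{.spec.minReadySeconds} {.spec.template.spec.terminationGracePeriodSeconds}')"
  ready="${values%% *}"   # text before the first space
  grace="${values#* }"    # text after the first space
  # An unset field prints as empty; treat it as 0 so the comparison fails.
  ready="${ready:-0}"
  grace="${grace:-0}"
  echo "minReadySeconds=$ready terminationGracePeriodSeconds=$grace"
  [ "$ready" -ge "$min" ] && [ "$grace" -ge "$min" ]
}
```

For example: check_rollout_timing neg-demo-app || echo "increase the rollout timing values"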

To learn more about fine-tuning rollouts, refer to RollingUpdateStrategy.

Incomplete garbage collection

Google Kubernetes Engine garbage collects container-native load balancers every ten minutes. If a cluster is deleted before load balancers are fully removed, you need to manually delete the load balancer's NEGs.

View the NEGs in your project by running the following command:

gcloud beta compute network-endpoint-groups list

In the command output, look for the relevant NEGs.

To delete a NEG, run the following command, where [NEG] is the name of the NEG:

gcloud beta compute network-endpoint-groups delete [NEG]
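
When several NEGs are left behind, it can help to generate the delete commands and review them before running anything. The following is a minimal dry-run sketch: the helper name print_neg_deletes is hypothetical, it only prints commands, and it assumes that NEGs are zonal resources deleted with a --zone flag.

```shell
# Hypothetical helper: read "NAME ZONE" pairs on stdin and print, without
# running it, a delete command for each NEG whose name contains PATTERN.
print_neg_deletes() {
  pattern="$1"
  while read -r name zone; do
    case "$name" in
      *"$pattern"*)
        echo "gcloud beta compute network-endpoint-groups delete $name --zone $zone"
        ;;
    esac
  done
}
```

For example, assuming that the list command accepts the format projection value(name,zone.basename()): gcloud beta compute network-endpoint-groups list --format='value(name,zone.basename())' | print_neg_deletes neg-demo-svc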

Scale-to-zero workloads interruption

Scale-to-zero workloads might experience momentary interruptions when the number of endpoints in a NEG transitions from zero to non-zero and vice versa. During such interruptions, the load balancer might return non-200 responses and backends might appear unhealthy.
