Internal load balancing

This page explains how to create a Compute Engine internal load balancer on Google Kubernetes Engine.


Internal Load Balancing makes your cluster's services accessible to applications outside of your cluster that use the same VPC network and that are located in the same GCP region. For example, suppose you have a cluster in the us-west1 region and you need to make its services accessible to some Compute Engine VM instances running in that region on the same VPC network. You can do so by adding an internal load balancer to one of your cluster's Service resources.

Without Internal Load Balancing, you would need to set up external load balancers, create firewall rules to limit the access, and set up network routes to make the IP address of the application accessible outside of the cluster.

Internal Load Balancing creates a private (RFC 1918) LoadBalancer Ingress IP address in the cluster for receiving traffic on the same VPC network within the same compute region.

You create an internal load balancer by using kubectl to create a Service resource that carries the annotation cloud.google.com/load-balancer-type: "Internal" and a LoadBalancer specification.


You are charged according to Compute Engine's pricing model. For more information, refer to the Internal Load Balancing pricing page.

Before you begin

Read about the limitations of Internal Load Balancing.

To prepare for this task, perform the following steps:

  • Ensure that you have enabled the Google Kubernetes Engine API.
  • Ensure that you have installed the Cloud SDK.
  • Set your default project ID:
    gcloud config set project [PROJECT_ID]
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone [COMPUTE_ZONE]
  • If you are working with regional clusters, set your default compute region:
    gcloud config set compute/region [COMPUTE_REGION]
  • Update gcloud to the latest version:
    gcloud components update

Creating an Internal Load Balancer

The following sections explain how to create an internal load balancer using a Service. Internal load balancers support Service parameters, such as externalTrafficPolicy, sessionAffinity, and loadBalancerSourceRanges.
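
For example, the following is a minimal sketch of a spec fragment that combines these parameters. The values shown (Local, ClientIP, and 10.0.0.0/8) are illustrative; a full annotated configuration file follows in the next section:

spec:
  type: LoadBalancer
  externalTrafficPolicy: Local # serve traffic only from nodes running a local Pod, preserving client IPs
  sessionAffinity: ClientIP # send requests from a given client to the same Pod
  loadBalancerSourceRanges:
  - 10.0.0.0/8 # restrict load-balanced traffic to this RFC 1918 range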

Writing the Service Configuration File

The following is an example of a Service, service.yaml, that creates an internal load balancer:

apiVersion: v1
kind: Service
metadata:
  name: [SERVICE_NAME]
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    [KEY]: [VALUE]
spec:
  type: LoadBalancer
  loadBalancerIP: [IP_ADDRESS] # if omitted, an IP is generated
  loadBalancerSourceRanges:
  - [IP_RANGE] # defaults to 0.0.0.0/0
  ports:
  - name: [PORT_NAME]
    port: 9000
    protocol: TCP # default; can also specify UDP
  selector:
    [KEY]: [VALUE] # label selector for Pods to target

Your Service configuration file must contain the following:

  • [SERVICE_NAME], the name you choose for the Service
  • The annotation cloud.google.com/load-balancer-type: "Internal", which specifies that an internal load balancer is to be configured
  • The type LoadBalancer and port fields.

You should also include the following:

  • a spec: loadBalancerSourceRanges array to specify one or more RFC 1918 ranges used by your VPC Networks, Subnetworks, or VPN Gateways. loadBalancerSourceRanges restricts traffic through the load balancer to the IPs specified in this field. If you do not set this field manually, the field defaults to 0.0.0.0/0, which allows all IPv4 traffic to reach the nodes.
  • a spec: selector field to specify the Pods the Service should target. For example, the selector might target Pods labelled app: web.

You can also include the following optional fields:

  • spec: loadBalancerIP enables you to choose a specific IP address for the load balancer. The IP address must not be in use by another internal load balancer or Service. If omitted, an ephemeral IP is assigned. For more information about reserving private IP addresses within subnets, see Reserving a Static Internal IP Address.
  • spec: ports: protocol defines the network protocol the internal load balancer's port should use. If omitted, the port uses TCP.

For more information about configuring loadBalancerSourceRanges to restrict access to your internal load balancer, refer to Configure Your Cloud Provider's Firewalls. For more information about the Service specification, see the Service API reference.
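
For reference, the following is a filled-in sketch of service.yaml. The echo name, the app: echo label, the port name, and the 10.128.0.0/20 source range are illustrative values; the label matches the inspection output shown later on this page and assumes a workload whose Pods are labelled app: echo:

apiVersion: v1
kind: Service
metadata:
  name: echo
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: echo
spec:
  type: LoadBalancer
  sessionAffinity: ClientIP
  loadBalancerSourceRanges:
  - 10.128.0.0/20
  ports:
  - name: echo-port
    port: 9000
    protocol: TCP
  selector:
    app: echo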

Deploying the Service

To create the internal load balancer, run the following command in your shell or terminal window:

kubectl apply -f service.yaml

Inspecting the Service

After deployment, inspect the Service to verify that it has been configured successfully.


To inspect the internal load balancer using kubectl, run the following command:

kubectl describe service [SERVICE_NAME]

The command's output is similar to the following:

Name:                 [SERVICE_NAME]
Namespace:            default
Labels:               app=echo
Selector:             app=echo
Type:                 LoadBalancer
IP:                   [CLUSTER_IP]
LoadBalancer Ingress: [LOAD_BALANCER_IP]
Port:                 9000/TCP
NodePort:             30387/TCP
Session Affinity:     ClientIP

In this output, IP is the Service's cluster IP address, and LoadBalancer Ingress is the internal load balancer's IP address.
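
If you only need the internal load balancer's IP address, for example in a script, you can fetch it directly with a jsonpath query:

kubectl get service [SERVICE_NAME] -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

This prints the LoadBalancer Ingress address once the load balancer has been provisioned; until then, the output is empty.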


To inspect the internal load balancer from the GCP Console, perform the following steps:

  1. Visit the Google Kubernetes Engine Services menu in GCP Console.


  2. Select the desired Service.

The Service details menu includes the following:

  • External endpoints
  • Cluster IP
  • Load balancer IP
  • A list of Pods served by the Service

Using the Internal Load Balancer

You can access the Service from within the cluster using the cluster IP address. To access the Service from outside the cluster, use the LoadBalancer Ingress IP address.
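
For example, from a Compute Engine VM instance in the same region and VPC network, you could reach the Service defined earlier with a command like the following, assuming the Pods behind the Service answer HTTP requests on port 9000 ([LOAD_BALANCER_IP] is the LoadBalancer Ingress address):

curl http://[LOAD_BALANCER_IP]:9000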

Considerations for Existing Ingresses

If your cluster has an existing Ingress resource, that resource must use the balancing mode RATE. UTILIZATION balancing mode is not compatible with internal load balancers.

BackendService resources created by earlier Kubernetes Ingress objects were created with no balancing mode specified. By default, the API used the UTILIZATION balancing mode for HTTP load balancers. However, internal load balancers cannot point to instance groups that other load balancers are already using in UTILIZATION mode.

To ensure compatibility with an internal load balancer and Ingress resources, you may need to perform some manual steps.

Determining if your Ingress is Compatible

To determine if your Ingress is compatible, run the following commands from your shell or terminal window:

GROUPNAME=`kubectl get configmaps ingress-uid -o jsonpath='k8s-ig--{.data.uid}' --namespace=kube-system`
gcloud compute backend-services list --format="table(name,backends[].balancingMode,backends[].group)" | grep $GROUPNAME

The first command sets a shell variable, GROUPNAME, that holds your cluster's instance group name. The second command lists your project's Compute Engine BackendService resources and narrows the results down based on the contents of $GROUPNAME.

The output is similar to the following:

k8s-be-31210--...  [u'RATE']       us-central1-b/instanceGroups/k8s-ig--...
k8s-be-32125--...  [u'RATE']       us-central1-b/instanceGroups/k8s-ig--...

If the output returns RATE entries, or returns no entries at all, then internal load balancers are compatible and no additional work is needed.

If the output returns entries marked with UTILIZATION, your Ingresses are not compatible.

Updating your Existing Ingresses

The balancing mode of an Ingress can change only when there are no existing HTTP(S) load balancers pointing to the cluster.

To update your Ingress resources to be compatible with an internal load balancer, you can create a new cluster running Kubernetes version 1.7.2 or higher, then migrate your services to that cluster. Migrating to the new cluster ensures that no Ingresses can exist with the incompatible balancing mode.
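
For example, the following is a sketch of the first step of such a migration; [CLUSTER_NAME] and the zone are illustrative, and you would then redeploy your workloads and Services to the new cluster:

gcloud container clusters create [CLUSTER_NAME] \
    --cluster-version=1.7.2 \
    --zone=us-west1-a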

Restrictions for Internal Load Balancers

  • Your master and nodes must be running Kubernetes version 1.7.2 or higher.
  • Internal load balancers are only accessible from within the same network and region.
  • Internal load balancer ports can only serve traffic on one type of protocol, TCP or UDP. The internal load balancer uses the protocol of the first port specified in the Service definition.
  • Internal load balancers do not support using UDP and sessionAffinity: ClientIP together.
  • For clusters running Kubernetes version 1.7.4 or later, you can use internal load balancers with custom-mode subnets in addition to auto-mode subnets.
  • For clusters running Kubernetes 1.7.X, the clusterIP remains unchanged, but internal load balancer IP addresses cannot be reserved. Changes made to ports, protocols, or session affinity may cause the load balancer's IP address to change.
  • Specifying all ports with an internal load balancer forwarding rule (in beta) is not currently supported with Google Kubernetes Engine (GKE).


The limit for the number of internal forwarding rules (ILBs) is 50 per VPC network and Shared VPC network. If a VPC is peered with another VPC, the limit is shared across all peered VPCs.

A Kubernetes Service of type LoadBalancer with the "Internal" annotation creates an ILB that targets the Kubernetes Service. Hence, GKE does not support creating more than 50 such Kubernetes Services in a single VPC or Shared VPC network, including peered VPCs.
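
One way to see how close you are to this limit is to count the internal forwarding rules in your project. Note that this counts rules across the whole project, while the limit applies per VPC network:

gcloud compute forwarding-rules list \
    --filter="loadBalancingScheme=INTERNAL" \
    --format="value(name)" | wc -l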

In a GKE cluster, an internal forwarding rule points to all the nodes in the cluster. Each node in the cluster is a backend VM for the ILB. The maximum number of backend VMs for an ILB is 250, regardless of how the VMs are associated with instance groups. So the maximum number of nodes in a GKE cluster with an ILB is 250. If you have autoscaling enabled for your cluster, you must ensure that autoscaling does not scale your cluster beyond 250 nodes.
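
For example, assuming the standard gcloud autoscaling flags, you could cap a node pool as follows; [CLUSTER_NAME] and [POOL_NAME] are placeholders, and if the cluster has several node pools, the combined maximum across pools must stay within the 250-node limit:

gcloud container clusters update [CLUSTER_NAME] \
    --enable-autoscaling --min-nodes=1 --max-nodes=250 \
    --node-pool=[POOL_NAME]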

For more information about these limits, see VPC Resource Quotas.

For information on the limitations of internal load balancers, see the Limits section of the Internal Load Balancing page.
