Internal TCP/UDP Load Balancing

This page explains how to create a Compute Engine internal TCP/UDP load balancer on Google Kubernetes Engine.

Overview

Internal TCP/UDP Load Balancing makes your cluster's services accessible to applications outside of your cluster that use the same VPC network and are located in the same Google Cloud region. For example, suppose you have a cluster in the us-west1 region. With Internal TCP/UDP Load Balancing, you can make one of its services accessible to Compute Engine VM instances running in that region on the same VPC network.

You can create an internal TCP/UDP load balancer by creating a Service resource with the cloud.google.com/load-balancer-type: "Internal" annotation and a type: LoadBalancer specification. The instructions and example below highlight how to do this.

Without Internal TCP/UDP Load Balancing, you would need to set up an external load balancer and firewall rules to make the application accessible outside of the cluster.

Internal TCP/UDP Load Balancing creates a private (RFC 1918) IP address for the cluster that receives traffic on the network within the same compute region.

Pricing

You are charged per Compute Engine's pricing model. For more information, see Load balancing and forwarding rules pricing and the Compute Engine page on the Google Cloud pricing calculator.

Before you begin

To prepare for this task, perform the following steps:

  • Ensure that you have enabled the Google Kubernetes Engine API.
  • Ensure that you have installed the Cloud SDK.
  • Set your default project ID:
    gcloud config set project [PROJECT_ID]
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone [COMPUTE_ZONE]
  • If you are working with regional clusters, set your default compute region:
    gcloud config set compute/region [COMPUTE_REGION]
  • Update gcloud to the latest version:
    gcloud components update

Create a Deployment

The following manifest describes a Deployment that runs 3 replicas of a Hello World app.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
spec:
  selector:
    matchLabels:
      app: hello
  replicas: 3
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: "gcr.io/google-samples/hello-app:2.0"

The source code and Dockerfile for this sample app are available on GitHub. Since no PORT environment variable is specified, the containers listen on the default port: 8080.

To create the Deployment, create the file my-deployment.yaml from the manifest, and then run the following command in your shell or terminal window:

kubectl apply -f my-deployment.yaml
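
Before creating the Service, it can help to confirm that the Deployment has finished rolling out. The following is a minimal sketch, not part of the original procedure; verify_deployment is a hypothetical helper name, and the 120s default timeout is an assumed value.

```shell
# Hypothetical helper: wait for a Deployment rollout, then list its Pods.
# $1 = Deployment name, $2 = label selector, $3 = optional timeout (assumed default: 120s)
verify_deployment() {
  kubectl rollout status "deployment/$1" --timeout="${3:-120s}" &&
    kubectl get pods --selector "$2" --output wide
}
```

For the Deployment in this example, you would call verify_deployment hello-app app=hello.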

Create an internal TCP load balancer

The following sections explain how to create an internal TCP load balancer using a Service.

Writing the Service configuration file

The following is an example of a Service that creates an internal TCP load balancer:

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: hello
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP

Minimum Service requirements

Your manifest must contain the following:

  • A name for the Service, in this case ilb-service.
  • The annotation: cloud.google.com/load-balancer-type: "Internal", which specifies that an internal TCP/UDP load balancer is to be configured.
  • The type: LoadBalancer.
  • A spec: selector field to specify the Pods the Service should target, for example, app: hello.
  • The port, the port over which the Service is exposed, and targetPort, the port on which the containers are listening.
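
The requirements above can be spot-checked before you apply the manifest. The sketch below is a rough text-based check, not a YAML parser, and it is not part of the original instructions; check_ilb_manifest is a hypothetical helper name, and it does not verify the Service name.

```shell
# Hypothetical helper: report which minimum fields appear to be missing
# from a Service manifest. Uses simple pattern matching only.
# $1 = path to the manifest file
check_ilb_manifest() {
  file="$1"
  missing=""
  grep -Eq 'cloud\.google\.com/load-balancer-type:[[:space:]]*"?Internal"?' "$file" || missing="$missing annotation"
  grep -Eq '^[[:space:]]*type:[[:space:]]*LoadBalancer[[:space:]]*$' "$file" || missing="$missing type"
  grep -Eq '^[[:space:]]*selector:' "$file" || missing="$missing selector"
  grep -Eq '^[[:space:]]*(-[[:space:]]*)?port:' "$file" || missing="$missing port"
  grep -Eq '^[[:space:]]*(-[[:space:]]*)?targetPort:' "$file" || missing="$missing targetPort"
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

For example, check_ilb_manifest my-service.yaml prints ok for the manifest shown earlier.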

Deploying the Service

To create the internal TCP load balancer, create the file my-service.yaml from the manifest, and then run the following command in your shell or terminal window:

kubectl apply -f my-service.yaml

Inspecting the Service

After deployment, inspect the Service to verify that it has been configured successfully.

Get detailed information about the Service:

kubectl get service ilb-service --output yaml

In the output, you can see the internal load balancer's IP address under status.loadBalancer.ingress. Notice that this is different from the value of clusterIP. In this example, the load balancer's IP address is 10.128.15.193:

apiVersion: v1
kind: Service
metadata:
  ...
  labels:
    app: hello
  name: ilb-service
  ...
spec:
  clusterIP: 10.0.9.121
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30835
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: hello
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 10.128.15.193

Any Pod that has the label app: hello is a member of this Service. These are the Pods that can be the final recipients of requests sent to your internal load balancer.

Clients call the Service by using the loadBalancer IP address and the TCP port specified in the port field of the Service manifest. The request is forwarded to one of the member Pods on the TCP port specified in the targetPort field. So for the preceding example, a client calls the Service at 10.128.15.193 on TCP port 80. The request is forwarded to one of the member Pods on TCP port 8080. Note that the member Pod must have a container listening on port 8080.

The nodePort value of 30835 is extraneous; it is not relevant to your internal load balancer.
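
If you only need the load balancer's IP address, for example to use it in a script, you can read it directly from status.loadBalancer.ingress with a JSONPath expression. This is a minimal sketch; ilb_ip is a hypothetical helper name, and it assumes the Service has exactly one ingress entry.

```shell
# Hypothetical helper: print the internal load balancer IP address of a Service.
# $1 = Service name
ilb_ip() {
  kubectl get service "$1" --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

For the Service in this example, ilb_ip ilb-service would print the address shown under status.loadBalancer.ingress. The output is empty until the load balancer has been provisioned.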

Viewing the load balancer's forwarding rule

An internal load balancer is implemented as a forwarding rule. The forwarding rule has a backend service, which has an instance group.

The internal load balancer address, 10.128.15.193 in the preceding example, is the same as the forwarding rule address. To see the forwarding rule that implements your internal load balancer, start by listing all of the forwarding rules in your project:

gcloud compute forwarding-rules list --filter="loadBalancingScheme=INTERNAL"

In the output, look for the forwarding rule that has the same address as your internal load balancer, 10.128.15.193 in this example.

NAME                          ... IP_ADDRESS  ... TARGET
...
aae3e263abe0911e9b32a42010a80008  10.128.15.193   us-central1/backendServices/aae3e263abe0911e9b32a42010a80008

The output shows the associated backend service, aae3e263abe0911e9b32a42010a80008 in this example.

Describe the backend service:

gcloud compute backend-services describe aae3e263abe0911e9b32a42010a80008 --region us-central1

The output shows the associated instance group, k8s-ig--2328fa39f4dc1b75 in this example:

backends:
- balancingMode: CONNECTION
  group: .../us-central1-a/instanceGroups/k8s-ig--2328fa39f4dc1b75
...
kind: compute#backendService
loadBalancingScheme: INTERNAL
name: aae3e263abe0911e9b32a42010a80008
...
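
Rather than scanning the forwarding rule list by eye, you can narrow it to the row that carries your load balancer's address. The sketch below is one way to do that; find_forwarding_rule is a hypothetical helper name, and it simply filters the text output of the list command shown earlier.

```shell
# Hypothetical helper: print the internal forwarding rule row for an IP address.
# $1 = internal load balancer IP address
find_forwarding_rule() {
  gcloud compute forwarding-rules list --filter="loadBalancingScheme=INTERNAL" |
    grep -F -w -- "$1"
}
```

For example, find_forwarding_rule 10.128.15.193 prints only the row for this example's load balancer. The -w option keeps a shorter address from matching as a prefix of a longer one.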

How the Service abstraction works

When a packet is handled by your forwarding rule, the packet gets forwarded to one of your cluster nodes. When the packet arrives at the cluster node, the addresses and port are as follows:

Destination IP address | Forwarding rule, 10.128.15.193 in this example
Destination TCP port | Service port field, 80 in this example

Note that the forwarding rule (that is, your internal load balancer) does not change the destination IP address or destination port. Instead, iptables rules on the cluster node route the packet to an appropriate Pod. The iptables rules change the destination IP address to a Pod IP address and the destination port to the targetPort value of the Service, 8080 in this example.

Verifying the internal TCP load balancer

SSH into a VM instance that is in the same VPC network and region as your cluster, and run the following command:

curl [LOAD_BALANCER_IP]

Replace [LOAD_BALANCER_IP] with your LoadBalancer Ingress IP address.

The response shows the output of hello-app:

Hello, world!
Version: 2.0.0
Hostname: hello-app-77b45987f7-pw54n

Running the command from outside of the same VPC network or outside the same region results in a timeout error.
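
Because an unreachable load balancer makes curl wait for a long time, a bounded check can be more convenient. The following is a minimal sketch; check_ilb is a hypothetical helper name, and the 5-second default limit is an assumed value.

```shell
# Hypothetical helper: request the load balancer with a time limit.
# $1 = load balancer IP address, $2 = optional limit in seconds (assumed default: 5)
check_ilb() {
  if curl --silent --max-time "${2:-5}" "http://$1/"; then
    return 0
  fi
  echo "no response from $1" >&2
  return 1
}
```

If check_ilb reports no response, confirm that the VM is in the same VPC network and region as the cluster.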

Packets sent from a cluster node to an internal load balancer

Suppose a process running on a cluster node sends a packet to an internal TCP/UDP load balancer. How that packet is forwarded depends on the externalTrafficPolicy of the Service that you used to create the load balancer. The forwarding behavior also depends on whether the node has a member Pod for that Service.

The following table summarizes the forwarding behavior:

externalTrafficPolicy | Node running a member Pod? | Behavior for packets sent from a process running on the node
Cluster | Yes | Packets are delivered to any member Pod, either on the node or on a different node.
Cluster | No | Packets are delivered to any member Pod, which must be on a different node.
Local | Yes | Packets are delivered to any member Pod on the same node.
Local | No | Kubernetes 1.14 and earlier: Packets are dropped. Kubernetes 1.15 and later: Packets are delivered to any member Pod, which must be on a different node.
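
To find out which of these cases applies to a given node, you need the Service's externalTrafficPolicy and the list of nodes that run member Pods. The sketch below gathers both; show_policy_and_pod_nodes is a hypothetical helper name and is not part of the original page.

```shell
# Hypothetical helper: print a Service's externalTrafficPolicy and the
# distinct nodes that currently run its member Pods.
# $1 = Service name, $2 = label selector of the member Pods
show_policy_and_pod_nodes() {
  printf 'externalTrafficPolicy: '
  kubectl get service "$1" --output jsonpath='{.spec.externalTrafficPolicy}'
  printf '\nNodes running member Pods:\n'
  kubectl get pods --selector "$2" \
    --output jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort -u
}
```

For the Service in this example, you would call show_policy_and_pod_nodes ilb-service app=hello.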

Cleaning up

You can delete the Deployment and Service using kubectl delete or Cloud Console.

kubectl

Delete the Deployment

To delete the Deployment, run the following command:

kubectl delete deployment hello-app

Delete the Service

To delete the Service, run the following command:

kubectl delete service ilb-service

Console

Delete the Deployment

To delete the Deployment, perform the following steps:

  1. Visit the Google Kubernetes Engine Workloads menu in Cloud Console.

  2. From the menu, select the desired workload.

  3. Click Delete.

  4. From the confirmation dialog menu, click Delete.

Delete the Service

To delete the Service, perform the following steps:

  1. Visit the Google Kubernetes Engine Services menu in Cloud Console.

  2. From the menu, select the desired Service.

  3. Click Delete.

  4. From the confirmation dialog menu, click Delete.

Considerations for existing Ingresses

You cannot have both an internal TCP/UDP load balancer and an Ingress that uses balancing mode UTILIZATION. To use both an Ingress and internal TCP/UDP load balancing, the Ingress must use the balancing mode RATE.

If your cluster has an existing Ingress resource created with Kubernetes version 1.7.1 or lower, it is not compatible with internal TCP/UDP load balancers. Earlier BackendService resources created by Kubernetes Ingress Resource objects were created with no balancing mode specified. By default, the API used balancing mode UTILIZATION for HTTP load balancers. However, internal TCP/UDP load balancers cannot be pointed to instance groups with other load balancers using UTILIZATION.

Determining your Ingress balancing mode

To determine what your Ingress balancing mode is, run the following commands from your shell or terminal window:

GROUPNAME=`kubectl get configmaps ingress-uid -o jsonpath='k8s-ig--{.data.uid}' --namespace=kube-system`
gcloud compute backend-services list --format="table(name,backends[].balancingMode,backends[].group)" | grep $GROUPNAME

The first command sets a shell variable, GROUPNAME, to your cluster's instance group name. The second command lists your project's Compute Engine backend services and narrows the results to the entries that reference $GROUPNAME.

The output is similar to the following:

k8s-be-31210--...  [u'RATE']       us-central1-b/instanceGroups/k8s-ig--...
k8s-be-32125--...  [u'RATE']       us-central1-b/instanceGroups/k8s-ig--...

If the output contains only RATE entries, or if no entries are returned, then internal load balancers are compatible and no additional work is needed.

If the output returns entries marked with UTILIZATION, your Ingresses are not compatible.
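
The rule above can be applied mechanically to the filtered output. The following is a minimal sketch; ingress_ilb_compat is a hypothetical helper name, and it reads the filtered backend service lines from standard input.

```shell
# Hypothetical helper: classify the filtered backend-services output.
# Any UTILIZATION entry means incompatible; only RATE entries, or no
# entries at all, means compatible.
ingress_ilb_compat() {
  if grep -q 'UTILIZATION'; then
    echo "incompatible"
  else
    echo "compatible"
  fi
}
```

To use it, append | ingress_ilb_compat to the gcloud and grep pipeline shown earlier.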

To update your Ingress resources to be compatible with an internal TCP/UDP load balancer, you can create a new cluster running Kubernetes version 1.7.2 or higher, then migrate your services to that cluster.

Additional Service parameters

Internal TCP/UDP load balancers support the following additional Service parameters:

  • A spec: loadBalancerSourceRanges array to specify one or more RFC 1918 ranges used by your VPC Networks, Subnetworks, or VPN Gateways. loadBalancerSourceRanges restricts traffic through the load balancer to the IPs specified in this field. If you do not set this field manually, the field defaults to 0.0.0.0/0, which allows all IPv4 traffic to reach the nodes.

  • A spec: loadBalancerIP field to choose a specific IP address for the load balancer. The IP address must not be in use by another internal TCP/UDP load balancer or Service. If omitted, an ephemeral IP is assigned. For more information, see Reserving a Static Internal IP Address.

For more information about configuring loadBalancerSourceRanges to restrict access to your internal TCP/UDP load balancer, refer to Configure Your Cloud Provider's Firewalls. For more information about the Service specification, see the Service API reference.
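
As an illustration of these parameters, the sketch below writes a variant of the earlier Service manifest that sets both fields. The helper name write_restricted_ilb_manifest, the address 10.128.15.200, and the range 10.128.0.0/20 are assumptions chosen for this example; substitute an unused address and the ranges from your own VPC network.

```shell
# Hypothetical helper: write a Service manifest that restricts client
# source ranges and requests a specific internal IP address.
# $1 = output file path
write_restricted_ilb_manifest() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: hello
spec:
  type: LoadBalancer
  # Assumed example address; it must not be in use by another load balancer or Service.
  loadBalancerIP: 10.128.15.200
  # Assumed example range; only clients in these ranges can reach the load balancer.
  loadBalancerSourceRanges:
  - 10.128.0.0/20
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
EOF
}
```

After calling write_restricted_ilb_manifest my-service.yaml, apply the file with kubectl apply -f my-service.yaml as before.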

Using all ports

If you create an internal TCP/UDP load balancer by using an annotated Service, there is no way to set up a forwarding rule that uses all ports. However, if you create an internal TCP/UDP load balancer manually, you can choose your Google Kubernetes Engine nodes' instance group as the backend. With that setup, Kubernetes Services of type: NodePort are available through the ILB.

Restrictions for internal TCP/UDP load balancers

  • For clusters running Kubernetes version 1.7.3 and earlier, you could only use internal TCP/UDP load balancers with auto-mode subnets. With Kubernetes version 1.7.4 and later, you can use internal load balancers with custom-mode subnets in addition to auto-mode subnets.
  • For clusters running Kubernetes 1.7.X or later, while the clusterIP remains unchanged, internal TCP/UDP load balancers cannot use reserved IP addresses. The spec.loadBalancerIP field can still be defined using an unused IP address to assign a specific internal IP. Changes made to ports, protocols, or session affinity may cause these IP addresses to change.

Restrictions for internal UDP load balancers

  • Internal UDP load balancers do not support using sessionAffinity: ClientIP.

Limits

A Kubernetes Service with type: LoadBalancer and the cloud.google.com/load-balancer-type: "Internal" annotation creates an ILB that targets the Kubernetes Service. The number of such Services is limited by the number of internal forwarding rules that you can create in a VPC network. For details, see Per network limits.

In a GKE cluster, an internal forwarding rule points to all the nodes in the cluster. Each node in the cluster is a backend VM for the ILB. The maximum number of backend VMs for an ILB is 250, regardless of how the VMs are associated with instance groups. So the maximum number of nodes in a GKE cluster with an ILB is 250. If you have autoscaling enabled for your cluster, you must ensure that autoscaling does not scale your cluster beyond 250 nodes.

For more information about these limits, see VPC Resource Quotas.
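
To see how close a cluster is to the backend limit described above, you can count its nodes. This is a minimal sketch; check_ilb_node_limit is a hypothetical helper name, and the default of 250 is taken from the limit stated in this section.

```shell
# Hypothetical helper: compare the cluster's node count with the ILB backend limit.
# $1 = optional limit (default: 250, the limit stated above)
check_ilb_node_limit() {
  limit="${1:-250}"
  count="$(kubectl get nodes --no-headers | wc -l | tr -d ' ')"
  if [ "$count" -gt "$limit" ]; then
    echo "over limit: $count nodes (max $limit)"
    return 1
  fi
  echo "ok: $count nodes (max $limit)"
}
```

If autoscaling is enabled, also compare the configured maximum node count with this limit, since the check only reflects the current cluster size.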
