Internal TCP/UDP Load Balancing

This page explains how to create a Compute Engine internal TCP/UDP load balancer on Google Kubernetes Engine.

Overview

Internal TCP/UDP Load Balancing makes your cluster's services accessible to applications outside of your cluster that use the same VPC network and are located in the same Google Cloud region. For example, suppose you have a cluster in the us-west1 region and you need to make one of its services accessible to Compute Engine VM instances running in that region on the same VPC network.

You can create an internal TCP/UDP load balancer by creating a Service resource with the cloud.google.com/load-balancer-type: "Internal" annotation and a type: LoadBalancer specification. The instructions and example below highlight how to do this.

Without Internal TCP/UDP Load Balancing, you would need to set up an external load balancer and firewall rules to make the application accessible outside of the cluster.

Internal TCP/UDP Load Balancing creates an internal IP address for the Service that receives traffic from clients in the same VPC network and compute region. If you enable global access, clients in any region of the same VPC network can access the Service.

Pricing

You are charged according to Compute Engine's pricing model. For more information, see Load balancing and forwarding rules pricing and the Compute Engine page on the Google Cloud pricing calculator.

Before you begin

Before you start, make sure you have performed the following tasks:

Set up default gcloud settings using one of the following methods:

  • Using gcloud init, if you want to be walked through setting defaults.
  • Using gcloud config, to individually set your project ID, zone, and region.

Using gcloud init

If you receive the error One of [--zone, --region] must be supplied: Please specify location, complete this section.

  1. Run gcloud init and follow the directions:

    gcloud init

    If you are using SSH on a remote server, use the --console-only flag to prevent the command from launching a browser:

    gcloud init --console-only
  2. Follow the instructions to authorize gcloud to use your Google Cloud account.
  3. Create a new configuration or select an existing one.
  4. Choose a Google Cloud project.
  5. Choose a default Compute Engine zone.

Using gcloud config

  • Set your default project ID:
    gcloud config set project project-id
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone compute-zone
  • If you are working with regional clusters, set your default compute region:
    gcloud config set compute/region compute-region
  • Update gcloud to the latest version:
    gcloud components update

Create a Deployment

The following manifest describes a Deployment that runs 3 replicas of a Hello World app.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
spec:
  selector:
    matchLabels:
      app: hello
  replicas: 3
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: "gcr.io/google-samples/hello-app:2.0"

The source code and Dockerfile for this sample app are available on GitHub. Since no PORT environment variable is specified, the containers listen on the default port, 8080.

To create the Deployment, create the file my-deployment.yaml from the manifest, and then run the following command in your shell or terminal window:

kubectl apply -f my-deployment.yaml
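
To verify that the Deployment is running, you can list its Pods by label. This is an optional check; the label app: hello matches the manifest above:

# Expect three Pods with STATUS Running.
kubectl get pods -l app=hello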

Create an internal TCP load balancer

The following sections explain how to create an internal TCP load balancer using a Service.

Writing the Service configuration file

The following is an example of a Service that creates an internal TCP load balancer:

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: hello
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP

Minimum Service requirements

Your manifest must contain the following:

  • A name for the Service, in this case ilb-service.
  • The annotation: cloud.google.com/load-balancer-type: "Internal", which specifies that an internal TCP/UDP load balancer is to be configured.
  • The type: LoadBalancer.
  • A spec: selector field to specify the Pods the Service should target, for example, app: hello.
  • The port, which is the port over which the Service is exposed, and the targetPort, which is the port on which the containers are listening.

Deploying the Service

To create the internal TCP load balancer, create the file my-service.yaml from the manifest, and then run the following command in your shell or terminal window:

kubectl apply -f my-service.yaml
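
Provisioning the forwarding rule can take a minute or so; until then, the Service's IP shows as pending. As an optional check, you can watch the Service until the address appears:

# The EXTERNAL-IP column shows <pending> until the internal load
# balancer has been provisioned, then shows the internal LB address.
kubectl get service ilb-service --watch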

Inspecting the Service

After deployment, inspect the Service to verify that it has been configured successfully.

Get detailed information about the Service:

kubectl get service ilb-service --output yaml

In the output, you can see the internal load balancer's IP address under status.loadBalancer.ingress. Notice that this is different from the value of clusterIP. In this example, the load balancer's IP address is 10.128.15.193:

apiVersion: v1
kind: Service
metadata:
  ...
  labels:
    app: hello
  name: ilb-service
  ...
spec:
  clusterIP: 10.0.9.121
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30835
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: hello
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 10.128.15.193

Any Pod that has the label app: hello is a member of this Service. These are the Pods that can be the final recipients of requests sent to your internal load balancer.

Clients call the Service by using the loadBalancer IP address and the TCP port specified in the port field of the Service manifest. The request is forwarded to one of the member Pods on the TCP port specified in the targetPort field. So for the preceding example, a client calls the Service at 10.128.15.193 on TCP port 80. The request is forwarded to one of the member Pods on TCP port 8080. Note that the member Pod must have a container listening on port 8080.

The nodePort value of 30835 is extraneous; it is not relevant to your internal load balancer.

Viewing the load balancer's forwarding rule

An internal load balancer is implemented as a forwarding rule. The forwarding rule has a backend service, which has an instance group.

The internal load balancer address, 10.128.15.193 in the preceding example, is the same as the forwarding rule address. To see the forwarding rule that implements your internal load balancer, start by listing all of the forwarding rules in your project:

gcloud compute forwarding-rules list --filter="loadBalancingScheme=INTERNAL"

In the output, look for the forwarding rule that has the same address as your internal load balancer, 10.128.15.193 in this example.

NAME                          ... IP_ADDRESS  ... TARGET
...
aae3e263abe0911e9b32a42010a80008  10.128.15.193   us-central1/backendServices/aae3e263abe0911e9b32a42010a80008

The output shows the associated backend service, aae3e263abe0911e9b32a42010a80008 in this example.

Describe the backend service:

gcloud compute backend-services describe aae3e263abe0911e9b32a42010a80008 --region us-central1

The output shows the associated instance group, k8s-ig--2328fa39f4dc1b75 in this example:

backends:
- balancingMode: CONNECTION
  group: .../us-central1-a/instanceGroups/k8s-ig--2328fa39f4dc1b75
...
kind: compute#backendService
loadBalancingScheme: INTERNAL
name: aae3e263abe0911e9b32a42010a80008
...

How the Service abstraction works

When a packet is handled by your forwarding rule, the packet gets forwarded to one of your cluster nodes. When the packet arrives at the cluster node, the addresses and port are as follows:

  • Destination IP address: the forwarding rule's address, 10.128.15.193 in this example
  • Destination TCP port: the Service's port field, 80 in this example

Note that the forwarding rule (that is, your internal load balancer) does not change the destination IP address or destination port. Instead, iptables rules on the cluster node route the packet to an appropriate Pod. The iptables rules change the destination IP address to a Pod IP address and the destination port to the targetPort value of the Service, 8080 in this example.
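
If you want to see this translation for yourself, you can inspect the NAT table on a cluster node. This is only an optional sketch; the chain names shown here (such as KUBE-SERVICES) are created by kube-proxy in iptables mode and can differ by version and proxy mode:

# Run on a cluster node (for example, over SSH). Lists the NAT rules that
# kube-proxy programmed for Services and filters for the load balancer IP
# from this example.
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.128.15.193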

Verifying the internal TCP load balancer

SSH into a VM instance, and run the following command:

curl load-balancer-ip

Where load-balancer-ip is your LoadBalancer Ingress IP address.

The response shows the output of hello-app:

Hello, world!
Version: 2.0.0
Hostname: hello-app-77b45987f7-pw54n

Running the command from outside the same VPC network, or outside the same region, results in a timeout error. If you configure global access, clients in any region in the same VPC network can access the load balancer.

Cleaning up

You can delete the Deployment and Service using kubectl delete or Cloud Console.

kubectl

Delete the Deployment

To delete the Deployment, run the following command:

kubectl delete deployment hello-app

Delete the Service

To delete the Service, run the following command:

kubectl delete service ilb-service

Console

Delete the Deployment

To delete the Deployment, perform the following steps:

  1. Visit the Google Kubernetes Engine Workloads menu in Cloud Console.

  2. From the menu, select the desired workload.

  3. Click Delete.

  4. In the confirmation dialog, click Delete.

Delete the Service

To delete the Service, perform the following steps:

  1. Visit the Google Kubernetes Engine Services menu in Cloud Console.

  2. From the menu, select the desired Service.

  3. Click Delete.

  4. In the confirmation dialog, click Delete.

Considerations for existing Ingresses

You cannot have both an internal TCP/UDP load balancer and an Ingress that uses balancing mode UTILIZATION. To use both an Ingress and internal TCP/UDP load balancing, the Ingress must use the balancing mode RATE.

If your cluster has an existing Ingress resource created with Kubernetes version 1.7.1 or lower, it is not compatible with internal TCP/UDP load balancers. Earlier BackendService resources created by Kubernetes Ingress Resource objects were created with no balancing mode specified. By default, the API used balancing mode UTILIZATION for HTTP load balancers. However, internal TCP/UDP load balancers cannot be pointed to instance groups with other load balancers using UTILIZATION.

Determining your Ingress balancing mode

To determine what your Ingress balancing mode is, run the following commands from your shell or terminal window:

GROUPNAME=`kubectl get configmaps ingress-uid -o jsonpath='k8s-ig--{.data.uid}' --namespace=kube-system`
gcloud compute backend-services list --format="table(name,backends[].balancingMode,backends[].group)" | grep $GROUPNAME

These commands set a shell variable, GROUPNAME, that holds your cluster's instance group name. Then, your project's Compute Engine backend service resources are listed and the results are narrowed down based on the contents of $GROUPNAME.

The output is similar to the following:

k8s-be-31210--...  [u'RATE']       us-central1-b/instanceGroups/k8s-ig--...
k8s-be-32125--...  [u'RATE']       us-central1-b/instanceGroups/k8s-ig--...

If the output contains RATE entries, or returns no entries at all, then internal load balancers are compatible and no additional work is needed.

If the output returns entries marked with UTILIZATION, your Ingresses are not compatible.

To update your Ingress resources to be compatible with an internal TCP/UDP load balancer, you can create a new cluster running Kubernetes version 1.7.2 or higher, then migrate your services to that cluster.

Service parameters

The following parameters are supported for GKE internal LoadBalancer Services.

| Feature | Summary | Service field | GKE version support |
| --- | --- | --- | --- |
| Local External Traffic Policy | Configures whether external traffic is load balanced across GKE nodes. | spec:externalTrafficPolicy:Local | GKE 1.14+ |
| Load Balancer Source Ranges | Configures optional firewall rules in GKE and in the VPC to allow only certain source ranges. | spec:loadBalancerSourceRanges | All supported versions |
| Load Balancer IP | Specifies an IP address for the load balancer. | spec:loadBalancerIP | All supported versions |
| Load Balancer Subnet | Specifies the subnet from which the load balancer automatically provisions an IP address. | metadata:annotations: networking.gke.io/internal-load-balancer-subnet | Beta in GKE 1.17+ and 1.16.8-gke.10+ |
| Global Access | Allows the TCP/UDP load balancer VIP to be accessible to clients across GCP regions. | metadata:annotations: networking.gke.io/internal-load-balancer-allow-global-access | Beta in GKE 1.16+ |
| All-ports | The ability of the TCP/UDP load balancer to forward all ports instead of specific ports. | N/A | No native support |

External traffic policy

externalTrafficPolicy is a standard Service field that defines how, and whether, traffic arriving at a GKE node is load balanced. Cluster is the default policy, but Local is often used to preserve the source IP of traffic coming into a cluster node. Local effectively disables load balancing on the cluster node, so traffic received by a local Pod sees the original source address.

externalTrafficPolicy is supported for internal LoadBalancer Services (via the TCP/UDP load balancer), but the load balancing behavior depends on where the traffic originates and on the configured traffic policy.

For traffic that originates outside the cluster and reaches the TCP/UDP load balancer, the behavior is as follows, provided there is at least one healthy Pod of the Service in the cluster:

  • Cluster policy: Traffic is load balanced to any healthy GKE node in the cluster, and kube-proxy then sends it to a node that runs one of the member Pods.
  • Local policy: Nodes that do not run one of the backend Pods appear as unhealthy to the TCP/UDP load balancer. Traffic is sent only to one of the remaining healthy cluster nodes that run the Pod. Traffic is not routed again by kube-proxy; instead, it is sent directly to the local Pod with its IP header information intact.

If traffic to a given LoadBalancer Service IP originates from a GKE node inside the cluster, the traffic behaves differently. The following table summarizes the behavior of traffic sent from a node or Pod inside the cluster to a member Pod of a LoadBalancer Service:

| externalTrafficPolicy | Service member Pod running on same node where traffic originates? | Traffic behavior |
| --- | --- | --- |
| Cluster | Yes | Packets are delivered to any member Pod, either on the node or on a different node. |
| Cluster | No | Packets are delivered to any member Pod, which must be on a different node. |
| Local | Yes | Packets are delivered to any member Pod on the same node. |
| Local | No | Kubernetes 1.14 and earlier: Packets are dropped. Kubernetes 1.15 and later: Packets are delivered to any member Pod, which must be on a different node. |
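
For reference, a Service manifest that uses the Local policy differs from the earlier ilb-service example only by one added spec field. The following is a sketch based on that example:

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: hello
spec:
  type: LoadBalancer
  # Preserve the client source IP. Only nodes that run a member Pod
  # are reported as healthy to the internal load balancer.
  externalTrafficPolicy: Local
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP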

Load balancer source ranges

The spec: loadBalancerSourceRanges array specifies one or more internal IP address ranges. loadBalancerSourceRanges restricts traffic through the load balancer to the IPs specified in this field. With this configuration, kube-proxy creates the corresponding iptables rules in Kubernetes nodes. GKE also creates a firewall rule in your VPC network automatically. If you omit this field, your Service accepts traffic from any IP address (0.0.0.0/0).
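
For example, the following sketch restricts access to clients in 10.0.0.0/8; the range is illustrative, and the rest of the manifest follows the earlier ilb-service example:

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  # Only clients whose source address falls in these ranges can reach
  # the load balancer; other traffic is blocked.
  loadBalancerSourceRanges:
  - "10.0.0.0/8"
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP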

For more information about the Service specification, see the Service API reference.

Load balancer IP

The spec: loadBalancerIP field enables you to choose a specific IP address for the load balancer. The IP address must not be in use by another internal TCP/UDP load balancer or Service. If omitted, an ephemeral IP is assigned. For more information, see Reserving a Static Internal IP Address.
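
For example, assuming 10.128.15.200 is an unused internal address in the load balancer's subnet (a placeholder for illustration), the manifest would include:

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  # Request a specific internal IP instead of an ephemeral one.
  # 10.128.15.200 is a placeholder; use an unused address in your subnet.
  loadBalancerIP: 10.128.15.200
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP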

Load balancer subnet (Beta)

By default, GKE deploys an internal TCP/UDP load balancer using an IP address from the node subnet range. You can specify the subnet on a per-Service basis with the networking.gke.io/internal-load-balancer-subnet annotation. This is useful for firewalling the internal load balancer IPs separately from node IPs, or for sharing the same Service subnet across multiple GKE clusters. This parameter is relevant only for internal TCP/UDP LoadBalancer Services.

The subnet must exist before it is referenced by the Service resource, because GKE does not manage the lifecycle of the subnet itself. The subnet must also be in the same VPC network and region as the GKE cluster. In this step, the subnet is created out of band from GKE:

gcloud compute networks subnets create gke-vip-subnet \
    --network=default \
    --range=10.23.0.0/24 \
    --region=us-central1

The following Service definition uses the internal-load-balancer-subnet annotation to reference the subnet by name. By default, an available IP address from the subnet is chosen automatically. You can also specify loadBalancerIP, but it must be part of the referenced subnet.

There are multiple ways to share this internal load balancer subnet to achieve different use cases:

  • Multiple subnets for groups of Services in the same cluster
  • A single subnet for all Services in a cluster
  • A single subnet shared across multiple clusters and multiple Services

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
    networking.gke.io/internal-load-balancer-subnet: "gke-vip-subnet"
  labels:
    app: hello
spec:
  type: LoadBalancer
  loadBalancerIP: 10.23.0.15
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP

Global access (Beta)

Global access is an optional parameter for internal LoadBalancer Services that allows clients from any region in your VPC network to access the internal TCP/UDP load balancer. Without global access, clients in your VPC network must be in the same region as the load balancer. Backend instances must still be located in the same region as the load balancer.

Global access is enabled per-Service using the following annotation: networking.gke.io/internal-load-balancer-allow-global-access: "true".
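
For example, the earlier ilb-service manifest with global access enabled looks like the following sketch:

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
    # Beta: allow clients in any region of the VPC network to reach the VIP.
    networking.gke.io/internal-load-balancer-allow-global-access: "true"
  labels:
    app: hello
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP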

Global access is not supported with legacy networks. Normal inter-region traffic costs apply when using global access across regions; refer to Network pricing for information about egress charges between regions. Global access is available in Beta on GKE clusters running version 1.16 and later.

Shared IP (Beta)

The internal TCP/UDP load balancer allows a virtual IP address to be shared among multiple forwarding rules. This is useful for expanding the number of simultaneous ports on the same IP, or for accepting UDP and TCP traffic on the same IP. It allows up to a maximum of 50 exposed ports per IP address. Shared IPs are supported natively on GKE clusters with internal LoadBalancer Services. When deploying, the Service's loadBalancerIP field indicates which IP should be shared across Services.

Limitations

A shared IP for multiple load balancers has the following limitations and capabilities:

  • Each Service (or forwarding rule) can have a maximum of five ports.
  • A maximum of ten Services (forwarding rules) can share an IP address. This results in a maximum of 50 ports per shared IP.
  • Protocol/port tuples cannot overlap between Services that share the same IP.
  • A combination of TCP-only and UDP-only Services is supported on the same shared IP, however you cannot expose both TCP and UDP ports in the same Service.

Enabling Shared IP

To enable internal LoadBalancer Services to share a common IP address, follow these steps:

  1. Create a static internal IP address with --purpose SHARED_LOADBALANCER_VIP. An IP address must be created with this purpose for it to be shared.

  2. Deploy up to ten internal LoadBalancer Services using this static IP in the loadBalancerIP field. The internal TCP/UDP load balancers are reconciled by the GKE service controller and deployed with the same frontend IP.

The following example demonstrates how to support multiple TCP and UDP ports on the same internal load balancer IP.

  1. Create a static IP in the same region as your GKE cluster. The subnet must be the same subnet that the load balancer uses, which by default is the same subnet that is used by the GKE cluster node IPs.

    gcloud beta compute addresses create internal-10-128-2-98 \
        --region=us-central1 \
        --subnet=default \
        --addresses=10.128.2.98 \
        --purpose=SHARED_LOADBALANCER_VIP
    
  2. Save the following TCP Service configuration to a file named tcp-service.yaml and then deploy to your cluster. It uses the shared IP 10.128.2.98.

    apiVersion: v1
    kind: Service
    metadata:
      name: tcp-service
      namespace: default
      annotations:
        cloud.google.com/load-balancer-type: "Internal"
    spec:
      type: LoadBalancer
      loadBalancerIP: 10.128.2.98
      selector:
        app: myapp
      ports:
      - name: 8001-to-8001
        protocol: TCP
        port: 8001
        targetPort: 8001
      - name: 8002-to-8002
        protocol: TCP
        port: 8002
        targetPort: 8002
      - name: 8003-to-8003
        protocol: TCP
        port: 8003
        targetPort: 8003
      - name: 8004-to-8004
        protocol: TCP
        port: 8004
        targetPort: 8004
      - name: 8005-to-8005
        protocol: TCP
        port: 8005
        targetPort: 8005
    
  3. Apply this Service definition against your cluster:

    kubectl apply -f tcp-service.yaml
    
  4. Save the following UDP Service configuration to a file named udp-service.yaml and then deploy it. It also uses the shared IP 10.128.2.98.

    apiVersion: v1
    kind: Service
    metadata:
      name: udp-service
      namespace: default
      annotations:
        cloud.google.com/load-balancer-type: "Internal"
    spec:
      type: LoadBalancer
      loadBalancerIP: 10.128.2.98
      selector:
        app: my-udp-app
      ports:
      - name: 9001-to-9001
        protocol: UDP
        port: 9001
        targetPort: 9001
      - name: 9002-to-9002
        protocol: UDP
        port: 9002
        targetPort: 9002
    
  5. Apply this file against your cluster:

    kubectl apply -f udp-service.yaml
    
  6. Validate that the VIP is shared among the load balancer forwarding rules by listing them and filtering for the static IP. The output shows a UDP forwarding rule and a TCP forwarding rule, together listening across seven different ports on the shared IP address 10.128.2.98.

    gcloud compute forwarding-rules list | grep 10.128.2.98
    ab4d8205d655f4353a5cff5b224a0dde                         us-central1   10.128.2.98     UDP          us-central1/backendServices/ab4d8205d655f4353a5cff5b224a0dde
    acd6eeaa00a35419c9530caeb6540435                         us-central1   10.128.2.98     TCP          us-central1/backendServices/acd6eeaa00a35419c9530caeb6540435
    

All-ports

If you create an internal TCP/UDP load balancer by using an annotated Service, there is no way to set up a forwarding rule that uses all ports. However, if you create an internal TCP/UDP load balancer manually, you can choose your Google Kubernetes Engine nodes' instance group as the backend. Kubernetes Services of type: NodePort are available through the ILB.
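
As a rough sketch of the manual approach, assuming you have already created a regional internal backend service (called my-backend-service here, a placeholder) whose backends are your cluster's node instance groups, the forwarding rule can be created with all ports. The resource names and network values below are illustrative and are not managed by GKE:

# Illustrative only: my-backend-service, the network, and the subnet are
# placeholders for resources that you create and manage yourself.
gcloud compute forwarding-rules create my-ilb-all-ports \
    --load-balancing-scheme=INTERNAL \
    --network=default \
    --subnet=default \
    --region=us-central1 \
    --ip-protocol=TCP \
    --ports=ALL \
    --backend-service=my-backend-service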

Restrictions for internal TCP/UDP load balancers

  • For clusters running Kubernetes version 1.7.3 and earlier, you can use internal TCP/UDP load balancers only with auto-mode subnets. With Kubernetes version 1.7.4 and later, you can use internal load balancers with custom-mode subnets as well as auto-mode subnets.
  • For clusters running Kubernetes 1.7.X or later, while the clusterIP remains unchanged, internal TCP/UDP load balancers cannot use reserved IP addresses. The spec.loadBalancerIP field can still be defined using an unused IP address to assign a specific internal IP. Changes made to ports, protocols, or session affinity may cause this IP address to change.

Restrictions for internal UDP load balancers

  • Internal UDP load balancers do not support using sessionAffinity: ClientIP.

Limits

A Kubernetes Service with type: LoadBalancer and the cloud.google.com/load-balancer-type: Internal annotation creates an ILB that targets the Kubernetes Service. The number of such Services is limited by the number of internal forwarding rules that you can create in a VPC network. For details, see Per network limits.

In a GKE cluster, an internal forwarding rule points to all the nodes in the cluster. Each node in the cluster is a backend VM for the ILB. The maximum number of backend VMs for an ILB is 250, regardless of how the VMs are associated with instance groups. So the maximum number of nodes in a GKE cluster with an ILB is 250. If you have autoscaling enabled for your cluster, you must ensure that autoscaling does not scale your cluster beyond 250 nodes.
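
For example, if you use the cluster autoscaler, you can cap node counts when enabling it on a node pool. The names and values below are illustrative; note that the limit applies per node pool (and per zone for regional clusters), so choose limits that keep the cluster's total below 250 nodes:

# Cap the default node pool so the cluster cannot grow past the ILB limit.
gcloud container clusters update my-cluster \
    --enable-autoscaling \
    --min-nodes=3 \
    --max-nodes=250 \
    --node-pool=default-pool \
    --zone=us-central1-a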

For more information about these limits, see VPC Resource Quotas.

What's next