Using an internal TCP/UDP load balancer

This page explains how to create an internal TCP/UDP load balancer on Google Kubernetes Engine (GKE).

Overview

Internal TCP/UDP Load Balancing makes your cluster's services accessible to applications outside of your cluster that use the same VPC network and are located in the same Google Cloud region. For example, suppose you have a cluster in the us-west1 region and you need to make one of its services accessible to Compute Engine virtual machine (VM) instances running in that region on the same VPC network.

Without Internal TCP/UDP Load Balancing, you would need to set up an external load balancer and firewall rules to make the application accessible outside the cluster.

Annotation

You can create an internal TCP/UDP load balancer by creating a Service resource with a type: LoadBalancer specification and an annotation. The annotation depends on the version of your GKE cluster.

For GKE versions 1.17 and later, use the annotation networking.gke.io/load-balancer-type: "Internal".

For earlier versions, use the annotation cloud.google.com/load-balancer-type: "Internal".
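
For example, on a cluster running a version earlier than 1.17, the Service metadata carries the legacy annotation. The following is a minimal sketch that mirrors the full example shown later on this page:

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    # Legacy annotation; on GKE 1.17 and later, use networking.gke.io/load-balancer-type instead.
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP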

Architecture

Internal TCP/UDP Load Balancing creates an internal IP address for the Service that receives traffic from clients in the same VPC network and compute region. If you enable global access, clients in any region of the same VPC network can access the Service. In addition, clients in a VPC network connected to the LoadBalancer network using VPC Network Peering can also access the Service.

The GKE Ingress controller deploys and manages load balancing resources. To learn more about the GKE Ingress controller, see Summary of GKE Ingress controller behavior.

Pricing

You are charged according to Compute Engine's pricing model. For more information, see Load balancing and forwarding rules pricing and the Compute Engine page on the Google Cloud pricing calculator.

Before you begin

Before you start, make sure you have performed the following tasks:

Set up default gcloud settings using one of the following methods:

  • Using gcloud init, if you want to be walked through setting defaults.
  • Using gcloud config, to individually set your project ID, zone, and region.

Using gcloud init

If you receive the error One of [--zone, --region] must be supplied: Please specify location, complete this section.

  1. Run gcloud init and follow the directions:

    gcloud init

    If you are using SSH on a remote server, use the --console-only flag to prevent the command from launching a browser:

    gcloud init --console-only
  2. Follow the instructions to authorize gcloud to use your Google Cloud account.
  3. Create a new configuration or select an existing one.
  4. Choose a Google Cloud project.
  5. Choose a default Compute Engine zone for zonal clusters or a region for regional or Autopilot clusters.

Using gcloud config

  • Set your default project ID:
    gcloud config set project PROJECT_ID
  • If you are working with zonal clusters, set your default compute zone:
    gcloud config set compute/zone COMPUTE_ZONE
  • If you are working with Autopilot or regional clusters, set your default compute region:
    gcloud config set compute/region COMPUTE_REGION
  • Update gcloud to the latest version:
    gcloud components update

Create a Deployment

The following manifest describes a Deployment that runs 3 replicas of a Hello World app.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
spec:
  selector:
    matchLabels:
      app: hello
  replicas: 3
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: "us-docker.pkg.dev/google-samples/containers/gke/hello-app:2.0"

The source code and Dockerfile for this sample app are available on GitHub. Since no PORT environment variable is specified, the containers listen on the default port: 8080.
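
If you want the containers to listen on a different port, the sample app reads the PORT environment variable. The following sketch is illustrative only: the value 9090 is an arbitrary example, and if you use it, the Service's targetPort must be set to the same value.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
spec:
  selector:
    matchLabels:
      app: hello
  replicas: 3
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: "us-docker.pkg.dev/google-samples/containers/gke/hello-app:2.0"
        env:
        # hello-app listens on the port given by PORT; 9090 is an example value.
        - name: PORT
          value: "9090"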

To create the Deployment, create the file my-deployment.yaml from the manifest, and then run the following command in your shell or terminal window:

kubectl apply -f my-deployment.yaml

Create an internal TCP load balancer

The following sections explain how to create an internal TCP load balancer using a Service.

Writing the Service configuration file

The following is an example of a Service that creates an internal TCP load balancer:

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
  labels:
    app: hello
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP

Minimum Service requirements

Your manifest must contain the following:

  • A name for the Service, in this case ilb-service.
  • An annotation that specifies an internal TCP/UDP load balancer. The annotation depends on the version of your GKE cluster. For GKE versions 1.17 and later, use the annotation networking.gke.io/load-balancer-type: "Internal". For earlier versions, use the annotation cloud.google.com/load-balancer-type: "Internal".
  • The type: LoadBalancer.
  • A spec: selector field to specify the Pods the Service should target, for example, app: hello.
  • The port, the port over which the Service is exposed, and targetPort, the port on which the containers are listening.

Deploying the Service

To create the internal TCP load balancer, create the file my-service.yaml from the manifest, and then run the following command in your shell or terminal window:

kubectl apply -f my-service.yaml

Inspecting the Service

After deployment, inspect the Service to verify that it has been configured successfully.

Get detailed information about the Service:

kubectl get service ilb-service --output yaml

In the output, you can see the internal load balancer's IP address under status.loadBalancer.ingress. Notice that this is different from the value of clusterIP. In this example, the load balancer's IP address is 10.128.15.193:

apiVersion: v1
kind: Service
metadata:
  ...
  labels:
    app: hello
  name: ilb-service
  ...
spec:
  clusterIP: 10.0.9.121
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30835
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: hello
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 10.128.15.193

Any Pod that has the label app: hello is a member of this Service. These are the Pods that can be the final recipients of requests sent to your internal load balancer.

Clients call the Service by using the loadBalancer IP address and the TCP port specified in the port field of the Service manifest. The request is forwarded to one of the member Pods on the TCP port specified in the targetPort field. So for the preceding example, a client calls the Service at 10.128.15.193 on TCP port 80. The request is forwarded to one of the member Pods on TCP port 8080. Note that the member Pod must have a container listening on port 8080.

The nodePort value of 30835 is extraneous; it is not relevant to your internal load balancer.
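
If you only need the load balancer's IP address, for example in a script, you can extract it with a JSONPath expression. The following is a minimal sketch:

kubectl get service ilb-service \
    --output jsonpath="{.status.loadBalancer.ingress[0].ip}"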

Viewing the load balancer's forwarding rule

An internal load balancer is implemented as a forwarding rule. The forwarding rule has a backend service, which has an instance group.

The internal load balancer address, 10.128.15.193 in the preceding example, is the same as the forwarding rule address. To see the forwarding rule that implements your internal load balancer, start by listing all of the forwarding rules in your project:

gcloud compute forwarding-rules list --filter="loadBalancingScheme=INTERNAL"

In the output, look for the forwarding rule that has the same address as your internal load balancer, 10.128.15.193 in this example.

NAME                          ... IP_ADDRESS  ... TARGET
...
aae3e263abe0911e9b32a42010a80008  10.128.15.193   us-central1/backendServices/aae3e263abe0911e9b32a42010a80008

The output shows the associated backend service, aae3e263abe0911e9b32a42010a80008 in this example.

Describe the backend service:

gcloud compute backend-services describe aae3e263abe0911e9b32a42010a80008 --region us-central1

The output shows the associated instance group, k8s-ig--2328fa39f4dc1b75 in this example:

backends:
- balancingMode: CONNECTION
  group: .../us-central1-a/instanceGroups/k8s-ig--2328fa39f4dc1b75
...
kind: compute#backendService
loadBalancingScheme: INTERNAL
name: aae3e263abe0911e9b32a42010a80008
...

How the Service abstraction works

When a packet is handled by your forwarding rule, the packet gets forwarded to one of your cluster nodes. When the packet arrives at the cluster node, the addresses and port are as follows:

  • Destination IP address: the forwarding rule's IP address, 10.128.15.193 in this example
  • Destination TCP port: the Service's port field, 80 in this example

Note that the forwarding rule (that is, your internal load balancer) does not change the destination IP address or destination port. Instead, iptables rules on the cluster node route the packet to an appropriate Pod. The iptables rules change the destination IP address to a Pod IP address and the destination port to the targetPort value of the Service, 8080 in this example.
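
If you want to inspect these rules yourself, you can connect to a cluster node and dump the NAT table. This is a sketch that assumes kube-proxy runs in iptables mode and that the Service is in the default namespace; kube-proxy tags its rules with a comment containing the Service's namespace and name:

# Run on a cluster node over SSH.
sudo iptables-save -t nat | grep "default/ilb-service"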

Verifying the internal TCP load balancer

SSH into a VM instance, and run the following command:

curl LOAD_BALANCER_IP

Replace LOAD_BALANCER_IP with your LoadBalancer Ingress IP address.

The response shows the output of hello-app:

Hello, world!
Version: 2.0.0
Hostname: hello-app-77b45987f7-pw54n

Running the command from a different VPC network, or from outside the load balancer's region, results in a timeout. If you configure global access, clients in any region in the same VPC network can access the load balancer.

Cleaning up

You can delete the Deployment and Service using kubectl delete or Cloud Console.

kubectl

Delete the Deployment

To delete the Deployment, run the following command:

kubectl delete deployment hello-app

Delete the Service

To delete the Service, run the following command:

kubectl delete service ilb-service

Console

Delete the Deployment

To delete the Deployment, perform the following steps:

  1. Go to the Workloads page in Cloud Console.

    Go to Workloads

  2. Select the Deployment you want to delete, then click Delete.

  3. When prompted to confirm, select the Delete Horizontal Pod Autoscaler associated with selected Deployment checkbox, then click Delete.

Delete the Service

To delete the Service, perform the following steps:

  1. Go to the Services & Ingress page in Cloud Console.

    Go to Services & Ingress

  2. Select the Service you want to delete, then click Delete.

  3. When prompted to confirm, click Delete.

Using internal TCP/UDP load balancer subsetting

Internal load balancer subsetting for GKE improves the scalability of internal TCP/UDP load balancers by partitioning backends into smaller, overlapping groups. With subsetting, you can configure internal TCP/UDP load balancers on clusters with more than 250 nodes.

You can enable subsetting when you create a cluster or by editing an existing cluster.

Architecture

Subsetting changes how internal TCP/UDP load balancers are deployed. Without subsetting, the GKE controller places all nodes of a cluster into one or more zonal unmanaged instance groups, which are shared by all internal load balancers in the GKE cluster. For example, all internal TCP/UDP load balancers in a 40-node GKE cluster share the same 40 nodes as backends.

With internal TCP/UDP load balancer subsetting, the GKE controller places nodes into GCE_VM_IP zonal network endpoint groups (NEGs). Unlike instance groups, nodes can be members of more than one zonal NEG, and each of the zonal NEGs can be referenced by an internal TCP/UDP load balancer. The GKE controller creates a NEG for each service using a subset of the GKE nodes as members. For example, a 40-node GKE cluster might have one internal TCP/UDP load balancer with 25 nodes in a backend zonal NEG and another internal TCP/UDP load balancer with 25 nodes in a different backend zonal NEG.
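
To see the GCE_VM_IP NEGs that the GKE controller creates for subsetted Services, you can list NEGs filtered by endpoint type. The following is a sketch that assumes the networkEndpointType filter field:

gcloud compute network-endpoint-groups list \
    --filter="networkEndpointType=GCE_VM_IP"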

The following diagram shows two different services in a cluster that has internal TCP/UDP load balancer subsetting enabled. Each service has two pods scheduled across three nodes. Google Cloud creates a GCE_VM_IP NEG for each service. The GKE controller selects a subset of the nodes in the cluster to be members of the NEG and uses the IP address of each selected node as endpoints.

Diagram: Example of subsetting in a cluster with two Services, where node IP addresses are used as the NEG endpoints for each Service.

Backend node subset selection

When you enable subsetting for your cluster, the GKE controller automatically determines how to subset nodes. You can use the Local or Cluster value for externalTrafficPolicy, but the backend node subset selection differs for each value.

  • externalTrafficPolicy: Cluster: client requests are sent to the node IP and load balanced to a backend Pod. The backend Pod can be on the same node or on a different node. The GKE controller selects a random subset of 25 nodes, which prevents a single point of failure. If backend Pods are hosted on more than 25 nodes, those Pods still receive traffic, but that traffic enters the cluster through at most 25 nodes in the subset. If the cluster has fewer than 25 nodes, all nodes are part of the subset.

  • externalTrafficPolicy: Local: client requests are sent to the node IP and load balanced only to backend Pods running on the same node. As a result, the subset of backend nodes contains only nodes that host at least one of the Service's Pods. The subset size is the number of nodes that host Pods from the Service, up to a maximum of 250 nodes. Do not schedule these Services across more than 250 nodes, because any additional nodes do not receive traffic from the load balancer. For an example manifest, see the sketch after this list.
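
For example, the following Service sketch selects the externalTrafficPolicy: Local behavior. The Service name ilb-svc-local is a placeholder, and the selector matches the ilb-deployment workload used later on this page:

apiVersion: v1
kind: Service
metadata:
  name: ilb-svc-local
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  # Only nodes that host a backend Pod are added to the NEG subset.
  externalTrafficPolicy: Local
  selector:
    app: ilb-deployment
  ports:
  - name: tcp-port
    protocol: TCP
    port: 8080
    targetPort: 8080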

Requirements and limitations

Subsetting for GKE has the following requirements and limitations:

  • You can enable subsetting in new and existing clusters in GKE versions 1.18.19-gke.1400 and later.
  • The cluster must have the HttpLoadBalancing add-on enabled. This add-on is enabled by default. A cluster that has disabled this add-on is unable to use subsetting. To learn how to run a custom Ingress controller with the HttpLoadBalancing add-on enabled, see Use a custom Ingress controller with the HttpLoadBalancing add-on enabled.
  • You must use Cloud SDK version 345.0.0 or later.
  • Subsetting cannot be used with Autopilot clusters.
  • Quotas for Network Endpoint Groups apply. Google Cloud creates 1 NEG per internal TCP/UDP load balancer per zone.
  • Quotas for forwarding rules, backend services and other network resources apply.
  • Subsetting cannot be disabled once it is enabled in a cluster.
  • Subsetting cannot be used with the annotation to share backend services, alpha.cloud.google.com/load-balancer-backend-share.

Enabling internal load balancer subsetting in a new cluster

You can create a cluster with internal load balancer subsetting enabled using the gcloud command-line tool or the Cloud Console:

Console

  1. Go to the Google Kubernetes Engine page in the Cloud Console.

    Go to Google Kubernetes Engine

  2. Click Create.

  3. Configure your cluster as desired.

  4. From the navigation pane, under Cluster, click Networking.

  5. Select the Enable subsetting for L4 internal load balancers checkbox.

  6. Click Create.

gcloud

gcloud container clusters create CLUSTER_NAME \
    --cluster-version=VERSION \
    --enable-l4-ilb-subsetting \
    --region=COMPUTE_REGION

Replace the following:

  • CLUSTER_NAME: the name of the new cluster.
  • VERSION: the GKE version, which must be 1.18.19-gke.1400 or later. You can also use the --release-channel option to select a release channel. The release channel must have a default version 1.18.19-gke.1400 or later.
  • COMPUTE_REGION: the compute region for the cluster.

Enabling internal load balancer subsetting in an existing cluster

You can enable internal load balancer subsetting for an existing cluster using the gcloud tool or the Google Cloud Console. You cannot disable internal load balancer subsetting after you have enabled it in a cluster.

Console

  1. In the Cloud Console, go to the Google Kubernetes Engine page.

    Go to Google Kubernetes Engine

  2. In the cluster list, click the name of the cluster you want to modify.

  3. Under Networking, next to the Subsetting for L4 Internal Load Balancers field, click Enable subsetting for L4 internal load balancers.

  4. Select the Enable subsetting for L4 internal load balancers checkbox.

  5. Click Save Changes.

gcloud

gcloud container clusters update CLUSTER_NAME \
    --enable-l4-ilb-subsetting

Replace the following:

  • CLUSTER_NAME: the name of the cluster.

Verifying internal load balancer subsetting

To verify that internal load balancer subsetting is working correctly for your cluster, perform the following steps:

  1. Deploy a workload.

    The following manifest describes a Deployment that runs a sample web application container image. Save the manifest as ilb-deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ilb-deployment
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: ilb-deployment
      template:
        metadata:
          labels:
            app: ilb-deployment
        spec:
          containers:
          - name: hello-app
            image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
    
  2. Apply the manifest to your cluster:

    kubectl apply -f ilb-deployment.yaml
    
  3. Create a Service.

    The following manifest describes a Service that creates an internal load balancer on TCP port 8080. Save the manifest as ilb-svc.yaml:

    apiVersion: v1
    kind: Service
    metadata:
      name: ilb-svc
      annotations:
        networking.gke.io/load-balancer-type: "Internal"
    spec:
      type: LoadBalancer
      externalTrafficPolicy: Cluster
      selector:
        app: ilb-deployment
      ports:
      - name: tcp-port
        protocol: TCP
        port: 8080
        targetPort: 8080
    
  4. Apply the manifest to your cluster:

    kubectl apply -f ilb-svc.yaml
    
  5. Inspect the Service:

    kubectl get svc ilb-svc -o=jsonpath="{.metadata.annotations.cloud\.google\.com/neg-status}"
    

    The output is similar to the following:

    {"network_endpoint_groups":{"0":"k8s2-knlc4c77-default-ilb-svc-ua5ugas0"},"zones":["us-central1-c"]}
    

    The response indicates that GKE has created a network endpoint group named k8s2-knlc4c77-default-ilb-svc-ua5ugas0. This annotation is present in services of type LoadBalancer that use GKE subsetting and is not present in Services that do not use subsetting.
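
    To see which nodes were selected for the subset, you can list the endpoints of the NEG. The following sketch uses the NEG name and zone from the example output above:

    gcloud compute network-endpoint-groups list-network-endpoints \
        k8s2-knlc4c77-default-ilb-svc-ua5ugas0 \
        --zone=us-central1-c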

Troubleshooting

To determine the list of nodes in a subset for a service, use the following command:

gcloud compute network-endpoint-groups list-network-endpoints NEG_NAME \
    --zone=COMPUTE_ZONE

Replace the following:

  • NEG_NAME: the name of the network endpoint group created by the GKE controller.
  • COMPUTE_ZONE: the compute zone of the network endpoint group to operate on.

To determine the list of healthy nodes for an internal TCP/UDP load balancer, use the following command:

gcloud compute backend-services get-health SERVICE_NAME \
    --region=COMPUTE_REGION

Replace the following:

  • SERVICE_NAME: the name of the backend service. This value is the same as the name of the network endpoint group created by the GKE controller.
  • COMPUTE_REGION: the compute region of the backend service to operate on.

Known Issues

Connection timeout every 10 minutes

Internal LoadBalancer Services created with subsetting enabled might experience traffic disruptions roughly every 10 minutes. This bug has been fixed in the following versions:

  • 1.18.19-gke.1700 and later
  • 1.19.10-gke.1000 and later
  • 1.20.6-gke.1000 and later

Creating an internal TCP/UDP load balancer with Private Service Connect

As a service producer, you can use service attachments to make your services available to service consumers in other VPC networks using Private Service Connect. You can create, manage, and delete service attachments using a ServiceAttachment custom resource.

Requirements and limitations

  • Limitations for Private Service Connect apply.
  • You can create a service attachment in GKE versions 1.21.4-gke.300 and later.
  • You cannot use the same subnet in multiple service attachment configurations.
  • You must create a GKE service that uses an internal TCP/UDP load balancer.

Creating a ServiceAttachment

  1. Create a subnet.

    You must create a new subnet for each ServiceAttachment.

    gcloud beta compute networks subnets create SUBNET_NAME \
        --project PROJECT_ID \
        --network NETWORK_NAME \
        --region REGION \
        --range SUBNET_RANGE \
        --purpose PRIVATE_SERVICE_CONNECT
    

    Replace the following:

    • SUBNET_NAME: the name of the new subnet.
    • PROJECT_ID: the ID of your Google Cloud project.
    • NETWORK_NAME: the name of the VPC network for the subnet.
    • REGION: the region for the new subnet. You must use the same region as the service that you create.
    • SUBNET_RANGE: the IP address range to use for the subnet.
  2. Deploy a workload.

    The following manifest describes a Deployment that runs a sample web application container image. Save the manifest as my-deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: psc-ilb
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: psc-ilb
      template:
        metadata:
          labels:
            app: psc-ilb
        spec:
          containers:
          - name: whereami
            image: gcr.io/google-samples/whereami:v1.2.1
            ports:
              - name: http
                containerPort: 8080
            readinessProbe:
              httpGet:
                path: /healthz
                port: 8080
                scheme: HTTP
              initialDelaySeconds: 5
              timeoutSeconds: 1
    
  3. Apply the manifest to your cluster:

    kubectl apply -f my-deployment.yaml
    
  4. Create a Service. The following manifest describes a Service that creates an internal TCP/UDP load balancer on TCP port 80 and forwards traffic to the application on port 8080. Save the manifest as my-service.yaml:

     apiVersion: v1
     kind: Service
     metadata:
       name: SERVICE_NAME
       annotations:
         networking.gke.io/load-balancer-type: "Internal"
     spec:
       type: LoadBalancer
       selector:
         app: psc-ilb
       ports:
       - port: 80
         targetPort: 8080
         protocol: TCP
    

    Replace the following:

    • SERVICE_NAME: the name of the new service.
  5. Apply the manifest to your cluster:

    kubectl apply -f my-service.yaml
    
  6. Create a ServiceAttachment.

    The following manifest describes a ServiceAttachment that exposes the service that you created to service consumers. Save the manifest as my-psc.yaml:

    apiVersion: networking.gke.io/v1beta1
    kind: ServiceAttachment
    metadata:
     name: SERVICE_ATTACHMENT_NAME
     namespace: default
    spec:
     connectionPreference: ACCEPT_AUTOMATIC
     natSubnets:
     - SUBNET_NAME
     proxyProtocol: false
     resourceRef:
       kind: Service
       name: SERVICE_NAME
    

    Replace the following:

    • SERVICE_ATTACHMENT_NAME: the name of the new service attachment.

    The ServiceAttachment has the following fields:

    • connectionPreference: the connection preference that determines how customers connect to the service. You can either use automatic project approval using ACCEPT_AUTOMATIC or explicit project approval using ACCEPT_MANUAL. For more information, see Publishing services using Private Service Connect.
    • natSubnets: a list of subnetwork resource names to use for the service attachment.
    • proxyProtocol: when set to true, the consumer source IP and Private Service Connect connection ID are available in the requests. This field is optional and defaults to false if not provided.
    • consumerAllowList: the list of consumer projects that are allowed to connect to the ServiceAttachment. This field can only be used when connectionPreference is ACCEPT_MANUAL. For more information about this field, see Publishing services using Private Service Connect.
      • project: the project ID or number for the consumer project.
      • connectionLimit: the connection limit for the consumer project. This field is optional.
      • forceSendFields: the field names to include in API requests. This field is optional.
      • nullFields: the field names to include in API requests with a null value. This field is optional.
    • consumerRejectList: the list of consumer project IDs or numbers that are not allowed to connect to the ServiceAttachment. This field can only be used when connectionPreference is ACCEPT_MANUAL. For more information about this field, see Publishing services using Private Service Connect.
    • resourceRef: a reference to the Kubernetes resource.
      • kind: the type of Kubernetes resource. You must use Service.
      • name: the name of the Kubernetes resource that must be in the same namespace as the internal TCP/UDP load balancer.
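
    For example, a ServiceAttachment that uses explicit project approval might look like the following sketch. The consumer project ID and connection limit are placeholders for illustration:

    apiVersion: networking.gke.io/v1beta1
    kind: ServiceAttachment
    metadata:
     name: SERVICE_ATTACHMENT_NAME
     namespace: default
    spec:
     connectionPreference: ACCEPT_MANUAL
     consumerAllowList:
     - project: CONSUMER_PROJECT_ID
       connectionLimit: 10
     natSubnets:
     - SUBNET_NAME
     proxyProtocol: false
     resourceRef:
       kind: Service
       name: SERVICE_NAME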
  7. Apply the manifest to your cluster:

    kubectl apply -f my-psc.yaml
    
  8. Verify that the Private Service Connect controller created the service attachment:

    gcloud beta compute service-attachments list
    

    The output shows a service attachment with an automatically generated name:

    NAME        REGION       PRODUCER_FORWARDING_RULE          CONNECTION_PREFERENCE
    k8s1-sa-... us-central1  a3fea439c870148bdba5e59c9ea9451a  ACCEPT_AUTOMATIC
    

Viewing a ServiceAttachment

You can view the details of a ServiceAttachment using the following command:

kubectl describe serviceattachment SERVICE_ATTACHMENT_NAME

The output is similar to the following:

 kubectl describe serviceattachment foo-sa
Name:        <sa-name>
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  networking.gke.io/v1beta1
Kind:         ServiceAttachment
Metadata:
  ...
Status:
  Forwarding Rule URL:      https://www.googleapis.com/compute/beta/projects/<project>/regions/<region>/forwardingRules/<fr-name>
  Last Modified Timestamp:  2021-07-08T01:32:39Z
  Service Attachment URL:   https://www.googleapis.com/compute/beta/projects/<projects>/regions/<region>/serviceAttachments/<gce-service-attachment-name>
Events:                     <none>

Consuming a ServiceAttachment

To consume your service from another project, perform the following steps:

  1. Get the URL of the ServiceAttachment:

    kubectl get serviceattachment SERVICE_ATTACHMENT_NAME -o=jsonpath="{.status.serviceAttachmentURL}"
    

    The output is similar to the following:

      serviceAttachmentURL: https://www.googleapis.com/compute/alpha/projects/<project>/region/<region>/serviceAttachments/k8s1-...my-sa
    
  2. Create a Private Service Connect endpoint using the URL of the ServiceAttachment.

  3. Verify that you can connect to the Service that you deployed in the producer project by using a curl command from a VM in the consumer project:

    curl PSC_IP_ADDRESS
    

    Replace PSC_IP_ADDRESS with the IP address of the forwarding rule in the consumer project.

    The output is similar to the following:

    {
      "cluster_name":"cluster",
      "host_header":"10.128.15.200",
      "node_name":"gke-psc-default-pool-be9b6e0e-dvxg.c.gke_project.internal",
      "pod_name":"foo-7bf648dcfd-l5jf8",
      "pod_name_emoji":"👚",
      "project_id":"gke_project",
      "timestamp":"2021-06-29T21:32:03",
      "zone":"us-central1-c"
    }
    

Updating a ServiceAttachment

You can update a ServiceAttachment using the following steps:

  1. Edit the ServiceAttachment manifest in my-psc.yaml:

    apiVersion: networking.gke.io/v1beta1
    kind: ServiceAttachment
    metadata:
      name: my-sa
      namespace: default
    spec:
      connectionPreference: ACCEPT_AUTOMATIC
      natSubnets:
      - my-nat-subnet
      proxyProtocol: false
      resourceRef:
        kind: Service
        name: ilb-service
    
  2. Apply the manifest to your cluster:

    kubectl apply -f my-psc.yaml
    

Cleaning up

You cannot delete an internal TCP/UDP load balancer that is connected to a service attachment. You must delete the service attachment and GKE Service separately.

  1. Delete the service attachment:

    kubectl delete serviceattachment SERVICE_ATTACHMENT_NAME --wait=false
    

    This command marks the service attachment for deletion without waiting for the operation to complete; the underlying resource continues to exist until deletion finishes. To wait for the deletion to complete instead, omit the --wait=false flag.

  2. Delete the Service:

    kubectl delete svc SERVICE_NAME
    
  3. Delete the subnet:

    gcloud compute networks subnets delete SUBNET_NAME
    

Troubleshooting

You can view error messages using the following command:

kubectl get events -n NAMESPACE

Replace NAMESPACE with the namespace of the internal TCP/UDP load balancer.

An error message similar to the following occurs if you try to delete an internal TCP/UDP load balancer that is being used by a service attachment. You must delete the ServiceAttachment before you can delete the internal TCP/UDP load balancer.

Error syncing load balancer: failed to ensure load balancer: googleapi:
Error 400: The forwarding_rule resource '<fwd-rule-URL>' is already being used
by '<svc-attachment-URL>', resourceInUseByAnotherResource.

Service parameters

For more information about the load balancer parameters you can configure, see Configuring TCP/UDP load balancing. In addition, internal LoadBalancer Services support the following additional parameters:

  • Load balancer subnet
    Summary: Specifies the subnet from which the load balancer automatically provisions an IP address.
    Service field: metadata:annotations: networking.gke.io/internal-load-balancer-subnet
    GKE version support: Beta in GKE 1.17+ and 1.16.8-gke.10+; GA in GKE 1.17.9-gke.600+
  • Global access
    Summary: Allows the internal TCP/UDP load balancer virtual IP address to be accessed by clients across Google Cloud regions.
    Service field: metadata:annotations: networking.gke.io/internal-load-balancer-allow-global-access
    GKE version support: Beta in GKE 1.16+; GA in GKE 1.17.9-gke.600+

Load balancer subnet

By default, GKE deploys an internal TCP/UDP load balancer using the node subnet range. You can specify the subnet on a per-Service basis using the networking.gke.io/internal-load-balancer-subnet annotation. This is useful for firewalling the internal load balancer IP addresses separately from node IP addresses, or for sharing the same Service subnet across multiple GKE clusters. This parameter is only relevant for internal TCP/UDP load balancer Services.

The subnet must exist before it is referenced by the Service resource, because GKE does not manage the lifecycle of the subnet itself. The subnet must also be in the same VPC network and region as the GKE cluster. In the following example, the subnet is created out of band from GKE:

gcloud compute networks subnets create gke-vip-subnet \
    --network=default \
    --range=10.23.0.0/24 \
    --region=us-central1

The following Service definition uses the internal-load-balancer-subnet annotation to reference the subnet by name. By default, an available IP address from the subnet is chosen automatically. You can also specify loadBalancerIP, but it must be part of the referenced subnet.

There are multiple ways to share this internal load balancer subnet to achieve different use cases:

  • Multiple subnets for groups of Services in the same cluster
  • A single subnet for all Services in a cluster
  • A single subnet shared across multiple clusters and multiple Services

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
    networking.gke.io/internal-load-balancer-subnet: "gke-vip-subnet"
  labels:
    app: hello
spec:
  type: LoadBalancer
  loadBalancerIP: 10.23.0.15
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP

Global access

Global access is an optional parameter for internal LoadBalancer Services that allows clients from any region in your VPC network to access the internal TCP/UDP load balancer. Without global access, clients must be located in the same region as the load balancer. Backend instances must still be located in the same region as the load balancer.

Global access is enabled per-Service using the following annotation: networking.gke.io/internal-load-balancer-allow-global-access: "true".
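
For example, the following sketch adds global access to the ilb-service Service shown earlier on this page:

apiVersion: v1
kind: Service
metadata:
  name: ilb-service
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
    # Allow clients in any region of the same VPC network to reach the load balancer.
    networking.gke.io/internal-load-balancer-allow-global-access: "true"
  labels:
    app: hello
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP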

Global access is not supported with legacy networks. Normal inter-region traffic costs apply when using global access across regions. Refer to Network pricing for information about network pricing for egress between regions. Global access is available in Beta on GKE clusters 1.16+ and GA on 1.17.9-gke.600+.

For on-premises clients, global access lets clients access the load balancer using Cloud VPN or Cloud Interconnect (VLAN) in any region. For more information, see Using Cloud VPN and Cloud Interconnect.

Shared IP

The internal TCP/UDP load balancer allows sharing a virtual IP address among multiple forwarding rules. This is useful for expanding the number of simultaneous ports on the same IP address or for accepting UDP and TCP traffic on the same IP address. It allows a maximum of 50 exposed ports per IP address. Shared IPs are supported natively on GKE clusters with internal LoadBalancer Services. When deploying, the Service's loadBalancerIP field indicates which IP address should be shared across Services.

Limitations

A shared IP for multiple load balancers has the following limitations and capabilities:

  • Each Service (or forwarding rule) can have a maximum of five ports.
  • A maximum of ten Services (forwarding rules) can share an IP address. This results in a maximum of 50 ports per shared IP.
  • Protocol/port tuples cannot overlap between Services that share the same IP.
  • A combination of TCP-only and UDP-only Services is supported on the same shared IP, however you cannot expose both TCP and UDP ports in the same Service.

Enabling Shared IP

To enable internal LoadBalancer Services to share a common IP address, follow these steps:

  1. Create a static internal IP address with --purpose SHARED_LOADBALANCER_VIP. An IP address must be created with this purpose so that it can be shared. If you create the static internal IP address in a Shared VPC, you must create the IP address in the same service project as the instance that uses the IP address, even though the value of the IP address comes from the range of available IPs in a selected shared subnet of the Shared VPC network. Refer to reserving a static internal IP on the Provisioning Shared VPC page for more information.

  2. Deploy up to ten internal LoadBalancer Services using this static IP in the loadBalancerIP field. The internal TCP/UDP load balancers are reconciled by the GKE service controller and deploy using the same frontend IP.

The following example demonstrates how this is done to support multiple TCP and UDP ports against the same internal load balancer IP.

  1. Create a static IP in the same region as your GKE cluster. The subnet must be the same subnet that the load balancer uses, which by default is the same subnet that is used by the GKE cluster node IPs.

    If your cluster and the VPC network are in the same project:

    gcloud compute addresses create IP_ADDR_NAME \
        --project=PROJECT_ID \
        --subnet=SUBNET \
        --addresses=IP_ADDRESS \
        --region=COMPUTE_REGION \
        --purpose=SHARED_LOADBALANCER_VIP
    

    If your cluster is in a Shared VPC service project but uses a Shared VPC network in a host project:

    gcloud compute addresses create IP_ADDR_NAME \
        --project=SERVICE_PROJECT_ID \
        --subnet=projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNET \
        --addresses=IP_ADDRESS \
        --region=COMPUTE_REGION \
        --purpose=SHARED_LOADBALANCER_VIP
    

    Replace the following:

    • IP_ADDR_NAME: a name for the IP address object.
    • SERVICE_PROJECT_ID: the ID of the service project.
    • PROJECT_ID: the ID of your project (single project).
    • HOST_PROJECT_ID: the ID of the Shared VPC host project.
    • COMPUTE_REGION: the compute region containing the shared subnet.
    • IP_ADDRESS: an unused internal IP address from the selected subnet's primary IP address range. If you omit specifying an IP address, Google Cloud selects an unused internal IP address from the selected subnet's primary IP address range. To determine an automatically selected address, you'll need to run gcloud compute addresses describe.
    • SUBNET: the name of the shared subnet.
  2. Save the following TCP Service configuration to a file named tcp-service.yaml and then deploy to your cluster. Replace IP_ADDRESS with the IP address you chose in the previous step.

    apiVersion: v1
    kind: Service
    metadata:
      name: tcp-service
      namespace: default
      annotations:
        networking.gke.io/load-balancer-type: "Internal"
    spec:
      type: LoadBalancer
      loadBalancerIP: IP_ADDRESS
      selector:
        app: myapp
      ports:
      - name: 8001-to-8001
        protocol: TCP
        port: 8001
        targetPort: 8001
      - name: 8002-to-8002
        protocol: TCP
        port: 8002
        targetPort: 8002
      - name: 8003-to-8003
        protocol: TCP
        port: 8003
        targetPort: 8003
      - name: 8004-to-8004
        protocol: TCP
        port: 8004
        targetPort: 8004
      - name: 8005-to-8005
        protocol: TCP
        port: 8005
        targetPort: 8005
    
  3. Apply this Service definition against your cluster:

    kubectl apply -f tcp-service.yaml
    
  4. Save the following UDP Service configuration to a file named udp-service.yaml and then deploy it. It also uses the IP_ADDRESS that you specified in the previous step.

    apiVersion: v1
    kind: Service
    metadata:
      name: udp-service
      namespace: default
      annotations:
        networking.gke.io/load-balancer-type: "Internal"
    spec:
      type: LoadBalancer
      loadBalancerIP: IP_ADDRESS
      selector:
        app: my-udp-app
      ports:
      - name: 9001-to-9001
        protocol: UDP
        port: 9001
        targetPort: 9001
      - name: 9002-to-9002
        protocol: UDP
        port: 9002
        targetPort: 9002
    
  5. Apply this file against your cluster:

    kubectl apply -f udp-service.yaml
    
  6. Validate that the VIP is shared among the load balancer forwarding rules by listing them and filtering for the static IP address. The output shows a UDP forwarding rule and a TCP forwarding rule, both listening across seven different ports on the shared IP_ADDRESS, which in this example is 10.128.2.98.

    gcloud compute forwarding-rules list | grep 10.128.2.98
    ab4d8205d655f4353a5cff5b224a0dde                         us-west1   10.128.2.98     UDP          us-west1/backendServices/ab4d8205d655f4353a5cff5b224a0dde
    acd6eeaa00a35419c9530caeb6540435                         us-west1   10.128.2.98     TCP          us-west1/backendServices/acd6eeaa00a35419c9530caeb6540435
    

All ports

Internal forwarding rules support either up to five ports per forwarding rule or the optional parameter --ports=ALL, which forwards all ports on the forwarding rule.

Requirements

All ports on GKE has the following requirements and limitations:

  • Only supported when --enable-l4-ilb-subsetting is enabled.
  • Only supported for internal load balancer services.
  • Supports any number of ports across a maximum of 100 contiguous port ranges.

The GKE controller automatically enables all ports on the forwarding rule when a service has more than five ports. For example, the following service manifest has six ports configured across two contiguous ranges:

apiVersion: v1
kind: Service
metadata:
  name: all-ports
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
  - port: 8081
    targetPort: 8081
    name: 8081-to-8081
    protocol: TCP
  - port: 8082
    targetPort: 8082
    name: 8082-to-8082
    protocol: TCP
  - port: 8083
    targetPort: 8083
    name: 8083-to-8083
    protocol: TCP
  - port: 9001
    targetPort: 9001
    name: 9001-to-9001
    protocol: TCP
  - port: 9002
    targetPort: 9002
    name: 9002-to-9002
    protocol: TCP
  - port: 9003
    targetPort: 9003
    name: 9003-to-9003
    protocol: TCP

The GKE controller enables all ports on the forwarding rule because the Service has more than five ports. However, the GKE controller creates firewall rules only for the ports specified in the Service. Traffic to all other ports is blocked by VPC firewall rules.
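
To check whether the forwarding rule was created with all ports enabled, you can describe it. The following is a sketch; FORWARDING_RULE_NAME is the rule name from gcloud compute forwarding-rules list, and an all-ports rule is expected to report allPorts: true in the output:

gcloud compute forwarding-rules describe FORWARDING_RULE_NAME \
    --region=COMPUTE_REGION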

Restrictions for internal TCP/UDP load balancers

  • For clusters running Kubernetes version 1.7.4 and later, you can use internal load balancers with custom-mode subnets in addition to auto-mode subnets.
  • Clusters running Kubernetes version 1.7.X and later support using a reserved IP address for the internal TCP/UDP load balancer if you create the reserved IP address with the --purpose flag set to SHARED_LOADBALANCER_VIP. Refer to Enabling Shared IP for step-by-step directions. GKE only preserves the IP address of an internal TCP/UDP load balancer if the Service references an internal IP address with that purpose. Otherwise, GKE might change the load balancer's IP address (spec.loadBalancerIP) if the Service is updated (for example, if ports are changed).
  • Even if the load balancer's IP address changes (see previous point), the spec.clusterIP remains constant.

Restrictions for internal UDP load balancers

  • Internal UDP load balancers do not support using sessionAffinity: ClientIP.

Limits

A Kubernetes service with type: LoadBalancer and the networking.gke.io/load-balancer-type: Internal annotation creates an internal load balancer that targets the Kubernetes service. The number of such services is limited by the number of internal forwarding rules that you can create in a VPC network. For details, see Per network limits.

The maximum number of nodes in a GKE cluster with an internal TCP/UDP load balancer depends on the value of externalTrafficPolicy:

  • externalTrafficPolicy: Cluster: the internal TCP/UDP load balancer backend uses a maximum of 250 randomly selected nodes. If the cluster has more than 250 nodes, all load balancer traffic enters the cluster through the 250 nodes and is forwarded to a randomly selected matching Pod. Using this mode with more than 250 nodes is not recommended.

  • externalTrafficPolicy: Local: the internal TCP/UDP load balancer backend uses a maximum of 250 randomly selected nodes. If none of the selected 250 nodes run the backend Pods for the internal TCP/UDP load balancer service, connections to the LoadBalancer IP fail. Using this mode with more than 250 nodes is not supported.

To remove this limitation, enable internal load balancer subsetting.

For more information about VPC limits, see Quotas and limits.

What's next