About kube-dns for GKE

If you run applications in Standard clusters, kube-dns is the default DNS provider, which enables service discovery and communication between your workloads. This document describes how to manage DNS with kube-dns, including its architecture, configuration, and best practices for optimizing DNS resolution in your GKE environment.

This document is for Developers, Admins, and architects who are responsible for managing DNS in GKE. For context on common roles and tasks in Google Cloud, see Common GKE Enterprise user roles and tasks.

Before you begin, ensure that you're familiar with Kubernetes Services and general DNS concepts.

Understand kube-dns architecture

kube-dns operates inside your GKE cluster to enable DNS resolution between Pods and Services.

The following diagram shows how your Pods interact with the kube-dns Service:

Figure 1: Diagram showing how Pods send DNS queries to the `kube-dns`
Service, which is backed by `kube-dns` Pods. The `kube-dns` Pods handle
internal DNS resolution and forward external queries to upstream DNS
servers.

Key components

kube-dns includes the following key components:

  • kube-dns Pods: these Pods run the kube-dns server software. Multiple replicas of these Pods run in the kube-system namespace, and they provide high availability and redundancy.
  • kube-dns Service: this Kubernetes Service of type ClusterIP groups the kube-dns Pods and exposes them as a single, stable endpoint. The ClusterIP acts as the DNS server for the cluster, which Pods use to send DNS queries. kube-dns supports up to 1,000 endpoints per headless service.
  • kube-dns-autoscaler: this Pod adjusts the number of kube-dns replicas based on the cluster's size, which includes the number of nodes and CPU cores. This approach helps ensure that kube-dns can handle varying DNS query loads.
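
You can observe these components in a running Standard cluster. For example, the following commands list the kube-dns Deployment and Service (including the ClusterIP that Pods use as their name server) and the autoscaler:

# List the kube-dns Deployment and Service, including the ClusterIP
kubectl get deployment,service -n kube-system -l k8s-app=kube-dns

# List the autoscaler Deployment
kubectl get deployment kube-dns-autoscaler -n kube-system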

Internal DNS resolution

When a Pod needs to resolve a DNS name within the cluster's domain, such as myservice.my-namespace.svc.cluster.local, the following process occurs:

  1. Pod DNS configuration: the kubelet on each node configures the Pod's /etc/resolv.conf file. This file uses the kube-dns Service's ClusterIP as the name server.
  2. DNS query: the Pod sends a DNS query to the kube-dns Service.
  3. Name resolution: kube-dns receives the query. It looks up the corresponding IP address in its internal DNS records and responds to the Pod.
  4. Communication: the Pod then uses the resolved IP address to communicate with the target Service.
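
You can observe this resolution process from inside a Pod. For example, the following command runs a temporary Pod and resolves a Service name; the myservice Service and my-namespace namespace are placeholders for your own names:

# Run a temporary Pod and query a Service name through kube-dns.
# The server address in the output is the kube-dns Service's ClusterIP.
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never -- \
    nslookup myservice.my-namespace.svc.cluster.local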

External DNS resolution

When a Pod needs to resolve an external DNS name, or a name that's outside the cluster's domain, kube-dns acts as a recursive resolver. It forwards the query to upstream DNS servers that are configured in its ConfigMap. You can also configure custom resolvers for specific domains, known as stub domains, which directs kube-dns to forward requests for those domains to specific upstream DNS servers.

Configure Pod DNS

In GKE, the kubelet agent on each node configures DNS settings for the Pods that run on that node.

Configure the /etc/resolv.conf file

When GKE creates a Pod, the kubelet agent modifies the Pod's /etc/resolv.conf file. This file configures the DNS server for name resolution and specifies search domains. By default, the kubelet configures the Pod to use the cluster's internal DNS service, kube-dns, as its name server. It also populates search domains in the file. These search domains let you use unqualified names in DNS queries. For example, if a Pod queries myservice, Kubernetes first tries to resolve myservice.default.svc.cluster.local, then myservice.svc.cluster.local, and then other domains from the search list.

The following example shows a default /etc/resolv.conf configuration:

nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local c.my-project-id.internal google.internal
options ndots:5

This file has the following entries:

  • nameserver: defines the ClusterIP of the kube-dns service.
  • search: defines the search domains that are appended to unqualified names during DNS lookups.
  • options ndots:5: sets the threshold for when GKE considers a name to be fully qualified. A name with five or more dots is considered fully qualified and is queried as-is; names with fewer dots are tried against the search domains first.

Pods that are configured with the hostNetwork: true setting inherit their DNS configuration from the host and don't query kube-dns directly.
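
If a Pod that uses hostNetwork: true still needs to resolve cluster-internal names through kube-dns, you can set its dnsPolicy field to ClusterFirstWithHostNet. The following manifest is a minimal sketch; the Pod name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: host-network-pod
spec:
  hostNetwork: true
  # ClusterFirstWithHostNet tells the kubelet to configure cluster DNS
  # even though the Pod shares the node's network namespace.
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]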

Customize kube-dns

kube-dns provides robust default DNS resolution. You can tailor its behavior for specific needs, such as improving resolution efficiency or using preferred DNS resolvers. Both stub domains and upstream name servers are configured by modifying the kube-dns ConfigMap in the kube-system namespace.

Modify the kube-dns ConfigMap

To modify the kube-dns ConfigMap, do the following:

  1. Open the ConfigMap for editing:

    kubectl edit configmap kube-dns -n kube-system
    
  2. In the data section, add the stubDomains and upstreamNameservers fields, as in the following example:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      labels:
        addonmanager.kubernetes.io/mode: EnsureExists
      name: kube-dns
      namespace: kube-system
    data:
      stubDomains: |
        {
          "example.com": [
            "8.8.8.8",
            "8.8.4.4"
          ],
          "internal": [
            "169.254.169.254"
          ]
        }
      upstreamNameservers: |
        [
          "8.8.8.8",
          "1.1.1.1"
        ]

    In this example, queries for example.com go to Google Public DNS (8.8.8.8 and 8.8.4.4). The internal stub domain routes queries to the metadata server (169.254.169.254), which is required if your upstream name servers can't resolve GKE internal domains. All other external queries go to the upstream name servers: Google Public DNS (8.8.8.8) and Cloudflare DNS (1.1.1.1). Don't put comments inside the stubDomains or upstreamNameservers values, because kube-dns parses these values as JSON, which doesn't support comments.
  3. Save the ConfigMap. kube-dns automatically reloads the configuration.
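
To verify that your changes were saved, you can print the ConfigMap and inspect the data section:

kubectl get configmap kube-dns -n kube-system -o yaml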

Stub domains

Stub domains let you define custom DNS resolvers for specific domains. When a Pod queries for a name within that stub domain, kube-dns forwards the query to the specified resolver instead of using its default resolution mechanism.

To configure stub domains, you include a stubDomains section in the kube-dns ConfigMap. This section maps each domain to its upstream name servers, and kube-dns forwards queries for names within that domain to the designated servers. For example, to route all DNS queries for internal.mycompany.com to 192.168.0.10, add "internal.mycompany.com": ["192.168.0.10"] to stubDomains.

When you set a custom resolver for a stub domain, such as example.com, kube-dns forwards all name resolution requests for that domain, including subdomains like *.example.com, to the specified servers.
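
For example, the following stubDomains value implements the internal.mycompany.com example from this section; the domain and server address are placeholders for your own environment:

data:
  stubDomains: |
    {
      "internal.mycompany.com": [
        "192.168.0.10"
      ]
    }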

Upstream name servers

You can configure kube-dns to use custom upstream name servers to resolve external domain names. This configuration instructs kube-dns to forward all DNS requests, except the requests for the cluster's internal domain (*.cluster.local), to the designated upstream servers. Internal domains like metadata.internal and *.google.internal might not be resolvable by your custom upstream servers. If you enable Workload Identity Federation for GKE or have workloads that depend on these domains, add a stub domain for internal in the ConfigMap. Use 169.254.169.254, the metadata server's IP address, as the resolver for this stub domain.

Manage a custom kube-dns Deployment

In a Standard GKE cluster, kube-dns runs as a Deployment. With a custom kube-dns Deployment, you, as the cluster administrator, control the Deployment and can customize it to your needs rather than using the default GKE-provided Deployment.

Reasons for a custom deployment

Consider a custom kube-dns deployment for the following reasons:

  • Resource allocation: fine-tune CPU and memory resources for kube-dns Pods to optimize performance in clusters with high DNS traffic.
  • Image version: use a specific version of the kube-dns image or switch to an alternative DNS provider like CoreDNS.
  • Advanced configuration: customize logging levels, security policies, and DNS caching behavior.

Autoscaling for custom Deployments

The built-in kube-dns-autoscaler works with the default kube-dns Deployment. If you create a custom kube-dns Deployment, the built-in autoscaler does not manage it. Therefore, you must set up a separate autoscaler that's specifically configured to monitor and adjust the replica count of your custom Deployment. This approach involves creating and deploying your own autoscaler configuration in your cluster.
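
One common approach is to run your own instance of the cluster-proportional-autoscaler and point it at your custom Deployment. The following manifest is a minimal sketch, not a complete setup: the names custom-kube-dns and custom-kube-dns-autoscaler are placeholders, the image tag is an assumption that you should pin to a current release, and the RBAC resources that the autoscaler needs to read nodes and scale the target Deployment are omitted for brevity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-kube-dns-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: custom-kube-dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: custom-kube-dns-autoscaler
    spec:
      containers:
      - name: autoscaler
        # Placeholder tag: pin to a current cluster-proportional-autoscaler release
        image: registry.k8s.io/cpa/cluster-proportional-autoscaler:1.8.9
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        # The autoscaler stores its scaling parameters in this ConfigMap
        - --configmap=custom-kube-dns-autoscaler
        # Scale your custom Deployment instead of the default kube-dns
        - --target=Deployment/custom-kube-dns
        - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"preventSinglePointFailure":true}}
        - --logtostderr=true
        - --v=2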

When you manage a custom Deployment, you are responsible for all its components, such as keeping the autoscaler image up-to-date. Using outdated components can lead to performance degradation or DNS failures.

For detailed instructions on how to configure and manage your own kube-dns deployment, see Setting up a custom kube-dns Deployment.

Troubleshoot

For information about troubleshooting kube-dns, see the troubleshooting documentation for DNS in GKE.

Optimize DNS resolution

This section describes common issues and best practices for managing DNS in GKE.

Limit of a Pod's dnsConfig search domains

Kubernetes limits the number of DNS search domains to 32. If you attempt to define more than 32 search domains in a Pod's dnsConfig, the kube-apiserver doesn't create the Pod and returns an error similar to the following:

The Pod "dns-example" is invalid: spec.dnsConfig.searches: Invalid value: []string{"ns1.svc.cluster-domain.example", "my.dns.search.suffix1", "ns2.svc.cluster-domain.example", "my.dns.search.suffix2", "ns3.svc.cluster-domain.example", "my.dns.search.suffix3", "ns4.svc.cluster-domain.example", "my.dns.search.suffix4", "ns5.svc.cluster-domain.example", "my.dns.search.suffix5", "ns6.svc.cluster-domain.example", "my.dns.search.suffix6", "ns7.svc.cluster-domain.example", "my.dns.search.suffix7", "ns8.svc.cluster-domain.example", "my.dns.search.suffix8", "ns9.svc.cluster-domain.example", "my.dns.search.suffix9", "ns10.svc.cluster-domain.example", "my.dns.search.suffix10", "ns11.svc.cluster-domain.example", "my.dns.search.suffix11", "ns12.svc.cluster-domain.example", "my.dns.search.suffix12", "ns13.svc.cluster-domain.example", "my.dns.search.suffix13", "ns14.svc.cluster-domain.example", "my.dns.search.suffix14", "ns15.svc.cluster-domain.example", "my.dns.search.suffix15", "ns16.svc.cluster-domain.example", "my.dns.search.suffix16", "my.dns.search.suffix17"}: must not have more than 32 search paths.

To resolve this issue, remove the extra search paths from the configuration.
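
For reference, the following Pod manifest shows a valid dnsConfig that stays under the limit; the search domains shown are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: dns-example
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
  dnsConfig:
    # Keep the total number of entries in this list at 32 or fewer
    searches:
    - ns1.svc.cluster-domain.example
    - my.dns.search.suffix1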

Upstream nameservers limit for kube-dns

kube-dns limits the number of upstreamNameservers values to three. If you define more than three, Cloud Logging displays an error similar to the following:

Invalid configuration: upstreamNameserver cannot have more than three entries (value was &TypeMeta{Kind:,APIVersion:,}), ignoring update

In this scenario, kube-dns ignores the upstreamNameservers configuration and continues to use the previous valid configuration. To resolve this issue, remove the extra upstreamNameservers from the kube-dns ConfigMap.

Scale up kube-dns

In Standard clusters, you can use a lower value for nodesPerReplica so that more kube-dns Pods are created when cluster nodes scale up. We highly recommend setting an explicit value for the max field to help ensure that the GKE control plane virtual machine (VM) is not overwhelmed due to the large number of kube-dns Pods that are watching the Kubernetes API.

You can set the value of the max field to the number of nodes in the cluster. If the cluster has more than 500 nodes, set the value of the max field to 500.

You can modify the number of kube-dns replicas by editing the kube-dns-autoscaler ConfigMap.

kubectl edit configmap kube-dns-autoscaler --namespace=kube-system

The output is similar to the following:

linear: '{"coresPerReplica":256, "nodesPerReplica":16,"preventSinglePointFailure":true}'

The number of kube-dns replicas is calculated by using the following formula:

replicas = max( ceil( cores * 1/coresPerReplica ) , ceil( nodes * 1/nodesPerReplica ) )

To scale up, change the value of the nodesPerReplica field to a smaller value, and include a value for the max field.

linear: '{"coresPerReplica":256, "nodesPerReplica":8,"max": 15,"preventSinglePointFailure":true}'

This configuration creates one kube-dns Pod for every eight nodes in the cluster. A 24-node cluster has three replicas and a 40-node cluster has five replicas. If the cluster grows beyond 120 nodes, the number of kube-dns replicas does not grow beyond 15, which is the value of the max field.

To help ensure a baseline level of DNS availability in your cluster, set a minimum replica count for kube-dns by using the min field.

The output for the kube-dns-autoscaler ConfigMap with the min field configured is similar to the following:

linear: '{"coresPerReplica":256, "nodesPerReplica":8,"max": 15,"min": 5,"preventSinglePointFailure":true}'
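
After you save the ConfigMap, you can confirm that the autoscaler adjusted the replica count by checking the kube-dns Deployment:

kubectl get deployment kube-dns -n kube-system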

Improve DNS lookup times

Several factors can cause high latency with DNS lookups or DNS resolution failures with the default kube-dns provider. Applications might experience these issues as getaddrinfo EAI_AGAIN errors, which indicate a temporary failure in name resolution. Causes include the following:

  • Frequent DNS lookups within your workload.
  • High Pod density per node.
  • Running kube-dns on Spot VMs or preemptible VMs, which can lead to unexpected node deletions.
  • High query volume that exceeds the capacity of the dnsmasq instance within the kube-dns Pod. A single kube-dns instance has a limit of 200 concurrent TCP connections in GKE version 1.31 and later, and a limit of 20 concurrent TCP connections in GKE version 1.30 and earlier.

To improve DNS lookup times, do the following:

  • Avoid running critical system components like kube-dns on Spot VMs or preemptible VMs. Create at least one node pool that has standard VMs and doesn't have Spot VMs or preemptible VMs. Use taints and tolerations to help ensure that critical workloads are scheduled on these reliable nodes.
  • Enable NodeLocal DNSCache. NodeLocal DNSCache caches DNS responses directly on each node, which reduces latency and the load on the kube-dns service. If you enable NodeLocal DNSCache and use network policies with default-deny rules, add a policy that permits workloads to send DNS queries to the node-local-dns Pods, as shown in the sketch after this list.
  • Scale up kube-dns.
  • If your application is based on Node.js, ensure that it uses dns.resolve*-based functions rather than dns.lookup-based functions, because dns.lookup is synchronous.
  • Use fully qualified domain names (FQDNs), for example, https://google.com./ instead of https://google.com/.
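
The following NetworkPolicy is a minimal sketch of a rule that allows DNS egress when you use NodeLocal DNSCache with default-deny policies; the policy name, namespace, and selectors are placeholders that you should adapt to your cluster's labels:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: my-namespace
spec:
  # Apply to all Pods in the namespace
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53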

DNS resolution failures might occur during GKE cluster upgrades due to concurrent upgrades of control plane components, including kube-dns. These failures typically affect a small percentage of nodes. Thoroughly test cluster upgrades in a non-production environment before you apply them to production clusters.

Ensure Service discoverability

kube-dns only creates DNS records for Services that have Endpoints. If a Service doesn't have any Endpoints, kube-dns doesn't create DNS records for that Service.
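
To check whether a Service has Endpoints, and therefore DNS records, you can inspect its endpoints directly; my-service is a placeholder for your Service name:

kubectl get endpoints my-service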

Manage DNS TTL discrepancies

If kube-dns receives a DNS response from an upstream DNS resolver with a large or infinite TTL, it keeps this TTL value. This behavior can create a discrepancy between the cached entry and the actual IP address.

GKE resolves this issue in specific control plane versions, such as 1.21.14-gke.9100 and later or 1.22.15-gke.2100 and later. These versions set a maximum TTL of 30 seconds for any DNS response that has a higher TTL. This behavior is similar to NodeLocal DNSCache.

View kube-dns metrics

You can retrieve metrics about DNS queries in your cluster directly from the kube-dns Pods.

  1. Find the kube-dns Pods in the kube-system namespace:

    kubectl get pods -n kube-system --selector=k8s-app=kube-dns
    

    The output is similar to the following:

    NAME                        READY     STATUS    RESTARTS   AGE
    kube-dns-548976df6c-98fkd   4/4       Running   0          48m
    kube-dns-548976df6c-x4xsh   4/4       Running   0          47m
    
  2. Choose one of the Pods and set up port forwarding to access metrics from that Pod:

    • Port 10055 exposes kube-dns metrics.
    • Port 10054 exposes dnsmasq metrics.

    Replace POD_NAME with the name of your chosen Pod.

    POD_NAME="kube-dns-548976df6c-98fkd" # Replace with your pod name
    kubectl port-forward pod/${POD_NAME} -n kube-system 10055:10055 10054:10054
    

    The output is similar to the following:

    Forwarding from 127.0.0.1:10054 -> 10054
    Forwarding from 127.0.0.1:10055 -> 10055
    
  3. In a new terminal session, use the curl command to access the metrics endpoints.

    # Get kube-dns metrics
    curl http://127.0.0.1:10055/metrics
    
    # Get dnsmasq metrics
    curl http://127.0.0.1:10054/metrics
    

    The output is similar to the following:

    kubedns_dnsmasq_errors 0
    kubedns_dnsmasq_evictions 0
    kubedns_dnsmasq_hits 3.67351e+06
    kubedns_dnsmasq_insertions 254114
    kubedns_dnsmasq_max_size 1000
    kubedns_dnsmasq_misses 3.278166e+06
    

What's next