Setting up NodeLocal DNSCache

This page explains how to configure NodeLocal DNSCache on a Google Kubernetes Engine (GKE) cluster. NodeLocal DNSCache improves DNS lookup latency, makes DNS lookup times more consistent, and reduces the number of DNS queries to kube-dns by running a DNS cache on each cluster node.

For an overview of how service discovery and managed DNS works on GKE, see Service discovery and DNS.

Overview

NodeLocal DNSCache is an optional GKE add-on that you can run in addition to kube-dns. NodeLocal DNSCache is implemented as a DaemonSet that runs a DNS cache on each node in your cluster. When a Pod makes a DNS request, the request goes to the DNS cache running on the same node as the Pod. If the cache can't resolve the DNS request, the cache forwards the request to:

  • Cloud DNS for external hostname queries. These queries are forwarded to Cloud DNS by the local metadata server running on the same node as the Pod that the query originated from.
  • kube-dns for all other DNS queries. The kube-dns-upstream service is used by node-local-dns Pods to reach out to kube-dns Pods.

A diagram of the path of a DNS request, as described in the previous paragraph

Pods do not need to be modified to use NodeLocal DNSCache. NodeLocal DNSCache consumes compute resources on each node of your cluster.
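
Because the add-on runs as a DaemonSet, you can get a quick per-node view of its status. The check below assumes the DaemonSet is named node-local-dns in the kube-system namespace, matching the Pod names shown later on this page:

kubectl get daemonset -n kube-system node-local-dns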

Benefits of NodeLocal DNSCache

  • Reduced average DNS lookup time
  • Connections from Pods to their local cache don't create conntrack table entries. This prevents dropped and rejected connections caused by conntrack table exhaustion and race conditions.

Details

  • NodeLocal DNSCache requires GKE version 1.15 or higher.
  • Connections between the local DNS cache and kube-dns use TCP instead of UDP for improved reliability.
  • DNS queries for external hostnames (hostnames that don't refer to cluster resources) are forwarded directly to the local Cloud DNS metadata server, bypassing kube-dns.
  • The local DNS caches automatically pick up stub domains and upstream nameservers that are specified in the kube-dns ConfigMap (see the example ConfigMap after this list).

  • DNS records are cached for:

    • The record's TTL, or 30 seconds if the TTL is more than 30 seconds.
    • 5 seconds if the DNS response is NXDOMAIN.
  • NodeLocal DNSCache Pods listen on ports 53, 9253, and 8080 on the nodes. Running any other hostNetwork Pod that uses these ports, or configuring hostPorts with these ports, causes NodeLocal DNSCache to fail and results in DNS errors.
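
If you have configured stub domains or upstream nameservers in the kube-dns ConfigMap, the node-local caches use them as well. The following is an illustrative sketch only; example.com and the nameserver IP addresses are placeholder values:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"example.com": ["8.8.8.8"]}
  upstreamNameservers: |
    ["1.1.1.1"]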

Enabling NodeLocal DNSCache

You can enable NodeLocal DNSCache in an existing cluster or when creating a new cluster. Enabling NodeLocal DNSCache in an existing cluster is a disruptive process: all cluster nodes running GKE 1.15 or higher are recreated. Nodes are recreated according to the GKE node upgrade process.

gcloud

Enabling NodeLocal DNSCache in a new cluster

To enable NodeLocal DNSCache in a new cluster, use the --addons NodeLocalDNS flag:

gcloud container clusters create cluster-name \
  --zone compute-zone \
  --cluster-version cluster-version \
  --addons NodeLocalDNS

Replace the following:

  • cluster-name: the name of your new cluster.
  • compute-zone: the zone for your cluster.
  • cluster-version: the version for your cluster (1.15 or higher).
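
For example, the following command creates a zonal cluster with the add-on enabled. example-cluster, us-central1-a, and latest are placeholder values; substitute your own cluster name, zone, and version:

gcloud container clusters create example-cluster \
  --zone us-central1-a \
  --cluster-version latest \
  --addons NodeLocalDNS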

Enabling NodeLocal DNSCache in an existing cluster

To enable NodeLocal DNSCache in an existing cluster, use the --update-addons=NodeLocalDNS=ENABLED flag:

gcloud container clusters update cluster-name \
  --update-addons=NodeLocalDNS=ENABLED
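
For example, for a zonal cluster named example-cluster in us-central1-a (placeholder values; include the location flag if it isn't set as a gcloud default):

gcloud container clusters update example-cluster \
  --zone us-central1-a \
  --update-addons=NodeLocalDNS=ENABLED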

Console

You can use Google Cloud Console to enable NodeLocal DNSCache when creating a new cluster.

  1. Visit the Google Kubernetes Engine menu in Cloud Console.


  2. Click Create.

  3. For Name, enter cluster-name.

  4. For Zone, select us-central1-a.

  5. For Number of nodes, enter 1.

  6. From the navigation pane, under Cluster, click Networking.

  7. Under Advanced networking options, select the Enable NodeLocal DNSCache checkbox.

  8. Click Create.

Verifying that NodeLocal DNSCache is enabled

You can verify that NodeLocal DNSCache is running by listing the node-local-dns Pods. There should be a node-local-dns Pod running on each node running GKE version 1.15 or higher.

kubectl get pods -n kube-system -o wide | grep node-local-dns
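
As an additional check, you can compare the number of node-local-dns Pods to the number of nodes in the cluster; the two counts should match when every node runs GKE 1.15 or higher:

kubectl get pods -n kube-system -o wide --no-headers | grep -c node-local-dns
kubectl get nodes --no-headers | wc -l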

Disabling NodeLocal DNSCache

You can disable NodeLocal DNSCache using gcloud:

gcloud container clusters update cluster-name \
  --update-addons=NodeLocalDNS=DISABLED

Troubleshooting NodeLocal DNSCache

See Debugging DNS Resolution for general information about diagnosing Kubernetes DNS issues.

Validating Pod configuration

To verify that a Pod is using NodeLocal DNSCache, check /etc/resolv.conf on the Pod to see if the Pod is configured to use the correct nameserver:

kubectl exec -it pod-name -- cat /etc/resolv.conf | grep nameserver

The nameserver IP should match the IP address output by:

kubectl get svc -n kube-system kube-dns -o jsonpath="{.spec.clusterIP}"

If the nameserver IP address configured in /etc/resolv.conf doesn't match, you need to modify the configuration to use the correct nameserver IP address.
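
The two checks can be combined into a short script. This is a convenience sketch only; pod-name is a placeholder for the Pod you want to inspect:

# Compare the Pod's configured nameserver with the kube-dns ClusterIP.
KUBE_DNS_IP=$(kubectl get svc -n kube-system kube-dns -o jsonpath="{.spec.clusterIP}")
POD_NAMESERVER=$(kubectl exec pod-name -- grep nameserver /etc/resolv.conf | awk '{print $2}')
echo "kube-dns ClusterIP: ${KUBE_DNS_IP}"
echo "Pod nameserver:     ${POD_NAMESERVER}"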

Network policy with NodeLocal DNSCache

When using NetworkPolicy with the NodeLocalDNS add-on, additional rules are needed to permit node-local-dns Pods to send and receive DNS queries. Use an ipBlock rule in your NetworkPolicy to allow communication between node-local-dns Pods and kube-dns:

spec:
  egress:
  - ports:
    - port: 53
      protocol: TCP
    - port: 53
      protocol: UDP
    to:
    - ipBlock:
        cidr: kube-dns-cluster-ip/32
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Replace kube-dns-cluster-ip with the IP address of the kube-dns service, which you can obtain using:

kubectl get svc -n kube-system kube-dns -o jsonpath="{.spec.clusterIP}"

This example uses an ipBlock rule because node-local-dns Pods run with hostNetwork: true. A matchLabels rule would not match these Pods.
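
One way to fill in the value is to substitute the cluster IP into your manifest before applying it. The sketch below assumes dns-egress-policy.yaml is a placeholder name for a complete NetworkPolicy manifest that contains the spec shown above:

KUBE_DNS_IP=$(kubectl get svc -n kube-system kube-dns -o jsonpath="{.spec.clusterIP}")
sed "s|kube-dns-cluster-ip|${KUBE_DNS_IP}|" dns-egress-policy.yaml | kubectl apply -f -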

Known Issues

NodeLocalDNS timeout errors

On clusters with NodeLocal DNSCache enabled, the node-local-dns Pod logs might contain entries similar to the following:

[ERROR] plugin/errors: 2 <hostname> A: read tcp <node IP: port>-><kubedns IP>:53: i/o timeout

This indicates that the response to a DNS request was not received from kube-dns within 2 seconds. This can be due to one of the following:

  • underlying network connectivity problems
  • a known issue with dnsmasq handling TCP connections

node-local-dns Pods reach out to kube-dns over TCP for improved reliability. When handling connections from multiple source IPs, dnsmasq prioritizes existing connections over new ones. As a result, on a cluster with high DNS QPS, node-local-dns Pods on newly created nodes can see higher DNS latency, which leads to error logs like the one above. This is especially likely on clusters with cluster autoscaler enabled, because it dynamically changes the number of nodes.
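
To check whether a particular node's cache is logging these timeouts, you can search that node's node-local-dns Pod logs. pod-name below is a placeholder for one of the Pods listed in the verification step earlier on this page:

kubectl logs -n kube-system pod-name | grep "i/o timeout"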

This issue is being fixed in GKE version 1.19.7-gke.1500.

A workaround is to increase the number of kube-dns replicas by tuning the DNS autoscaling parameters. A lower value for "nodesPerReplica" ensures that more kube-dns Pods are created as cluster nodes scale up. We highly recommend setting an explicit "max" value so that the GKE control plane is not overwhelmed by a large number of kube-dns Pods watching the Kubernetes API. The "max" value can be set to the number of nodes in the cluster; if the cluster has more than 500 nodes, set "max" to 500. This is a more than sufficient number of replicas for any GKE cluster.

You can tune the number of kube-dns replicas by editing the kube-dns-autoscaler ConfigMap:

kubectl edit configmap kube-dns-autoscaler --namespace=kube-system

Look for a line similar to:

linear: '{"coresPerReplica":256, "nodesPerReplica":16,"preventSinglePointFailure":true}'

The number of kube-dns replicas is calculated as:

replicas = max( ceil( cores × 1/coresPerReplica ) , ceil( nodes × 1/nodesPerReplica ) )

The result is then capped at the "max" value, if one is set.

In order to scale up, change "nodesPerReplica" to a smaller value and include a "max" value.

Example config:

  linear: '{"coresPerReplica":256, "nodesPerReplica":8,"max": 15,"preventSinglePointFailure":true}'

This config creates one kube-dns Pod for every 8 nodes in the cluster. A 24-node cluster gets 3 replicas and a 40-node cluster gets 5. Once the cluster grows beyond 120 nodes, the number of kube-dns replicas stays at 15, which is the "max" value.
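
After editing the ConfigMap, you can confirm that the autoscaler has adjusted the replica count by checking the kube-dns Deployment (assuming the default kube-dns Deployment in the kube-system namespace):

kubectl get deployment kube-dns -n kube-system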

What's next