This page explains how you can reduce the resources consumed by cluster add-ons. Use these techniques on small clusters, such as clusters with three or fewer nodes or clusters that use machine types with limited resources.
In addition to your workload, cluster nodes run several add-ons that integrate the node with the cluster master and provide other functionality. As such, there is a disparity between a node's total resources and the resources that are available for your workload. You can learn more about GKE's cluster architecture.
The default configurations of these add-ons are appropriate for typical clusters, but you can fine-tune the resource usage of add-ons depending on your particular cluster configuration. You can also disable some add-ons that aren't required by your use case.
Fine-tuning is especially useful for clusters with limited compute resources, for example clusters with a single node or very few nodes, or clusters that run on low-cost machine types. You can use a small cluster to try out GKE or to experiment with new features. Clusters created using the "Your first cluster" cluster template can also benefit from fine-tuning.
You can configure each individual add-on to reduce its impact on node resources.
Add-ons provide visibility and debugging for your cluster. For instance, if you disable monitoring on a cluster running production workloads, problems will be more difficult to debug.
Cloud Logging automatically collects, processes, and stores your container and system logs.
You can disable Cloud Logging entirely, or leave it enabled but restrict its resource usage by fine-tuning the Fluentd add-on.
To disable Cloud Logging, complete the Disable logging steps.
To enable Cloud Logging, complete the Enable logging steps.
Viewing logs when Cloud Logging is disabled
When Cloud Logging is disabled, you can still view recent log entries. To view logs for a specific Pod:
kubectl logs -f pod-name
The -f option streams the logs.
To view the logs from all pods matching a selector:
kubectl logs -l selector
where selector is a label selector, such as the selector of a Deployment.
Fluentd collects logs from your nodes and sends them to Google Cloud's operations suite. The Fluentd agent is deployed to your cluster using a DaemonSet so that an instance of the agent runs on each node of your cluster. For applications that write a large quantity of logging data, Fluentd might require additional resources.
You can tune Fluentd's resource allocation by creating a scaling policy specific to your requirements. A scaling policy defines a Pod's resource requests and limits. Refer to Managing Compute Resources for Containers to learn how the Kubernetes scheduler handles resource requests and limits. For more information about how resource requests and limits affect quality of service (QoS), see Resource Quality of Service in Kubernetes.
The following section describes how to measure Fluentd's resource usage and how to write a custom scaling policy using these values.
How to create a custom scaling policy
- Visit the Kubernetes Workloads menu in the Google Cloud Console.
Use the filter bar to find the Fluentd DaemonSet: Remove the "Is System: false" filter, and add a filter for Name: "fluentd-gcp-v".
- Click on the fluentd workload in the list.
View the resource usage statistics of each Fluentd Pod by clicking on the Pod names in the Managed Pod section. There will be one Pod per cluster node. The cluster in this example has three nodes.
Look at each Pod in your cluster and note how much CPU and memory the fluentd-gcp container uses. Record the maximum value of each resource; you use these values in your custom scaling policy. You can change the timescale of the graphs to view longer-term trends.
In this example, the Pod's peak CPU usage is near 2.0e-2 CPU (0.02 CPU).
In this example, the Pod's peak memory usage is 135M.
- Use the observed resource usage data to calculate values to use in the scaling policy:
- Multiply the max CPU usage by 1000 to get an mCPU value for GKE. For example, a peak of 2.0e-2 CPU becomes 20m.
- The max memory will be the new memory request. To leave room for growth, make the memory limit 1.1x the request.
- Use the values you calculated to write a scaling policy for Fluentd. Use the
following manifest as a template:
Note that this manifest does not specify a CPU limit. This allows Fluentd to consume additional CPU as needed without being throttled. You can specify a CPU limit if Fluentd must not consume more than a specific amount of CPU.
apiVersion: scalingpolicy.kope.io/v1alpha1
kind: ScalingPolicy
metadata:
  name: fluentd-gcp-scaling-policy
  namespace: kube-system
spec:
  containers:
  - name: fluentd-gcp
    resources:
      requests:
      - resource: cpu
        base: [CPU request value from above]
      - resource: memory
        base: [Memory request value from above]
      limits:
      - resource: memory
        base: [Memory limit value from above]
- Apply the scaling policy with kubectl apply. For example, if your scaling policy file is named fluentd_scaling_policy.yaml:
kubectl apply -f fluentd_scaling_policy.yaml
- Confirm that the new values have been applied. Note that it may take a minute
for the new values to be applied to the cluster.
kubectl get ds --namespace=kube-system -l k8s-app=fluentd-gcp \
  -o 'custom-columns=NAME:.metadata.name,CPU_REQUEST:.spec.template.spec.containers[*].resources.requests.cpu,MEMORY_REQUEST:.spec.template.spec.containers[*].resources.requests.memory,MEMORY_LIMIT:.spec.template.spec.containers[*].resources.limits.memory'
This command's output should resemble the following, with request and limit values that correspond to the values in your policy:
NAME                 CPU_REQUEST   MEMORY_REQUEST   MEMORY_LIMIT
fluentd-gcp-v3.0.0   20m           135M             200M
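The arithmetic in the calculation step can be sketched in Python. This is a minimal illustration, not part of GKE tooling: the function name scaling_values and its parameters are hypothetical, and the inputs are the peak values from the example above (2.0e-2 CPU and 135M of memory). Values are rounded up so that the request never falls below the observed peak.

```python
import math

# Headroom factor for the memory limit, taken from the steps above (1.1x the request).
HEADROOM = 1.1


def scaling_values(peak_cpu_cores, peak_memory_mb, headroom=HEADROOM):
    """Derive scaling-policy values from observed peak usage.

    peak_cpu_cores: peak CPU usage in CPUs, for example 0.02
    peak_memory_mb: peak memory usage in megabytes, for example 135
    """
    # CPU in cores * 1000 gives millicores; round first to absorb float noise.
    cpu_millicores = math.ceil(round(peak_cpu_cores * 1000, 6))
    # The peak memory usage becomes the memory request.
    memory_request = math.ceil(peak_memory_mb)
    # The memory limit leaves room for growth above the request.
    memory_limit = math.ceil(round(memory_request * headroom, 6))
    return {
        "cpu_request": f"{cpu_millicores}m",
        "memory_request": f"{memory_request}M",
        "memory_limit": f"{memory_limit}M",
    }


if __name__ == "__main__":
    for name, value in scaling_values(0.02, 135).items():
        print(f"{name}: {value}")
```

With the example inputs, the sketch yields a 20m CPU request, a 135M memory request, and a 149M memory limit. You can pass a larger headroom value if you want a more generous limit.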
We recommend that you use Cloud Monitoring. However, you can disable monitoring in order to reclaim some resources.
For more information about Cloud Monitoring, refer to the overview of GKE monitoring.
If you use the Horizontal Pod Autoscaler add-on together with custom metrics from Cloud Monitoring, you must disable Horizontal Pod Autoscaling (HPA) on the cluster before you can fully disable Cloud Monitoring.
To disable Cloud Monitoring, complete the Disable monitoring steps.
To enable Cloud Monitoring, complete the Enable monitoring steps.
Google Cloud's operations suite Kubernetes Monitoring
To enable or disable Google Cloud's operations suite Kubernetes Monitoring in your cluster, use the Kubernetes console.
Horizontal Pod autoscaling
Horizontal Pod Autoscaling (HPA) scales the replicas of your deployments based on metrics like CPU usage or memory. If you don’t need HPA and you've already disabled Cloud Monitoring, you can also disable HPA.
To disable HPA:
gcloud container clusters update cluster-name \
  --update-addons=HorizontalPodAutoscaling=DISABLED
To enable HPA:
gcloud container clusters update cluster-name \
  --update-addons=HorizontalPodAutoscaling=ENABLED
You can disable the Kubernetes Dashboard add-on to conserve cluster resources. There is little or no downside to disabling the dashboard because it is redundant with the GKE dashboards available in Google Cloud Console.
To disable the dashboard:
gcloud container clusters update cluster-name \
  --update-addons=KubernetesDashboard=DISABLED
To enable the dashboard:
gcloud container clusters update cluster-name \
  --update-addons=KubernetesDashboard=ENABLED
Kube DNS schedules a DNS Deployment and service in your cluster, and the Pods in your
cluster use the Kube DNS service to resolve DNS names to IP addresses for Services,
Pods, Nodes, as well as public IP addresses. Kube DNS resolves public domain
names like example.com, and it resolves service names like
servicename.namespace.svc.cluster.local. You can learn more about
DNS for Services and Pods.
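As a rough illustration of the naming scheme described above, the following Python sketch builds the fully qualified DNS name of a service. The function names and the example service name are hypothetical, and the default cluster domain of cluster.local is taken from the pattern shown above. The resolve helper returns addresses only when it runs somewhere that can resolve the name, such as a Pod in a cluster where Kube DNS is running.

```python
import socket


def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Build a name of the form servicename.namespace.svc.cluster.local."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(hostname):
    """Return the sorted IP addresses for hostname, or an empty list on failure."""
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in results})


if __name__ == "__main__":
    # "frontend" is a hypothetical service name used only for illustration.
    name = service_dns_name("frontend", namespace="default")
    print(name)
    print(resolve(name))
```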
By default, several Kube DNS replicas run in order to maintain high availability. If you do not require highly available DNS resolution, you can conserve cluster resources by reducing the number of Kube DNS replicas to one. If you don't require DNS name resolution at all, you can disable Kube DNS entirely.
Reducing Kube DNS replication
If your cluster doesn't require highly available DNS resolution, you can conserve cluster resources by turning off Kube DNS horizontal autoscaling and reducing the number of replicas to one.
To turn off the kube-dns autoscaler and reduce kube-dns to a single replica:
kubectl scale --replicas=0 deployment/kube-dns-autoscaler --namespace=kube-system
kubectl scale --replicas=1 deployment/kube-dns --namespace=kube-system
To enable autoscaling:
kubectl scale --replicas=1 deployment/kube-dns-autoscaler --namespace=kube-system
For more precise control of autoscaling, you can tune the autoscaling parameters.
Disabling Kube DNS
You can completely disable Kube DNS. However, Kube DNS is required for workloads that resolve the DNS name of any dependent service. This includes public domain names and the names of cluster services.
To disable Kube DNS:
kubectl scale --replicas=0 deployment/kube-dns-autoscaler --namespace=kube-system
kubectl scale --replicas=0 deployment/kube-dns --namespace=kube-system
To enable Kube DNS:
kubectl scale --replicas=1 deployment/kube-dns --namespace=kube-system
where the value of --replicas is your desired number of replicas (1 in this example).
To enable Kube DNS autoscaling:
kubectl scale --replicas=1 deployment/kube-dns-autoscaler --namespace=kube-system
External DNS lookups without Kube DNS
apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
  - name: test
    image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - 188.8.131.52
    - 184.108.40.206
Service discovery without Kube DNS
You can use service environment variables as an alternative to DNS-based service discovery. When a Pod is created, service environment variables are automatically created for every service in the same Namespace as the Pod. This is more restrictive than Kube DNS because environment variables are only created for services that are created before the Pod is created.
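The following Python sketch shows one way an application might read these variables. It relies on the Kubernetes convention that a service's host and port are exposed as NAME_SERVICE_HOST and NAME_SERVICE_PORT, where NAME is the service name in upper case with dashes replaced by underscores. The helper names and the redis-master service are hypothetical and used only for illustration.

```python
import os


def service_env_prefix(service_name):
    """Map a service name to its environment variable prefix."""
    # Kubernetes upper-cases the name and converts dashes to underscores.
    return service_name.upper().replace("-", "_")


def service_address(service_name, environ=os.environ):
    """Return (host, port) for a service, or None if the variables are absent.

    The variables are absent when the service was created after the Pod,
    or when the service is in a different Namespace.
    """
    prefix = service_env_prefix(service_name)
    host = environ.get(f"{prefix}_SERVICE_HOST")
    port = environ.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        return None
    return host, int(port)


if __name__ == "__main__":
    # "redis-master" is a hypothetical service name.
    address = service_address("redis-master")
    if address is None:
        print("service variables not set; was the service created before the Pod?")
    else:
        print("connect to %s:%d" % address)
```

Because the variables are set only when the Pod starts, a None result usually means the Pod must be recreated after the service exists.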