Cluster administration overview


This page provides a quick overview of managing GKE clusters for administrators.

If you are a developer running workloads on GKE, you might not need to carry out most of these tasks. For an introduction to deploying workloads on GKE, see Deploying workloads.

Before reading this page, you should be familiar with the following, as well as basic Kubernetes concepts:

What tools do I use?

As an administrator, you use a range of tools to work with GKE clusters.

  • To control a cluster's configuration and overall characteristics, you use Google Cloud tools and APIs, including the Google Cloud CLI and the Google Cloud console. These tasks include creating, updating, viewing, and deleting clusters, and controlling who can access the cluster using Identity and Access Management (IAM). You might also use other Google Cloud tools and services, such as observability services for monitoring, logging, and alerting.

  • To control a cluster's internal behavior, you use the Kubernetes API and the kubectl command-line interface. Tasks where you might need to use kubectl include deploying workloads, applying Kubernetes role-based access control (RBAC) policies, and specifying Kubernetes network policy rules. You can read more about configuring kubectl for use with GKE clusters in Install kubectl and configure cluster access.

  • To declaratively provision clusters and workloads, you might use Terraform. You can read more about using Terraform with GKE in Provision GKE resources with Terraform.
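
For example, a typical first step that combines these tools is to fetch cluster credentials with the Google Cloud CLI and then inspect the cluster with kubectl. The following is a minimal sketch; the cluster name and location are placeholders for your own values.

    # Fetch credentials for an existing cluster so that kubectl can reach it.
    gcloud container clusters get-credentials CLUSTER_NAME --region=us-central1

    # Verify access by listing the cluster's nodes through the Kubernetes API.
    kubectl get nodes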

Basic cluster administration

Basic cluster administration tasks include cluster configuration, cluster upgrades, and node configuration. If you use our default Autopilot mode for your clusters (recommended), GKE handles most of this for you: cluster upgrades and node configuration are managed by GKE. If you use Standard mode, only upgrades are managed by GKE and you need to configure nodes yourself. You can read more about when you might need to choose Standard mode for clusters in GKE modes of operation.

Basic cluster administration tasks are specific to GKE clusters on Google Cloud and typically don't involve the Kubernetes system itself; you perform these tasks entirely by using the Google Cloud console, the Google Cloud CLI, the GKE API, or Terraform's Google Cloud provider.

Cluster and node upgrades

By default, clusters and nodes are upgraded automatically. You can learn more about configuring how upgrades work on each cluster, including when they can and cannot occur.
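
For example, you can check which release channel a cluster is enrolled in and restrict automatic upgrades to a daily maintenance window. The following is a minimal sketch using placeholder names and times:

    # Check which release channel the cluster is enrolled in.
    gcloud container clusters describe CLUSTER_NAME --region=us-central1 \
        --format="value(releaseChannel.channel)"

    # Restrict automatic upgrades to a daily maintenance window starting at 03:00 UTC.
    gcloud container clusters update CLUSTER_NAME --region=us-central1 \
        --maintenance-window=03:00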

Cluster-level configuration

Cluster-level configuration tasks include creating and deleting GKE clusters and nodes. You can also update some cluster settings such as when cluster maintenance tasks can occur.
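
For example, creating and later deleting an Autopilot cluster with the Google Cloud CLI might look like the following sketch; the cluster name and region are placeholders:

    # Create an Autopilot cluster (the recommended default mode).
    gcloud container clusters create-auto CLUSTER_NAME --region=us-central1

    # Delete the cluster when you no longer need it.
    gcloud container clusters delete CLUSTER_NAME --region=us-central1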

You can find out more about cluster configuration in the Cluster configuration overview.

Node configuration

If you use Autopilot for your clusters, you don't need to worry about node configuration because GKE configures your nodes for you. Autopilot cluster nodes are all fully managed by GKE and all use the same node operating system (OS), cos_containerd.

However, if you need to use Standard mode for any clusters, GKE offers a range of options for your cluster's nodes. For example, you can create one or more node pools; node pools are groups of nodes within your cluster that share a common configuration. Your cluster must have at least one node pool, and a node pool named default-pool is created by default when you create the cluster. You can read more about node pool management in GKE in Add and manage node pools.
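
For example, the following sketch adds a node pool with a specific machine type to a Standard cluster and then lists the cluster's node pools; the names and values are placeholders:

    # Add a node pool whose nodes use a specific machine type.
    gcloud container node-pools create high-mem-pool \
        --cluster=CLUSTER_NAME --region=us-central1 \
        --machine-type=e2-highmem-4 --num-nodes=2

    # List all node pools in the cluster, including the default pool.
    gcloud container node-pools list --cluster=CLUSTER_NAME --region=us-central1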

Other node configuration options for Standard clusters include choosing a non-default node OS, using Spot VMs, and choosing a minimum CPU platform for new nodes (Autopilot users can also specify a minimum CPU platform for compute-intensive workloads, but only on a per-workload basis).
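
The following sketch illustrates some of these options when creating a node pool in a Standard cluster; the image type, CPU platform, and other values are illustrative:

    # Create a node pool that uses Spot VMs, a non-default node OS image,
    # and a minimum CPU platform.
    gcloud container node-pools create spot-pool \
        --cluster=CLUSTER_NAME --region=us-central1 \
        --spot \
        --image-type=UBUNTU_CONTAINERD \
        --min-cpu-platform="Intel Cascade Lake"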

Even with Standard clusters, don't change iptables rules or other node-level settings that GKE manages: GKE can revert manual changes to the cluster's declarative configuration, which might leave the node unreachable or unintentionally exposed.

Configuring cluster networking

An important aspect of cluster administration is enabling and controlling various networking features for your cluster, such as IP address options for Standard clusters, whether your cluster's nodes can be accessed from public networks (nodes that can't be accessed from public networks are known as private nodes), and network access policies.

Many networking features are set at cluster creation, and many of them can't be changed without recreating the cluster: when you create a cluster using a Google Cloud interface, you must enable the networking features that you want to use. Because of this behavior, if you are not a network administrator, you might need to work closely with your network administrators when setting up production-ready clusters.
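
For example, choices such as VPC-native (alias IP) networking and private nodes are typically made when you create the cluster, as in this sketch for a Standard cluster; the names and CIDR range are placeholders:

    # Create a Standard cluster with VPC-native networking and private nodes
    # (nodes with no external IP addresses).
    gcloud container clusters create CLUSTER_NAME --region=us-central1 \
        --enable-ip-alias \
        --enable-private-nodes \
        --master-ipv4-cidr=172.16.0.0/28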

Some networking features that can be enabled with Google Cloud tools, such as network policy enforcement, also require further configuration using Kubernetes APIs.
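
For example, after network policy enforcement is enabled on a cluster (such as with the --enable-network-policy flag when creating a Standard cluster), the rules themselves are defined through the Kubernetes API. A minimal sketch, assuming a namespace named prod:

    # Define a default-deny rule for all ingress traffic in the prod namespace.
    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-ingress
      namespace: prod
    spec:
      podSelector: {}
      policyTypes:
        - Ingress
    EOF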

You can learn much more about GKE networking in Network overview.

Cluster observability

Another important part of cluster administration is configuring and using observability tooling to understand the health of your infrastructure and applications, and to maintain application availability and reliability. By default, GKE clusters are configured to send system and workload logs to Cloud Logging and system metrics to Cloud Monitoring.

GKE also provides observability features that help you use the data that you gather, including default and custom dashboards, alerting, service level objective (SLO) monitoring, and log analysis.
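
For example, you can adjust which sources of logs and metrics a cluster collects by using the Google Cloud CLI; the values in this sketch are illustrative:

    # Collect system and workload logs, and system metrics, for the cluster.
    gcloud container clusters update CLUSTER_NAME --region=us-central1 \
        --logging=SYSTEM,WORKLOAD \
        --monitoring=SYSTEM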

You can find out much more about setting up and using GKE observability in Observability for GKE.

Configuring cluster security

GKE includes Google Cloud-specific and Kubernetes security features that you can use with your cluster. You can manage Google Cloud-level security, such as IAM, using the Google Cloud console. You manage intra-cluster security features such as Kubernetes role-based access control (RBAC) using Kubernetes APIs and other interfaces.
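
For example, the two layers are managed with different tools: IAM grants project-level access with Google Cloud tooling, while RBAC grants access inside the cluster. A minimal sketch with placeholder project, user, and namespace names:

    # Google Cloud layer: grant a user read-only access to GKE resources in a project.
    gcloud projects add-iam-policy-binding PROJECT_ID \
        --member=user:dev@example.com \
        --role=roles/container.viewer

    # Kubernetes layer: grant the same user read-only access within one namespace
    # by binding the built-in "view" ClusterRole with RBAC.
    kubectl create rolebinding dev-view \
        --clusterrole=view \
        --user=dev@example.com \
        --namespace=team-a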

To learn about the security features and capabilities that are available in GKE, refer to the Security overview and Harden your cluster security. GKE Autopilot clusters implement many of these security features and hardening best practices automatically. For more information, refer to Security capabilities in GKE Autopilot.

Optimizing costs

GKE's tools let you view your cluster costs and help you make sure that you are making the most efficient use of the Google Cloud resources that you're paying for. You can view utilization metrics for CPU, memory, and disk usage over different timescales, and use these metrics to help optimize resource usage: for example, to identify underutilized or overutilized clusters that you might want to resize. You can also use autoscaling to reduce cluster size during off-peak hours, and use insights and recommendations to identify idle clusters, along with other best practices.
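
For example, you can spot-check utilization with kubectl and enable autoscaling on a Standard cluster's node pool so that underused nodes are removed during off-peak hours; the node pool name and limits in this sketch are placeholders:

    # Spot-check current CPU and memory usage across nodes.
    kubectl top nodes

    # Enable cluster autoscaling on a node pool so that it can shrink when idle.
    gcloud container clusters update CLUSTER_NAME --region=us-central1 \
        --enable-autoscaling --node-pool=default-pool \
        --min-nodes=1 --max-nodes=5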

If you are using GKE Enterprise, you can also view metrics to optimize costs across your fleet and for individual teams.

Configuring for disaster recovery

To ensure that your production workloads remain available in the event of a service-interrupting event, you should prepare a disaster recovery (DR) plan. To learn more about DR planning, see the Disaster recovery planning guide.

Your Kubernetes configuration and any persistent volumes are not backed up unless you take explicit action. To back up and restore your Kubernetes configuration and persistent volumes on GKE clusters, you can use Backup for GKE.
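
As a rough sketch, you can create a scheduled backup plan for a cluster with the gcloud CLI. The command group and flags shown here are an assumption based on the gcloud backup-restore surface; verify them against the Backup for GKE documentation before use:

    # Assumed sketch: create a daily backup plan that includes Kubernetes
    # configuration, Secrets, and persistent volume data for one cluster.
    # Verify the exact command group and flags in the Backup for GKE docs.
    gcloud beta container backup-restore backup-plans create daily-backup \
        --project=PROJECT_ID \
        --location=us-central1 \
        --cluster=projects/PROJECT_ID/locations/us-central1/clusters/CLUSTER_NAME \
        --all-namespaces \
        --include-secrets \
        --include-volume-data \
        --cron-schedule="0 3 * * *" \
        --backup-retain-days=30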

What's next