Preparing a Google Kubernetes Engine environment for production

This solution provides a blueprint and methodology for onboarding your workloads to Google Kubernetes Engine (GKE) more securely, reliably, and cost-effectively. It provides guidance for configuring administrative and network access to clusters. This article assumes a working understanding of Kubernetes resources and cluster administration, as well as familiarity with Google Cloud networking features.

Structuring projects, Virtual Private Cloud (VPC) networks, and clusters

The following diagram shows an example of a flexible and highly available structure for projects, VPC networks, regions, subnets, zones, and clusters.

Project, network, and cluster structure.

Projects

Google Cloud creates all of its resources within a project entity. Projects are the unit of billing and allow administrators to associate Identity and Access Management (IAM) roles with users. When roles are applied at the project level, they apply to all resources encapsulated within the project.

You should use projects to encapsulate your various operating environments. For example, you might have production and staging projects for operations teams as well as a test-dev project for developers. You can apply more granular and strict policies to the projects that hold your most mission-critical and sensitive data and workloads while applying permissive and flexible policies for developers in the test-dev environment to experiment.

Clusters

A project might contain multiple clusters. If you have multiple workloads to deploy, you can choose to use either a single, shared cluster or separate clusters for these workloads. To help you decide, consider the best practices on choosing the size and scope of a GKE cluster.

Networks and subnets

Within each project, you can have one or more VPC networks, which are virtual versions of physical networks. Each VPC network is a global resource that contains other networking-related resources, such as subnets, external IP addresses, firewall rules, routes, your VPN, and Cloud Router. Within a VPC network, you can use subnets, which are regional resources, to isolate and control traffic into or out of each region between your GKE clusters.

Each project comes with a single default network. You can create and configure an additional network to map to your existing IP address management (IPAM) convention. You can then apply firewall rules to this network to filter traffic to and from your GKE nodes. By default, all internet traffic to your GKE nodes is denied.
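
For example, the following commands sketch how you might create a custom-mode VPC network and a regional subnet to use with your GKE clusters. The network name, subnet name, region, and IP address range shown here are illustrative placeholders.

gcloud compute networks create gke-network --subnet-mode custom
gcloud compute networks subnets create gke-subnet-us-central1 \
    --network gke-network \
    --region us-central1 \
    --range 10.128.0.0/20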

To control communication between subnets, you must create firewall rules that allow traffic to pass between the subnets. Use the --tags flag during cluster or node-pool creation to appropriately tag your GKE nodes for the firewall rules to take effect. You can also use tags to create routes between your subnets if needed.

Multi-zone and regional clusters

By default, a cluster creates its cluster master and its nodes in a single zone that you specify at the time of creation. You can improve your clusters' availability and resilience by creating multi-zone or regional clusters. Multi-zone and regional clusters distribute Kubernetes resources across multiple zones within a region.

Multi-zone clusters:

  • Create a single cluster master in one zone.
  • Create nodes in multiple zones.

Regional clusters:

  • Create three cluster masters across three zones.
  • By default, create nodes in three zones, or in as many zones as you want.

The primary difference between regional and multi-zone clusters is that regional clusters create three masters and multi-zone clusters create only one. Note that in both cases, you are charged for node-to-node traffic across zones.

You can choose to create multi-zone or regional clusters at the time of cluster creation. You can add new zones to an existing cluster to make it multi-zone. However, you cannot modify an existing cluster to be regional. You also cannot make a regional cluster non-regional.
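
For example, the following commands sketch both options at creation time. The cluster names, region, and zones are placeholders; choose locations that match your availability requirements.

# Regional cluster: three masters, with nodes in each zone that you list.
gcloud container clusters create regional-cluster \
    --region us-central1 \
    --node-locations us-central1-a,us-central1-b,us-central1-c

# Multi-zone cluster: a single master in us-central1-a, with nodes in the listed zones.
gcloud container clusters create multi-zone-cluster \
    --zone us-central1-a \
    --node-locations us-central1-a,us-central1-b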

The service availability of nodes in your GKE-managed clusters is covered by the Compute Engine Service Level Agreement (SLA). Additionally, the SLA for GKE guarantees a monthly uptime of 99.5% for your Kubernetes cluster masters for zonal clusters and 99.95% for regional clusters.

As of June 6, 2020, GKE charges a cluster management fee of $0.10 per cluster per hour. For details, see the pricing page.

To learn more about multi-zone and regional clusters, see the GKE documentation.

Master authorized networks

An additional security measure that you can enforce in your cluster is to enable master authorized networks. This feature restricts access to the API server to the CIDR ranges that you specify, helping to ensure that only teams within your network can administer your cluster.

When you enable this feature, keep the following in mind:

  • Only 50 CIDR ranges are allowed.
  • If you're using a CI/CD pipeline, ensure that your CI/CD tools have access to the cluster's API server by allowing (whitelisting) their IP addresses or CIDR range.

You can also use this feature in conjunction with Cloud Interconnect or Cloud VPN to enable access to the master node only from within your private data center.
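
As a minimal sketch, the following command enables master authorized networks at cluster creation time. The cluster name, zone, and CIDR ranges are illustrative; substitute the ranges that your administrators and CI/CD tools actually use.

gcloud container clusters create secure-cluster \
    --zone us-central1-a \
    --enable-master-authorized-networks \
    --master-authorized-networks 203.0.113.0/24,198.51.100.0/24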

Private clusters

By default, all nodes in a GKE cluster have public IP addresses. A good practice is to create private clusters, which gives all worker nodes only private RFC 1918 IP addresses. Private clusters enforce network isolation, reducing the risk exposure surface for your clusters. Using private clusters means that by default only clients inside your network can access services in the cluster. In order to allow external services to reach services in your cluster, you can use an HTTP(S) load balancer or a network load balancer.

When you want to open up access to the master node outside your VPC network, you can use private clusters with master authorized networks. When you enable master authorized networks, your cluster master endpoint gets two IP addresses, an internal (private) one and a public one. The internal IP address can be used by anything internal to your network that's within the same region. The public IP address of the master can be used by any user or process that's external to your network and that's from an allowed CIDR range or IP address.

Private nodes don't have external IP addresses, and therefore by default they don't have outbound internet access. This also implies that by default, your cluster's container runtime can't pull container images from an external container registry, because that requires egress (outbound) connectivity. You can consider hosting your container images in Container Registry and accessing these images using Private Google Access. Alternatively, you can use Cloud NAT or deploy a NAT gateway to provide outbound access for your private nodes.
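
The following command is a minimal sketch of creating a private cluster with master authorized networks. The cluster name, zone, master CIDR range, and authorized range are placeholders; private clusters also require alias IP addresses (VPC-native networking), which is why --enable-ip-alias is included.

gcloud container clusters create private-cluster \
    --zone us-central1-a \
    --enable-ip-alias \
    --enable-private-nodes \
    --master-ipv4-cidr 172.16.0.32/28 \
    --enable-master-authorized-networks \
    --master-authorized-networks 203.0.113.0/24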

Additionally, you can use VPC Service Controls to help mitigate the risk of data exfiltration. VPC Service Controls help you protect managed Google Cloud services in one or more projects by letting you define a service perimeter for access to these services. You can give applications that run in your GKE clusters access to reach these managed services by setting up appropriate access levels. You can also use VPC Service Controls to protect the GKE cluster-creation control plane.

Managing identity and access

Project-level access

The previous section noted that you can bind IAM roles to users at the project level. In addition to granting roles to individual users, you can also use groups to simplify the application of roles.

The following illustration of an IAM policy layout shows the principle of least privilege for a dev project that's set up for developers to develop and test their upcoming features and bug fixes, as well as a prod project for production traffic:

Identity and access management.

As the following table shows, there are 4 groups of users within the organization with varying levels of permissions, granted through IAM roles across the 2 projects:

Team | IAM role | Project | Permissions
Developers | container.developer | dev | Can create Kubernetes resources for the existing clusters within the project; not allowed to create or delete clusters.
Operations | container.admin | prod | Full administrative access to the clusters and Kubernetes resources running within the project.
Security | container.viewer, security.admin | prod | Create, modify, and delete firewall rules and SSL certificates, as well as view resources that were created within each cluster, including the logs of the running Pods.
Network | network.admin | prod | Create, modify, and delete networking resources, except for firewall rules and SSL certificates.

In addition to the 3 teams with access to the prod project, an additional service account is given the container.developer role for prod, allowing it to create, list, and delete resources within the cluster. Service accounts can be used to give automation scripts or deployment frameworks the ability to act on your behalf. Deployments to your production project and clusters should go through an automated pipeline.

In the dev project there are multiple developers working on the same application within the same cluster. This is facilitated by namespaces, which the cluster user can create. Each developer can create resources within their own namespace, therefore avoiding naming conflicts. They can also reuse the same YAML configuration files for their deployments so that their configurations stay as similar as possible during development iterations. Namespaces can also be used to create quotas on CPU, memory, and storage usage within the cluster, ensuring that one developer isn't using too many resources within the cluster. The next section discusses restricting users to operating within certain namespaces.

Role-based access control (RBAC)

GKE clusters running Kubernetes 1.6 and later can take advantage of further restrictions to what users are authorized to do in individual clusters. IAM can provide users access to full clusters and the resources within them, but Kubernetes Role-Based Access Control (RBAC) allows you to use the Kubernetes API to further constrain the actions users can perform inside their clusters.

With RBAC, cluster administrators apply fine-grained policies to individual namespaces within their clusters or to the cluster as a whole. The Kubernetes kubectl tool uses the active credentials from the gcloud tool, allowing cluster admins to map roles to Google Cloud identities (users, service accounts, and Google Groups) as subjects in RoleBindings.

Google Groups for GKE (beta) enables you to use Groups with Kubernetes RBAC. To use this feature, you must configure Google Workspace Google Groups, create a cluster with the feature enabled, and use RoleBindings to associate your Groups with the roles that you want to bind them to. For more information, see Role-based access control.
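
As a sketch, the following beta command creates a cluster with the Google Groups for GKE feature enabled. The cluster name, zone, and group address are placeholders; the group that you reference must follow the naming convention that the feature requires in your Google Workspace domain.

gcloud beta container clusters create rbac-cluster \
    --zone us-central1-a \
    --security-group="gke-security-groups@example.com"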

For example, in the following figure, there are two users, user-a and user-b, who have been granted the config-reader and pod-reader roles on the app-a namespace.

RBAC authorization.

As another example, there are Google Cloud project-level IAM roles that give certain users access to all clusters in a project. In addition, individual namespace- and cluster-level role bindings are added through RBAC to give fine-grained access to resources within particular clusters or namespaces.

IAM RoleBindings.

Kubernetes includes some default roles, but as a cluster administrator, you can create your own roles that map more closely to your organizational needs. The following example role allows users to view, create, and update ConfigMaps, but not to delete them, because the delete verb is not included:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: development
  name: config-editor
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]

After you have defined roles, you can apply those roles to the cluster or namespace through bindings. Bindings associate roles to their users, groups, or service accounts. The following example shows how to bind a previously created role (config-editor) to the bob@example.org user and to the development namespace.

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: config-editors
  namespace: development
subjects:
- kind: User
  name: bob@example.org
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: config-editor
  apiGroup: rbac.authorization.k8s.io

For more information about RBAC, see the GKE documentation.

Image access and sharing

Images in Container Registry or Artifact Registry (beta) are stored in Cloud Storage. This section discusses two ways to share images. One way is to make the images public, and the other is to share images between projects.

Making images public in Container Registry

You can make images public by making the objects and buckets backing them public. For more detailed instructions, see the Container Registry Access Control documentation.
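
As a sketch, for images hosted on the gcr.io hostname, the backing bucket follows the artifacts.project-id.appspot.com naming pattern, so you could make all images in a project publicly readable with a command like the following (the project ID is a placeholder):

gsutil iam ch allUsers:objectViewer gs://artifacts.my-project.appspot.com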

Accessing images across projects in Container Registry

You can share container images between projects by ensuring that your Kubernetes nodes use a service account. The default Compute Engine service account associated with your project is in the following form:

project-number-compute@developer.gserviceaccount.com

After you have this identifier, you can grant it access as a storage.viewer on projects where you want to use the Container Registry. Use a custom service account that has restricted permissions, because the default service account has editor access to the entire project.
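
For example, the following sketch grants the Storage Object Viewer role at the project level on the project that hosts the registry. The project name and service account are placeholders; substitute the custom service account that your nodes actually use.

gcloud projects add-iam-policy-binding registry-project \
    --member serviceAccount:project-number-compute@developer.gserviceaccount.com \
    --role roles/storage.objectViewer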

To use a different service account for your clusters, provide the service account at cluster or node-pool creation by using the --service-account flag. For example, to use the gke-sa service account in the project my-project:

gcloud container clusters create west --service-account \
    gke-sa@my-project.iam.gserviceaccount.com

For information about migrating from Container Registry to Artifact Registry for your container images, see Transitioning from Container Registry.

Determining the right image pull policy

The imagePullPolicy property determines whether Kubelet attempts to pull an image while it's starting up a Pod. You must consider an appropriate imagePullPolicy setting to specify for your container images. For example, you might specify the following image pull policy:

imagePullPolicy: IfNotPresent

In this case, Kubelet retrieves a copy of the image only if the image isn't available in the node's cache.

For more information about the possible image pull policies that you can specify, see Container Images in the Kubernetes documentation.

Using dynamic admission webhooks to enforce policies

Dynamic admission webhooks are a part of the Kubernetes control plane. They can intercept incoming requests made to the API server. Admission webhooks are a powerful tool that can help you enforce enterprise-specific custom policies in your GKE clusters.

Kubernetes supports two types of admission webhooks: mutating admission webhooks and validating admission webhooks.

Mutating admission webhooks intercept admission requests and can mutate (alter) the request. The request is then passed on to the API server.

Validating admission webhooks examine a request and determine if it's valid according to rules that you specify. If any validating admission webhooks are configured in the cluster, they're invoked after the request has been validated by the API server. Validating admission webhooks can reject requests in order to ensure conformance to policies that are defined in the webhook.

For example, you can enforce an image pull policy by using a mutating admission webhook to make sure that the policy is set to Always regardless of the imagePullPolicy setting that was specified by developers who submitted pod-creation requests.
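
The following manifest is a minimal sketch of registering a mutating admission webhook for Pod-creation requests. The webhook name, the Service that hosts the webhook server, the path, and the CA bundle are all placeholders, and the webhook server that would actually rewrite imagePullPolicy is not shown.

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: image-pull-policy-webhook
webhooks:
  - name: image-pull-policy.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        # Placeholder Service that exposes your webhook server over TLS.
        name: image-pull-policy-webhook
        namespace: webhooks
        path: /mutate
      caBundle: <base64-encoded-CA-certificate>
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]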

Other image deployment considerations

It's a best practice to use a private container registry such as Container Registry to hold your organization's curated set of images. This helps to reduce the risk of introducing vulnerabilities into your deployment pipeline (and eventually into application workloads). If possible, enable container analysis, such as vulnerability scanning, to help further reduce your security risks.

If you must use public images, consider validating the set of public images that are allowed to be deployed into your clusters. (For more information, see the Binary Authorization section.) You can also consider deploying prepackaged Kubernetes apps from Google Cloud Marketplace. The Kubernetes apps listed on Google Cloud Marketplace are tested and vetted by Google, including vulnerability scanning and partner agreements for maintenance and support.

In addition, make sure that you use good image versioning practices—use good tagging conventions, and consider using digests instead of tags when applicable.
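
For example, in a Pod spec you can pin an image by digest instead of by tag; the repository path and digest here are placeholders:

image: gcr.io/my-project/my-app@sha256:<digest>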

Using Workload Identity to interact with Google Cloud service APIs

Often, enterprise architectures involve architectural components that span cloud services—cloud-managed services and hosted services. It's a common pattern for your GKE applications or services to have to communicate with Google Cloud managed services such as Cloud Storage and BigQuery. As an example, you might need to store customer records after they're processed by batch jobs in GKE into BigQuery for later analysis.

Workload Identity is a GKE feature that lets your GKE services interact with the wider Google Cloud ecosystem without having to store service account credentials as Kubernetes secrets. This feature allows you to map a Kubernetes service account to a Google Cloud service account with the help of an IAM binding. Subsequently, when Pods run using the Kubernetes service account, they can assume the identity that's required in order to access the Google Cloud service. Note that this assumes that you have granted the required level of access for the service to the Google Cloud service account.
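
The following commands are a sketch of wiring a Kubernetes service account to a Google Cloud service account with Workload Identity. The cluster, project, namespace, and service account names are placeholders, and the Google Cloud service account (app-gsa) is assumed to already have the IAM roles it needs (for example, on BigQuery).

# Create a cluster with Workload Identity enabled.
gcloud container clusters create wi-cluster \
    --zone us-central1-a \
    --workload-pool=my-project.svc.id.goog

# Create the Kubernetes service account that your Pods will run as.
kubectl create serviceaccount app-ksa --namespace default

# Allow the Kubernetes service account to impersonate the Google Cloud service account.
gcloud iam service-accounts add-iam-policy-binding \
    app-gsa@my-project.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:my-project.svc.id.goog[default/app-ksa]"

# Annotate the Kubernetes service account with the Google Cloud service account to use.
kubectl annotate serviceaccount app-ksa --namespace default \
    iam.gke.io/gcp-service-account=app-gsa@my-project.iam.gserviceaccount.com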

For more information about Workload Identity, see the GKE documentation.

Managing cluster security

Security is a multifaceted discipline that's of paramount importance in enterprise deployments of GKE clusters. This section covers several factors that you can use to harden your cluster's security.

Vulnerability scanning for images

Container Registry can scan images that are pushed to it, looking for known security vulnerabilities for images based on Ubuntu, Alpine, Debian, CentOS, and RedHat. We recommend that you take advantage of this feature to scan images that you plan to use in your Kubernetes clusters.

You can view vulnerabilities for an image in the Google Cloud console or by running the following gcloud command:

gcloud beta container images describe \
    hostname/project-id/image-id:tag  \
    --show-package-vulnerability

Replace the following:

  • hostname: one of the following hostname locations:
    • gcr.io, which currently hosts the images in the United States.
    • us.gcr.io, which hosts the image in the United States in a separate storage bucket from images hosted by gcr.io.
    • eu.gcr.io, which hosts the images in the European Union.
    • asia.gcr.io, which hosts the images in Asia.
  • project-id: the ID of the project that contains the images.
  • image-id: the ID of the image for which you want to view vulnerabilities.
  • tag: the image tag that you want to get information about.

Your organization can benefit from automating the tracking and receipt of notifications when changes are made to your Container Registry repository. For example, you can be notified when a new image is created or one is deleted. You can build a pipeline where application listeners are subscribed to a Pub/Sub topic to which Container Registry events are published. You can then use these events to trigger builds or automated deployments. For more information, see the Container Registry documentation.
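
As a sketch, Container Registry publishes these events to a Pub/Sub topic named gcr in the project that hosts the registry; you create the topic and a subscription yourself. The subscription name here is illustrative.

gcloud pubsub topics create gcr
gcloud pubsub subscriptions create registry-events --topic gcr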

Binary Authorization

With Kubernetes, you must determine if and when an image should be considered valid for deployment into your cluster. For this task you can use Binary Authorization. This is a deploy-time construct that lets you define a workflow that enforces the signatures (attestations) that an image must have in order to be deployable to your cluster.

The workflow is defined in terms of policies. As you move your code and therefore your container image through a CI/CD pipeline, Binary Authorization records attestations for each of these stages as defined in your Binary Authorization policy. These attestations validate that an image has successfully passed the defined milestones.

Binary Authorization integrates with the GKE deployment API and can ensure that deployment of an image is subject to the image having all the required attestations. Failed deployment attempts are automatically logged, and cluster administrators can review and audit them.
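
As a minimal sketch, you can enable Binary Authorization when you create a cluster and then import a policy that you maintain in source control. The cluster name, zone, and policy file name are placeholders, and the policy contents are not shown here.

gcloud container clusters create attested-cluster \
    --zone us-central1-a \
    --enable-binauthz

gcloud container binauthz policy import policy.yaml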

For a tutorial about how to implement Binary Authorization for GKE using Cloud Build, see Implementing Binary Authorization using Cloud Build and GKE.

Secure access with gVisor in GKE Sandbox

A container provides a layer of security and kernel isolation, but it might still be susceptible to breaches that lead to attackers gaining access to the host's operating system (OS). A more resilient approach to security isolation between a container and its host OS is to create another layer of separation. One approach is to use GKE Sandbox.

GKE Sandbox uses gVisor, an open source container runtime that was released by Google. Internally, gVisor creates a virtual kernel for containers to interact with, which abstracts the reach that a container has to the host kernel. Additionally, it enforces control on the file and network operations that the container can perform.

Because GKE Sandbox creates an additional layer of isolation, it might incur additional memory and CPU overhead. Before you use GKE Sandbox, consider which workloads need this elevated level of security. Typically, good candidates are services that are based on external images.

The following gcloud command shows how to create a node pool with GKE Sandbox enabled:

gcloud container node-pools create node-pool-name \
    --cluster=cluster \
    --image-type=cos_containerd \
    --sandbox type=gvisor \
    --enable-autoupgrade

Replace the following:

  • node-pool-name: The name of the node pool to create.
  • cluster: The cluster to add the node pool to.

To specify which application Pods run using GKE Sandbox, incorporate gVisor in the Pod spec as shown in the following example:

apiVersion: v1
kind: Pod
metadata:
  name: sample-saas-app
  labels:
    app: saas-v1
spec:
  runtimeClassName: gvisor
  containers:
    - name: sample-node-app-v1
      image: [image]

For more information about GKE Sandbox, see GKE Sandbox: Bring defense in depth to your Pods on the Google Cloud blog. For more information about whether your application is suited for GKE Sandbox, see the GKE documentation.

Audit logging

Kubernetes audit logging records all API requests that are made to the Kubernetes API server. This logging is useful for helping you detect anomalies and unusual patterns of access and configuration setup. Examples of what you might want to check and alert on are the following:

  • Deleting a deployment.
  • Attaching to or using exec to access a container that has privileged access.
  • Modifying ClusterRole objects or creating role bindings for the cluster roles.
  • Creating service accounts in the kube-system namespace.

GKE integrates Kubernetes audit logging with Cloud Logging. You can access these logs the same way you access logs for resources that run in your Cloud project. API requests made to the Kubernetes API server can be logged, and you can use them to review API activity patterns.
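
For example, the following command is a sketch of reading recent admin activity audit entries for clusters in a project with the gcloud tool; the project ID is a placeholder, and you can extend the filter (for example, on the request method) to narrow the results.

gcloud logging read \
    'resource.type="k8s_cluster" AND logName:"cloudaudit.googleapis.com%2Factivity"' \
    --project my-project \
    --limit 10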

Each request (event) that's captured by the Kubernetes API server is processed using one or more policies that you define. These can be either Kubernetes audit policies that determine which events are logged, or they can be Google Kubernetes Engine audit policies that determine whether the events are logged in the admin activity log or the data log. Admin activity logs are enabled by default. You can also enable data access logging if you need to log details about what metadata and data was read or written within the clusters. Note that enabling data access logging can incur additional charges. For more information, see the pricing documentation.

PodSecurityPolicies

A common attack vector is to deploy Pods that have escalated privileges in an attempt to gain access to a Kubernetes cluster. PodSecurityPolicies define a set of rules in the Pod specification that outline what a Pod is allowed to do. You implement a PodSecurityPolicy in Kubernetes as an admission controller resource. You can use it to restrict the use of host namespaces and volume types, and to limit the underlying OS capabilities that are available to Pods.

To create a GKE cluster with a PodSecurityPolicy enabled, use the following command. Replace cluster-name with the name of the cluster that you're adding a PodSecurityPolicy to.

gcloud beta container clusters create cluster-name \
    --enable-pod-security-policy

The following example shows a PodSecurityPolicy that restricts the ability to create privileged Pods.

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default-pod-security-policy
spec:
  privileged: false
  hostPID: false
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: MustRunAsNonRoot

Container security considerations

The fundamental building block for Kubernetes services is the container. This makes container security a key factor when you plan cluster security and policies. Carefully consider the following:

  • The images that you build your containers from.
  • The privileges that you assign to containers.
  • How containers interact with the host OS and other services.
  • How containers access and log sensitive information.
  • How you manage the lifecycle of the containers in your clusters.

For more information and best practices, see the documentation for building and operating containers.

Configuring networking

Kubernetes provides a service abstraction that includes load-balancing and service discovery across sets of Pods within a cluster as well as to legacy systems that are running outside the cluster. The following sections describe best practices for communication between Kubernetes Pods and with other systems, including other Kubernetes clusters.

VPC-native clusters compared to routes-based clusters

Based on how GKE clusters route traffic from one Pod to another, the clusters can be categorized into two types. The first is a cluster that uses alias IP ranges for routing traffic; this is called a VPC-native cluster. The second is a cluster that uses Google Cloud routes; this is called a routes-based cluster.

VPC-native clusters use alias IP ranges for Pod networking. This means that the control plane automatically manages the routing configuration for Pods instead of configuring and maintaining static routes for each node in the GKE cluster. Using alias IP ranges, you can configure multiple internal IP addresses that represent containers or applications hosted in a VM, without having to define a separate network interface. Google Cloud automatically installs VPC network routes for primary and alias IP ranges for the subnet of the primary network interface. This greatly simplifies Pod-to-Pod traffic routing.

Additionally, VPC-native clusters are not subject to route quotas. Leveraging alias IP ranges in the cluster fabric provides direct access to Google services like Cloud Storage and BigQuery; this access otherwise would be possible only with a NAT gateway.

It's a common pattern for enterprises to have their GKE clusters communicate securely with their on-premises ecosystem of applications and services. Alias IP ranges permit this, because the alias IP addresses are discoverable over Cloud VPN or Cloud Interconnect. This helps give you secure connectivity across your on-premises and Google Cloud infrastructure.

You need to decide which cluster type is best suited to your network topology. Key factors are the availability of IP addresses in your network, cluster (node) expansion plans in your enterprise, and connectivity to other applications in the ecosystem. VPC-native clusters tend to consume more IP addresses in the network, so you should take that into account. Note that after creation you cannot migrate a VPC-native cluster to a routes-based cluster, nor a routes-based cluster to a VPC-native cluster, so it's important to understand the implications of your choice before you implement it.
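
The following command is a sketch of creating a VPC-native cluster by enabling alias IP ranges. The cluster, network, subnet, and secondary range names are placeholders and assume that you created the subnet with those secondary ranges beforehand.

gcloud container clusters create vpc-native-cluster \
    --zone us-central1-a \
    --network gke-network \
    --subnetwork gke-subnet-us-central1 \
    --enable-ip-alias \
    --cluster-secondary-range-name pod-range \
    --services-secondary-range-name service-range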

Communicating within the same cluster

Service discovery

Kubernetes allows you to define services that group Pods that are running in the cluster based on a set of labels. This group of Pods can be discovered within your cluster using DNS. For more information about service discovery in Kubernetes, go to the Connecting Applications with Services documentation.

DNS

A cluster-local DNS server, kube-dns, is deployed in each GKE cluster; it handles mapping service names to healthy Pod IP addresses. By default, the Kubernetes DNS server returns the service's cluster IP address. This IP address is static throughout the lifetime of the service. When traffic is sent to this IP address, the iptables rules on the node load balance packets across the ready Pods that match the selectors of the service. These iptables rules are programmed automatically by the kube-proxy service running on each node.

If you want service discovery and health monitoring but would rather have the DNS service return you the IP addresses of Pods rather than a virtual IP address, you can provision the service with the ClusterIP field set to "None," which makes the service headless. In this case, the DNS server returns a list of A records that map the DNS name of your service to the A records of the ready Pods that match the label selectors defined by the service. The records in the response rotate to facilitate spreading load across the various Pods. Some client-side DNS resolvers might cache DNS replies, rendering the A record rotation ineffective. The advantages of using the ClusterIP are listed in the Kubernetes documentation.

One typical use case for headless services is with StatefulSets. StatefulSets are well-suited to run stateful applications that require stable storage and networking among their replicas. This type of deployment provisions Pods that have a stable network identity, meaning their hostnames can be resolved in the cluster. Although the Pod's IP address might change, its hostname DNS entry is kept up to date and resolvable.
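
The following manifest is a minimal sketch of a headless service; the service name, label selector, and port are illustrative.

apiVersion: v1
kind: Service
metadata:
  name: my-stateful-app
spec:
  # Setting clusterIP to None makes the service headless.
  clusterIP: None
  selector:
    app: my-stateful-app
  ports:
    - port: 5432
      targetPort: 5432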

Packet flow: ClusterIP

The following diagram shows the DNS response and packet flow of a standard Kubernetes service. While Pod IP addresses are routable from outside the cluster, a service's cluster IP address is only accessible within the cluster. These virtual IP addresses are implemented by doing destination network address translation (DNAT) in each Kubernetes node. The kube-proxy service running on each node keeps forwarding rules up to date on each node that map the cluster IP address to the IP addresses of healthy Pods across the cluster. If there is a Pod of the service running on the local node, then that Pod is used, otherwise a random Pod in the cluster is chosen.

ClusterIP service.

For more information about how ClusterIP is implemented, go to the Kubernetes documentation. For a deep dive into GKE networking, watch the Next 2017 talk on YouTube:

Headless services

The following is an example of the DNS response and traffic pattern for a headless service. Pod IP addresses are routable through the default Google Cloud subnet route tables and are accessed by your application directly.

Example DNS response and traffic pattern for headless service.

Network policies

You can use GKE network policy enforcement to control the communication between your cluster's Pods and services. To define a network policy on GKE, you can use the Kubernetes Network Policy API to create Pod-level firewall rules. These firewall rules determine which Pods and services can access one another inside your cluster.

Network policies are a kind of defense in depth that enhances the security of the workloads running on your cluster. For example, you can create a network policy to ensure that a compromised front-end service in your application cannot communicate directly with a billing or accounting service several levels down.

Network policies can also be used to isolate workloads belonging to different tenants. For example, you can provide secure multi-tenancy by defining a tenant-per-namespace model. In such a model, network policy rules can ensure that Pods and services in a given namespace cannot access other Pods or services in a different namespace.
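
The following manifest is a sketch of a network policy that allows the billing Pods to accept traffic only from Pods labeled as the API tier. It assumes that network policy enforcement is enabled on the cluster, and the namespace and labels are illustrative.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: billing-allow-api-only
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: billing
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api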

To learn more about network policies, see the GKE documentation.

Connecting to a GKE cluster from inside Google Cloud

To connect to your services from outside of your cluster but within the Google Cloud network's private IP address space, use internal load balancing. When you create a Service with type: LoadBalancer and the cloud.google.com/load-balancer-type: Internal annotation in Kubernetes, an internal network load balancer is created in your Google Cloud project and configured to distribute TCP and UDP traffic among your Pods.
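
The following manifest is a minimal sketch of such a service; the name, selector, and ports are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: internal-app
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: internal-app
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP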

Connecting from inside a cluster to external services

In many cases it is necessary to connect your applications running inside of Kubernetes with a service, database, or application that lives outside of the cluster. You have 3 options, as outlined in the following sections.

Stub domains

In Kubernetes 1.6 and later, you can configure the cluster internal DNS service (kube-dns) to forward DNS queries for a certain domain to an external DNS server. This is useful when you have authoritative DNS servers that should be queried for a domain that your Kubernetes Pods must use.
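
As a sketch, you configure stub domains through the kube-dns ConfigMap in the kube-system namespace; the domain and the DNS server IP address shown here are illustrative.

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"corp.example.com": ["10.100.0.10"]}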

External name services

External name services allow you to map a DNS record to a service name within the cluster. In this case, DNS lookups for the in-cluster service return a CNAME record of your choosing. Use this if you only have a few records that you want to map back to existing DNS services.
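
The following manifest is a minimal sketch; the service name and the external hostname are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: billing-database
spec:
  type: ExternalName
  externalName: db.example.internal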

Services without selectors

You can create services without a selector and then manually add endpoints to them to populate service discovery with the correct values. This allows you to use the same service discovery mechanism for your in-cluster services while ensuring that systems without service discovery through DNS are still reachable. While this approach is the most flexible, it also requires the most configuration and maintenance in the long term.
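
The following manifests are a sketch of a service without a selector and the Endpoints object that you maintain for it; the service name, IP address, and port are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: legacy-backend
spec:
  ports:
    - port: 443
      targetPort: 443
---
apiVersion: v1
kind: Endpoints
metadata:
  # The Endpoints object must have the same name as the service.
  name: legacy-backend
subsets:
  - addresses:
      - ip: 10.0.1.15
    ports:
      - port: 443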

For more information about DNS, go to the Kubernetes DNS Pods and Services documentation page.

Configuring your services in Kubernetes to receive internet traffic

Kubernetes services can be exposed using NodePort, ClusterIP, and LoadBalancer.

However, when you have many external-facing services, you can consider using Kubernetes Ingress resources. Ingress provides an entry point for your cluster and lets you define routing rules that route incoming requests to one or more backend services in your cluster. In GKE, the GKE Ingress controller implements an Ingress resource as a Google Cloud HTTP(S) load balancer and configures it according to the information in the Ingress resource and its associated services.

A Kubernetes Ingress resource can be used only when your applications serve traffic over HTTP(S). If your backend services use TCP or UDP protocols, you must use a network load balancer instead. This might be necessary, for example, if you need to expose your database as a service.
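
The following manifest is a sketch of an Ingress that routes requests to two existing services named api and web; the names, paths, and ports are placeholders, and older cluster versions might require the v1beta1 form of the Ingress API instead.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
    - http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 8080
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80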

Backend configuration

A BackendConfig is a custom resource definition that can provide additional prescriptive configuration that's used by the Kubernetes Ingress controller. When you deploy an Ingress object in your GKE cluster, the Kubernetes Ingress controller configures an HTTP(S) load balancer that routes incoming requests to the backend services as you specified in the Ingress manifest.

You can supplement the configuration of the load balancer with specifications like the following:

  • Enabling caching with Cloud CDN.
  • Adding IP address or CIDR allowlists (whitelists) with Google Cloud Armor.
  • Controlling application-level access with Identity-Aware Proxy (IAP).
  • Configuring service timeouts and connection-draining timeouts for services that are governed by the Ingress object in a cluster.
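
As a sketch, the following manifests define a BackendConfig that enables Cloud CDN and sets timeouts, along with the annotation that attaches it to a service behind your Ingress. The names and values are illustrative, and older cluster versions might require the v1beta1 form of the BackendConfig API.

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: web-backendconfig
spec:
  timeoutSec: 60
  connectionDraining:
    drainingTimeoutSec: 60
  cdn:
    enabled: true
---
apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    cloud.google.com/backend-config: '{"default": "web-backendconfig"}'
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080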

For more information about configuring the BackendConfig custom resource in GKE, see the GKE documentation.

Using a service mesh

A service mesh provides a uniform way to connect, secure, and manage microservices that are running in your Kubernetes clusters. For example, the Istio service mesh that you can add as a GKE add-on can manage service-to-service authentication and communication, enforce access policies, and collect rich telemetry data points that you can use to audit and administer your GKE clusters.

Key features that a service mesh provides are the following:

  • Traffic management. The service mesh allows you to define granular rules that determine how traffic is routed and split among services or among different versions of the same service. This makes it easier to roll out canary and blue-green deployments.

  • Built-in observability. The mesh records network traffic (Layer 4 and Layer 7) metrics in a uniform manner without requiring you to write code to instrument your services.

  • Security. The mesh enables mutual TLS (mTLS) between services. It not only provides secure channels for data in transit but also helps you manage the authentication and authorization of services within the mesh.

In summary, service meshes like Istio allow you to delegate system-level tasks to the mesh infrastructure. This improves the overall agility, robustness, and loose coupling of services that are running in your Kubernetes clusters.

For more information, see Istio on Google Kubernetes Engine.

Firewalling

GKE nodes are provisioned as instances in Compute Engine. As such, they adhere to the same stateful firewall mechanism as other instances. These firewall rules are applied within your network to instances by using tags. Each node pool receives its own set of tags that you can use in rules. By default, each instance belonging to a node pool receives a tag that identifies a specific Kubernetes Engine cluster that this node pool is a part of. This tag is used in firewall rules that Kubernetes Engine creates automatically for you. You can add your own custom tags at either cluster or node pool creation time using the --tags flag in the gcloud tool.

For example, to allow an internal load balancer to access port 8080 on all your nodes, you would use the following commands:

gcloud compute firewall-rules create allow-8080-fwr \
    --target-tags allow-8080 \
    --allow tcp:8080 \
    --network gke \
    --source-ranges 130.211.0.0/22
gcloud container clusters create my-cluster --tags allow-8080

The following example shows how to tag one cluster so that internet traffic can access nodes on port 30000 while the other cluster is tagged to allow traffic from the VPN to port 40000. This is useful when exposing a service through a NodePort that should only be accessible using privileged networks like a VPN back to a corporate data center, or from another cluster within your project.

Tagging two clusters differently.

Connecting to an on-premises data center

There are several Cloud Interconnect options for connecting to on-premises data centers. These options are not mutually exclusive, so you might have a combination, based on workload and requirements:

  1. Internet for workloads that aren't data intensive or latency sensitive. Google has more than 100 points of presence (PoPs) connecting to service providers across the world.
  2. Direct Peering for workloads that require dedicated bandwidth, are latency sensitive, and need access to all Google services, including the full suite of Google Cloud products. Direct Peering is a Layer 3 connection, done by exchanging BGP routes, and thus requires a registered ASN.
  3. Carrier Peering is the same as Direct Peering, but done through a service provider. This is a great option if you don't have a registered ASN, or have existing relationships with a preferred service provider.
  4. Cloud VPN is configured on top of the Layer 3 connectivity and internet options (1, 2, and 3) if IPsec encryption is required, or if you want to extend your private network into your private Compute Engine network.

Managing cluster operability

This section discusses the key factors to consider when you're administrating and operating your GKE clusters.

Resource quotas

Kubernetes resource quotas provide constraints that limit the aggregate permissible resource consumption for each namespace in a cluster. If you have clusters with Kubernetes namespaces that isolate business functions or development stages, you can use quotas to limit a wide array of resources, such as CPU utilization, memory, or the number of Pods and services that can be created within a namespace. To ensure stability of the control plane of your GKE clusters, Kubernetes automatically applies default, non-overridable resource quotas to each namespace in any GKE clusters that have five nodes or fewer.
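
The following manifest is a minimal sketch of a ResourceQuota for a namespace; the namespace name and the limits are illustrative.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    pods: "50"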

Resource limits

You can use the Kubernetes LimitRange object to enforce granular constraints on the minimum and maximum resource boundaries that containers and Pods can be created with. The following example shows how to use LimitRange:

apiVersion: v1
kind: LimitRange
metadata:
  name: sample-limits
spec:
  limits:
    - max:
        cpu: "400m"
        memory: "1Gi"
      defaultRequest:
        cpu: "200m"
        memory: "500Mi"
      type: Container

Pod disruption budgets

Pod disruption budgets (PDBs) help guard against voluntary or accidental deletion of Pods or Deployments by your team. PDBs cannot prevent involuntary disruptions that can be caused by a node going down or restarting. Typically, an operator creates a PDB for an application that defines the minimum number of replicas of the Pods for the application.

In an enterprise where developers work on multiple applications, mistakes do happen, and a developer or an administrator might accidentally run a script that deletes Pods or Deployments—in other words, that deletes your Kubernetes resources. But by defining a PDB, you help ensure that you maintain a minimum viable set of resources for your Kubernetes applications at all times.

PDBs that you configure for your GKE clusters are honored during GKE upgrades. This means that you can control the availability of your applications during an upgrade. The following example shows how you can configure a PDB.

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: nginx
Managing Kubernetes upgrades

Keep your Kubernetes clusters on GKE updated to the latest version of Kubernetes that fits your requirements. This allows you to leverage new features that are rolled out and to make sure that the underlying operating system of your cluster nodes is patched and up to date.

When an upgrade is needed, you can consider the following types:

  • Major and minor Kubernetes version upgrades of your GKE cluster for the master and worker nodes.
  • OS patches and upgrades of the virtual machines (nodes) that constitute your cluster.

Upgrading the Kubernetes version

You have two options for upgrading your GKE master nodes. The first is to let Google Cloud automatically upgrade your GKE cluster master. The second is to initiate a manual upgrade when a newer version becomes available.

You can review notifications in the console that show up against your GKE clusters when upgrades are available. We recommend that you trigger the version upgrade after you've reviewed the release content, and after you've tested your applications in a sandboxed cluster that's running on the version that you want to upgrade to.

When the master node of a zonal cluster is undergoing an upgrade, the control plane is unavailable. This means that you are not able to interact with the API server to add or remove resources in your cluster. If you can't afford the downtime for upgrading a master node in your zonal cluster, you can make the master node highly available by deploying regional GKE clusters instead. With this approach, you can have multiple master nodes that are spread across zones. When one master node is being upgraded, any control plane requests to the API server are routed to the other master node or nodes.

As with master nodes, you have two options for upgrading your GKE worker nodes to the same version as the cluster's master node:

  • You can have GKE manage the worker node upgrades for you. You do this by enabling automatic node upgrade for the node pools in your GKE cluster.
  • You can manually upgrade your GKE worker nodes. When an upgrade is available, the GKE console shows an alert. When you see that alert, you can apply the upgrade to your GKE worker nodes.

In both cases, when an upgrade is applied, GKE applies a rolling update to the worker nodes: it systematically drains, shuts down, and upgrades one node at a time, continuing only when the replacement node is available to respond to incoming requests.
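
The following commands are a sketch of the manual upgrade path and of enabling automatic node upgrades; the cluster name, zone, node pool name, and version string are placeholders.

# Upgrade the cluster master to a specific version.
gcloud container clusters upgrade my-cluster \
    --zone us-central1-a \
    --master \
    --cluster-version 1.17.9-gke.1504

# Upgrade the nodes in one node pool to the master's version.
gcloud container clusters upgrade my-cluster \
    --zone us-central1-a \
    --node-pool default-pool

# Enable automatic upgrades for a node pool.
gcloud container node-pools update default-pool \
    --cluster my-cluster \
    --zone us-central1-a \
    --enable-autoupgrade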

Node auto-repair

The GKE node auto-repair feature manages the health checks of your GKE nodes. If any of the nodes are found to be unhealthy, GKE initiates the node-repair process.

The managed node-repair process involves draining and recreating the node. If multiple nodes in your GKE cluster need to be repaired at the same time, Google Cloud internally determines how many nodes can be repaired in parallel.

If you create clusters in the Google Cloud console, the auto-repair feature is automatically enabled. For the GKE clusters that you create using the gcloud tool, you can explicitly enable auto-repair by including the --enable-autorepair flag in the cluster creation command.

If your GKE cluster has multiple node pools, the auto-repair feature gives you granular control over which node pools you want to enable node auto-repair for.

Autoscaling GKE clusters

Enterprises often experience varying incoming load on the applications that are running in their Kubernetes clusters. To respond to these business-driven changes, you can enable your GKE clusters to respond automatically and to scale up and down based on metrics.

Autoscaling includes multiple dimensions, as discussed in the following sections.

Cluster autoscaler

The GKE cluster autoscaler automatically adds and removes nodes from your cluster depending on the demand of your workloads. Cluster autoscaler is enabled for individual node pools. For each node pool, GKE checks whether there are Pods that are waiting to be scheduled due to lack of capacity. If so, the cluster autoscaler adds nodes to that node pool.

A combination of factors influences how GKE decides to scale down. If a node is utilized less than 50% by the Pods running on it, and if the running Pods can be scheduled on other nodes that have capacity, the underutilized node is drained and terminated.

You can set the boundaries for a node pool by specifying the minimum and maximum nodes that the cluster autoscaler can scale to.
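
For example, the following command is a sketch of creating a cluster whose default node pool scales between 1 and 10 nodes; the cluster name, zone, and node counts are placeholders.

gcloud container clusters create scalable-cluster \
    --zone us-central1-a \
    --num-nodes 3 \
    --enable-autoscaling \
    --min-nodes 1 \
    --max-nodes 10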

Horizontal Pod Autoscaling (HPA)

Kubernetes lets you create a Horizontal Pod Autoscaler (HPA) to configure how your Kubernetes Deployments or ReplicaSets should scale, and which metrics the scaling decision should be based on. By default, the HPA controller bases autoscaling decisions on CPU utilization. However, the HPA controller can also compute how Pods should scale based on custom metrics, such as an HTTP request count. For an HPA to respond to custom metrics, additional monitoring instrumentation is usually required.
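
The following manifest is a minimal sketch of an HPA that scales a Deployment named web based on CPU utilization; the names and thresholds are illustrative.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60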

For more information, see the Kubernetes and GKE documentation.

Vertical Pod Autoscaling (VPA)

The Vertical Pod Autoscaling (VPA) feature in GKE clusters lets you offload the task of specifying optimal CPU and memory requests for containers. When necessary, VPA tunes the resource allocations that are made to containers in your cluster. VPA lets you optimize resource utilization for clusters by optimizing at the container level on each node. It also frees up administrative time that you would otherwise have to invest in maintaining your resources.

VPA works in tandem with the node auto-provisioning feature that's described in the next section.

Due to Kubernetes limitations, the resource requests on a Pod can be changed only when a Pod is restarted. Therefore, to make changes, VPA evicts the Pod. For more information, see the GKE and Kubernetes documentation.
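
As a sketch, you enable VPA on the cluster and then define a VerticalPodAutoscaler object for each workload that you want it to manage; the cluster and Deployment names here are placeholders.

gcloud container clusters create vpa-cluster \
    --zone us-central1-a \
    --enable-vertical-pod-autoscaling

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"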

Node auto-provisioning

Node auto-provisioning enables the GKE cluster autoscaler to automatically provision additional node pools when the autoscaler determines that they're required. The cluster autoscaler can also delete auto-provisioned node pools when there are no nodes in those node pools.

Node auto-provisioning decisions are made by the GKE cluster autoscaler based on a number of factors. These include the quantity of resources requested by Pods, Pod affinities that you have specified, and node taints and tolerations that are defined in your GKE cluster.

Node auto-provisioning is useful if you have a variety of workloads running in your GKE clusters. As an example, if your GKE cluster has a GPU-dependent workload, you can run it in a dedicated node pool that's provisioned with GPU-capable nodes. You can define the node pool scaling boundaries by specifying a minimum and maximum node pool size.
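
For example, the following command is a sketch of enabling node auto-provisioning with cluster-wide resource boundaries; the cluster name, zone, and CPU and memory limits are placeholders.

gcloud container clusters create auto-provisioned-cluster \
    --zone us-central1-a \
    --enable-autoprovisioning \
    --min-cpu 1 \
    --max-cpu 64 \
    --min-memory 1 \
    --max-memory 256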

For more information about node auto-provisioning and when to enable it, see Using node auto-provisioning.

What's next

  • Learn about best practices for building and operating containers.
  • Learn about authenticating end users to Cloud Run on GKE using Istio in this tutorial.
  • Try out other Google Cloud features for yourself. Have a look at our tutorials.