In Google Kubernetes Engine (GKE), you manage a cluster's configuration and characteristics using Google Cloud tools and APIs, including the Google Cloud CLI and the Google Cloud console. These tasks include creating, updating, and deleting clusters, adding or removing nodes, and controlling who can access the cluster using Identity and Access Management (IAM).
To control the cluster's internal behavior, you use the Kubernetes API and the kubectl command-line interface. You can also configure many aspects of a cluster's behavior using the Google Cloud console.
Basic cluster administration
Basic cluster administration tasks are specific to GKE clusters on Google Cloud and typically do not involve the Kubernetes system itself; you perform these tasks entirely by using the Google Cloud console, the Google Cloud CLI, or the GKE API.
Cluster and node upgrades
By default, clusters and node pools are upgraded automatically. You can configure how upgrades work for each cluster, including when they can and cannot occur.
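For example, with the gcloud CLI you can enroll a cluster in a release channel, which determines how quickly GKE rolls out automatic upgrades. The following commands are a minimal sketch; the cluster name and zone are placeholders.

```
# Enroll an existing cluster in the "regular" release channel (placeholder names).
gcloud container clusters update my-cluster \
    --zone=us-central1-a \
    --release-channel=regular

# List the Kubernetes versions and channels currently available in a zone.
gcloud container get-server-config --zone=us-central1-a
```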
Cluster-level configuration
Cluster-level configuration tasks include creating and deleting GKE clusters and nodes. You can control when cluster maintenance tasks can occur, and configure cluster-level autoscaling.
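As an illustration, the following gcloud commands sketch these tasks for a hypothetical Standard cluster named my-cluster in us-central1-a; adjust the names, zone, window, and limits for your environment.

```
# Create and, when no longer needed, delete a cluster (placeholder name and zone).
gcloud container clusters create my-cluster --zone=us-central1-a --num-nodes=3
gcloud container clusters delete my-cluster --zone=us-central1-a

# Restrict automatic maintenance to a weekend window (RFC 5545 recurrence rule).
gcloud container clusters update my-cluster --zone=us-central1-a \
    --maintenance-window-start=2025-01-04T01:00:00Z \
    --maintenance-window-end=2025-01-04T09:00:00Z \
    --maintenance-window-recurrence="FREQ=WEEKLY;BYDAY=SA,SU"

# Enable autoscaling for the default node pool between 1 and 5 nodes.
gcloud container clusters update my-cluster --zone=us-central1-a \
    --enable-autoscaling --node-pool=default-pool --min-nodes=1 --max-nodes=5
```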
Node configuration
GKE offers a range of options for your cluster's nodes. For example, you can create one or more node pools: groups of nodes within your cluster that share a common configuration. Your cluster must have at least one node pool, and a node pool named default-pool is created when you create the cluster.
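For example, you can add a node pool with its own machine type to an existing Standard cluster using the gcloud CLI. This is a minimal sketch with placeholder names:

```
# Add a node pool with a different machine type (placeholder names).
gcloud container node-pools create high-mem-pool \
    --cluster=my-cluster --zone=us-central1-a \
    --machine-type=e2-highmem-4 --num-nodes=2

# List the node pools in the cluster.
gcloud container node-pools list --cluster=my-cluster --zone=us-central1-a
```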
For Standard clusters, you can set other node options on a per-pool basis, including:
- Automatic repairs: enforced on Autopilot clusters
- Spot VMs
- Local SSDs
- Minimum CPU platform
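Several of these options are flags on node pool creation. The sketch below uses placeholder names, and the minimum CPU platform value depends on what the zone offers:

```
# Create a node pool that uses Spot VMs, attaches one local SSD per node,
# requires a minimum CPU platform, and keeps automatic repairs enabled.
gcloud container node-pools create spot-pool \
    --cluster=my-cluster --zone=us-central1-a \
    --spot \
    --local-ssd-count=1 \
    --min-cpu-platform="Intel Ice Lake" \
    --enable-autorepair
```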
Configuring cluster monitoring
Google recommends that you use Google Cloud's Managed Service for Prometheus to monitor your Kubernetes applications and infrastructure.
Managed Service for Prometheus is Google Cloud's fully managed multi-cloud solution for Prometheus metrics. It lets you globally monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale.
Managed Service for Prometheus collects metrics from Prometheus exporters and lets you query the data globally using PromQL, meaning that you can keep using any existing Grafana dashboards, PromQL-based alerts, and workflows. It is hybrid- and multi-cloud compatible, can monitor both Kubernetes and VM workloads, retains data for 24 months, and maintains portability by staying compatible with upstream Prometheus. You can also supplement your Prometheus monitoring by querying over 1,500 free metrics in Cloud Monitoring, including free GKE system metrics, using PromQL.
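As an example, on a Standard cluster you can turn on managed collection and then tell the collectors which Pods to scrape by creating a PodMonitoring resource. This is a sketch with assumed names: the cluster, the app label, and the metrics port name are placeholders.

```
# Enable managed collection for Managed Service for Prometheus on a Standard
# cluster (Autopilot clusters have it enabled by default). Placeholder names.
gcloud container clusters update my-cluster --zone=us-central1-a \
    --enable-managed-prometheus

# Scrape Pods labeled app=my-app on their "metrics" port every 30 seconds.
kubectl apply -f - <<EOF
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: my-app-monitoring
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    interval: 30s
EOF
```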
For more information about configuring cluster monitoring, refer to the Managed Service for Prometheus documentation.
Configuring cluster networking
Another aspect of cluster administration is to enable and control various networking features for your cluster. Most networking features are set at cluster creation: when you create a cluster using a Google Cloud interface, you must enable the networking features that you want to use. Some of these features might require further configuration using Kubernetes interfaces, such as the kubectl command-line interface.
For example, to enable network policy enforcement on a GKE Standard cluster, you must first enable the feature using the Google Cloud console or the Google Cloud CLI. Then, you specify the actual network policy rules using the Kubernetes network policy API or the kubectl command-line interface. For Autopilot clusters, network policy enforcement is enabled by default, so you only need to specify the policy rules.
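The following sketch shows that two-step flow for a Standard cluster, with placeholder names throughout: the feature is enabled with the gcloud CLI, and a NetworkPolicy object then restricts which Pods can reach the backend.

```
# Step 1: enable network policy enforcement on a Standard cluster (placeholder names).
gcloud container clusters update my-cluster --zone=us-central1-a \
    --update-addons=NetworkPolicy=ENABLED
gcloud container clusters update my-cluster --zone=us-central1-a \
    --enable-network-policy

# Step 2: define the policy rules with the Kubernetes NetworkPolicy API.
# This example allows only Pods labeled app=frontend to reach Pods labeled app=backend.
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
EOF
```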
For more information about enabling networking features on GKE, refer to the following guides:
- Create a VPC-native cluster
- Configure an IP masquerade agent in Standard clusters
- Create a network policy
Configuring cluster security
GKE includes Google Cloud-specific and Kubernetes security features that you can use with your cluster. You can manage Google Cloud-level security, such as IAM, using the Google Cloud console. You manage intra-cluster security features, such as role-based access control, using Kubernetes APIs and other interfaces.
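For example, project-level access is granted with IAM, while namespace-level permissions inside the cluster are granted with Kubernetes RBAC. The commands below are a sketch; the user, project ID, and namespace are placeholders.

```
# IAM: let a user deploy workloads to clusters in the project (placeholder values).
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:dev@example.com" \
    --role="roles/container.developer"

# RBAC: give the same user read-only access inside a single namespace.
kubectl create rolebinding dev-readonly \
    --clusterrole=view \
    --user=dev@example.com \
    --namespace=dev-team
```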
To learn about the security features and capabilities that are available in GKE, refer to the Security overview and Harden your cluster security. GKE Autopilot clusters implement many of these security features and hardening best practices automatically. For more information, refer to Security capabilities in GKE Autopilot.
Configuring for disaster recovery
To ensure that your production workloads remain available in the case of a service-interrupting event, you should prepare a disaster recovery (DR) plan. To learn more about DR planning, see the Disaster recovery planning guide.
Your Kubernetes configuration and any persistent volumes are not backed up unless you take explicit action. To back up and restore your Kubernetes configuration and persistent volumes on GKE clusters, you can use Backup for GKE.
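As a rough sketch, and assuming the Backup for GKE agent and the gcloud backup-restore command group are available in your environment, a backup plan can be created along these lines; all names, locations, and schedule values below are placeholders:

```
# Enable the Backup for GKE agent on an existing cluster (placeholder names).
gcloud container clusters update my-cluster --zone=us-central1-a \
    --update-addons=BackupRestore=ENABLED

# Create a backup plan that backs up all namespaces, Secrets, and volume data
# daily at 03:00 and retains each backup for 30 days.
gcloud container backup-restore backup-plans create my-backup-plan \
    --location=us-central1 \
    --cluster=projects/PROJECT_ID/locations/us-central1-a/clusters/my-cluster \
    --all-namespaces \
    --include-secrets \
    --include-volume-data \
    --cron-schedule="0 3 * * *" \
    --backup-retain-days=30
```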