This page describes maintenance windows and maintenance exclusions, which provide control over when cluster maintenance such as auto-upgrades can and cannot occur on your Google Kubernetes Engine clusters. For example, a retail business could limit maintenance to only occur on weekday evenings, and could prevent automated maintenance during a key industry sales event.
Maintenance windows and exclusions now give you fine-grained control over when automatic maintenance can occur on your clusters.
A maintenance window is an arbitrary, repeating window of time during which automatic maintenance is permitted.
A maintenance exclusion is an arbitrary non-repeating window of time during which automatic maintenance is forbidden. A cluster can have up to three maintenance exclusions at a time.
You can configure maintenance windows and maintenance exclusions separately and independently. You can configure multiple maintenance exclusions.
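As a conceptual sketch only (this is not how GKE schedules work internally), the two definitions can be modeled in Python: automatic maintenance is eligible at a given instant when that instant falls inside the repeating window and outside every exclusion. The names `DailyWindow`, `Exclusion`, and `maintenance_allowed` are hypothetical, and the window is simplified to a daily recurrence.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass(frozen=True)
class DailyWindow:
    """A window that repeats every day (a simplification of a recurrence rule)."""
    start: time            # time of day the window opens, in UTC
    duration: timedelta    # how long the window stays open

    def contains(self, instant: datetime) -> bool:
        instant = instant.astimezone(timezone.utc)
        # Check today's occurrence and yesterday's, which may run past midnight.
        for days_back in (0, 1):
            day = instant.date() - timedelta(days=days_back)
            opened = datetime.combine(day, self.start, tzinfo=timezone.utc)
            if opened <= instant < opened + self.duration:
                return True
        return False


@dataclass(frozen=True)
class Exclusion:
    """A one-off, non-repeating period during which maintenance is forbidden."""
    start: datetime
    end: datetime

    def contains(self, instant: datetime) -> bool:
        return self.start <= instant < self.end


def maintenance_allowed(instant, window, exclusions) -> bool:
    """True when `instant` is inside the window and outside every exclusion."""
    if any(e.contains(instant) for e in exclusions):
        return False
    return window.contains(instant)
```

With a window that opens at 22:00 UTC for four hours and one exclusion covering a sales event, `maintenance_allowed` returns `True` at 23:00 UTC on an ordinary day and `False` at the same hour during the event.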
Examples of automatic maintenance
Google performs maintenance tasks on your clusters as needed, or when you make a configuration change that re-creates nodes or networks in the cluster. For example:
- Auto-upgrades to cluster control planes (masters) in accordance with GKE's version policy
- Node auto-upgrades, if enabled
- User-initiated configuration changes that cause nodes to be re-created, such as enabling GKE Sandbox
- User-initiated configuration changes that fundamentally change the cluster's internal network topology, such as optimizing IP address allocation
Some of these types of maintenance, such as cluster and node upgrades, can be difficult to predict and plan for. While the control plane of a zonal cluster is being upgraded, you cannot modify the cluster, which includes deploying workloads. Each of the other types of changes listed above can cause temporary disruptions while workloads are moved off each node as it is re-created.
Maintenance windows allow you to control when automatic upgrades of control planes and nodes can occur, to mitigate potential transient disruptions to your workloads. Maintenance windows are useful for the following types of scenarios, among others:
- Off-peak hours: You want to minimize the chance of downtime by scheduling automatic upgrades during off-peak hours when traffic is reduced.
- On-call: You want to ensure that upgrades happen during working hours so that someone can monitor the upgrades and manage any unanticipated issues.
- Multi-cluster upgrades: You want to roll out upgrades across multiple clusters in different regions one at a time at specified intervals.
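The multi-cluster scenario amounts to giving each cluster its own window start, offset by a fixed interval. The following is a minimal sketch with hypothetical cluster names and a hypothetical helper, `staggered_window_starts`. It only computes start times and does not call any GKE API.

```python
from datetime import datetime, timedelta, timezone


def staggered_window_starts(clusters, first_start, interval):
    """Map each cluster name to a window start, `interval` apart, in list order."""
    return {
        name: first_start + index * interval
        for index, name in enumerate(clusters)
    }


# Hypothetical cluster names; the first window opens at 02:00 UTC.
starts = staggered_window_starts(
    ["cluster-us", "cluster-eu", "cluster-asia"],
    first_start=datetime(2024, 1, 8, 2, 0, tzinfo=timezone.utc),
    interval=timedelta(days=1),
)
for name, start in starts.items():
    print(name, start.isoformat())
```

Each computed start would then be configured as the maintenance window of the corresponding cluster, so that a problem seen in the first cluster can be caught before the next one becomes eligible for upgrade.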
In addition to automatic upgrades, Google may occasionally need to perform other maintenance tasks, and honors a cluster's maintenance window if at all possible.
If tasks run beyond the maintenance window, GKE attempts to pause the operation, and attempts to resume it during the next maintenance window.
GKE reserves the right to roll out unplanned, emergency upgrades outside of maintenance windows. Additionally, mandatory upgrades away from deprecated or outdated software might automatically occur outside of maintenance windows.
You can configure a maintenance window for a new or existing cluster.
Caveats for maintenance windows
Maintenance windows and exclusions can cause security patches to be delayed. GKE reserves the right to override maintenance policies for critical security vulnerabilities. Before enabling maintenance windows, make sure you understand the following caveats.
Other Google Cloud maintenance
GKE clusters and workloads can also be impacted by automatic maintenance on other, dependent services, such as Compute Engine. Maintenance windows and exclusions do not affect automatic maintenance on other services.
Automated repairs and resizing
GKE performs automated repairs on control planes. This includes processes like upscaling the control plane to an appropriate size or restarting the control plane to resolve issues. Most repairs ignore maintenance windows and exclusions because failing to perform the repairs can result in non-functional clusters. Repairing control planes cannot be disabled.
Nodes also have auto-repair functionality, but it can be disabled.
Node re-creation and maintenance windows
When you enable or modify features or options such as those that impact networking between the control planes and nodes, the nodes are recreated to apply the new configuration. Some examples of features that cause nodes to be recreated are:
- Shielded nodes
- Network policies
- Intranode visibility
- NodeLocal DNSCache
- Rotating the control plane's IP address
- Rotating the control plane's credentials
If you use maintenance windows and you enable or modify a feature or option that requires nodes to be recreated, the new configuration is applied to the nodes only during a maintenance window. If you prefer not to wait, you can manually recreate all nodes by calling the gcloud container clusters upgrade command and passing the --cluster-version flag with the same GKE version that the node pool is already running. You must use the gcloud command for this workaround.
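The workaround can be sketched as a small Python helper that assembles the command line. The helper name `node_recreate_command`, the cluster name, the pool name, and the version string are all hypothetical placeholders. The optional `--node-pool` flag is not mentioned in the text above and is included only as an assumption, for the case where you want to limit the operation to one pool.

```python
import shlex


def node_recreate_command(cluster, current_version, node_pool=None):
    """Build the argv for re-running an upgrade to the version already in use."""
    argv = ["gcloud", "container", "clusters", "upgrade", cluster,
            "--cluster-version", current_version]
    if node_pool is not None:
        # Assumption: restrict the operation to a single node pool.
        argv += ["--node-pool", node_pool]
    return argv


# Hypothetical cluster name, version string, and pool name.
argv = node_recreate_command("example-cluster", "1.27.3-gke.100", "default-pool")
print(shlex.join(argv))
```

The sketch only prints the command. To execute it, the list could be passed to `subprocess.run(argv, check=True)` on a machine where gcloud is installed and authenticated.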
One maintenance window per cluster
You can only configure a single maintenance window per cluster. Configuring a new maintenance window overwrites the previous one.
Timezones for maintenance windows
When configuring and viewing maintenance windows, times are shown differently depending on the tool you are using:
When configuring maintenance windows
When configuring maintenance windows using the older flag, you cannot specify a timezone. UTC is used when using the gcloud command or the API, and Google Cloud Console displays times using the local timezone.
When using the more granular flags, you can specify the timezone as part of the value. If you omit the timezone, your local timezone is used. Times are always stored in UTC.
When viewing maintenance windows
When viewing information about your cluster, timestamps for maintenance windows may be shown in UTC or in your local timezone, depending on how you are viewing the information:
- When using Google Cloud Console to view information about your cluster, times are always displayed in your local timezone.
- When using gcloud to view information about your cluster, times are always shown in UTC.
In both cases, the RRULE is always in UTC. If the rule specifies days of the week, for example, those days are interpreted in UTC, which can differ from the day in your local timezone.
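The effect on days of the week is easy to miss, so here is a short Python sketch of the conversion. It assumes a fixed UTC-8 offset as the local timezone purely for illustration, and the helper name `rrule_day_in_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Two-letter day codes as used in recurrence rules; index matches weekday().
RRULE_DAYS = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]


def rrule_day_in_utc(local_start):
    """Return the day code to use in a UTC recurrence rule for a local start time."""
    return RRULE_DAYS[local_start.astimezone(timezone.utc).weekday()]


# Assumed local timezone: a fixed UTC-8 offset.
local_tz = timezone(timedelta(hours=-8))

# A window meant to open on Friday, 5 January 2024, at 22:00 local time.
local_start = datetime(2024, 1, 5, 22, 0, tzinfo=local_tz)

print(RRULE_DAYS[local_start.weekday()])  # FR
print(rrule_day_in_utc(local_start))      # SA
```

A window intended for Friday evening local time therefore needs Saturday in the rule, because 22:00 at UTC-8 is 06:00 UTC on the following day.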
With maintenance exclusions, you can prevent automatic maintenance from occurring during a specific time period. For example, many retail businesses have business guidelines prohibiting infrastructure changes during the end-of-year holidays. For known high-impact events, it's recommended that you match any internal change restrictions with a maintenance exclusion starting one week before the event, and lasting the duration of the event.
You can add a maximum of three exclusions. You must allow Google adequate time to maintain your clusters in order to remain in a supported configuration.
Exclusions have no recurrence. Instead, create each instance of a periodic exclusion separately.
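Because each occurrence must be created separately, a periodic event can be expanded into individual exclusions ahead of time. The sketch below applies the one-week lead time recommended above and enforces the limit of three. The helper name `exclusions_for_events`, the event names, and the dates are hypothetical, and the code only computes the periods without calling any GKE API.

```python
from datetime import datetime, timedelta, timezone

MAX_EXCLUSIONS = 3  # limit stated in the text above


def exclusions_for_events(events, lead_time=timedelta(weeks=1)):
    """Return one (name, start, end) exclusion per event, starting `lead_time` early."""
    if len(events) > MAX_EXCLUSIONS:
        raise ValueError(f"at most {MAX_EXCLUSIONS} exclusions are allowed")
    return [(name, start - lead_time, end) for name, start, end in events]


# Hypothetical events: the same yearly sale, listed once per occurrence.
utc = timezone.utc
events = [
    ("sale-2024", datetime(2024, 11, 29, tzinfo=utc), datetime(2024, 12, 3, tzinfo=utc)),
    ("sale-2025", datetime(2025, 11, 28, tzinfo=utc), datetime(2025, 12, 2, tzinfo=utc)),
]
for name, start, end in exclusions_for_events(events):
    print(name, start.isoformat(), end.isoformat())
```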
You can configure a maintenance exclusion for a new or existing cluster.
- Learn more about upgrading a cluster or its nodes
- Configure maintenance windows and exclusions
- Learn how to enable surge upgrades