Container Engine clusters consist of a single cluster master, which is hosted by Google, and one or more nodes running on Compute Engine instances.
Because the cluster master is managed by Google, the version of Kubernetes that is running on the master is automatically updated to the most recently released version. The cluster nodes, however, are not automatically upgraded when newer versions of the Kubernetes API are released. This allows you to upgrade on your own schedule, coordinating with other maintenance and avoiding impact during business-critical windows.
Even so, upgrade your cluster nodes in a timely manner. Nodes that are more than two minor versions (x.X.x) behind the master version may not work properly. For example, once the cluster master is running version 1.3, nodes running 1.0 may not function correctly, so nodes should be upgraded to 1.1 or 1.2 before 1.3 is released. Minor versions are released approximately every 3 months.
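The two-minor-version rule can be checked mechanically. The following is a minimal sketch, not part of any Container Engine tooling: the function names minor_of and node_within_skew are hypothetical, and the sketch assumes version strings of the form MAJOR.MINOR.PATCH with the same major version on master and nodes.

```shell
# Hypothetical helper: print the minor component of a version such as 1.3.5.
minor_of() {
  local version="$1"
  local rest="${version#*.}"   # drop the leading "MAJOR."
  echo "${rest%%.*}"           # keep everything before the next "."
}

# Succeeds (exit status 0) when the node is at most two minor versions
# behind the master. Assumes both versions share the same major version.
node_within_skew() {
  local master_minor node_minor
  master_minor="$(minor_of "$1")"
  node_minor="$(minor_of "$2")"
  [ $(( master_minor - node_minor )) -le 2 ]
}

# The example from the text: master at 1.3, node at 1.0.
if node_within_skew "1.3.0" "1.0.7"; then
  echo "node version is within the supported skew"
else
  echo "node is more than two minor versions behind: upgrade it"
fi
```

With the values shown, the node is three minor versions behind, so the second branch runs.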
Before beginning an upgrade or downgrade, there are some important details to consider.
An upgrade/downgrade works by deleting all node instances, one at a time, and replacing them with new instances running the desired Kubernetes version.
Before Container Engine deletes an instance, that instance is marked unschedulable and drained. Any pods running on the instance are evicted. If the pods are managed by a controller, the controller attempts to reschedule the pods on the remaining instances. Pods that are not managed by a controller are not restarted. Once the node re-enters the fleet at the new version, it is marked schedulable.
When a node instance begins to drain, all of the pods begin terminating at the same time. If a single node instance contains the entire set of pods managed by a controller, they may all become unavailable simultaneously, causing service downtime. The easiest way to avoid this scenario is to use a pod disruption budget. Another important aspect of draining a node is how each individual pod terminates. To understand and handle pod terminations correctly, see the Kubernetes documentation on pod terminations.
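As a rough sketch of the pod disruption budget approach, the following wraps kubectl create poddisruptionbudget so that at least a given number of pods matching a label selector must stay available during a drain. It assumes a kubectl version that provides this subcommand. The wrapper name create_pdb, the budget name my-app-pdb, and the label app=my-app are placeholders chosen for illustration.

```shell
# Hypothetical wrapper around kubectl; all names passed in are placeholders.
create_pdb() {
  local name="$1" selector="$2" min_available="$3"
  kubectl create poddisruptionbudget "$name" \
    --selector="$selector" \
    --min-available="$min_available"
}

# Example (requires access to a cluster):
# create_pdb my-app-pdb app=my-app 1
```

A budget only staggers evictions if the controller runs more replicas than the minimum. With a single replica and a minimum of 1, the eviction is blocked rather than staggered.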
Container Engine imposes some basic limits on both pod disruption budgets and graceful pod terminations to prevent upgrades from taking an unreasonable amount of time. In both cases, a maximum of one hour is enforced: after an hour of upgrading, a node stops enforcing all pod disruption budgets, and a pod that has taken longer than an hour to terminate gracefully is forcefully shut down.
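One practical consequence is that a pod asking for a grace period longer than one hour cannot count on getting it during an upgrade. The sketch below only illustrates that arithmetic. The helper name effective_grace_seconds is hypothetical, and the 3600-second constant is simply the one-hour limit stated above expressed in seconds.

```shell
# One hour, the upper bound described above, in seconds.
MAX_GRACE_SECONDS=3600

# Hypothetical helper: print the grace period (in seconds) a pod can
# actually rely on during an upgrade, given the period it requested.
effective_grace_seconds() {
  local requested="$1"
  if [ "$requested" -gt "$MAX_GRACE_SECONDS" ]; then
    echo "$MAX_GRACE_SECONDS"
  else
    echo "$requested"
  fi
}

effective_grace_seconds 7200   # prints 3600
effective_grace_seconds 30     # prints 30
```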
Any data in emptyDir volumes in pods is deleted during an upgrade/downgrade. To preserve data across upgrade/downgrade, use a pod with a gcePersistentDisk volume. Note that this is different than the persistent disk created by default as a boot disk with Compute Engine instances; boot disks are deleted along with the instances as part of the upgrade/downgrade. A gcePersistentDisk volume is a disk created separately from your instances.
When a node instance is shut down as part of an upgrade/downgrade, its replacement will be assigned the same instance name.
Downgrades are supported up to two minor versions before the master's version (x.X.x), each at their latest patch version (x.x.X).
You can find the supported master and node versions for upgrades by running the following command:
gcloud container get-server-config [--zone=ZONE]
They are also documented in the Release Notes.
Upgrading stateful applications
Stateful applications can be particularly fragile when dealing with disruptions. They often rely on pod ordering, on maintaining a minimum number of running pods, or both. While StatefulSets themselves provide some additional upgrade specifications, it is highly recommended that you also take advantage of pod disruption budgets and graceful pod termination.
The node upgrade flow
During an upgrade, Container Engine creates a new version of each node with the desired updates. Container Engine does this one node at a time: the next node in the cluster is recreated only when the new version of the previous node has been created and reports healthy status.
For each node in the cluster, Container Engine performs the following steps:
- Container Engine marks the node to be upgraded as unschedulable.
- Container Engine drains the node.
- Container Engine creates a new version of the node with the desired updates.
- When the new version of the node registers with the master, Container Engine marks the new node as schedulable.
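Container Engine performs these steps for you. For intuition only, steps 1, 2, and 4 resemble what you would do by hand with kubectl cordon, kubectl drain, and kubectl uncordon. The sketch below is an analogy under that assumption, not the service's implementation. The function names are hypothetical, and step 3 (recreating the instance) has no equivalent here.

```shell
# Hypothetical analogy of the per-node flow; Container Engine does this itself.
prepare_node_for_replacement() {
  local node="$1"
  # Step 1: mark the node unschedulable. (drain also cordons; this is explicit.)
  kubectl cordon "$node"
  # Step 2: evict pods. --force also removes pods that no controller manages,
  # which are then not restarted; DaemonSet-managed pods are left in place.
  kubectl drain "$node" --ignore-daemonsets --force
}

return_node_to_service() {
  local node="$1"
  # Step 4: mark the node schedulable again.
  kubectl uncordon "$node"
}

# Example (requires access to a cluster; node-1 is a placeholder):
# prepare_node_for_replacement node-1
# return_node_to_service node-1
```

By default, kubectl drain refuses to evict pods that use emptyDir volumes unless told to delete that data. The flag for this differs between kubectl versions, so it is left out of the sketch.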
To change your cluster nodes' version, use the gcloud container clusters upgrade command:
gcloud container clusters upgrade $CLUSTER_NAME --cluster-version=$CLUSTER_VERSION
CLUSTER_NAME is the name of the container cluster to upgrade.
CLUSTER_VERSION has the form
--cluster-version can be used to upgrade or downgrade to a specific release. Omitting this flag upgrades the container cluster to the version on the master. If used, you must specify a version within two minor releases of the master's version, at the highest patch number. Supported versions are listed in the Release Notes.
When you initiate a cluster version change, the instances in the cluster are deleted one at a time. After an instance is deleted, Container Engine adds a new instance to the cluster running the specified Kubernetes version, along with the associated base image, Docker daemon, kubelet, and kube-proxy.
Canceling a node upgrade (ALPHA)
Container Engine now supports canceling node upgrades. If you cancel an upgrade in progress, any nodes that have already started the upgrade process will complete it. Nodes that have not started the upgrade process will not start it. Any nodes that have already completed the upgrade process are not rolled back.
You can cancel an ongoing upgrade by running the following command in your shell or terminal window:
gcloud alpha container operations cancel $OPERATION_ID
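The OPERATION_ID has to come from somewhere. One possible way to find it, sketched under assumptions, is to list the operations that are still running. The function name running_node_upgrade_ids is hypothetical, and the filter assumes that node upgrades are reported with operationType UPGRADE_NODES and status RUNNING. Check the unfiltered output of gcloud container operations list if the filter returns nothing.

```shell
# Hypothetical helper: print the names (IDs) of node upgrade operations
# that are still running. The filter values are assumptions; see the lead-in.
running_node_upgrade_ids() {
  gcloud container operations list \
    --filter="operationType=UPGRADE_NODES AND status=RUNNING" \
    --format="value(name)"
}

# Example (requires gcloud access to the project):
# for OPERATION_ID in $(running_node_upgrade_ids); do
#   gcloud alpha container operations cancel "$OPERATION_ID"
# done
```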
Rolling back a node upgrade (ALPHA)
Container Engine now supports rolling back node upgrades. If you roll back an upgrade, any nodes that have been upgraded are rolled back to the previous configuration. Any nodes that had not been upgraded are left as-is. However, you cannot roll back an upgrade that has already completed successfully; a rollback operation on a successful upgrade is interpreted as a no-op.
You can roll back an upgrade by running the following command in your shell or terminal window:
gcloud alpha container node-pools rollback $POOL_NAME --cluster $CLUSTER_NAME
Opt-in early master upgrades
Container Engine cluster masters are automatically upgraded to the latest supported version in a staged rollout after the version is released. If you would like access to the latest supported version without waiting for the automatic upgrade, you can manually upgrade your masters using the gcloud command-line tool. Once you've upgraded your cluster's master, the nodes can be upgraded to the same version.
To upgrade your cluster master's version, use the gcloud container clusters upgrade command with the --master flag:
gcloud container clusters upgrade $CLUSTER_NAME --master
To specify a different master version than the default, run the gcloud container clusters upgrade command with the --cluster-version flag:
gcloud container clusters upgrade $CLUSTER_NAME --master --cluster-version=$CLUSTER_VERSION
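After a master upgrade it can be useful to confirm which versions the cluster reports. A minimal sketch, assuming the cluster description exposes the currentMasterVersion and currentNodeVersion fields: the function name show_cluster_versions is hypothetical, and depending on your configuration you may also need to pass a zone.

```shell
# Hypothetical helper: print the master and node versions reported for a cluster.
# The two field names are assumptions about the describe output; see the lead-in.
show_cluster_versions() {
  local cluster="$1"
  gcloud container clusters describe "$cluster" \
    --format="value(currentMasterVersion,currentNodeVersion)"
}

# Example (requires gcloud access to the project):
# show_cluster_versions "$CLUSTER_NAME"
```

If the two values differ after the master upgrade, the nodes can then be brought to the master's version with the node upgrade command described earlier.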