Sequence the rollout of cluster upgrades


This page shows you how to manage GKE cluster upgrades using rollout sequencing. To learn more, see About cluster upgrades with rollout sequencing.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Required roles

Configure a rollout sequence

This document explains how to create a rollout sequence using groups of clusters organized by fleets or team scopes. You can include both Autopilot and Standard clusters.

You can create a sequence of up to three groups of clusters, and you can choose how much soak testing time you want after cluster upgrades are complete in a group (maximum 30 days).
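The limits above can be checked before you run any commands. The following is a minimal sketch, not part of the gcloud CLI: the function names valid_soak_time and valid_group_count are hypothetical, and the sketch assumes that soak times are written as whole days in the form "7d".

```shell
# Sketch only: these helper names are hypothetical, not gcloud commands.
# Assumption: soak times are written as whole days, such as "7d".

# Succeeds if $1 is a whole number of days from 0d to 30d.
valid_soak_time() {
  soak_days="${1%d}"                      # strip one trailing "d"
  [ "$soak_days" != "$1" ] || return 1    # reject values without a "d" suffix
  case "$soak_days" in
    '' | *[!0-9]*) return 1 ;;            # reject empty or non-numeric values
  esac
  [ "$soak_days" -le 30 ]                 # the maximum soak time is 30 days
}

# Succeeds if $1 is a group count that fits in one sequence (one to three).
valid_group_count() {
  [ "$1" -ge 1 ] && [ "$1" -le 3 ]
}
```

For example, valid_soak_time 14d succeeds, while valid_soak_time 45d and valid_group_count 4 fail.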

To create a rollout sequence, your clusters must be organized into groups of either fleets or team scopes. For guidance on how to organize your clusters, see the community bank example. After they are organized into groups, you can create a rollout sequence by defining the upstream group relationships and each group's soak time. Upstream, in a rollout sequence, refers to the previous group, and downstream refers to the next group.

Organize your clusters into groups

In a rollout sequence, all clusters in all groups must be enrolled in the same release channel and be on the same minor version. If these requirements are not met and there are version discrepancies between clusters, this can cause issues with the version rollout. For more information, see Rollout eligibility.
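As a rough illustration of the minor version requirement, the following sketch compares a list of GKE version strings. The helper names minor_of and same_minor are hypothetical, and the sketch assumes that you have already collected the version of each cluster yourself.

```shell
# Sketch only: hypothetical helpers for comparing GKE version strings.

# Prints the minor version (for example "1.25") of a version string
# such as "1.25.2-gke.400".
minor_of() {
  echo "$1" | cut -d. -f1,2
}

# Succeeds only if every version passed as an argument shares the
# minor version of the first argument.
same_minor() {
  first_minor="$(minor_of "$1")"
  for version in "$@"; do
    [ "$(minor_of "$version")" = "$first_minor" ] || return 1
  done
}
```

For example, same_minor 1.25.2-gke.400 1.25.5-gke.2000 succeeds, while same_minor 1.25.2-gke.400 1.24.9-gke.100 fails. The version strings other than 1.25.2-gke.400 are made up for illustration.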

You can create rollout sequences between fleets, or rollout sequences between a team's team scopes (Preview).

As you saw in About cluster upgrades with rollout sequencing, team scopes are an enterprise fleet-level construct for associating subsets of fleet clusters with specific application teams. You must enable GKE Enterprise to use team scopes. The following limitations apply when using or creating team scopes for rollout sequencing:

  • Team-based sequences require single-tenancy clusters: in other words, each individual cluster is only associated with a single team. Shared clusters (which are supported in general fleet team management) are not supported for rollout sequencing.

  • Each team scope must be in a different fleet to create a rollout sequence between them. Creating a rollout sequence between different team scopes within the same fleet is unsupported.

If you have already organized your clusters into groups, you can skip the following steps and proceed to Create a rollout sequence.

Fleets

To create a fleet-based rollout sequence, first you must group your clusters into fleets. You can organize your clusters by deployment environments such as Testing, Staging, and Production, as shown in the example fleet-based rollout sequence.

Register each cluster with a fleet based on your chosen grouping.

Teams

To create a team-based rollout sequence, you must group your clusters into team scopes. To do so, first you organize your clusters into fleets by deployment environments such as Testing, Staging, and Production, as shown in the example scope-based rollout sequence. Then, you can further subdivide your clusters into scopes for different teams' clusters.

  1. For each cluster in the sequence, register your cluster with a fleet. The cluster should be registered to the fleet in the project where you will create the team scope for this cluster. If you want to register a cluster to a fleet in a different host project, ensure you have set the necessary permissions for cross-project registration.
  2. Create 2-3 team scopes to organize your clusters. Create each scope in the host project of the team's respective fleet. You can have up to three team scopes in a rollout sequence.

    See the reference for gcloud alpha container fleet scopes create for a complete list of flags. With the create command, you can use the flags in the instructions to create a rollout sequence.

  3. Add each cluster to a scope.

Create a rollout sequence

A rollout sequence is organized as a linked list with up to three elements.

When you create a rollout sequence, you set the following properties for each group of clusters, either a fleet or team scope:

  • Upstream group: The upstream fleet or team scope, which qualifies new versions for the downstream group. You don't set an upstream group for the first group in a sequence.
  • Soak time: The soak time for a group is the time between when upgrades complete (or rollout has taken 30 days) and when upgrades can begin on the downstream group. To learn more, see How version qualification works in a rollout sequence.

For each of the following commands, replace SOAK_TIME with the soak time for the group you are updating.

Fleets - gcloud

The following instructions use the gcloud container fleet clusterupgrade update command; however, you can set the same properties with the gcloud container fleet clusterupgrade create command.

Create a rollout sequence:

  1. Set the soak time for the first fleet in the sequence:

    gcloud container fleet clusterupgrade update \
        --default-upgrade-soaking=SOAK_TIME \
        --project=FIRST_FLEET_PROJECT_ID
    

    Replace FIRST_FLEET_PROJECT_ID with the project ID of the first fleet's host project.

  2. Set the upstream fleet and the soak time for the second fleet in the sequence:

    gcloud container fleet clusterupgrade update \
        --upstream-fleet=FIRST_FLEET_PROJECT_ID \
        --default-upgrade-soaking=SOAK_TIME \
        --project=SECOND_FLEET_PROJECT_ID
    

    Replace FIRST_FLEET_PROJECT_ID with the project ID of the first fleet's host project, and SECOND_FLEET_PROJECT_ID with the project ID of the second fleet's host project.

  3. Optional: If you want to have three fleets in a rollout sequence, set the upstream fleet and the soak time for the third fleet in the sequence:

    gcloud container fleet clusterupgrade update \
        --upstream-fleet=SECOND_FLEET_PROJECT_ID \
        --default-upgrade-soaking=SOAK_TIME \
        --project=THIRD_FLEET_PROJECT_ID
    

    Replace SECOND_FLEET_PROJECT_ID with the project ID of the second fleet's host project, and THIRD_FLEET_PROJECT_ID with the project ID of the third fleet's host project.
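The three steps above follow one pattern: each fleet after the first names the previous fleet as its upstream. The following sketch wraps that pattern in a loop. The function name sequence_fleets is hypothetical, the flags are the ones shown in the preceding steps, and the sketch assumes that every fleet uses the same soak time.

```shell
# Sketch only: sequence_fleets is a hypothetical wrapper, not a gcloud command.
# Assumption: every fleet in the sequence uses the same soak time.
# Usage: sequence_fleets SOAK_TIME FIRST_PROJECT [SECOND_PROJECT [THIRD_PROJECT]]
sequence_fleets() {
  # Expect a soak time plus one to three fleet host project IDs.
  if [ "$#" -lt 2 ] || [ "$#" -gt 4 ]; then
    echo "usage: sequence_fleets SOAK_TIME PROJECT [PROJECT [PROJECT]]" >&2
    return 1
  fi
  soak_time="$1"
  shift
  upstream=""
  for project in "$@"; do
    if [ -z "$upstream" ]; then
      # First fleet: set only the soak time, with no upstream fleet.
      gcloud container fleet clusterupgrade update \
          --default-upgrade-soaking="$soak_time" \
          --project="$project" || return 1
    else
      # Later fleets: point at the previous fleet as the upstream.
      gcloud container fleet clusterupgrade update \
          --upstream-fleet="$upstream" \
          --default-upgrade-soaking="$soak_time" \
          --project="$project" || return 1
    fi
    upstream="$project"
  done
}
```

For example, sequence_fleets 7d FIRST_FLEET_PROJECT_ID SECOND_FLEET_PROJECT_ID THIRD_FLEET_PROJECT_ID runs the same three commands as the numbered steps. If your groups need different soak times, run the commands individually instead.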

Fleets - Terraform

This section shows you how to create a fleet-based sequence using Terraform. You can also use this resource to update the sequence. To learn more, see the reference documentation for google_gke_hub_feature.

Create a rollout sequence:

  1. Add the following block to your Terraform configuration to set the soak time for the first fleet in the sequence:

    resource "google_gke_hub_feature" "feature" {
      name = "clusterupgrade"
      location = "global"
      spec {
        clusterupgrade {
          upstream_fleets = []
          post_conditions {
            soaking = "SOAK_TIME"
          }
        }
      }
      project = "FIRST_FLEET_PROJECT_ID"
    }
    

    Replace FIRST_FLEET_PROJECT_ID with the project ID of the first fleet's host project.

  2. Add the following block to your Terraform configuration to set the upstream fleet and the soak time for the second fleet in the sequence:

    resource "google_gke_hub_feature" "feature" {
      name = "clusterupgrade"
      location = "global"
      spec {
        clusterupgrade {
          upstream_fleets = ["FIRST_FLEET_PROJECT_ID"]
          post_conditions {
            soaking = "SOAK_TIME"
          }
        }
      }
      project = "SECOND_FLEET_PROJECT_ID"
    }
    

    Replace FIRST_FLEET_PROJECT_ID with the project ID of the first fleet's host project, and SECOND_FLEET_PROJECT_ID with the project ID of the second fleet's host project.

  3. Optional: If you want to have three fleets in a rollout sequence, add the following block to your Terraform configuration to set the upstream fleet and the soak time for the third fleet in the sequence:

    resource "google_gke_hub_feature" "feature" {
      name = "clusterupgrade"
      location = "global"
      spec {
        clusterupgrade {
          upstream_fleets = ["SECOND_FLEET_PROJECT_ID"]
          post_conditions {
            soaking = "SOAK_TIME"
          }
        }
      }
      project = "THIRD_FLEET_PROJECT_ID"
    }
    

    Replace SECOND_FLEET_PROJECT_ID with the project ID of the second fleet's host project, and THIRD_FLEET_PROJECT_ID with the project ID of the third fleet's host project.

Teams - gcloud

You can set these properties when you create or update a team scope. The following instructions use the gcloud alpha container fleet scopes update command; however, you can set the same properties when you create a team scope with the gcloud alpha container fleet scopes create command.

For each of these commands, replace the variables with the respective team scope's name or the team scope's fleet host project ID.

Create a rollout sequence:

  1. Set the soak time for the first scope in the sequence:

    gcloud alpha container fleet scopes update projects/FIRST_SCOPE_PROJECT_ID/locations/global/scopes/FIRST_SCOPE_NAME \
        --default-upgrade-soaking=SOAK_TIME \
        --project=FIRST_SCOPE_PROJECT_ID
    
  2. Set the upstream scope and the soak time for the second scope in the sequence:

    gcloud alpha container fleet scopes update projects/SECOND_SCOPE_PROJECT_ID/locations/global/scopes/SECOND_SCOPE_NAME \
        --upstream-scope=projects/FIRST_SCOPE_PROJECT_ID/locations/global/scopes/FIRST_SCOPE_NAME \
        --default-upgrade-soaking=SOAK_TIME \
        --project=SECOND_SCOPE_PROJECT_ID
    
  3. Optional: If you want to have three team scopes in a rollout sequence, set the upstream scope and the soak time for the third scope in the sequence:

    gcloud alpha container fleet scopes update projects/THIRD_SCOPE_PROJECT_ID/locations/global/scopes/THIRD_SCOPE_NAME \
        --upstream-scope=projects/SECOND_SCOPE_PROJECT_ID/locations/global/scopes/SECOND_SCOPE_NAME \
        --default-upgrade-soaking=SOAK_TIME \
        --project=THIRD_SCOPE_PROJECT_ID
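Each of the commands above repeats the full scope resource name, which is easy to mistype. The following sketch builds the name in one place. The function names scope_path and link_scope are hypothetical, and the flags are the ones shown in the preceding steps.

```shell
# Sketch only: scope_path and link_scope are hypothetical helpers.

# Prints the full resource name of a team scope.
# Usage: scope_path PROJECT_ID SCOPE_NAME
scope_path() {
  echo "projects/$1/locations/global/scopes/$2"
}

# Sets the soak time for a scope and, when an upstream scope is given,
# links the scope to that upstream scope.
# Usage: link_scope SOAK_TIME PROJECT_ID SCOPE_NAME [UPSTREAM_PROJECT_ID UPSTREAM_SCOPE_NAME]
link_scope() {
  if [ "$#" -eq 5 ]; then
    gcloud alpha container fleet scopes update "$(scope_path "$2" "$3")" \
        --upstream-scope="$(scope_path "$4" "$5")" \
        --default-upgrade-soaking="$1" \
        --project="$2"
  else
    # First scope in the sequence: no upstream scope.
    gcloud alpha container fleet scopes update "$(scope_path "$2" "$3")" \
        --default-upgrade-soaking="$1" \
        --project="$2"
  fi
}
```

For example, link_scope SOAK_TIME SECOND_SCOPE_PROJECT_ID SECOND_SCOPE_NAME FIRST_SCOPE_PROJECT_ID FIRST_SCOPE_NAME is equivalent to step 2.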
    

Check status of a rollout sequence

Use the commands in the following sections to check how upgrades are progressing in a rollout sequence. To learn more about what details are provided, see Status information for a rollout sequence.

To run these commands, ensure that you have the required permissions for each fleet host project. For example, if the sequence has cross-project scopes in different fleets, you need permissions in each project to describe the sequence.

For the following commands, if you only need information about one fleet or scope in the sequence, replace the --show-linked-cluster-upgrade flag with --show-cluster-upgrade.

Fleets

Check the status of a fleet-based rollout sequence:

gcloud container fleet clusterupgrade describe \
    --show-linked-cluster-upgrade --project=FLEET_PROJECT_ID

Replace FLEET_PROJECT_ID with the project ID of the host project for any fleet in the sequence.

See the reference for gcloud container fleet clusterupgrade describe for a complete list of flags.

Teams

Check the status of a team-based rollout sequence:

gcloud alpha container fleet scopes describe SCOPE_NAME \
    --show-linked-cluster-upgrade \
    --project=SCOPE_PROJECT_ID

Replace SCOPE_NAME with the name of any team scope in the rollout sequence and SCOPE_PROJECT_ID with the project ID of this team scope.

See the reference for gcloud alpha container fleet scopes describe for a complete list of flags.

To see the status of individual clusters within a fleet or team scope, run the following command in the fleet host project and see the membershipStates section:

gcloud container fleet features describe clusterupgrade
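If a fleet-based sequence spans several host projects, you might want the group status and the per-cluster status for each fleet in one pass. The following sketch loops over the project IDs that you pass in. The function name fleet_status_report is hypothetical, and the commands and flags are the ones shown in this section.

```shell
# Sketch only: fleet_status_report is a hypothetical wrapper.
# Usage: fleet_status_report FLEET_PROJECT_ID [FLEET_PROJECT_ID ...]
fleet_status_report() {
  for project in "$@"; do
    echo "== Fleet host project: $project =="
    # Group-level status for this fleet only.
    gcloud container fleet clusterupgrade describe \
        --show-cluster-upgrade --project="$project" || return 1
    # Per-cluster status: look for the membershipStates section in the output.
    gcloud container fleet features describe clusterupgrade \
        --project="$project" || return 1
  done
}
```

You need the required permissions in every project that you pass to the function.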

Status information for a rollout sequence

When you check the status of a version rollout, you can see the progress of each group and cluster within that group.

See the following table for the potential statuses of a cluster or group:

Status | For cluster | For group
INELIGIBLE | This cluster is ineligible for this upgrade. | One or more clusters in this group are ineligible for this upgrade.
PENDING | The upgrade hasn't started or the upgrade is in progress for the cluster. | The upgrade hasn't started on any of the clusters in the group.
IN_PROGRESS | N/A | The upgrade has started on at least one cluster but hasn't finished on all clusters.
SOAKING | The upgrade has finished on the cluster and hasn't finished soaking. | The upgrade has finished on all clusters and hasn't finished soaking.
FORCED_SOAKING | The upgrade took more than the maximum upgrade time (30 days) and therefore we forced it to enter the soaking phase. The upgrade can still continue in the cluster. | The upgrade took more than the maximum upgrade time (30 days) and therefore we forced it to enter the soaking phase. The upgrade can still continue in the clusters.
COMPLETE | The upgrade is treated as "done", meaning that the upgrade has finished soaking on this cluster. | The upgrade is treated as "done" and ready to be consumed by the downstream group, meaning that the upgrade has finished soaking.
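If you script around these statuses, a small lookup can turn a group status into a next step. The following sketch is one possible reading of the statuses listed above. The function names ready_for_downstream and next_step are hypothetical, and the sketch assumes that you have already extracted the status string from the command output yourself.

```shell
# Sketch only: hypothetical helpers that interpret a group status string.

# Succeeds when the status means the downstream group can begin upgrades.
ready_for_downstream() {
  [ "$1" = "COMPLETE" ]
}

# Prints a suggested next step for a group status.
next_step() {
  case "$1" in
    INELIGIBLE) echo "check individual clusters for version discrepancies" ;;
    PENDING | IN_PROGRESS) echo "wait for cluster upgrades to finish" ;;
    SOAKING | FORCED_SOAKING) echo "wait for the soak time to end, or override it" ;;
    COMPLETE) echo "the downstream group can begin upgrades" ;;
    *) echo "unknown status: $1" >&2; return 1 ;;
  esac
}
```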

In the output of these commands, the clusterUpgrade(s).spec and clusterUpgrade(s).state attributes contain additional information about the cluster upgrade such as soaking time, cluster upgrade overrides, and upgrade status.

Manage a rollout sequence

You can control automatic cluster upgrades with rollout sequencing in several ways, explained in the following sections.

Change the soak time for a group

You can change the default soak time for a group or change the soak time for when that group upgrades to a specific version.

Update the default soak time

To change the default soak time for a group, use the commands from the instructions to Create a rollout sequence, omitting the flags to set the upstream group.

Override the default soak time

You can change the soak time for a specific version rollout to be different than the default soak time for the group. For example, if you have already qualified a new version and are ready for upgrades to begin in the next group, you can set the soak time to zero. You can also use it if you want more time than the default soaking time to qualify a specific version.

As the soak time is set on a per-group basis, if you want to override the soak time for other groups in this sequence, update them using this same command with the fleet or scope name replaced, depending on the type of sequence.

For the instructions in this section, replace the following variables:

  • SOAK_TIME: the soak time to use other than the default (for example, "0d" if you want to skip the soak time for one version rollout).
  • UPGRADE_NAME: the name of the upgrade, which can be k8s_control_plane or k8s_node.
  • VERSION: the GKE version where you want the soak time after the rollout to this group, for example, 1.25.2-gke.400.

Fleets - gcloud

Run this command in the host project of the fleet where you want to override the soak time used for the version rollout of a specific version.

Change the soak time of a fleet:

gcloud container fleet clusterupgrade update \
    --add-upgrade-soaking-override=SOAK_TIME \
    --upgrade-selector=name=UPGRADE_NAME,version=VERSION
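
Because the override is set per upgrade name, applying it to both control plane and node upgrades takes two commands. The following sketch loops over both names. The function name override_fleet_soak is hypothetical, and passing --project instead of running in the host project is a choice made for this sketch.

```shell
# Sketch only: override_fleet_soak is a hypothetical wrapper.
# Applies the same soak time override to control plane and node upgrades
# for one GKE version in one fleet.
# Usage: override_fleet_soak SOAK_TIME VERSION FLEET_PROJECT_ID
override_fleet_soak() {
  for upgrade_name in k8s_control_plane k8s_node; do
    gcloud container fleet clusterupgrade update \
        --add-upgrade-soaking-override="$1" \
        --upgrade-selector=name="$upgrade_name",version="$2" \
        --project="$3" || return 1
  done
}
```

For example, override_fleet_soak 0d 1.25.2-gke.400 FLEET_PROJECT_ID skips the soak time for that version in one fleet.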

Fleets - Terraform

Add the following gke_upgrade_overrides block to your Terraform configuration within the clusterupgrade block to override the soak time used for the version rollout of a specific version:

gke_upgrade_overrides {
  upgrade {
    name = "UPGRADE_NAME"
    version = "VERSION"
  }
  post_conditions {
    soaking = "SOAK_TIME"
  }
}

Teams - gcloud

Run this command in the host project of the team scope's fleet. Replace SCOPE_NAME with the name of the team scope for which you want to override the soak time used for the version rollout of a specific version.

Change the soak time of a team scope:

gcloud alpha container fleet scopes update SCOPE_NAME \
    --add-upgrade-soaking-override=SOAK_TIME \
    --upgrade-selector=name=UPGRADE_NAME,version=VERSION

Change order of a sequence

If you want to change the order of a sequence, use the commands from the instructions to Create a rollout sequence to update the upstream groups.

Delay the completion of a group's version rollout

If you need to temporarily prevent a group from completing the rollout of a new version to its clusters, you can add a maintenance exclusion to any of the clusters that have not been upgraded to the target version. This can pause a group from proceeding to its soak time or downstream group for up to 30 days. After 30 days, the group will begin soaking.

You can also change the soak time for that group to 30 days to maximize how long the rollout sequence waits before proceeding to the next group.

If you need to further delay upgrades beginning for the next group, you can use maintenance exclusions for the clusters in the next group.
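As a rough sketch of adding such an exclusion from the command line, the following wrapper assumes the --add-maintenance-exclusion-name, --add-maintenance-exclusion-start, and --add-maintenance-exclusion-end flags of gcloud container clusters update. Check the command reference for your gcloud CLI version before relying on it. The function name pause_cluster_upgrades and the exclusion name pause-rollout-sequence are hypothetical.

```shell
# Sketch only: pause_cluster_upgrades is a hypothetical wrapper.
# Assumption: the maintenance exclusion flags below are available in your
# version of "gcloud container clusters update".
# START and END are timestamps such as 2025-01-01T00:00:00Z.
# Usage: pause_cluster_upgrades CLUSTER_NAME LOCATION PROJECT_ID START END
pause_cluster_upgrades() {
  gcloud container clusters update "$1" \
      --location="$2" \
      --project="$3" \
      --add-maintenance-exclusion-name=pause-rollout-sequence \
      --add-maintenance-exclusion-start="$4" \
      --add-maintenance-exclusion-end="$5"
}
```

Keep the window between START and END within the 30-day limit described above.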

Switch between fleet-based and team-based rollout sequences

You can switch from either fleet-based sequences to team-based sequences, or team-based sequences to fleet-based sequences. The instructions assume that you are transferring between sequences organized like those illustrated in the example diagrams.

Fleets to teams

To change your clusters from a fleet-based rollout sequence to a team-based rollout sequence, do the following steps:

  1. Configure maintenance exclusions for all clusters in each of your fleets to prevent any upgrades while you are modifying your configuration.
  2. Ensure that you have enabled GKE Enterprise in your fleet host projects.
  3. In each of your fleets, create one or more team scopes for subdividing the group of clusters in that fleet.
  4. Create one or more rollout sequences between the matching team scopes in each fleet.
  5. Add your clusters to their new team scopes.
  6. Remove the maintenance exclusions that you configured for this change.

Teams to fleets

To change your clusters from a team-based rollout sequence to a fleet-based rollout sequence, do the following steps:

  1. Configure maintenance exclusions for all clusters in each of your fleets to prevent any upgrades while you are modifying your configuration.
  2. Create a rollout sequence between your fleets.
  3. Remove your clusters from their team scopes. Now these clusters are only registered to their scope's respective fleets that, in the previous step, you joined in a rollout sequence.
  4. Delete the team scopes.
  5. Remove the maintenance exclusions that you configured for this change.

Delete a sequence

To delete a sequence, you remove the upstream associations for the second and third groups (if the rollout sequence has three groups).

For fleets

Run the following command in the fleet host project of the second and third fleets in the rollout sequence:

gcloud container fleet clusterupgrade update --reset-upstream-fleet
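
To run the reset in both projects without switching your active project, you can loop over the project IDs. The following is a minimal sketch. The function name delete_fleet_sequence is hypothetical, and passing --project is a choice made for this sketch.

```shell
# Sketch only: delete_fleet_sequence is a hypothetical wrapper.
# Removes the upstream association from each fleet host project passed in.
# Usage: delete_fleet_sequence SECOND_FLEET_PROJECT_ID [THIRD_FLEET_PROJECT_ID]
delete_fleet_sequence() {
  for project in "$@"; do
    gcloud container fleet clusterupgrade update \
        --reset-upstream-fleet --project="$project" || return 1
  done
}
```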

For teams

Run the following command in the fleet host project of the second and third team scopes in the rollout sequence:

gcloud alpha container fleet scopes update SCOPE_NAME --reset-upstream-scope

Replace SCOPE_NAME with the names of the second and third scopes, respectively.

Troubleshooting

Troubleshoot rollout eligibility

If all clusters in a rollout sequence don't have the same upgrade target, GKE might not be able to proceed with cluster upgrades. Automatic upgrades cannot proceed if an upstream group does not qualify one upgrade target to pass to the downstream group. Automatic upgrades also cannot proceed if clusters in the upstream group qualify an invalid upgrade target for clusters in the downstream group.

To check if your rollout sequence has any rollout eligibility issues, check the status of the rollout sequence. If a group is ineligible, follow the instructions to see the status of individual clusters in a group.

To immediately advance cluster upgrades, remove any clusters with an INELIGIBLE status by following the instructions to Advance partially eligible rollouts.

Fix eligibility in a group

In a group, if a cluster is ineligible because it is on an earlier version (for example, most of the clusters in the group are being upgraded from 1.23 to 1.24 and a cluster is on version 1.22), you can manually upgrade the cluster to 1.24 to resolve the version discrepancy.

In a group, if a cluster is ineligible because it is on a later version (for example, most of the clusters in the group are being upgraded from 1.23 to 1.24 and a cluster is on version 1.25), you cannot manually downgrade the cluster to solve the version discrepancy and need to remove the cluster.
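The two cases above reduce to a comparison of minor versions. The following sketch prints the suggested fix for one cluster. The function names minor_number and fix_for_cluster are hypothetical, and the sketch assumes that all versions share the same major version.

```shell
# Sketch only: hypothetical helpers that compare GKE minor versions.
# Assumption: all versions share the same major version.

# Prints the minor version number, for example 24 for "1.24" or "1.24.9-gke.100".
minor_number() {
  echo "$1" | cut -d. -f2
}

# Prints the suggested fix for one cluster in a group.
# Usage: fix_for_cluster CLUSTER_VERSION GROUP_CURRENT_VERSION GROUP_TARGET_VERSION
fix_for_cluster() {
  cluster_minor="$(minor_number "$1")"
  current_minor="$(minor_number "$2")"
  target_minor="$(minor_number "$3")"
  if [ "$cluster_minor" -lt "$current_minor" ]; then
    echo "upgrade"     # earlier version: manually upgrade to the target
  elif [ "$cluster_minor" -gt "$target_minor" ]; then
    echo "remove"      # later version: downgrade is not possible
  else
    echo "eligible"    # matches the group's current or target version
  fi
}
```

With the versions from the examples above, fix_for_cluster 1.22 1.23 1.24 prints upgrade and fix_for_cluster 1.25 1.23 1.24 prints remove.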

Fix eligibility between groups

Between groups, if there is a mismatch in upgrade targets where the downstream group is on a newer version (for example, the upstream group upgraded from 1.23 to 1.24 and the clusters in the downstream group are on 1.25), you can manually upgrade the clusters in the upstream group to 1.25 to ensure that upgrades proceed.

Between groups, if there is a mismatch in upgrade targets where the downstream group is on an earlier version (for example, the upstream group upgraded from 1.24 to 1.25 and the clusters in the downstream group are on 1.23), you can manually upgrade the clusters in the downstream group to 1.24 or 1.25 to ensure that upgrades proceed.

Advance partially eligible rollouts

If cluster upgrades in a group will not finish because of issues with rollout eligibility (for example, version discrepancies within a group), you can remove clusters that are ineligible for the group's upgrade target from a group to complete the version rollout and begin the soak time or move on to the next group in the rollout sequence. You can also remove a cluster from a group for other reasons, for example if this cluster's usage is no longer related to the other clusters in the group.

Follow the instructions to unregister a cluster from a fleet or remove clusters from team scopes, depending on the type of rollout sequence.

After you have removed all clusters that are preventing a group's version rollout from being completed, the group's version rollout will complete. Confirm this by following the instructions to Check status of a rollout sequence.

What's next