Sequence the rollout of cluster upgrades


This page shows you how to manage GKE cluster upgrades using rollout sequencing. To learn more, see About cluster upgrades with rollout sequencing.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Required roles

Configure a rollout sequence

This document explains how to create a rollout sequence using groups of clusters organized by fleets or team scopes. This document uses the term group to refer to both fleets and team scopes, because you can create a rollout sequence organized with either grouping method.

You can create a sequence of up to three groups of clusters, and you can choose how much soak testing time you want after cluster upgrades are complete in a group (maximum 30 days). You can include both Autopilot and Standard clusters.

To create a rollout sequence, your clusters must be organized into groups of either fleets or team scopes. For guidance on how to organize your clusters, see the community bank example. After they are organized into groups, you can create a rollout sequence by defining the upstream group relationships and each group's soak time. Upstream, in a rollout sequence, refers to the previous group, and downstream refers to the next group.
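The limits described in this section (up to three groups, and a soak time of at most 30 days per group) can be checked before you run any commands. The following Python sketch is illustrative only: the Group class and the validate_sequence function are hypothetical helpers, not part of any Google Cloud library, and GKE applies its own validation when you configure a sequence.

```python
from dataclasses import dataclass
from datetime import timedelta

MAX_GROUPS = 3                 # this page: "up to three groups"
MAX_SOAK = timedelta(days=30)  # this page: "maximum 30 days"


@dataclass(frozen=True)
class Group:
    """Hypothetical record for one fleet or team scope in a sequence."""
    name: str
    soak: timedelta


def validate_sequence(groups):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not 1 <= len(groups) <= MAX_GROUPS:
        problems.append(f"expected 1 to {MAX_GROUPS} groups, got {len(groups)}")
    if len({g.name for g in groups}) != len(groups):
        problems.append("group names must be unique")
    for g in groups:
        if not timedelta(0) <= g.soak <= MAX_SOAK:
            problems.append(f"{g.name}: soak time must be between 0 and 30 days")
    return problems
```

For example, a list of three groups with soak times of 7, 14, and 0 days passes these checks, while a fourth group or a 31-day soak time is reported as a problem.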

Organize your clusters into groups

In a rollout sequence, all clusters in all groups must be enrolled in the same release channel and be on the same minor version. If these requirements are not met and there are version discrepancies between clusters, this can cause issues with the version rollout. For more information, see Rollout eligibility.

You can create rollout sequences between fleets, or rollout sequences between a team's team scopes (Preview).

As you saw in About cluster upgrades with rollout sequencing, team scopes are an enterprise fleet-level construct for associating subsets of fleet clusters with specific application teams. You must enable GKE Enterprise to use team scopes. The following limitations apply when using or creating team scopes for rollout sequencing:

  • Team-based sequences require single-tenancy clusters: in other words, each individual cluster is only associated with a single team. Shared clusters (which are supported in general fleet team management) are not supported for rollout sequencing.

  • Each team scope must be in a different fleet to create a rollout sequence between them. Creating a rollout sequence between different team scopes within the same fleet is unsupported.

If you have already organized your clusters into groups, you can skip the following steps and proceed to Create a rollout sequence.

Fleets

To create a fleet-based rollout sequence, first you must group your clusters into fleets. You can organize your clusters by deployment environments such as Testing, Staging, and Production, as shown in the example fleet-based rollout sequence.

Register each cluster with a fleet based on your chosen grouping.

Teams

To create a team-based rollout sequence, you must group your clusters into team scopes. To do so, first you organize your clusters into fleets by deployment environments such as Testing, Staging, and Production, as shown in the example scope-based rollout sequence. Then, you can further subdivide your clusters into scopes for different teams' clusters.

  1. For each cluster in the sequence, register your cluster with a fleet. The cluster should be registered to the fleet in the project where you will create the team scope for this cluster. If you want to register a cluster to a fleet in a different host project, ensure you have set the necessary permissions for cross-project registration.
  2. Create 2-3 team scopes to organize your clusters. Create each scope in the host project of the team's respective fleet. You can have up to three team scopes in a rollout sequence.

    See the reference for gcloud container fleet scopes create for a complete list of flags. With the create command, you can use the flags in the instructions to create a rollout sequence.

  3. Add each cluster to a scope.

Create a rollout sequence

A rollout sequence is organized as a linked list with up to three elements.

When you create a rollout sequence, you set the following properties for each group of clusters, either a fleet or team scope:

  • Upstream group: The upstream fleet or team scope, which qualifies new versions for the downstream group. You don't set an upstream group for the first group in a sequence.
  • Soak time: The soak time for a group is the time between when upgrades complete (or the rollout has taken 30 days) and when upgrades can begin on the downstream group. To learn more, see How version qualification works in a rollout sequence.
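Because each group stores only its upstream group, the order of the sequence is implied by those links. The following Python sketch shows one way to recover the order from the upstream relationships and to reject links that don't form a single chain. The order_sequence function and the group names are hypothetical, and the checks reflect this page's description of a linked list with up to three elements rather than GKE's actual validation logic.

```python
def order_sequence(upstream_of):
    """Order groups from first to last.

    upstream_of maps each group name to its upstream group name, or to
    None for the first group. Raises ValueError if the links do not form
    a single chain of at most three groups.
    """
    heads = [g for g, up in upstream_of.items() if up is None]
    if len(heads) != 1:
        raise ValueError("exactly one group must have no upstream group")

    # Invert the links: upstream group -> its single downstream group.
    downstream_of = {}
    for group, up in upstream_of.items():
        if up is None:
            continue
        if up not in upstream_of:
            raise ValueError(f"{group}: unknown upstream group {up}")
        if up in downstream_of:
            raise ValueError(f"{up}: more than one downstream group")
        downstream_of[up] = group

    # Walk the chain from the first group to the last.
    order = [heads[0]]
    while order[-1] in downstream_of:
        order.append(downstream_of[order[-1]])

    if len(order) != len(upstream_of):
        raise ValueError("groups do not form a single chain")
    if len(order) > 3:
        raise ValueError("a sequence can have at most three groups")
    return order
```

For example, order_sequence({"production": "staging", "staging": "testing", "testing": None}) returns ["testing", "staging", "production"].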

Fleets - gcloud

The following instructions use the gcloud container fleet clusterupgrade update command; however, you can set the same properties with the gcloud container fleet clusterupgrade create command.

For each of the following commands, replace SOAK_TIME with the soak time for the fleet you are updating.

Create a rollout sequence:

  1. Set the soak time for the first fleet in the sequence:

    gcloud container fleet clusterupgrade update \
        --default-upgrade-soaking=SOAK_TIME \
        --project=FIRST_FLEET_PROJECT_ID
    

    Replace FIRST_FLEET_PROJECT_ID with the project ID of the fleet host project.

  2. Set the upstream fleet and the soak time for the second fleet in the sequence:

    gcloud container fleet clusterupgrade update \
        --upstream-fleet=FIRST_FLEET_PROJECT_ID \
        --default-upgrade-soaking=SOAK_TIME \
        --project=SECOND_FLEET_PROJECT_ID
    

    Replace FIRST_FLEET_PROJECT_ID with the project ID of the first fleet's host project, and SECOND_FLEET_PROJECT_ID with the project ID of the second fleet's host project.

  3. Optional: If you want to have three fleets in a rollout sequence, set the upstream fleet and the soak time for the third fleet in the sequence:

    gcloud container fleet clusterupgrade update \
        --upstream-fleet=SECOND_FLEET_PROJECT_ID \
        --default-upgrade-soaking=SOAK_TIME \
        --project=THIRD_FLEET_PROJECT_ID
    

    Replace SECOND_FLEET_PROJECT_ID with the project ID of the second fleet's host project, and THIRD_FLEET_PROJECT_ID with the project ID of the third fleet's host project.
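
The three commands in the preceding steps follow one pattern: each fleet after the first names the previous fleet's host project as its upstream fleet. The following Python sketch prints the commands for an ordered list of fleet host projects so that you can review them before running them. It only builds strings and doesn't call gcloud. The function name, the project IDs, and the soak time values (written in the same style as the "0d" example on this page) are placeholders.

```python
import shlex


def fleet_sequence_commands(project_ids, soak_times):
    """Build one update command per fleet, ordered first to last."""
    if not 1 <= len(project_ids) <= 3:
        raise ValueError("a sequence has one to three fleets")
    if len(project_ids) != len(soak_times):
        raise ValueError("provide one soak time per fleet")

    commands = []
    upstream = None  # the first fleet has no upstream fleet
    for project, soak in zip(project_ids, soak_times):
        argv = ["gcloud", "container", "fleet", "clusterupgrade", "update"]
        if upstream is not None:
            argv.append(f"--upstream-fleet={upstream}")
        argv.append(f"--default-upgrade-soaking={soak}")
        argv.append(f"--project={project}")
        commands.append(shlex.join(argv))
        upstream = project
    return commands


if __name__ == "__main__":
    # Hypothetical project IDs and soak times.
    for command in fleet_sequence_commands(
        ["test-proj", "staging-proj", "prod-proj"], ["7d", "7d", "0d"]
    ):
        print(command)
```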

Fleets - console

  1. Go to the Rollout Sequencing page in the Google Cloud console.

    Go to Rollout Sequencing

  2. Click Create rollout sequence.

  3. In the Create a rollout sequence pane, select the first two fleets in the sequence:

    1. In the Fleet 1 section, select the first fleet in the sequence.
    2. In the Soak time for upstream fleet section, set the soak time for the first fleet using the Days, Hours, and Minutes fields.
    3. In the Fleet 2 section, select the second fleet in the sequence.
    4. Click Create.
  4. Optional: If you want to have three fleets in this rollout sequence, do the following additional steps:

    1. In the Rollout graph, click the element for the second fleet.
    2. Click Add downstream fleet.
    3. In the Soak time for upstream fleet section, set the soak time for the second fleet using the Days, Hours, and Minutes fields.
    4. In the Next fleet in the sequence section, select the third fleet in the sequence.
    5. Click Save.

Fleets - Terraform

This section shows you how to create a fleet-based sequence using Terraform. You can also use this resource to update the sequence. To learn more, see the reference documentation for google_gke_hub_feature.

For each of the following blocks, replace SOAK_TIME with the soak time for the fleet you are updating.

Create a rollout sequence:

  1. Add the following block to your Terraform configuration to set the soak time for the first fleet in the sequence:

    resource "google_gke_hub_feature" "feature" {
      name = "clusterupgrade"
      location = "global"
      spec {
        clusterupgrade {
          upstream_fleets = []
          post_conditions {
            soaking = "SOAK_TIME"
          }
        }
      }
      project = "FIRST_FLEET_PROJECT_ID"
    }
    

    Replace FIRST_FLEET_PROJECT_ID with the project ID of the fleet host project.

  2. Add the following block to your Terraform configuration to set the upstream fleet and the soak time for the second fleet in the sequence:

    resource "google_gke_hub_feature" "feature" {
      name = "clusterupgrade"
      location = "global"
      spec {
        clusterupgrade {
          upstream_fleets = ["FIRST_FLEET_PROJECT_ID"]
          post_conditions {
            soaking = "SOAK_TIME"
          }
        }
      }
      project = "SECOND_FLEET_PROJECT_ID"
    }
    

    Replace FIRST_FLEET_PROJECT_ID with the project ID of the first fleet's host project, and SECOND_FLEET_PROJECT_ID with the project ID of the second fleet's host project.

  3. Optional: If you want to have three fleets in a rollout sequence, add the following block to your Terraform configuration to set the upstream fleet and the soak time for the third fleet in the sequence:

    resource "google_gke_hub_feature" "feature" {
      name = "clusterupgrade"
      location = "global"
      spec {
        clusterupgrade {
          upstream_fleets = ["SECOND_FLEET_PROJECT_ID"]
          post_conditions {
            soaking = "SOAK_TIME"
          }
        }
      }
      project = "THIRD_FLEET_PROJECT_ID"
    }
    

    Replace SECOND_FLEET_PROJECT_ID with the project ID of the second fleet's host project, and THIRD_FLEET_PROJECT_ID with the project ID of the third fleet's host project.
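
If you plan soak times in days and hours, a small helper can produce the value for the soaking field. This Python sketch assumes that the field accepts a duration written in seconds with an s suffix (for example, "604800s"). That format is an assumption that this page doesn't confirm, so check the google_gke_hub_feature reference before relying on it. The soak_seconds function is hypothetical.

```python
from datetime import timedelta

MAX_SOAK = timedelta(days=30)  # this page: soak time maximum is 30 days


def soak_seconds(days=0, hours=0, minutes=0):
    """Format a soak time as whole seconds with an 's' suffix (assumed format)."""
    soak = timedelta(days=days, hours=hours, minutes=minutes)
    if not timedelta(0) <= soak <= MAX_SOAK:
        raise ValueError("soak time must be between 0 and 30 days")
    return f"{int(soak.total_seconds())}s"
```

For example, soak_seconds(days=7) returns "604800s".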

Teams - gcloud

You can set these properties when you create or update a team scope. The following instructions use the gcloud container fleet scopes update command; however, you can set the same properties when you create a team scope with the gcloud container fleet scopes create command.

For each of these commands, replace the following:

  • The scope name and project ID variables with the respective team scope's name and the team scope's fleet host project ID.
  • SOAK_TIME with the soak time for the team scope you are updating.

Create a rollout sequence:

  1. Set the soak time for the first scope in the sequence:

    gcloud container fleet scopes update projects/FIRST_SCOPE_PROJECT_ID/locations/global/scopes/FIRST_SCOPE_NAME \
        --default-upgrade-soaking=SOAK_TIME \
        --project=FIRST_SCOPE_PROJECT_ID
    
  2. Set the upstream scope and the soak time for the second scope in the sequence:

    gcloud container fleet scopes update projects/SECOND_SCOPE_PROJECT_ID/locations/global/scopes/SECOND_SCOPE_NAME \
        --upstream-scope=projects/FIRST_SCOPE_PROJECT_ID/locations/global/scopes/FIRST_SCOPE_NAME \
        --default-upgrade-soaking=SOAK_TIME \
        --project=SECOND_SCOPE_PROJECT_ID
    
  3. Optional: If you want to have three team scopes in a rollout sequence, set the upstream scope and the soak time for the third scope in the sequence:

    gcloud container fleet scopes update projects/THIRD_SCOPE_PROJECT_ID/locations/global/scopes/THIRD_SCOPE_NAME \
        --upstream-scope=projects/SECOND_SCOPE_PROJECT_ID/locations/global/scopes/SECOND_SCOPE_NAME \
        --default-upgrade-soaking=SOAK_TIME \
        --project=THIRD_SCOPE_PROJECT_ID
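
The scope commands in the preceding steps identify each team scope by a full resource name of the form projects/PROJECT_ID/locations/global/scopes/SCOPE_NAME. The following Python sketch assembles that name and the matching update command as a string for review. It doesn't call gcloud, and the function names, project IDs, scope names, and soak time value are placeholders.

```python
import shlex


def scope_resource_name(project_id, scope_name):
    """Full resource name of a team scope, in the form used on this page."""
    return f"projects/{project_id}/locations/global/scopes/{scope_name}"


def scope_update_command(project_id, scope_name, soak, upstream=None):
    """Build the update command for one scope.

    upstream is an optional (project_id, scope_name) tuple for the
    upstream scope; omit it for the first scope in the sequence.
    """
    argv = [
        "gcloud", "container", "fleet", "scopes", "update",
        scope_resource_name(project_id, scope_name),
    ]
    if upstream is not None:
        argv.append("--upstream-scope=" + scope_resource_name(*upstream))
    argv.append(f"--default-upgrade-soaking={soak}")
    argv.append(f"--project={project_id}")
    return shlex.join(argv)


if __name__ == "__main__":
    # Hypothetical values: team-a's staging scope follows its testing scope.
    print(scope_update_command("staging-proj", "team-a", "7d",
                               upstream=("test-proj", "team-a")))
```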
    

Check status of a rollout sequence

You can check the status of a rollout sequence with either of the following methods:

  • Monitor a visual representation of a rollout sequence in the Google Cloud console (Preview, fleet-based rollout sequence only).
  • Use the gcloud CLI or GKE Hub API to check the status of a rollout sequence.

Monitor a rollout sequence in the Google Cloud console

  1. Go to the Rollout Sequencing page in the Google Cloud console.

    Go to Rollout Sequencing

  2. View the sequence in the section Monitor your rollout sequence. If you don't see a rollout sequence, switch to a different rollout sequence, or create a rollout sequence if you haven't already done so.

How to use the console to monitor a rollout sequence

On this page, you can view the rollout sequence associated with your project's fleet. You can do the following to see the progress of a rollout sequence:

  • View the entire rollout sequence, or see the statuses of individual fleets and clusters within those fleets, as well as the soak time between fleets. You can also view the sequence where there is no active upgrade, if you want to check the configuration of the sequence.
  • Filter by upgrade type (control plane or node upgrade) and specific version (for example, 1.31.6-gke.500).

You can visually monitor your entire rollout sequence while GKE upgrades all the clusters in the sequence, qualifying a new version across environments before upgrading your production environment clusters. While monitoring, you can manage a rollout sequence with the gcloud CLI, making any changes as needed.

Switch to a different rollout sequence

This page shows the fleet-based rollout sequence if the active project in the Google Cloud console is a fleet host project for a fleet that is enrolled in a rollout sequence.

If you want to view a different rollout sequence, select a fleet host project associated with a different rollout sequence from the project picker at the top of the page.

Use the gcloud CLI

Use the commands in the following sections to check how upgrades are progressing in a rollout sequence. To learn more about what details are provided, see Status information for a rollout sequence.

To run these commands, ensure that you have the required permissions for each fleet host project. For example, if the sequence has cross-project scopes in different fleets, you need permissions in each project to describe the sequence.

For the following commands, if you only need information about one fleet or scope in the sequence, replace the --show-linked-cluster-upgrade flag with --show-cluster-upgrade.

Fleets

Check the status of a fleet-based rollout sequence:

gcloud container fleet clusterupgrade describe \
    --show-linked-cluster-upgrade --project=FLEET_PROJECT_ID

Replace FLEET_PROJECT_ID with the project ID of the host project for any fleet in the sequence.

See the reference for gcloud container fleet clusterupgrade describe for a complete list of flags.

Teams

Check the status of a team-based rollout sequence:

gcloud container fleet scopes describe SCOPE_NAME \
    --show-linked-cluster-upgrade \
    --project=SCOPE_PROJECT_ID

Replace SCOPE_NAME with the name of any team scope in the rollout sequence and SCOPE_PROJECT_ID with the project ID of this team scope.

See the reference for gcloud container fleet scopes describe for a complete list of flags.

To see the status of individual clusters within a fleet or team scope, run the following command in the fleet host project and see the membershipStates section:

gcloud container fleet features describe clusterupgrade

Status information for a rollout sequence

When you check the status of a version rollout, you can see the progress of each group and cluster within that group.

See the following table for the potential statuses of a cluster or group:

Status | For a single cluster | For a group (fleet or team scope)
INELIGIBLE | This cluster is ineligible for this upgrade. | One or more clusters in this group are ineligible for this upgrade.
PENDING | The upgrade hasn't started or the upgrade is in progress for the cluster. | The upgrade hasn't started on any of the clusters in the group.
IN_PROGRESS | N/A | The upgrade has started on at least one cluster but hasn't finished on all clusters.
SOAKING | The upgrade has finished on the cluster and hasn't finished soaking. | The upgrade has finished on all clusters and hasn't finished soaking.
FORCED_SOAKING | The upgrade took more than the maximum upgrade time (30 days) and therefore we forced it to enter the soaking phase. The upgrade can still continue in the cluster. | The upgrade took more than the maximum upgrade time (30 days) and therefore we forced it to enter the soaking phase. The upgrade can still continue in the clusters.
COMPLETE | The upgrade is treated as "done", meaning that the upgrade has finished soaking on this cluster. | The upgrade is treated as "done" and ready to be consumed by the downstream group, meaning that the upgrade has finished soaking.

In the output of these commands, the clusterUpgrade(s).spec and clusterUpgrade(s).state attributes contain additional information about the cluster upgrade such as soaking time, cluster upgrade overrides, and upgrade status.
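
Because only a COMPLETE group is ready to be consumed by its downstream group, you can read a sequence's progress as "the first group that isn't COMPLETE yet". The following Python sketch models that reading with the status names from the preceding table. It takes statuses that you have already collected. It doesn't parse command output, and the waiting_on function and group names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    """Status names as listed in the table on this page."""
    INELIGIBLE = "INELIGIBLE"
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    SOAKING = "SOAKING"
    FORCED_SOAKING = "FORCED_SOAKING"
    COMPLETE = "COMPLETE"


def waiting_on(sequence):
    """Return the first (group, status) pair that is not COMPLETE.

    sequence is a list of (group_name, Status) pairs ordered from the
    first group to the last. Returns None when every group is COMPLETE.
    """
    for name, status in sequence:
        if status is not Status.COMPLETE:
            return name, status
    return None
```

For example, if the first group is COMPLETE, the second is SOAKING, and the third is PENDING, the function returns the second group with the SOAKING status. An INELIGIBLE result points you to the troubleshooting instructions on this page.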

Manage a rollout sequence

You can control automatic cluster upgrades with rollout sequencing in several ways, explained in the following sections.

Change the soak time for a group

You can change the default soak time for a group or change the soak time for when that group upgrades to a specific version. The maximum is 30 days.

Update the default soak time

You can update the default soak time in the Google Cloud console (Preview, fleet-based rollout sequence only) or with the gcloud CLI.

gcloud

To change the default soak time for a group, use the gcloud CLI commands from the instructions to Create a rollout sequence, omitting the flags to set the upstream group.

Fleets - console

  1. Go to the Rollout Sequencing page in the Google Cloud console.

    Go to Rollout Sequencing

  2. View the sequence in the section Monitor your rollout sequence. If you don't see a rollout sequence, switch to a different rollout sequence, or create a rollout sequence if you haven't already done so.

  3. In the Rollout graph, click the Soak time element after the element of the fleet where you want to update the soak time.

  4. Click Edit soak time.

  5. In the section Set a new soak time, enter a new soak time using the Days, Hours, and Minutes fields.

  6. To save the settings, click Save.

Override the default soak time

You can change the soak time for a specific version rollout to be different from the default soak time for the group. For example, if you have already qualified a new version and are ready for upgrades to begin in the next group, you can set the soak time to zero. You can also use an override if you want more time than the default soak time to qualify a specific version.

As the soak time is set on a per-group basis, if you want to override the soak time for other groups in this sequence, update them using this same command with the fleet or scope name replaced, depending on the type of sequence.

For the instructions in this section, replace the following variables:

  • SOAK_TIME: the soak time to use other than the default (for example, "0d" if you want to skip the soak time for one version rollout).
  • UPGRADE_NAME: the type of upgrade, either k8s_control_plane for control plane upgrades or k8s_node for node upgrades.
  • VERSION: the GKE version (for example, 1.25.2-gke.400) for which you want to override the default soak time that applies after the version has been rolled out to this group.

Fleets - gcloud

Run this command in the host project of the fleet where you want to override the soak time used for the version rollout of a specific version.

Change the soak time of a fleet:

gcloud container fleet clusterupgrade update \
    --add-upgrade-soaking-override=SOAK_TIME \
    --upgrade-selector=name=UPGRADE_NAME,version=VERSION
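
The override command combines three values, and the upgrade name must be one of the two values listed earlier in this section. The following Python sketch checks the upgrade name and assembles the command as a string for review. It doesn't call gcloud, and the fleet_soak_override_command function is a hypothetical helper.

```python
import shlex

# Upgrade names listed on this page.
UPGRADE_NAMES = {"k8s_control_plane", "k8s_node"}


def fleet_soak_override_command(soak, upgrade_name, version):
    """Build the fleet soak-time override command for one version rollout."""
    if upgrade_name not in UPGRADE_NAMES:
        raise ValueError(f"upgrade name must be one of {sorted(UPGRADE_NAMES)}")
    return shlex.join([
        "gcloud", "container", "fleet", "clusterupgrade", "update",
        f"--add-upgrade-soaking-override={soak}",
        f"--upgrade-selector=name={upgrade_name},version={version}",
    ])


if __name__ == "__main__":
    # Skip the soak time for control plane upgrades to an example version.
    print(fleet_soak_override_command("0d", "k8s_control_plane", "1.25.2-gke.400"))
```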

Fleets - Terraform

Add the following gke_upgrade_overrides block to your Terraform configuration within the clusterupgrade block to override the soak time used for the version rollout of a specific version:

gke_upgrade_overrides {
  upgrade {
    name    = "UPGRADE_NAME"
    version = "VERSION"
  }
  post_conditions {
    soaking = "SOAK_TIME"
  }
}

Teams - gcloud

Run this command in the host project of the team scope's fleet. Replace SCOPE_NAME with the name of the team scope for which you want to override the soak time used for the version rollout of a specific version.

Change the soak time of a team scope:

gcloud container fleet scopes update SCOPE_NAME \
    --add-upgrade-soaking-override=SOAK_TIME \
    --upgrade-selector=name=UPGRADE_NAME,version=VERSION

Update the groups in a rollout sequence

You can update an existing rollout sequence to add, remove, or change the order of groups in the sequence. To make these changes, update the associations between groups.

You can perform these steps in the Google Cloud console (Preview, fleet-based rollout sequence only) or with the gcloud CLI.

Fleets - gcloud

Use the gcloud container fleet clusterupgrade update command with the --upstream-fleet flag to add or change upstream fleets. Use the --reset-upstream-fleet flag to remove an upstream fleet.

You can do actions such as the following:

  • Add another fleet to the start of the rollout sequence by adding an upstream fleet to the first fleet in the sequence.
  • Change the order of the fleets in the rollout sequence by changing the upstream fleet associations.
  • Remove the first fleet in the rollout sequence by removing the upstream fleet of the second fleet.
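
Each of these actions comes down to changing which upstream fleet a fleet points to. The following Python sketch compares the current order with the order you want and lists the fleets whose upstream association has to change. A value of None means that the upstream fleet should be removed, which corresponds to the --reset-upstream-fleet flag. The function names and fleet names are hypothetical, and the sketch doesn't decide the order in which to apply the changes.

```python
def upstream_map(order):
    """Map each group to its upstream group (None for the first group)."""
    return {g: (order[i - 1] if i > 0 else None) for i, g in enumerate(order)}


def upstream_changes(old_order, new_order):
    """Return {group: new_upstream} for groups whose upstream must change."""
    old, new = upstream_map(old_order), upstream_map(new_order)
    changes = {}
    for group, upstream in new.items():
        # A group that is new to the sequence and goes first needs no change.
        if old.get(group) != upstream:
            changes[group] = upstream
    for group, upstream in old.items():
        # A group that leaves the sequence should drop its upstream link.
        if group not in new and upstream is not None:
            changes[group] = None
    return changes
```

For example, upstream_changes(["testing", "staging", "production"], ["staging", "production"]) returns {"staging": None}: removing the first fleet only requires resetting the upstream fleet of the second fleet.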

Fleets - console

  1. Go to the Rollout Sequencing page in the Google Cloud console.

    Go to Rollout Sequencing

  2. View the sequence in the section Monitor your rollout sequence. If you don't see a rollout sequence, switch to a different rollout sequence, or create a rollout sequence if you haven't already done so.

  3. In the Rollout graph, click the elements for the existing fleets in the sequence. After you click those elements, you can do some of the following actions to make the changes:

    • Click Add downstream fleet.
    • Click Add upstream fleet.
    • Click Remove fleet.

You can do actions such as the following:

  • Add another fleet to the end of the rollout sequence by adding a downstream fleet to the last fleet in the sequence.
  • Add another fleet to the start of the rollout sequence by adding an upstream fleet to the first fleet in the sequence.
  • Change the order of the fleets in the rollout sequence by removing fleets, then adding the fleets back with a different upstream or downstream fleet.
  • Remove the first fleet in the rollout sequence.
  • Remove the last fleet in the rollout sequence.
  • Remove the middle fleet in the rollout sequence, after removing the first or last fleet in the sequence.

Teams - gcloud

Use the gcloud container fleet scopes update command with the --upstream-scope flag to add or change upstream team scopes. Use the --reset-upstream-scope flag to remove an upstream team scope.

You can do actions such as the following:

  • Add another team scope to the start of the rollout sequence by adding an upstream team scope to the first team scope in the sequence.
  • Change the order of the team scopes in the rollout sequence by changing the upstream team scope associations.
  • Remove the first team scope in the rollout sequence by removing the upstream team scope of the second team scope.

Delay the completion of a group's version rollout

If you need to temporarily prevent a group from completing the rollout of a new version to its clusters, you can add a maintenance exclusion to any of the clusters that have not been upgraded to the target version. This can pause a group from proceeding to its soak time or downstream group for up to 30 days. After 30 days, the group will begin soaking.

You can also change the soak time for that group to 30 days to maximize how long the rollout sequence waits before proceeding to the next group.

If you need to further delay upgrades beginning for the next group, you can use maintenance exclusions for the clusters in the next group.
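
The timing described in this section can be worked out with simple date arithmetic: soaking begins when upgrades complete or when the rollout has taken 30 days, whichever comes first, and the downstream group can begin after the soak time. The following Python sketch is a simplified model of that rule. It ignores maintenance windows and any exclusions on downstream clusters, and the downstream_earliest_start function is hypothetical.

```python
from datetime import datetime, timedelta

MAX_ROLLOUT = timedelta(days=30)  # this page: a group begins soaking after 30 days


def downstream_earliest_start(rollout_start, upgrades_complete, soak):
    """Earliest time that upgrades can begin on the downstream group.

    upgrades_complete is None while clusters are still being held back,
    for example by maintenance exclusions.
    """
    forced_soak_start = rollout_start + MAX_ROLLOUT
    if upgrades_complete is None:
        soak_start = forced_soak_start
    else:
        soak_start = min(upgrades_complete, forced_soak_start)
    return soak_start + soak
```

Under this model, a group that never finishes upgrading and has a 30-day soak time holds back the downstream group for 60 days from the start of the rollout.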

Switch between fleet-based and team-based rollout sequences

You can switch from a fleet-based sequence to a team-based sequence, or from a team-based sequence to a fleet-based sequence. The instructions assume that you are transferring between sequences organized like those illustrated in the example diagrams.

Fleets to teams

To change your clusters from a fleet-based rollout sequence to a team-based rollout sequence, do the following steps:

  1. Configure maintenance exclusions for all clusters in each of your fleets to prevent any upgrades while you are modifying your configuration.
  2. Ensure that you have enabled GKE Enterprise in your fleet host projects.
  3. In each of your fleets, create one or more team scopes for subdividing the group of clusters in that fleet.
  4. Create one or more rollout sequences between the matching team scopes in each fleet.
  5. Add your clusters to their new team scopes.
  6. Remove the maintenance exclusions that you configured for this change.

Teams to fleets

To change your clusters from a team-based rollout sequence to a fleet-based rollout sequence, do the following steps:

  1. Configure maintenance exclusions for all clusters in each of your fleets to prevent any upgrades while you are modifying your configuration.
  2. Create a rollout sequence between your fleets.
  3. Remove your clusters from their team scopes. These clusters are now registered only to their respective fleets, which you joined in a rollout sequence in the previous step.
  4. Delete the team scopes.
  5. Remove the maintenance exclusions that you configured for this change.

Delete a sequence

To delete a sequence, you remove the upstream associations for the second and third groups (if the rollout sequence has three groups).

You can perform these steps in the Google Cloud console (Preview, fleet-based rollout sequence only) or with the gcloud CLI.

Fleets - gcloud

Run the following command in the fleet host project of the second and third fleets in the rollout sequence:

gcloud container fleet clusterupgrade update --reset-upstream-fleet

Fleets - console

  1. Go to the Rollout Sequencing page in the Google Cloud console.

    Go to Rollout Sequencing

  2. View the sequence in the section Monitor your rollout sequence. If you don't see a rollout sequence, switch to a different rollout sequence, or create a rollout sequence if you haven't already done so.

  3. In the Rollout graph, click the element for the third fleet.

  4. Click Remove fleet.

  5. To remove the fleet, click Remove.

  6. Repeat the previous three steps for the second fleet.

Teams - gcloud

Run the following command in the fleet host project of the second and third team scopes in the rollout sequence:

gcloud container fleet scopes update SCOPE_NAME --reset-upstream-scope

Replace SCOPE_NAME with the names of the second and third scopes, respectively.

Troubleshooting

Troubleshoot rollout eligibility

If the clusters in a rollout sequence don't all have the same upgrade target, GKE might not be able to proceed with cluster upgrades. Automatic upgrades cannot proceed if an upstream group does not qualify one upgrade target to pass to the downstream group. Automatic upgrades also cannot proceed if clusters in the upstream group qualify an invalid upgrade target for clusters in the downstream group.

To check if your rollout sequence has any rollout eligibility issues, check the status of the rollout sequence. If a group is ineligible, follow the instructions to see the status of individual clusters in a group.

To immediately advance cluster upgrades, remove any clusters with an INELIGIBLE status by following the instructions in Advance partially eligible rollouts.

Fix eligibility in a group

In a group, if a cluster is ineligible because it is on an earlier version (for example, most of the clusters in the group are being upgraded from 1.23 to 1.24 and a cluster is on version 1.22), you can manually upgrade the cluster to 1.24 to resolve the version discrepancy.

In a group, if a cluster is ineligible because it is on a later version (for example, most of the clusters in the group are being upgraded from 1.23 to 1.24 and a cluster is on version 1.25), you cannot manually downgrade the cluster to solve the version discrepancy and need to remove the cluster.
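
The two cases in this section depend only on how a cluster's minor version compares with the minor versions that the group is upgrading from and to. The following Python sketch encodes that comparison. It is a simplified illustration that looks only at major and minor version numbers, and the function names and returned strings are hypothetical.

```python
def minor_version(version):
    """Return (major, minor) from a version such as '1.24' or '1.24.9-gke.100'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def in_group_action(cluster_version, from_version, target_version):
    """Suggest how to fix a version discrepancy for one cluster in a group."""
    cluster = minor_version(cluster_version)
    if cluster < minor_version(from_version):
        # Earlier than the rest of the group: a manual upgrade can catch up.
        return "manually upgrade to target"
    if cluster > minor_version(target_version):
        # Later than the target: downgrading isn't possible.
        return "remove from group"
    return "no action"
```

With the versions from the examples above, in_group_action("1.22", "1.23", "1.24") returns "manually upgrade to target", and in_group_action("1.25", "1.23", "1.24") returns "remove from group".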

Fix eligibility between groups

Between groups, if there is a mismatch in upgrade targets where the downstream group is on a newer version (for example, the upstream group upgraded from 1.23 to 1.24 and the clusters in the downstream group are on 1.25), you can manually upgrade the clusters in the upstream group to 1.25 to ensure that upgrades proceed.

Between groups, if there is a mismatch in upgrade targets where the downstream group is on an earlier version (for example, the upstream group upgraded from 1.24 to 1.25 and the clusters in the downstream group are on 1.23), you can manually upgrade the clusters in the downstream group to 1.24 or 1.25 to ensure that upgrades proceed.

Advance partially eligible rollouts

If cluster upgrades in a group will not finish because of issues with rollout eligibility (for example, version discrepancies within a group), you can remove clusters that are ineligible for the group's upgrade target from a group to complete the version rollout and begin the soak time or move on to the next group in the rollout sequence. You can also remove a cluster from a group for other reasons, for example if this cluster's usage is no longer related to the other clusters in the group.

Follow the instructions to unregister a cluster from a fleet or remove clusters from team scopes, depending on the type of rollout sequence.

After you have removed all clusters that are preventing a group's version rollout from being completed, the group's version rollout will complete. Confirm this by following the instructions to Check status of a rollout sequence.

What's next