Rotate your cluster credentials


This page explains how to perform a credential rotation in Google Kubernetes Engine (GKE) clusters.

About credential rotations in GKE

The cluster root Certificate Authority (CA) has a limited lifetime. When the CA expires, any credentials that were signed by the CA are no longer valid, including the cluster client certificate (from the MasterAuth API field), the key and certificate for the API server, and the kubelet client certificates. For details, see Cluster root CA lifetime.

You can perform a credential rotation to revoke the existing credentials and issue new credentials for your cluster. This operation rotates the cluster CA private key and requires re-creation of nodes to use the new credentials. You must start and finish a credential rotation for your cluster before your current credentials expire. A credential rotation also performs an IP rotation.

When to perform a credential rotation

You should perform credential rotations regularly and in advance of your current credential expiry date. Credential rotations require node re-creation to use the new credentials, which might be disruptive to running workloads. Plan maintenance periods and perform the rotations during maintenance windows to avoid unexpected workload downtime or unresponsive API clients outside the cluster.

Find clusters with expiring or expired credentials

If your cluster's credentials will expire in the next 180 days, or have already expired, GKE delivers an insight and recommendation explaining that you must perform a credential rotation for this cluster. This guidance includes the expiration date of the credentials. You can view this guidance in the Google Cloud console, with the gcloud CLI, or with the Recommender API by specifying the CLUSTER_CA_EXPIRATION subtype.
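
As a rough illustration of the gcloud CLI route, the following sketch wraps a `gcloud recommender insights list` call that filters on the CLUSTER_CA_EXPIRATION subtype. The insight type `google.container.DiagnosisInsight` and the helper name `list_ca_expiration_insights` are assumptions that are not stated on this page, so verify the insight type against the GKE insights documentation before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only. The insight type below is an assumption, not taken from this page.
# Set GCLOUD=echo to print the command instead of running it.
GCLOUD="${GCLOUD:-gcloud}"

# List CA expiration insights for one project and location.
#   $1: project ID
#   $2: location (zone or region) of the cluster
list_ca_expiration_insights() {
  "$GCLOUD" recommender insights list \
    --insight-type=google.container.DiagnosisInsight \
    --project="$1" \
    --location="$2" \
    --filter="insightSubtype:CLUSTER_CA_EXPIRATION" \
    --format="table(insightSubtype, name, description)"
}
```

For example, `list_ca_expiration_insights PROJECT_ID LOCATION` would list any matching insights for clusters in that location.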

If you receive an insight and recommendation for a cluster, you must perform a credential rotation. Otherwise, GKE automatically starts a credential rotation within 30 days of the current CA expiry date, as explained in the next section.

GKE automation policy to prevent cluster outages

To prevent your cluster from entering an unrecoverable state if your current credentials expire, GKE automatically starts a credential rotation within 30 days of your current CA expiry date. For example, if your cluster CA expires on January 6, 2024 and you don't rotate your credentials by December 5, 2023, GKE starts an automatic rotation on or after December 7, 2023, and completes this rotation seven days after the operation starts. This automatic rotation is a last-resort attempt to prevent a cluster outage, and has the following considerations:

  • Automatic rotations ignore any configured maintenance windows or maintenance exclusions.
  • When the credential rotation completes, the expiring credentials are revoked. Kubernetes API clients outside the cluster, like kubectl in local environments, won't work until you configure the clients to use the new credentials.
  • Node pool re-creations during the rotation might cause disruptions to running workloads.

Don't rely on GKE-initiated automatic rotations. They're an emergency measure to avoid complete outages, not a substitute for rotations that you plan and perform yourself.
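
The timeline in the preceding example can be reproduced with simple date arithmetic. The following sketch assumes GNU `date` (for the `-d` option) and uses hypothetical helper names, `auto_rotation_start` and `auto_rotation_complete`. It only illustrates the 30-day and seven-day offsets described in this section and isn't a GKE tool.

```shell
#!/usr/bin/env bash
# Sketch only: requires GNU date. Helper names are hypothetical.

# Print the date 30 days before a CA expiry date (YYYY-MM-DD, UTC).
# Per this section, that is the earliest date of an automatic rotation.
auto_rotation_start() {
  local expiry_epoch
  expiry_epoch=$(date -u -d "$1 00:00:00" +%s) || return 1
  date -u -d "@$((expiry_epoch - 30 * 86400))" +%F
}

# Print the date seven days after a rotation start date (YYYY-MM-DD, UTC).
auto_rotation_complete() {
  local start_epoch
  start_epoch=$(date -u -d "$1 00:00:00" +%s) || return 1
  date -u -d "@$((start_epoch + 7 * 86400))" +%F
}

start=$(auto_rotation_start 2024-01-06)
echo "Earliest automatic rotation start: $start"
echo "Completion if the rotation starts that day: $(auto_rotation_complete "$start")"
```

With the expiry date from the example, January 6, 2024, the sketch prints December 7, 2023 as the earliest start and December 14, 2023 as the completion date.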

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Check credential lifetime

We recommend that you check your credential lifetime before and after you perform a credential rotation so that you know the validity period of your cluster root CA.

To check the credential lifetime for a single cluster, run the following command:

gcloud container clusters describe CLUSTER_NAME \
    --region REGION_NAME \
    --format "value(masterAuth.clusterCaCertificate)" \
    | base64 --decode \
    | openssl x509 -noout -dates

The output is similar to the following:

notBefore=Mar 17 16:45:34 2023 GMT
notAfter=Mar  9 17:45:34 2053 GMT
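
To turn the `notAfter` line into a number of days remaining, you could use a small helper like the following. This is a minimal sketch that assumes GNU `date`; the function name `days_until_expiry` is hypothetical, and the 180-day threshold is the insight window mentioned earlier on this page.

```shell
#!/usr/bin/env bash
# Sketch only: requires GNU date. days_until_expiry is a hypothetical helper.

# Print the whole days between a reference time and the certificate expiry.
#   $1: a line such as "notAfter=Mar  9 17:45:34 2053 GMT"
#   $2: optional reference time in epoch seconds (defaults to the current time)
days_until_expiry() {
  local not_after now expiry
  not_after="${1#notAfter=}"            # strip the "notAfter=" prefix
  now="${2:-$(date -u +%s)}"
  expiry=$(date -u -d "$not_after" +%s) || return 1
  echo $(( (expiry - now) / 86400 ))    # integer division, truncates toward zero
}

# Example with the sample output above and a fixed reference date.
reference=$(date -u -d "2053-01-01 00:00:00" +%s)
days=$(days_until_expiry "notAfter=Mar  9 17:45:34 2053 GMT" "$reference")
echo "Days until CA expiry: $days"
if [ "$days" -le 180 ]; then
  echo "Within the 180-day window: plan a credential rotation."
fi
```

To use the helper against a real cluster, pass it the `notAfter` line from the command above. Replacing `-dates` with `-enddate` in the `openssl x509` call prints only that line.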

To check the credential lifetime for all clusters in a project, run the following command:

gcloud container clusters list --project PROJECT_ID \
    | awk 'NR>1 {print "echo; echo Validity for cluster " $1 " in location " $2 ":;\
         gcloud container clusters describe --project PROJECT_ID " $1 " --location " $2 " \
         --format \"value(masterAuth.clusterCaCertificate)\" \
         | base64 --decode | openssl x509 -noout -dates"}' \
    | bash

Perform a credential rotation

Credential rotation involves the following steps:

  1. Start the rotation: the control plane starts serving on a new IP address in addition to the original IP address. New credentials are issued to workloads and the control plane.
  2. Recreate nodes: GKE recreates cluster nodes so that the nodes use the new IP address and credentials, respecting availability from maintenance windows and exclusions.
  3. Update API clients: after starting the rotation, update any cluster API clients, such as development machines using kubectl, to communicate with the control plane using the new IP address.
  4. Complete the rotation: the control plane stops serving traffic over the original IP address. Old credentials are revoked, including any existing static credentials for Kubernetes ServiceAccounts.

Start the rotation

To start a credential rotation, run the following command:

gcloud container clusters update CLUSTER_NAME \
    --region REGION_NAME \
    --start-credential-rotation

This command creates new credentials, issues these credentials to the control plane, and configures the control plane to serve on two IP addresses: the original IP address and a new IP address.

Recreate nodes

After reconfiguring the API server to serve on a new IP address, GKE automatically updates your nodes to use the new IP address and credentials if there is maintenance availability. GKE upgrades all of your nodes to the same GKE version that the nodes already run, which recreates the nodes. For more information, refer to Node pool upgrades.

By default, GKE automatically completes credential rotations seven days after you start the operation. If an active maintenance window or exclusion in your cluster prevents GKE from recreating some nodes during this seven-day period, the credential rotation fails to complete.

  • If you use maintenance exclusions or maintenance windows that could result in a failed rotation, manually upgrade your cluster to force node recreation:

    gcloud container clusters upgrade CLUSTER_NAME \
        --location=LOCATION \
        --cluster-version=VERSION
    

    Replace VERSION with the same GKE version that the cluster already uses.

    For more information, see caveats for maintenance windows.

Check the progress of node pool recreation

  1. To find the running node upgrade operation, run the following command:

    gcloud container operations list \
        --filter="operationType=UPGRADE_NODES AND status=RUNNING" \
        --format="value(name)"
    

    This command returns the operation ID of the node upgrade operation.

  2. To poll the operation, pass the operation ID to the following command:

    gcloud container operations wait OPERATION_ID
    

Node pools are recreated one by one, and each node pool has its own operation. If you have multiple node pools, use these instructions to poll each operation.
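
If you have several node pools, the two commands above can be combined into a loop. The following is a sketch under stated assumptions: the helper names `wait_for_operations` and `wait_for_node_upgrades` are hypothetical, and the commands rely on a default location in your gcloud configuration because the commands on this page don't pass one.

```shell
#!/usr/bin/env bash
# Sketch only. Helper names are hypothetical.
# Set GCLOUD=echo to print the commands instead of running them.
GCLOUD="${GCLOUD:-gcloud}"

# Read operation IDs from stdin, one per line, and wait for each in turn.
wait_for_operations() {
  local op
  while IFS= read -r op; do
    [ -n "$op" ] || continue    # skip blank lines
    # Redirect stdin so the command can't consume the remaining IDs.
    "$GCLOUD" container operations wait "$op" < /dev/null || return 1
  done
}

# List the running node upgrade operations, then wait for each one.
wait_for_node_upgrades() {
  "$GCLOUD" container operations list \
    --filter="operationType=UPGRADE_NODES AND status=RUNNING" \
    --format="value(name)" | wait_for_operations
}
```

Because node pools are recreated one at a time, a single call to `wait_for_node_upgrades` only covers the operations that are running at that moment. You might need to call it again until the list command returns no operations.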

Update API clients

After starting the credential rotation, you must update all API clients outside the cluster (such as kubectl on developer machines) to use the new credentials and point to the new IP address of the control plane.

To update your API clients, run the following command for each client:

gcloud container clusters get-credentials CLUSTER_NAME \
    --region REGION_NAME

Update Kubernetes ServiceAccount credentials

If you use static credentials for ServiceAccounts in your cluster, switch to short-lived credentials. Completing the rotation invalidates existing ServiceAccount credentials. If you don't want to use short-lived credentials, ensure that you recreate your static credentials for all ServiceAccounts in the cluster after you complete the rotation.
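
One way to obtain a short-lived credential is the `kubectl create token` command, which requests a time-limited token for a ServiceAccount. The following sketch assumes kubectl version 1.24 or later; the helper name `short_lived_token` and the example values are hypothetical, and this page doesn't prescribe this particular method.

```shell
#!/usr/bin/env bash
# Sketch only: assumes kubectl 1.24 or later. The helper name is hypothetical.
# Set KUBECTL=echo to print the command instead of running it.
KUBECTL="${KUBECTL:-kubectl}"

# Request a time-limited token for a ServiceAccount.
#   $1: ServiceAccount name
#   $2: namespace
#   $3: requested lifetime, such as 1h
short_lived_token() {
  "$KUBECTL" create token "$1" --namespace "$2" --duration "$3"
}
```

For example, `short_lived_token my-sa default 1h` requests a token for the hypothetical `my-sa` ServiceAccount in the `default` namespace with a requested lifetime of one hour.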

Complete the rotation

After updating API clients outside the cluster, complete the rotation to configure the control plane to serve only with the new credentials and the new IP address:

gcloud container clusters update CLUSTER_NAME \
    --region REGION_NAME \
    --complete-credential-rotation

If the credential rotation fails to complete and returns an error message similar to the following, refer to troubleshooting:

ERROR: (gcloud.container.clusters.update) ResponseError: code=400, message=Node pool "test-pool-1" requires recreation.

What's next