Rotating user cluster certificate authorities

GKE on VMware uses certificates and private keys to authenticate and encrypt connections between system components in user clusters. The admin cluster creates a new set of certificate authorities (CAs) for each user cluster, and uses CA certificates to issue additional leaf certificates for system components. The admin cluster manages distribution of the public CA certificates and leaf certificate key pairs to system components to establish their secure communication.

The user cluster CA rotation feature allows you to trigger a rotation of the core system certificates in a user cluster. During a rotation, the admin cluster replaces the core system CAs for the user cluster with newly generated CAs, and distributes the new public CA certificates and leaf certificate key pairs to user cluster system components. The rotation happens incrementally, so that system components can continue to communicate during the rotation. Note, however, that workloads and nodes are restarted during the rotation.

There are three system CAs managed by the admin cluster for each user cluster:

  • The etcd CA secures communication from the API server to the etcd replicas and also traffic between etcd replicas. This CA is self-signed.
  • The cluster CA secures communication between the API server and all internal Kubernetes API clients (kubelets, controllers, schedulers). This CA is self-signed.
  • The front-proxy CA secures communication with aggregated APIs. This CA is self-signed.

Also, you might be using an org CA to sign the certificate configured by the authentication.sni option. This CA and the SNI certificate are used to serve the Kubernetes API to clients outside the cluster. You manage this CA and manually generate the SNI certificate. Neither this CA nor the SNI certificate is affected by the user cluster CA rotation feature.

Limitations

  • CA certificate rotation is limited to the etcd, cluster, and front-proxy CAs mentioned previously.

  • CA certificate rotation is limited to certificates issued automatically by GKE on VMware. It doesn't update certificates issued manually by an administrator, even if those certificates are signed by the system CAs.

  • A CA rotation restarts the API server, other control-plane processes, and each node in the cluster multiple times. Each stage of a CA rotation progresses similarly to a cluster upgrade. While the user cluster remains operational during a CA rotation, you should expect workloads to be restarted and rescheduled. You should also expect brief periods of control-plane downtime if your user cluster does not have a high-availability control plane.

  • You must update the user cluster kubeconfig file and authentication configuration files after a CA rotation. This is because the old cluster certificate is revoked, and the credentials in the kubeconfig file no longer work.

  • After a CA rotation is started, it cannot be paused or rolled back.

  • A CA rotation might take considerable time to complete, depending on the size of the user cluster.

Perform a CA rotation

  1. Start the rotation:

    gkectl update credentials certificate-authorities rotate \
        --config USER_CLUSTER_CONFIG \
        --kubeconfig ADMIN_CLUSTER_KUBECONFIG
    

    Replace the following:

    • USER_CLUSTER_CONFIG: the path of the user cluster configuration file

    • ADMIN_CLUSTER_KUBECONFIG: the path of the admin cluster kubeconfig file

    If the CA rotation starts successfully, you see a message similar to this:

    successfully started the CA rotation with CAVersion 2, use gkectl update credentials certificate-authorities status command to view the current state of CA rotation
    

    If a CA rotation is already in progress, you see an error message similar to this:

    Exit with error:
    admission webhook "vonpremusercluster.onprem.cluster.gke.io" denied the request: requests must not modify CAVersion when cluster is not ready: ready condition is not true: ClusterCreateOrUpdate: Creating or updating user cluster control plane workloads
    
  2. View the status of the rotation:

    gkectl update credentials certificate-authorities status \
        --config USER_CLUSTER_CONFIG \
        --kubeconfig ADMIN_CLUSTER_KUBECONFIG
    

    The preceding command reports the CAVersion, which is an integer the system automatically increments to differentiate the CAs used before and after a rotation. The command also reports a status (True or False) that indicates whether the CA rotation is complete, and a message describing which CAVersion is currently in use by each component of the system.

    If the CA rotation has already completed, you see a message similar to this:

    State of CARotation with CAVersion 2 is -
    status: True,
    reason: CARotationCompleted,
    message: Control plane has CA bundle [2], certs from CA 2, CA 2 is CSR signer. Data plane has CA bundle [2], CA 2 was CSR signer at last restart.
    

    If the CA rotation is still in progress, you see a message similar to this:

    State of CARotation with CAVersion 2 is -
    status: False,
    reason: CARotationProgressed,
    message: Control plane has CA bundle [1 2], certs from CA 2, CA 1 is CSR signer. Data plane has CA bundle [1 2], CA 1 was CSR signer at last restart.
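
Because the status command reports a snapshot, you might want to poll it until the rotation finishes. The following is a minimal sketch, not a feature of gkectl. The function names ca_rotation_complete and wait_for_ca_rotation are hypothetical, and the check assumes the status output keeps the "status: True" line format shown in the preceding samples.

```shell
# Returns 0 if the status text on stdin contains a "status: True" line.
# Assumption: the output format matches the sample messages in this document.
ca_rotation_complete() {
  grep -Eq '^[[:space:]]*status:[[:space:]]*True'
}

# Polls the status command until it reports a completed rotation.
# $1: user cluster configuration file
# $2: admin cluster kubeconfig file
# $3: seconds to wait between checks (defaults to 60)
wait_for_ca_rotation() {
  while ! gkectl update credentials certificate-authorities status \
      --config "$1" \
      --kubeconfig "$2" 2>&1 | ca_rotation_complete; do
    echo "CA rotation still in progress; checking again in ${3:-60}s" >&2
    sleep "${3:-60}"
  done
  echo "CA rotation complete"
}
```

For example, wait_for_ca_rotation USER_CLUSTER_CONFIG ADMIN_CLUSTER_KUBECONFIG 120 checks every two minutes. The loop has no timeout and also keeps retrying if the status command itself fails, so consider adding a retry limit before you use it unattended.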
    

Update user cluster credentials

After the CA rotation completes, you must get a new user cluster kubeconfig file from the admin cluster. This is because the CA rotation revokes the CA that the old kubeconfig file was based on.

Get a new kubeconfig file:

kubectl --kubeconfig ADMIN_CLUSTER_KUBECONFIG get secret admin \
    -n USER_CLUSTER_NAME -o jsonpath='{.data.admin\.conf}' \
    | base64 --decode > USER_CLUSTER_NAME-kubeconfig

Distribute the new kubeconfig file to everyone who uses a kubeconfig file to interact with the cluster.
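
Before you distribute the file, you might want to confirm that it carries the new cluster CA. The following sketch extracts the CA certificate embedded in a kubeconfig file so that you can inspect it. The function name kubeconfig_ca_pem is hypothetical, and the sketch assumes that the kubeconfig stores the CA inline in a certificate-authority-data field rather than as a path to a separate file.

```shell
# Prints the PEM-encoded CA certificate embedded in the kubeconfig file $1.
# Assumption: the CA is stored inline as base64 in certificate-authority-data.
# If the file defines several clusters, only the first entry is printed.
kubeconfig_ca_pem() {
  awk '$1 == "certificate-authority-data:" { print $2; exit }' "$1" \
    | base64 --decode
}

# Example (not run here): show the subject, expiry date, and fingerprint.
# kubeconfig_ca_pem USER_CLUSTER_NAME-kubeconfig \
#     | openssl x509 -noout -subject -enddate -fingerprint
```

Running the same pipeline against the old kubeconfig file and comparing the fingerprints is one way to check that the two files reference different CAs.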

Update authentication configuration files

After the CA rotation completes, authentication configuration files must be updated and redistributed. Follow the linked instructions to update and redistribute these files after the CA rotation:

Control-plane certificate rotation

Without rotation, both the user cluster CAs and control-plane certificates expire five years from the date the cluster was created. The user cluster's control-plane certificates are automatically rotated within ten hours of each user cluster upgrade, but the CAs are not automatically rotated. This means a CA rotation must be performed at least once every five years in addition to regular version upgrades.
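
To keep track of the five-year limit, you can check how close a CA certificate is to its expiry date. The following is a minimal sketch that assumes you have a PEM-encoded CA certificate saved to a file (for example, one decoded from the certificate-authority-data field of a kubeconfig file) and that openssl is installed. The function name cert_expires_within_days is hypothetical.

```shell
# Returns 0 if the PEM certificate in file $1 expires within $2 days.
# openssl's -checkend exits with 0 when the certificate is still valid after
# the given number of seconds, so the result is negated here.
# Note: an unreadable or invalid file is also reported as "expiring".
cert_expires_within_days() {
  ! openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}

# Example (not run here): warn when the CA has less than 180 days left.
# if cert_expires_within_days cluster-ca.pem 180; then
#   echo "Plan a CA rotation: cluster-ca.pem expires within 180 days" >&2
# fi
```

The 180-day threshold and the cluster-ca.pem file name are placeholders; choose values that fit your own maintenance schedule.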

To prevent a user cluster from becoming unavailable, control-plane certificates are rotated within ten hours following a user cluster upgrade. When this happens, a message appears in the user cluster's CA rotation status.

To view the version that the user cluster was upgraded to when its control-plane certificates were last rotated, run the status command:

gkectl update credentials certificate-authorities status \
    --config USER_CLUSTER_CONFIG \
    --kubeconfig ADMIN_CLUSTER_KUBECONFIG

The information appears at the end of the message field within ten hours of an upgrade. For example:

Last Leaf Certificates Rotation Version: 1.16.0-gke.0.
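
If you want to read this value in a script, you can filter it out of the status output. The following sketch assumes that the label text matches the preceding example exactly. The function name last_leaf_rotation_version is hypothetical, and the sketch strips a trailing period in case one follows the version string.

```shell
# Reads status output on stdin and prints the version that follows the
# "Last Leaf Certificates Rotation Version:" label, without a trailing period.
# Prints nothing if the label is absent, for example before the first
# post-upgrade rotation has been recorded.
last_leaf_rotation_version() {
  grep -o 'Last Leaf Certificates Rotation Version: [^[:space:]]*' \
    | sed -e 's/.*: //' -e 's/\.$//'
}

# Example (not run here):
# gkectl update credentials certificate-authorities status \
#     --config USER_CLUSTER_CONFIG \
#     --kubeconfig ADMIN_CLUSTER_KUBECONFIG 2>&1 | last_leaf_rotation_version
```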

Troubleshooting a CA rotation

The gkectl diagnose command can check whether a user cluster is in the state expected after a completed CA rotation. For instructions on how to run gkectl diagnose on a user cluster, see Diagnosing cluster issues. If you experience issues with a CA rotation, contact Google support and provide the gkectl diagnose output.
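
To have the output ready for a support case, you can save it to a file. The following is a minimal sketch; the function name save_diagnose_output and the GKECTL override variable are hypothetical conveniences and are not part of gkectl. See Diagnosing cluster issues for the authoritative command options.

```shell
# Runs gkectl diagnose against a user cluster and saves all output to a file.
# $1: admin cluster kubeconfig file
# $2: user cluster name
# $3: file to write the diagnose output to
# GKECTL (hypothetical): optional override for the gkectl binary path.
save_diagnose_output() {
  "${GKECTL:-gkectl}" diagnose cluster \
    --kubeconfig "$1" \
    --cluster-name "$2" > "$3" 2>&1
}

# Example (not run here):
# save_diagnose_output ADMIN_CLUSTER_KUBECONFIG USER_CLUSTER_NAME diagnose-output.txt
```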