Prevent config drift

Config Sync reduces the risk of "shadow ops" through automatic self-healing, periodic re-sync, and optional drift prevention. When Config Sync detects drift between the cluster and the source of truth, the drift can either be allowed and quickly reverted, or rejected outright.

Self-healing watches managed resources, detects drift from the source of truth, and reverts that drift. Self-healing is always enabled.

Periodic re-sync automatically syncs an hour after the last successful sync, even if no change has been made to the source of truth. Periodic re-sync is always enabled.

While self-healing and periodic re-syncs help remediate drift, drift prevention intercepts requests to change managed objects and validates whether the change should be allowed. If the change doesn't match the source of truth, the change is rejected. Drift prevention is disabled by default. When enabled, drift prevention protects RootSync objects by default, and can also be configured to protect RepoSync objects.

To use drift prevention, you must enable the RootSync and RepoSync APIs.

Before you begin

If you previously installed the Google Cloud CLI, get the latest version by running the gcloud components update command.

Enable drift prevention

You can enable drift prevention by using the gcloud CLI. You can't enable drift prevention in the Google Cloud console.

To enable drift prevention, complete the following steps:

Update your apply spec manifest to set the spec.configSync.preventDrift field to true:

applySpecVersion: 1
spec:
  configSync:
    enabled: true
    ... existing content ...
    preventDrift: true

Apply the updated manifest:
```
gcloud beta container fleet config-management apply \
    --membership=MEMBERSHIP_NAME \
    --config=MANIFEST_NAME  \
    --project=PROJECT_ID
```
Replace the following:
- MEMBERSHIP_NAME: the fleet membership name that you chose when you registered your cluster. Get the name with the gcloud container fleet memberships list command.
- MANIFEST_NAME: the name of your apply spec manifest, usually apply-spec.yaml.
- PROJECT_ID: your project ID.

Wait until the ConfigManagement Operator creates the Config Sync ValidatingWebhookConfiguration object:

kubectl get validatingwebhookconfiguration admission-webhook.configsync.gke.io

You should see output similar to the following example:

NAME                                  WEBHOOKS   AGE
admission-webhook.configsync.gke.io   0          2m15s
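
If you script this wait, a polling loop is one option. The following is a minimal sketch, not part of Config Sync: wait_for_webhook_config is a hypothetical helper name, and the attempt count and delay are arbitrary defaults. It relies only on kubectl get exiting with a non-zero status while the object doesn't exist yet.

```shell
# Poll until the Config Sync ValidatingWebhookConfiguration object exists.
# wait_for_webhook_config is a hypothetical helper, not a Config Sync command.
# Arguments: maximum attempts (default 30), seconds between attempts (default 10).
wait_for_webhook_config() {
  local attempts="${1:-30}"
  local delay="${2:-10}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # kubectl get exits non-zero while the object is still missing.
    if kubectl get validatingwebhookconfiguration \
        admission-webhook.configsync.gke.io >/dev/null 2>&1; then
      echo "webhook configuration found after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "webhook configuration not found after $attempts attempt(s)" >&2
  return 1
}
```

For example, wait_for_webhook_config 60 5 checks every 5 seconds for up to 60 attempts.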

Commit a new change to the source of truth to be synced so that the root-reconciler Deployment can add webhooks to the Config Sync ValidatingWebhookConfiguration object. Alternatively, delete the root-reconciler Deployment to trigger a reconciliation. The new root-reconciler Deployment then updates the Config Sync ValidatingWebhookConfiguration object.
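
The following sketch shows one way to carry out the alternative and to check the result. Both function names are hypothetical. The sketch assumes that the root-reconciler Deployment runs in the config-management-system namespace, and that counting the entries under .webhooks is a reasonable signal that the reconciler has registered its webhooks.

```shell
# Count the webhooks registered in the Config Sync ValidatingWebhookConfiguration.
# count_webhooks is a hypothetical helper, not a Config Sync command.
count_webhooks() {
  # jsonpath prints the webhook names separated by spaces; wc -w counts them.
  kubectl get validatingwebhookconfiguration \
    admission-webhook.configsync.gke.io \
    -o jsonpath='{.webhooks[*].name}' | wc -w | tr -d ' '
}

# Delete the root-reconciler Deployment to trigger a reconciliation.
# Assumption: the Deployment is in the config-management-system namespace.
restart_root_reconciler() {
  kubectl delete deployment root-reconciler -n config-management-system
}
```

A count of 0 matches the WEBHOOKS column shown earlier. A higher count suggests that the reconciler has added its webhooks.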

Wait until the webhook server is ready. This can take several minutes. When the server is ready, the log of the Config Sync admission webhook Deployment includes the message "serving webhook server":

kubectl logs -n config-management-system -l app=admission-webhook --tail=-1 | grep "serving webhook server"

You should see output similar to the following example:

I1201 18:05:41.805531       1 deleg.go:130] controller-runtime/webhook "level"=0 "msg"="serving webhook server"  "host"="" "port"=10250
I1201 18:07:04.626199       1 deleg.go:130] controller-runtime/webhook "level"=0 "msg"="serving webhook server"  "host"="" "port"=10250
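
Because the server can take several minutes to start, you might prefer to poll the log rather than rerun the command by hand. The following sketch wraps the same kubectl logs command in a loop. The function names, attempt count, and delay are illustrative assumptions.

```shell
# Succeeds if the log text on stdin reports that the webhook server is serving.
webhook_server_ready() {
  grep -q 'serving webhook server'
}

# Poll the admission webhook logs until the readiness message appears.
# wait_for_webhook_server is a hypothetical helper, not a Config Sync command.
# Arguments: maximum attempts (default 30), seconds between attempts (default 10).
wait_for_webhook_server() {
  local attempts="${1:-30}"
  local delay="${2:-10}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl logs -n config-management-system -l app=admission-webhook \
        --tail=-1 2>/dev/null | webhook_server_ready; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```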

Disable drift prevention

When you disable drift prevention, Config Sync deletes all the Config Sync admission webhook resources. Since the Config Sync ValidatingWebhookConfiguration object no longer exists, the Config Sync reconcilers no longer generate the webhook configs for managed resources.

To disable drift prevention, complete the following steps:

Update your apply spec manifest to set the spec.configSync.preventDrift field to false:

applySpecVersion: 1
spec:
  configSync:
    enabled: true
    ... existing content ...
    preventDrift: false

Apply the updated manifest:
```
gcloud beta container fleet config-management apply \
    --membership=MEMBERSHIP_NAME \
    --config=MANIFEST_NAME  \
    --project=PROJECT_ID
```
Replace the following:
- MEMBERSHIP_NAME: the fleet membership name that you chose when you registered your cluster. Get the name with the gcloud container fleet memberships list command.
- MANIFEST_NAME: the name of your apply spec manifest, usually apply-spec.yaml.
- PROJECT_ID: your project ID.
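
To confirm that the webhook resources are gone after you apply the manifest, you can check that the ValidatingWebhookConfiguration object no longer exists. The following is a minimal sketch. webhook_config_removed is a hypothetical helper, and the sketch checks only the ValidatingWebhookConfiguration object, not the other admission webhook resources.

```shell
# Succeeds once the Config Sync ValidatingWebhookConfiguration no longer exists.
# webhook_config_removed is a hypothetical helper, not a Config Sync command.
webhook_config_removed() {
  local found
  # With --ignore-not-found, kubectl prints nothing and exits 0 if the
  # object is absent; any other failure is reported with exit status 2.
  found="$(kubectl get validatingwebhookconfiguration \
    admission-webhook.configsync.gke.io --ignore-not-found -o name)" || return 2
  [ -z "$found" ]
}
```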

Enable the admission webhook in namespace-scoped sources

Namespace-scoped sources of truth are not fully protected by the webhook. The Config Sync reconciler for each namespace source does not have permission to read or update the ValidatingWebhookConfiguration objects at the cluster level.

This lack of permission results in an error in the namespace reconciler logs similar to the following example:

Failed to update admission webhook: KNV2013: applying changes to admission webhook: Insufficient permission. To fix, make sure the reconciler has sufficient permissions.: validatingwebhookconfigurations.admissionregistration.k8s.io "admission-webhook.configsync.gke.io" is forbidden: User "system:serviceaccount:config-management-system:ns-reconciler-NAMESPACE" cannot update resource "validatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope

You can ignore this error if you don't want to use the webhook protection for your namespace-scoped source of truth. However, if you want to use the webhook, grant permission to the reconciler for each namespace-scoped source of truth after you have configured syncing from more than one source of truth. You might not need to perform these steps if a RoleBinding for the ns-reconciler-NAMESPACE already exists with ClusterRole cluster-admin permissions.
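
To find out which namespace reconcilers report this error, you can search their logs for the KNV2013 message. The following sketch is only a text filter: has_webhook_permission_error is a hypothetical helper that reads log text from stdin. The kubectl logs command in the comment assumes that the reconciler Deployment has the same name as its service account, ns-reconciler-NAMESPACE.

```shell
# Succeeds if the log text on stdin contains the KNV2013 admission webhook error.
# has_webhook_permission_error is a hypothetical helper, not a Config Sync command.
has_webhook_permission_error() {
  grep -q 'Failed to update admission webhook: KNV2013'
}

# Example usage (assumes the Deployment is named ns-reconciler-NAMESPACE):
#   kubectl logs -n config-management-system \
#       deployment/ns-reconciler-NAMESPACE --all-containers \
#     | has_webhook_permission_error && echo "webhook permission missing"
```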

In the root source of truth, declare a new ClusterRole configuration that grants permission to the Config Sync admission webhook. This ClusterRole only needs to be defined once per cluster:

# ROOT_SOURCE/cluster-roles/webhook-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: admission-webhook-role
rules:
- apiGroups: ["admissionregistration.k8s.io"]
  resources: ["validatingwebhookconfigurations"]
  resourceNames: ["admission-webhook.configsync.gke.io"]
  verbs: ["get", "update"]

For each namespace-scoped source where the admission webhook permission needs to be granted, declare a ClusterRoleBinding configuration to grant access to the admission webhook:

# ROOT_SOURCE/NAMESPACE/sync-webhook-rolebinding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: syncs-webhook
subjects:
- kind: ServiceAccount
  name: ns-reconciler-NAMESPACE
  namespace: config-management-system
roleRef:
  kind: ClusterRole
  name: admission-webhook-role
  apiGroup: rbac.authorization.k8s.io

Replace NAMESPACE with the namespace that you created your namespace-scoped source in.
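
If you have several namespace-scoped sources, you can generate one binding per namespace from a template. The following is a sketch under stated assumptions: render_webhook_binding is a hypothetical helper, and it appends the namespace to the binding name (syncs-webhook-NAMESPACE) so that bindings for different namespaces don't share a name. The manifest above uses the fixed name syncs-webhook.

```shell
# Print a ClusterRoleBinding manifest for one namespace-scoped source.
# render_webhook_binding is a hypothetical helper, not a Config Sync command.
# Assumption: the binding name gets a per-namespace suffix to keep it unique.
render_webhook_binding() {
  local ns="$1"
  cat <<EOF
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: syncs-webhook-${ns}
subjects:
- kind: ServiceAccount
  name: ns-reconciler-${ns}
  namespace: config-management-system
roleRef:
  kind: ClusterRole
  name: admission-webhook-role
  apiGroup: rbac.authorization.k8s.io
EOF
}
```

For example, with a hypothetical namespace named gamestore, render_webhook_binding gamestore > ROOT_SOURCE/gamestore/sync-webhook-rolebinding.yaml writes the manifest into the root source of truth.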

Commit the changes to the root source of truth. For example, if you sync from a Git repository:

git add .
git commit -m 'Providing namespace repository the permission to update the admission webhook.'
git push

To verify, use kubectl get to make sure the ClusterRole and ClusterRoleBinding have been created:

kubectl get clusterrole admission-webhook-role
kubectl get clusterrolebindings syncs-webhook
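
Listing the objects confirms that they exist, but not that the permission is effective. As an additional check, you can ask the API server whether the reconciler's service account can update the webhook configuration. The following sketch uses kubectl auth can-i with impersonation, which requires that your own account is allowed to impersonate service accounts. Both function names are hypothetical.

```shell
# Build the service account user name of a namespace reconciler, in the
# same form that appears in the KNV2013 error message.
reconciler_subject() {
  printf 'system:serviceaccount:config-management-system:ns-reconciler-%s' "$1"
}

# Ask the API server whether that service account can update the
# Config Sync ValidatingWebhookConfiguration. Prints "yes" or "no".
# can_update_webhook is a hypothetical helper, not a Config Sync command.
can_update_webhook() {
  kubectl auth can-i update \
    validatingwebhookconfigurations/admission-webhook.configsync.gke.io \
    --as="$(reconciler_subject "$1")"
}
```

For example, can_update_webhook gamestore checks the reconciler for a hypothetical namespace named gamestore.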

Disable drift prevention for abandoned resources

When you delete a RootSync or RepoSync object, by default Config Sync doesn't modify the resources previously managed by that RootSync or RepoSync object. This can leave behind several labels and annotations that Config Sync uses to track these resource objects. If drift prevention is enabled, this can cause any changes to the previously managed resources to be rejected.

If you didn't use deletion propagation, the resource objects left behind might still retain labels and annotations added by Config Sync.

If you want to keep these managed resources, unmanage these resources before deleting the RootSync or RepoSync objects by setting the configmanagement.gke.io/managed annotation to disabled on every managed resource declared in the source of truth. This tells Config Sync to remove its labels and annotations from the managed resources, without deleting these resources. After the sync is complete, you can remove the RootSync or RepoSync object.
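
Before you remove the RootSync or RepoSync object, you can spot-check that a resource is no longer tracked by reading the annotation from the cluster. The following is a minimal sketch. managed_annotation is a hypothetical helper, and the sketch assumes that the configmanagement.gke.io/managed annotation is among the metadata that Config Sync removes from the cluster object after the unmanage change syncs.

```shell
# Print the value of the configmanagement.gke.io/managed annotation on a
# cluster object, or nothing if the annotation is absent.
# managed_annotation is a hypothetical helper, not a Config Sync command.
# Arguments: resource kind, resource name, namespace.
managed_annotation() {
  # The dots in the annotation key are escaped so that jsonpath treats
  # the key as a single field name.
  kubectl get "$1" "$2" -n "$3" \
    -o jsonpath='{.metadata.annotations.configmanagement\.gke\.io/managed}'
}
```

For example, managed_annotation configmap example-config example-namespace uses hypothetical resource names. Empty output suggests that the annotation is no longer present on that object.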

If you want to delete these managed resources, you have two options:

- Delete the managed resources from the source of truth. Config Sync then deletes the managed objects from the cluster. After the sync is complete, you can remove the RootSync or RepoSync object.
- Enable deletion propagation on the RootSync or RepoSync object before deleting it. Config Sync then deletes the managed objects from the cluster.
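
For the second option, the following sketch shows one way to enable deletion propagation from the command line. It assumes that deletion propagation is controlled by the configsync.gke.io/deletion-propagation-policy annotation with the value Foreground, which this page doesn't state, so confirm it against the deletion propagation documentation for your Config Sync version. enable_deletion_propagation is a hypothetical helper.

```shell
# Annotate a RootSync or RepoSync object so that deleting it also deletes
# the objects that it manages.
# enable_deletion_propagation is a hypothetical helper, not a Config Sync command.
# Assumption: the annotation key and value below control deletion propagation.
# Arguments: kind (rootsync or reposync), object name, namespace.
enable_deletion_propagation() {
  kubectl patch "$1" "$2" -n "$3" --type merge \
    --patch '{"metadata":{"annotations":{"configsync.gke.io/deletion-propagation-policy":"Foreground"}}}'
}
```

For example, enable_deletion_propagation rootsync root-sync config-management-system assumes a RootSync object named root-sync.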

If the RootSync or RepoSync object is deleted before unmanaging or deleting its managed resources, you can recreate the RootSync or RepoSync object, and it adopts the resources on the cluster that match the source of truth. Then you can unmanage or delete the resources, wait for the changes to sync, and delete the RootSync or RepoSync object again.

What's next

Learn how to troubleshoot the webhook.