Stopping and resuming syncing configs

In some situations, you might need to quickly stop Config Sync from syncing configs from your repo. One such scenario is if someone commits a syntactically valid but incorrect config to the repo, and you want to limit its effects on your running clusters while the config is removed or fixed.

This topic applies to a single repository. It shows you how to quickly stop syncing and how to resume syncing after the problem is fixed. To learn how to stop syncing for multiple repositories, see Syncing from multiple repositories.

Prerequisites

The user running the commands discussed in this topic needs the following Kubernetes RBAC permissions in the kube-system and config-management-system namespaces on all clusters where you want to stop syncing:

- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "update"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "watch"]
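
One way to grant these permissions is with a ClusterRole and a RoleBinding in each of the two namespaces. The following manifest is an illustrative sketch: the names config-sync-stopper and the user jane@example.com are placeholders, not part of Config Sync, and you would repeat the RoleBinding for the config-management-system namespace.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: config-sync-stopper   # illustrative name
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "update"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: config-sync-stopper   # illustrative name
  namespace: kube-system      # repeat in config-management-system
subjects:
- kind: User
  name: jane@example.com      # placeholder user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: config-sync-stopper
  apiGroup: rbac.authorization.k8s.io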

Stopping syncing

To stop syncing, run the following commands. They are chained into a single command line for convenience, but you can also run each one separately:

kubectl -n kube-system scale deployment config-management-operator --replicas=0 \
&& kubectl wait -n kube-system --for=delete pods -l k8s-app=config-management-operator \
&& kubectl scale deployment -n config-management-system --replicas=0 --all \
&& kubectl wait -n config-management-system --for=delete pods --all

The commands do the following, in sequence. If a command fails, the remaining commands do not run.

  1. Scale the Config Sync Operator Deployment in the kube-system namespace to 0 replicas.
  2. Wait until all Operator Pods have been deleted.
  3. Scale all Deployments in the config-management-system namespace to 0 replicas. The exact set of Deployments affected varies by product version.
  4. Wait until all Pods in the config-management-system namespace have been deleted.

All deployments are still in the cluster, but no replicas of the Operator or any of the processes responsible for syncing are available, so configs are not synced from the repo.
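
To confirm that syncing has stopped, you can check that each Deployment in the namespace reports zero replicas and that no Pods remain, for example:

kubectl get deployments -n config-management-system
kubectl get pods -n config-management-system

The first command should show 0/0 in the READY column for every Deployment, and the second should report that no resources were found.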

Stopping syncing on all enrolled clusters

If you need to stop syncing on all enrolled clusters in a single Google Cloud project, rather than a single cluster at a time, you can create a script that uses the nomos status command to get the list of all enrolled clusters. The script then creates a kubectl context for each cluster using the gcloud container clusters get-credentials command and runs the above commands on each of them. The following is a naive example of such a script:

#!/bin/bash

nomos status | grep SYNCED | awk '{print $1}' | while read -r i; do

  gcloud container clusters get-credentials "$i"

  kubectl -n kube-system scale deployment config-management-operator --replicas=0 \
  && kubectl wait -n kube-system --for=delete pods -l k8s-app=config-management-operator \
  && kubectl scale deployment -n config-management-system --replicas=0 --all \
  && kubectl wait -n config-management-system --for=delete pods --all

done

Resuming syncing

To resume syncing, run the following command:

kubectl -n kube-system scale deployment config-management-operator --replicas=1

This command scales the Operator Deployment to 1 replica. The Operator then notices that the Deployments in the config-management-system namespace are scaled incorrectly and restores them to their appropriate replica counts.
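
To verify that syncing has resumed, you can watch the Pods come back and then check the sync status:

kubectl get pods -n config-management-system
nomos status

After the Pods are running, nomos status should report the cluster as SYNCED once the latest commit has been applied.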

Resuming syncing on all enrolled clusters

If you need to resume syncing on all enrolled clusters in a Google Cloud project, rather than a single cluster at a time, you can create a script that uses nomos status to get the list of all enrolled clusters. The script then creates a kubectl context for each cluster using the gcloud container clusters get-credentials command and runs the above command on each of them. The following is a naive example of such a script:

#!/bin/bash

nomos status | grep SYNCED | awk '{print $1}' | while read -r i; do

  gcloud container clusters get-credentials "$i"

  kubectl -n kube-system scale deployment config-management-operator --replicas=1

done

What's next