Stopping and resuming syncing configs

In some situations, you may need to quickly stop Anthos Config Management from syncing configs from your repo. One such scenario is if someone commits a syntactically valid but incorrect config to the repo, and you want to limit its effects on your running clusters while the config is removed or fixed.

This topic shows how to quickly stop syncing, and how to resume syncing when the problem is fixed.

Prerequisites

The user running the commands discussed in this topic needs the following Kubernetes RBAC permissions in the kube-system and config-management-system namespaces on all clusters where you want to stop syncing:

- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "update"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "watch"]
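One way to grant these permissions is with a Role and RoleBinding in each namespace. The following is a minimal sketch using kubectl's imperative commands; the role names and the USER_EMAIL variable are illustrative, not names required by Anthos Config Management:

```shell
# Illustrative only: grant the scale and watch permissions listed above
# in both namespaces. Replace USER_EMAIL with the user who runs the
# stop/resume commands.
USER_EMAIL="user@example.com"

for ns in kube-system config-management-system; do
  kubectl -n "$ns" create role acm-sync-pauser \
    --verb=get,update \
    --resource=deployments,deployments/scale
  kubectl -n "$ns" create role acm-pod-watcher \
    --verb=list,watch \
    --resource=pods
  kubectl -n "$ns" create rolebinding acm-sync-pauser \
    --role=acm-sync-pauser --user="$USER_EMAIL"
  kubectl -n "$ns" create rolebinding acm-pod-watcher \
    --role=acm-pod-watcher --user="$USER_EMAIL"
done
```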

Stopping syncing

To stop syncing, run the following commands. They are chained into a single command line for convenience, but you can also run them one at a time:

kubectl -n kube-system scale deployment config-management-operator --replicas=0 \
&& kubectl wait -n kube-system --for=delete pods -l k8s-app=config-management-operator \
&& kubectl -n config-management-system scale deployment syncer --replicas=0 \
&& kubectl wait -n config-management-system --for=delete pods -l app=syncer \
&& kubectl -n config-management-system scale deployment git-importer --replicas=0 \
&& kubectl wait -n config-management-system --for=delete pods -l app=git-importer

The commands run in sequence; if one fails, the remaining commands do not run. They do the following:

  1. Scale the Config Management Operator Deployment to 0 replicas, then wait for its Pods to terminate.
  2. Scale the syncer Deployment to 0 replicas, then wait for its Pods to terminate.
  3. Scale the git-importer Deployment to 0 replicas, then wait for its Pods to terminate.

All deployments are still in the cluster, but no replicas of the Operator, git-importer, or syncer are available, so configs are not synced from the repo.
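You can verify this state by checking that each Deployment reports zero replicas and that no component Pods remain, for example:

```shell
# Each Deployment should show 0/0 replicas, and no Pods should be listed.
kubectl -n kube-system get deployment config-management-operator
kubectl -n config-management-system get deployments syncer git-importer
kubectl -n config-management-system get pods
```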

Stopping syncing on all enrolled clusters

If you need to stop syncing on all enrolled clusters in a single Google Cloud project, rather than a single cluster at a time, you can create a script that uses the nomos status command to get the list of all enrolled clusters. The script then creates a kubectl context for each cluster using the gcloud container clusters get-credentials command and runs the above commands on each of them. The following is a naive example of such a script:

#!/bin/bash

nomos status | grep SYNCED | awk '{print $1}' | while read -r cluster; do

  # Assumes a default zone or region is configured for gcloud;
  # otherwise, add --zone or --region to the command below.
  gcloud container clusters get-credentials "$cluster"

  kubectl -n kube-system scale deployment config-management-operator --replicas=0 \
  && kubectl wait -n kube-system --for=delete pods -l k8s-app=config-management-operator \
  && kubectl -n config-management-system scale deployment syncer --replicas=0 \
  && kubectl wait -n config-management-system --for=delete pods -l app=syncer \
  && kubectl -n config-management-system scale deployment git-importer --replicas=0 \
  && kubectl wait -n config-management-system --for=delete pods -l app=git-importer
done

Resuming syncing

To resume syncing, run the following command:

kubectl -n kube-system scale deployment config-management-operator --replicas=1

This command scales the Operator Deployment to 1 replica. The Operator then notices that the syncer and git-importer Deployments are scaled incorrectly and scales them to 1 replica as well.
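To confirm that syncing has resumed, you can wait for the Deployments to become available again and then check sync status with nomos status, for example:

```shell
# Wait for the Operator, then for the components it restores.
kubectl -n kube-system wait --for=condition=available \
  deployment config-management-operator --timeout=120s
kubectl -n config-management-system wait --for=condition=available \
  deployment syncer git-importer --timeout=120s

# Clusters should eventually report SYNCED again.
nomos status
```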

Resuming syncing on all enrolled clusters

If you need to resume syncing on all enrolled clusters in a Google Cloud project, rather than a single cluster at a time, you can create a script that uses nomos status to get the list of all enrolled clusters. The script then creates a kubectl context for each cluster using the gcloud container clusters get-credentials command and runs the above command on each of them. The following is a naive example of such a script:

#!/bin/bash

nomos status | grep SYNCED | awk '{print $1}' | while read -r cluster; do

  # Assumes a default zone or region is configured for gcloud;
  # otherwise, add --zone or --region to the command below.
  gcloud container clusters get-credentials "$cluster"

  kubectl -n kube-system scale deployment config-management-operator --replicas=1

done
