Known issues

This page shows you how to resolve known issues with Config Sync.

For additional tips, see the introduction to troubleshooting Config Sync and the guide to troubleshooting error messages.

Config Sync fails to reconcile with KNV2002 error

If Config Sync is unable to reconcile and reports a KNV2002 error, the cause might be a known client-go issue. The issue is triggered by an empty list of resources in the external.metrics.k8s.io/v1beta1 API group, and it produces the following error message in the RootSync or RepoSync object or in the reconciler logs:

KNV2002: API discovery failed: APIServer error: unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: received empty response for: external.metrics.k8s.io/v1beta1

To resolve the issue, upgrade GKE or Config Sync to a version that contains the fix for the client-go issue.

Horizontal Pod autoscaling not working

Config Sync manages all of the fields that you specify in the manifests in your Git repository. Resource fights can occur when two controllers try to control the same field, and this happens when you use horizontal Pod autoscaling with Config Sync: both Config Sync and the horizontal Pod autoscaler try to set the replica count.

If you want to use horizontal Pod autoscaling, let the horizontal Pod autoscaler manage the spec.replicas field by removing that field from all manifests in your Git repository. Otherwise, Config Sync reverts any changes that the autoscaler makes back to the value specified in the repository.
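As a sketch, the following manifests show a Deployment that omits spec.replicas so that the autoscaler can own the replica count, together with a HorizontalPodAutoscaler that targets it. The names example-app and example, the container image, the CPU request, and the scaling thresholds are hypothetical placeholders, and the example assumes a cluster that serves the autoscaling/v2 API (Kubernetes 1.23 and later).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # hypothetical name
  namespace: example         # hypothetical namespace
spec:
  # spec.replicas is intentionally omitted so that Config Sync does not
  # manage it and the HorizontalPodAutoscaler below can set it instead.
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: nginx:1.25    # hypothetical image
        resources:
          requests:
            cpu: 100m        # a CPU request is needed for CPU utilization targets
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
  namespace: example
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 2             # hypothetical bounds
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Because the Deployment manifest no longer declares spec.replicas, Config Sync has nothing to revert when the autoscaler changes the replica count.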

Large number of resources in Git repository

When the Git repository synced to a cluster by a RepoSync or RootSync object contains configuration for more than a few thousand resources, the ResourceGroup object can exceed the etcd object size limit. When this happens, you cannot view the aggregated status for resources in your Git repository, although your repository might still be synced.

If you see the following error in the RootSync or RepoSync object or in the reconciler logs, the ResourceGroup resource exceeds the etcd object size limit:

KNV2009: etcdserver: request is too large

To solve this issue, we recommend that you split your Git repository into multiple repositories. To learn more, see Break up a repository into multiple repositories.

If you can't break up the Git repository, in Config Sync version 1.11.0 and later you can mitigate the issue by disabling the surfacing of status data. To do this, set the .spec.override.statusMode field of the RootSync or RepoSync object to disabled. Config Sync then stops updating the status of managed resources in the ResourceGroup object, which reduces the size of the ResourceGroup object. However, you can no longer view the status of managed resources from either nomos status or gcloud alpha anthos config sync.
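A minimal sketch of a RootSync object with status surfacing disabled might look like the following. It assumes Config Sync 1.11.0 or later and the configsync.gke.io/v1beta1 API; REPO_URL, BRANCH, and DIRECTORY are placeholders for your own repository settings, and auth: none assumes a public repository. The same spec.override.statusMode field applies to a RepoSync object.

```yaml
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  git:
    repo: REPO_URL        # placeholder: your Git repository URL
    branch: BRANCH        # placeholder: the branch to sync
    dir: DIRECTORY        # placeholder: the directory to sync
    auth: none            # assumes a public repository
  override:
    # Stop writing managed resource statuses into the ResourceGroup object
    # so that it stays under the etcd object size limit.
    statusMode: disabled
```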

If you installed Config Sync using the Google Cloud console or Google Cloud CLI, create an editable RootSync object so that you can set spec.override.statusMode. For details, see Configure Config Sync with kubectl commands.

If you don't see any errors in the RootSync or RepoSync object, your Git repository is synced to the cluster. To check whether the ResourceGroup resource exceeds the etcd object size limit, check both the ResourceGroup resource status and the logs of the ResourceGroup controller:

  1. Check the ResourceGroup status:

    • To check the RootSync object, run the following command:

      kubectl get resourcegroup.kpt.dev root-sync -n config-management-system
      
    • To check the RepoSync object, run the following command:

      kubectl get resourcegroup.kpt.dev repo-sync -n NAMESPACE
      

      Replace NAMESPACE with the namespace that you created your namespace repository in.

    The output is similar to the following example:

    NAME        RECONCILING   STALLED   AGE
    root-sync   True          False     35m
    

    If the value in the RECONCILING column is True, it means that the ResourceGroup resource is still reconciling.

  2. Check the logs for the ResourceGroup controller:

    kubectl logs deployment/resource-group-controller-manager -c manager -n resource-group-system
    

    If you see the following error in the output, the ResourceGroup resource is too large and exceeds the etcd object size limit:

    "error":"etcdserver: request is too large"
    

To prevent the ResourceGroup from getting too large, reduce the number of resources in your Git repository. You can follow the instructions to split one root repository into multiple root repositories.

PersistentVolumeClaim is in Lost status

When you upgrade a Kubernetes cluster to a version between 1.22 and 1.24, managed PersistentVolumeClaims might enter a Lost status. This occurs when the binding between PersistentVolumes and PersistentVolumeClaims is defined using the claimRef field of a PersistentVolume resource. An upstream Kubernetes change made the claimRef field atomic, which prevents different claimRef subfields from having different field owners when using server-side apply and causes this bug. This issue was fixed in Kubernetes version 1.25.

If you encounter this issue, we recommend updating your PersistentVolume and PersistentVolumeClaim resources to use an alternative method of binding: set the binding in the spec.volumeName field of the PersistentVolumeClaim resource instead.

The following is a minimal example of binding staging-pvc to staging-pv:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: staging-pvc
  namespace: staging
spec:
  volumeName: staging-pv
  # ... other PersistentVolumeClaim fields

---

apiVersion: v1
kind: PersistentVolume
metadata:
  name: staging-pv
spec:
  # ... other PersistentVolume fields

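Using volumeName instead of claimRef for binding does not guarantee any binding privileges to the PersistentVolume: volumeName does not reserve the PersistentVolume, so another PersistentVolumeClaim that matches it could still bind to it.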

ResourceGroup fields keep changing

For each RootSync or RepoSync object, Config Sync generates a resource called ResourceGroup that captures the set of resources applied to the cluster from that Git repository and aggregates their reconciliation statuses.

Occasionally, your ResourceGroup can enter a loop that keeps updating the ResourceGroup's spec. If this happens, you might notice the following issues:

  • The metadata.generation of a ResourceGroup keeps increasing in a short period of time.
  • The ResourceGroup spec keeps changing.
  • The ResourceGroup spec doesn't include the status.resourceStatuses of the resources being synced to the cluster.

If you observe these issues, some of the resources in your Git repositories were not applied to the cluster because the reconciler is missing the permissions that it needs to apply those resources.

You can verify that the permissions are missing by getting the RepoSync resource status:

kubectl get reposync repo-sync -n NAMESPACE -o yaml

Replace NAMESPACE with the namespace that you created your namespace repository in.

You can also use the nomos status command.

If you see the following messages in the status, it means that the reconciler in NAMESPACE lacks the permission needed to apply the resource:

errors:
  - code: "2009"
    errorMessage: |-
      KNV2009: deployments.apps "nginx-deployment" is forbidden: User "system:serviceaccount:config-management-system:ns-reconciler-default" cannot get resource "deployments" in API group "apps" in the namespace "default"

      For more information, see https://g.co/cloud/acm-errors#knv2009

To fix this issue, declare a RoleBinding that grants the ns-reconciler-NAMESPACE service account permission to manage the failed resource in that namespace. For details on how to add a RoleBinding, see Configure syncing from multiple repositories.
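As a sketch, a RoleBinding like the following grants the namespace reconciler permissions in its namespace. It assumes that the RepoSync object is named repo-sync, so that the reconciler's service account is ns-reconciler-NAMESPACE in the config-management-system namespace, as in the error message above. The RoleBinding name syncs-repo is hypothetical, and binding the built-in edit ClusterRole is an assumption; choose a role that covers only the resources your repository declares.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: syncs-repo                 # hypothetical name
  namespace: NAMESPACE             # the namespace that the RepoSync syncs to
subjects:
- kind: ServiceAccount
  name: ns-reconciler-NAMESPACE    # reconciler service account for RepoSync "repo-sync"
  namespace: config-management-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit                       # assumption: built-in role; narrow as needed
```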

Webhook denies request to update or delete resource managed by deleted RootSync or RepoSync

Deleting a RootSync or RepoSync object doesn't remove the Config Sync annotations and labels from the resources that it managed. If Config Sync is still enabled in the cluster, the Config Sync admission webhook denies requests that try to modify or delete those resources.

If you want to keep these managed resources, unmanage them by setting the configmanagement.gke.io/managed annotation to disabled on every managed resource declared in the Git repository. This removes the Config Sync annotations and labels from the managed resources without deleting the resources. After the sync is complete, you can remove the RootSync or RepoSync object.
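For example, a resource declared in the Git repository can be unmanaged by adding the annotation to its metadata, as in the following sketch. The ConfigMap named example-config and its data are hypothetical; the same annotation applies to any managed resource kind.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config        # hypothetical resource
  namespace: default
  annotations:
    # Tells Config Sync to stop managing this resource without deleting it.
    configmanagement.gke.io/managed: disabled
data:
  key: value                  # hypothetical data
```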

If you want to delete these managed resources, modify the RootSync or RepoSync object to sync from an empty Git directory. After the sync is complete, you can remove the RootSync or RepoSync object.
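A sketch of this approach for a RepoSync object follows. It assumes that the repository contains a directory with no manifests; REPO_URL, BRANCH, and EMPTY_DIRECTORY are placeholders, and auth: none assumes a public repository. For a RootSync object, the same change applies to its spec.git.dir field.

```yaml
apiVersion: configsync.gke.io/v1beta1
kind: RepoSync
metadata:
  name: repo-sync
  namespace: NAMESPACE
spec:
  git:
    repo: REPO_URL              # placeholder: unchanged repository URL
    branch: BRANCH              # placeholder: unchanged branch
    # Point at a directory that contains no manifests. After this syncs,
    # Config Sync deletes the resources that it previously managed.
    dir: EMPTY_DIRECTORY
    auth: none                  # assumes a public repository
```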

If you deleted the RootSync or RepoSync object before unmanaging or deleting its managed resources, re-create the RootSync or RepoSync object, unmanage or delete the managed resources, and then delete the RootSync or RepoSync object again.

What's next