Config Sync metrics

This page describes the OpenTelemetry metrics available to monitor your Config Sync resources.

Config Sync uses OpenCensus to create and record metrics, and OpenTelemetry to export them to Prometheus and Cloud Monitoring. You can also export OpenTelemetry metrics to other monitoring systems. The following guides explain how to export metrics:

OpenTelemetry metrics

Config Sync and the Resource Group Controller collect the following metrics with OpenCensus and make them available through the OpenTelemetry Collector. The Tags column lists the Config Sync-specific tags that apply to each metric. A metric with tags represents multiple measurements, one for each combination of tag values.
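
To make the last point concrete, the following Python sketch enumerates the label sets that one tagged metric can produce. It uses api_duration_seconds and the tag values listed in the Metric labels table on this page (operation: create, patch, update, delete; status: success, error). The enumerate_series function is a hypothetical helper for illustration, not part of Config Sync.

```python
from itertools import product


def enumerate_series(metric, tag_values):
    """Return one label set per combination of tag values.

    Each returned dict identifies a distinct time series of `metric`.
    """
    names = sorted(tag_values)
    return [
        {"__name__": metric, **dict(zip(names, combo))}
        for combo in product(*(tag_values[name] for name in names))
    ]


# Tag values taken from the Metric labels table on this page.
api_tags = {
    "operation": ["create", "patch", "update", "delete"],
    "status": ["success", "error"],
}

series = enumerate_series("api_duration_seconds", api_tags)
print(len(series))  # 4 operations x 2 statuses = 8 series
print(series[0])
```

The count is an upper bound: a monitoring backend only stores a series once a measurement with that tag combination has actually been recorded.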

Config Sync metrics

The following metrics are available in all supported Anthos Config Management versions.

Name | Type | Tags | Description
api_duration_seconds | Distribution | operation, status | The latency distribution of API server calls
apply_duration_seconds | Distribution | status | The latency distribution of applying resources declared in the source of truth to a cluster
apply_operations_total | Count | operation, status, controller | The total number of operations that have been performed to sync resources from the source of truth to a cluster
declared_resources | Last Value | - | The number of declared resources parsed from Git
internal_errors_total | Count | source | The total number of internal errors encountered by Config Sync
last_sync_timestamp | Last Value | commit, status | The timestamp of the most recent sync from Git
parser_duration_seconds | Distribution | status, trigger, source | The latency distribution of the different stages involved in syncing from the source of truth to a cluster
pipeline_error_observed | Last Value | name, reconciler, component | The status of RootSync and RepoSync custom resources. A value of 1 indicates a failure.
reconcile_duration_seconds | Distribution | status | The latency distribution of reconcile events handled by the reconciler manager
reconciler_errors | Last Value | component, errorclass | The number of errors encountered while syncing resources from the source of truth to a cluster
remediate_duration_seconds | Distribution | status | The latency distribution of remediator reconciliation events
resource_conflicts_total | Count | - | The total number of resource conflicts resulting from a mismatch between the cached resources and cluster resources
resource_fights_total | Count | - | The total number of resources that are being synced too frequently. Any result higher than zero indicates a problem. For more information, see KNV2005: ResourceFightWarning.

Resource Group Controller metrics

The Resource Group Controller is a component in Config Sync that keeps track of the managed resources and checks whether each individual resource is ready or reconciled. The following metrics are available.

Name | Type | Tags | Description
reconcile_duration_seconds | Distribution | stallreason | The distribution of time taken to reconcile a ResourceGroup CR
resource_group_total | Last Value | - | The current number of ResourceGroup CRs
resource_count_total | Sum | - | The total number of resources tracked by all ResourceGroup CRs in the cluster
resource_count | Last Value | resourcegroup | The total number of resources tracked by a ResourceGroup
ready_resource_count_total | Sum | - | The total number of ready resources across all ResourceGroup CRs in the cluster
ready_resource_count | Last Value | resourcegroup | The total number of ready resources in a ResourceGroup
resource_ns_count | Last Value | resourcegroup | The number of namespaces used by resources in a ResourceGroup
cluster_scoped_resource_count | Last Value | resourcegroup | The number of cluster-scoped resources in a ResourceGroup
crd_count | Last Value | resourcegroup | The number of CRDs in a ResourceGroup
kcc_resource_count_total | Sum | - | The total number of Config Connector resources across all ResourceGroup CRs in the cluster
kcc_resource_count | Gauge | resourcegroup | The total number of Config Connector (KCC) resources in a ResourceGroup
pipeline_error_observed | Last Value | name, reconciler, component | The status of RootSync and RepoSync custom resources. A value of 1 indicates a failure.

Config Sync metric labels

Metric labels can be used to aggregate metric data in Cloud Monitoring and Prometheus. In the Monitoring console, you can select them from the Group By drop-down list.

For more information about Cloud Monitoring labels and Prometheus metric labels, see Components of the metric model and Prometheus data model.
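
Grouping by a label aggregates all series that share a value of that label and drops the other labels. The following Python sketch shows the idea for apply_operations_total grouped by status and by controller. The sample values are invented for illustration, and group_by is a hypothetical helper; in practice, Cloud Monitoring or Prometheus performs this aggregation for you.

```python
from collections import defaultdict


def group_by(samples, label):
    """Sum the values of samples that share the same value of `label`.

    `samples` is a list of (labels_dict, value) pairs. Every other label
    is dropped, which is what grouping by a single label does.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical apply_operations_total samples (values are illustrative only).
samples = [
    ({"operation": "create", "status": "success", "controller": "applier"}, 12),
    ({"operation": "update", "status": "success", "controller": "remediator"}, 5),
    ({"operation": "delete", "status": "error", "controller": "applier"}, 1),
    ({"operation": "patch", "status": "success", "controller": "applier"}, 3),
]

print(group_by(samples, "status"))      # {'success': 20.0, 'error': 1.0}
print(group_by(samples, "controller"))  # {'applier': 16.0, 'remediator': 5.0}
```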

Metric labels

The following labels are used by Config Sync and Resource Group Controller metrics.

Name | Values | Description
operation | create, patch, update, delete | The type of operation performed
status | success, error | The execution status of an operation
reconciler | rootsync, reposync | The type of the reconciler
source | parser, differ, remediator | The source of the internal error
trigger | retry, watchUpdate, managementConflict, resync, reimport | The trigger of a reconciliation event
name | The name of the reconciler | The name of the reconciler
component | parsing, source, sync, rendering, readiness | The name of the component or stage that the reconciliation is currently at
container | reconciler, git-sync | The name of the container
resource | cpu, memory | The type of the resource
controller | applier, remediator | The name of the controller in a root or namespace reconciler
type | Any Kubernetes resource kind, for example ClusterRole, Namespace, NetworkPolicy, or Role | The kind of the Kubernetes API resource
commit | - | The hash of the latest synced commit

Resource labels

Config Sync metrics sent to Prometheus and Cloud Monitoring have the following metric labels set to identify the source Pod:

Name | Description
k8s.node.name | The name of the Node hosting a Kubernetes Pod
k8s.pod.namespace | The namespace of the Pod
k8s.pod.uid | The UID of the Pod
k8s.pod.ip | The IP of the Pod
k8s.deployment.name | The name of the Deployment that owns the Pod

Config Sync metrics sent to Prometheus and Cloud Monitoring from reconciler Pods also have the following metric labels set to identify the RootSync or RepoSync used to configure the reconciler:

Name | Description
configsync.sync.kind | The kind of resource that configures this reconciler: RootSync or RepoSync
configsync.sync.name | The name of the RootSync or RepoSync that configures this reconciler
configsync.sync.namespace | The namespace of the RootSync or RepoSync that configures this reconciler
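
Because reconciler metrics carry these three labels, you can isolate the series that belong to one RootSync or RepoSync by matching on all three. The following Python sketch does this over in-memory label sets. The select_sync helper, the series data, and the root-sync, repo-sync, and namespace values are illustrative assumptions; check how your monitoring backend actually displays these label names before reusing the idea in a query.

```python
def select_sync(series, kind, name, namespace):
    """Keep the label sets whose configsync.sync.* labels identify one RootSync or RepoSync."""
    wanted = {
        "configsync.sync.kind": kind,
        "configsync.sync.name": name,
        "configsync.sync.namespace": namespace,
    }
    return [
        labels
        for labels in series
        if all(labels.get(key) == value for key, value in wanted.items())
    ]


# Illustrative label sets; real series also carry the k8s.* labels listed above.
series = [
    {
        "__name__": "last_sync_timestamp",
        "configsync.sync.kind": "RootSync",
        "configsync.sync.name": "root-sync",
        "configsync.sync.namespace": "config-management-system",
    },
    {
        "__name__": "last_sync_timestamp",
        "configsync.sync.kind": "RepoSync",
        "configsync.sync.name": "repo-sync",
        "configsync.sync.namespace": "shipping-dev",
    },
    # Series from Pods other than reconcilers lack the configsync.sync.* labels.
    {"__name__": "resource_group_total"},
]

matches = select_sync(series, "RootSync", "root-sync", "config-management-system")
print([labels["__name__"] for labels in matches])  # ['last_sync_timestamp']
```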

Cloud Monitoring resource labels

Cloud Monitoring resource labels are used to index metrics in storage, so they have a negligible effect on cardinality. Metric labels are different: their cardinality is a significant performance concern. For more information, see Monitored resource types.

Starting with Config Sync version 1.14.0, Resource Group Controller metrics sent to Cloud Monitoring use the k8s_container resource type instead of the k8s_pod resource type used in previous versions.

Starting with Config Sync version 1.14.1, Config Sync metrics sent to Cloud Monitoring use the k8s_container resource type instead of the k8s_pod resource type used in previous versions.
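
If you build Cloud Monitoring queries that must work across Config Sync versions, the resource type depends on both the component and the version. The following Python sketch encodes the two cutoffs stated above. The function name, the component keys, and the plain MAJOR.MINOR.PATCH version format are assumptions made for illustration.

```python
def monitored_resource_type(component, version):
    """Return the Cloud Monitoring resource type used for a component's metrics.

    component: "resource-group-controller" or "config-sync" (hypothetical keys).
    version:   a plain "MAJOR.MINOR.PATCH" string, for example "1.14.1".
    """
    # Compare versions numerically; as strings, "1.9.0" would sort after "1.14.0".
    parsed = tuple(int(part) for part in version.split("."))
    cutoffs = {
        "resource-group-controller": (1, 14, 0),
        "config-sync": (1, 14, 1),
    }
    if component not in cutoffs:
        raise ValueError(f"unknown component: {component!r}")
    return "k8s_container" if parsed >= cutoffs[component] else "k8s_pod"


print(monitored_resource_type("resource-group-controller", "1.14.0"))  # k8s_container
print(monitored_resource_type("config-sync", "1.14.0"))                # k8s_pod
print(monitored_resource_type("config-sync", "1.9.3"))                 # k8s_pod
```

Comparing tuples of integers keeps the ordering correct for multi-digit minor versions, which a plain string comparison would get wrong.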

The k8s_container resource type sets the following resource labels to identify the source Container:

Name | Description
container_name | The name of the Container
pod_name | The name of the Pod
namespace_name | The namespace of the Pod
location | The region or zone of the cluster hosting the node
cluster_name | The name of the cluster hosting the node
project | The ID of the project hosting the cluster

Understand the pipeline_error_observed metric

The pipeline_error_observed metric can help you quickly identify RepoSync or RootSync CRs that are not in sync or that contain resources that are not reconciled to the desired state.

  • For a successful sync by a RootSync or RepoSync, the metric is observed with value 0 for all components (rendering, source, sync, and readiness).

    A screenshot of the pipeline_error_observed metric with all components observed with value 0

  • When the latest commit fails automated rendering, the metric with the component rendering is observed with value 1.

  • When checking out the latest commit encounters an error, or the latest commit contains invalid configuration, the metric with the component source is observed with value 1.

  • When a resource fails to be applied to the cluster, the metric with the component sync is observed with value 1.

  • When a resource is applied, but fails to reach its desired state, the metric with the component readiness is observed with value 1. For example, a Deployment is applied to the cluster, but the corresponding Pods are not created successfully.
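
The rules above amount to a small lookup: for each RootSync or RepoSync, any component observed with value 1 names the stage that failed. The following Python sketch applies them to samples held in memory. The tuple shape, the summarize_failures name, and the example-root and example-repo name values are hypothetical; in practice you would read the values from Prometheus or Cloud Monitoring.

```python
# What a value of 1 means for each component, as described in the list above.
COMPONENT_MEANING = {
    "rendering": "the latest commit failed automated rendering",
    "source": "checking out the latest commit failed or it contains invalid configuration",
    "sync": "a resource failed to be applied to the cluster",
    "readiness": "a resource was applied but did not reach its desired state",
}
COMPONENTS = list(COMPONENT_MEANING)


def summarize_failures(samples):
    """Map each (reconciler, name) pair to the components observed with value 1.

    samples: iterable of (name, reconciler, component, value) tuples that
             mirror the metric's name, reconciler, and component tags.
    """
    failures = {}
    for name, reconciler, component, value in samples:
        if value == 1:
            failures.setdefault((reconciler, name), []).append(component)
    return failures


# Illustrative samples: the RootSync is healthy, the RepoSync fails readiness.
samples = [("example-root", "rootsync", c, 0) for c in COMPONENTS]
samples += [("example-repo", "reposync", c, 1 if c == "readiness" else 0) for c in COMPONENTS]

for (reconciler, name), components in summarize_failures(samples).items():
    for component in components:
        print(f"{reconciler}/{name}: {COMPONENT_MEANING[component]}")
# reposync/example-repo: a resource was applied but did not reach its desired state
```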

Checking for Config Sync errors

The following sections describe additional methods for checking Config Sync's status. For more information about troubleshooting, see Troubleshoot Config Sync.

Use nomos status to display errors

In addition to using metrics to monitor the status of Config Sync on your clusters, you can use the nomos status command, which prints errors from all of your clusters on the command line.

Check a config's object status

Config Sync defines two custom Kubernetes objects: ClusterConfig and NamespaceConfig. These objects define a status field that contains information about the change that was last applied to the config and any errors that occurred. For example, if there is an error in a namespace called shipping-dev, you can check the status of the corresponding NamespaceConfig:

kubectl get namespaceconfig shipping-dev -o yaml

Check an object's token annotation

You might want to know when Config Sync last updated a managed Kubernetes object. Each managed object is annotated with the hash of the Git commit in which it was last modified and with the path to the config that contained the modification.

kubectl get clusterrolebinding namespace-readers -o yaml

The output is similar to the following:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    configmanagement.gke.io/source-path: cluster/namespace-reader-clusterrolebinding.yaml
    configmanagement.gke.io/token: bbb6a1e2f3db692b17201da028daff0d38797771
  name: namespace-readers
...

For more information, see labels and annotations.
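
To read these annotations programmatically, you can request JSON output from kubectl (-o json) and pick out the two annotation keys shown in the example output. The following Python sketch works on an already-parsed object. The last_sync_info helper is hypothetical, and the sketch relies only on the configmanagement.gke.io/token and configmanagement.gke.io/source-path keys that appear above.

```python
import json

TOKEN_KEY = "configmanagement.gke.io/token"
SOURCE_PATH_KEY = "configmanagement.gke.io/source-path"


def last_sync_info(obj):
    """Return (commit_hash, source_path) for a managed object.

    Either element is None when the corresponding annotation is missing,
    for example on an object that Config Sync does not manage.
    """
    annotations = (obj.get("metadata") or {}).get("annotations") or {}
    return annotations.get(TOKEN_KEY), annotations.get(SOURCE_PATH_KEY)


# JSON equivalent of the ClusterRoleBinding metadata shown above (trimmed).
raw = """
{
  "apiVersion": "rbac.authorization.k8s.io/v1",
  "kind": "ClusterRoleBinding",
  "metadata": {
    "name": "namespace-readers",
    "annotations": {
      "configmanagement.gke.io/source-path": "cluster/namespace-reader-clusterrolebinding.yaml",
      "configmanagement.gke.io/token": "bbb6a1e2f3db692b17201da028daff0d38797771"
    }
  }
}
"""

commit, path = last_sync_info(json.loads(raw))
print(commit)  # bbb6a1e2f3db692b17201da028daff0d38797771
print(path)    # cluster/namespace-reader-clusterrolebinding.yaml
```

Against a live cluster, you could pass it the parsed output of kubectl get clusterrolebinding namespace-readers -o json.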

What's next