Monitor Config Sync in multi-repo mode

This page describes the additional ways that you can monitor your resources when you use Config Sync to sync from multiple repositories. Syncing from multiple repositories is enabled by default if you used the Google Cloud Console or the gcloud command-line tool to install Config Sync version 1.7.0 or later. If you installed Config Sync using kubectl, set spec.enableMultiRepo to true in your ConfigManagement object.

When you enable multi-repo mode, Config Sync uses OpenCensus to create and record metrics and OpenTelemetry to export those metrics to Prometheus and Cloud Monitoring. You can also use OpenTelemetry to export metrics to a custom monitoring system. This gives you three ways to monitor your resources:

  1. Prometheus: To use Prometheus, see Monitoring Config Sync using Prometheus.
  2. Cloud Monitoring: To use Cloud Monitoring, see the Monitor resources with Cloud Monitoring section.
  3. Custom monitoring system: To use a custom monitoring system, see the Configure a custom OpenTelemetry exporter section.

Available metrics

Config Sync collects the following metrics and makes them available to OpenTelemetry. The Tags column lists all tags that are applicable to each metric. Metrics with tags represent multiple measurements, one for each combination of tag values.

| Name | Type | Tags | Description |
|---|---|---|---|
| api_duration_seconds | Distribution | reconciler, operation, type, status | The latency distribution of API server calls |
| apply_duration_seconds | Distribution | reconciler, status | The latency distribution of applier resource sync events |
| apply_operations_total | Count | reconciler, operation, type, status | The total number of operations that have been performed to sync resources to source of truth |
| declared_resources | Last Value | reconciler | The number of declared resources parsed from Git |
| internal_errors_total | Count | reconciler, source | The total number of internal errors triggered by Config Sync |
| last_apply_timestamp | Last Value | reconciler, status | The timestamp of the most recent applier resource sync event |
| last_sync_timestamp | Last Value | reconciler | The timestamp of the most recent sync from Git |
| parse_duration_seconds | Distribution | reconciler, status | The latency distribution of parse events |
| parse_errors_total | Count | reconciler, errorcode | The total number of errors that occurred during parsing |
| parser_duration_seconds | Distribution | reconciler, status, trigger, source | The latency distribution of the parse-apply-watch loop |
| reconcile_duration_seconds | Distribution | status | The latency distribution of reconcile events handled by the reconciler manager |
| reconciler_errors | Last Value | reconciler, component | The number of errors in the RootSync and RepoSync reconcilers |
| remediate_duration_seconds | Distribution | reconciler, type, status | The latency distribution of remediator reconciliation events |
| resource_conflicts_total | Count | reconciler, type | The total number of resource conflicts resulting from a mismatch between the cached resources and cluster resources |
| resource_fights_total | Count | reconciler, operation, type | The total number of resources that are being synced too frequently. Any result higher than zero indicates a problem. For more information, see KNV2005: ResourceFightWarning. |
| watch_manager_updates_duration_seconds | Distribution | reconciler, status | The latency distribution of watch manager updates |
| watches | Count | reconciler, type | The number of watches on the declared resources |
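
To make the note about tags concrete, the following minimal Python sketch enumerates the measurements that a tagged metric can produce, one per combination of tag values. The tag values shown (the namespace gamestore and the statuses success and error) are illustrative assumptions, not an exhaustive list taken from Config Sync.

```python
from itertools import product


def tag_combinations(tags):
    """Return one dict per combination of tag values.

    `tags` maps a tag name to the list of values observed for that tag.
    """
    names = list(tags)
    value_lists = [tags[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical values for apply_duration_seconds (tags: reconciler, status).
observed = {
    "reconciler": ["root-reconciler", "ns-reconciler-gamestore"],
    "status": ["success", "error"],
}

for combination in tag_combinations(observed):
    print(combination)
```

With two reconcilers and two statuses, the metric is reported as four separate measurements.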

Monitor resources with Cloud Monitoring

If Config Sync is running inside a Google Cloud environment that has a default service account, Config Sync automatically exports metrics to Cloud Monitoring.

If Workload Identity is enabled, bind the Kubernetes ServiceAccount named default in the config-management-monitoring namespace to a Google service account that has the metric writer role. To create the binding, run the following command:

gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[config-management-monitoring/default]" \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com

Replace the following:

  • GSA_NAME: the Google service account with the metric writer role
  • PROJECT_ID: your project ID

This action requires the iam.serviceAccounts.setIamPolicy permission on the project.

Then annotate the Kubernetes ServiceAccount with the email address of the Google service account:

kubectl annotate serviceaccount \
  --namespace config-management-monitoring \
  default \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
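
If you script this setup, the two commands above can be assembled from the same pair of values. The following Python sketch only builds the argument lists and prints them; it does not run anything. The function name and the sample values my-project and metrics-writer are hypothetical.

```python
import shlex

NAMESPACE = "config-management-monitoring"
KSA_NAME = "default"


def workload_identity_commands(project_id, gsa_name):
    """Build the gcloud and kubectl argument lists shown above."""
    gsa_email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    member = f"serviceAccount:{project_id}.svc.id.goog[{NAMESPACE}/{KSA_NAME}]"
    bind = [
        "gcloud", "iam", "service-accounts", "add-iam-policy-binding",
        "--role", "roles/iam.workloadIdentityUser",
        "--member", member,
        gsa_email,
    ]
    annotate = [
        "kubectl", "annotate", "serviceaccount",
        "--namespace", NAMESPACE,
        KSA_NAME,
        f"iam.gke.io/gcp-service-account={gsa_email}",
    ]
    return bind, annotate


# Hypothetical project ID and Google service account name.
for command in workload_identity_commands("my-project", "metrics-writer"):
    print(shlex.join(command))
```

Returning argument lists instead of one string means the commands can be passed to subprocess.run later without extra shell quoting.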

For examples of how to view these metrics, see the following Example debugging procedures section and the OpenCensus metrics in Cloud Monitoring article.

Example debugging procedures

The following Cloud Monitoring examples illustrate some patterns for using OpenCensus metrics to detect and diagnose problems related to Config Sync when you are using multi-repo mode.

Metric format

In Cloud Monitoring, metrics have the following format: custom.googleapis.com/opencensus/config_sync/METRIC.

This metric name is composed of the following components:

  • custom.googleapis.com: all custom metrics have this prefix
  • opencensus: this prefix is added because Config Sync uses the OpenCensus library
  • config_sync/: metrics that Config Sync exports to Cloud Monitoring have this prefix
  • METRIC: the name of the metric that you want to query
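
As a small illustration of this format, the Python sketch below joins the components listed above into a full metric name and splits one back apart. The helper names are hypothetical.

```python
# Components of the Cloud Monitoring metric name, as listed above.
CUSTOM_PREFIX = "custom.googleapis.com"
LIBRARY_PREFIX = "opencensus"
CONFIG_SYNC_PREFIX = "config_sync"


def metric_type(metric):
    """Return the full Cloud Monitoring name for a Config Sync metric."""
    return "/".join([CUSTOM_PREFIX, LIBRARY_PREFIX, CONFIG_SYNC_PREFIX, metric])


def short_name(full_name):
    """Return the bare metric name, or raise ValueError for other metrics."""
    prefix = metric_type("")
    if not full_name.startswith(prefix):
        raise ValueError(f"not a Config Sync metric: {full_name}")
    return full_name[len(prefix):]


print(metric_type("reconciler_errors"))
```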

Query metrics by reconciler

RootSync and RepoSync objects are instrumented with high-level metrics that give you useful insight into how Config Sync is operating on the cluster. Almost all metrics are tagged by the reconciler name, so you can see if any errors have occurred and can set up alerts for them in Cloud Monitoring.

A reconciler is a Pod that is deployed as a Deployment. It syncs manifests from a Git repository to a cluster. When you create a RootSync object, Config Sync creates a reconciler called root-reconciler. When you create a RepoSync object, Config Sync creates a reconciler called ns-reconciler-NAMESPACE, where NAMESPACE is the namespace you created your RepoSync object in.
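
The naming rule described above can be written as a short Python function. This is a sketch of the rule as this page states it; the function name and the gamestore namespace are hypothetical.

```python
def reconciler_name(kind, namespace=None):
    """Return the reconciler name for a RootSync or RepoSync object."""
    if kind == "RootSync":
        return "root-reconciler"
    if kind == "RepoSync":
        if not namespace:
            raise ValueError("a RepoSync object needs a namespace")
        return f"ns-reconciler-{namespace}"
    raise ValueError(f"unsupported kind: {kind}")


print(reconciler_name("RootSync"))
print(reconciler_name("RepoSync", "gamestore"))
```

The returned value is the one to use when you filter metrics by the reconciler tag.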

The following diagram shows you how reconciler Pods function:

[Figure: Reconciler flow]

For example, to filter by the reconciler name when you are using Cloud Monitoring, complete the following tasks:

  1. In the Google Cloud Console, go to Monitoring.

  2. In the Monitoring navigation pane, click Metrics explorer.

  3. In the Find resource type and metric box, add: custom.googleapis.com/opencensus/config_sync/reconciler_errors.

  4. In the Filter dropdown list, select reconciler. A filter fields box appears.

  5. In the filter fields box, select = in the first field and root-reconciler in the second.

  6. Click Apply.

You can now see metrics for your RootSync objects.

For more instructions on how to filter by a specific data type, see Filtering the data.
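
If you query Cloud Monitoring programmatically instead of through Metrics explorer, the same selection can be expressed as a filter string. The sketch below only builds that string. It assumes that the reconciler tag is exposed as a metric label with the same name, so confirm the label name in Metrics explorer before relying on it.

```python
METRIC_PREFIX = "custom.googleapis.com/opencensus/config_sync/"


def reconciler_filter(metric, reconciler):
    """Build a Cloud Monitoring filter for one metric and one reconciler.

    Assumption: the `reconciler` tag appears as metric.labels.reconciler.
    """
    return (
        f'metric.type = "{METRIC_PREFIX}{metric}" '
        f'AND metric.labels.reconciler = "{reconciler}"'
    )


print(reconciler_filter("reconciler_errors", "root-reconciler"))
```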

Query Config Sync operations by component and status

In multi-repo mode, the reconcilers handle importing and sourcing from a Git repository and syncing to a cluster. The reconciler_errors metric is labeled by component so you can see where any errors occurred.

For example, to filter by component when you are using Cloud Monitoring, complete the following tasks:

  1. In the Google Cloud Console, go to Monitoring.

  2. In the Monitoring navigation pane, click Metrics explorer.

  3. In the Find resource type and metric box, add: custom.googleapis.com/opencensus/config_sync/reconciler_errors.

  4. In the Filter dropdown list, select component. A filter fields box appears.

  5. In the filter fields box, select = in the first field and source in the second.

  6. Click Apply.

You can now see errors that occurred when sourcing from a Git repository for your reconcilers.

You can also check the metrics for the source and sync processes themselves by querying the following metrics and filtering by the status tag:

custom.googleapis.com/opencensus/config_sync/parse_duration_seconds
custom.googleapis.com/opencensus/config_sync/apply_duration_seconds
custom.googleapis.com/opencensus/config_sync/remediate_duration_seconds
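
As a sketch of filtering these three metrics by the status tag, the following Python builds one Cloud Monitoring filter string per metric. The status value error and the label path metric.labels.status are assumptions, so check the values that your cluster actually reports.

```python
METRIC_PREFIX = "custom.googleapis.com/opencensus/config_sync/"

# The source and sync metrics listed above.
DURATION_METRICS = [
    "parse_duration_seconds",
    "apply_duration_seconds",
    "remediate_duration_seconds",
]


def status_filters(status):
    """Map each duration metric to a filter that selects one status value."""
    return {
        metric: (
            f'metric.type = "{METRIC_PREFIX}{metric}" '
            f'AND metric.labels.status = "{status}"'
        )
        for metric in DURATION_METRICS
    }


# "error" is an assumed status value used for illustration.
for metric, monitoring_filter in status_filters("error").items():
    print(metric, "->", monitoring_filter)
```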

Configure a custom OpenTelemetry exporter

If you want to send your metrics to a different monitoring system, you can modify the OpenTelemetry configuration. For a list of supported monitoring systems, see OpenTelemetry Collector Exporters and OpenTelemetry Collector-Contrib Exporters.

OpenTelemetry monitoring resources are managed in a separate config-management-monitoring namespace. To configure a custom OpenTelemetry exporter for use with Config Sync, create a ConfigMap named otel-collector-custom in that namespace. The ConfigMap must contain an otel-collector-config.yaml key whose value is the contents of your custom OpenTelemetry Collector configuration file. For more information on the configuration options, see the OpenTelemetry Collector configuration documentation.

The following example ConfigMap defines a custom logging exporter:

# otel-collector-custom-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-custom
  namespace: config-management-monitoring
  labels:
    app: opentelemetry
    component: otel-collector
data:
  otel-collector-config.yaml: |
    receivers:
      opencensus:
    exporters:
      logging:
        logLevel: debug
    processors:
      batch:
    extensions:
      health_check:
    service:
      extensions: [health_check]
      pipelines:
        metrics:
          receivers: [opencensus]
          processors: [batch]
          exporters: [logging]

All custom configurations must define an opencensus receiver and a metrics pipeline. The remaining fields are optional and configurable, but we recommend that you include a batch processor and a health check extension, as in the example.

After you have saved the ConfigMap to a file, use kubectl to create the resource:
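
Before you apply a custom configuration, you can check it against these requirements. The following Python sketch works on a configuration that has already been parsed into a dictionary, so YAML parsing is out of scope here. The function name is hypothetical, and the third check (that the metrics pipeline lists the opencensus receiver) is an extra assumption beyond what this page requires.

```python
def validate_collector_config(config):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []

    receivers = config.get("receivers") or {}
    if "opencensus" not in receivers:
        problems.append("missing opencensus receiver")

    pipelines = (config.get("service") or {}).get("pipelines") or {}
    metrics = pipelines.get("metrics")
    if metrics is None:
        problems.append("missing metrics pipeline")
    elif "opencensus" not in (metrics.get("receivers") or []):
        # Assumption: the metrics pipeline should read from opencensus.
        problems.append("metrics pipeline does not use the opencensus receiver")

    return problems


# The example configuration from this page, written as a dictionary.
example = {
    "receivers": {"opencensus": None},
    "exporters": {"logging": {"logLevel": "debug"}},
    "processors": {"batch": None},
    "extensions": {"health_check": None},
    "service": {
        "extensions": ["health_check"],
        "pipelines": {
            "metrics": {
                "receivers": ["opencensus"],
                "processors": ["batch"],
                "exporters": ["logging"],
            }
        },
    },
}

print(validate_collector_config(example))
```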

kubectl apply -f otel-collector-custom-cm.yaml

The OpenTelemetry Collector Deployment picks up this ConfigMap and automatically restarts to apply the custom configuration.