Monitoring Config Connector with Prometheus

You can use Prometheus to collect and show metrics from Config Connector.

Before you begin

To complete the steps in this document, you need Prometheus installed on your cluster, for example through the Prometheus Operator, which the configuration example later in this document uses.

Scraping metrics

Prometheus gathers metrics by scraping an HTTP endpoint. This section describes the Config Connector scrape endpoints and how to configure Prometheus to scrape them.

Config Connector scrape endpoints

Config Connector exposes metrics on port 8888 through two Services: cnrm-controller-manager-service and cnrm-resource-stats-recorder-service. Both Services have the annotations prometheus.io/scrape: "true" and prometheus.io/port: "8888", and the labels cnrm.cloud.google.com/monitored: "true" and cnrm.cloud.google.com/system: "true". You can use these annotations and labels to configure Prometheus to scrape Config Connector components.

Configuring Prometheus

Before you can scrape metrics, you might need to configure Prometheus for Kubernetes Service Discovery (SD) to discover scrape targets from the Kubernetes REST API.

How you configure Prometheus to scrape metrics from Config Connector depends on how you installed Prometheus. Check the documentation for your Prometheus installation for how to update configuration.
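If you run Prometheus from its own configuration file instead of through an operator, a scrape job along the following lines can use Kubernetes SD to find the Config Connector Services by the annotations and labels described above. Treat it as a minimal sketch: the job name config-connector is hypothetical, it assumes Config Connector is installed in the cnrm-system Namespace, and it assumes Prometheus has RBAC permission to list Services, Endpoints, and Pods there. Prometheus exposes the Service label cnrm.cloud.google.com/monitored as the meta label __meta_kubernetes_service_label_cnrm_cloud_google_com_monitored, with unsupported characters replaced by underscores.

```yaml
scrape_configs:
  # Hypothetical job name; choose any name.
  - job_name: config-connector
    kubernetes_sd_configs:
      # Discover one target per Service endpoint (Pod IP and port).
      - role: endpoints
        namespaces:
          names:
            - cnrm-system  # assumes the default install Namespace
    relabel_configs:
      # Keep only Services annotated prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Keep only Services labeled cnrm.cloud.google.com/monitored: "true".
      - source_labels: [__meta_kubernetes_service_label_cnrm_cloud_google_com_monitored]
        action: keep
        regex: "true"
      # Rewrite the target address to use the port from prometheus.io/port.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

The last relabeling rule follows the pattern used in Prometheus' own Kubernetes example configuration for honoring a prometheus.io/port annotation.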

For example, if you use the Prometheus Operator and configure scraping with a ServiceMonitor, you can apply the following configuration to scrape metrics from Config Connector.

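  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    # Hypothetical name; any name works.
    name: cnrm-service-monitor
    # Example Namespace; it must be one that your Prometheus object
    # watches for ServiceMonitors.
    namespace: cnrm-system
  spec:
    endpoints:
    - interval: 10s
      port: metrics
    namespaceSelector:
      matchNames:
      - cnrm-system
    selector:
      matchLabels:
        cnrm.cloud.google.com/monitored: "true"
        cnrm.cloud.google.com/system: "true"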

For more information on setting up ServiceMonitor, see Getting started in the Prometheus Operator documentation.
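A ServiceMonitor only takes effect if a Prometheus object managed by the operator selects it. As a sketch, assuming a hypothetical Prometheus object named prometheus in a hypothetical monitoring Namespace, a hypothetical service account named prometheus that is allowed to read Services, Endpoints, and Pods in cnrm-system, and a hypothetical label team: cnrm that you add to the ServiceMonitor's metadata.labels, the selection could look like this:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus        # hypothetical name
  namespace: monitoring   # hypothetical Namespace
spec:
  # Hypothetical service account; it needs RBAC to list Services,
  # Endpoints, and Pods in cnrm-system.
  serviceAccountName: prometheus
  # Select ServiceMonitors that carry the hypothetical label team: cnrm.
  serviceMonitorSelector:
    matchLabels:
      team: cnrm
  # An empty selector matches ServiceMonitors in every Namespace.
  serviceMonitorNamespaceSelector: {}
```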

Example queries

Prometheus uses a query language called PromQL. This section contains example PromQL queries for Config Connector's metrics. You can configure Prometheus to generate alerts from any valid PromQL query.

Query reconcile requests by resource kind and status

The configconnector_reconcile_requests_total counter records reconcile requests by resource kind and status, so you can filter it on status to count failed requests.

For example, to check the total count of errors for PubSubTopic resources, use the following query.

configconnector_reconcile_requests_total{group_version_kind="PubSubTopic.pubsub.cnrm.cloud.google.com", status="ERROR"}
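Because configconnector_reconcile_requests_total is a counter, its raw value only grows; for alerting, the increase over a time window is usually more useful than the running total. The following Prometheus rule file is a sketch of such an alert. The group name, alert name, 10-minute window, 5-minute for duration, and severity label are illustrative choices, not values recommended by Config Connector.

```yaml
groups:
  - name: config-connector-alerts   # hypothetical group name
    rules:
      # Hypothetical alert: fires when any resource kind has recorded
      # reconcile errors in the last 10 minutes, sustained for 5 minutes.
      - alert: ConfigConnectorReconcileErrors
        expr: |
          sum by (group_version_kind) (
            increase(configconnector_reconcile_requests_total{status="ERROR"}[10m])
          ) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Reconcile errors for {{ $labels.group_version_kind }}"
```

With the Prometheus Operator, the same groups list goes under spec.groups of a PrometheusRule object instead of a rule file.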

Check the aggregate status of resources by kind and Namespace

You can see the count of resources of a given kind in a Namespace.

For example, to check the count of PubSubTopic resources in the default Namespace, broken down by status, use the following query.

configconnector_applied_resources_total{group_version_kind="PubSubTopic.pubsub.cnrm.cloud.google.com", namespace="default"}
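Because configconnector_applied_resources_total carries a status label, a query for one kind can return several series. To get a single count per kind and Namespace, sum across status. The recording rule below is a sketch; the group and rule names are hypothetical, and it assumes the Namespace label is named namespace.

```yaml
groups:
  - name: config-connector-recording   # hypothetical group name
    rules:
      # One series per (namespace, group_version_kind), summed over status.
      - record: namespace_group_version_kind:configconnector_applied_resources_total:sum
        expr: sum by (namespace, group_version_kind) (configconnector_applied_resources_total)
```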

Query the utilization of reconcile workers per resource kind

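To see what fraction of the reconcile workers for each resource kind is occupied, divide the occupied workers by the total workers:

(configconnector_reconcile_occupied_workers_total / configconnector_reconcile_workers_total)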
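When this ratio stays near 1, nearly all reconcile workers for that kind are busy. As a sketch, the following rules record the ratio and alert when it stays at or above 0.9 for 15 minutes; the threshold, duration, and all names are hypothetical and should be tuned for your cluster.

```yaml
groups:
  - name: config-connector-workers   # hypothetical group name
    rules:
      # Fraction of reconcile workers in use, per resource kind.
      - record: group_version_kind:configconnector_reconcile_worker_utilization:ratio
        expr: |
          configconnector_reconcile_occupied_workers_total
            / configconnector_reconcile_workers_total
      # Hypothetical alert on sustained high utilization.
      - alert: ConfigConnectorReconcileWorkersBusy
        expr: group_version_kind:configconnector_reconcile_worker_utilization:ratio >= 0.9
        for: 15m
        labels:
          severity: warning
```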

Available metrics

Config Connector collects the following metrics and makes them available.

configconnector_reconcile_requests_total (Counter; labels: namespace, group_version_kind, status): The total number of reconcile requests.

configconnector_reconcile_request_duration_seconds (Histogram; labels: namespace, group_version_kind, status): The distribution of the time taken to complete reconcile requests.

configconnector_applied_resources_total (Gauge; labels: namespace, group_version_kind, status): The total number of resources under management.

configconnector_build_info (Gauge; labels: version): The current version of Config Connector.

configconnector_reconcile_workers_total (Gauge; labels: group_version_kind): The total number of reconcile workers, aggregated by resource kind.

configconnector_reconcile_occupied_workers_total (Gauge; labels: group_version_kind): The total number of occupied reconcile workers, aggregated by resource kind.
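Prometheus stores a histogram such as configconnector_reconcile_request_duration_seconds as cumulative _bucket series plus _sum and _count series, so percentiles are computed with histogram_quantile. The recording rule below is a sketch that records an approximate 95th-percentile reconcile duration per resource kind over a 5-minute window; the group name, rule name, and window are hypothetical.

```yaml
groups:
  - name: config-connector-latency   # hypothetical group name
    rules:
      # Approximate p95 reconcile duration (seconds) per resource kind.
      - record: group_version_kind:configconnector_reconcile_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, group_version_kind) (
              rate(configconnector_reconcile_request_duration_seconds_bucket[5m])
            )
          )
```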

Enabling resource name labels

By default, metrics are aggregated by resource kind (for example, PubSubTopic). You can also enable metrics to be broken down by individual resource name.

To enable aggregation by resource name, perform the following steps.

  1. Edit the cnrm-controller-manager StatefulSet object:

    kubectl edit statefulset cnrm-controller-manager -n cnrm-system

    If you are using namespaced mode, run the following command instead, replacing [NAMESPACE_NAME] with your Namespace.

    kubectl edit statefulset cnrm-controller-manager -n [NAMESPACE_NAME]
  2. In the manager container, find the args array (spec.template.spec.containers[].args) and add --resource-name-label=true.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: cnrm-controller-manager
    spec:
      template:
        spec:
          containers:
          - name: manager
            args:
            # Other fields and existing args are omitted here; keep them
            # and append this flag.
            - --resource-name-label=true
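
If you prefer a non-interactive change to kubectl edit, a JSON Patch can append the flag instead. This is a sketch under two assumptions: the manager container is the first container in the Pod template (index 0), and the StatefulSet is in cnrm-system. Check the container order with kubectl get statefulset cnrm-controller-manager -n cnrm-system -o yaml first. Save the patch to a file (patch-args.yaml is a hypothetical name) and apply it with kubectl patch statefulset cnrm-controller-manager -n cnrm-system --type=json --patch-file=patch-args.yaml.

```yaml
# JSON Patch (RFC 6902) written as YAML.
# Assumes the manager container is at index 0; adjust if it is not.
- op: add
  # A trailing "-" appends to the end of the existing args array.
  path: /spec/template/spec/containers/0/args/-
  value: "--resource-name-label=true"
```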
    

What's Next?

For more information on writing queries, see the Prometheus query language documentation and its examples.