Observing hierarchical workloads

Hierarchy Controller and Cloud Logging can work together to give you better observability into your workloads based on the hierarchical structure of your cluster's namespaces.

Cloud Logging lets you analyze or route logs from your workloads according to multiple criteria, such as cluster name, Kubernetes namespace, or Kubernetes Pod labels. Hierarchy Controller extends these criteria to include both hierarchical and abstract namespaces by propagating your cluster's tree labels to your Pods, making them available to Logging.

For example, consider a workload running in a namespace from the example repository.

Hierarchy Controller lets you select logs generated by any workload that is a descendant of shipping-app-backend, online, or any other abstract namespace. This includes not only the workloads in namespaces located in the Git repository such as shipping-prod, but also any Hierarchy Controller subnamespace that you might create as a descendant of those namespaces.
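
For instance, a Pod running in the shipping-prod namespace carries one tree label per ancestor namespace. Assuming shipping-app-backend is the direct parent of shipping-prod, with online above it (as the previous paragraph suggests), the Pod's tree labels would look similar to the following, where each value is that ancestor's distance from the Pod's own namespace:

shipping-prod.tree.hnc.x-k8s.io/depth=0
shipping-app-backend.tree.hnc.x-k8s.io/depth=1
online.tree.hnc.x-k8s.io/depth=2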

Enabling hierarchical observability

Hierarchical observability is provided by Hierarchy Controller. To enable it, do the following:

  1. In the configuration file for the Config Sync Operator, in the spec.hierarchyController object, set the value of both enabled and enablePodTreeLabels to true:

    # config-management.yaml
    
    apiVersion: configmanagement.gke.io/v1
    kind: ConfigManagement
    metadata:
      name: config-management
    spec:
      # Set to true to enable hierarchical namespaces and logging
      hierarchyController:
        enabled: true
        enablePodTreeLabels: true
      # ...other fields...
    
  2. Apply the configuration:

    kubectl apply -f config-management.yaml
    

    After about a minute, Hierarchy Controller and hierarchical observability become usable on your cluster.
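
To confirm that the Hierarchy Controller components have started, you can check for running Pods in the hnc-system namespace (the system namespace mentioned in the limitations section later on this page); this is an informal sanity check rather than a documented step:

kubectl get pods -n hnc-system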

Verifying the installation

When hierarchical observability is enabled, Hierarchy Controller installs a mutating admission webhook to add the tree labels onto the Pods. These labels are then ingested by Cloud Logging and are made available to analyze or route logs. These labels can also be ingested by any other system that understands Kubernetes labels.
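
You can confirm that this webhook is registered by listing the cluster's mutating webhook configurations. Because the configuration's object name can differ from the webhook name given later on this page, this example simply filters for the hierarchycontroller string:

kubectl get mutatingwebhookconfigurations | grep hierarchycontroller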

To verify that hierarchical logging is enabled:

  1. Start a workload in any namespace, such as the following:

    kubectl run websvr --image=nginx -n default
    
  2. Inspect the Pod and verify that it contains the default.tree.hnc.x-k8s.io/depth label (a one-line alternative is shown after this procedure):

    kubectl describe pod -n default websvr
    

    Output:

    Name:         websvr
    Namespace:    default
    # ...other fields...
    Labels:       default.tree.hnc.x-k8s.io/depth=0 # This is the Pod tree label
                  run=websvr
    # ...other fields...
    
  3. Clean up the workload:

    kubectl delete pod -n default websvr
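
As an alternative to the kubectl describe command in step 2 (before you clean up the workload), you can print the Pod's labels inline:

kubectl get pod websvr -n default --show-labels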
    

Using hierarchical workload observability

After Pod tree labels are enabled, they can be used to improve hierarchical workload observability both inside clusters and in other Google Cloud products.

Querying Pods by hierarchy

Any Kubernetes operation that includes a label selector can be used to query Pod tree labels. For example, to view all Pods in all namespaces that are running in a descendant of the default namespace, use the following query:

kubectl get pods --all-namespaces -l default.tree.hnc.x-k8s.io/depth

Output based on the previous workload:

NAMESPACE   NAME     READY   STATUS    RESTARTS   AGE
default     websvr   1/1     Running   0          70s
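
Because the label value records the namespace's depth in the tree, you can also select by a specific depth. For example, to list only Pods running in direct children of the default namespace (depth 1), rather than anywhere in its subtree:

kubectl get pods --all-namespaces -l default.tree.hnc.x-k8s.io/depth=1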

Filtering logs by hierarchy

Cloud Logging uses a slightly different label format than Kubernetes: the label key is prefixed with k8s-pod/ and the dots in it are replaced with underscores. For example, to search for any workload running in a descendant of the default namespace, instead of searching for the Kubernetes label default.tree.hnc.x-k8s.io/depth, Cloud Logging expects a query similar to the following in the Google Cloud Console:

resource.type="k8s_container" labels.k8s-pod/default_tree_hnc_x-k8s_io/depth!=""

Alternatively, you can use a similar filter in the gcloud command-line tool:

gcloud logging read "resource.type=k8s_container AND labels.k8s-pod/default_tree_hnc_x-k8s_io/depth!=''"
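
As with Kubernetes label selectors, you can also match a specific depth rather than the mere presence of the label. For example, the following Console query restricts results to workloads running directly in the default namespace; this assumes, as in the queries above, that the label value is compared as a string:

resource.type="k8s_container" labels.k8s-pod/default_tree_hnc_x-k8s_io/depth="0"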

Filtering usage by hierarchy

You can use Pod tree labels to attribute resource requests and usage from GKE usage metering to entire namespace trees. Hierarchical usage metering requires that regular GKE usage metering is already enabled:

  1. Enable GKE usage metering on your cluster (an example gcloud command is shown after this list).
  2. Confirm that the data is being ingested into BigQuery. In the Cloud Console, go to the BigQuery page.

  3. In the dataset, look for the gke_cluster_resource_consumption table.

  4. Follow the prerequisites in the GKE usage metering documentation to enable visualization of the metering data.

  5. Open Google Data Studio, click Blank Report, and then select BigQuery as the data source.

  6. Select Custom query and select your project ID. In the text box on the right, enter your query. For example queries, see the following sections.
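
As referenced in step 1, one way to enable GKE usage metering is through gcloud. This sketch assumes an existing cluster and BigQuery dataset; replace CLUSTER_NAME and DATASET_NAME with your own values:

gcloud container clusters update CLUSTER_NAME \
    --resource-usage-bigquery-dataset=DATASET_NAME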

Example: total usage of every subtree

The following example shows the total usage attributed to every subtree in the cluster, grouped by the namespace at the root of each subtree:

SELECT
  REGEXP_EXTRACT(label.key, r"^[a-zA-Z0-9\-]+") as subtree,
  resource_name,
  usage.unit,
  SUM(usage.amount) AS usage_amount
FROM
  `PROJECT_NAME.DATASET_NAME.TABLE_NAME`,
  UNNEST(labels) AS label
WHERE
  REGEXP_CONTAINS(label.key, "tree.hnc.x-k8s.io/depth")
GROUP BY
  subtree,
  resource_name,
  usage.unit
ORDER BY
  resource_name ASC,
  subtree ASC

Sample output:

subtree                    resource_name   unit           usage_amount
a                          cpu             seconds        0.09
a1                         cpu             seconds        0.09
a2                         cpu             seconds        0
config-management-system   cpu             seconds        9,252.45
a                          memory          byte-seconds   6,315,303,690,240
a1                         memory          byte-seconds   1,355,268,587,520
a2                         memory          byte-seconds   4,960,035,102,720
config-management-system   memory          byte-seconds   49,986,574,663,680

This output shows the total cloud resource usage attributed to the subtree rooted at namespace a, as well as the subtrees rooted at its descendant namespaces a1 and a2. Note that each row covers an entire subtree: the memory usage reported for a (6,315,303,690,240 byte-seconds) is the sum of the usage of a1 and a2 (1,355,268,587,520 + 4,960,035,102,720).

Example: usage of a single subtree

The following example shows the total usage of namespace SUBTREE_NAME and all its descendants. Replace SUBTREE_NAME with the name of the namespace at the root of the subtree; the sample output below corresponds to the subtree rooted at namespace a.

SELECT
  resource_name,
  usage.unit,
  SUM(usage.amount) AS usage_amount
FROM
  `PROJECT_NAME.DATASET_NAME.TABLE_NAME`,
  UNNEST(labels) AS label
WHERE
  label.key="SUBTREE_NAME.tree.hnc.x-k8s.io/depth"
GROUP BY
  resource_name,
  usage.unit

Sample output:

resource_name   unit           usage_amount
cpu             seconds        0.09
memory          byte-seconds   6,315,303,690,240

Example: usage of all namespaces in a subtree

The following example breaks down the usage of the subtree rooted at SUBTREE_NAME by individual namespace:

SELECT
  namespace,
  resource_name,
  SUM(usage.amount) AS usage_amount
FROM
  `PROJECT_NAME.DATASET_NAME.TABLE_NAME`,
  UNNEST(labels) AS label
WHERE
  label.key="SUBTREE_NAME.tree.hnc.x-k8s.io/depth"
GROUP BY
  namespace,
  resource_name

Sample output:

namespace   resource_name   usage_amount
a2          memory          4,960,035,102,720
a1          memory          1,355,268,587,520
a2          cpu             0
a1          cpu             0.09

Limitations of hierarchical monitoring

The following are limitations of hierarchical monitoring.

Changes to the hierarchy are ignored

Pod tree labels are added to Pods when they are created and are not modified after the Pod starts running. This means that Pods that were started before hierarchical monitoring was enabled do not receive Pod tree labels.

In addition, Pods whose hierarchy changes after they start (for example, when Hierarchy Controller is used to change the parent of a namespace) do not have their labels updated. Hierarchy modifications are typically rare, but if such a change causes a problem, modify the hierarchy and then restart all affected Pods.
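
For workloads managed by a Deployment, a rolling restart is one way to recreate the Pods so that they pick up the new tree labels. This is a general Kubernetes technique rather than a Hierarchy Controller feature; replace NAMESPACE with each affected namespace:

kubectl rollout restart deployment -n NAMESPACE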

Pods are still created even if labels cannot be applied

Hierarchical monitoring does not apply to Pods running in key system namespaces such as kube-system or hnc-system. However, the webhook configuration itself has no way to exclude these namespaces. Therefore, if Hierarchy Controller encounters a problem, Pod creation in all namespaces could be impacted.

As a result, rather than risk a cluster-wide outage, if Hierarchy Controller cannot process a Pod within two seconds, the webhook fails open and the Pod is created without the labels. You can monitor such failures through the Kubernetes API server by looking for failures of the podlabel.hierarchycontroller.configmanagement.gke.io mutating admission webhook.
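
To see the timeout and failure policy that apply to this webhook alongside the other webhooks registered on the cluster, you can use a JSONPath query; this is a sketch that relies only on standard fields of the MutatingWebhookConfiguration API:

kubectl get mutatingwebhookconfigurations -o jsonpath='{range .items[*].webhooks[*]}{.name}{"\t"}{.timeoutSeconds}{"\t"}{.failurePolicy}{"\n"}{end}'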

What's next