Observability overview
Observability refers to system monitoring, logging, alerting, and other tracking
information for viewing the status and health of infrastructure and services.
Observability components of Google Distributed Cloud (GDC) air-gapped appliance collect logs and
metrics that become visible in Grafana dashboards and that you can query to spot
operational issues.
Platform Administrators can use the Observability platform to monitor system
and user clusters and visualize logs and metrics in the Grafana user interface
(UI). Application Operators can collect monitoring and operational data in the
form of logs, metrics, and events for their applications.
The Observability platform deploys its stack components in the admin and user
clusters. The Grafana instance for Platform Administrators includes
organization-level metrics, such as CPU utilization and storage consumption, as
well as alerts, logs, and metrics from the operable components of the admin,
system, and user clusters in GDC.
The Grafana instance for Application Operators does not include any default
dashboards or logs for your project. Dashboards that you create are visible
only if metrics collection is enabled for your project.
Platform components
The GDC monitoring and logging stacks include open source services as part of the Observability platform. These services collect logs from Kubernetes Pods, bare metal machines, network switches, and storage appliances.
Review the following table for details on each Observability component.
| Component | Type | Cluster | Description |
| --- | --- | --- | --- |
| anthos-prometheus-k8s | StatefulSet | System only | Prometheus (https://prometheus.io/docs/introduction/overview): A time-series database for collecting and storing metrics and evaluating alerts. It adds labels as key-value pairs and collects metrics from Kubernetes nodes, Pods, bare metal machines, network switches, and storage appliances. Metrics from a user cluster are stored in that same cluster, and metrics from all clusters are aggregated in the admin cluster. |
| grafana | StatefulSet | System only | Grafana (https://grafana.com/docs/grafana/latest/): A user interface for visualizing dashboards of metrics and alerts. Use it to view metrics that Prometheus collects and to query logs from Loki. |
| alertmanager | StatefulSet | System only | Alertmanager (https://prometheus.io/docs/alerting/latest/alertmanager/): A user-defined manager that sends alerts when logs or metrics indicate that system components are failing or are not operating normally. It manages the routing, silencing, and aggregation of Prometheus alerts. |
| | | | Loki: A secondary instance for collecting long-term logs necessary for audit purposes. It aggregates logs from all clusters. |
| anthos-log-forwarder | DaemonSet | All clusters | Fluent Bit (https://docs.fluentbit.io/manual): A log processor that gathers logs from various components and locations, processes them, and forwards them to Loki. It runs on every node of all clusters. |
| anthos-audit-logs-forwarder | DaemonSet | All clusters | Fluent Bit: A secondary instance for loading longer-lived logs for audit purposes. |
| audit-log-failure-detector | DaemonSet | All clusters | A GDC component that detects and reports audit log collection failures. It runs on every node of all clusters. |
| logmon-operator | Deployment | All clusters | The GDC Logmon operator that deploys Observability stack components. |
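As an illustration of how the metrics and logs described in the table can be queried, the following Python sketch builds request URLs for the upstream Prometheus instant-query endpoint (`/api/v1/query`) and the upstream Loki range-query endpoint (`/loki/api/v1/query_range`). This page does not state whether or how these endpoints are exposed in GDC, so the base URLs, label names, and function names are assumptions for illustration only. In practice you would typically run the same PromQL and LogQL expressions from Grafana.

```python
from urllib.parse import urlencode


def prometheus_instant_query_url(base_url, promql, time=None):
    """Build a URL for the upstream Prometheus instant-query API.

    base_url is a hypothetical Prometheus address, not a GDC endpoint.
    """
    params = {"query": promql}
    if time is not None:
        # Optional evaluation timestamp (Unix seconds or RFC 3339).
        params["time"] = str(time)
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def loki_range_query_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a URL for the upstream Loki range-query API.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    base_url is a hypothetical Loki address, not a GDC endpoint.
    """
    params = {
        "query": logql,
        "start": str(start_ns),
        "end": str(end_ns),
        "limit": str(limit),
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder hosts and label values; replace with values from your environment.
    print(prometheus_instant_query_url("https://prometheus.example", "up"))
    print(loki_range_query_url("https://loki.example", '{namespace="demo"}', 1, 2))
```

The sketch only constructs URLs. Authentication, TLS settings, and sending the request are left out because they depend on how your environment is set up.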
GDC also uses custom resources that GKE Enterprise developed for configuring logging and monitoring. These custom resources let you configure Prometheus scrape targets and alerting rules, Alertmanager configurations, Grafana dashboards, and log scrape targets.
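To illustrate what an alerting rule expresses, the following Python sketch models, in simplified form, how an upstream Prometheus rule with a `for` duration moves between the inactive, pending, and firing states. It is not the GDC custom resource schema, which this page does not describe. The `Rule` class, the `alert_state` function, and the threshold-style condition are hypothetical and stand in for a real PromQL expression.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    threshold: float    # condition holds when the sampled value exceeds this
    for_seconds: float  # how long the condition must hold before firing


def alert_state(rule, samples):
    """Return the alert state after processing all evaluations.

    samples is a list of (timestamp_seconds, value) pairs in ascending
    time order, one pair per rule evaluation.
    """
    active_since = None
    state = "inactive"
    for ts, value in samples:
        if value > rule.threshold:
            if active_since is None:
                # Condition just became true: start the pending timer.
                active_since = ts
            held_for = ts - active_since
            state = "firing" if held_for >= rule.for_seconds else "pending"
        else:
            # Condition no longer holds: the alert resets.
            active_since = None
            state = "inactive"
    return state


if __name__ == "__main__":
    rule = Rule(name="HighCpuUtilization", threshold=0.9, for_seconds=120)
    print(alert_state(rule, [(0, 0.95), (60, 0.97), (120, 0.96)]))
```

In this model, a brief dip below the threshold resets the timer, which is why a `for` duration helps suppress alerts on short spikes.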
Important: To access the URLs listed on this page, you must connect to the internet. The URLs are provided for use when you have such access.
Last updated 2025-09-04 UTC.