Starting with Milestone 77, Container-Optimized OS includes the Node Problem Detector agent. You can use this feature to monitor the system health of COS instances. Node Problem Detector monitors the instance health and reports health-related metrics to Cloud Monitoring, including capacity and error metrics that you can then visualize with Google Cloud's operations suite dashboards. Collected metrics from the default configuration are free. Google will use aggregated metrics to understand node problems and improve the reliability of Container-Optimized OS.
The agent is pre-configured with the set of metrics to export. Customizing reported metrics for the built-in agent is not supported at this time. Node Problem Detector is open-source software. You can review its source code and configurations in their respective source repositories.
Enabling health monitoring
The instance configuration documentation explains the basics of configuring a Container-Optimized OS instance. You can use cloud-init to enable health monitoring with the following cloud-config:
#cloud-config

bootcmd:
- systemctl start node-problem-detector
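If you provision instances from a script, the cloud-config above can be kept as a string and passed to gcloud as the instance's user-data metadata. The following is a minimal Python sketch, not an official procedure. The helper name build_create_command, the instance name my-cos-instance, the file name npd-cloud-config.yaml, and the choice of the cos-stable image family are illustrative assumptions. The sketch only builds and prints the command; it does not run it or write any file.

```python
import shlex

# The cloud-config from this section, as it would be stored in a local file.
CLOUD_CONFIG = """\
#cloud-config

bootcmd:
- systemctl start node-problem-detector
"""


def build_create_command(instance_name: str, config_path: str) -> list[str]:
    """Build (but do not run) a gcloud command that creates a COS instance.

    Assumption: the cloud-config has been saved at config_path and is passed
    through the user-data metadata key, which cloud-init reads at boot.
    """
    return [
        "gcloud", "compute", "instances", "create", instance_name,
        "--image-family", "cos-stable",   # assumed image family
        "--image-project", "cos-cloud",
        "--metadata-from-file", f"user-data={config_path}",
    ]


if __name__ == "__main__":
    # Hypothetical names, used for illustration only.
    command = build_create_command("my-cos-instance", "npd-cloud-config.yaml")
    print(CLOUD_CONFIG)
    print(shlex.join(command))
```

Save the contents of CLOUD_CONFIG to the file named in the command before running the printed command yourself.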
Viewing the collected metrics
Node Problem Detector reports a list of metrics against a Compute Engine instance monitored resource. The metrics are documented in the Monitoring metrics list, prefixed with compute.googleapis.com/guest/. You can view the collected metrics in Monitoring Metrics Explorer:
In the Google Cloud Console, go to Monitoring.
In the Monitoring navigation pane, click Metrics explorer.
For the resource type, select Compute Engine VM instance.
Select a metric, for example "Problem Count".
You should see charts and statistics on the right side. To view the result for a specific Container-Optimized OS instance, set the filter to "instance_id=[INSTANCE_ID]", replacing [INSTANCE_ID] with the ID of the desired instance.
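The same selection can be written as a Cloud Monitoring filter string, which is useful when you query metrics programmatically instead of through Metrics Explorer. The following Python sketch only assembles filter strings and makes no API calls. The function names are hypothetical, and the metric suffix system/problem_count is an assumption about which metric backs "Problem Count"; confirm the exact name in the Monitoring metrics list.

```python
# Prefix stated in this guide for Node Problem Detector metrics.
METRIC_PREFIX = "compute.googleapis.com/guest/"


def metric_prefix_filter(prefix: str = METRIC_PREFIX) -> str:
    """Filter matching every metric type that starts with the given prefix."""
    return f'metric.type = starts_with("{prefix}")'


def instance_metric_filter(metric_suffix: str, instance_id: str) -> str:
    """Filter for one guest metric on one Compute Engine VM instance.

    gce_instance is the monitored resource type for a Compute Engine
    VM instance; instance_id is the numeric ID of the instance.
    """
    metric_type = METRIC_PREFIX + metric_suffix
    return (
        f'metric.type = "{metric_type}" AND '
        'resource.type = "gce_instance" AND '
        f'resource.labels.instance_id = "{instance_id}"'
    )


if __name__ == "__main__":
    print(metric_prefix_filter())
    # "system/problem_count" is an assumed suffix; replace [INSTANCE_ID].
    print(instance_metric_filter("system/problem_count", "[INSTANCE_ID]"))
```

The first filter is intended for listing the available metrics, and the second for narrowing a time series query to a single instance.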
Disabling health monitoring
The feature is disabled by default at boot time. If you have already enabled the
feature but want to disable it now, remove the
systemctl start node-problem-detector step from your cloud-config
and then reboot the Container-Optimized OS instance.
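If the cloud-config is maintained by a script, the removal can be automated. The following Python sketch is a simple line-based filter under stated assumptions: the function name remove_npd_step is hypothetical, and the step must appear exactly as the list item shown in this guide. Quoted or list-form variants of the command are not handled. If removing the step leaves bootcmd with no entries, the sketch drops the empty bootcmd key as well.

```python
# The list item added when health monitoring was enabled.
NPD_STEP = "- systemctl start node-problem-detector"


def remove_npd_step(cloud_config: str) -> str:
    """Return the cloud-config text without the Node Problem Detector step."""
    # Drop every line that is exactly the start step (ignoring indentation).
    lines = [ln for ln in cloud_config.splitlines() if ln.strip() != NPD_STEP]

    result = []
    for index, line in enumerate(lines):
        if line.strip() == "bootcmd:":
            # Look at the next non-blank line to see if any list item remains.
            following = next((ln for ln in lines[index + 1:] if ln.strip()), "")
            if not following.lstrip().startswith("- "):
                continue  # bootcmd is now empty, so drop the key itself
        result.append(line)

    # Normalize to exactly one trailing newline.
    return "\n".join(result).rstrip("\n") + "\n"


if __name__ == "__main__":
    original = (
        "#cloud-config\n\n"
        "bootcmd:\n"
        "- echo hello\n"
        "- systemctl start node-problem-detector\n"
    )
    print(remove_npd_step(original))
```

After updating the instance's cloud-config with the result, reboot the instance for the change to take effect.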