Monitor count of processes on a VM

This document describes how to use the Google Cloud console to create an alerting policy that monitors the number of processes running on your virtual machines (VMs) that meet conditions you specify. This type of alerting policy is sometimes called a process-health alerting policy. For example, you can count the number of processes started by the root user. You can also count the number of processes whose invocation command contains a specific string. An alerting policy can notify you when the number of processes is more than, or less than, a threshold. For information about which processes can be monitored, see Processes that are monitored.

This content does not apply to log-based alerting policies. For information about log-based alerting policies, which notify you when a particular message appears in your logs, see Monitoring your logs.

Before you begin

  1. To get the permissions that you need to create and modify alerting policies by using the Google Cloud console, ask your administrator to grant you the Monitoring Editor (roles/monitoring.editor) IAM role on your project. For more information about granting roles, see Manage access.

    You might also be able to get the required permissions through custom roles or other predefined roles.

    For more information about Cloud Monitoring roles, see Control access with Identity and Access Management.

  2. Ensure that you're familiar with the general concepts of alerting policies. For information about these topics, see Alerting overview.

  3. Configure the notification channels that you want to use to receive alerts. For redundancy, we recommend that you create multiple types of notification channels. For information about these steps, see Create and manage notification channels.

  4. Ensure that you've installed the Ops Agent on the VMs that you want to monitor. For more information, see Google Cloud Observability agents.

Create alerting policy

If you create this type of alerting policy by using the Cloud Monitoring API instead of the Google Cloud console, the filter expression in the condition must specify a time series selector. For an example of a JSON file that specifies this selector, see Process-health policy.
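As a rough illustration of the API route, the following Python sketch assembles a request body shaped like an AlertPolicy resource with a single threshold condition. The filter string is copied from the console example later in this document and might need adjusting for API use. The display names, the threshold of 1, the `COMPARISON_LT` comparison, and the 300-second duration are illustrative assumptions, and `build_policy_body` is a hypothetical helper rather than part of any Google library. The sketch only prints JSON; it doesn't call the API.

```python
import json

# Filter text taken from this document's console example: count processes
# whose command line contains "nginx" on Compute Engine VM instances.
PROCESS_FILTER = (
    'select_process_count("monitoring.regex.full_match(\\".*nginx.*\\")")\n'
    'resource.type="gce_instance"'
)


def build_policy_body(filter_text, threshold=1, duration_seconds=300):
    """Return a dict shaped like an AlertPolicy with one threshold condition.

    The display names and default numbers are placeholders (assumptions),
    not recommended values.
    """
    return {
        "displayName": "Process count (example)",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "nginx process count below threshold",
                "conditionThreshold": {
                    "filter": filter_text,
                    # Alert when the count drops below the threshold.
                    "comparison": "COMPARISON_LT",
                    "thresholdValue": threshold,
                    "duration": f"{duration_seconds}s",
                },
            }
        ],
    }


if __name__ == "__main__":
    # Print the body so that it can be reviewed before it is sent anywhere.
    print(json.dumps(build_policy_body(PROCESS_FILTER), indent=2))
```

Keeping the filter as a separate constant makes it easier to compare against the text you would enter in Direct filter mode.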

To create an alerting policy that monitors the count of processes running on a VM by using the Google Cloud console, do the following:

  1. In the navigation panel of the Google Cloud console, select Monitoring, and then select Alerting:

    Go to Alerting

  2. Select Create policy.
  3. Click the help icon (?) in the Select metric section header, and then select Direct filter mode in the tooltip.

  4. Enter a Monitoring filter.

    For example, to count the processes whose invocation command includes the string nginx and that are running on Compute Engine VM instances, enter the following:

    select_process_count("monitoring.regex.full_match(\".*nginx.*\")")
    resource.type="gce_instance"
    

    For syntax information see the following resources:

  5. Complete the alerting policy. You must configure the condition trigger, notifications, documentation, and policy name, and then click Create policy.

    For more information, see Create metric-threshold alerting policies.
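If you enter filters like the one in step 4 for several different commands, generating the text can help avoid quoting mistakes, because the regular expression sits inside a quoted string that is itself inside a quoted string. The following Python sketch reproduces the two-line form of that example. The function name `process_count_filter` is hypothetical and isn't part of any Google library. Because this document doesn't describe the escaping rules for patterns that contain double quotes or backslashes, the sketch rejects such patterns instead of guessing.

```python
def process_count_filter(command_regex, resource_type="gce_instance"):
    """Build filter text in the two-line form shown in the console example.

    Assumption: only patterns without double quotes or backslashes are
    supported, so no nested escaping has to be invented here.
    """
    if '"' in command_regex or "\\" in command_regex:
        raise ValueError(
            "pattern needs nested escaping that this sketch doesn't handle"
        )
    # The inner quotes are written as \" because they are nested inside the
    # outer double-quoted argument of select_process_count.
    selector = (
        "select_process_count("
        f'"monitoring.regex.full_match(\\"{command_regex}\\")")'
    )
    return f'{selector}\nresource.type="{resource_type}"'


if __name__ == "__main__":
    # Reproduces the nginx example from step 4.
    print(process_count_filter(".*nginx.*"))
```

Paste the printed text into the Direct filter mode field, and then review it in the console before you save the policy.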

Processes that are monitored

Not all processes running in your system can be monitored by a process-health condition. This condition selects processes to be monitored by using a regular expression that is applied to the command line that invoked the process. When the command-line field isn't available, the process can't be monitored.

One way to determine if a process can be monitored by a process-health condition is to look at the active processes. For example, on a Linux system, you can use the ps command:

    ps aux | grep nfs
    USER      PID  %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root      1598  0.0  0.0      0     0 ?        S<   Oct25   0:00 [nfsd4]
    root      1639  0.0  0.0      0     0 ?        S    Oct25   2:33 [nfsd]
    root      1640  0.0  0.0      0     0 ?        S    Oct25   2:36 [nfsd]

When a COMMAND entry is wrapped with square brackets, for example [nfsd], the command-line information for the process isn't available. In this situation, you can't use Cloud Monitoring to monitor the process.
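To apply the square-bracket rule to a longer process listing, you could filter the `ps` output with a short script. The following Python sketch is a local approximation only. It assumes the 11-column `ps aux` layout shown above, with COMMAND as the last column. The nginx row in the sample data is hypothetical and was added for illustration. Python's `re` module also isn't the regular-expression engine that Cloud Monitoring uses, so treat a local match as a hint rather than a guarantee.

```python
import re

# Sample text in `ps aux` layout. The two nfsd rows come from the example
# above; the nginx row is a hypothetical addition for illustration.
SAMPLE_PS_OUTPUT = """\
USER      PID  %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1598  0.0  0.0      0     0 ?        S<   Oct25   0:00 [nfsd4]
root      1639  0.0  0.0      0     0 ?        S    Oct25   2:33 [nfsd]
www-data  2101  0.0  0.1  55180  3412 ?        S    Oct25   0:12 nginx: worker process
"""


def monitorable_commands(ps_output):
    """Return COMMAND values that aren't wrapped in square brackets."""
    commands = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so that spaces in COMMAND survive.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip blank or short lines and the header row
        command = fields[10].strip()
        if command.startswith("[") and command.endswith("]"):
            continue  # command-line information isn't available
        commands.append(command)
    return commands


def count_matching(ps_output, pattern):
    """Count monitorable commands that the pattern matches in full."""
    return sum(
        1
        for command in monitorable_commands(ps_output)
        if re.fullmatch(pattern, command)
    )


if __name__ == "__main__":
    print(monitorable_commands(SAMPLE_PS_OUTPUT))
    print(count_matching(SAMPLE_PS_OUTPUT, r".*nginx.*"))
```

A count of zero for a pattern that you expected to match, such as `.*nfsd.*` in this sample, suggests that the process appears only in bracketed form and therefore can't be selected by a process-health condition.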