
Use Process Metrics for troubleshooting and resource attribution


When you are experiencing an issue with your application or service, having deep visibility into both the infrastructure and the software powering your apps and services is critical. Most monitoring services provide insights at the Virtual Machine (VM) level, but few go further. To get a full picture of the state of your application or service, you need to know what processes are running on your infrastructure. That visibility into the processes running on your VMs is provided out of the box by the new Ops Agent and made available by default in Cloud Monitoring. Today we will cover how to access process metrics and why you should start monitoring them. 

Better visibility with process metrics

Process metrics include CPU, memory, I/O, thread counts, and more for every running process and service on your VMs. When the Ops Agent or the Cloud Monitoring agent is installed, these metrics are captured at 60-second intervals and sent to Cloud Monitoring so you can visualize, analyze, track, and alert on them. A single VM may run tens or hundreds of processes, and you may have tens of thousands running across your fleet of VMs.
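If you want to pull these metrics programmatically, a good first step is a Cloud Monitoring filter that selects one process metric for one VM. The sketch below only builds the filter string. The two metric type names are assumptions based on the agent's documented metric prefix, and the helper name and instance ID are hypothetical, so check them against the metrics list for your agent version.

```python
# Assumed metric types reported by the agent; verify them against the
# agent metrics list before relying on them.
PROCESS_METRICS = {
    "cpu": "agent.googleapis.com/processes/cpu_time",
    "memory": "agent.googleapis.com/processes/rss_usage",
}


def process_metric_filter(kind: str, instance_id: str) -> str:
    """Build a Cloud Monitoring filter for one process metric on one VM.

    `kind` is a key of PROCESS_METRICS; `instance_id` is the numeric ID
    of the Compute Engine instance.
    """
    metric_type = PROCESS_METRICS[kind]
    return (
        f'metric.type = "{metric_type}" '
        f'AND resource.type = "gce_instance" '
        f'AND resource.labels.instance_id = "{instance_id}"'
    )


if __name__ == "__main__":
    # "1234567890" is a placeholder instance ID, not a real VM.
    print(process_metric_filter("memory", "1234567890"))
```

You could paste the resulting string into a time series query or pass it to a Monitoring API client of your choice.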

As a developer, you may only care about seeing inside a single VM to troubleshoot and identify memory leaks or the source of performance issues.
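To make the memory leak case concrete, here is a minimal sketch of one way to flag a suspicious process once you have exported its memory samples. It assumes one resident memory reading per minute (matching the 60-second collection interval) and fits a straight line through them. The function names and the 1 MB per minute threshold are illustrative choices, not part of Cloud Monitoring.

```python
from typing import Sequence


def growth_per_minute(rss_bytes: Sequence[float]) -> float:
    """Least-squares slope of memory samples taken one minute apart."""
    n = len(rss_bytes)
    if n < 2:
        return 0.0  # not enough data to estimate a trend
    mean_x = (n - 1) / 2
    mean_y = sum(rss_bytes) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(rss_bytes))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(rss_bytes: Sequence[float], min_growth: float = 1_000_000) -> bool:
    """Flag a process whose memory grows by at least `min_growth` bytes per minute."""
    return growth_per_minute(rss_bytes) >= min_growth


if __name__ == "__main__":
    # Hypothetical samples: steady growth of 2 MB per minute.
    samples = [50_000_000 + 2_000_000 * minute for minute in range(10)]
    print(looks_like_leak(samples))
```

A steady upward slope is only a hint. Caches that warm up and then plateau produce the same pattern over short windows, so a longer window gives a more trustworthy signal.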

As an operator or IT admin, you may be interested in aggregate resource consumption, building baseline views of compute, storage, and networking usage across your VM fleet. Then, when consumption deviates from those baselines, you will know it is time to investigate your systems.
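A baseline check can be as simple as comparing the latest reading with the mean and spread of recent history. The sketch below is one possible rule, assuming you already have a list of past readings for a metric. The function name and the three-standard-deviation default are illustrative assumptions, and in practice you might express the same idea as an alerting policy instead.

```python
from statistics import mean, stdev
from typing import Sequence


def exceeds_baseline(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """Return True when `current` is more than `k` sample standard
    deviations above the mean of `history`."""
    if len(history) < 2:
        return False  # stdev needs at least two points
    return current > mean(history) + k * stdev(history)


if __name__ == "__main__":
    # Hypothetical CPU utilization readings (percent) for one process.
    history = [10, 12, 11, 9, 10, 11, 10, 12]
    print(exceeds_baseline(history, 25))  # well above the usual range
    print(exceeds_baseline(history, 11))  # within the usual range
```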

Built for scale and ease of use

Cloud Monitoring is built on the same advanced backend that powers metrics across Google. This proven scalability means your metrics ingestion will be supported despite the extremely high cardinality. Additionally, our agents do not require any config file changes to turn on process metric monitoring.

Lastly, our goal is to provide you the observability and telemetry data where, and when, you need it. So, like the rest of the operations suite, we deliver process metrics in the context of your infrastructure, directly in the VM admin console.

Navigating to a single VM’s in-context process monitoring in GCE

The navigation is simple. Once you have the Ops Agent or the Cloud Monitoring agent installed in your VMs:

  1. Go to the Compute Engine console page and click on VM Instances 

  2. Select the VM that you want to investigate

  3. In the navigation menu on the top, click Observability

  4. Click on Metrics

  5. Lastly, click on Processes

In the window on the right you will see a chart and a table with all of the processes in your VM. You can also filter by time frame and sort by name or value. You do not need to do anything, other than have the agent installed, for processes to be detected and displayed.

Fleet-wide metrics monitoring

Cloud Monitoring gives you a look across your fleet of VMs so you can identify the aggregated usage of resources by processes. This level of broad, yet granular, insight can drive your decisions around which software to run or how many VMs you need to optimally power your apps and services. For example, if admins find that certain processes are slowing down a large number of VMs, a cost-savings analysis may show that many less powerful VMs can be replaced by fewer, more capable ones.
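If you export process data for your own analysis, the same fleet-wide rollup is straightforward to reproduce. The sketch below assumes each record is a (VM name, process name, memory in bytes) tuple. That record shape, the function name, and the sample values are hypothetical and do not reflect the Cloud Monitoring export format.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Sample = Tuple[str, str, int]  # (vm_name, process_name, rss_bytes)


def memory_by_process(samples: Iterable[Sample]) -> List[Tuple[str, int]]:
    """Sum memory per process name across all VMs, largest total first."""
    totals = defaultdict(int)
    for _vm, process, rss_bytes in samples:
        totals[process] += rss_bytes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical readings from two VMs.
    fleet = [
        ("vm-a", "nginx", 200),
        ("vm-b", "nginx", 300),
        ("vm-a", "java", 400),
    ]
    for process, total in memory_by_process(fleet):
        print(process, total)
```

Ranking processes by their fleet-wide total is what makes consolidation candidates visible: a process that is modest on any single VM can still dominate once it is summed across hundreds of them.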

To get this fleet-wide view:

  1. Navigate to Cloud Monitoring 

  2. Click Dashboards in the left menu

  3. In the All Dashboards list, click on VM Instances

  4. Towards the top of the window, click on Processes

This view provides many charts detailing the processes running across your fleet of VMs.

The new Cloud Monitoring VM Fleet-wide Process view in the VM Instance Dashboard

Get started today

To start identifying and monitoring your process metrics, you must first install the Ops Agent or already have the legacy Cloud Monitoring agent installed. Once that is complete, process metrics data will automatically be ingested into Cloud Monitoring and shown in the VM admin console.

If you have any questions, or to join the conversation with other developers, operators, DevOps, and SREs, visit the Cloud Operations page in the Google Cloud Community.