Monitoring integrity on Shielded VMs

This document describes how to use Cloud Monitoring to monitor the boot integrity of Shielded VM instances that have integrity monitoring enabled, identify the cause of an integrity validation failure, and update the integrity policy baseline.

Monitoring VM boot integrity by using Cloud Monitoring

Use Cloud Monitoring to view integrity validation events and set alerts for them, and Cloud Logging to review details of those events.

Viewing integrity validation events

To view the metrics for a monitored resource by using the Metrics Explorer, do the following:

In the Google Cloud console, go to the Metrics Explorer page.

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
In the toolbar of the Google Cloud console, select your Google Cloud project. For App Hub configurations, select the App Hub host project or the app-enabled folder's management project.
In the Metric element, expand the Select a metric menu, enter Boot Validation in the filter bar, and then use the submenus to select a specific resource type and metric:
1. In the Active resources menu, select VM instance.
2. In the Active metric categories menu, select Instance.
3. In the Active metrics menu, select Early Boot Validation or Late Boot Validation.
  - Early Boot Validation: Shows the pass/fail status of the early boot portion of the last boot sequence. Early boot is the boot sequence from the start of the UEFI firmware until it passes control to the bootloader.
  - Late Boot Validation: Shows the pass/fail status of the late boot portion of the last boot sequence. Late boot is the boot sequence from the bootloader until completion. This includes the loading of the operating system kernel.
4. Click Apply.
To add filters, which remove time series from the query results, use the Filter element.
To combine time series, use the menus on the Aggregation element. For example, to display the CPU utilization for your VMs, based on their zone, set the first menu to Mean and the second menu to zone.

All time series are displayed when the first menu of the Aggregation element is set to Unaggregated. The default settings for the Aggregation element are determined by the metric type you selected.
For quota and other metrics that report one sample per day, do the following:
1. In the Display pane, set the Widget type to Stacked bar chart.
2. Set the time period to at least one week.
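If you want to query the same metrics outside the console, for example through the Cloud Monitoring API, you need a filter that names the metric type. The following Python sketch only builds such a filter string and doesn't call any API. The metric type names and the helper name boot_validation_filter are assumptions that aren't stated in this document, so check them against the Compute Engine metrics list before you rely on them.

```python
from typing import Optional

# Assumed metric types for the two boot validation metrics. They are not
# taken from this document; verify them in the Compute Engine metrics list.
BOOT_VALIDATION_METRICS = {
    "early": "compute.googleapis.com/instance/integrity/early_boot_validation_status",
    "late": "compute.googleapis.com/instance/integrity/late_boot_validation_status",
}


def boot_validation_filter(phase: str, instance_id: Optional[str] = None) -> str:
    """Build a Cloud Monitoring filter for one boot validation metric.

    phase is "early" or "late". instance_id optionally narrows the filter
    to a single VM instance.
    """
    if phase not in BOOT_VALIDATION_METRICS:
        raise ValueError(f"phase must be 'early' or 'late', got {phase!r}")
    parts = [
        f'metric.type = "{BOOT_VALIDATION_METRICS[phase]}"',
        'resource.type = "gce_instance"',
    ]
    if instance_id is not None:
        parts.append(f'resource.labels.instance_id = "{instance_id}"')
    return " AND ".join(parts)


if __name__ == "__main__":
    print(boot_validation_filter("late", instance_id="1234567890"))
```

You can pass the resulting string as the filter of a time series query in whichever Monitoring client you use.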

Setting alerts on integrity validation events

To be notified when boot validation fails on a VM instance, set alerts on the values of the Early Boot Validation and Late Boot Validation metrics. For general information about alerting, see Introduction to Alerting. For settings that are specific to each metric, see the following:

For early boot validation alerting policy settings, see Compute Engine early boot validation.
For late boot validation alerting policy settings, see Compute Engine late boot validation.

Viewing integrity validation event details

In the Google Cloud console, go to the VM instances page.
Click the instance ID to open the VM instance details page.
Under Logs, click Cloud Logging.
Locate the earlyBootReportEvent or lateBootReportEvent log entry that you want to review.
Expand the log entry > jsonPayload > earlyBootReportEvent or lateBootReportEvent, as appropriate. Within that section, the policyEvaluationPassed element identifies whether the given section of the boot sequence passed verification against the integrity policy baseline.
Expand the actualMeasurements section and the numbered elements within it to see the platform configuration register (PCR) values saved from the latest boot sequence. The PCR values are saved in the value elements within the numbered elements. The PCR values identify the boot components and component load order used by the latest boot sequence, and are compared to the integrity policy baseline to determine if there has been any change in the VM instance boot sequence. For more information about what the PCRs represent, see Integrity monitoring events.
Expand the policyMeasurements section to see the PCR values saved for the integrity policy baseline.
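If you export a log entry as JSON, you can read the same fields in code. The following Python sketch pulls the pass/fail result and the PCR values out of one entry. It assumes that the numbered elements shown in the console are exported as a JSON list and that each element has a value field, as described above. The function name summarize_boot_report and the sample values are hypothetical.

```python
from typing import Any, Dict, Optional

# The two report events named in this document.
REPORT_EVENT_KEYS = ("earlyBootReportEvent", "lateBootReportEvent")


def summarize_boot_report(entry: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the event name, pass/fail result, and PCR values of a log entry.

    Returns None if the entry contains neither report event.
    """
    payload = entry.get("jsonPayload", {})
    for key in REPORT_EVENT_KEYS:
        event = payload.get(key)
        if event is None:
            continue
        return {
            "event": key,
            "passed": bool(event.get("policyEvaluationPassed", False)),
            # Assumption: measurements are a list of objects with a "value" field.
            "actual": [m.get("value") for m in event.get("actualMeasurements", [])],
            "policy": [m.get("value") for m in event.get("policyMeasurements", [])],
        }
    return None


if __name__ == "__main__":
    # Placeholder entry with made-up values, for illustration only.
    sample = {
        "jsonPayload": {
            "earlyBootReportEvent": {
                "policyEvaluationPassed": True,
                "actualMeasurements": [{"value": "AAAA"}, {"value": "BBBB"}],
                "policyMeasurements": [{"value": "AAAA"}, {"value": "BBBB"}],
            }
        }
    }
    print(summarize_boot_report(sample))
```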

Automating responses to integrity validation events

You can automate responses to boot validation events by exporting the Cloud Logging logs and processing them in another service like Cloud Run functions. For more information, see Routing and storage overview and Automating responses to integrity validation failures.
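As a rough illustration of the processing step, the following Python sketch assumes that the logs are routed to a Pub/Sub topic, so each message carries a base64-encoded JSON log entry. It decodes one message and reports whether it describes a failed boot validation. The function name failed_boot_validation is hypothetical, and the reliance on the resource.labels fields instance_id and zone is an assumption. The function entry point and the action to take on failure are deliberately left out.

```python
import base64
import json
from typing import Dict, Optional

REPORT_EVENT_KEYS = ("earlyBootReportEvent", "lateBootReportEvent")


def failed_boot_validation(pubsub_data_b64: str) -> Optional[Dict[str, str]]:
    """Decode a Pub/Sub message payload and describe a failed validation.

    Returns a dict with the event name, instance ID, and zone if the log
    entry reports policyEvaluationPassed as false. Returns None otherwise.
    """
    entry = json.loads(base64.b64decode(pubsub_data_b64).decode("utf-8"))
    payload = entry.get("jsonPayload", {})
    for key in REPORT_EVENT_KEYS:
        event = payload.get(key)
        if event is not None and event.get("policyEvaluationPassed") is False:
            # Assumption: the entry's monitored resource labels identify the VM.
            labels = entry.get("resource", {}).get("labels", {})
            return {
                "event": key,
                "instance_id": labels.get("instance_id", ""),
                "zone": labels.get("zone", ""),
            }
    return None


if __name__ == "__main__":
    # Made-up entry, for illustration only.
    entry = {
        "resource": {"labels": {"instance_id": "1234567890", "zone": "us-central1-a"}},
        "jsonPayload": {"lateBootReportEvent": {"policyEvaluationPassed": False}},
    }
    data = base64.b64encode(json.dumps(entry).encode("utf-8")).decode("ascii")
    print(failed_boot_validation(data))
```

A caller could use the returned instance ID and zone to notify an operator or to stop the instance, depending on your policy.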

Determining the cause of boot integrity validation failure

In the Google Cloud console, go to the VM instances page.
Click the instance ID to open the VM instance details page.
Under Logs, click Cloud Logging.
Locate the most recent earlyBootReportEvent and lateBootReportEvent log entries and see which one has a policyEvaluationPassed value of false.
Expand the log entry > jsonPayload > earlyBootReportEvent or lateBootReportEvent, as appropriate.
Expand the actualMeasurements and policyMeasurements sections and the numbered elements within them to see the platform configuration register (PCR) values saved from the latest boot sequence and the integrity policy baseline, respectively. The PCR values identify the boot components and component load order used by the latest boot sequence and the integrity policy baseline.
Compare the PCR values in the actualMeasurements and policyMeasurements sections to determine where the latest boot sequence diverged from the integrity policy baseline. Any PCR whose value differs between the two sections identifies the change that caused the validation failure. Be aware that the element numbers in these sections seldom correspond to the PCR numbers, and similarly numbered elements in actualMeasurements and policyMeasurements might represent different PCRs. For example, in the early boot sequence for both Windows and Linux, the 3 element in actualMeasurements and the 2 element in policyMeasurements both represent PCR7.

Important: The elements representing PCR13 and PCR14 in the Windows boot sequence don't appear in policyMeasurements until after the first reboot following VM instance creation. During that reboot, those PCR values are captured and added to the integrity policy baseline. Until then, boot integrity validation uses only PCR4, PCR7, and PCR11 in policyMeasurements.
Check Integrity monitoring events to determine what the changed PCR represents, and investigate whether that is an expected change.
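The comparison in the preceding steps can be sketched in Python. Because element numbers don't line up between the two sections, the sketch matches measurements by PCR identifier instead of by position. It assumes that each measurement carries a field that names its PCR. The field name pcrNum, the function names, and the sample values are assumptions, so adjust them to match your log entries.

```python
from typing import Any, Dict, List


def index_by_pcr(measurements: List[Dict[str, Any]], pcr_key: str = "pcrNum") -> Dict[str, str]:
    """Map each PCR identifier to its measured value."""
    return {m[pcr_key]: m["value"] for m in measurements}


def changed_pcrs(event: Dict[str, Any], pcr_key: str = "pcrNum") -> List[str]:
    """List the baseline PCRs whose latest value differs from the baseline.

    Only PCRs present in policyMeasurements are checked, because PCRs that
    aren't in the baseline yet have nothing to be compared against.
    """
    actual = index_by_pcr(event.get("actualMeasurements", []), pcr_key)
    policy = index_by_pcr(event.get("policyMeasurements", []), pcr_key)
    return sorted(pcr for pcr, value in policy.items() if actual.get(pcr) != value)


if __name__ == "__main__":
    # Made-up measurements, for illustration only. Note that PCR_7 sits at a
    # different position in each list.
    event = {
        "policyEvaluationPassed": False,
        "actualMeasurements": [
            {"pcrNum": "PCR_0", "value": "aaaa"},
            {"pcrNum": "PCR_4", "value": "bbbb"},
            {"pcrNum": "PCR_7", "value": "XXXX"},
        ],
        "policyMeasurements": [
            {"pcrNum": "PCR_4", "value": "bbbb"},
            {"pcrNum": "PCR_7", "value": "cccc"},
        ],
    }
    print(changed_pcrs(event))
```

Look up each PCR that the function returns in Integrity monitoring events to see which boot component it covers.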

Updating the integrity policy baseline

The initial integrity policy baseline is derived from the implicitly trusted boot image when the instance is created. Updating the baseline replaces it with measurements taken from the current instance configuration. The VM instance must be running when you update the baseline.

You should update the baseline after any planned boot-specific changes in the instance configuration, like kernel updates or kernel driver installation, as these will cause integrity validation failures. If you have an unexpected integrity validation failure, you should investigate the reason for the failure and be prepared to stop the instance if necessary.

You must have the setShieldedInstanceIntegrityPolicy permission (compute.instances.setShieldedInstanceIntegrityPolicy) to be able to update the integrity policy baseline.

Use the following procedure to update the integrity policy baseline.

gcloud

Update the VM instance's integrity policy baseline by using the compute instances update command with the --shielded-learn-integrity-policy flag.

The following example resets the integrity policy baseline for a VM instance:

gcloud compute instances update INSTANCE_NAME \
    --zone=ZONE \
    --shielded-learn-integrity-policy

Replace the following:

INSTANCE_NAME: Name of the VM.
ZONE: Zone where the VM exists.

REST

Update the VM instance's integrity policy baseline by using the updateAutoLearnPolicy request body item with the setShieldedInstanceIntegrityPolicy method.

The following example resets the integrity policy baseline for a VM instance.

PATCH https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances/INSTANCE_NAME/setShieldedInstanceIntegrityPolicy?key=YOUR_API_KEY
{
  "updateAutoLearnPolicy": true
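}

Replace the following:

PROJECT_ID: ID of the project that contains the VM.
ZONE: Zone where the VM exists.
INSTANCE_NAME: Name of the VM.
YOUR_API_KEY: Your API key.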
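If you call the REST method from a script, the request can be assembled as in the following Python sketch. The sketch only builds the request object and doesn't send it. It assumes that you authenticate with an OAuth 2.0 access token in an Authorization header instead of the key query parameter. The function name build_update_baseline_request and its parameters are hypothetical.

```python
import json
import urllib.request


def build_update_baseline_request(
    project_id: str, zone: str, instance_name: str, access_token: str
) -> urllib.request.Request:
    """Build (but don't send) the PATCH request that relearns the baseline."""
    url = (
        "https://compute.googleapis.com/compute/v1/"
        f"projects/{project_id}/zones/{zone}/instances/{instance_name}"
        "/setShieldedInstanceIntegrityPolicy"
    )
    body = json.dumps({"updateAutoLearnPolicy": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            # Assumption: bearer-token authentication instead of an API key.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_update_baseline_request("my-project", "us-central1-a", "my-vm", "TOKEN")
    print(req.get_method(), req.full_url)
    # To send the request, pass it to urllib.request.urlopen(req).
```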