Monitoring Integrity on Shielded VM Instances

Use this topic to learn how to monitor the boot integrity of Shielded VM instances using Stackdriver, identify the cause of an integrity validation failure, and update the integrity policy baseline.

Before you begin

Monitoring VM boot integrity by using Stackdriver

Use Stackdriver Monitoring to view integrity validation events and set alerts on them, and Stackdriver Logging to review details of those events.

Viewing integrity validation events

  1. Go to Stackdriver Monitoring
  2. Using the drop-down menu at the top right of the Stackdriver Monitoring console, select the Stackdriver account that contains the GCP project in which your VM instance resides.
  3. In the left-hand navigation, choose Resources and then choose Metrics Explorer.
  4. In the Find resource type and metric field, type instance and select the GCE VM Instance resource type.
  5. Choose one of the following metrics:

    • Early Boot Validation shows the pass/fail status of the early boot portion of the last boot sequence. Early boot is the boot sequence from the start of the UEFI firmware until it passes control to the bootloader.
    • Late Boot Validation shows the pass/fail status of the late boot portion of the last boot sequence. Late boot is the boot sequence from the bootloader until completion. This includes the loading of the operating system kernel.
  6. Optionally, apply a filter to limit the metric information displayed, group the metric information displayed, or aggregate the metrics data. For more information, see Additional configuration.
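
The same selection can be expressed as a Monitoring filter string when you query the metrics programmatically. The following is a minimal sketch, not an official client: the two metric type names and the status label are assumptions that this page does not state, so verify them against the metrics list for your project before relying on them.

```python
# Assumed metric type names for the two boot validation metrics.
# They do not appear on this page; confirm them before use.
METRIC_TYPES = {
    "early": "compute.googleapis.com/instance/integrity/early_boot_validation_status",
    "late": "compute.googleapis.com/instance/integrity/late_boot_validation_status",
}


def integrity_metric_filter(phase, status=None):
    """Build a Monitoring filter for the early or late boot validation metric.

    phase  -- "early" or "late"
    status -- optional value of the (assumed) status label, e.g. "failed"
    """
    parts = [
        f'metric.type = "{METRIC_TYPES[phase]}"',
        'resource.type = "gce_instance"',
    ]
    if status is not None:
        parts.append(f'metric.labels.status = "{status}"')
    return " AND ".join(parts)


print(integrity_metric_filter("late", status="failed"))
```

The function only builds the filter text, so it can be reused with whichever client or REST call you use to list time series.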

Setting alerts on integrity validation events

Set alerts on the values of the Early Boot Validation and Late Boot Validation metrics if you want to be notified when there is a boot validation failure on your VM instance. For more information about configuring alerting, see Introduction to Alerting.

  1. Go to Stackdriver Monitoring
  2. Using the drop-down menu at the top right of the Stackdriver Monitoring console, select the Stackdriver account that contains the GCP project in which your VM instance resides.
  3. In the left-hand navigation, choose Alerting and then choose Create a Policy.
  4. Click Add Condition.
  5. On the Try our new UI for creating alerting conditions message, choose Opt in to use the new alerting conditions interface (required).
  6. Click Select for Metric Threshold/Rate Change/Absence.
  7. In the Find resource type and metric field, type instance and select the GCE VM Instance resource type.
  8. Choose one of the following metrics:

    • Early Boot Validation shows the pass/fail status of the early boot portion of the last boot sequence. Early boot is the boot sequence from the start of the UEFI firmware until it passes control to the bootloader.
    • Late Boot Validation shows the pass/fail status of the late boot portion of the last boot sequence. Late boot is the boot sequence from the bootloader until completion. This includes the loading of the operating system kernel.
  9. For Filter, filter by status=failed.
  10. For Group By, group by status.
  11. For Condition triggers if, select Any time series violates.
  12. For Condition, specify is above 0 for 1 minute.
  13. Click Save Condition.
  14. Add one or more notifications.
  15. Optionally, add documentation to help the notification recipient understand how to handle the alert.
  16. Name the policy.
  17. Choose Save Policy.
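
If you prefer to keep the alerting policy as a reviewable definition, the console steps above map roughly onto a policy document like the one this sketch builds. It is illustrative only: the metric type name, the status label, and the helper name boot_failure_policy are assumptions, and the Group By step is left out because the aggregation settings it needs depend on the metric kind.

```python
import json

# Assumed metric type; not stated on this page.
EARLY_BOOT_METRIC = (
    "compute.googleapis.com/instance/integrity/early_boot_validation_status"
)


def boot_failure_policy(metric_type, name):
    """Return an alerting policy body mirroring the console steps above."""
    condition = {
        "displayName": f"{name} condition",
        "conditionThreshold": {
            # Step 9: only consider time series whose status is "failed".
            "filter": (
                f'metric.type = "{metric_type}" AND '
                'resource.type = "gce_instance" AND '
                'metric.labels.status = "failed"'
            ),
            # Step 12: "is above 0 for 1 minute".
            "comparison": "COMPARISON_GT",
            "thresholdValue": 0,
            "duration": "60s",
            # Step 11: trigger when any single time series violates.
            "trigger": {"count": 1},
        },
    }
    return {"displayName": name, "combiner": "OR", "conditions": [condition]}


policy = boot_failure_policy(EARLY_BOOT_METRIC, "Early boot validation failed")
print(json.dumps(policy, indent=2))
```

Notification channels and documentation (steps 14 and 15) would be added to the same document before it is submitted.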

Viewing integrity validation event details

  1. Go to the VM instances page
  2. Click the instance ID to open the VM instance details page.
  3. Under Logs, click on Stackdriver Logging.
  4. Locate the earlyBootReportEvent or lateBootReportEvent log entry that you want to review.
  5. Expand the log entry > jsonPayload > earlyBootReportEvent or lateBootReportEvent, as appropriate. Within that section, the policyEvaluationPassed element identifies whether the given section of the boot sequence passed verification against the integrity policy baseline.
  6. Expand the actualMeasurements section and the numbered elements within it to see the platform configuration register (PCR) values saved from the latest boot sequence. The PCR values are saved in the pcrNum elements within the numbered elements. The PCR values identify the boot components and component load order used by the latest boot sequence, and are compared to the integrity policy baseline to determine if there has been any change in the VM instance boot sequence. For more information about what the PCRs represent, see Integrity monitoring events.
  7. Expand the policyMeasurements section to see the PCR values saved for the integrity policy baseline.
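
The fields described above can also be read from an exported log entry. The sketch below assumes the entry is available as parsed JSON, that actualMeasurements and policyMeasurements are lists of objects, and that each object carries the digest in a field named value next to pcrNum. The value field and the sample digests are hypothetical and used only for illustration.

```python
def summarize_boot_report(entry):
    """Extract the pass/fail flag and PCR measurements from one log entry.

    Returns None when the entry is not a boot report event.
    """
    payload = entry.get("jsonPayload", {})
    for name in ("earlyBootReportEvent", "lateBootReportEvent"):
        report = payload.get(name)
        if report is None:
            continue

        def by_pcr(section):
            # Key each measurement by its PCR name, not by list position.
            return {m["pcrNum"]: m.get("value") for m in report.get(section, [])}

        return {
            "event": name,
            "passed": report.get("policyEvaluationPassed"),
            "actual": by_pcr("actualMeasurements"),
            "policy": by_pcr("policyMeasurements"),
        }
    return None


# Hypothetical entry; the digests are placeholders, not real PCR values.
sample_entry = {
    "jsonPayload": {
        "earlyBootReportEvent": {
            "policyEvaluationPassed": True,
            "actualMeasurements": [{"pcrNum": "PCR_0", "value": "digest-a"}],
            "policyMeasurements": [{"pcrNum": "PCR_0", "value": "digest-a"}],
        }
    }
}
print(summarize_boot_report(sample_entry))
```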

Automating responses to integrity validation events

You can automate responses to boot validation events by exporting the Stackdriver logs and processing them in another service like Cloud Functions. For more information, see Overview of Logs Export.
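
As a rough illustration of that pattern, the sketch below shows the shape of a function that could receive exported log entries. It assumes the logs are exported to a Pub/Sub topic and delivered as an event whose data field holds the base64-encoded log entry. The function names and the returned action strings are hypothetical, and the actual response (for example, stopping the instance) is left as a placeholder.

```python
import base64
import json

REPORT_EVENTS = ("earlyBootReportEvent", "lateBootReportEvent")


def failed_boot_events(entry):
    """Return the names of boot report events that failed validation."""
    payload = entry.get("jsonPayload", {})
    return [
        name
        for name in REPORT_EVENTS
        if name in payload and payload[name].get("policyEvaluationPassed") is False
    ]


def handle_exported_log(event, context=None):
    """Entry point sketch: decode one exported log entry and pick an action."""
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    failed = failed_boot_events(entry)
    if failed:
        # Placeholder: notify an operator or stop the instance here.
        return "investigate: " + ", ".join(failed)
    return "no action"


# Local usage example with a hypothetical exported entry.
entry = {"jsonPayload": {"lateBootReportEvent": {"policyEvaluationPassed": False}}}
event = {"data": base64.b64encode(json.dumps(entry).encode("utf-8"))}
print(handle_exported_log(event))
```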

Determining the cause of boot integrity validation failure

  1. Go to the VM instances page
  2. Click the instance ID to open the VM instance details page.
  3. Under Logs, click on Stackdriver Logging.
  4. Locate the most recent earlyBootReportEvent and lateBootReportEvent log entries and see which one has a policyEvaluationPassed value of false.
  5. Expand the log entry > jsonPayload > earlyBootReportEvent or lateBootReportEvent, as appropriate.
  6. Expand the actualMeasurements and policyMeasurements sections and the numbered elements within them to see the platform configuration register (PCR) values saved from the latest boot sequence and the integrity policy baseline, respectively. The PCR values identify the boot components and component load order used by the latest boot sequence and the integrity policy baseline.
  7. Compare the PCR values in the actualMeasurements and policyMeasurements sections to determine where the latest boot sequence deviated from the integrity policy baseline. Any PCR whose values differ between the two sections identifies the change that caused the validation failure. Be aware that the element numbers in these sections do not necessarily correspond to the PCR numbers, and similarly numbered elements in actualMeasurements and policyMeasurements might represent different PCRs. For example, in the early boot sequence for both Windows and Linux, the 3 element in actualMeasurements and the 2 element in policyMeasurements both represent PCR7.
  8. Check Integrity monitoring events to determine what the changed PCR represents, and investigate whether that is an expected change.
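
Because element positions do not line up between the two sections, a comparison should match measurements by PCR name and not by index. The following sketch shows one way to do that. It assumes each measurement is an object with a pcrNum field and a value field holding the digest; the value field name and all sample digests are hypothetical.

```python
def diff_measurements(actual, policy):
    """Compare measurements by PCR name and return the PCRs that differ.

    Only PCRs present in both lists are compared; list positions are ignored.
    """
    actual_by_pcr = {m["pcrNum"]: m["value"] for m in actual}
    policy_by_pcr = {m["pcrNum"]: m["value"] for m in policy}
    changed = {}
    for pcr in sorted(actual_by_pcr.keys() & policy_by_pcr.keys()):
        if actual_by_pcr[pcr] != policy_by_pcr[pcr]:
            changed[pcr] = {
                "baseline": policy_by_pcr[pcr],
                "actual": actual_by_pcr[pcr],
            }
    return changed


# Hypothetical data: PCR_7 is element 3 in one list and element 2 in the other.
actual = [
    {"pcrNum": "PCR_0", "value": "digest-a"},
    {"pcrNum": "PCR_4", "value": "digest-b"},
    {"pcrNum": "PCR_5", "value": "digest-c"},
    {"pcrNum": "PCR_7", "value": "digest-new"},
]
policy = [
    {"pcrNum": "PCR_0", "value": "digest-a"},
    {"pcrNum": "PCR_4", "value": "digest-b"},
    {"pcrNum": "PCR_7", "value": "digest-old"},
]
print(diff_measurements(actual, policy))
```

With this sample data, a position-by-position comparison would wrongly pair PCR_5 with PCR_7, while matching by name reports only PCR_7 as changed.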

Updating the integrity policy baseline

The initial integrity policy baseline is derived from the implicitly trusted boot image when the instance is created. Updating the baseline replaces it with measurements taken from the current instance configuration. The VM instance must be running when you update the baseline.

You should update the baseline after any planned boot-specific changes in the instance configuration, like kernel updates or kernel driver installation, as these will cause integrity validation failures. If you have an unexpected integrity validation failure, you should stop the VM instance and investigate the reason for the failure.

You must have the setShieldedVmIntegrityPolicy permission to be able to update the integrity policy baseline.

Use the following procedure to update the integrity policy baseline.

gcloud

Update the VM instance's integrity policy baseline by using the compute instances update command with the --shielded-vm-learn-integrity-policy flag.

The following example resets the integrity policy baseline for the my-instance VM instance:

gcloud beta compute instances update my-instance \
    --shielded-vm-learn-integrity-policy

API

Update the VM instance's integrity policy baseline by using the updateAutoLearnPolicy request body item with the setShieldedVmIntegrityPolicy method.

The following example resets the integrity policy baseline for a VM instance.

PATCH https://www.googleapis.com/compute/alpha/projects/my-project/zones/us-central1-b/instances/my-instance/setShieldedVmIntegrityPolicy?key={YOUR_API_KEY}
{
  "updateAutoLearnPolicy": true
}
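
For scripted use, the same request can be assembled in code. The sketch below only builds the request object and does not send it. The helper name is hypothetical, the API version is taken from the example above and passed as a parameter, and authenticating with a bearer token in place of the key query parameter is an assumption.

```python
import json
import urllib.request


def build_update_baseline_request(project, zone, instance,
                                  api_version="alpha", token=None):
    """Build (but do not send) the setShieldedVmIntegrityPolicy request."""
    url = (
        f"https://www.googleapis.com/compute/{api_version}/projects/{project}"
        f"/zones/{zone}/instances/{instance}/setShieldedVmIntegrityPolicy"
    )
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: OAuth bearer token instead of the ?key= parameter.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"updateAutoLearnPolicy": True}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


request = build_update_baseline_request("my-project", "us-central1-b", "my-instance")
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; that step is omitted here so the example has no side effects.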
