Uptime checks fail even though the instance works as expected

Problem

You observe on the Cloud Monitoring Uptime Check page that your uptime checks are failing. You might also be receiving Cloud Monitoring alert notifications from your uptime check alerting policies. However, the request latencies reported by the following metric are well within your configured timeout.

monitoring.googleapis.com/uptime_check/request_latency

If you monitor your requests and network by other means, you can confirm that requests from the following user agent are either absent or all successful.

GoogleStackdriverMonitoring-UptimeChecks


However, when you look at the uptime check logs, you notice a significant number of consecutive unsuccessful requests. This suggests that the requests made by the uptime checks are not reaching the endpoint you have configured.

resource.type="uptime_url"

httpRequest.status!=200

labels.check_id="<Uptime Check ID>"


You also notice the same errors, at the same frequency, in your Cloud Armor logs.

resource.type:(http_load_balancer) AND jsonPayload.enforcedSecurityPolicy.name:(<Security Policy Name>)

httpRequest.status=403

httpRequest.userAgent="GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)"

From the log entry, you extract the priority of the matching Cloud Armor rule and confirm that this rule denies the source IP of the request.
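
The denial check described above can be sketched in Python with the standard ipaddress module. This is only an illustration: the function name is hypothetical, and the IP address and ranges are documentation placeholder values, not real uptime check or policy values. Substitute the source IP from the log entry and the source ranges of the rule you identified.

```python
import ipaddress


def matching_ranges(source_ip, rule_ranges):
    """Return the CIDR ranges from rule_ranges that contain source_ip."""
    ip = ipaddress.ip_address(source_ip)
    # strict=False tolerates ranges written with host bits set, e.g. 203.0.113.5/24.
    return [r for r in rule_ranges if ip in ipaddress.ip_network(r, strict=False)]


# Hypothetical values: source ranges copied from the deny rule in your policy.
denied_ranges = ["203.0.113.0/24", "198.51.100.0/25"]

# Hypothetical value: source IP taken from the Cloud Armor log entry.
print(matching_ranges("203.0.113.42", denied_ranges))
```

A non-empty result means the rule's source ranges cover the prober's address, which is consistent with the 403 responses in the logs.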

Environment

Any Google Cloud project in which Cloud Monitoring uptime checks monitor an instance or a service endpoint, and in which Cloud Armor security policies deny specific IP ranges.

Solution

  1. Go to the Uptime Check page in the Cloud Monitoring console and download the full list of uptime check source IP addresses.
  2. Configure your Cloud Armor security policies to allow requests from these IP addresses to the resources in your project.
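
Step 2 can be sketched as follows, assuming you have already extracted the downloaded addresses into a list of strings. The script groups the addresses into batches and builds one gcloud compute security-policies rules create command per batch. The batch size of 10 is an assumption based on the per-rule limit on source IP ranges for basic match conditions; verify it against the current Cloud Armor quotas. The function name, policy name, starting priority, and sample addresses are hypothetical placeholders.

```python
import ipaddress

# Assumption: up to 10 source IP ranges per rule; check current Cloud Armor limits.
MAX_RANGES_PER_RULE = 10


def build_allow_commands(ip_addresses, policy_name, start_priority):
    """Return gcloud commands that allow the given IPs, one rule per batch."""
    # Validate each entry, normalize it to CIDR form, and drop duplicates.
    ranges = sorted(
        {
            str(ipaddress.ip_network(ip.strip(), strict=False))
            for ip in ip_addresses
            if ip.strip()
        }
    )
    commands = []
    for offset in range(0, len(ranges), MAX_RANGES_PER_RULE):
        batch = ranges[offset:offset + MAX_RANGES_PER_RULE]
        # One priority per batch, counting up from start_priority.
        priority = start_priority + offset // MAX_RANGES_PER_RULE
        commands.append(
            f"gcloud compute security-policies rules create {priority} "
            f"--security-policy={policy_name} "
            f"--src-ip-ranges={','.join(batch)} "
            f"--action=allow"
        )
    return commands


# Hypothetical input: 12 placeholder addresses standing in for the downloaded list.
sample_ips = [f"192.0.2.{i}" for i in range(1, 13)]
for command in build_allow_commands(sample_ips, "my-policy", 100):
    print(command)
```

Cloud Armor evaluates rules in order of increasing priority number, so choose a starting priority lower than the priority of the deny rule you found in the logs. Review the generated commands before running them.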

Cause

Cloud Monitoring uptime checks use probers that send user-configured requests over the public internet. Cloud Armor security policies can block these requests, either because uptime check IP addresses were changed or added, or because the Cloud Armor policies changed. You can confirm the latter using audit logs.

resource.type="audited_resource"

resource.labels.service="compute.googleapis.com"

resource.labels.method:"compute.v1.securityPolicies."