This document provides information about how to find log data and how to troubleshoot synthetic monitor and uptime check failures:
- Finding logs
- Troubleshoot notifications
- Troubleshoot public uptime checks
- Troubleshoot private uptime checks
- Troubleshoot synthetic monitors
Find logs
This section provides information about how to find logs for your synthetic monitors and uptime checks:
- In the Google Cloud console, go to the Logs Explorer page:
If you use the search bar to find this page, then select the result whose subheading is Logging.
Do any of the following:
To find all logs associated with your synthetic monitors or uptime checks, query by resource type. You can use the Resource menu or you can enter a query.
For uptime checks, in the Resource menu, select Uptime Check URL, or enter the following query into the query editor and then click Run query:
resource.type="uptime_url"
For synthetic monitors, in the Resource menu, select Cloud Run Revision, or enter the following query into the query editor and then click Run query:
resource.type="cloud_run_revision"
To find logs that contain information about the response received during a synthetic monitor or uptime check execution, do any of the following:
To query by using the ID of the synthetic monitor or uptime check, use the following format when entering the ID into the query editor, and then click Run query:
labels.check_id="my-check-id"
To query for logs that contain response data for requests issued by synthetic monitors and uptime checks, enter the following query into the query editor and then click Run query:
"UptimeCheckResult"
The previous query matches all log entries that include the string "UptimeCheckResult".
These logs include the following:
- The ID of the synthetic monitor or uptime check, which is stored in the labels.check_id field.
- For synthetic monitors, the name of your Cloud Run function, which is stored in the resource.labels.service_name field.
- When trace data is collected, the ID of an associated trace, which is stored in the trace field.
To verify that your service received requests from Google Cloud servers, copy the following query into the query editor and then click Run query:
"GoogleStackdriverMonitoring-UptimeChecks"
The protoPayload.ip field contains one of the addresses used by the uptime-check servers. For information about how to list all IP addresses, see List IP addresses.
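The queries in this section can be combined to narrow the results. The following Python sketch only assembles a query string that you can paste into the query editor. The helper name build_filter is hypothetical, and joining the clauses with AND is an assumption about how you want to narrow the results rather than something this page prescribes.

```python
def build_filter(resource_type=None, check_id=None, text=None):
    """Assemble a Logs Explorer query from the clauses shown in this section.

    build_filter is a hypothetical helper. It only builds a string and
    doesn't call any Google Cloud API.
    """
    clauses = []
    if resource_type:
        clauses.append(f'resource.type="{resource_type}"')
    if check_id:
        clauses.append(f'labels.check_id="{check_id}"')
    if text:
        # A quoted string on its own matches entries that contain that text.
        clauses.append(f'"{text}"')
    return " AND ".join(clauses)


# Logs for one uptime check, identified by a placeholder ID.
print(build_filter(resource_type="uptime_url", check_id="my-check-id"))
# Response data for all synthetic monitors and uptime checks.
print(build_filter(text="UptimeCheckResult"))
```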
Troubleshoot notifications
This section describes some errors you might encounter when configuring alerting policies and provides information for resolving them.
One checker failed but others didn't
You are reviewing your uptime-check metrics and notice that one checker reported a failure when all other checkers reported success.
There is no action required to resolve this situation.
When only one checker reports a failure, that failure might be the result of the checker's command timing out due to a network issue. That is, the command doesn't fail; it doesn't complete within the specified timeout.
Alerting policies that use the default configuration require failures from at least two checkers before they create an incident and send a notification. A failure reported by a single checker doesn't result in a notification.
You received a notification and want to debug the failure
To identify when the failure began, do one of the following:
For uptime checks, to determine when the failure occurred, view the Uptime details page:
- In the Google Cloud console, go to the Uptime checks page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- Find and select the uptime check.
The Passed checks chart displays the history of the checks. To identify when the uptime check first failed, you might need to modify the time range for the chart. The time-range selector is located in the toolbar of the Uptime details page.
For synthetic monitors, to determine when the failure occurred, view the Synthetic monitor details page:
- In the Google Cloud console, go to the Synthetic monitoring page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- Find and select the synthetic monitor.
For information about how to find associated log data, see the section of this page titled Finding logs.
You aren't notified that an uptime check failed
You configured an uptime check and are viewing the Uptime details page for that check. You notice that the Passed checks graph shows that at least one checker failed. However, you didn't receive a notification.
By default, the alerting policy is configured to create an incident and send a notification when checkers in at least two regions fail to receive a response to an uptime check. These failures must occur simultaneously.
You can edit the condition of the alerting policy so that you are notified when a single region fails to receive a response. However, we encourage you to use the default configuration, which reduces the number of notifications that you might receive due to transient failures.
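The default behavior described above can be modeled in a few lines. This is a simplified sketch, not the alerting implementation: the function name should_notify, the region labels, and the idea of passing one pass/fail result per region for the same point in time are all assumptions made for illustration.

```python
def should_notify(results_by_region, min_failed_regions=2):
    """Return True if enough regions failed at the same time.

    results_by_region maps a region label to True (passed) or False (failed)
    for one evaluation. min_failed_regions=2 mirrors the default described
    above; 1 mirrors a condition edited to notify on a single region.
    """
    failed = [region for region, passed in results_by_region.items() if not passed]
    return len(failed) >= min_failed_regions


# Hypothetical region labels: one failure among three checkers.
one_failure = {"region-a": True, "region-b": False, "region-c": True}
print(should_notify(one_failure))                        # False
print(should_notify(one_failure, min_failed_regions=1))  # True
```

With the default threshold, the single failure in this example doesn't produce a notification, which matches the situation described in this section.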
To view or edit an alerting policy, do the following:
- In the Google Cloud console, go to the Alerting page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- Click See all policies in the Policies pane.
Find the policy that you want to view or edit, and then click the name of the policy.
You can view and edit the policy from the Policy details page.
Troubleshoot public uptime checks
This section describes some errors you might encounter when using public uptime checks and provides information for resolving them.
Your public uptime checks are failing
You configure a public uptime check, but you receive an error when you perform the verification step.
The following are some possible causes of an uptime check failure:
- Connection Error - Refused: If you are using the default HTTP connection type, check that you have a web server installed that is responding to HTTP requests. A connection error can happen on a new instance if you haven't installed a web server; see the Quickstart for Compute Engine. If you use an HTTPS connection type, you might have to perform additional configuration steps. For firewall issues, see List uptime-check server IP addresses.
- Name or service not found: The hostname might be incorrect.
- 403 Forbidden: The service is returning an error code to the uptime checker. For example, the default Apache web server configuration returns this code under Amazon Linux, but it returns code 200 (Success) under some other Linux versions. See the LAMP tutorial for Amazon Linux or your web server's documentation.
- 404 Not found: The path might be incorrect.
- 408 Request timeout, or no response: The port number might be incorrect, the service might not be running, the service might be inaccessible, or the timeout might be too low. Check that your firewall allows traffic from the uptime servers; see List uptime-check server IP addresses. The timeout limit is specified as part of the Response Validation options.
To help you troubleshoot failed public uptime checks, you can configure your uptime checks to send up to 3 ICMP pings during the check. The pings can help you distinguish between failures caused, for example, by network connectivity problems and by timeouts in your application. For more information, see Use ICMP pings.
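If you reproduce a failing check from your own script, the causes listed above can be turned into a small triage helper. The following Python sketch is illustrative only: the function name triage is hypothetical, and the mapping from Python exception types to the error labels in the list is an assumption, not how the uptime checkers report errors.

```python
import socket


def triage(observed):
    """Map an HTTP status code or a raised exception to a likely cause.

    The causes are taken from the list above. The exception-to-cause
    mapping is an assumption made for this sketch.
    """
    if isinstance(observed, ConnectionRefusedError):
        return "Connection refused: check that a web server is installed and responding."
    if isinstance(observed, socket.gaierror):
        return "Name or service not found: the hostname might be incorrect."
    if isinstance(observed, (TimeoutError, socket.timeout)) or observed == 408:
        return "Timeout or no response: check the port, the service, the firewall, and the timeout."
    if observed == 403:
        return "403 Forbidden: the service returned an error code; review the web server configuration."
    if observed == 404:
        return "404 Not found: the path might be incorrect."
    return "Not covered by the list above."


print(triage(404))
print(triage(ConnectionRefusedError()))
```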
Troubleshoot private uptime checks
This section describes some errors you might encounter when using private uptime checks and provides information for resolving them.
Create of uptime check fails
Your Google Cloud project settings might prevent modification of the roles assigned to the service account that uptime checks use to manage interactions with the Service Directory service. In this situation, the creation of the uptime check fails.
This section describes how you can grant the roles that the service account requires:
Google Cloud console
When you use the Google Cloud console to create the private uptime check, the Google Cloud console issues the commands to grant the Service Directory roles to the service account.
For information about how to grant roles to a service account, see Authorize the service account.
API: Scoping project
The first time you create a private uptime check for a Service Directory service and private resources in a single Google Cloud project, the request might succeed or fail. The result depends on whether you have disabled automatic role grants for service accounts in your project:
The first uptime-check creation succeeds if your project permits automatic role grants for service accounts. A service account is created for you and is granted the necessary roles.
The first uptime-check creation fails if your project doesn't permit automatic role grants for service accounts. A service account is created, but no roles are granted.
If the creation of the uptime check fails, then do the following:
- Authorize the service account.
- Wait a few minutes for the permissions to propagate.
- Try creating the private uptime check again.
API: Monitored project
The first time you create a private uptime check that targets a Service Directory service in a monitored project or private resources in a different Google Cloud project, the request fails and results in the creation of a Monitoring service account.
How you authorize the service account depends on the number of Google Cloud projects you are using and their relationships. You might have up to four projects involved:
- The project in which you defined the private uptime check.
- The monitored project in which you configured the Service Directory service.
- The project in which you configured the VPC network.
- The project in which network resources like VMs or load balancers are configured. This project has no role in the service-account authorization discussed here.
When the creation of the first uptime check fails, do the following:
- Authorize the service account.
- Wait a few minutes for the permissions to propagate.
- Try creating the private uptime check again.
Access denied
Your uptime checks are failing with VPC_ACCESS_DENIED results. This result means that some aspect of your network configuration or service-account authorization isn't correct.
Check your service-account authorization for using a scoping project or a monitored project as described in Create of uptime check fails.
For more information about accessing private networks, see Configure the network project.
Anomalous results from private uptime checks
You have a Service Directory service with multiple VMs, and your service configuration contains multiple endpoints. When you shut down one of the VMs, your uptime check still indicates success.
When your service configuration contains multiple endpoints, one is chosen at random. If the VM associated with the chosen endpoint is running, the uptime check succeeds even though one of the VMs is down.
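A short calculation shows why a stopped VM can go unnoticed. The following Python sketch assumes that each endpoint is equally likely to be chosen and that successive checks choose independently. This page states only that an endpoint is chosen at random, so both assumptions are illustrative, and the function names are hypothetical.

```python
def success_probability(endpoint_up):
    """Chance that one check passes when one endpoint is picked uniformly at random.

    endpoint_up is a list of booleans, one per endpoint (True = VM running).
    """
    return sum(endpoint_up) / len(endpoint_up)


def detection_probability(endpoint_up, checks):
    """Chance that at least one of `checks` independent checks picks a stopped endpoint."""
    return 1 - success_probability(endpoint_up) ** checks


endpoints = [True, False]  # two endpoints, one VM shut down
print(success_probability(endpoints))       # 0.5
print(detection_probability(endpoints, 3))  # 0.875
```

Under these assumptions, a service with two endpoints and one stopped VM passes any single check half of the time, so individual successful results don't show that every VM is running.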
Default headers
Your uptime checks are returning errors or unexpected results. This might occur if you have overridden default header values.
When a request is sent for a private uptime check to a target endpoint, the request includes the following headers and values:
Header | Value |
---|---|
HTTP_USER_AGENT | GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring) |
HTTP_CONNECTION | keep-alive |
HTTP_HOST | IP of Service Directory endpoint |
HTTP_ACCEPT_ENCODING | gzip, deflate, br |
CONTENT_LENGTH | Calculated from uptime post data |
If you attempt to override these values, the following might happen:
- The uptime check reports errors
- The override values are dropped and replaced with the values in the table
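Before you add custom headers to a private uptime check, you can compare them with the defaults in the table. The following Python sketch assumes that the table uses CGI-style names, so that a header such as User-Agent corresponds to HTTP_USER_AGENT. The helper names and the sample headers are hypothetical.

```python
# Default header names, copied from the table above.
DEFAULT_HEADERS = {
    "HTTP_USER_AGENT",
    "HTTP_CONNECTION",
    "HTTP_HOST",
    "HTTP_ACCEPT_ENCODING",
    "CONTENT_LENGTH",
}


def to_table_name(header):
    """Convert a header such as 'User-Agent' to the CGI-style name used in the table."""
    name = header.strip().upper().replace("-", "_")
    # In the CGI convention, Content-Length has no HTTP_ prefix.
    return name if name == "CONTENT_LENGTH" else f"HTTP_{name}"


def overridden_defaults(custom_headers):
    """Return the custom header names that collide with a default, sorted."""
    return sorted(h for h in custom_headers if to_table_name(h) in DEFAULT_HEADERS)


custom = {"User-Agent": "my-agent", "X-Request-Source": "uptime", "Host": "example.internal"}
print(overridden_defaults(custom))  # ['Host', 'User-Agent']
```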
No data visible
You don't see any data on the uptime check dashboard when your uptime check is in a different Google Cloud project than the Service Directory service.
Ensure that the Google Cloud project that contains the uptime check monitors the Google Cloud project that contains the Service Directory service.
For more information about how to list monitored projects and add additional ones, see Configure a metrics scope for multiple projects.
Troubleshoot synthetic monitors
This section provides information that you can use to help you troubleshoot your synthetic monitors.
Error message after enabling the APIs
You open the create flow for a synthetic monitor and are prompted to enable at least one API. After you enable the APIs, a message similar to the following is displayed:
An error occurred during fetching available regions: Cloud Functions API has not been used in project PROJECT_ID before or it is disabled.
The error message recommends that you verify that the API is enabled and then advises that you wait and retry the action.
To verify that the API is enabled, go to the APIs & Services page for your project:
After you've verified that the API is enabled, you can continue with the create flow. The condition resolves automatically after the API enablement propagates through the backend.
Outbound HTTP requests aren't being traced
You configure your synthetic monitor to collect trace data for outbound HTTP requests. Your trace data only shows one span, similar to the following screenshot:
To resolve this situation, ensure that your service account has been granted the role of Cloud Trace Agent (roles/cloudtrace.agent). A role of Editor (roles/editor) is also sufficient.
To view the roles granted to your service account, do the following:
- In the Google Cloud console, go to the IAM page:
If you use the search bar to find this page, then select the result whose subheading is IAM & Admin.
- Select Include Google-provided role grants.
- If the service account used by your synthetic monitor isn't listed, or if it hasn't been granted a role that includes the permissions in the role of Cloud Trace Agent (roles/cloudtrace.agent), then grant this role to your service account.
If you don't know the name of your service account, then in the navigation menu, select Service Accounts.
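If you have exported your project's IAM policy as JSON, the same check can be scripted. The following Python sketch assumes the usual policy shape, a bindings list in which each entry has a role and a list of members. The service account email and the function names are hypothetical. The sketch looks only for the two roles named in this section, so it doesn't recognize other roles that include the same permissions.

```python
# The two roles that this section names as sufficient.
SUFFICIENT_ROLES = {"roles/cloudtrace.agent", "roles/editor"}


def roles_for(policy, service_account_email):
    """Return the set of roles bound directly to the service account."""
    member = f"serviceAccount:{service_account_email}"
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def can_write_traces(policy, service_account_email):
    """Return True if the account holds one of the roles named in this section."""
    return bool(roles_for(policy, service_account_email) & SUFFICIENT_ROLES)


# Hypothetical policy and service account, for illustration only.
policy = {
    "bindings": [
        {"role": "roles/run.invoker", "members": ["serviceAccount:monitor@example.iam.gserviceaccount.com"]},
        {"role": "roles/cloudtrace.agent", "members": ["serviceAccount:other@example.iam.gserviceaccount.com"]},
    ]
}
print(can_write_traces(policy, "monitor@example.iam.gserviceaccount.com"))  # False
print(can_write_traces(policy, "other@example.iam.gserviceaccount.com"))    # True
```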
In progress status
The Synthetic monitors page lists a synthetic monitor with a status of In progress. A status of In progress means that the synthetic monitor was recently created and there isn't any data to display, or that the function failed to deploy.
To determine if the function failed to deploy, try the following:
- Ensure that the name of your Cloud Run function doesn't contain an underscore. If an underscore is present, remove the underscore and redeploy the Cloud Run function.
- Open the Synthetic monitor details page for the synthetic monitor.
If you see the following message, then delete the synthetic monitor.
Cloud Function not found for this Synthetic monitor. Please confirm it exists or delete this monitor.
The error message indicates that the function was deleted and therefore the synthetic monitor is unable to execute the function.
- Open the Cloud Run functions page for the function. To open this page from the Synthetic monitor details page, click Code, and then click the function name.
If you see a message similar to the following, then the function failed to deploy.
This function has failed to deploy and will not work correctly. Please edit and redeploy
To resolve this failure, review the function code and correct the errors that are preventing the function from building or deploying.
When you create a synthetic monitor, it might take several minutes for the function to be deployed and executed.
Warning status
The Synthetic monitors page lists a synthetic monitor with a status of Warning. A status of Warning means that the execution results are inconsistent. This might indicate a design issue with your test, or it might indicate that what is being tested has inconsistent behavior.
Failing status
The Synthetic monitors page lists a synthetic monitor with a status of Failing. To get more information about the failure reason, view the most recent execution history.
- If the error message Request failed with status code 429 is shown, then the target of the HTTP request rejected the command. To resolve this failure, you must change the target of your synthetic monitor.
The endpoint https://www.google.com rejects requests made by synthetic monitors.
- If the failure is returning an execution time of 0ms, then the Cloud Run function might be running out of memory. To resolve this failure, edit your Cloud Run function, and then increase the memory to at least 2 GiB and set the CPU field to 1.
Delete fails for a synthetic monitor
You use the Cloud Monitoring API to delete a synthetic monitor, but the API call fails with a response similar to the following:
{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.DebugInfo",
        "detail": "[ORIGINAL ERROR] generic::invalid_argument: Cannot delete check 1228258045726183344. One or more alerting policies is using it.Delete the alerting policy with id projects/myproject/alertPolicies/16594654141392976482 and any other policies using this uptime check and try again."
      }
    ]
  }
}
To resolve the failure, delete alerting policies that monitor the results of the synthetic monitor, and then delete the synthetic monitor.
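If you handle this error in a script, you can extract the alerting policy name from the response before you retry. The following Python sketch parses a response shaped like the preceding example. The function name blocking_policies and the regular expression are assumptions based on that one example, and the message might not name every policy that uses the check.

```python
import json
import re

# Matches names such as projects/myproject/alertPolicies/16594654141392976482.
POLICY_NAME = re.compile(r"projects/[^/\s]+/alertPolicies/\d+")


def blocking_policies(response_text):
    """Return the alerting policy names mentioned in the error details, in order."""
    body = json.loads(response_text)
    names = []
    for detail in body.get("error", {}).get("details", []):
        for name in POLICY_NAME.findall(detail.get("detail", "")):
            if name not in names:
                names.append(name)
    return names


# A shortened copy of the response shown above.
sample = json.dumps({
    "error": {
        "code": 400,
        "status": "INVALID_ARGUMENT",
        "details": [{
            "@type": "type.googleapis.com/google.rpc.DebugInfo",
            "detail": "Cannot delete check 1228258045726183344. Delete the alerting policy "
                      "with id projects/myproject/alertPolicies/16594654141392976482 and try again.",
        }],
    }
})
print(blocking_policies(sample))
```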
Unable to edit the configuration of a broken-link checker
You created a broken-link checker by using the Google Cloud console, and you want to change the HTML elements that are tested, or you want to modify the URI timeout, retries, wait for selector, and per-link options. However, when you edit the broken-link checker, the Google Cloud console doesn't display the configuration fields.
To resolve this failure, do the following:
- In the Google Cloud console, go to the Synthetic monitoring page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- Locate the synthetic monitor that you want to edit, click More options, and then select Edit.
- Click Edit function.
- Edit the options object in the index.js file, and then click Apply function.
For information about the fields and syntax for this object, see broken-links-ok/index.js.
- Click Save.
Google Cloud console displays that saves of screenshots fail
You created a broken-link checker and configured it to save screenshots. However, the Google Cloud console is displaying one of the following warning messages along with more detailed information:
InvalidStorageLocation
StorageValidationError
BucketCreationError
ScreenshotFileUploadError
To resolve these failures, try the following:
- If you see the InvalidStorageLocation message, then verify the existence of the Cloud Storage bucket specified in the field named options.screenshot_options.storage_location.
- View the logs related to your Cloud Run function. For more information, see Finding logs.
- Verify that the service account being used in the corresponding Cloud Run function has an Identity and Access Management role that lets it create, access, and write to Cloud Storage buckets.