Create a broken-link checker

This document describes how to configure a periodic test of the links contained in a URI by creating a synthetic monitor. You specify the options for the test, such as the origin URI, the number of links tested, and the number of retries, and then deploy a preconfigured Cloud Run function. To support your troubleshooting and debugging efforts, synthetic monitors save detailed information about each test, including screenshots. Screenshots let you view the exact response seen by the customers of your application.

To learn more about synthetic monitors, see About synthetic monitors.

About broken-link checkers

Each broken-link checker tests links serially, one at a time. A configurable overall timeout applies to the entire synthetic execution.

By default, a broken-link checker does the following:

  • Searches the origin URI for HTML anchor elements with href attributes.
  • Tests the first 10 links found on the origin URI.
  • For each link, the checker issues a request and then waits at most 30 seconds for a response. When a response is received, the checker verifies that the HTTP response status is 200, which indicates a successful response. The checker doesn't perform retries.

You specify the origin URI. You can configure which HTML elements the broken-link checker searches for, the maximum number of elements tested, the per-test timeout, and whether retries are performed. You can also configure broken-link checkers to wait for a selector to appear.
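The default behavior described above can be sketched as a serial loop with a per-link timeout. The following is a minimal illustration, not the template's actual implementation: checkLinks, its parameters, and the injectable fetchImpl are hypothetical names, and the sketch assumes a Node.js runtime that provides AbortController (and a global fetch when fetchImpl isn't supplied).

```javascript
// Sketch only: checkLinks and its option names are hypothetical and are
// not part of the broken-links-ok template.
async function checkLinks(
  links,
  { timeoutMs = 30000, maxAttempts = 1, fetchImpl = fetch } = {}
) {
  const results = [];
  // Links are tested serially: each request finishes before the next starts.
  for (const uri of links) {
    let passed = false;
    let attempts = 0;
    // maxAttempts = 1 means a single request and no retries.
    while (!passed && attempts < maxAttempts) {
      attempts += 1;
      const controller = new AbortController();
      // Abort the request if no response arrives within the per-link timeout.
      const timer = setTimeout(() => controller.abort(), timeoutMs);
      try {
        const response = await fetchImpl(uri, { signal: controller.signal });
        // Only an HTTP 200 status counts as success in this sketch.
        passed = response.status === 200;
      } catch (err) {
        // A network error or an aborted (timed-out) request is a failure.
        passed = false;
      } finally {
        clearTimeout(timer);
      }
    }
    results.push({ uri, passed, attempts });
  }
  return results;
}
```

Passing a stub as fetchImpl makes it possible to exercise the loop without network access. Raising maxAttempts above 1 models the optional retry behavior.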

Broken-link checkers use the broken-links-ok template. The configuration for a broken-link checker is specified by the options object of the index.js file. If you create your checker by using the Google Cloud console, you're prompted for each configuration option and the Cloud Run function is updated for you. However, if you use the Cloud Monitoring API or Terraform, then you must populate this object.

After you create a broken-link checker, to modify the configuration, update the options object and redeploy the Cloud Run function.
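As a rough illustration, an options object that mirrors the defaults listed earlier might look like the following. Every field name shown here is an assumption made for illustration. For the actual fields and syntax, consult the index.js file of the broken-links-ok template.

```javascript
// Illustrative only: the field names below are assumptions, not confirmed
// by this document. See broken-links-ok/index.js for the real fields.
const options = {
  // Origin URI whose links are tested (example value).
  origin_uri: 'https://mywebsite.example.com',
  // Maximum number of links tested; the default is 10.
  link_limit: 10,
  // Selector for the HTML elements to search for; "a" matches anchors.
  query_selector_all: 'a',
  // Attributes that hold the link target; href by default.
  get_attributes: ['href'],
  // Per-link timeout: 30 seconds, expressed in milliseconds.
  link_timeout_millis: 30000,
  // No retries by default.
  max_retries: 0,
};
```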

Before you begin

  1. To get the permissions that you need to view and modify synthetic monitors by using the Google Cloud console, ask your administrator to grant you the following IAM roles on your project:

    For more information about granting roles, see Manage access to projects, folders, and organizations.

    You might also be able to get the required permissions through custom roles or other predefined roles.

  2. Enable the Cloud Monitoring, Artifact Registry, Cloud Build, Cloud Functions, Cloud Logging, Pub/Sub, and Cloud Run Admin APIs.

    Enable the APIs

  3. Verify that your Google Cloud project contains the default Compute Engine service account. This service account is created when you enable the Compute Engine API and has a name similar to 12345-compute@developer.gserviceaccount.com.

    In the Google Cloud console, go to the Service Accounts page:

    Go to Service Accounts

    If you use the search bar to find this page, then select the result whose subheading is IAM & Admin.

    If the default Compute Engine service account doesn't exist, then click Create service account and complete the dialog.

  4. Ensure that the default Compute Engine service account, or the service account that you created, has been granted the role of Editor (roles/editor).

    To view the roles granted to your service account, do the following:

    1. In the Google Cloud console, go to the IAM page:

      Go to IAM

      If you use the search bar to find this page, then select the result whose subheading is IAM & Admin.

    2. Select Include Google-provided role grants.
    3. If the service account used by your synthetic monitor isn't listed, or if it hasn't been granted a role that includes the permissions in the role of Cloud Trace Agent (roles/cloudtrace.agent), then grant this role to your service account.
  5. Configure the notification channels that you want to use to receive notifications. We recommend that you create multiple types of notification channels. For more information, see Create and manage notification channels and Create and manage notification channels by API.

Create a broken-link checker

Console

When you create a synthetic monitor by using the Google Cloud console, a new Cloud Run function (2nd gen) is deployed and the monitor for that Cloud Run function is created. You can't create a synthetic monitor that monitors an existing Cloud Run function.

  1. Ensure that you've enabled the required APIs, that your project contains a default Compute Engine service account, and that this account has been granted the role of Editor (roles/editor). For more information, see Before you begin.
  2. In the Google Cloud console, go to the Synthetic monitoring page:

    Go to Synthetic monitoring

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  3. Select Create Synthetic Monitor.
  4. For the template, select Broken link checker.
  5. Enter a name for the synthetic monitor.
  6. Optional: Update the Response Timeout, the Check Frequency, and add user-defined labels.

  7. Configure the URI and the elements to test:

    1. Click Origin URI, and enter a URI that you want tested. The value that you enter must be an HTTP or HTTPS endpoint. For example, you might enter https://mywebsite.example.com.

    2. Optional: In the Number of links to follow, update the maximum number of links that are tested. The default value of this field is 10.

    3. Optional: In the HTML element selector field, enter the HTML elements that you want matched, as a comma-separated list. The value that you enter is converted to a string and then passed to the Document: querySelectorAll() method.

      By default, this field is set to a, which matches anchors. You can enter values such as a, img, when you want to match both anchors and images.

    4. Optional: In the HTML attributes to follow field, enter the HTML attributes that you want matched. The comma-separated values that you enter are individually passed to the getAttribute() method.

      By default, this field is set to href, which specifies the URI for the link. You can enter multiple attributes, for example, you can enter href, src. In this example, the code searches for the attribute href and then searches for the attribute src.

    5. Optional: Configure wait for selector, per-URI timeout, retries, and expected status codes:

      1. Click Show more options.
      2. To configure the broken-link checker to wait for a specific selector to appear in the URI before any links are scraped, enter the CSS selectors in the Wait for element selector field. The value that you enter is converted to a string and then passed to the page.waitForSelector() method.

        If the selector doesn't appear before the timeout expires, then the failure is recorded in the logs.

      3. Update the order in which links are selected for testing.

      4. Configure retries.

        By default, one request is sent to each link and if the initial request fails for any reason, for example, the command times out or the HTTP status code isn't 200, then the link is marked as failed.

        This field specifies the number of times the broken-link checker can issue an HTTP request to a link before marking the link as failed.

      5. Configure a timeout that applies to each URI. By default, this value is set to 30 seconds.

      6. To specify the expected status code and timeout for a specific URI, click Add per-link option and complete the dialog.

  8. Optional: Configure whether screenshots of responses are collected and saved. If you use the default settings, then screenshots aren't saved. If you enable screenshot collection, then you can collect screenshots for all tests or only for failing tests. Cloud Monitoring uses the following convention to name the Cloud Storage bucket:

    gcm-PROJECT_ID-synthetics-LOCATION
    

    In the previous expression:

    • PROJECT_ID: The ID of your Google Cloud project.
    • LOCATION: The location of your Cloud Storage bucket.

    You have the option to use an existing Cloud Storage bucket.

  9. Review your configuration and ensure that it is correct and complete, and then create your Cloud Run function:

    1. Click Create Function.

      The values in the URI configuration fields are copied to the options object in the index.js file when you click Create Function. After you click Create Function, to change the configuration, edit the options object.

    2. Enter a display name and select a region. Names must be unique within a region.

    3. In the Runtime, build, connections and security settings section, do the following:

      • In the Connections tab, ensure that Allow all traffic is selected.

      • Review the default settings and update them when necessary.

      • In the Runtime service account field, select a service account.

    4. Click Apply function.

  10. Configure the alerting policy:

    1. Optional: Update the alerting policy name and the duration of failure before notifications are sent.

    2. Add the notification channels.

  11. Click Create.

    The Cloud Run function that you defined is built and deployed as 2nd gen, and the synthetic monitor is created.
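The HTML element selector and HTML attributes to follow fields in the preceding procedure work together: the selector picks the elements, and each attribute name is then tried on every element. The following sketch shows one way that lookup could work. The helper names are hypothetical, and the "first matching attribute wins" rule is an assumption rather than documented behavior. The elements are any objects that expose tagName and getAttribute(), such as the results of document.querySelectorAll() in a browser context.

```javascript
// Hypothetical helpers; not part of the broken-links-ok template.

// Split a comma-separated field value such as "href, src" into names.
function parseList(value) {
  return value
    .split(',')
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// For each element, try the attributes in the order given and keep the
// first non-empty value (assumption: the first matching attribute wins).
function extractLinks(elements, attributesField) {
  const attributes = parseList(attributesField);
  const links = [];
  for (const element of elements) {
    for (const name of attributes) {
      const value = element.getAttribute(name);
      if (value) {
        links.push({ tag: element.tagName.toLowerCase(), uri: value });
        break;
      }
    }
  }
  return links;
}

// In a browser context, the elements could come from, for example:
//   extractLinks(document.querySelectorAll('a, img'), 'href, src');
```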

API

The process of creating a broken-link checker by using the Cloud Monitoring API is identical to the process of creating any other synthetic monitor. For information about using the Cloud Monitoring API to create a synthetic monitor, see Create a synthetic monitor, and select the Cloud Monitoring tab.

Broken-link checkers use the broken-links-ok template. The configuration for a broken-link checker is specified by the options object of the index.js file.

When the options.screenshot_options structure is defined, the broken-link checker collects screenshots and saves them to a Cloud Storage bucket. If the screenshot_options.storage_location field isn't defined or if the value is an empty string, then Monitoring creates a Cloud Storage bucket and screenshots are saved to that bucket. Monitoring uses the following convention to name the Cloud Storage bucket:

gcm-PROJECT_ID-synthetics-LOCATION

In the previous expression:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The location of your Cloud Storage bucket.
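The storage rule above can be restated as a small function. This is only a sketch of the documented naming convention: Monitoring performs this resolution itself, and the function names, the projectId and location parameters, and the choice to return null when screenshots aren't configured are all assumptions made for illustration.

```javascript
// Hypothetical helpers that restate the documented bucket-naming rule.

// Build the default bucket name: gcm-PROJECT_ID-synthetics-LOCATION.
function defaultBucketName(projectId, location) {
  return `gcm-${projectId}-synthetics-${location}`;
}

// Decide where screenshots would be saved for a given options object.
function resolveStorageLocation(options, projectId, location) {
  const screenshotOptions = options.screenshot_options;
  // No screenshot_options structure: screenshots aren't collected.
  if (!screenshotOptions) {
    return null;
  }
  const configured = screenshotOptions.storage_location;
  // An undefined or empty storage_location falls back to the default bucket.
  return configured ? configured : defaultBucketName(projectId, location);
}
```

For example, calling resolveStorageLocation with an empty storage_location, a project ID of my-project, and a location of us yields gcm-my-project-synthetics-us.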

Terraform

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands. For more information, see the Terraform provider reference documentation.

The process of creating a broken-link checker by using Terraform is identical to the process of creating any other synthetic monitor. For information about using Terraform to create a synthetic monitor, see Create a synthetic monitor, and select the Terraform tab.

Broken-link checkers use the broken-links-ok template. The configuration for a broken-link checker is specified by the options object of the index.js file.

When the options.screenshot_options structure is defined, the broken-link checker collects screenshots and saves them to a Cloud Storage bucket. If the screenshot_options.storage_location field isn't defined or if the value is an empty string, then Monitoring creates a Cloud Storage bucket and screenshots are saved to that bucket. Monitoring uses the following convention to name the Cloud Storage bucket:

gcm-PROJECT_ID-synthetics-LOCATION

In the previous expression:

  • PROJECT_ID: The ID of your Google Cloud project.
  • LOCATION: The location of your Cloud Storage bucket.

Explore results

For each execution, a broken-link checker does the following:

  • Generates a table, where each row provides information about the testing of a specific URI. The summary information includes the target URI, latency, status, and the HTML element identifier. For example, the HTML element identifier column lists a when an HTML anchor element is tested. When the row corresponds to the origin URI, the value of the HTML element identifier is -.

  • Collects metrics, trace data, and log data.

  • Collects screenshots, when configured.

For more information about how to explore the collected data, see Explore synthetic monitor results.
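To make the table layout concrete, the following sketch builds one summary row per tested URI. The function name, the input and output field names, and the "passed" and "failed" status strings are hypothetical. Only the set of columns and the use of - for the origin URI come from the description above.

```javascript
// Hypothetical sketch of a summary row; the names are illustrative only.
function summarizeRow({ uri, latencyMs, passed, elementTag }, originUri) {
  return {
    targetUri: uri,
    latencyMs,
    // The status strings are assumptions for this sketch.
    status: passed ? 'passed' : 'failed',
    // The origin URI isn't reached through an HTML element, so it shows "-".
    htmlElement: uri === originUri ? '-' : elementTag,
  };
}
```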

Troubleshoot

This section provides information that you can use to help you troubleshoot your broken-link checkers.

Unable to edit the configuration of a broken-link checker

You created a broken-link checker by using the Google Cloud console, and you want to change the HTML elements that are tested, or you want to modify the URI timeout, retries, wait for selector, and per-link options. However, when you edit the broken-link checker, the Google Cloud console doesn't display the configuration fields.

To resolve this failure, do the following:

  1. In the Google Cloud console, go to the Synthetic monitoring page:

    Go to Synthetic monitoring

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Locate the synthetic monitor that you want to edit, click More options, and then select Edit.
  3. Click Edit function.
  4. Edit the options object in the index.js file, and then click Apply function.

    For information about the fields and syntax for this object, see broken-links-ok/index.js.

  5. Click Save.
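As an illustration of the kind of edit made in step 4, an options object that tests images as well as anchors, waits for a selector, allows retries, and overrides the settings for one link might look like the following. All field names, the shape of expected_status_code, and the example URIs and selector are assumptions. Check broken-links-ok/index.js for the fields that the template actually accepts.

```javascript
// Illustrative only: field names and value shapes are assumptions.
// See broken-links-ok/index.js for the actual fields and syntax.
const options = {
  origin_uri: 'https://mywebsite.example.com',
  // Test up to 20 elements instead of the default 10.
  link_limit: 20,
  // Match both anchors and images.
  query_selector_all: 'a, img',
  // Look for href first, then src.
  get_attributes: ['href', 'src'],
  // Wait for this element before any links are scraped (example selector).
  wait_for_selector: '#main-content',
  // Allow retries before a link is marked as failed.
  max_retries: 2,
  // Per-link timeout of 10 seconds, in milliseconds.
  link_timeout_millis: 10000,
  // Per-link overrides, keyed by URI: this link is expected to return 404.
  per_link_options: {
    'https://mywebsite.example.com/retired-page': {
      expected_status_code: { status_value: 404 },
      link_timeout_millis: 5000,
    },
  },
};
```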

The Google Cloud console reports that screenshots fail to save

You created a broken-link checker and configured it to save screenshots. However, the Google Cloud console is displaying one of the following warning messages along with more detailed information:

  • InvalidStorageLocation
  • StorageValidationError
  • BucketCreationError
  • ScreenshotFileUploadError

To resolve these failures, try the following:

  • If you see the InvalidStorageLocation message, then verify the existence of the Cloud Storage bucket specified in the field named options.screenshot_options.storage_location.

  • View the logs related to your Cloud Run function. For more information, see Finding logs.

  • Verify that the service account being used in the corresponding Cloud Run function has an Identity and Access Management role that lets it create, access, and write to Cloud Storage buckets.

What's next