Using the Update API

Overview

The Update API lets your client applications download hashed versions of the Web Risk lists for storage in a local or in-memory database. URLs can then be checked locally. When a match is found in the local database, the client sends a request to the Web Risk servers to verify whether the URL is included on the Web Risk lists.

Updating the local database

To stay current, clients are required to periodically update the Web Risk lists in their local database. To save bandwidth, clients download the hash prefixes of URLs rather than the raw URLs. For example, if "www.badurl.com/" is on a Web Risk list, clients download the SHA256 hash prefix of that URL rather than the URL itself. In the majority of cases the hash prefixes are 4 bytes long, meaning that the average bandwidth cost of downloading a single list entry is 4 bytes before compression.
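
As a rough illustration of what a hash prefix is, the following Python sketch computes a 4-byte SHA256 prefix for a single string. It assumes the string is already a canonicalized expression; real clients must first apply the rules described in URLs and Hashing. The function name hash_prefix is hypothetical and not part of any Web Risk library.

```python
import hashlib

def hash_prefix(expression: str, prefix_size: int = 4) -> bytes:
    """Return the first prefix_size bytes of the SHA256 digest of expression."""
    # Assumption: expression is already canonicalized (see URLs and Hashing).
    full_hash = hashlib.sha256(expression.encode("utf-8")).digest()
    return full_hash[:prefix_size]

prefix = hash_prefix("www.badurl.com/")
print(len(prefix), prefix.hex())  # length is 4; the hex value depends on the input
```

Storing only the prefix is what keeps the per-entry cost near 4 bytes; the full 32-byte hash is needed only when a URL is verified against the servers.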

To update the Web Risk lists in the local database, send an HTTP GET request to the threatLists.computeDiff method:

  • The HTTP GET request includes the name of the list to be updated along with client constraints to account for memory and bandwidth limitations.
  • The HTTP GET response returns either a full update or a partial update. The response could also return a recommended wait time until the next compute diff operation.

Example: threatLists.computeDiff

HTTP GET request

In the following example, the diffs for the MALWARE Web Risk list are requested. For more details, see the threatLists.computeDiff query parameters and the explanations that follow the code example.

HTTP method and URL:

GET "https://webrisk.googleapis.com/v1/threatLists:computeDiff?threatType=MALWARE&versionToken=Gg4IBBADIgYQgBAiAQEoAQ%3D%3D&constraints.maxDiffEntries=2048&constraints.maxDatabaseEntries=4096&constraints.supportedCompressions=RAW"

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
"https://webrisk.googleapis.com/v1/threatLists:computeDiff?threatType=MALWARE&versionToken=Gg4IBBADIgYQgBAiAQEoAQ%3D%3D&constraints.maxDiffEntries=2048&constraints.maxDatabaseEntries=4096&constraints.supportedCompressions=RAW"

PowerShell

Execute the following command:

$cred = gcloud auth application-default print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://webrisk.googleapis.com/v1/threatLists:computeDiff?threatType=MALWARE&versionToken=Gg4IBBADIgYQgBAiAQEoAQ%3D%3D&constraints.maxDiffEntries=2048&constraints.maxDatabaseEntries=4096&constraints.supportedCompressions=RAW" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "recommendedNextDiff": "2020-01-08T19:41:45.436722194Z",
  "responseType": "RESET",
  "additions": {
    "rawHashes": [
      {
        "prefixSize": 4,
        "rawHashes": "AArQMQAMoUgAPn8lAE..."
      }
    ]
  },
  "newVersionToken": "ChAIARAGGAEiAzAwMSiAEDABEPDyBhoCGAlTcIVL",
  "checksum": {
    "sha256": "wy6jh0+MAg/V/+VdErFhZIpOW+L8ulrVwhlV61XkROI="
  }
}

Web Risk lists

The threatType field identifies the Web Risk list. In the example, the diffs for the MALWARE Web Risk list are requested.

Version token

The versionToken field holds the current client state of the Web Risk list. Version tokens are returned in the newVersionToken field of the threatLists.computeDiff response. For the initial update, leave the versionToken field empty.

Size constraints

The maxDiffEntries field specifies the total number of updates that the client can manage (in the example, 2048). The maxDatabaseEntries field specifies the total number of entries the local database can manage (in the example, 4096). Clients should set size constraints only if they have memory or bandwidth limitations. For more information, see Update Constraints.

Supported compressions

The supportedCompressions field lists the compression types the client supports. In the example, the client supports only raw, uncompressed data. Web Risk, however, supports additional compression types. For more information, see Compression.
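
The query parameters described above can be assembled programmatically. The following Python sketch builds the request URL from the example using only the standard library. The helper name compute_diff_url and its keyword arguments are hypothetical; only the parameter names and the endpoint come from the example request.

```python
from urllib.parse import urlencode

BASE_URL = "https://webrisk.googleapis.com/v1/threatLists:computeDiff"

def compute_diff_url(threat_type, version_token="", max_diff_entries=None,
                     max_database_entries=None, compressions=("RAW",)):
    """Build a threatLists.computeDiff URL (hypothetical helper)."""
    params = [("threatType", threat_type)]
    # An empty version token means "initial update", so the field is omitted.
    if version_token:
        params.append(("versionToken", version_token))
    # Size constraints are optional; include them only when set.
    if max_diff_entries is not None:
        params.append(("constraints.maxDiffEntries", max_diff_entries))
    if max_database_entries is not None:
        params.append(("constraints.maxDatabaseEntries", max_database_entries))
    for compression in compressions:
        params.append(("constraints.supportedCompressions", compression))
    # urlencode percent-encodes the "=" padding of the base64 token.
    return BASE_URL + "?" + urlencode(params)

print(compute_diff_url("MALWARE", "Gg4IBBADIgYQgBAiAQEoAQ==", 2048, 4096))
```

Calling compute_diff_url("MALWARE") with no other arguments produces the URL for an initial update, with no version token and no size constraints.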

HTTP GET response

In this example, the response returns a partial update for the Web Risk list using the requested compression type.

Response body

The response body includes the diffs information (the response type, the additions and removals to be applied to the local database, the new version token, and a checksum). In the example, the response also includes a recommended next diff time. For more details, see the threatLists.computeDiff response body and the explanations that follow the code example.

{
  "responseType": "DIFF",
  "recommendedNextDiff": "2019-12-31T23:59:59.000000000Z",
  "additions": {
    "compressionType": "RAW",
    "rawHashes": [{
      "prefixSize": 4,
      "rawHashes":  "rnGLoQ=="
    }]
  },
  "removals": {
    "rawIndices": {
      "indices": [0, 2, 4]
    }
  },
  "newVersionToken": "ChAIBRADGAEiAzAwMSiAEDABEAFGpqhd",
  "checksum": {
    "sha256": "YSgoRtsRlgHDqDA3LAhM1gegEpEzs1TjzU33vqsR8iM="
  }
}

Database diffs

The responseType field indicates whether the response is a partial update (DIFF) or a full update (RESET). In the example, a partial update is returned, so the response includes both additions and removals. There can be multiple sets of additions, but only one set of removals. For more information, see Database Diffs.
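
One way to apply such a response to a local list of hash prefixes is sketched below in Python. It handles only RAW (uncompressed) data. It also assumes that removal indices refer to positions in the lexicographically sorted local list and that removals are applied before additions; confirm both points against Database Diffs. The function names are hypothetical.

```python
import base64

def decode_raw_hashes(entry: dict) -> list:
    """Split a base64 rawHashes blob into prefixes of prefixSize bytes."""
    data = base64.b64decode(entry["rawHashes"])
    size = entry["prefixSize"]
    return [data[i:i + size] for i in range(0, len(data), size)]

def apply_diff(local_prefixes: list, response: dict) -> list:
    """Return the new sorted prefix list after applying a computeDiff response."""
    if response["responseType"] == "RESET":
        # Full update: discard the local state entirely.
        current = []
    else:
        # Partial update: drop the entries at the listed indices.
        # Assumption: indices point into the sorted local list.
        indices = response.get("removals", {}).get("rawIndices", {}).get("indices", [])
        removed = set(indices)
        current = [p for i, p in enumerate(sorted(local_prefixes)) if i not in removed]
    # Additions may arrive as several sets, one per prefix size.
    for entry in response.get("additions", {}).get("rawHashes", []):
        current.extend(decode_raw_hashes(entry))
    return sorted(current)
```

Applied to the example response, the prefixes at indices 0, 2, and 4 are dropped and the single 4-byte prefix decoded from "rnGLoQ==" is added.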

New version token

The newVersionToken field holds the new version token for the newly updated Web Risk list. Clients must save the new client state for subsequent update requests (the versionToken field in the threatLists.computeDiff request).

Checksums

The checksum lets clients verify that the local database has not suffered any corruption. If the checksum does not match, the client must clear the database and reissue an update with an empty versionToken field. However, clients in this situation must still follow the time intervals for updates. For more information, see Request Frequency.
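
A checksum comparison might look like the following Python sketch. It assumes the checksum is the SHA256 digest of all local hash prefixes, sorted lexicographically and concatenated, then base64-encoded; verify this against the threatLists.computeDiff response reference before relying on it. The function names are hypothetical.

```python
import base64
import hashlib

def local_checksum(prefixes) -> str:
    """Base64 SHA256 over the sorted, concatenated prefixes (assumed format)."""
    digest = hashlib.sha256(b"".join(sorted(prefixes))).digest()
    return base64.b64encode(digest).decode("ascii")

def database_is_consistent(prefixes, response: dict) -> bool:
    """Compare the local checksum with the one returned by the server."""
    return local_checksum(prefixes) == response["checksum"]["sha256"]
```

If database_is_consistent returns False, the client would clear its database and request a full update with an empty versionToken, as described above.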

Recommended next diff

The recommendedNextDiff field gives the timestamp until which the client should wait before sending another update request. The response does not always include a recommended wait time. For more details, see Request Frequency.
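
The wait time can be derived from the timestamp as in the following Python sketch. It assumes timestamps always end in "Z" (UTC), as in the examples, and trims the nanosecond fraction to microseconds because Python datetime objects carry no finer resolution. The function names are hypothetical.

```python
from datetime import datetime, timezone

def parse_timestamp(ts: str) -> datetime:
    """Parse a timestamp such as 2020-01-08T19:41:45.436722194Z."""
    body = ts.rstrip("Z")  # assumption: the value is always in UTC
    if "." in body:
        whole, frac = body.split(".", 1)
        micro = int(frac[:6].ljust(6, "0"))  # keep at most 6 fractional digits
    else:
        whole, micro = body, 0
    parsed = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micro, tzinfo=timezone.utc)

def seconds_until_next_diff(response: dict, now: datetime) -> float:
    """Seconds to wait before the next computeDiff request (0 if unspecified)."""
    ts = response.get("recommendedNextDiff")
    if ts is None:
        return 0.0
    return max(0.0, (parse_timestamp(ts) - now).total_seconds())
```

A client could pass datetime.now(timezone.utc) as now and sleep for the returned number of seconds before its next update request.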

Checking URLs

To check if a URL is on a Web Risk list, the client must first compute the hash and hash prefix of the URL. For details, see URLs and Hashing. The client then queries the local database to determine if there is a match. If the hash prefix is not present in the local database, then the URL is considered safe (that is, not on the Web Risk lists).

If the hash prefix is present in the local database (a hash prefix collision), the client must send the hash prefix to the Web Risk servers for verification. The servers return all full-length SHA256 hashes that begin with the given hash prefix. If one of those full-length hashes matches the full-length hash of the URL in question, then the URL is considered unsafe. If none of the full-length hashes match the full-length hash of the URL in question, then that URL is considered safe.
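
The decision procedure in the two preceding paragraphs can be sketched in Python as follows. The server round trip is passed in as a callable so that the logic stays independent of any HTTP client. The names check_expression and fetch_full_hashes are hypothetical, and the input is assumed to be an already canonicalized expression.

```python
import hashlib

def check_expression(expression, local_prefixes, fetch_full_hashes, prefix_size=4):
    """Return "SAFE" or "UNSAFE" for one canonicalized expression."""
    full_hash = hashlib.sha256(expression.encode("utf-8")).digest()
    prefix = full_hash[:prefix_size]
    if prefix not in local_prefixes:
        # No local match: considered safe, and no request leaves the client.
        return "SAFE"
    # Hash prefix collision: ask the server for the matching full-length hashes.
    candidates = fetch_full_hashes(prefix)
    return "UNSAFE" if full_hash in candidates else "SAFE"
```

Here fetch_full_hashes stands in for a call to the hashes.search method that returns the decoded full-length hashes for a prefix.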

At no point does Google learn about the URLs you are examining. Google does learn the hash prefixes of URLs, but the hash prefixes don't provide much information about the actual URLs.

To check if a URL is on a Web Risk list, send an HTTP GET request to the hashes.search method:

  • The HTTP GET request includes the hash prefix of the URL to be checked.
  • The HTTP GET response returns the matching full-length hashes along with the positive and negative expire times.

Example: hashes.search

HTTP GET request

In the following example, the names of two Web Risk lists and a hash prefix are sent for comparison and verification. For more details, see the hashes.search query parameters and the explanations that follow the code example.

curl \
  -H "Content-Type: application/json" \
  "https://webrisk.googleapis.com/v1/hashes:search?key=YOUR_API_KEY&threatTypes=MALWARE&threatTypes=SOCIAL_ENGINEERING&hashPrefix=WwuJdQ%3D%3D"

Web Risk lists

The threatTypes field identifies the Web Risk lists. In the example, two lists are identified: MALWARE and SOCIAL_ENGINEERING.

Threat hash prefixes

The hashPrefix field contains the hash prefix of the URL that you want to check. This field must contain the exact hash prefix that is present in the local database. For example, if the local hash prefix is 4 bytes long then the hashPrefix field must be 4 bytes long. If the local hash prefix was lengthened to 7 bytes then the hashPrefix field must be 7 bytes long.
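
The example request URL can be built as in the following Python sketch. It assumes the hashPrefix value is the standard base64 encoding of the raw prefix bytes, which is consistent with the example value "WwuJdQ%3D%3D" once percent-decoded. The helper name hashes_search_url is hypothetical.

```python
import base64
from urllib.parse import urlencode

SEARCH_URL = "https://webrisk.googleapis.com/v1/hashes:search"

def hashes_search_url(prefix: bytes, threat_types, api_key: str) -> str:
    """Build a hashes.search URL for one hash prefix (hypothetical helper)."""
    params = [("key", api_key)]
    # threatTypes is a repeated parameter: one key=value pair per list.
    params += [("threatTypes", t) for t in threat_types]
    # Send the prefix exactly as stored locally, with no truncation or padding.
    params.append(("hashPrefix", base64.b64encode(prefix).decode("ascii")))
    return SEARCH_URL + "?" + urlencode(params)

prefix = base64.b64decode("WwuJdQ==")  # the 4-byte prefix from the example
print(hashes_search_url(prefix, ["MALWARE", "SOCIAL_ENGINEERING"], "YOUR_API_KEY"))
```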

HTTP GET response

In the following example, the response returns the matching threats, containing the Web Risk lists they matched, along with the expire times.

Response body

The response body includes the match information (the list names, the full-length hashes, and the cache durations). For more details, see the hashes.search response body and the explanations that follow the code example.

{
  "threats": [{
      "threatTypes": ["MALWARE"],
      "hash": "WwuJdQx48jP-4lxr4y2Sj82AWoxUVcIRDSk1PC9Rf-4=",
      "expireTime": "2019-07-17T15:01:23.045123456Z"
    }, {
      "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
      "hash": "WwuJdQxaCSH453-uytERC456gf45rFExcE23F7-hnfD=",
      "expireTime": "2019-07-17T15:01:23.045123456Z"
    }],
  "negativeExpireTime": "2019-07-17T15:01:23.045123456Z"
}

Matches

The threats field returns the matching full-length hashes for the hash prefix. The URLs corresponding to these hashes are considered unsafe. If no match is found for a hash prefix, nothing is returned; the URL corresponding to that hash prefix is considered safe.

Expire time

The expireTime and negativeExpireTime fields indicate the time until which the hashes must be considered unsafe or safe, respectively. For more details, see Caching.
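
To show how the matches and expire times fit together, the following Python sketch turns a hashes.search response into a verdict plus the time until which that verdict can be cached. It decodes hashes with the URL-safe base64 alphabet because the example hashes contain "-" characters; that choice is an assumption based on the example only. The function name verdict is hypothetical.

```python
import base64

def verdict(full_hash: bytes, response: dict):
    """Return (unsafe, threat_types, cache_until) for one full-length hash."""
    for threat in response.get("threats", []):
        # Assumption: hashes use URL-safe base64, as in the example response.
        if base64.urlsafe_b64decode(threat["hash"]) == full_hash:
            # Positive match: unsafe until this threat's expireTime.
            return True, threat["threatTypes"], threat["expireTime"]
    # No match: considered safe until negativeExpireTime, if one was returned.
    return False, [], response.get("negativeExpireTime")
```

A client might store the returned cache_until value next to the hash and skip further hashes.search requests for it until that time has passed; see Caching for the authoritative rules.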