Using the Update API
Overview
The Update API lets your client applications download hashed versions of the Web Risk lists for storage in a local or in-memory database. URLs can then be checked locally. When a match is found in the local database, the client sends a request to the Web Risk servers to verify whether the URL is included on the Web Risk lists.
Updating the local database
To stay current, clients are required to periodically update the Web Risk lists in their local database. To save bandwidth, clients download the hash prefixes of URLs rather than the raw URLs. For example, if "www.badurl.com/" is on a Web Risk list, clients download the SHA256 hash prefix of that URL rather than the URL itself. In the majority of cases the hash prefixes are 4 bytes long, meaning that the average bandwidth cost of downloading a single list entry is 4 bytes before compression.
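As a rough illustration of what a hash prefix is, the following Python sketch takes the SHA256 digest of an expression and keeps its first 4 bytes. The function name `hash_prefix` is hypothetical, and the sketch assumes the URL has already been canonicalized into an expression as described in URLs and Hashing; it does not perform that step.

```python
import hashlib


def hash_prefix(expression: str, prefix_size: int = 4) -> bytes:
    """Return the first prefix_size bytes of the SHA256 digest.

    Assumption: `expression` is already a canonicalized URL expression.
    """
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return digest[:prefix_size]


prefix = hash_prefix("www.badurl.com/")
print(len(prefix), prefix.hex())
```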
To update the Web Risk lists in the local database, send an HTTP GET request to the threatLists.computeDiff method:
- The HTTP GET request includes the name of the list to be updated along with client constraints to account for memory and bandwidth limitations.
- The HTTP GET response returns either a full update or a partial update. The response could also return a recommended wait time until the next compute diff operation.
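The request described above can be assembled with the Python standard library. This is a minimal sketch, not an official client: the helper name `build_compute_diff_url` and its defaults are assumptions, and it only builds the URL without sending it.

```python
from urllib.parse import urlencode

BASE_URL = "https://webrisk.googleapis.com/v1/threatLists:computeDiff"


def build_compute_diff_url(api_key, threat_type, version_token="",
                           max_diff_entries=2048, max_database_entries=4096,
                           compressions=("RAW",)):
    """Build a threatLists.computeDiff URL (hypothetical helper)."""
    params = [("threatType", threat_type)]
    # For the initial update the version token is left out entirely.
    if version_token:
        params.append(("versionToken", version_token))
    params += [
        ("constraints.maxDiffEntries", str(max_diff_entries)),
        ("constraints.maxDatabaseEntries", str(max_database_entries)),
    ]
    # Repeated parameter: one entry per supported compression type.
    params += [("constraints.supportedCompressions", c) for c in compressions]
    params.append(("key", api_key))
    return BASE_URL + "?" + urlencode(params)


print(build_compute_diff_url("API_KEY", "MALWARE"))
```

The URL can then be fetched with any HTTP client; the key is passed as a query parameter here only because the examples on this page do the same.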
Example: threatLists.computeDiff
HTTP GET request
In the following example, the diffs for the MALWARE Web Risk list are requested. For more details, see the threatLists.computeDiff query parameters and the explanations that follow the code example.
HTTP method and URL:
GET https://webrisk.googleapis.com/v1/threatLists:computeDiff?threatType=MALWARE&versionToken=Gg4IBBADIgYQgBAiAQEoAQ%3D%3D&constraints.maxDiffEntries=2048&constraints.maxDatabaseEntries=4096&constraints.supportedCompressions=RAW&key=API_KEY
To send your request, choose one of these options:
curl
Execute the following command:
curl -X GET \
"https://webrisk.googleapis.com/v1/threatLists:computeDiff?threatType=MALWARE&versionToken=Gg4IBBADIgYQgBAiAQEoAQ%3D%3D&constraints.maxDiffEntries=2048&constraints.maxDatabaseEntries=4096&constraints.supportedCompressions=RAW&key=API_KEY"
PowerShell
Execute the following command:
$headers = @{ }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://webrisk.googleapis.com/v1/threatLists:computeDiff?threatType=MALWARE&versionToken=Gg4IBBADIgYQgBAiAQEoAQ%3D%3D&constraints.maxDiffEntries=2048&constraints.maxDatabaseEntries=4096&constraints.supportedCompressions=RAW&key=API_KEY" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
  "recommendedNextDiff": "2020-01-08T19:41:45.436722194Z",
  "responseType": "RESET",
  "additions": {
    "rawHashes": [
      {
        "prefixSize": 4,
        "rawHashes": "AArQMQAMoUgAPn8lAE..."
      }
    ]
  },
  "newVersionToken": "ChAIARAGGAEiAzAwMSiAEDABEPDyBhoCGAlTcIVL",
  "checksum": {
    "sha256": "wy6jh0+MAg/V/+VdErFhZIpOW+L8ulrVwhlV61XkROI="
  }
}
Web Risk lists
The threatType
field identifies the Web Risk list. In the
example, the diffs for the MALWARE Web Risk list are requested.
Version token
The versionToken
field holds the current client state of the
Web Risk list.
Version Tokens are returned in the newVersionToken
field of the
threatLists.computeDiff response.
For initial updates, leave the versionToken
field empty.
Size constraints
The maxDiffEntries field specifies the total number of updates that the
client can manage (in the example, 2048). The maxDatabaseEntries field
specifies the total number of entries the local database can manage (in the
example, 4096). Clients should set size constraints to stay within their memory
and bandwidth limits and to safeguard against list growth. For
more information, see Update Constraints.
Supported compressions
The supportedCompressions
field lists the compression types the client
supports. In the example, the client supports only raw, uncompressed data.
Web Risk, however, supports additional compression types. For more
information, see Compression.
HTTP GET response
In this example, the response returns a partial update for the Web Risk list using the requested compression type.
Response body
The response body includes the diffs information (the response type, the additions and removals to be applied to the local database, the new version token, and a checksum).
In the example, the response also includes a
recommended next diff time. For more details, see the
threatLists.computeDiff
response body
and the explanations that follow the code example.
{
  "responseType": "DIFF",
  "additions": {
    "compressionType": "RAW",
    "rawHashes": [
      {
        "prefixSize": 4,
        "rawHashes": "rnGLoQ=="
      }
    ]
  },
  "removals": {
    "rawIndices": {
      "indices": [0, 2, 4]
    }
  },
  "newVersionToken": "ChAIBRADGAEiAzAwMSiAEDABEAFGpqhd",
  "checksum": {
    "sha256": "YSgoRtsRlgHDqDA3LAhM1gegEpEzs1TjzU33vqsR8iM="
  },
  "recommendedNextDiff": "2019-07-17T15:01:23.045123456Z"
}
Database diffs
The responseType field indicates a partial (DIFF) or full (RESET) update.
In the example, partial diffs are returned, so the response
includes both additions and removals. There could be multiple sets of additions,
but only one set of removals. For more information, see
Database Diffs.
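The following Python sketch shows one way a client might apply such a response to an in-memory list of hash prefixes. It rests on assumptions that should be checked against Database Diffs: removal indices are taken to refer to the lexicographically sorted local list, removals are applied before additions, and only RAW (uncompressed) entries are handled. The function names are hypothetical.

```python
import base64


def decode_raw_hashes(entry):
    """Split one base64 rawHashes blob into prefixes of prefixSize bytes."""
    size = entry["prefixSize"]
    blob = base64.b64decode(entry["rawHashes"])
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def apply_diff(local_prefixes, response):
    """Return the new sorted prefix list after applying a computeDiff response."""
    if response.get("responseType") == "RESET":
        # Full update: discard everything stored locally.
        current = []
    else:
        # Assumption: indices point into the sorted local database.
        current = sorted(local_prefixes)
        indices = response.get("removals", {}).get("rawIndices", {}).get("indices", [])
        removed = set(indices)
        current = [p for i, p in enumerate(current) if i not in removed]
    for entry in response.get("additions", {}).get("rawHashes", []):
        current.extend(decode_raw_hashes(entry))
    return sorted(current)


example = {
    "responseType": "DIFF",
    "additions": {"rawHashes": [{"prefixSize": 4, "rawHashes": "rnGLoQ=="}]},
    "removals": {"rawIndices": {"indices": [0, 2, 4]}},
}
local = [b"aaaa", b"bbbb", b"cccc", b"dddd", b"eeee"]
print(apply_diff(local, example))
```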
New version token
The newVersionToken field holds the new version token for the newly updated
Web Risk list. Clients must save the new client state for subsequent
update requests (the versionToken field in the threatLists.computeDiff
request).
Checksums
The checksum lets clients verify that the local database has not suffered any
corruption. If the checksum does not match, the client must clear the database
and reissue an update with an empty versionToken
field. However, clients in
this situation must still follow the time intervals for updates. For more
information, see Request Frequency.
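A checksum comparison might look like the following Python sketch. It assumes that the checksum is the SHA256 digest of all local hash prefixes concatenated in lexicographic order and that the sha256 field is base64-encoded; confirm both against the threatLists.computeDiff response reference before relying on it. The function name `checksum_matches` is hypothetical.

```python
import base64
import hashlib


def checksum_matches(local_prefixes, response):
    """Compare the local database digest with the checksum in the response."""
    expected = base64.b64decode(response["checksum"]["sha256"])
    # Assumption: the digest covers the sorted, concatenated prefixes.
    actual = hashlib.sha256(b"".join(sorted(local_prefixes))).digest()
    return actual == expected


prefixes = [b"dddd", b"bbbb"]
digest = hashlib.sha256(b"bbbbdddd").digest()
response = {"checksum": {"sha256": base64.b64encode(digest).decode("ascii")}}
print(checksum_matches(prefixes, response))
```

If the function returns False, the client would clear its database and request a full update with an empty versionToken, while still honoring the update intervals.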
Recommended next diff
The recommendedNextDiff field gives the timestamp before which the client
should not send another update request. The field is optional, so a
response might not include it. For more details,
see Request Frequency.
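A client could turn this timestamp into a wait time as in the following Python sketch. It assumes the timestamp always ends in "Z" (UTC), as in the examples on this page, and truncates the nanosecond fraction to microseconds. Both function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a UTC timestamp such as 2020-01-08T19:41:45.436722194Z."""
    main, _, fraction = value.rstrip("Z").partition(".")
    parsed = datetime.strptime(main, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # datetime stores microseconds, so keep only the first six fraction digits.
    micros = int((fraction + "000000")[:6]) if fraction else 0
    return parsed.replace(microsecond=micros)


def seconds_until_next_diff(response, now=None):
    """Return how long to wait before the next computeDiff request."""
    value = response.get("recommendedNextDiff")
    if value is None:
        # The field is optional; this sketch treats its absence as no wait.
        return 0.0
    now = now or datetime.now(timezone.utc)
    return max(0.0, (parse_timestamp(value) - now).total_seconds())


print(seconds_until_next_diff({"recommendedNextDiff": "2020-01-08T19:41:45.436722194Z"}))
```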
Checking URLs
To check if a URL is on a Web Risk list, the client must first compute the hash and hash prefix of the URL. For details, see URLs and Hashing. The client then queries the local database to determine if there is a match. If the hash prefix is not present in the local database, then the URL is considered safe (that is, not on the Web Risk lists).
If the hash prefix is present in the local database (a hash prefix collision), the client must send the hash prefix to the Web Risk servers for verification. The servers will return all full-length SHA256 hashes that begin with the given hash prefix. If one of those full-length hashes matches the full-length hash of the URL in question, then the URL is considered unsafe. If none of the full-length hashes match the full-length hash of the URL in question, then that URL is considered safe.
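The two-step check can be sketched in Python as follows. The callable `fetch_full_hashes` is a hypothetical stand-in for a hashes.search request that returns the full-length hashes as bytes; the sketch also assumes the expression is already canonicalized and that local prefixes are 4 bytes long.

```python
import hashlib


def is_unsafe(expression, local_prefixes, fetch_full_hashes, prefix_size=4):
    """Return True if the expression's full hash is confirmed by the server."""
    full_hash = hashlib.sha256(expression.encode("utf-8")).digest()
    prefix = full_hash[:prefix_size]
    if prefix not in local_prefixes:
        # No local match: considered safe, and no request is sent.
        return False
    # Local match: only the prefix leaves the client for verification.
    return full_hash in fetch_full_hashes(prefix)


bad_hash = hashlib.sha256(b"www.badurl.com/").digest()
local_db = {bad_hash[:4]}


def fake_server(prefix):
    # Hypothetical stand-in for the Web Risk servers.
    return [bad_hash] if prefix == bad_hash[:4] else []


print(is_unsafe("www.badurl.com/", local_db, fake_server))
print(is_unsafe("www.example.com/", local_db, fake_server))
```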
At no point does Google learn about the URLs you are examining. Google does learn the hash prefixes of URLs, but the hash prefixes don't provide much information about the actual URLs.
To check if a URL is on a Web Risk list, send an HTTP GET request to the hashes.search method:
- The HTTP GET request includes the hash prefix of the URL to be checked.
- The HTTP GET response returns the matching full-length hashes along with the positive and negative expire times.
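The request URL for this method can be built as in the following Python sketch. The helper name `build_hashes_search_url` is hypothetical, and the sketch assumes the hash prefix is sent as standard base64 text, which is consistent with the example on this page.

```python
import base64
from urllib.parse import urlencode

SEARCH_URL = "https://webrisk.googleapis.com/v1/hashes:search"


def build_hashes_search_url(api_key, hash_prefix, threat_types):
    """Build a hashes.search URL for one hash prefix (hypothetical helper)."""
    params = [("key", api_key)]
    # Repeated parameter: one entry per Web Risk list to check.
    params += [("threatTypes", t) for t in threat_types]
    # The raw prefix bytes are base64-encoded, then URL-escaped by urlencode.
    params.append(("hashPrefix", base64.b64encode(hash_prefix).decode("ascii")))
    return SEARCH_URL + "?" + urlencode(params)


url = build_hashes_search_url(
    "YOUR_API_KEY", bytes.fromhex("5b0b8975"), ["MALWARE", "SOCIAL_ENGINEERING"]
)
print(url)
```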
Example: hashes.search
HTTP GET request
In the following example, the names of two Web Risk lists and a hash
prefix are sent for comparison and verification. For more details, see the
hashes.search
query parameters
and the explanations that follow the code example.
curl \
  -H "Content-Type: application/json" \
  "https://webrisk.googleapis.com/v1/hashes:search?key=YOUR_API_KEY&threatTypes=MALWARE&threatTypes=SOCIAL_ENGINEERING&hashPrefix=WwuJdQ%3D%3D"
Web Risk lists
The threatTypes field identifies the Web Risk lists. In the
example, two lists are identified: MALWARE and SOCIAL_ENGINEERING.
Threat hash prefixes
The hashPrefix
field contains the hash prefix of the URL that you want to
check. This field must contain the exact hash prefix that is present in the
local database. For example, if the local hash prefix is 4 bytes long then
the hashPrefix
field must be 4 bytes long. If the local hash prefix was
lengthened to 7 bytes then the hashPrefix
field must be 7 bytes long.
HTTP GET response
In the following example, the response returns the matching threats, containing the Web Risk lists they matched, along with the expire times.
Response body
The response body includes the match information (the list names, the
full-length hashes, and the cache durations). For more details, see the
hashes.search response body and the explanations that follow the code example.
{
  "threats": [
    {
      "threatTypes": ["MALWARE"],
      "hash": "WwuJdQx48jP-4lxr4y2Sj82AWoxUVcIRDSk1PC9Rf-4=",
      "expireTime": "2019-07-17T15:01:23.045123456Z"
    },
    {
      "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
      "hash": "WwuJdQxaCSH453-uytERC456gf45rFExcE23F7-hnfD=",
      "expireTime": "2019-07-17T15:01:23.045123456Z"
    }
  ],
  "negativeExpireTime": "2019-07-17T15:01:23.045123456Z"
}
Matches
The threats field returns the matching full-length hashes for the hash prefix.
The URLs corresponding to these hashes are considered unsafe. If no match is
found for a hash prefix, nothing is returned; the URL corresponding to that hash
prefix is considered safe.
Expire time
The expireTime field indicates until when a returned hash must be considered
unsafe, and the negativeExpireTime field indicates until when hashes that did
not match must be considered safe. For more details,
see Caching.
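The following Python sketch shows how a client might honor these expire times when reading a stored hashes.search response. It is an illustration only: the function names are hypothetical, timestamps are assumed to end in "Z" (UTC), and the hash field is decoded with a decoder that accepts both standard and URL-safe base64 because the alphabet is not specified on this page.

```python
import base64
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a UTC timestamp, truncating nanoseconds to microseconds."""
    main, _, fraction = value.rstrip("Z").partition(".")
    parsed = datetime.strptime(main, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    micros = int((fraction + "000000")[:6]) if fraction else 0
    return parsed.replace(microsecond=micros)


def unexpired_threat_hashes(response, now):
    """Return the full hashes that must still be treated as unsafe at `now`."""
    return {
        base64.urlsafe_b64decode(threat["hash"])
        for threat in response.get("threats", [])
        if now < parse_timestamp(threat["expireTime"])
    }


def negative_result_valid(response, now):
    """Return True while non-matching hashes may still be treated as safe."""
    value = response.get("negativeExpireTime")
    return value is not None and now < parse_timestamp(value)


cached = {
    "threats": [{
        "threatTypes": ["MALWARE"],
        "hash": base64.b64encode(bytes(32)).decode("ascii"),  # placeholder hash
        "expireTime": "2019-07-17T15:01:23.045123456Z",
    }],
    "negativeExpireTime": "2019-07-17T15:01:23.045123456Z",
}
before = datetime(2019, 7, 17, 15, 0, 0, tzinfo=timezone.utc)
print(unexpired_threat_hashes(cached, before), negative_result_valid(cached, before))
```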