Caching

This document applies to the uris.search method of the Lookup API and the hashes.search method of the Update API.

About caching

To reduce client bandwidth usage and to protect Google from traffic spikes, clients of both the Lookup API and the Update API are required to create and maintain a local cache of threat data. The Lookup API uses the cache to reduce the number of uris.search requests that clients send to Google. For the Update API, the cache is used to reduce the number of hashes.search requests that clients send to Google. The caching protocol for each API is outlined below.

Lookup API

Clients of the Lookup API should cache each returned ThreatUrl item until the time defined in the expireTime field. Clients then need to consult the cache before making a subsequent uris.search request to the server. If the cache entry for a previously returned ThreatUrl has not yet expired, the client should assume the item is still unsafe. Caching ThreatUrl items may reduce the number of API requests made by the client.
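The Lookup API rule above can be sketched as a small in-memory cache. This is only an illustration under stated assumptions: the class name LookupCache, its method names, and the choice to key entries by the URI string are hypothetical and not part of the API. It also assumes the expireTime value has already been parsed into a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


class LookupCache:
    """Hypothetical in-memory cache of unsafe URIs, keyed by URI string."""

    def __init__(self):
        # uri -> expireTime (timezone-aware datetime), an assumed layout
        self._expiry = {}

    def store(self, uri, expire_time):
        # Remember a returned threat until its expireTime.
        self._expiry[uri] = expire_time

    def is_cached_unsafe(self, uri, now=None):
        # True only while an unexpired entry exists; expired entries are
        # dropped so the caller falls back to a new uris.search request.
        now = now or datetime.now(timezone.utc)
        expire_time = self._expiry.get(uri)
        if expire_time is None:
            return False
        if now >= expire_time:
            del self._expiry[uri]
            return False
        return True


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
cache = LookupCache()
cache.store("http://bad.example/", start + timedelta(minutes=5))
```

A client would call is_cached_unsafe before each lookup and send a uris.search request only when it returns False.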

Update API

To reduce the overall number of hashes.search requests sent to Google using the Update API, clients are required to maintain a local cache. The API establishes two types of caching, positive and negative.

Positive caching

To prevent clients from repeatedly asking about the state of a particular unsafe full hash, each returned ThreatHash contains a positive cache time (defined by the expireTime field). The full hash can be considered unsafe until this time.

Negative caching

To prevent clients from repeatedly asking about the state of a particular safe full hash, the response defines a negative cache duration for the requested prefix (defined by the negativeExpireTime field). All full hashes with the requested prefix are to be considered safe for the requested threat types until this time, except for those returned by the server as unsafe. This caching is particularly important as it prevents traffic overload that could be caused by a hash prefix collision with a safe URL that receives a lot of traffic.

Consulting the cache

When the client wants to check the state of a URL, it first computes its full hash. If the full hash's prefix is present in the local database, the client should then consult its cache before making a hashes.search request to the server.

First, clients should check for a positive cache hit. If there exists an unexpired positive cache entry for the full hash of interest, the full hash should be considered unsafe. If the positive cache entry has expired, the client must send a hashes.search request for the associated local prefix. Per the protocol, if the server returns the full hash, it is considered unsafe; otherwise, it's considered safe.

If there are no positive cache entries for the full hash, the client should check for a negative cache hit. If there exists an unexpired negative cache entry for the associated local prefix, the full hash is considered safe. If the negative cache entry has expired or does not exist, the client must send a hashes.search request for the associated local prefix and interpret the response as normal.
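The lookup order described above can be sketched as a single decision function. The names Decision and consult_cache, the use of plain dictionaries, and the representation of expiry times as numeric timestamps are all assumptions made for illustration. The sketch also assumes the caller has already confirmed that the prefix is present in the local database.

```python
from enum import Enum


class Decision(Enum):
    UNSAFE = "unsafe"
    SAFE = "safe"
    QUERY_SERVER = "query_server"


def consult_cache(full_hash, prefix, positive, negative, now):
    """Decide what to do for a full hash whose prefix is in the local database.

    positive: dict mapping full hash -> positive expiry timestamp (assumed layout)
    negative: dict mapping hash prefix -> negative expiry timestamp (assumed layout)
    """
    # Step 1: a positive entry takes priority over any negative entry.
    if full_hash in positive:
        if now < positive[full_hash]:
            return Decision.UNSAFE
        # Expired positive entry: the client must ask the server again.
        return Decision.QUERY_SERVER

    # Step 2: no positive entry, so fall back to the prefix's negative entry.
    negative_expiry = negative.get(prefix)
    if negative_expiry is not None and now < negative_expiry:
        return Decision.SAFE

    # Step 3: negative entry missing or expired.
    return Decision.QUERY_SERVER


prefix = b"\x12\x34\x56\x78"            # made-up 4-byte prefix
listed_hash = prefix + b"\x00" * 28     # made-up full hash returned as unsafe
other_hash = prefix + b"\x01" * 28      # made-up full hash sharing the prefix
positive = {listed_hash: 300.0}
negative = {prefix: 3600.0}
```

Checking the positive cache first matters because a positively cached hash and a negatively cached prefix can coexist for the same prefix.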

Updating the cache

The client cache should be updated whenever a hashes.search response is received. A positive cache entry should be created or updated for the full hash per the expireTime field. The hash prefix's negative cache duration should also be created or updated per the response's negativeExpireTime field.
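A minimal sketch of this update step follows. The function name update_cache is hypothetical, and the response is modeled as a simplified dictionary with a "threats" list and a "negativeExpireTime" value whose timestamps are assumed to be already parsed into numbers; a real client would first decode the wire format.

```python
def update_cache(positive, negative, prefix, response):
    """Apply one hashes.search response for `prefix` to the local caches.

    positive: dict mapping full hash -> positive expiry timestamp (assumed layout)
    negative: dict mapping hash prefix -> negative expiry timestamp (assumed layout)
    response: simplified, pre-parsed stand-in for the server response
    """
    # Create or refresh a positive entry for every full hash returned.
    for threat in response.get("threats", []):
        positive[threat["hash"]] = threat["expireTime"]

    # Create or refresh the negative entry for the requested prefix.
    negative[prefix] = response["negativeExpireTime"]

    # Positive entries that were not returned this time are left alone;
    # they simply age out at their own expiry.


prefix = b"\x12\x34\x56\x78"         # made-up 4-byte prefix
listed_hash = prefix + b"\x00" * 28  # made-up full hash
positive, negative = {}, {}

first = {"threats": [{"hash": listed_hash, "expireTime": 300.0}],
         "negativeExpireTime": 3600.0}
update_cache(positive, negative, prefix, first)
```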

If a subsequent hashes.search request does not return a full hash that is currently positively cached, the client is not required to remove the positive cache entry. This is not cause for concern in practice, since positive cache durations are typically short (a few minutes) to allow for quick correction of false positives.

Example scenario

In the following example, assume h(url) is the hash prefix of the URL and H(url) is the full-length hash of the URL. That is, H(url) = SHA256(url), and h(url) is the first 4 bytes of SHA256(url).
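The two functions can be sketched with Python's hashlib. The helper names full_hash and hash_prefix are assumptions, and the sketch hashes the given string as-is; real clients hash canonicalized URL expressions, which is out of scope here.

```python
import hashlib


def full_hash(url):
    # H(url): the full 32-byte SHA-256 digest of the URL string.
    return hashlib.sha256(url.encode("utf-8")).digest()


def hash_prefix(url, length=4):
    # h(url): the first `length` bytes of the full hash (4 in this example).
    return full_hash(url)[:length]


H = full_hash("example.com/")
h = hash_prefix("example.com/")
```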

Assume a client with an empty cache visits example.com/ and sees that h(example.com/) is in the local database. The client requests the full-length hashes for hash prefix h(example.com/) and receives back the full-length hash H(example.com/) together with a positive cache expire time of 5 minutes from now and a negative cache expire time of 1 hour from now.

The positive cache duration of 5 minutes tells the client how long the full-length hash H(example.com/) must be considered unsafe without sending another hashes.search request. After 5 minutes the client must issue another hashes.search request for that prefix h(example.com/) if the client visits example.com/ again. The client should reset the hash prefix's negative cache expire time per the new response.

The negative cache duration of 1 hour tells the client how long all the other full-length hashes besides H(example.com/) that share the same prefix of h(example.com/) must be considered safe. For the duration of 1 hour, every URL such that h(URL) = h(example.com/) must be considered safe, and therefore not result in a hashes.search request (assuming that H(URL) != H(example.com/)).

If the hashes.search response contains zero matches and a negative cache expire time is set, then the client must not issue any hashes.search requests for any of the requested prefixes for the given negative cache time.

If the hashes.search response contains one or more matches, a negative cache expire time is still set for the entire response. In that case, the cache expire time of a single full hash indicates how long the client must assume the particular full-length hash is unsafe. After the ThreatHash cache duration elapses, the client must refresh the full-length hash by issuing a hashes.search request for that hash prefix if the requested URL matches the existing full-length hash in the cache. In that case the negative cache duration does not apply. The response's negative cache duration only applies to full-length hashes that were not present in the hashes.search response. For full-length hashes that are not present in the response, the client must refrain from issuing any hashes.search requests until the negative cache duration has elapsed.
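The example scenario can be replayed as a short timeline. Everything in this sketch is illustrative: the check helper, the dictionary layout, and the byte values standing in for h(example.com/) and the full hashes are made up, and time is counted in seconds from the moment the first response arrives.

```python
MINUTE = 60

PREFIX = b"\x12\x34\x56\x78"        # stands in for h(example.com/)
H_EXAMPLE = PREFIX + b"\x00" * 28   # stands in for H(example.com/)
H_OTHER = PREFIX + b"\x01" * 28     # another URL's full hash, same prefix

# Cache state after the first response: 5 minute positive, 1 hour negative.
positive = {H_EXAMPLE: 5 * MINUTE}
negative = {PREFIX: 60 * MINUTE}


def check(full_hash, now):
    # Positive cache first, then the prefix's negative cache.
    if full_hash in positive:
        return "unsafe" if now < positive[full_hash] else "query"
    if now < negative.get(full_hash[:4], float("-inf")):
        return "safe"
    return "query"


# (state of H_EXAMPLE, state of H_OTHER) at 1, 10, and 61 minutes.
results = {m: (check(H_EXAMPLE, m * MINUTE), check(H_OTHER, m * MINUTE))
           for m in (1, 10, 61)}
```

At 1 minute the listed hash is unsafe and the other hash is safe. At 10 minutes the listed hash needs a new hashes.search request while the other hash is still covered by the negative cache. At 61 minutes both require a request.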