使用 Update API

概览

通过 Update API,您的客户端应用可以下载 Web Risk 列表的哈希版本,以存储在本地或内存数据库中。然后,您可以在本地检查网址。当在本地数据库中找到匹配项时,客户端会向 Web Risk 服务器发送请求,以验证该网址是否包含在 Web Risk 列表中。

更新本地数据库

要及时了解最新信息,客户端必须定期更新其本地数据库中的 Web Risk 列表。为了节省带宽,客户端应下载网址的哈希前缀,而不是原始网址。例如,如果“www.badurl.com/”位于 Web Risk 列表中,则客户端会下载该网址的 SHA256 哈希前缀,而不是网址本身。在大多数情况下,哈希前缀的长度为 4 个字节,这意味着在压缩之前下载单个列表条目的平均带宽费用为 4 个字节。

如要更新本地数据库中的 Web Risk 列表,请向 threatLists.computeDiff 方法发送 HTTP GET 请求:

  • HTTP GET 请求包含要更新列表的名称以及考虑内存和带宽限制的客户端约束。
  • HTTP GET 响应会返回完整更新或部分更新。响应还可以返回建议的等待时间,直到下一次计算差异操作。

示例:threatLists.computeDiff

HTTP GET 请求

在以下示例中,请求了 MALWARE Web Risk 列表的差异。如需了解详情,请参阅 threatLists.computeDiff 查询参数以及代码示例后面的说明。

HTTP 方法和网址:

GET https://webrisk.googleapis.com/v1/threatLists:computeDiff?threatType=MALWARE&versionToken=Gg4IBBADIgYQgBAiAQEoAQ%3D%3D&constraints.maxDiffEntries=2048&constraints.maxDatabaseEntries=4096&constraints.supportedCompressions=RAW&key=API_KEY

如需发送请求,请选择以下方式之一:

curl

执行以下命令:

curl -X GET \
"https://webrisk.googleapis.com/v1/threatLists:computeDiff?threatType=MALWARE&versionToken=Gg4IBBADIgYQgBAiAQEoAQ%3D%3D&constraints.maxDiffEntries=2048&constraints.maxDatabaseEntries=4096&constraints.supportedCompressions=RAW&key=API_KEY"

PowerShell

执行以下命令:

$headers = @{  }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://webrisk.googleapis.com/v1/threatLists:computeDiff?threatType=MALWARE&versionToken=Gg4IBBADIgYQgBAiAQEoAQ%3D%3D&constraints.maxDiffEntries=2048&constraints.maxDatabaseEntries=4096&constraints.supportedCompressions=RAW&key=API_KEY" | Select-Object -Expand Content

您应该收到类似以下内容的 JSON 响应:

{
  "recommendedNextDiff": "2020-01-08T19:41:45.436722194Z",
  "responseType": "RESET",
  "additions": {
    "rawHashes": [
      {
        "prefixSize": 4,
        "rawHashes": "AArQMQAMoUgAPn8lAE..."
      }
    ]
  },
  "newVersionToken": "ChAIARAGGAEiAzAwMSiAEDABEPDyBhoCGAlTcIVL",
  "checksum": {
    "sha256": "wy6jh0+MAg/V/+VdErFhZIpOW+L8ulrVwhlV61XkROI="
  }
}

Java


import com.google.cloud.webrisk.v1.WebRiskServiceClient;
import com.google.protobuf.ByteString;
import com.google.webrisk.v1.CompressionType;
import com.google.webrisk.v1.ComputeThreatListDiffRequest;
import com.google.webrisk.v1.ComputeThreatListDiffRequest.Constraints;
import com.google.webrisk.v1.ComputeThreatListDiffResponse;
import com.google.webrisk.v1.ThreatType;
import java.io.IOException;

public class ComputeThreatListDiff {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    // The threat list to update. Only a single ThreatType should be specified per request.
    ThreatType threatType = ThreatType.MALWARE;

    // The current version token of the client for the requested list. If the client does not have
    // a version token (this is the first time calling ComputeThreatListDiff), this may be
    // left empty and a full database snapshot will be returned.
    ByteString versionToken = ByteString.EMPTY;

    // The maximum size in number of entries. The diff will not contain more entries
    // than this value. This should be a power of 2 between 2**10 and 2**20.
    // If zero, no diff size limit is set.
    int maxDiffEntries = 1024;

    // Sets the maximum number of entries that the client is willing to have in the local database.
    // This should be a power of 2 between 2**10 and 2**20. If zero, no database size limit is set.
    int maxDatabaseEntries = 1024;

    // The compression type supported by the client.
    CompressionType compressionType = CompressionType.RAW;

    computeThreatDiffList(threatType, versionToken, maxDiffEntries, maxDatabaseEntries,
        compressionType);
  }

  // Gets the most recent threat list diffs. These diffs should be applied to a local database of
  // hashes to keep it up-to-date.
  // If the local database is empty or excessively out-of-date,
  // a complete snapshot of the database will be returned. This Method only updates a
  // single ThreatList at a time. To update multiple ThreatList databases, this method needs to be
  // called once for each list.
  public static void computeThreatDiffList(ThreatType threatType, ByteString versionToken,
      int maxDiffEntries, int maxDatabaseEntries, CompressionType compressionType)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the `webRiskServiceClient.close()` method on the client to safely
    // clean up any remaining background resources.
    try (WebRiskServiceClient webRiskServiceClient = WebRiskServiceClient.create()) {

      Constraints constraints = Constraints.newBuilder()
          .setMaxDiffEntries(maxDiffEntries)
          .setMaxDatabaseEntries(maxDatabaseEntries)
          .addSupportedCompressions(compressionType)
          .build();

      ComputeThreatListDiffResponse response = webRiskServiceClient.computeThreatListDiff(
          ComputeThreatListDiffRequest.newBuilder()
              .setThreatType(threatType)
              .setVersionToken(versionToken)
              .setConstraints(constraints)
              .build());

      // The returned response contains the following information:
      // https://cloud.google.com/web-risk/docs/reference/rpc/google.cloud.webrisk.v1#computethreatlistdiffresponse
      // Type of response: DIFF/ RESET/ RESPONSE_TYPE_UNSPECIFIED
      System.out.println(response.getResponseType());
      // List of entries to add and/or remove.
      // System.out.println(response.getAdditions());
      // System.out.println(response.getRemovals());

      // New version token to be used the next time when querying.
      System.out.println(response.getNewVersionToken());

      // Recommended next diff timestamp.
      System.out.println(response.getRecommendedNextDiff());

      System.out.println("Obtained threat list diff.");
    }
  }
}

Python

from google.cloud import webrisk_v1
from google.cloud.webrisk_v1 import ComputeThreatListDiffResponse

def compute_threatlist_diff(
    threat_type: webrisk_v1.ThreatType,
    version_token: bytes,
    max_diff_entries: int,
    max_database_entries: int,
    compression_type: webrisk_v1.CompressionType,
) -> ComputeThreatListDiffResponse:
    """Gets the most recent threat list diffs.

    These diffs should be applied to a local database of hashes to keep it up-to-date.
    If the local database is empty or excessively out-of-date,
    a complete snapshot of the database will be returned. This Method only updates a
    single ThreatList at a time. To update multiple ThreatList databases, this method needs to be
    called once for each list.

    Args:
        threat_type: The threat list to update. Only a single ThreatType should be specified per request.
            threat_type = webrisk_v1.ThreatType.MALWARE

        version_token: The current version token of the client for the requested list. If the
            client does not have a version token (this is the first time calling ComputeThreatListDiff),
            this may be left empty and a full database snapshot will be returned.

        max_diff_entries: The maximum size in number of entries. The diff will not contain more entries
            than this value. This should be a power of 2 between 2**10 and 2**20.
            If zero, no diff size limit is set.
            max_diff_entries = 1024

        max_database_entries: Sets the maximum number of entries that the client is willing to have in the local database.
            This should be a power of 2 between 2**10 and 2**20. If zero, no database size limit is set.
            max_database_entries = 1024

        compression_type: The compression type supported by the client.
            compression_type = webrisk_v1.CompressionType.RAW

    Returns:
        The response which contains the diff between local and remote threat lists. In addition to the threat list,
        the response also contains the version token and the recommended time for next diff.
    """

    webrisk_client = webrisk_v1.WebRiskServiceClient()

    constraints = webrisk_v1.ComputeThreatListDiffRequest.Constraints()
    constraints.max_diff_entries = max_diff_entries
    constraints.max_database_entries = max_database_entries
    constraints.supported_compressions = [compression_type]

    request = webrisk_v1.ComputeThreatListDiffRequest()
    request.threat_type = threat_type
    request.version_token = version_token
    request.constraints = constraints

    response = webrisk_client.compute_threat_list_diff(request)

    # The returned response contains the following information:
    # https://cloud.google.com/web-risk/docs/reference/rpc/google.cloud.webrisk.v1#computethreatlistdiffresponse
    # Type of response: DIFF/ RESET/ RESPONSE_TYPE_UNSPECIFIED
    print(response.response_type)
    # New version token to be used the next time when querying.
    print(response.new_version_token)
    # Recommended next diff timestamp.
    print(response.recommended_next_diff)

    return response

Web Risk 列表

threatType 字段用于标识 Web Risk 列表。在该示例中,请求了 MALWARE Web Risk 列表的差异。

版本令牌

versionToken 字段包含 Web Risk 列表的当前客户端状态。版本令牌在 threatLists.computeDiff 响应newVersionToken 字段中返回。 对于初始更新,请将 versionToken 字段留空。

尺寸限制

maxDiffEntries 字段指定客户端可以管理的更新总数(在此示例中为 2048)。maxDatabaseEntries 字段指定本地数据库可以管理的总条目数(在示例中为 4096)。客户端应设置大小限制,以保护内存和带宽限制,并防止名单增长。如需了解详情,请参阅更新限制条件

支持压缩

supportedCompressions 字段列出了客户端支持的压缩类型。在此示例中,客户端仅支持未压缩的原始数据。但 Web Risk 支持其他压缩类型。如需了解详情,请参阅压缩

HTTP GET 响应

在此示例中,响应使用请求的压缩类型返回 Web Risk 列表的部分更新。

响应正文

响应正文包含差异信息(响应类型、要应用于本地数据库的添加和移除项、新版本令牌以及校验和)。

在该示例中,响应还包含建议的下一个差异时间。如需了解详情,请参阅 threatLists.computeDiff 响应正文以及代码示例后面的说明。

{
  "responseType" :   "DIFF",
  "recommendedNextDiff": "2019-12-31T23:59:59.000000000Z",
  "additions": {
    "compressionType": "RAW",
    "rawHashes": [{
      "prefixSize": 4,
      "rawHashes":  "rnGLoQ=="
    }]
  },
  "removals": {
    "rawIndices": {
      "indices": [0, 2, 4]
    }
  },
  "newVersionToken": "ChAIBRADGAEiAzAwMSiAEDABEAFGpqhd",
  "checksum": {
    "sha256": "YSgoRtsRlgHDqDA3LAhM1gegEpEzs1TjzU33vqsR8iM="
  },
  "recommendedNextDiff": "2019-07-17T15:01:23.045123456Z"
}

数据库差异

responseType 字段将指示部分更新 (DIFF) 或完全更新 (RESET)。在该示例中,返回了部分差异,因此响应包括添加和删除项。可能有多组添加项,但只有一组删除项。如需了解详情,请参阅 Database Diffs

新版本令牌

newVersionToken 字段包含新更新的 Web Risk 列表的新版本令牌。客户端必须为后续更新请求保存新的客户端状态(threatLists.computeDiff 请求中的 versionToken 字段)。

校验和

校验和可让客户端验证本地数据库是否未受到任何损坏。如果校验和不匹配,则客户端必须清除数据库并使用空 versionToken 字段重新发布更新。但是,在这种情况下,客户端仍必须遵循更新的时间间隔。如需了解详情,请参阅请求频率

recommendedNextDiff 字段指示在客户端发送其他更新请求之前应等待的时间戳。请注意,响应可能包含建议的等待时间段,也可能不包含。如需了解详情,请参阅请求频率

检查网址

如要检查某个网址是否在 Web Risk 列表中,客户端必须先计算该网址的哈希值和哈希前缀。如需了解详情,请参阅网址和哈希技术。然后,客户端会查询本地数据库以确定是否存在匹配。如果本地数据库中不存在哈希前缀,则该网址会被视为安全网址(即不在 Web Risk 列表中)。

如果本地数据库中存在哈希前缀(哈希前缀冲突),则客户端必须将哈希前缀发送到 Web Risk 服务器进行验证。服务器将返回包含给定哈希前缀的所有全长 SHA 256 哈希。如果其中一个全长哈希与相关网址的全长哈希值匹配,则该网址会被视为不安全网址。 如果没有任何全长的哈希与相关网址的全长哈希匹配,则该网址会被视为安全网址。

Google 绝不会了解您正在检查的网址。Google 确实会学习网址的哈希前缀,但哈希前缀不会提供有关实际网址的大量信息。

如要检查某个网址是否在 Web Risk 列表中,请向 hashes.search 方法发送 HTTP GET 请求:

  • HTTP GET 请求包含要检查的网址的哈希前缀。
  • HTTP GET 响应会返回匹配的全长哈希值以及正负到期时间。

示例:hashes.search

HTTP GET 请求

在以下示例中,将发送两个 Web Risk 列表的名称和哈希前缀进行比较和验证。如需了解详情,请参阅 hashes.search 查询参数以及代码示例后面的说明。

curl \
  -H "Content-Type: application/json" \
  "https://webrisk.googleapis.com/v1/hashes:search?key=YOUR_API_KEY&threatTypes=MALWARE&threatTypes=SOCIAL_ENGINEERING&hashPrefix=WwuJdQ%3D%3D"

Java


import com.google.cloud.webrisk.v1.WebRiskServiceClient;
import com.google.protobuf.ByteString;
import com.google.webrisk.v1.SearchHashesRequest;
import com.google.webrisk.v1.SearchHashesResponse;
import com.google.webrisk.v1.SearchHashesResponse.ThreatHash;
import com.google.webrisk.v1.ThreatType;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

public class SearchHashes {

  public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
    // TODO(developer): Replace these variables before running the sample.
    // A hash prefix, consisting of the most significant 4-32 bytes of a SHA256 hash.
    // For JSON requests, this field is base64-encoded. Note that if this parameter is provided
    // by a URI, it must be encoded using the web safe base64 variant (RFC 4648).
    String uri = "http://example.com";
    String encodedUri = Base64.getUrlEncoder().encodeToString(uri.getBytes(StandardCharsets.UTF_8));
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    byte[] encodedHashPrefix = digest.digest(encodedUri.getBytes(StandardCharsets.UTF_8));

    // The ThreatLists to search in. Multiple ThreatLists may be specified.
    // For the list on threat types, see: https://cloud.google.com/web-risk/docs/reference/rpc/google.cloud.webrisk.v1#threattype
    List<ThreatType> threatTypes = Arrays.asList(ThreatType.MALWARE, ThreatType.SOCIAL_ENGINEERING);

    searchHash(ByteString.copyFrom(encodedHashPrefix), threatTypes);
  }

  // Gets the full hashes that match the requested hash prefix.
  // This is used after a hash prefix is looked up in a threatList and there is a match.
  // The client side threatList only holds partial hashes so the client must query this method
  // to determine if there is a full hash match of a threat.
  public static void searchHash(ByteString encodedHashPrefix, List<ThreatType> threatTypes)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the `webRiskServiceClient.close()` method on the client to safely
    // clean up any remaining background resources.
    try (WebRiskServiceClient webRiskServiceClient = WebRiskServiceClient.create()) {

      // Set the hashPrefix and the threat types to search in.
      SearchHashesResponse response = webRiskServiceClient.searchHashes(
          SearchHashesRequest.newBuilder()
              .setHashPrefix(encodedHashPrefix)
              .addAllThreatTypes(threatTypes)
              .build());

      // Get all the hashes that match the prefix. Cache the returned hashes until the time
      // specified in threatHash.getExpireTime()
      // For more information on response type, see: https://cloud.google.com/web-risk/docs/reference/rpc/google.cloud.webrisk.v1#threathash
      for (ThreatHash threatHash : response.getThreatsList()) {
        System.out.println(threatHash.getHash());
      }
      System.out.println("Completed searching threat hashes.");
    }
  }
}

Python

from google.cloud import webrisk_v1

def search_hashes(hash_prefix: bytes, threat_type: webrisk_v1.ThreatType) -> list:
    """Gets the full hashes that match the requested hash prefix.

    This is used after a hash prefix is looked up in a threatList and there is a match.
    The client side threatList only holds partial hashes so the client must query this method
    to determine if there is a full hash match of a threat.

    Args:
        hash_prefix: A hash prefix, consisting of the most significant 4-32 bytes of a SHA256 hash.
            For JSON requests, this field is base64-encoded. Note that if this parameter is provided
            by a URI, it must be encoded using the web safe base64 variant (RFC 4648).
            Example:
                uri = "http://example.com"
                sha256 = sha256()
                sha256.update(base64.urlsafe_b64encode(bytes(uri, "utf-8")))
                hex_string = sha256.digest()

        threat_type: The ThreatLists to search in. Multiple ThreatLists may be specified.
            For the list on threat types, see:
            https://cloud.google.com/web-risk/docs/reference/rpc/google.cloud.webrisk.v1#threattype
            threat_type = [webrisk_v1.ThreatType.MALWARE, webrisk_v1.ThreatType.SOCIAL_ENGINEERING]

    Returns:
        A hash list that contain all hashes that matches the given hash prefix.
    """
    webrisk_client = webrisk_v1.WebRiskServiceClient()

    # Set the hashPrefix and the threat types to search in.
    request = webrisk_v1.SearchHashesRequest()
    request.hash_prefix = hash_prefix
    request.threat_types = [threat_type]

    response = webrisk_client.search_hashes(request)

    # Get all the hashes that match the prefix. Cache the returned hashes until the time
    # specified in threat_hash.expire_time
    # For more information on response type, see:
    # https://cloud.google.com/web-risk/docs/reference/rpc/google.cloud.webrisk.v1#threathash
    hash_list = []
    for threat_hash in response.threats:
        hash_list.append(threat_hash.hash)
    return hash_list

Web Risk 列表

threatTypes 字段用于标识 Web Risk 列表。在此示例中,识别了两个列表:MALWARESOCIAL_ENGINEERING

威胁哈希前缀

hashPrefix 字段包含您要检查的网址的哈希前缀。此字段必须包含本地数据库中存在的确切哈希前缀。例如,如果本地哈希前缀的长度为 4 个字节,则 hashPrefix 字段的长度必须为 4 个字节。如果本地哈希前缀被延长为 7 个字节,则 hashPrefix 字段的长度必须为 7 个字节。

HTTP GET 响应

在以下示例中,响应将返回匹配的威胁(包含所匹配的 Web Risk 列表)以及到期时间。

响应正文

响应正文包含匹配信息(列表名称、全长哈希和缓存时长)。如需了解详情,请参阅 hashes.search 响应正文以及代码示例后面的说明。

{
  "threats": [{
      "threatTypes": ["MALWARE"],
      "hash": "WwuJdQx48jP-4lxr4y2Sj82AWoxUVcIRDSk1PC9Rf-4="
      "expireTime": "2019-07-17T15:01:23.045123456Z"
    }, {
      "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING"],
      "hash": "WwuJdQxaCSH453-uytERC456gf45rFExcE23F7-hnfD="
      "expireTime": "2019-07-17T15:01:23.045123456Z"
    },
  }],
  "negativeExpireTime": "2019-07-17T15:01:23.045123456Z"
}

匹配数

threats 字段会为哈希前缀返回匹配的全长哈希。 与这些哈希对应的网址被视为不安全。如果未找到与哈希前缀匹配的内容,则不会返回任何内容;则该哈希前缀对应的网址会被视为安全网址。

过期时间

expireTimenegativeExpireTime 字段指示直到哈希必须被视为不安全或安全为止的时间。如需了解详情,请参阅缓存