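The built-in infoType detectors in Cloud Data Loss Prevention (DLP) are effective at finding common types of sensitive data. Custom infoType detectors let you fully customize your own sensitive data detector. Inspection rules help refine the scan results that Cloud DLP returns by modifying the detection mechanism of a given infoType detector.
If you want to exclude values from, or include more values in, the results returned by a built-in infoType detector, you can create a new custom infoType from scratch and define all the criteria that Cloud DLP should follow. Alternatively, you can refine the results that Cloud DLP's built-in or custom detectors return according to criteria that you specify. You do this by adding inspection rules, which help reduce noise, increase precision and recall, or adjust the likelihood assigned to scan findings.
This topic discusses how to use the two types of inspection rules to exclude certain findings or to add extra findings, all according to custom criteria that you specify. It also describes several scenarios in which you might want to alter an existing infoType detector.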
The two types of inspection rules are exclusion rules and hotword rules.
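Exclusion rules
Exclusion rules are useful in the following situations:
- You want to exclude duplicate scan matches caused by overlapping infoType detectors. For example, you scan for email addresses and phone numbers, but get two matches for an email address that contains a phone number, such as "206-555-0764@example.org".
- You encounter noise in your scan results. For example, a scan for legitimate email addresses returns the same dummy email address (such as "example@example.com") or domain (such as "example.com") too many times.
- You have a list of terms, phrases, or character combinations that you want to exclude from results.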
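Exclusion rules API overview
Cloud DLP defines an exclusion rule in the ExclusionRule object. Within ExclusionRule, you specify one of the following:
- A Dictionary object, which identifies the exclusion rule as a regular dictionary rule.
- A Regex object, which identifies the exclusion rule as a regular expression rule.
- An ExcludeInfoTypes object, which contains a set of infoType detectors. If any infoType detector listed here matches a finding, that finding is excluded from the scan results.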
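As a rough illustration of the three options above, the following Python sketch builds one exclusion rule of each kind as a plain dictionary, using the snake_case field names that the Python samples in this topic pass to the client library. The helper names (`dictionary_rule`, `regex_rule`, `exclude_info_types_rule`, `rule_set_for`) are hypothetical and exist only for this sketch. The `exclude_info_types` layout is an assumption derived from the ExcludeInfoTypes description above, so check it against the API reference before relying on it. No API call is made.

```python
# Hypothetical helpers for illustration only. They build plain dicts and
# never call Cloud DLP. The samples in this topic pass MatchingType enum
# members; the enum names are written as strings here to stay self-contained.


def dictionary_rule(words, matching_type="MATCHING_TYPE_FULL_MATCH"):
    """Exclusion rule backed by a word list (the Dictionary option)."""
    return {
        "exclusion_rule": {
            "dictionary": {"word_list": {"words": list(words)}},
            "matching_type": matching_type,
        }
    }


def regex_rule(pattern, matching_type="MATCHING_TYPE_FULL_MATCH"):
    """Exclusion rule backed by a regular expression (the Regex option)."""
    return {
        "exclusion_rule": {
            "regex": {"pattern": pattern},
            "matching_type": matching_type,
        }
    }


def exclude_info_types_rule(names, matching_type="MATCHING_TYPE_PARTIAL_MATCH"):
    """Exclusion rule that drops findings also matched by other infoTypes.

    The field layout below is an assumption for this sketch.
    """
    return {
        "exclusion_rule": {
            "exclude_info_types": {"info_types": [{"name": n} for n in names]},
            "matching_type": matching_type,
        }
    }


def rule_set_for(names, rules):
    """Attach a list of rules to the infoTypes they should refine."""
    return {"info_types": [{"name": n} for n in names], "rules": list(rules)}


# Example: refine PHONE_NUMBER so that a phone number embedded in an email
# address (the "206-555-0764@example.org" case) is not reported twice.
overlap_rule_set = rule_set_for(
    ["PHONE_NUMBER"], [exclude_info_types_rule(["EMAIL_ADDRESS"])]
)
print(overlap_rule_set)
```

Keeping each rule in its own small builder makes it easy to combine several rules in one rule set, which is the shape the `rule_set` field in the samples below expects.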
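Exclusion rule example scenarios
Each of the following JSON snippets shows how to configure Cloud DLP for the given scenario.
Omit a specific email address from an EMAIL_ADDRESS detector scan
The following JSON snippet and code in several languages show how to use an InspectConfig to tell Cloud DLP to avoid matching "example@example.com" in a scan that uses the infoType detector EMAIL_ADDRESS:
Protocol
For more information about using the Cloud DLP API with JSON, see the JSON quickstart.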
...
"inspectConfig":{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
],
"ruleSet":[
{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
],
"rules":[
{
"exclusionRule":{
"dictionary":{
"wordList":{
"words":[
"example@example.com"
]
}
},
"matchingType": "MATCHING_TYPE_FULL_MATCH"
}
}
]
}
]
}
...
Python
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
def inspect_string_with_exclusion_dict(
    project, content_string, exclusion_list=["example@example.com"]
):
    """Inspects the provided text, avoiding matches specified in the exclusion list

    Uses the Data Loss Prevention API to omit matches on EMAIL_ADDRESS if they are
    in the specified exclusion list.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        exclusion_list: The list of strings to ignore matches on

    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a list of infoTypes for DLP to locate in `content_string`. See
    # https://cloud.google.com/dlp/docs/concepts-infotypes for more information
    # about supported infoTypes.
    info_types_to_locate = [{"name": "EMAIL_ADDRESS"}]

    # Construct a rule set that will only match on EMAIL_ADDRESS
    # if the match text is not in the exclusion list.
    rule_set = [
        {
            "info_types": info_types_to_locate,
            "rules": [
                {
                    "exclusion_rule": {
                        "dictionary": {
                            "word_list": {
                                "words": exclusion_list
                            },
                        },
                        "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_FULL_MATCH,
                    }
                }
            ],
        }
    ]

    # Construct the configuration dictionary
    inspect_config = {
        "info_types": info_types_to_locate,
        "rule_set": rule_set,
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Java
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class InspectStringWithExclusionDict {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "Some email addresses: gary@example.com, example@example.com";
List<String> excludedMatchList = Arrays.asList("example@example.com");
inspectStringWithExclusionDict(projectId, textToInspect, excludedMatchList);
}
// Inspects the provided text, avoiding matches specified in the exclusion list.
public static void inspectStringWithExclusionDict(
String projectId, String textToInspect, List<String> excludedMatchList) throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Specify the type of info the inspection will look for.
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
List<InfoType> infoTypes = new ArrayList<>();
for (String typeName : new String[] {"PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"}) {
infoTypes.add(InfoType.newBuilder().setName(typeName).build());
}
// Exclude matches from the specified excludedMatchList.
ExclusionRule exclusionRule =
ExclusionRule.newBuilder()
.setMatchingType(MatchingType.MATCHING_TYPE_FULL_MATCH)
.setDictionary(
Dictionary.newBuilder()
.setWordList(WordList.newBuilder().addAllWords(excludedMatchList)))
.build();
// Construct a ruleset that applies the exclusion rule to the EMAIL_ADDRESS infoType.
InspectionRuleSet ruleSet =
InspectionRuleSet.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS"))
.addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
.build();
// Construct the configuration for the Inspect request, including the ruleset.
InspectConfig config =
InspectConfig.newBuilder()
.addAllInfoTypes(infoTypes)
.setIncludeQuote(true)
.addRuleSet(ruleSet)
.build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
C#
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
using System;
using System.Collections.Generic;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
public class InspectStringWithExclusionDict
{
public static InspectContentResponse Inspect(string projectId, string textToInspect, List<String> excludedMatchList)
{
var dlp = DlpServiceClient.Create();
var byteContentItem = new ByteContentItem
{
Type = ByteContentItem.Types.BytesType.TextUtf8,
Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
};
var contentItem = new ContentItem { ByteItem = byteContentItem };
var infoTypes = new string[] { "PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER" }.Select(it => new InfoType { Name = it });
var exclusionRule = new ExclusionRule
{
MatchingType = MatchingType.FullMatch,
Dictionary = new CustomInfoType.Types.Dictionary
{
WordList = new CustomInfoType.Types.Dictionary.Types.WordList
{
Words = { excludedMatchList }
}
}
};
var ruleSet = new InspectionRuleSet
{
InfoTypes = { new InfoType { Name = "EMAIL_ADDRESS" } },
Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
};
var config = new InspectConfig
{
InfoTypes = { infoTypes },
IncludeQuote = true,
RuleSet = { ruleSet }
};
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = contentItem,
InspectConfig = config
};
var response = dlp.InspectContent(request);
Console.WriteLine($"Findings: {response.Result.Findings.Count}");
foreach (var f in response.Result.Findings)
{
Console.WriteLine("\tQuote: " + f.Quote);
Console.WriteLine("\tInfo type: " + f.InfoType.Name);
Console.WriteLine("\tLikelihood: " + f.Likelihood);
}
return response;
}
}
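For readers calling the REST API directly, the JSON snippet for this scenario can also be generated from a list of excluded addresses. The sketch below is a minimal illustration that only builds and prints a request body; `build_inspect_config` is a hypothetical helper name, the sample text is borrowed from the Java sample above, and the `item` wrapper is an assumption about how the surrounding request body (elided as `...` in the snippet) is laid out.

```python
import json


def build_inspect_config(excluded_emails):
    """Return an inspectConfig fragment mirroring the JSON snippet above.

    Hypothetical helper: it only assembles a dict and does not call Cloud DLP.
    """
    return {
        "infoTypes": [{"name": "EMAIL_ADDRESS"}],
        "ruleSet": [
            {
                "infoTypes": [{"name": "EMAIL_ADDRESS"}],
                "rules": [
                    {
                        "exclusionRule": {
                            "dictionary": {
                                "wordList": {"words": list(excluded_emails)}
                            },
                            "matchingType": "MATCHING_TYPE_FULL_MATCH",
                        }
                    }
                ],
            }
        ],
    }


# Assumed request body layout: the text to inspect plus the configuration.
body = {
    "item": {
        "value": "Some email addresses: gary@example.com, example@example.com"
    },
    "inspectConfig": build_inspect_config(["example@example.com"]),
}
print(json.dumps(body, indent=2))
```

Generating the fragment from a list keeps the excluded addresses in one place, which helps when the list grows beyond a single dummy address.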
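Omit email addresses ending with a specific domain from an EMAIL_ADDRESS detector scan
The following JSON snippet and code in several languages show how to use an InspectConfig to tell Cloud DLP to avoid matching any email address that ends with "@example.com" in a scan that uses the infoType detector EMAIL_ADDRESS:
Protocol
For more information about using the Cloud DLP API with JSON, see the JSON quickstart.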
...
"inspectConfig":{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
],
"ruleSet":[
{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
],
"rules":[
{
"exclusionRule":{
"regex":{
"pattern":".+@example.com"
},
"matchingType": "MATCHING_TYPE_FULL_MATCH"
}
}
]
}
]
}
...
Python
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
def inspect_string_with_exclusion_regex(
    project, content_string, exclusion_regex=".+@example.com"
):
    """Inspects the provided text, avoiding matches specified in the exclusion regex

    Uses the Data Loss Prevention API to omit matches on EMAIL_ADDRESS if they match
    the specified exclusion regex.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        exclusion_regex: The regular expression to exclude matches on

    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a list of infoTypes for DLP to locate in `content_string`. See
    # https://cloud.google.com/dlp/docs/concepts-infotypes for more information
    # about supported infoTypes.
    info_types_to_locate = [{"name": "EMAIL_ADDRESS"}]

    # Construct a rule set that will only match on EMAIL_ADDRESS
    # if the specified regex doesn't also match.
    rule_set = [
        {
            "info_types": info_types_to_locate,
            "rules": [
                {
                    "exclusion_rule": {
                        "regex": {
                            "pattern": exclusion_regex,
                        },
                        "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_FULL_MATCH,
                    }
                }
            ],
        }
    ]

    # Construct the configuration dictionary
    inspect_config = {
        "info_types": info_types_to_locate,
        "rule_set": rule_set,
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Java
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class InspectStringWithExclusionRegex {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "Some email addresses: gary@example.com, bob@example.org";
String excludedRegex = ".+@example.com";
inspectStringWithExclusionRegex(projectId, textToInspect, excludedRegex);
}
// Inspects the provided text, avoiding matches that satisfy the exclusion regex.
public static void inspectStringWithExclusionRegex(
String projectId, String textToInspect, String excludedRegex) throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Specify the type of info the inspection will look for.
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
List<InfoType> infoTypes = new ArrayList<>();
for (String typeName : new String[] {"PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"}) {
infoTypes.add(InfoType.newBuilder().setName(typeName).build());
}
// Exclude matches that satisfy the specified excludedRegex.
ExclusionRule exclusionRule =
ExclusionRule.newBuilder()
.setMatchingType(MatchingType.MATCHING_TYPE_FULL_MATCH)
.setRegex(Regex.newBuilder().setPattern(excludedRegex))
.build();
// Construct a ruleset that applies the exclusion rule to the EMAIL_ADDRESS infoType.
InspectionRuleSet ruleSet =
InspectionRuleSet.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS"))
.addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
.build();
// Construct the configuration for the Inspect request, including the ruleset.
InspectConfig config =
InspectConfig.newBuilder()
.addAllInfoTypes(infoTypes)
.setIncludeQuote(true)
.addRuleSet(ruleSet)
.build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
C#
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
using System;
using System.Collections.Generic;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
public class InspectStringWithExclusionRegex
{
public static InspectContentResponse Inspect(string projectId, string textToInspect, string excludedRegex)
{
var dlp = DlpServiceClient.Create();
var byteContentItem = new ByteContentItem
{
Type = ByteContentItem.Types.BytesType.TextUtf8,
Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
};
var contentItem = new ContentItem { ByteItem = byteContentItem };
var infoTypes = new string[] { "PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER" }.Select(it => new InfoType { Name = it });
var exclusionRule = new ExclusionRule
{
MatchingType = MatchingType.FullMatch,
Regex = new CustomInfoType.Types.Regex { Pattern = excludedRegex }
};
var ruleSet = new InspectionRuleSet
{
InfoTypes = { new InfoType { Name = "EMAIL_ADDRESS" } },
Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
};
var config = new InspectConfig
{
InfoTypes = { infoTypes },
IncludeQuote = true,
RuleSet = { ruleSet }
};
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = contentItem,
InspectConfig = config
};
var response = dlp.InspectContent(request);
Console.WriteLine($"Findings: {response.Result.Findings.Count}");
foreach (var f in response.Result.Findings)
{
Console.WriteLine("\tQuote: " + f.Quote);
Console.WriteLine("\tInfo type: " + f.InfoType.Name);
Console.WriteLine("\tLikelihood: " + f.Likelihood);
}
return response;
}
}
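To get a feel for what the pattern in this scenario excludes before sending a request, it can be tried locally. The following sketch uses Python's `re` module as a stand-in for the service's regex engine, which is an assumption: the two engines may differ for more complex patterns. `is_excluded` is a hypothetical helper that approximates MATCHING_TYPE_FULL_MATCH by requiring the whole finding to match.

```python
import re

# Pattern taken from the samples in this scenario.
EXCLUSION_PATTERN = ".+@example.com"


def is_excluded(quote, pattern=EXCLUSION_PATTERN):
    """Local approximation of a full-match exclusion rule.

    Returns True when the entire quoted finding matches the pattern.
    """
    return re.fullmatch(pattern, quote) is not None


findings = ["gary@example.com", "bob@example.org"]
kept = [quote for quote in findings if not is_excluded(quote)]
print(kept)  # ['bob@example.org']

# The unescaped "." matches any character, so this look-alike is excluded too.
print(is_excluded("gary@exampleXcom"))  # True

# Escaping the dot narrows the rule to the literal domain.
print(is_excluded("gary@exampleXcom", r".+@example\.com"))  # False
```

If only the literal domain should be excluded, escaping the dot as in the last line is the tighter choice. In a JSON request body the backslash itself has to be escaped.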
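Omit scan matches that include the substring "TEST"
The following JSON snippet and code in several languages show how to use an InspectConfig to tell Cloud DLP to exclude any findings that include the token "TEST" from the specified list of infoTypes.
Note that this matches "TEST" as a token, not as a substring. Something like "TEST@email.com" therefore matches, but "TESTER@email.com" does not. If you need substring matching, use a regular expression in the exclusion rule instead of a dictionary.
Protocol
For more information about using the Cloud DLP API with JSON, see the JSON quickstart.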
...
"inspectConfig":{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
},
{
"name":"DOMAIN_NAME"
},
{
"name":"PHONE_NUMBER"
},
{
"name":"PERSON_NAME"
}
],
"ruleSet":[
{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
},
{
"name":"DOMAIN_NAME"
},
{
"name":"PHONE_NUMBER"
},
{
"name":"PERSON_NAME"
}
],
"rules":[
{
"exclusionRule":{
"dictionary":{
"wordList":{
"words":[
"TEST"
]
}
},
"matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
}
}
]
}
]
}
...
Python
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
def inspect_string_with_exclusion_dict_substring(
    project, content_string, exclusion_list=["TEST"]
):
    """Inspects the provided text, avoiding matches that contain excluded tokens

    Uses the Data Loss Prevention API to omit matches if they include tokens
    in the specified exclusion list.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        exclusion_list: The list of strings to ignore partial matches on

    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a list of infoTypes for DLP to locate in `content_string`. See
    # https://cloud.google.com/dlp/docs/concepts-infotypes for more information
    # about supported infoTypes.
    info_types_to_locate = [{"name": "EMAIL_ADDRESS"}, {"name": "DOMAIN_NAME"}]

    # Construct a rule set that will only match if the match text does not
    # contain tokens from the exclusion list.
    rule_set = [
        {
            "info_types": info_types_to_locate,
            "rules": [
                {
                    "exclusion_rule": {
                        "dictionary": {
                            "word_list": {
                                "words": exclusion_list
                            },
                        },
                        "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_PARTIAL_MATCH,
                    }
                }
            ],
        }
    ]

    # Construct the configuration dictionary
    inspect_config = {
        "info_types": info_types_to_locate,
        "rule_set": rule_set,
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Java
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class InspectStringWithExclusionDictSubstring {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "Some email addresses: gary@example.com, TEST@example.com";
List<String> excludedSubstringList = Arrays.asList("TEST");
inspectStringWithExclusionDictSubstring(projectId, textToInspect, excludedSubstringList);
}
// Inspects the provided text, avoiding matches specified in the exclusion list.
public static void inspectStringWithExclusionDictSubstring(
String projectId, String textToInspect, List<String> excludedSubstringList)
throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Specify the type of info the inspection will look for.
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
List<InfoType> infoTypes = new ArrayList<>();
for (String typeName :
new String[] {"EMAIL_ADDRESS", "DOMAIN_NAME", "PHONE_NUMBER", "PERSON_NAME"}) {
infoTypes.add(InfoType.newBuilder().setName(typeName).build());
}
// Exclude partial matches from the specified excludedSubstringList.
ExclusionRule exclusionRule =
ExclusionRule.newBuilder()
.setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
.setDictionary(
Dictionary.newBuilder()
.setWordList(WordList.newBuilder().addAllWords(excludedSubstringList)))
.build();
// Construct a ruleset that applies the exclusion rule to all of the requested infoTypes.
InspectionRuleSet ruleSet =
InspectionRuleSet.newBuilder()
.addAllInfoTypes(infoTypes)
.addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
.build();
// Construct the configuration for the Inspect request, including the ruleset.
InspectConfig config =
InspectConfig.newBuilder()
.addAllInfoTypes(infoTypes)
.setIncludeQuote(true)
.addRuleSet(ruleSet)
.build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
C#
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
using System;
using System.Collections.Generic;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
public class InspectStringWithExclusionDictSubstring
{
public static InspectContentResponse Inspect(string projectId, string textToInspect, List<String> excludedSubstringList)
{
var dlp = DlpServiceClient.Create();
var byteItem = new ByteContentItem
{
Type = ByteContentItem.Types.BytesType.TextUtf8,
Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
};
var contentItem = new ContentItem { ByteItem = byteItem };
var infoTypes = new string[]
{
"EMAIL_ADDRESS",
"DOMAIN_NAME",
"PHONE_NUMBER",
"PERSON_NAME"
}.Select(it => new InfoType { Name = it });
var exclusionRule = new ExclusionRule
{
MatchingType = MatchingType.PartialMatch,
Dictionary = new CustomInfoType.Types.Dictionary
{
WordList = new CustomInfoType.Types.Dictionary.Types.WordList
{
Words = { excludedSubstringList }
}
}
};
var ruleSet = new InspectionRuleSet
{
InfoTypes = { infoTypes },
Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
};
var config = new InspectConfig
{
InfoTypes = { infoTypes },
IncludeQuote = true,
RuleSet = { ruleSet }
};
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = contentItem,
InspectConfig = config
};
var response = dlp.InspectContent(request);
Console.WriteLine($"Findings: {response.Result.Findings.Count}");
foreach (var f in response.Result.Findings)
{
Console.WriteLine("\tQuote: " + f.Quote);
Console.WriteLine("\tInfo type: " + f.InfoType.Name);
Console.WriteLine("\tLikelihood: " + f.Likelihood);
}
return response;
}
}
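The difference between token matching and substring matching described in the note for this scenario can be sketched locally. The code below is only a simulation: the tokenizer (splitting on non-alphanumeric characters) and the case-insensitive comparison are assumptions made for illustration, not a description of how Cloud DLP tokenizes text. The function names are hypothetical.

```python
import re


def tokens(text):
    """Split text on non-alphanumeric characters (an assumed tokenization)."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def excluded_by_token(quote, words):
    """Simulate a dictionary rule with partial matching: any whole token hits."""
    wanted = {w.lower() for w in words}
    return any(t.lower() in wanted for t in tokens(quote))


def excluded_by_substring(quote, fragment):
    """Simulate a regex rule that searches for the fragment anywhere."""
    return re.search(re.escape(fragment), quote) is not None


for quote in ["TEST@email.com", "TESTER@email.com"]:
    print(
        quote,
        excluded_by_token(quote, ["TEST"]),
        excluded_by_substring(quote, "TEST"),
    )
```

Under these assumptions, "TEST@email.com" is excluded by both approaches, while "TESTER@email.com" is excluded only by the substring search, which is why the note recommends a regular expression when substring behavior is required.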
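Omit scan matches that include the substring "Jimmy" from a custom infoType detector scan
The following JSON snippet and code in several languages show how to use an InspectConfig to tell Cloud DLP to avoid matching the name "Jimmy" in a scan that uses the specified custom regular expression detector:
Protocol
For more information about using the Cloud DLP API with JSON, see the JSON quickstart.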
...
"inspectConfig":{
"customInfoTypes":[
{
"infoType":{
"name":"CUSTOM_NAME_DETECTOR"
},
"regex":{
"pattern":"[A-Z][a-z]{1,15}, [A-Z][a-z]{1,15}"
}
}
],
"ruleSet":[
{
"infoTypes":[
{
"name":"CUSTOM_NAME_DETECTOR"
}
],
"rules":[
{
"exclusionRule":{
"dictionary":{
"wordList":{
"words":[
"Jimmy"
]
}
},
"matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
}
}
]
}
]
}
...
Python
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
def inspect_string_custom_excluding_substring(
    project, content_string, exclusion_list=["jimmy"]
):
    """Inspects the provided text with a custom detector, avoiding matches on specific tokens

    Uses the Data Loss Prevention API to omit matches on a custom detector
    if they include tokens in the specified exclusion list.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        exclusion_list: The list of strings to ignore matches on

    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a custom regex detector for names
    custom_info_types = [
        {
            "info_type": {"name": "CUSTOM_NAME_DETECTOR"},
            "regex": {"pattern": "[A-Z][a-z]{1,15}, [A-Z][a-z]{1,15}"},
        }
    ]

    # Construct a rule set that will only match if the match text does not
    # contain tokens from the exclusion list.
    rule_set = [
        {
            "info_types": [{"name": "CUSTOM_NAME_DETECTOR"}],
            "rules": [
                {
                    "exclusion_rule": {
                        "dictionary": {
                            "word_list": {
                                "words": exclusion_list
                            },
                        },
                        "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_PARTIAL_MATCH,
                    }
                }
            ],
        }
    ]

    # Construct the configuration dictionary
    inspect_config = {
        "custom_info_types": custom_info_types,
        "rule_set": rule_set,
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Java
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
public class InspectStringCustomExcludingSubstring {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "Name: Doe, John. Name: Example, Jimmy";
String customDetectorPattern = "[A-Z][a-z]{1,15}, [A-Z][a-z]{1,15}";
List<String> excludedSubstringList = Arrays.asList("Jimmy");
inspectStringCustomExcludingSubstring(
projectId, textToInspect, customDetectorPattern, excludedSubstringList);
}
// Inspects the provided text, avoiding matches specified in the exclusion list.
public static void inspectStringCustomExcludingSubstring(
String projectId,
String textToInspect,
String customDetectorPattern,
List<String> excludedSubstringList)
throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Specify the type of info the inspection will look for.
InfoType infoType = InfoType.newBuilder().setName("CUSTOM_NAME_DETECTOR").build();
CustomInfoType customInfoType =
CustomInfoType.newBuilder()
.setInfoType(infoType)
.setRegex(Regex.newBuilder().setPattern(customDetectorPattern))
.build();
// Exclude partial matches from the specified excludedSubstringList.
ExclusionRule exclusionRule =
ExclusionRule.newBuilder()
.setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
.setDictionary(
Dictionary.newBuilder()
.setWordList(WordList.newBuilder().addAllWords(excludedSubstringList)))
.build();
// Construct a ruleset that applies the exclusion rule to the custom infoType.
InspectionRuleSet ruleSet =
InspectionRuleSet.newBuilder()
.addInfoTypes(infoType)
.addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
.build();
// Construct the configuration for the Inspect request, including the ruleset.
InspectConfig config =
InspectConfig.newBuilder()
.addCustomInfoTypes(customInfoType)
.setIncludeQuote(true)
.addRuleSet(ruleSet)
.build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
C#
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
using System;
using System.Collections.Generic;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
public class InspectStringCustomExcludingSubstring
{
public static InspectContentResponse Inspect(string projectId, string textToInspect, string customDetectorPattern, List<String> excludedSubstringList)
{
var dlp = DlpServiceClient.Create();
var byteContentItem = new ByteContentItem
{
Type = ByteContentItem.Types.BytesType.TextUtf8,
Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
};
var contentItem = new ContentItem { ByteItem = byteContentItem };
var infoType = new InfoType
{
Name = "CUSTOM_NAME_DETECTOR"
};
var customInfoType = new CustomInfoType
{
InfoType = infoType,
Regex = new CustomInfoType.Types.Regex { Pattern = customDetectorPattern }
};
var exclusionRule = new ExclusionRule
{
MatchingType = MatchingType.PartialMatch,
Dictionary = new CustomInfoType.Types.Dictionary
{
WordList = new CustomInfoType.Types.Dictionary.Types.WordList
{
Words = { excludedSubstringList }
}
}
};
var ruleSet = new InspectionRuleSet
{
InfoTypes = { infoType },
Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
};
var config = new InspectConfig
{
CustomInfoTypes = { customInfoType },
IncludeQuote = true,
RuleSet = { ruleSet }
};
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = contentItem,
InspectConfig = config
};
var response = dlp.InspectContent(request);
Console.WriteLine($"Findings: {response.Result.Findings.Count}");
foreach (var f in response.Result.Findings)
{
Console.WriteLine("\tQuote: " + f.Quote);
Console.WriteLine("\tInfo type: " + f.InfoType.Name);
Console.WriteLine("\tLikelihood: " + f.Likelihood);
}
return response;
}
}
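Omit scan matches from a PERSON_NAME detector scan that overlap with a custom detector
In this scenario, the user doesn't want a match from a Cloud DLP scan that uses the built-in PERSON_NAME detector to be returned if that match is also matched by the custom regular expression detector defined in the first part of the snippet.
The following JSON snippet and code in several languages specify both a custom regular expression detector and an exclusion rule in the InspectConfig. The custom regular expression detector specifies the names to exclude from results. The exclusion rule specifies that any results returned from a scan for PERSON_NAME are omitted if they are also matched by the custom regular expression detector. Note that VIP_DETECTOR is marked as EXCLUSION_TYPE_EXCLUDE in this case, so it produces no results itself. It only affects the results produced by the PERSON_NAME detector.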
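To make the effect of this rule concrete, the following Python sketch models it locally without calling Cloud DLP. It is a simplified illustration, not the service's implementation: the `omit_vip_names` helper and the hard-coded PERSON_NAME spans are hypothetical, and the sketch assumes that MATCHING_TYPE_FULL_MATCH drops a finding only when it lies completely inside a VIP_DETECTOR match.

```python
import re

# Same pattern as the VIP_DETECTOR custom infoType in this scenario.
VIP_PATTERN = re.compile(r"Larry Page|Sergey Brin")


def omit_vip_names(text, person_name_spans):
    """Return the PERSON_NAME spans that are not fully inside a VIP span.

    Spans are (start, end) character offsets into `text`.
    """
    vip_spans = [m.span() for m in VIP_PATTERN.finditer(text)]
    kept = []
    for start, end in person_name_spans:
        inside_vip = any(
            v_start <= start and end <= v_end for v_start, v_end in vip_spans
        )
        if not inside_vip:
            kept.append((start, end))
    # The VIP spans themselves are never returned, which mirrors the
    # EXCLUSION_TYPE_EXCLUDE marking on the custom detector.
    return kept


text = "Name: Jane Doe. Name: Larry Page."
# Hypothetical PERSON_NAME findings, hard-coded because no real detector runs here.
person_name_spans = [(6, 14), (22, 32)]
for start, end in omit_vip_names(text, person_name_spans):
    print(text[start:end])
```

In this model only "Jane Doe" survives: "Larry Page" is removed because the custom detector covers it, and the custom detector contributes no finding of its own.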
Protocol
For more information about using the Cloud DLP API with JSON, see the JSON quickstart.
...
"inspectConfig":{
"infoTypes":[
{
"name":"PERSON_NAME"
}
],
"customInfoTypes":[
{
"infoType":{
"name":"VIP_DETECTOR"
},
"regex":{
"pattern":"Larry Page|Sergey Brin"
},
"exclusionType":"EXCLUSION_TYPE_EXCLUDE"
}
],
"ruleSet":[
{
"infoTypes":[
{
"name":"PERSON_NAME"
}
],
"rules":[
{
"exclusionRule":{
"excludeInfoTypes":{
"infoTypes":[
{
"name":"VIP_DETECTOR"
}
]
},
"matchingType": "MATCHING_TYPE_FULL_MATCH"
}
}
]
}
]
}
...
Python
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
def inspect_string_custom_omit_overlap(project, content_string):
    """Matches PERSON_NAME, omitting matches that overlap a custom detector.

    Uses the Data Loss Prevention API to omit matches on a built-in detector
    if they overlap with matches from a custom detector. The custom detector
    is marked EXCLUSION_TYPE_EXCLUDE, so it produces no findings of its own.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.

    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a custom regex detector for names
    custom_info_types = [
        {
            "info_type": {"name": "VIP_DETECTOR"},
            "regex": {"pattern": "Larry Page|Sergey Brin"},
            "exclusion_type": google.cloud.dlp_v2.CustomInfoType.ExclusionType.EXCLUSION_TYPE_EXCLUDE,
        }
    ]

    # Construct a rule set that will exclude PERSON_NAME matches
    # that overlap with VIP_DETECTOR matches
    rule_set = [
        {
            "info_types": [{"name": "PERSON_NAME"}],
            "rules": [
                {
                    "exclusion_rule": {
                        "exclude_info_types": {
                            "info_types": [{"name": "VIP_DETECTOR"}]
                        },
                        "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_FULL_MATCH,
                    }
                }
            ],
        }
    ]

    # Construct the configuration dictionary
    inspect_config = {
        "info_types": [{"name": "PERSON_NAME"}],
        "custom_info_types": custom_info_types,
        "rule_set": rule_set,
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Java
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.CustomInfoType.ExclusionType;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.ExcludeInfoTypes;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
public class InspectStringCustomOmitOverlap {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "Name: Jane Doe. Name: Larry Page.";
inspectStringCustomOmitOverlap(projectId, textToInspect);
}
// Inspects the provided text, omitting PERSON_NAME matches that overlap with the custom detector.
public static void inspectStringCustomOmitOverlap(String projectId, String textToInspect)
throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Construct the custom infotype.
CustomInfoType customInfoType =
CustomInfoType.newBuilder()
.setInfoType(InfoType.newBuilder().setName("VIP_DETECTOR"))
.setRegex(Regex.newBuilder().setPattern("Larry Page|Sergey Brin"))
.setExclusionType(ExclusionType.EXCLUSION_TYPE_EXCLUDE)
.build();
// Exclude matches that also match the custom infotype.
ExclusionRule exclusionRule =
ExclusionRule.newBuilder()
.setExcludeInfoTypes(
ExcludeInfoTypes.newBuilder().addInfoTypes(customInfoType.getInfoType()))
.setMatchingType(MatchingType.MATCHING_TYPE_FULL_MATCH)
.build();
// Construct a ruleset that applies the exclusion rule to the PERSON_NAME infotype.
InspectionRuleSet ruleSet =
InspectionRuleSet.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
.addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
.build();
// Construct the configuration for the Inspect request, including the ruleset.
InspectConfig config =
InspectConfig.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
.addCustomInfoTypes(customInfoType)
.setIncludeQuote(true)
.addRuleSet(ruleSet)
.build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
C#
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using static Google.Cloud.Dlp.V2.CustomInfoType.Types;
public class InspectStringCustomOmitOverlap
{
public static InspectContentResponse Inspect(string projectId, string textToInspect)
{
var dlp = DlpServiceClient.Create();
var byteItem = new ByteContentItem
{
Type = ByteContentItem.Types.BytesType.TextUtf8,
Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
};
var contentItem = new ContentItem { ByteItem = byteItem };
var customInfoType = new CustomInfoType
{
InfoType = new InfoType { Name = "VIP_DETECTOR" },
Regex = new CustomInfoType.Types.Regex { Pattern = "Larry Page|Sergey Brin" },
ExclusionType = ExclusionType.Exclude
};
var exclusionRule = new ExclusionRule
{
ExcludeInfoTypes = new ExcludeInfoTypes { InfoTypes = { customInfoType.InfoType } },
MatchingType = MatchingType.FullMatch
};
var ruleSet = new InspectionRuleSet
{
InfoTypes = { new InfoType { Name = "PERSON_NAME" } },
Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
};
var config = new InspectConfig
{
InfoTypes = { new InfoType { Name = "PERSON_NAME" } },
CustomInfoTypes = { customInfoType },
IncludeQuote = true,
RuleSet = { ruleSet }
};
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = contentItem,
InspectConfig = config
};
var response = dlp.InspectContent(request);
Console.WriteLine($"Findings: {response.Result.Findings.Count}");
foreach (var f in response.Result.Findings)
{
Console.WriteLine("\tQuote: " + f.Quote);
Console.WriteLine("\tInfo type: " + f.InfoType.Name);
Console.WriteLine("\tLikelihood: " + f.Likelihood);
}
return response;
}
}
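Omit scan matches in a PERSON_NAME detector scan if also matched by the EMAIL_ADDRESS detector
The following JSON snippet and code in several languages illustrate how to use InspectConfig to tell Cloud DLP to return only one match when a match for the PERSON_NAME detector overlaps with a match for the EMAIL_ADDRESS detector. This avoids the situation where an email address such as "james@example.com" is reported as a match by both the PERSON_NAME and EMAIL_ADDRESS detectors.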
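The difference between the two matching types used in these scenarios can be sketched with plain interval arithmetic. This is a hedged local model rather than Cloud DLP's own logic: the function names and the example spans are hypothetical, and it assumes that, for `excludeInfoTypes`, a full match means the finding lies completely inside the excluding finding, while a partial match means the two findings merely intersect.

```python
def is_full_match(finding, excluder):
    """True if `finding` lies completely inside `excluder`.

    Both arguments are (start, end) character offsets, end exclusive.
    """
    return excluder[0] <= finding[0] and finding[1] <= excluder[1]


def is_partial_match(finding, excluder):
    """True if `finding` and `excluder` share at least one character."""
    return finding[0] < excluder[1] and excluder[0] < finding[1]


text = "james@example.com"
email = (0, len(text))  # hypothetical EMAIL_ADDRESS finding covering the whole string
name = (0, 5)           # hypothetical PERSON_NAME finding: "james"
straddling = (10, 25)   # hypothetical finding that runs past the end of the email

print(is_full_match(name, email), is_partial_match(name, email))
print(is_full_match(straddling, email), is_partial_match(straddling, email))
```

Under these assumptions, the "james" finding would be excluded by either matching type, whereas a finding that only straddles the email address would be excluded by a partial match but kept by a full match.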
Protocol
For more information about using the Cloud DLP API with JSON, see the JSON quickstart.
...
"inspectConfig":{
"infoTypes":[
{
"name":"PERSON_NAME"
},
{
"name":"EMAIL_ADDRESS"
}
],
"ruleSet":[
{
"infoTypes":[
{
"name":"PERSON_NAME"
}
],
"rules":[
{
"exclusionRule":{
"excludeInfoTypes":{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
]
},
"matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
}
}
]
}
]
}
...
Python
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
def omit_name_if_also_email(project, content_string):
    """Matches PERSON_NAME and EMAIL_ADDRESS, but not both.

    Uses the Data Loss Prevention API to omit matches on PERSON_NAME if the
    EMAIL_ADDRESS detector also matches.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.

    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a list of infoTypes for DLP to locate in `content_string`. See
    # https://cloud.google.com/dlp/docs/concepts-infotypes for more information
    # about supported infoTypes.
    info_types_to_locate = [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"}]

    # Construct the configuration dictionary that will only match on PERSON_NAME
    # if the EMAIL_ADDRESS doesn't also match. This configuration helps reduce
    # the total number of findings when there is a large overlap between different
    # infoTypes.
    inspect_config = {
        "info_types": info_types_to_locate,
        "rule_set": [
            {
                "info_types": [{"name": "PERSON_NAME"}],
                "rules": [
                    {
                        "exclusion_rule": {
                            "exclude_info_types": {
                                "info_types": [{"name": "EMAIL_ADDRESS"}]
                            },
                            "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_PARTIAL_MATCH,
                        }
                    }
                ],
            }
        ],
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Java
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.ExcludeInfoTypes;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class InspectStringOmitOverlap {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "james@example.com";
inspectStringOmitOverlap(projectId, textToInspect);
}
// Inspects the provided text, omitting PERSON_NAME matches that overlap with EMAIL_ADDRESS matches.
public static void inspectStringOmitOverlap(String projectId, String textToInspect)
throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Specify the type of info the inspection will look for.
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
List<InfoType> infoTypes = new ArrayList<>();
for (String typeName : new String[] {"PERSON_NAME", "EMAIL_ADDRESS"}) {
infoTypes.add(InfoType.newBuilder().setName(typeName).build());
}
// Exclude EMAIL_ADDRESS matches
ExclusionRule exclusionRule =
ExclusionRule.newBuilder()
.setExcludeInfoTypes(
ExcludeInfoTypes.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS")))
.setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
.build();
// Construct a ruleset that applies the exclusion rule to the PERSON_NAME infotype.
// If a PERSON_NAME match overlaps with an EMAIL_ADDRESS match, the PERSON_NAME match will
// be excluded.
InspectionRuleSet ruleSet =
InspectionRuleSet.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
.addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
.build();
// Construct the configuration for the Inspect request, including the ruleset.
InspectConfig config =
InspectConfig.newBuilder()
.addAllInfoTypes(infoTypes)
.setIncludeQuote(true)
.addRuleSet(ruleSet)
.build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
C#
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
using System;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
public class InspectStringOmitOverlap
{
public static InspectContentResponse Inspect(string projectId, string textToInspect)
{
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
var dlp = DlpServiceClient.Create();
// Specify the type and content to be inspected.
var byteItem = new ByteContentItem
{
Type = ByteContentItem.Types.BytesType.TextUtf8,
Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
};
var contentItem = new ContentItem { ByteItem = byteItem };
// Specify the type of info the inspection will look for.
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
var infoTypes = new string[] { "PERSON_NAME", "EMAIL_ADDRESS" }.Select(it => new InfoType { Name = it });
// Exclude EMAIL_ADDRESS matches
var exclusionRule = new ExclusionRule
{
ExcludeInfoTypes = new ExcludeInfoTypes { InfoTypes = { new InfoType { Name = "EMAIL_ADDRESS" } } },
MatchingType = MatchingType.PartialMatch
};
// Construct a ruleset that applies the exclusion rule to the PERSON_NAME infotype.
// If a PERSON_NAME match overlaps with an EMAIL_ADDRESS match, the PERSON_NAME match will
// be excluded.
var ruleSet = new InspectionRuleSet
{
InfoTypes = { new InfoType { Name = "PERSON_NAME" } },
Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
};
// Construct the configuration for the Inspect request, including the ruleset.
var config = new InspectConfig
{
InfoTypes = { infoTypes },
IncludeQuote = true,
RuleSet = { ruleSet }
};
// Construct the Inspect request to be sent by the client.
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = contentItem,
InspectConfig = config
};
// Use the client to send the API request.
var response = dlp.InspectContent(request);
// Parse the response and process results
Console.WriteLine($"Findings: {response.Result.Findings.Count}");
foreach (var f in response.Result.Findings)
{
Console.WriteLine("\tQuote: " + f.Quote);
Console.WriteLine("\tInfo type: " + f.InfoType.Name);
Console.WriteLine("\tLikelihood: " + f.Likelihood);
}
return response;
}
}
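Omit scan matches on domain names that are part of email addresses in a DOMAIN_NAME detector scan
The following JSON snippet and code in several languages illustrate how to use InspectConfig to tell Cloud DLP to return only those DOMAIN_NAME detector matches that don't overlap with an EMAIL_ADDRESS detector match. In this scenario, the main scan is the DOMAIN_NAME detector scan. The user doesn't want a domain name match returned in the results if the domain name is part of an email address.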
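This scenario uses the same overlap rule as the PERSON_NAME and EMAIL_ADDRESS scenario, plus a `customInfoTypes` entry that marks the excluding infoType as EXCLUSION_TYPE_EXCLUDE. The following Python sketch builds the REST-style `inspectConfig` dictionary for either variant; the `build_overlap_exclusion_config` helper and its parameter names are hypothetical and exist only for illustration.

```python
import json


def build_overlap_exclusion_config(primary, excluding, hide_excluding):
    """Build an inspectConfig dict that omits `primary` findings overlapping `excluding`.

    If `hide_excluding` is true, the excluding infoType is also listed under
    customInfoTypes with EXCLUSION_TYPE_EXCLUDE, as in the DOMAIN_NAME scenario.
    """
    config = {
        "infoTypes": [{"name": primary}, {"name": excluding}],
        "ruleSet": [
            {
                "infoTypes": [{"name": primary}],
                "rules": [
                    {
                        "exclusionRule": {
                            "excludeInfoTypes": {"infoTypes": [{"name": excluding}]},
                            "matchingType": "MATCHING_TYPE_PARTIAL_MATCH",
                        }
                    }
                ],
            }
        ],
    }
    if hide_excluding:
        config["customInfoTypes"] = [
            {"infoType": {"name": excluding}, "exclusionType": "EXCLUSION_TYPE_EXCLUDE"}
        ]
    return config


config = build_overlap_exclusion_config("DOMAIN_NAME", "EMAIL_ADDRESS", True)
print(json.dumps({"inspectConfig": config}, indent=2))
```

Calling the helper with `hide_excluding=False` and the names `"PERSON_NAME"` and `"EMAIL_ADDRESS"` yields the configuration from the preceding scenario.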
Protocol
For more information about using the Cloud DLP API with JSON, see the JSON quickstart.
...
"inspectConfig":{
"infoTypes":[
{
"name":"DOMAIN_NAME"
},
{
"name":"EMAIL_ADDRESS"
}
],
"customInfoTypes":[
{
"infoType":{
"name":"EMAIL_ADDRESS"
},
"exclusionType":"EXCLUSION_TYPE_EXCLUDE"
}
],
"ruleSet":[
{
"infoTypes":[
{
"name":"DOMAIN_NAME"
}
],
"rules":[
{
"exclusionRule":{
"excludeInfoTypes":{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
]
},
"matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
}
}
]
}
]
}
...
Python
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
def inspect_string_without_overlap(project, content_string):
    """Matches EMAIL_ADDRESS and DOMAIN_NAME, but DOMAIN_NAME is omitted
    if it overlaps with EMAIL_ADDRESS.

    Uses the Data Loss Prevention API to omit matches of one infotype
    that overlap with another.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.

    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a list of infoTypes for DLP to locate in `content_string`. See
    # https://cloud.google.com/dlp/docs/concepts-infotypes for more information
    # about supported infoTypes.
    info_types_to_locate = [{"name": "DOMAIN_NAME"}, {"name": "EMAIL_ADDRESS"}]

    # Define a custom info type to exclude email addresses
    custom_info_types = [
        {
            "info_type": {"name": "EMAIL_ADDRESS"},
            "exclusion_type": google.cloud.dlp_v2.CustomInfoType.ExclusionType.EXCLUSION_TYPE_EXCLUDE,
        }
    ]

    # Construct a rule set that will exclude DOMAIN_NAME matches
    # that overlap with EMAIL_ADDRESS matches
    rule_set = [
        {
            "info_types": [{"name": "DOMAIN_NAME"}],
            "rules": [
                {
                    "exclusion_rule": {
                        "exclude_info_types": {
                            "info_types": [{"name": "EMAIL_ADDRESS"}]
                        },
                        "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_PARTIAL_MATCH,
                    }
                }
            ],
        }
    ]

    # Construct the configuration dictionary
    inspect_config = {
        "info_types": info_types_to_locate,
        "custom_info_types": custom_info_types,
        "rule_set": rule_set,
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Java
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.CustomInfoType.ExclusionType;
import com.google.privacy.dlp.v2.ExcludeInfoTypes;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class InspectStringWithoutOverlap {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "example.com is a domain, james@example.org is an email.";
inspectStringWithoutOverlap(projectId, textToInspect);
}
// Inspects the provided text, omitting DOMAIN_NAME matches that overlap with EMAIL_ADDRESS matches.
public static void inspectStringWithoutOverlap(String projectId, String textToInspect)
throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Specify the type of info the inspection will look for.
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
List<InfoType> infoTypes = new ArrayList<>();
for (String typeName : new String[] {"DOMAIN_NAME", "EMAIL_ADDRESS"}) {
infoTypes.add(InfoType.newBuilder().setName(typeName).build());
}
// Define a custom info type to exclude email addresses
CustomInfoType customInfoType =
CustomInfoType.newBuilder()
.setInfoType(InfoType.newBuilder().setName("EMAIL_ADDRESS"))
.setExclusionType(ExclusionType.EXCLUSION_TYPE_EXCLUDE)
.build();
// Exclude EMAIL_ADDRESS matches
ExclusionRule exclusionRule =
ExclusionRule.newBuilder()
.setExcludeInfoTypes(
ExcludeInfoTypes.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS")))
.setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
.build();
// Construct a ruleset that applies the exclusion rule to the DOMAIN_NAME infotype.
// If a DOMAIN_NAME match is part of an EMAIL_ADDRESS match, the DOMAIN_NAME match will
// be excluded.
InspectionRuleSet ruleSet =
InspectionRuleSet.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("DOMAIN_NAME"))
.addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
.build();
// Construct the configuration for the Inspect request, including the ruleset.
InspectConfig config =
InspectConfig.newBuilder()
.addAllInfoTypes(infoTypes)
.addCustomInfoTypes(customInfoType)
.setIncludeQuote(true)
.addRuleSet(ruleSet)
.build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
C#
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using static Google.Cloud.Dlp.V2.CustomInfoType.Types;
public class InspectStringWithoutOverlap
{
public static InspectContentResponse Inspect(string projectId, string textToInspect)
{
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
var dlp = DlpServiceClient.Create();
// Specify the type and content to be inspected.
var byteContentItem = new ByteContentItem
{
Type = ByteContentItem.Types.BytesType.TextUtf8,
Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
};
var contentItem = new ContentItem
{
ByteItem = byteContentItem
};
// Specify the type of info the inspection will look for.
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
var infoTypes = new string[] { "DOMAIN_NAME", "EMAIL_ADDRESS" }.Select(it => new InfoType { Name = it });
// Define a custom info type to exclude email addresses
var customInfoType = new CustomInfoType
{
InfoType = new InfoType { Name = "EMAIL_ADDRESS" },
ExclusionType = ExclusionType.Exclude
};
// Exclude EMAIL_ADDRESS matches
var exclusionRule = new ExclusionRule
{
ExcludeInfoTypes = new ExcludeInfoTypes
{
InfoTypes = { new InfoType { Name = "EMAIL_ADDRESS" } }
},
MatchingType = MatchingType.PartialMatch
};
// Construct a ruleset that applies the exclusion rule to the DOMAIN_NAME infotype.
// If a DOMAIN_NAME match is part of an EMAIL_ADDRESS match, the DOMAIN_NAME match will
// be excluded.
var ruleSet = new InspectionRuleSet
{
InfoTypes = { new InfoType { Name = "DOMAIN_NAME" } },
Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
};
// Construct the configuration for the Inspect request, including the ruleset.
var config = new InspectConfig
{
InfoTypes = { infoTypes },
CustomInfoTypes = { customInfoType },
IncludeQuote = true,
RuleSet = { ruleSet }
};
// Construct the Inspect request to be sent by the client.
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = contentItem,
InspectConfig = config
};
// Use the client to send the API request.
var response = dlp.InspectContent(request);
return response;
}
}
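Hotword rules
Hotword rules are useful in situations such as:
- You want to change the likelihood value assigned to a scan match based on the match's proximity to a hotword. For example, you want to set a higher likelihood value for patient-name matches depending on how close the names are to the word "patient".
- When inspecting structured, tabular data, you want to change the likelihood value assigned to a match based on the column header name. For example, you want to set a higher likelihood value for a US_SOCIAL_SECURITY_NUMBER found in a column whose header is ACCOUNT_ID.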
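Hotword rules API overview
Within Cloud DLP's InspectionRule object, you can specify a HotwordRule object, which adjusts the likelihood of findings that fall within a certain proximity of a hotword.
InspectionRule objects are grouped as a "rule set" in an InspectionRuleSet object, together with the list of infoType detectors that the rule set applies to. The rules within a rule set are applied in the order specified.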
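Hotword rule example scenario
The following code snippet illustrates how to configure Cloud DLP for the given scenario.
Increase the likelihood of a PERSON_NAME match if the hotword "patient" is nearby
The following JSON snippet and code in several languages illustrate how to use the InspectConfig property to scan a medical database for patient names. You can use Cloud DLP's built-in PERSON_NAME infoType detector, but that causes Cloud DLP to match all names of people, not just the names of patients. To fix this, you can add a hotword rule that looks for the word "patient" within a certain number of characters of the first character of a potential match. You can then assign findings that match this pattern a likelihood of "very likely", because they meet your special criteria. Setting the minimum Likelihood to VERY_LIKELY within InspectConfig ensures that only matches that satisfy this configuration are returned in the findings.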
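The proximity check described above can be illustrated with a small local model. This Python sketch is only an approximation for intuition, not Cloud DLP's implementation: the `boost_if_hotword` helper is hypothetical, the finding span is hard-coded, and the sketch assumes the window is counted in characters immediately before the first character of the finding.

```python
import re


def boost_if_hotword(text, finding_span, hotword_pattern, window_before):
    """Return "VERY_LIKELY" if the hotword occurs in the window before the finding.

    `finding_span` is a (start, end) pair of character offsets into `text`.
    Returns None when the hotword is absent, meaning the likelihood is left
    unchanged in this simplified model.
    """
    start, _ = finding_span
    window = text[max(0, start - window_before):start]
    if re.search(hotword_pattern, window):
        return "VERY_LIKELY"
    return None


text = "patient name: John Doe"
john_doe = (14, 22)  # hypothetical PERSON_NAME finding for "John Doe"
print(boost_if_hotword(text, john_doe, "patient", 50))
print(boost_if_hotword(text, john_doe, "patient", 5))
```

With a 50-character window the hotword falls inside the window and the finding is boosted; with a 5-character window it does not, which is why the finding would then be filtered out by a minimum likelihood of VERY_LIKELY.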
Protocol
For more information about using the Cloud DLP API with JSON, see the JSON quickstart.
...
"inspectConfig":{
"infoTypes":[
{
"name":"PERSON_NAME"
}
],
"ruleSet":[
{
"infoTypes":[
{
"name":"PERSON_NAME"
}
],
"rules":[
{
"hotwordRule":{
"hotwordRegex":{
"pattern":"patient"
},
"proximity":{
"windowBefore":50
},
"likelihoodAdjustment":{
"fixedLikelihood":"VERY_LIKELY"
}
}
}
]
}
],
"minLikelihood":"VERY_LIKELY"
}
...
Python
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
def inspect_with_person_name_w_custom_hotword(
    project, content_string, custom_hotword="patient"
):
    """Uses the Data Loss Prevention API to increase likelihood for matches on
    PERSON_NAME if the user specified custom hotword is present. Only
    includes findings with the increased likelihood by setting a minimum
    likelihood threshold of VERY_LIKELY.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        custom_hotword: The custom hotword used for likelihood boosting.

    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a rule set with the caller-provided hotword, with a likelihood
    # boost to VERY_LIKELY when the hotword is present within the 50-character
    # window preceding the PII finding.
    hotword_rule = {
        "hotword_regex": {"pattern": custom_hotword},
        "likelihood_adjustment": {
            "fixed_likelihood": google.cloud.dlp_v2.Likelihood.VERY_LIKELY
        },
        "proximity": {"window_before": 50},
    }

    rule_set = [
        {
            "info_types": [{"name": "PERSON_NAME"}],
            "rules": [{"hotword_rule": hotword_rule}],
        }
    ]

    # Construct the configuration dictionary with the hotword rule set.
    inspect_config = {
        "rule_set": rule_set,
        "min_likelihood": google.cloud.dlp_v2.Likelihood.VERY_LIKELY,
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Java
To learn how to install and use the Cloud DLP client library, see Cloud DLP client libraries.
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.HotwordRule;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.LikelihoodAdjustment;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.Proximity;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.Likelihood;
import com.google.privacy.dlp.v2.LocationName;
import com.google.protobuf.ByteString;
import java.io.IOException;
public class InspectStringCustomHotword {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "patient name: John Doe";
String customHotword = "patient";
inspectStringCustomHotword(projectId, textToInspect, customHotword);
}
// Inspects the provided text.
public static void inspectStringCustomHotword(
String projectId, String textToInspect, String customHotword) throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Increase likelihood of matches that have customHotword nearby
HotwordRule hotwordRule =
HotwordRule.newBuilder()
.setHotwordRegex(Regex.newBuilder().setPattern(customHotword))
.setProximity(Proximity.newBuilder().setWindowBefore(50))
.setLikelihoodAdjustment(
LikelihoodAdjustment.newBuilder().setFixedLikelihood(Likelihood.VERY_LIKELY))
.build();
// Construct a ruleset that applies the hotword rule to the PERSON_NAME infotype.
InspectionRuleSet ruleSet =
InspectionRuleSet.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME").build())
.addRules(InspectionRule.newBuilder().setHotwordRule(hotwordRule))
.build();
// Construct the configuration for the Inspect request.
InspectConfig config =
InspectConfig.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME").build())
.setIncludeQuote(true)
.addRuleSet(ruleSet)
.setMinLikelihood(Likelihood.VERY_LIKELY)
.build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
C#
To learn how to install and use the Cloud DLP client library, see Cloud DLP client libraries.
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using System;
using static Google.Cloud.Dlp.V2.CustomInfoType.Types;
public class InspectStringCustomHotword
{
public static InspectContentResponse Inspect(string projectId, string textToInspect, string customHotword)
{
var dlp = DlpServiceClient.Create();
var byteContentItem = new ByteContentItem
{
Type = ByteContentItem.Types.BytesType.TextUtf8,
Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
};
var contentItem = new ContentItem
{
ByteItem = byteContentItem
};
var hotwordRule = new DetectionRule.Types.HotwordRule
{
HotwordRegex = new Regex { Pattern = customHotword },
Proximity = new DetectionRule.Types.Proximity { WindowBefore = 50 },
LikelihoodAdjustment = new DetectionRule.Types.LikelihoodAdjustment { FixedLikelihood = Likelihood.VeryLikely }
};
var infoType = new InfoType { Name = "PERSON_NAME" };
var inspectionRuleSet = new InspectionRuleSet
{
InfoTypes = { infoType },
Rules = { new InspectionRule { HotwordRule = hotwordRule } }
};
var inspectConfig = new InspectConfig
{
InfoTypes = { infoType },
IncludeQuote = true,
RuleSet = { inspectionRuleSet },
MinLikelihood = Likelihood.VeryLikely
};
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = contentItem,
InspectConfig = inspectConfig
};
var response = dlp.InspectContent(request);
Console.WriteLine($"Findings: {response.Result.Findings.Count}");
foreach (var f in response.Result.Findings)
{
Console.WriteLine("\tQuote: " + f.Quote);
Console.WriteLine("\tInfo type: " + f.InfoType.Name);
Console.WriteLine("\tLikelihood: " + f.Likelihood);
}
return response;
}
}
For more information about hotwords, see Customizing match likelihood.
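To make the proximity window and the minimum-likelihood threshold concrete, the following is a minimal local sketch that does not call Cloud DLP. The `Likelihood` enum, the `Candidate` class, and both helper functions are hypothetical stand-ins written for illustration; they only approximate the behavior described above (a hotword found within `window_before` characters preceding a candidate sets a fixed likelihood, and candidates below the minimum are dropped). The service's actual matching may differ in its details.

```python
import re
from dataclasses import dataclass, replace
from enum import IntEnum


class Likelihood(IntEnum):
    # Local stand-in that mirrors the ordering of the API's likelihood levels.
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


@dataclass(frozen=True)
class Candidate:
    quote: str
    start: int  # character offset of the candidate's first character
    likelihood: Likelihood


def apply_hotword(text, candidates, pattern, window_before, fixed_likelihood):
    """Assign fixed_likelihood to candidates preceded by the hotword."""
    adjusted = []
    for c in candidates:
        # Only the window_before characters preceding the candidate are searched.
        window = text[max(0, c.start - window_before):c.start]
        if re.search(pattern, window):
            c = replace(c, likelihood=fixed_likelihood)
        adjusted.append(c)
    return adjusted


def filter_min_likelihood(candidates, min_likelihood):
    """Keep only candidates at or above the minimum likelihood."""
    return [c for c in candidates if c.likelihood >= min_likelihood]


text = (
    "patient name: John Doe. The visit was scheduled last week "
    "by the front desk clerk Jane Roe."
)
# Hypothetical PERSON_NAME candidates with an assumed base likelihood.
candidates = [
    Candidate("John Doe", text.index("John Doe"), Likelihood.LIKELY),
    Candidate("Jane Roe", text.index("Jane Roe"), Likelihood.LIKELY),
]

boosted = apply_hotword(text, candidates, "patient", 50, Likelihood.VERY_LIKELY)
kept = filter_min_likelihood(boosted, Likelihood.VERY_LIKELY)
for c in kept:
    print(f"{c.quote}: {c.likelihood.name}")  # John Doe: VERY_LIKELY
```

"Jane Roe" is dropped because "patient" appears more than 50 characters before it, so its likelihood is never raised to the threshold.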
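Multiple inspection rules scenario
The following InspectConfig JSON snippet and code in several languages illustrate applying both exclusion and hotword rules. This snippet's rule set includes both hotword rules and dictionary and regular expression exclusion rules. Note that the four rules are specified in an array within the rules element.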
Protocol
For more information about using the Cloud DLP API with JSON, see the JSON quickstart.
...
"inspectConfig":{
"infoTypes":[
{
"name":"PERSON_NAME"
}
],
"ruleSet":[
{
"infoTypes":[
{
"name":"PERSON_NAME"
}
],
"rules":[
{
"hotwordRule":{
"hotwordRegex":{
"pattern":"patient"
},
"proximity":{
"windowBefore":10
},
"likelihoodAdjustment":{
"fixedLikelihood":"VERY_LIKELY"
}
}
},
{
"hotwordRule":{
"hotwordRegex":{
"pattern":"doctor"
},
"proximity":{
"windowBefore":10
},
"likelihoodAdjustment":{
"fixedLikelihood":"UNLIKELY"
}
}
},
{
"exclusionRule":{
"dictionary":{
"wordList":{
"words":[
"Quasimodo"
]
}
},
"matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
}
},
{
"exclusionRule":{
"regex":{
"pattern":"REDACTED"
},
"matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
}
}
]
}
]
}
...
Python
To learn how to install and use the Cloud DLP client library, see Cloud DLP client libraries.
def inspect_string_multiple_rules(project, content_string):
    """Uses the Data Loss Prevention API to modify likelihood for matches on
    PERSON_NAME, combining multiple hotword and exclusion rules.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.

    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct hotword rules.
    patient_rule = {
        "hotword_regex": {"pattern": "patient"},
        "proximity": {"window_before": 10},
        "likelihood_adjustment": {
            "fixed_likelihood": google.cloud.dlp_v2.Likelihood.VERY_LIKELY
        },
    }
    doctor_rule = {
        "hotword_regex": {"pattern": "doctor"},
        "proximity": {"window_before": 10},
        "likelihood_adjustment": {
            "fixed_likelihood": google.cloud.dlp_v2.Likelihood.UNLIKELY
        },
    }

    # Construct exclusion rules.
    quasimodo_rule = {
        "dictionary": {"word_list": {"words": ["Quasimodo"]}},
        "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_PARTIAL_MATCH,
    }
    redacted_rule = {
        "regex": {"pattern": "REDACTED"},
        "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_PARTIAL_MATCH,
    }

    # Construct the rule set, combining the above rules.
    rule_set = [
        {
            "info_types": [{"name": "PERSON_NAME"}],
            "rules": [
                {"hotword_rule": patient_rule},
                {"hotword_rule": doctor_rule},
                {"exclusion_rule": quasimodo_rule},
                {"exclusion_rule": redacted_rule},
            ],
        }
    ]

    # Construct the configuration dictionary.
    inspect_config = {
        "info_types": [{"name": "PERSON_NAME"}],
        "rule_set": rule_set,
        "include_quote": True,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Java
To learn how to install and use the Cloud DLP client library, see Cloud DLP client libraries.
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.HotwordRule;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.LikelihoodAdjustment;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.Proximity;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.Likelihood;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
public class InspectStringMultipleRules {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "patient: Jane Doe";
inspectStringMultipleRules(projectId, textToInspect);
}
// Inspects the provided text, applying multiple hotword and exclusion rules to PERSON_NAME.
public static void inspectStringMultipleRules(String projectId, String textToInspect)
throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Construct hotword rules
HotwordRule patientRule =
HotwordRule.newBuilder()
.setHotwordRegex(Regex.newBuilder().setPattern("patient"))
.setProximity(Proximity.newBuilder().setWindowBefore(10))
.setLikelihoodAdjustment(
LikelihoodAdjustment.newBuilder().setFixedLikelihood(Likelihood.VERY_LIKELY))
.build();
HotwordRule doctorRule =
HotwordRule.newBuilder()
.setHotwordRegex(Regex.newBuilder().setPattern("doctor"))
.setProximity(Proximity.newBuilder().setWindowBefore(10))
.setLikelihoodAdjustment(
LikelihoodAdjustment.newBuilder().setFixedLikelihood(Likelihood.UNLIKELY))
.build();
// Construct exclusion rules
ExclusionRule quasimodoRule =
ExclusionRule.newBuilder()
.setDictionary(
Dictionary.newBuilder().setWordList(WordList.newBuilder().addWords("Quasimodo")))
.setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
.build();
ExclusionRule redactedRule =
ExclusionRule.newBuilder()
.setRegex(Regex.newBuilder().setPattern("REDACTED"))
.setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
.build();
// Construct a ruleset that applies the rules to the PERSON_NAME infotype.
InspectionRuleSet ruleSet =
InspectionRuleSet.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
.addRules(InspectionRule.newBuilder().setHotwordRule(patientRule))
.addRules(InspectionRule.newBuilder().setHotwordRule(doctorRule))
.addRules(InspectionRule.newBuilder().setExclusionRule(quasimodoRule))
.addRules(InspectionRule.newBuilder().setExclusionRule(redactedRule))
.build();
// Construct the configuration for the Inspect request, including the ruleset.
InspectConfig config =
InspectConfig.newBuilder()
.addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
.setIncludeQuote(true)
.addRuleSet(ruleSet)
.build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
C#
To learn how to install and use the Cloud DLP client library, see Cloud DLP client libraries.
using System;
using System.Text.RegularExpressions;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using static Google.Cloud.Dlp.V2.CustomInfoType.Types;
public class InspectStringMultipleRules
{
public static InspectContentResponse Inspect(string projectId, string textToInspect)
{
var dlp = DlpServiceClient.Create();
var byteContentItem = new ByteContentItem
{
Type = ByteContentItem.Types.BytesType.TextUtf8,
Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
};
var contentItem = new ContentItem
{
ByteItem = byteContentItem
};
var patientRule = new DetectionRule.Types.HotwordRule
{
HotwordRegex = new CustomInfoType.Types.Regex { Pattern = "patient" },
Proximity = new DetectionRule.Types.Proximity { WindowBefore = 10 },
LikelihoodAdjustment = new DetectionRule.Types.LikelihoodAdjustment { FixedLikelihood = Likelihood.VeryLikely }
};
var doctorRule = new DetectionRule.Types.HotwordRule
{
HotwordRegex = new CustomInfoType.Types.Regex { Pattern = "doctor" },
Proximity = new DetectionRule.Types.Proximity { WindowBefore = 10 },
LikelihoodAdjustment = new DetectionRule.Types.LikelihoodAdjustment { FixedLikelihood = Likelihood.Unlikely }
};
// Construct exclusion rules
var quasimodoRule = new ExclusionRule
{
Dictionary = new Dictionary { WordList = new Dictionary.Types.WordList { Words = { "Quasimodo" } } },
MatchingType = MatchingType.PartialMatch
};
var redactedRule = new ExclusionRule
{
Regex = new CustomInfoType.Types.Regex { Pattern = "REDACTED" },
MatchingType = MatchingType.PartialMatch
};
var infoType = new InfoType { Name = "PERSON_NAME" };
var inspectionRuleSet = new InspectionRuleSet
{
InfoTypes = { infoType },
Rules =
{
new InspectionRule { HotwordRule = patientRule },
new InspectionRule { HotwordRule = doctorRule},
new InspectionRule { ExclusionRule = quasimodoRule },
new InspectionRule { ExclusionRule = redactedRule }
}
};
var inspectConfig = new InspectConfig
{
InfoTypes = { infoType },
IncludeQuote = true,
RuleSet = { inspectionRuleSet }
};
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = contentItem,
InspectConfig = inspectConfig
};
var response = dlp.InspectContent(request);
Console.WriteLine($"Findings: {response.Result.Findings.Count}");
foreach (var f in response.Result.Findings)
{
Console.WriteLine("\tQuote: " + f.Quote);
Console.WriteLine("\tInfo type: " + f.InfoType.Name);
Console.WriteLine("\tLikelihood: " + f.Likelihood);
}
return response;
}
}
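As a rough local illustration of what the two exclusion rules in this scenario do with MATCHING_TYPE_PARTIAL_MATCH, the sketch below filters a list of hypothetical finding quotes without calling Cloud DLP. The helper names are invented for this example. Two things are assumed: that dictionary matching is token-based and case-insensitive, and that Python's `re` module is an adequate stand-in for the service's regular expression engine for a literal pattern like this one. Treat it as an approximation of the semantics, not as the service's implementation.

```python
import re


def dictionary_partial_match(quote, words):
    """True if at least one token of the quote is in the word list."""
    # Assumption: tokens are runs of word characters, compared case-insensitively.
    tokens = {t.casefold() for t in re.findall(r"\w+", quote)}
    return any(w.casefold() in tokens for w in words)


def regex_partial_match(quote, pattern):
    """True if the pattern matches some substring of the quote."""
    return re.search(pattern, quote) is not None


def keep_finding(quote):
    """Apply both exclusion rules; a finding is dropped if either one matches."""
    excluded = dictionary_partial_match(
        quote, ["Quasimodo"]
    ) or regex_partial_match(quote, "REDACTED")
    return not excluded


# Hypothetical PERSON_NAME quotes, invented for illustration.
quotes = ["Jane Doe", "Quasimodo", "Quasimodo Smith", "REDACTED Smith"]
kept = [q for q in quotes if keep_finding(q)]
print(kept)  # ['Jane Doe']
```

With a full-match type instead, a quote such as "Quasimodo Smith" would presumably survive the dictionary rule, because the word list would no longer cover the entire quote.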
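Overlapping infoType detectors
It's possible to define a custom infoType detector that has the same name as a built-in infoType detector. As shown in the example in the Hotword rule example scenario section, when you create a custom infoType detector with the same name as a built-in infoType, any findings detected by the new infoType detector are added to those detected by the built-in detector. This is only true as long as the built-in infoType is specified in the list of infoTypes in the InspectConfig object.
When creating new custom infoType detectors, test them thoroughly on example content to make sure that they work as intended.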
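The overlap described above can be sketched as a configuration dictionary in the style of the Python samples on this page. The helper name and the regular expression are hypothetical, chosen only for illustration. The dictionary uses the `info_types` and `custom_info_types` fields of `InspectConfig`; the block only builds the configuration and runs a quick local check of the pattern, and does not send a request.

```python
import re


def build_overlapping_inspect_config(builtin_name, extra_pattern):
    """Build an inspect config whose custom detector shares a built-in name."""
    return {
        # The built-in detector must be listed for its findings to be included.
        "info_types": [{"name": builtin_name}],
        # The custom detector reuses the same name, so its findings are
        # reported under that name alongside the built-in detector's findings.
        "custom_info_types": [
            {
                "info_type": {"name": builtin_name},
                "regex": {"pattern": extra_pattern},
            }
        ],
        "include_quote": True,
    }


# Hypothetical pattern: a title followed by a capitalized surname.
pattern = r"Dr\. [A-Z][a-z]+"
inspect_config = build_overlapping_inspect_config("PERSON_NAME", pattern)

# Quick local check of the pattern on sample content before using it in a scan.
# Python's re module is used here only as an approximation of the service's
# regular expression engine.
samples = ["Seen by Dr. Watson on Monday", "No names in this sentence"]
matches = [re.findall(pattern, s) for s in samples]
print(matches)  # [['Dr. Watson'], []]
```

A dictionary built this way could be passed as `inspect_config` in a request like the ones in the Python samples above.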