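Sensitive Data Protection can detect and classify sensitive data in text content. Given text input, the DLP API returns details about any infoTypes found in the text, along with a likelihood value and offset information for each finding.
Best practices
Determine scan scope and priority
Identify your resources and decide which ones have the highest scanning priority. When you first start, you might have a large backlog of data that needs classification, and it is not possible to scan all of it immediately. Start with the data that poses the highest risk, for example data that is frequently accessed, widely accessible, or of unknown content.
Reduce latency
Latency is affected by several factors: the amount of data to scan, the storage repository being scanned, and the type and number of infoTypes that are enabled.
To help reduce job latency, you can try the following:
- Enable sampling.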
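- Avoid enabling infoTypes that you don't need. Although they are useful in certain scenarios, some infoTypes (including PERSON_NAME, FEMALE_NAME, MALE_NAME, FIRST_NAME, LAST_NAME, DATE_OF_BIRTH, LOCATION, STREET_ADDRESS, and ORGANIZATION_NAME) can make requests run much more slowly when they are included.
- Always specify infoTypes explicitly. Do not use an empty infoTypes list.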
- If possible, organize the data to be inspected into a table with rows and columns to reduce network round trips.
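As a rough illustration of the last two tips, the sketch below batches several records into one table-shaped item and pairs it with a short, explicit infoType list. The helper name `build_table_item` and the sample rows are hypothetical. The dict layout (`table`, `headers`, `rows`, `values`, `string_value`) is assumed to mirror the API's ContentItem table structure, and the result would be passed as `item` in the same way as `{"value": ...}` in the Python sample later on this page.

```python
from typing import Dict, Sequence


def build_table_item(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> Dict:
    """Build a ContentItem-shaped dict that holds a table of string cells."""
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("every row must have one value per header")
    return {
        "table": {
            "headers": [{"name": name} for name in headers],
            "rows": [
                {"values": [{"string_value": cell} for cell in row]}
                for row in rows
            ],
        }
    }


# Hypothetical sample data: three records sent as one item instead of
# three separate requests.
item = build_table_item(
    ["name", "email"],
    [
        ["Gary", "gary@example.com"],
        ["Ana", "ana@example.com"],
        ["Li", "li@example.com"],
    ],
)

# An explicit, short infoType list rather than an empty one.
inspect_config = {"info_types": [{"name": "EMAIL_ADDRESS"}]}

print(len(item["table"]["rows"]))
```

One request then carries all three records, which is the round-trip saving the tip describes.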
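Limit the scope of your first scans
For best results, limit the scope of your first few scans instead of scanning all of your data. Start with a few requests. Your findings will be more meaningful once you fine-tune which detectors to enable and which exclusion rules might be needed to reduce false positives. Avoid enabling all infoTypes if you don't need them all, because false positives or unusable findings can make it harder to assess your risk. Although they are useful in certain scenarios, some infoTypes, such as DATE, TIME, DOMAIN_NAME, and URL, detect a broad range of findings and might not be useful to enable.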
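A narrow first scan with an exclusion rule might be configured along the following lines. This is only a sketch: `build_inspect_config` is a hypothetical helper, the excluded address is invented, and the `rule_set` / `exclusion_rule` layout and the `MATCHING_TYPE_FULL_MATCH` enum name are assumed to follow the API's InspectConfig structure. Passing the enum as a string is assumed to work in the same way that `min_likelihood` is passed as a string in the Python sample on this page.

```python
from typing import Dict, List


def build_inspect_config(info_types: List[str], excluded_words: List[str]) -> Dict:
    """Build an InspectConfig-shaped dict with an optional exclusion rule."""
    names = [{"name": info_type} for info_type in info_types]
    config: Dict = {"info_types": names, "include_quote": True}
    if excluded_words:
        # Findings whose full text matches one of these words are dropped.
        config["rule_set"] = [
            {
                "info_types": names,
                "rules": [
                    {
                        "exclusion_rule": {
                            "dictionary": {"word_list": {"words": excluded_words}},
                            "matching_type": "MATCHING_TYPE_FULL_MATCH",
                        }
                    }
                ],
            }
        ]
    return config


# Hypothetical first scan: one detector, one known false positive excluded.
config = build_inspect_config(["EMAIL_ADDRESS"], ["noreply@example.com"])
print(config["rule_set"][0]["rules"][0]["exclusion_rule"]["matching_type"])
```

Starting from one or two detectors like this keeps the early findings small enough to review by hand before the list of infoTypes is widened.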
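On-premises, hybrid, and multi-cloud scans
If the data to be scanned resides on-premises or outside of Google Cloud, use the API methods content.inspect and content.deidentify to scan the content, classify findings, and pseudonymize content without persisting it outside of your local storage.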
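To make the pseudonymization step more concrete, here is a hedged sketch of a request body for content.deidentify. The helper `build_pseudonymize_request`, the key name, and the surrogate name are all hypothetical. The nested field names are assumed to match the API's DeidentifyConfig structure, and the dict is assumed to be usable with the Python client's `deidentify_content(request=...)` method. A transient key is used only to keep the example self-contained. Because such a key is discarded after the request, tokens would not be stable across calls.

```python
from typing import Dict, List


def build_pseudonymize_request(
    project_id: str,
    text: str,
    info_types: List[str],
    key_name: str,
    surrogate_name: str,
) -> Dict:
    """Build a DeidentifyContentRequest-shaped dict that tokenizes findings."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        # The text is sent inline, so nothing needs to be copied to cloud storage.
        "item": {"value": text},
        "inspect_config": {
            "info_types": [{"name": info_type} for info_type in info_types]
        },
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "crypto_deterministic_config": {
                                # Assumption: a per-request transient key.
                                "crypto_key": {"transient": {"name": key_name}},
                                "surrogate_info_type": {"name": surrogate_name},
                            }
                        }
                    }
                ]
            }
        },
    }


request = build_pseudonymize_request(
    "my-project-id",
    "My email is gary@example.com",
    ["EMAIL_ADDRESS"],
    "example-transient-key",
    "EMAIL_TOKEN",
)
print(request["parent"])
```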
Inspect a text string
The following JSON and code samples in several languages demonstrate how to use the DLP API to inspect text strings for sensitive data.
C#
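To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.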
using System;
using System.Collections.Generic;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using static Google.Cloud.Dlp.V2.InspectConfig.Types;
public class InspectString
{
public static InspectContentResponse Inspect(
string projectId,
string dataValue,
string minLikelihood,
int maxFindings,
bool includeQuote,
IEnumerable<InfoType> infoTypes,
IEnumerable<CustomInfoType> customInfoTypes)
{
var inspectConfig = new InspectConfig
{
MinLikelihood = (Likelihood)Enum.Parse(typeof(Likelihood), minLikelihood, true),
Limits = new FindingLimits
{
MaxFindingsPerRequest = maxFindings
},
IncludeQuote = includeQuote,
InfoTypes = { infoTypes },
CustomInfoTypes = { customInfoTypes }
};
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = new ContentItem
{
Value = dataValue
},
InspectConfig = inspectConfig
};
var dlp = DlpServiceClient.Create();
var response = dlp.InspectContent(request);
PrintResponse(includeQuote, response);
return response;
}
private static void PrintResponse(bool includeQuote, InspectContentResponse response)
{
var findings = response.Result.Findings;
if (findings.Any())
{
Console.WriteLine("Findings:");
foreach (var finding in findings)
{
if (includeQuote)
{
Console.WriteLine($" Quote: {finding.Quote}");
}
Console.WriteLine($" InfoType: {finding.InfoType}");
Console.WriteLine($" Likelihood: {finding.Likelihood}");
}
}
else
{
Console.WriteLine("No findings.");
}
}
}
Go
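To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.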
import (
"context"
"fmt"
"io"
dlp "cloud.google.com/go/dlp/apiv2"
"cloud.google.com/go/dlp/apiv2/dlppb"
)
// inspectString inspects a given string, and prints results.
func inspectString(w io.Writer, projectID, textToInspect string) error {
// projectID := "my-project-id"
// textToInspect := "My name is Gary and my email is gary@example.com"
ctx := context.Background()
// Initialize client.
client, err := dlp.NewClient(ctx)
if err != nil {
return err
}
defer client.Close() // Closing the client safely cleans up background resources.
// Create and send the request.
req := &dlppb.InspectContentRequest{
Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
Item: &dlppb.ContentItem{
DataItem: &dlppb.ContentItem_Value{
Value: textToInspect,
},
},
InspectConfig: &dlppb.InspectConfig{
InfoTypes: []*dlppb.InfoType{
{Name: "PHONE_NUMBER"},
{Name: "EMAIL_ADDRESS"},
{Name: "CREDIT_CARD_NUMBER"},
},
IncludeQuote: true,
},
}
resp, err := client.InspectContent(ctx, req)
if err != nil {
return err
}
// Process the results.
result := resp.Result
fmt.Fprintf(w, "Findings: %d\n", len(result.Findings))
for _, f := range result.Findings {
fmt.Fprintf(w, "\tQuote: %s\n", f.Quote)
fmt.Fprintf(w, "\tInfo type: %s\n", f.InfoType.Name)
fmt.Fprintf(w, "\tLikelihood: %s\n", f.Likelihood)
}
return nil
}
Java
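To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.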
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class InspectString {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String textToInspect = "My name is Gary and my email is gary@example.com";
inspectString(projectId, textToInspect);
}
// Inspects the provided text.
public static void inspectString(String projectId, String textToInspect) throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteContentItem byteItem =
ByteContentItem.newBuilder()
.setType(BytesType.TEXT_UTF8)
.setData(ByteString.copyFromUtf8(textToInspect))
.build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Specify the type of info the inspection will look for.
List<InfoType> infoTypes = new ArrayList<>();
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
for (String typeName : new String[] {"PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"}) {
infoTypes.add(InfoType.newBuilder().setName(typeName).build());
}
// Construct the configuration for the Inspect request.
InspectConfig config =
InspectConfig.newBuilder().addAllInfoTypes(infoTypes).setIncludeQuote(true).build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
Node.js
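To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.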
// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');
// Instantiates a client
const dlp = new DLP.DlpServiceClient();
// The project ID to run the API call under
// const projectId = 'my-project';
// The string to inspect
// const string = 'My name is Gary and my email is gary@example.com';
// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';
// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;
// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];
// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
// { infoType: { name: 'REGEX_TYPE' }, regex: {pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}'}}];
// Whether to include the matching string
// const includeQuote = true;
async function inspectString() {
// Construct item to inspect
const item = {value: string};
// Construct request
const request = {
parent: `projects/${projectId}/locations/global`,
inspectConfig: {
infoTypes: infoTypes,
customInfoTypes: customInfoTypes,
minLikelihood: minLikelihood,
includeQuote: includeQuote,
limits: {
maxFindingsPerRequest: maxFindings,
},
},
item: item,
};
// Run request
const [response] = await dlp.inspectContent(request);
const findings = response.result.findings;
if (findings.length > 0) {
console.log('Findings:');
findings.forEach(finding => {
if (includeQuote) {
console.log(`\tQuote: ${finding.quote}`);
}
console.log(`\tInfo type: ${finding.infoType.name}`);
console.log(`\tLikelihood: ${finding.likelihood}`);
});
} else {
console.log('No findings.');
}
}
inspectString();
PHP
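To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.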
use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\Likelihood;
/**
* @param string $projectId
* @param string $textToInspect
*/
function inspect_string(string $projectId, string $textToInspect): void
{
// Instantiate a client.
$dlp = new DlpServiceClient();
// Construct request
$parent = "projects/$projectId/locations/global";
$item = (new ContentItem())
->setValue($textToInspect);
$inspectConfig = (new InspectConfig())
// The infoTypes of information to match
->setInfoTypes([
(new InfoType())->setName('PHONE_NUMBER'),
(new InfoType())->setName('EMAIL_ADDRESS'),
(new InfoType())->setName('CREDIT_CARD_NUMBER')
])
// Whether to include the matching string
->setIncludeQuote(true);
// Run request
$response = $dlp->inspectContent([
'parent' => $parent,
'inspectConfig' => $inspectConfig,
'item' => $item
]);
// Print the results
$findings = $response->getResult()->getFindings();
if (count($findings) == 0) {
print('No findings.' . PHP_EOL);
} else {
print('Findings:' . PHP_EOL);
foreach ($findings as $finding) {
print(' Quote: ' . $finding->getQuote() . PHP_EOL);
print(' Info type: ' . $finding->getInfoType()->getName() . PHP_EOL);
$likelihoodString = Likelihood::name($finding->getLikelihood());
print(' Likelihood: ' . $likelihoodString . PHP_EOL);
}
}
}
Python
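To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.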
from typing import List, Optional
import google.cloud.dlp
def inspect_string(
    project: str,
    content_string: str,
    info_types: List[str],
    custom_dictionaries: Optional[List[str]] = None,
    custom_regexes: Optional[List[str]] = None,
    min_likelihood: Optional[str] = None,
    max_findings: Optional[int] = None,
    include_quote: bool = True,
) -> None:
    """Uses the Data Loss Prevention API to analyze strings for protected data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        custom_dictionaries: A list of comma-separated word lists; each one
            becomes a custom dictionary info type.
        custom_regexes: A list of regex patterns; each one becomes a custom
            regex info type.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        include_quote: Boolean for whether to display a quote of the detected
            information in the results.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()
    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    info_types = [{"name": info_type} for info_type in info_types]
    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [
        {
            "info_type": {"name": f"CUSTOM_DICTIONARY_{i}"},
            "dictionary": {"word_list": {"words": custom_dict.split(",")}},
        }
        for i, custom_dict in enumerate(custom_dictionaries)
    ]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [
        {
            "info_type": {"name": f"CUSTOM_REGEX_{i}"},
            "regex": {"pattern": custom_regex},
        }
        for i, custom_regex in enumerate(custom_regexes)
    ]
    custom_info_types = dictionaries + regexes
    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        "info_types": info_types,
        "custom_info_types": custom_info_types,
        "min_likelihood": min_likelihood,
        "include_quote": include_quote,
        "limits": {"max_findings_per_request": max_findings},
    }
    # Construct the `item`.
    item = {"value": content_string}
    # Convert the project id into a full resource id.
    parent = f"projects/{project}"
    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )
    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            try:
                if finding.quote:
                    print(f"Quote: {finding.quote}")
            except AttributeError:
                pass
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Ruby
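To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.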
# project_id = "Your Google Cloud project ID"
# content = "The text to inspect"
# max_findings = "Maximum number of findings to report per request (0 = server maximum)"
require "google/cloud/dlp"
dlp = Google::Cloud::Dlp.dlp_service
inspect_config = {
# The types of information to match
info_types: [{ name: "PERSON_NAME" }, { name: "US_STATE" }],
# Only return results above a likelihood threshold (0 for all)
min_likelihood: :POSSIBLE,
# Limit the number of findings (0 for no limit)
limits: { max_findings_per_request: max_findings },
# Whether to include the matching string in the response
include_quote: true
}
# The item to inspect
item_to_inspect = { value: content }
# Run request
parent = "projects/#{project_id}/locations/global"
response = dlp.inspect_content parent: parent,
inspect_config: inspect_config,
item: item_to_inspect
# Print the results
if response.result.findings.empty?
puts "No findings"
else
response.result.findings.each do |finding|
puts "Quote: #{finding.quote}"
puts "Info type: #{finding.info_type.name}"
puts "Likelihood: #{finding.likelihood}"
end
end
REST
To learn more about using the DLP API with JSON, see the JSON quickstart.
JSON input:
POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:inspect?key={YOUR_API_KEY}
{
"item":{
"value":"My phone number is (415) 555-0890"
},
"inspectConfig":{
"includeQuote":true,
"minLikelihood":"POSSIBLE",
"infoTypes":[
{
"name":"PHONE_NUMBER"
}
]
}
}
JSON output:
{
"result":{
"findings":[
{
"quote":"(415) 555-0890",
"infoType":{
"name":"PHONE_NUMBER"
},
"likelihood":"VERY_LIKELY",
"location":{
"byteRange":{
"start":"19",
"end":"33"
},
"codepointRange":{
"start":"19",
"end":"33"
}
},
"createTime":"2018-11-13T19:29:15.412Z"
}
]
}
}
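The offset information in the response can be mapped back onto the input text. The sketch below reuses the JSON output shown above and slices the original string by `codepointRange`. The helper name `quotes_from_offsets` is hypothetical. It assumes offsets arrive as strings (as in the output above) and that a `start` of 0 may be omitted from the JSON. Python strings are indexed by code point, which is why `codepointRange` is used here. Using `byteRange` would require slicing the UTF-8 encoded bytes instead.

```python
import json
from typing import Dict, List, Tuple

# The relevant part of the JSON output shown above.
RESPONSE_JSON = """
{
  "result": {
    "findings": [
      {
        "quote": "(415) 555-0890",
        "infoType": {"name": "PHONE_NUMBER"},
        "likelihood": "VERY_LIKELY",
        "location": {
          "byteRange": {"start": "19", "end": "33"},
          "codepointRange": {"start": "19", "end": "33"}
        }
      }
    ]
  }
}
"""


def quotes_from_offsets(text: str, response: Dict) -> List[Tuple[str, str, str]]:
    """Return (infoType, likelihood, matched text) for each finding."""
    matches = []
    for finding in response.get("result", {}).get("findings", []):
        code_points = finding["location"]["codepointRange"]
        start = int(code_points.get("start", 0))  # may be absent when 0
        end = int(code_points["end"])
        matches.append(
            (finding["infoType"]["name"], finding["likelihood"], text[start:end])
        )
    return matches


text = "My phone number is (415) 555-0890"
response = json.loads(RESPONSE_JSON)
print(quotes_from_offsets(text, response))
# prints [('PHONE_NUMBER', 'VERY_LIKELY', '(415) 555-0890')]
```

Recovering the match from offsets is useful when `includeQuote` is left off so that the sensitive value itself is not echoed back in the response.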
Inspect a text file
The following code samples demonstrate how to inspect a text file for sensitive content.
C#
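To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.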
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Protobuf;
using static Google.Cloud.Dlp.V2.ByteContentItem.Types;
public class DlpInspectFile
{
public static IEnumerable<Finding> InspectFile(string projectId, string filePath, BytesType fileType)
{
// Instantiate a client.
var dlp = DlpServiceClient.Create();
// Get the bytes from the file.
ByteString fileBytes;
using (Stream f = new FileStream(filePath, FileMode.Open))
{
fileBytes = ByteString.FromStream(f);
}
// Construct a request.
var request = new InspectContentRequest
{
Parent = new LocationName(projectId, "global").ToString(),
Item = new ContentItem
{
ByteItem = new ByteContentItem()
{
Data = fileBytes,
Type = fileType
}
},
InspectConfig = new InspectConfig
{
// The info types of information to match
InfoTypes =
{
new InfoType { Name = "PHONE_NUMBER" },
new InfoType { Name = "EMAIL_ADDRESS" },
new InfoType { Name = "CREDIT_CARD_NUMBER" }
},
// The minimum likelihood before returning a match
MinLikelihood = Likelihood.Unspecified,
// Whether to include the matching string
IncludeQuote = true,
Limits = new InspectConfig.Types.FindingLimits
{
// The maximum number of findings to report per request
// (0 = server maximum)
MaxFindingsPerRequest = 0
}
}
};
// Execute request
var response = dlp.InspectContent(request);
// Inspect response
var findings = response.Result.Findings;
if (findings.Any())
{
Console.WriteLine("Findings:");
foreach (var finding in findings)
{
Console.WriteLine($"Quote: {finding.Quote}");
Console.WriteLine($"InfoType: {finding.InfoType}");
Console.WriteLine($"Likelihood: {finding.Likelihood}");
}
}
else
{
Console.WriteLine("No findings.");
}
return findings;
}
}
Go
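To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.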
import (
"context"
"fmt"
"io"
"os"
dlp "cloud.google.com/go/dlp/apiv2"
"cloud.google.com/go/dlp/apiv2/dlppb"
)
// inspectTextFile inspects a text file at a given filePath, and prints results.
func inspectTextFile(w io.Writer, projectID, filePath string) error {
// projectID := "my-project-id"
// filePath := "path/to/file.txt"
ctx := context.Background()
// Initialize client.
client, err := dlp.NewClient(ctx)
if err != nil {
return err
}
defer client.Close() // Closing the client safely cleans up background resources.
// Gather the resources for the request.
data, err := os.ReadFile(filePath)
if err != nil {
return err
}
// Create and send the request.
req := &dlppb.InspectContentRequest{
Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
Item: &dlppb.ContentItem{
DataItem: &dlppb.ContentItem_ByteItem{
ByteItem: &dlppb.ByteContentItem{
Type: dlppb.ByteContentItem_TEXT_UTF8,
Data: data,
},
},
},
InspectConfig: &dlppb.InspectConfig{
InfoTypes: []*dlppb.InfoType{
{Name: "PHONE_NUMBER"},
{Name: "EMAIL_ADDRESS"},
{Name: "CREDIT_CARD_NUMBER"},
},
IncludeQuote: true,
},
}
resp, err := client.InspectContent(ctx, req)
if err != nil {
return fmt.Errorf("InspectContent: %w", err)
}
// Process the results.
fmt.Fprintf(w, "Findings: %d\n", len(resp.Result.Findings))
for _, f := range resp.Result.Findings {
fmt.Fprintf(w, "\tQuote: %s\n", f.Quote)
fmt.Fprintf(w, "\tInfo type: %s\n", f.InfoType.Name)
fmt.Fprintf(w, "\tLikelihood: %s\n", f.Likelihood)
}
return nil
}
Java
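To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.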
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;
import com.google.protobuf.ByteString;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class InspectTextFile {
public static void main(String[] args) throws Exception {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String filePath = "path/to/file.txt";
inspectTextFile(projectId, filePath);
}
// Inspects the specified text file.
public static void inspectTextFile(String projectId, String filePath) throws IOException {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (DlpServiceClient dlp = DlpServiceClient.create()) {
// Specify the type and content to be inspected.
ByteString fileBytes = ByteString.readFrom(new FileInputStream(filePath));
ByteContentItem byteItem =
ByteContentItem.newBuilder().setType(BytesType.TEXT_UTF8).setData(fileBytes).build();
ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();
// Specify the type of info the inspection will look for.
List<InfoType> infoTypes = new ArrayList<>();
// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
for (String typeName : new String[] {"PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"}) {
infoTypes.add(InfoType.newBuilder().setName(typeName).build());
}
// Construct the configuration for the Inspect request.
InspectConfig config =
InspectConfig.newBuilder().addAllInfoTypes(infoTypes).setIncludeQuote(true).build();
// Construct the Inspect request to be sent by the client.
InspectContentRequest request =
InspectContentRequest.newBuilder()
.setParent(LocationName.of(projectId, "global").toString())
.setItem(item)
.setInspectConfig(config)
.build();
// Use the client to send the API request.
InspectContentResponse response = dlp.inspectContent(request);
// Parse the response and process results
System.out.println("Findings: " + response.getResult().getFindingsCount());
for (Finding f : response.getResult().getFindingsList()) {
System.out.println("\tQuote: " + f.getQuote());
System.out.println("\tInfo type: " + f.getInfoType().getName());
System.out.println("\tLikelihood: " + f.getLikelihood());
}
}
}
}
Node.js
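To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.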
// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');
// Import other required libraries
const fs = require('fs');
const mime = require('mime');
// Instantiates a client
const dlp = new DLP.DlpServiceClient();
// The project ID to run the API call under
// const projectId = 'my-project';
// The path to a local file to inspect. Can be a text, JPG, or PNG file.
// const filepath = 'path/to/image.png';
// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';
// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;
// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];
// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
// { infoType: { name: 'REGEX_TYPE' }, regex: {pattern: '\\(\\d{3}\\) \\d{3}-\\d{4}'}}];
// Whether to include the matching string
// const includeQuote = true;
async function inspectFile() {
// Construct file data to inspect
const fileTypeConstant =
['image/jpeg', 'image/bmp', 'image/png', 'image/svg+xml'].indexOf(
mime.getType(filepath)
) + 1;
const fileBytes = Buffer.from(fs.readFileSync(filepath)).toString('base64');
const item = {
byteItem: {
type: fileTypeConstant,
data: fileBytes,
},
};
// Construct request
const request = {
parent: `projects/${projectId}/locations/global`,
inspectConfig: {
infoTypes: infoTypes,
customInfoTypes: customInfoTypes,
minLikelihood: minLikelihood,
includeQuote: includeQuote,
limits: {
maxFindingsPerRequest: maxFindings,
},
},
item: item,
};
// Run request
const [response] = await dlp.inspectContent(request);
const findings = response.result.findings;
if (findings.length > 0) {
console.log('Findings:');
findings.forEach(finding => {
if (includeQuote) {
console.log(`\tQuote: ${finding.quote}`);
}
console.log(`\tInfo type: ${finding.infoType.name}`);
console.log(`\tLikelihood: ${finding.likelihood}`);
});
} else {
console.log('No findings.');
}
}
inspectFile();
PHP
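To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.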
use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\ByteContentItem;
use Google\Cloud\Dlp\V2\ByteContentItem\BytesType;
use Google\Cloud\Dlp\V2\Likelihood;
/**
* @param string $projectId
* @param string $filepath
*/
function inspect_text_file(string $projectId, string $filepath): void
{
// Instantiate a client.
$dlp = new DlpServiceClient();
// Get the bytes of the file
$fileBytes = (new ByteContentItem())
->setType(BytesType::TEXT_UTF8)
->setData(file_get_contents($filepath));
// Construct request
$parent = "projects/$projectId/locations/global";
$item = (new ContentItem())
->setByteItem($fileBytes);
$inspectConfig = (new InspectConfig())
// The infoTypes of information to match
->setInfoTypes([
(new InfoType())->setName('PHONE_NUMBER'),
(new InfoType())->setName('EMAIL_ADDRESS'),
(new InfoType())->setName('CREDIT_CARD_NUMBER')
])
// Whether to include the matching string
->setIncludeQuote(true);
// Run request
$response = $dlp->inspectContent([
'parent' => $parent,
'inspectConfig' => $inspectConfig,
'item' => $item
]);
// Print the results
$findings = $response->getResult()->getFindings();
if (count($findings) == 0) {
print('No findings.' . PHP_EOL);
} else {
print('Findings:' . PHP_EOL);
foreach ($findings as $finding) {
print(' Quote: ' . $finding->getQuote() . PHP_EOL);
print(' Info type: ' . $finding->getInfoType()->getName() . PHP_EOL);
$likelihoodString = Likelihood::name($finding->getLikelihood());
print(' Likelihood: ' . $likelihoodString . PHP_EOL);
}
}
}
Python
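To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.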
import mimetypes
from typing import List, Optional
import google.cloud.dlp
def inspect_file(
    project: str,
    filename: str,
    info_types: List[str],
    min_likelihood: Optional[str] = None,
    custom_dictionaries: Optional[List[str]] = None,
    custom_regexes: Optional[List[str]] = None,
    max_findings: Optional[int] = None,
    include_quote: bool = True,
    mime_type: Optional[str] = None,
) -> None:
    """Uses the Data Loss Prevention API to analyze a file for protected data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        filename: The path to the file to inspect.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        custom_dictionaries: A list of comma-separated word lists; each one
            becomes a custom dictionary info type.
        custom_regexes: A list of regex patterns; each one becomes a custom
            regex info type.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        include_quote: Boolean for whether to display a quote of the detected
            information in the results.
        mime_type: The MIME type of the file. If not specified, the type is
            inferred via the Python standard library's mimetypes module.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()
    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ["FIRST_NAME", "LAST_NAME", "EMAIL_ADDRESS"]
    info_types = [{"name": info_type} for info_type in info_types]
    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [
        {
            "info_type": {"name": f"CUSTOM_DICTIONARY_{i}"},
            "dictionary": {"word_list": {"words": custom_dict.split(",")}},
        }
        for i, custom_dict in enumerate(custom_dictionaries)
    ]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [
        {
            "info_type": {"name": f"CUSTOM_REGEX_{i}"},
            "regex": {"pattern": custom_regex},
        }
        for i, custom_regex in enumerate(custom_regexes)
    ]
    custom_info_types = dictionaries + regexes
    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        "info_types": info_types,
        "custom_info_types": custom_info_types,
        "min_likelihood": min_likelihood,
        "include_quote": include_quote,
        "limits": {"max_findings_per_request": max_findings},
    }
    # If mime_type is not specified, guess it from the filename.
    if mime_type is None:
        mime_guess = mimetypes.MimeTypes().guess_type(filename)
        mime_type = mime_guess[0]
    # Select the content type index from the list of supported types.
    supported_content_types = {
        None: 0,  # "Unspecified"
        "image/jpeg": 1,
        "image/bmp": 2,
        "image/png": 3,
        "image/svg+xml": 4,  # the MIME type that mimetypes reports for .svg
        "text/plain": 5,
    }
    content_type_index = supported_content_types.get(mime_type, 0)
    # Construct the item, containing the file's byte data.
    with open(filename, mode="rb") as f:
        item = {"byte_item": {"type_": content_type_index, "data": f.read()}}
    # Convert the project id into a full resource id.
    parent = f"projects/{project}"
    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )
    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            try:
                print(f"Quote: {finding.quote}")
            except AttributeError:
                pass
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")
Ruby
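To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.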
# project_id = "Your Google Cloud project ID"
# filename = "The file path to the file to inspect"
# max_findings = "Maximum number of findings to report per request (0 = server maximum)"
require "google/cloud/dlp"
dlp = Google::Cloud::Dlp.dlp_service
inspect_config = {
# The types of information to match
info_types: [{ name: "PERSON_NAME" }, { name: "PHONE_NUMBER" }],
# Only return results above a likelihood threshold (0 for all)
min_likelihood: :POSSIBLE,
# Limit the number of findings (0 for no limit)
limits: { max_findings_per_request: max_findings },
# Whether to include the matching string in the response
include_quote: true
}
# The item to inspect
# Read the file in binary mode; File.binread closes the file handle when done.
data = File.binread filename
item_to_inspect = { byte_item: { type: :BYTES_TYPE_UNSPECIFIED, data: data } }
# Run request
parent = "projects/#{project_id}/locations/global"
response = dlp.inspect_content parent: parent,
inspect_config: inspect_config,
item: item_to_inspect
# Print the results
if response.result.findings.empty?
puts "No findings"
else
response.result.findings.each do |finding|
puts "Quote: #{finding.quote}"
puts "Info type: #{finding.info_type.name}"
puts "Likelihood: #{finding.likelihood}"
end
end
What's next
- Work through the Redacting Sensitive Data with Sensitive Data Protection codelab.
- Learn how to inspect images for sensitive data.