민감한 정보의 텍스트 검사

Cloud DLP(Data Loss Prevention)는 텍스트 내의 민감한 정보를 감지 및 분류할 수 있습니다. 텍스트 입력이 제공되면 Cloud DLP API는 해당 텍스트에서 발견된 모든 infoType, 가능성 값, 오프셋 정보의 세부정보를 반환합니다.

권장사항

스캔 식별 및 우선순위 지정

리소스를 식별하고 스캔 우선순위가 가장 높은 항목을 지정하는 것이 중요합니다. 시작하기 전에 분류가 필요한 대량의 데이터 백로그가 있을 수 있으며 이 경우 즉시 스캔할 수 없습니다. 처음에는 자주 액세스하는 데이터, 폭넓게 액세스하는 데이터, 알 수 없는 데이터 등 위험도가 가장 높은 데이터를 선택합니다.

지연 시간 감소

지연 시간은 스캔할 데이터 양, 스캔하는 스토리지 저장소, 사용 설정된 infoType의 유형과 수의 영향을 받습니다.

작업 지연 시간을 줄이는 데 도움이 되는 방법은 다음과 같습니다.

  • 샘플링을 사용 설정합니다.
  • 모두 필요하지 않은 경우 모든 infoType을 사용 설정하지 않습니다. 특정 시나리오에서는 유용하지만 요청에 PERSON_NAME, FEMALE_NAME, MALE_NAME, FIRST_NAME, LAST_NAME, DATE, DATE_OF_BIRTH, TIME, LOCATION, STREET_ADDRESS, MEDICAL_TERM, ORGANIZATION_NAME, ALL_BASIC 같은 infoType이 포함되어 있으면 포함되지 않은 요청보다 훨씬 느리게 실행될 수 있습니다.
  • 가능하면 행과 열이 있는 로 검사할 데이터를 구성하여 네트워크 왕복을 줄이세요.

첫 번째 스캔의 범위 제한

최상의 결과를 얻으려면 모든 데이터를 스캔하는 대신 첫 번째 작업의 스캔 범위를 제한합니다. 몇 가지 요청으로 시작합니다. 사용 설정할 감지기와 거짓양성을 줄이기 위해 필요한 제외 규칙을 세부적으로 조정하면 결과가 더 의미가 있습니다. 거짓양성이나 쓸모 없는 발견 항목으로 인해 위험을 평가하기 어려울 수 있으므로 모두 필요한 경우가 아니라면 모든 infoType을 사용 설정하지는 마세요. 특정 시나리오에서는 유용하지만 DATE, TIME, DOMAIN_NAME, URL 같은 일부 infoType은 광범위한 결과를 탐지하므로 사용 설정하는 것이 유용하지 않을 수 있습니다.

온프레미스, 하이브리드, 멀티 클라우드 스캔

스캔할 데이터가 온프레미스 또는 Google Cloud 외부에 있는 경우 API 메서드를 사용합니다. content.inspectcontent.deidentify 콘텐츠를 검사하여 결과를 식별하고 로컬 저장소 외부에 콘텐츠를 보존하지 않고 콘텐츠를 가명처리합니다.

텍스트 문자열 검사

다음은 Cloud DLP API를 사용하여 텍스트 문자열에서 민감한 정보를 검사하는 방법을 보여주는 샘플 JSON 및 여러 언어로 된 코드입니다.

프로토콜

JSON과 Cloud DLP API 사용법에 대한 자세한 내용은 JSON 빠른 시작을 참조하세요.

JSON 입력:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:inspect?key={YOUR_API_KEY}

{
  "item":{
    "value":"My phone number is (415) 555-0890"
  },
  "inspectConfig":{
    "includeQuote":true,
    "minLikelihood":"POSSIBLE",
    "infoTypes":{
      "name":"PHONE_NUMBER"
    }
  }
}

JSON 출력:

{
  "result":{
    "findings":[
      {
        "quote":"(415) 555-0890",
        "infoType":{
          "name":"PHONE_NUMBER"
        },
        "likelihood":"VERY_LIKELY",
        "location":{
          "byteRange":{
            "start":"19",
            "end":"33"
          },
          "codepointRange":{
            "start":"19",
            "end":"33"
          }
        },
        "createTime":"2018-11-13T19:29:15.412Z"
      }
    ]
  }
}

자바


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class InspectString {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "My name is Gary and my email is gary@example.com";
    inspectString(projectId, textToInspect);
  }

  // Inspects the provided text.
  public static void inspectString(String projectId, String textToInspect) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Specify the type of info the inspection will look for.
      List<InfoType> infoTypes = new ArrayList<>();
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      for (String typeName : new String[] {"PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"}) {
        infoTypes.add(InfoType.newBuilder().setName(typeName).build());
      }

      // Construct the configuration for the Inspect request.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addAllInfoTypes(infoTypes)
              .setIncludeQuote(true)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The string to inspect
// const string = 'My name is Gary and my email is gary@example.com';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { infoType: { name: 'REGEX_TYPE' }, regex: '\\(\\d{3}\\) \\d{3}-\\d{4}'}];

// Whether to include the matching string
// const includeQuote = true;

// Construct item to inspect
const item = {value: string};

// Construct request
const request = {
  parent: `projects/${callingProjectId}/locations/global`,
  inspectConfig: {
    infoTypes: infoTypes,
    customInfoTypes: customInfoTypes,
    minLikelihood: minLikelihood,
    includeQuote: includeQuote,
    limits: {
      maxFindingsPerRequest: maxFindings,
    },
  },
  item: item,
};

// Run request
try {
  const [response] = await dlp.inspectContent(request);
  const findings = response.result.findings;
  if (findings.length > 0) {
    console.log('Findings:');
    findings.forEach(finding => {
      if (includeQuote) {
        console.log(`\tQuote: ${finding.quote}`);
      }
      console.log(`\tInfo type: ${finding.infoType.name}`);
      console.log(`\tLikelihood: ${finding.likelihood}`);
    });
  } else {
    console.log('No findings.');
  }
} catch (err) {
  console.log(`Error in inspectString: ${err.message || err}`);
}

Python

def inspect_string(
    project,
    content_string,
    info_types,
    custom_dictionaries=None,
    custom_regexes=None,
    min_likelihood=None,
    max_findings=None,
    include_quote=True,
):
    """Uses the Data Loss Prevention API to analyze strings for protected data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        include_quote: Boolean for whether to display a quote of the detected
            information in the results.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    info_types = [{"name": info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [
        {
            "info_type": {"name": "CUSTOM_DICTIONARY_{}".format(i)},
            "dictionary": {"word_list": {"words": custom_dict.split(",")}},
        }
        for i, custom_dict in enumerate(custom_dictionaries)
    ]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [
        {
            "info_type": {"name": "CUSTOM_REGEX_{}".format(i)},
            "regex": {"pattern": custom_regex},
        }
        for i, custom_regex in enumerate(custom_regexes)
    ]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        "info_types": info_types,
        "custom_info_types": custom_info_types,
        "min_likelihood": min_likelihood,
        "include_quote": include_quote,
        "limits": {"max_findings_per_request": max_findings},
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Call the API.
    response = dlp.inspect_content(parent, inspect_config, item)

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            try:
                if finding.quote:
                    print("Quote: {}".format(finding.quote))
            except AttributeError:
                pass
            print("Info type: {}".format(finding.info_type.name))
            print("Likelihood: {}".format(finding.likelihood))
    else:
        print("No findings.")

Go

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	dlppb "google.golang.org/genproto/googleapis/privacy/dlp/v2"
)

// inspectString inspects the a given string, and prints results.
func inspectString(w io.Writer, projectID, textToInspect string) error {
	// projectID := "my-project-id"
	// textToInspect := "My name is Gary and my email is gary@example.com"
	ctx := context.Background()

	// Initialize client.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close() // Closing the client safely cleans up background resources.

	// Create and send the request.
	req := &dlppb.InspectContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: textToInspect,
			},
		},
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: []*dlppb.InfoType{
				{Name: "PHONE_NUMBER"},
				{Name: "EMAIL_ADDRESS"},
				{Name: "CREDIT_CARD_NUMBER"},
			},
			IncludeQuote: true,
		},
	}
	resp, err := client.InspectContent(ctx, req)
	if err != nil {
		return err
	}

	// Process the results.
	result := resp.Result
	fmt.Fprintf(w, "Findings: %d\n", len(result.Findings))
	for _, f := range result.Findings {
		fmt.Fprintf(w, "\tQoute: %s\n", f.Quote)
		fmt.Fprintf(w, "\tInfo type: %s\n", f.InfoType.Name)
		fmt.Fprintf(w, "\tLikelihood: %s\n", f.Likelihood)
	}
	return nil
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\Likelihood;

/** Uncomment and populate these variables in your code */
// $projectId = 'YOUR_PROJECT_ID';
// $textToInspect = 'My name is Gary and my email is gary@example.com';

// Instantiate a client.
$dlp = new DlpServiceClient();

// Construct request
$parent = $dlp->projectName($projectId);
$item = (new ContentItem())
    ->setValue($textToInspect);
$inspectConfig = (new InspectConfig())
    // The infoTypes of information to match
    ->setInfoTypes([
        (new InfoType())->setName('PHONE_NUMBER'),
        (new InfoType())->setName('EMAIL_ADDRESS'),
        (new InfoType())->setName('CREDIT_CARD_NUMBER')
    ])
    // Whether to include the matching string
    ->setIncludeQuote(true);

// Run request
$response = $dlp->inspectContent($parent, [
    'inspectConfig' => $inspectConfig,
    'item' => $item
]);

// Print the results
$findings = $response->getResult()->getFindings();
if (count($findings) == 0) {
    print('No findings.' . PHP_EOL);
} else {
    print('Findings:' . PHP_EOL);
    foreach ($findings as $finding) {
        print('  Quote: ' . $finding->getQuote() . PHP_EOL);
        print('  Info type: ' . $finding->getInfoType()->getName() . PHP_EOL);
        $likelihoodString = Likelihood::name($finding->getLikelihood());
        print('  Likelihood: ' . $likelihoodString . PHP_EOL);
    }
}

Ruby

# project_id   = "Your Google Cloud project ID"
# content      = "The text to inspect"
# max_findings = "Maximum number of findings to report per request (0 = server maximum)"

require "google/cloud/dlp"

dlp = Google::Cloud::Dlp.dlp_service
inspect_config = {
  # The types of information to match
  info_types:     [{ name: "PERSON_NAME" }, { name: "US_STATE" }],

  # Only return results above a likelihood threshold (0 for all)
  min_likelihood: :POSSIBLE,

  # Limit the number of findings (0 for no limit)
  limits:         { max_findings_per_request: max_findings },

  # Whether to include the matching string in the response
  include_quote:  true
}

# The item to inspect
item_to_inspect = { value: content }

# Run request
parent = "projects/#{project_id}/locations/global"
response = dlp.inspect_content parent:         parent,
                               inspect_config: inspect_config,
                               item:           item_to_inspect

# Print the results
if response.result.findings.empty?
  puts "No findings"
else
  response.result.findings.each do |finding|
    puts "Quote:      #{finding.quote}"
    puts "Info type:  #{finding.info_type.name}"
    puts "Likelihood: #{finding.likelihood}"
  end
end

C#


using System;
using System.Collections.Generic;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using static Google.Cloud.Dlp.V2.InspectConfig.Types;

public class InspectString
{
    public static InspectContentResponse Inspect(
        string projectId,
        string dataValue,
        string minLikelihood,
        int maxFindings,
        bool includeQuote,
        IEnumerable<InfoType> infoTypes,
        IEnumerable<CustomInfoType> customInfoTypes)
    {
        var inspectConfig = new InspectConfig
        {
            MinLikelihood = (Likelihood)Enum.Parse(typeof(Likelihood), minLikelihood, true),
            Limits = new FindingLimits
            {
                MaxFindingsPerRequest = maxFindings
            },
            IncludeQuote = includeQuote,
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoTypes }
        };
        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = new ContentItem
            {
                Value = dataValue
            },
            InspectConfig = inspectConfig
        };

        var dlp = DlpServiceClient.Create();
        var response = dlp.InspectContent(request);

        PrintResponse(includeQuote, response);

        return response;
    }

    private static void PrintResponse(bool includeQuote, InspectContentResponse response)
    {
        var findings = response.Result.Findings;
        if (findings.Any())
        {
            Console.WriteLine("Findings:");
            foreach (var finding in findings)
            {
                if (includeQuote)
                {
                    Console.WriteLine($"  Quote: {finding.Quote}");
                }
                Console.WriteLine($"  InfoType: {finding.InfoType}");
                Console.WriteLine($"  Likelihood: {finding.Likelihood}");
            }
        }
        else
        {
            Console.WriteLine("No findings.");
        }
    }
}

텍스트 파일 검사

아래 코드 샘플은 텍스트 파일에서 민감한 콘텐츠를 검사하는 방법을 보여줍니다.

자바


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;
import com.google.protobuf.ByteString;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class InspectTextFile {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String filePath = "path/to/file.txt";
    inspectTextFile(projectId, filePath);
  }

  // Inspects the specified text file.
  public static void inspectTextFile(String projectId, String filePath) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteString fileBytes = ByteString.readFrom(new FileInputStream(filePath));
      ByteContentItem byteItem =
          ByteContentItem.newBuilder().setType(BytesType.TEXT_UTF8).setData(fileBytes).build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Specify the type of info the inspection will look for.
      List<InfoType> infoTypes = new ArrayList<>();
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      for (String typeName : new String[]{"PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"}) {
        infoTypes.add(InfoType.newBuilder().setName(typeName).build());
      }

      // Construct the configuration for the Inspect request.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addAllInfoTypes(infoTypes)
              .setIncludeQuote(true)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Import other required libraries
const fs = require('fs');
const mime = require('mime');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The path to a local file to inspect. Can be a text, JPG, or PNG file.
// const filepath = 'path/to/image.png';

// The minimum likelihood required before returning a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The maximum number of findings to report per request (0 = server maximum)
// const maxFindings = 0;

// The infoTypes of information to match
// const infoTypes = [{ name: 'PHONE_NUMBER' }, { name: 'EMAIL_ADDRESS' }, { name: 'CREDIT_CARD_NUMBER' }];

// The customInfoTypes of information to match
// const customInfoTypes = [{ infoType: { name: 'DICT_TYPE' }, dictionary: { wordList: { words: ['foo', 'bar', 'baz']}}},
//   { infoType: { name: 'REGEX_TYPE' }, regex: '\\(\\d{3}\\) \\d{3}-\\d{4}'}];

// Whether to include the matching string
// const includeQuote = true;

// Construct file data to inspect
const fileTypeConstant =
  ['image/jpeg', 'image/bmp', 'image/png', 'image/svg'].indexOf(
    mime.getType(filepath)
  ) + 1;
const fileBytes = Buffer.from(fs.readFileSync(filepath)).toString('base64');
const item = {
  byteItem: {
    type: fileTypeConstant,
    data: fileBytes,
  },
};

// Construct request
const request = {
  parent: `projects/${callingProjectId}/locations/global`,
  inspectConfig: {
    infoTypes: infoTypes,
    customInfoTypes: customInfoTypes,
    minLikelihood: minLikelihood,
    includeQuote: includeQuote,
    limits: {
      maxFindingsPerRequest: maxFindings,
    },
  },
  item: item,
};

// Run request
try {
  const [response] = await dlp.inspectContent(request);
  const findings = response.result.findings;
  if (findings.length > 0) {
    console.log('Findings:');
    findings.forEach(finding => {
      if (includeQuote) {
        console.log(`\tQuote: ${finding.quote}`);
      }
      console.log(`\tInfo type: ${finding.infoType.name}`);
      console.log(`\tLikelihood: ${finding.likelihood}`);
    });
  } else {
    console.log('No findings.');
  }
} catch (err) {
  console.log(`Error in inspectFile: ${err.message || err}`);
}

Python



def inspect_file(
    project,
    filename,
    info_types,
    min_likelihood=None,
    custom_dictionaries=None,
    custom_regexes=None,
    max_findings=None,
    include_quote=True,
    mime_type=None,
):
    """Uses the Data Loss Prevention API to analyze a file for protected data.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        filename: The path to the file to inspect.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        max_findings: The maximum number of findings to report; 0 = no maximum.
        include_quote: Boolean for whether to display a quote of the detected
            information in the results.
        mime_type: The MIME type of the file. If not specified, the type is
            inferred via the Python standard library's mimetypes module.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    import mimetypes

    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    if not info_types:
        info_types = ["FIRST_NAME", "LAST_NAME", "EMAIL_ADDRESS"]
    info_types = [{"name": info_type} for info_type in info_types]

    # Prepare custom_info_types by parsing the dictionary word lists and
    # regex patterns.
    if custom_dictionaries is None:
        custom_dictionaries = []
    dictionaries = [
        {
            "info_type": {"name": "CUSTOM_DICTIONARY_{}".format(i)},
            "dictionary": {"word_list": {"words": custom_dict.split(",")}},
        }
        for i, custom_dict in enumerate(custom_dictionaries)
    ]
    if custom_regexes is None:
        custom_regexes = []
    regexes = [
        {
            "info_type": {"name": "CUSTOM_REGEX_{}".format(i)},
            "regex": {"pattern": custom_regex},
        }
        for i, custom_regex in enumerate(custom_regexes)
    ]
    custom_info_types = dictionaries + regexes

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        "info_types": info_types,
        "custom_info_types": custom_info_types,
        "min_likelihood": min_likelihood,
        "limits": {"max_findings_per_request": max_findings},
    }

    # If mime_type is not specified, guess it from the filename.
    if mime_type is None:
        mime_guess = mimetypes.MimeTypes().guess_type(filename)
        mime_type = mime_guess[0]

    # Select the content type index from the list of supported types.
    supported_content_types = {
        None: 0,  # "Unspecified"
        "image/jpeg": 1,
        "image/bmp": 2,
        "image/png": 3,
        "image/svg": 4,
        "text/plain": 5,
    }
    content_type_index = supported_content_types.get(mime_type, 0)

    # Construct the item, containing the file's byte data.
    with open(filename, mode="rb") as f:
        item = {"byte_item": {"type": content_type_index, "data": f.read()}}
    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Call the API.
    response = dlp.inspect_content(parent, inspect_config, item)

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            try:
                print("Quote: {}".format(finding.quote))
            except AttributeError:
                pass
            print("Info type: {}".format(finding.info_type.name))
            print("Likelihood: {}".format(finding.likelihood))
    else:
        print("No findings.")

Go

import (
	"context"
	"fmt"
	"io"
	"io/ioutil"

	dlp "cloud.google.com/go/dlp/apiv2"
	dlppb "google.golang.org/genproto/googleapis/privacy/dlp/v2"
)

// inspectTextFile inspects a text file at a given filePath, and prints results.
func inspectTextFile(w io.Writer, projectID, filePath string) error {
	// projectID := "my-project-id"
	// filePath := "path/to/image.png"
	ctx := context.Background()

	// Initialize client.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close() // Closing the client safely cleans up background resources.

	// Gather the resources for the request.
	data, err := ioutil.ReadFile(filePath)
	if err != nil {
		return err
	}

	// Create and send the request.
	req := &dlppb.InspectContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_ByteItem{
				ByteItem: &dlppb.ByteContentItem{
					Type: dlppb.ByteContentItem_TEXT_UTF8,
					Data: data,
				},
			},
		},
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: []*dlppb.InfoType{
				{Name: "PHONE_NUMBER"},
				{Name: "EMAIL_ADDRESS"},
				{Name: "CREDIT_CARD_NUMBER"},
			},
			IncludeQuote: true,
		},
	}
	resp, err := client.InspectContent(ctx, req)
	if err != nil {
		return fmt.Errorf("InspectContent: %v", err)
	}

	// Process the results.
	fmt.Fprintf(w, "Findings: %d\n", len(resp.Result.Findings))
	for _, f := range resp.Result.Findings {
		fmt.Fprintf(w, "\tQoute: %s\n", f.Quote)
		fmt.Fprintf(w, "\tInfo type: %s\n", f.InfoType.Name)
		fmt.Fprintf(w, "\tLikelihood: %s\n", f.Likelihood)
	}
	return nil
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\ByteContentItem;
use Google\Cloud\Dlp\V2\ByteContentItem\BytesType;
use Google\Cloud\Dlp\V2\Likelihood;

/** Uncomment and populate these variables in your code */
// $projectId = 'YOUR_PROJECT_ID';
// $filepath = 'path/to/image.png';

// Instantiate a client.
$dlp = new DlpServiceClient();

// Get the bytes of the file
$fileBytes = (new ByteContentItem())
    ->setType(BytesType::TEXT_UTF8)
    ->setData(file_get_contents($filepath));

// Construct request
$parent = $dlp->projectName($projectId);
$item = (new ContentItem())
    ->setByteItem($fileBytes);
$inspectConfig = (new InspectConfig())
    // The infoTypes of information to match
    ->setInfoTypes([
        (new InfoType())->setName('PHONE_NUMBER'),
        (new InfoType())->setName('EMAIL_ADDRESS'),
        (new InfoType())->setName('CREDIT_CARD_NUMBER')
    ])
    // Whether to include the matching string
    ->setIncludeQuote(true);

// Run request
$response = $dlp->inspectContent($parent, [
    'inspectConfig' => $inspectConfig,
    'item' => $item
]);

// Print the results
$findings = $response->getResult()->getFindings();
if (count($findings) == 0) {
    print('No findings.' . PHP_EOL);
} else {
    print('Findings:' . PHP_EOL);
    foreach ($findings as $finding) {
        print('  Quote: ' . $finding->getQuote() . PHP_EOL);
        print('  Info type: ' . $finding->getInfoType()->getName() . PHP_EOL);
        $likelihoodString = Likelihood::name($finding->getLikelihood());
        print('  Likelihood: ' . $likelihoodString . PHP_EOL);
    }
}

Ruby

# project_id   = "Your Google Cloud project ID"
# filename     = "The file path to the file to inspect"
# max_findings = "Maximum number of findings to report per request (0 = server maximum)"

require "google/cloud/dlp"

dlp = Google::Cloud::Dlp.dlp_service
inspect_config = {
  # The types of information to match
  info_types:     [{ name: "PERSON_NAME" }, { name: "PHONE_NUMBER" }],

  # Only return results above a likelihood threshold (0 for all)
  min_likelihood: :POSSIBLE,

  # Limit the number of findings (0 for no limit)
  limits:         { max_findings_per_request: max_findings },

  # Whether to include the matching string in the response
  include_quote:  true
}

# The item to inspect
file = File.open filename, "rb"
item_to_inspect = { byte_item: { type: :BYTES_TYPE_UNSPECIFIED, data: file.read } }

# Run request
parent = "projects/#{project_id}/locations/global"
response = dlp.inspect_content parent:         parent,
                               inspect_config: inspect_config,
                               item:           item_to_inspect

# Print the results
if response.result.findings.empty?
  puts "No findings"
else
  response.result.findings.each do |finding|
    puts "Quote:      #{finding.quote}"
    puts "Info type:  #{finding.info_type.name}"
    puts "Likelihood: #{finding.likelihood}"
  end
end

C#


using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Protobuf;
using static Google.Cloud.Dlp.V2.ByteContentItem.Types;

public class DlpInspectFile
{
    public static IEnumerable<Finding> InspectFile(string projectId, string filePath, BytesType fileType)
    {
        // Instantiate a client.
        var dlp = DlpServiceClient.Create();

        // Get the bytes from the file.
        ByteString fileBytes;
        using (Stream f = new FileStream(filePath, FileMode.Open))
        {
            fileBytes = ByteString.FromStream(f);
        }

        // Construct a request.
        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = new ContentItem
            {
                ByteItem = new ByteContentItem()
                {
                    Data = fileBytes,
                    Type = fileType
                }
            },
            InspectConfig = new InspectConfig
            {
                // The info types of information to match
                InfoTypes =
                {
                    new InfoType { Name = "PHONE_NUMBER" },
                    new InfoType { Name = "EMAIL_ADDRESS" },
                    new InfoType { Name = "CREDIT_CARD_NUMBER" }
                },
                // The minimum likelihood before returning a match
                MinLikelihood = Likelihood.Unspecified,
                // Whether to include the matching string
                IncludeQuote = true,
                Limits = new InspectConfig.Types.FindingLimits
                {
                    // The maximum number of findings to report per request
                    // (0 = server maximum)
                    MaxFindingsPerRequest = 0
                }
            }
        };

        // Execute request
        var response = dlp.InspectContent(request);

        // Inspect response
        var findings = response.Result.Findings;
        if (findings.Any())
        {
            Console.WriteLine("Findings:");
            foreach (var finding in findings)
            {
                Console.WriteLine($"Quote: {finding.Quote}");
                Console.WriteLine($"InfoType: {finding.InfoType}");
                Console.WriteLine($"Likelihood: {finding.Likelihood}");
            }
        }
        else
        {
            Console.WriteLine("No findings.");
        }
        return findings;
    }
}