Create a regular custom dictionary detector

Custom dictionaries provide a simple but powerful way to match a list of words or phrases. You can use a custom dictionary as a detector or as an exception list for built-in detectors. You can also use custom dictionaries to augment built-in infoType detectors so that they match additional findings.

This section describes how to create a regular custom dictionary detector from a list of words.

Anatomy of a custom dictionary infoType detector

As summarized in the API overview, to create a custom dictionary infoType detector, you define a CustomInfoType object that contains the following:

  • The name that you want to give the custom infoType detector, within an InfoType object.
  • An optional Likelihood value. If you omit this field, matches to the dictionary items return a default likelihood of VERY_LIKELY.
  • Optional DetectionRule objects, or hotword rules. These rules adjust the likelihood of findings within a given proximity of specified hotwords. For more information about hotword rules, see the section Customizing match likelihood.
  • An optional SensitivityScore value. If you omit this field, matches to the dictionary items return a default sensitivity level of HIGH.

    Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.

  • A Dictionary, either as a WordList containing a list of words to scan for, or as a CloudStoragePath to a single text file containing a newline-delimited list of words to scan for.
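The components listed above can be sketched as a plain Python dictionary that mirrors the REST field names. This is a minimal sketch, not an official helper: the function name build_dictionary_detector, the bucket path, and the chosen likelihood and sensitivity values are hypothetical, picked only for illustration.

```python
import json


def build_dictionary_detector(name, gcs_path, likelihood=None,
                              sensitivity_score=None, detection_rules=None):
    """Assemble a CustomInfoType-shaped dict; optional parts are added only when given."""
    detector = {
        "infoType": {"name": name},
        # Dictionary stored as a newline-delimited text file in Cloud Storage.
        "dictionary": {"cloudStoragePath": {"path": gcs_path}},
    }
    if likelihood is not None:
        detector["likelihood"] = likelihood
    if detection_rules:
        detector["detectionRules"] = list(detection_rules)
    if sensitivity_score is not None:
        detector["sensitivityScore"] = {"score": sensitivity_score}
    return detector


# Hypothetical values, for illustration only.
detector = build_dictionary_detector(
    name="CUSTOM_ROOM_ID",
    gcs_path="gs://example-bucket/rooms.txt",
    likelihood="POSSIBLE",
    sensitivity_score="SENSITIVITY_MODERATE",
)
print(json.dumps({"customInfoTypes": [detector]}, indent=2))
```

Leaving out likelihood and sensitivity_score produces a detector that relies on the service defaults described in the list above.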

As a JSON object, a custom dictionary infoType detector that includes all optional components looks like the following. This JSON includes a path to a dictionary text file stored in Cloud Storage. For an inline word list, see the Examples section later in this topic.

          "sensitivityScore": { "score": "SENSITIVITY_SCORE" },
          "dictionary": { "cloudStoragePath": { "path": "gs://PATH_TO_TXT_FILE" } }

Dictionary matching specifics

The following guidance describes how Sensitive Data Protection matches dictionary words and phrases. These points apply to both regular and large custom dictionaries:

  • Dictionary words are case-insensitive. If your dictionary includes Abby, it matches abby, ABBY, Abby, and so on.
  • All characters, in dictionaries or in content to be scanned, other than letters, digits, and other alphabetic characters contained in the Unicode Basic Multilingual Plane are treated as whitespace when scanning for matches. If your dictionary scans for Abby Abernathy, it matches "abby abernathy", "Abby, Abernathy", "Abby (ABERNATHY)", and so on.
  • The characters surrounding any match must be of a different type (letters or digits) than the adjacent characters within the word. If your dictionary scans for Abi, it matches the first three characters of Abi904, but not of Abigail.
  • Dictionary words that contain characters in the Supplementary Multilingual Plane of the Unicode standard can yield unexpected findings. Examples of such characters are emojis, scientific symbols, and historical scripts.

Letters, digits, and other alphabetic characters are defined as follows:

  • Letters: characters with general categories Lu, Ll, Lt, Lm, or Lo in the Unicode specification
  • Digits: characters with general category Nd in the Unicode specification
  • Other alphabetic characters: characters with general category Nl in the Unicode specification, or with the contributory property Other_Alphabetic as defined by the Unicode standard
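To make these rules concrete, the following Python sketch approximates them locally. It is a simplified model that reproduces the examples above, not the algorithm the service runs. The names char_class, normalize, and dictionary_matches are hypothetical, and the contributory property Other_Alphabetic is not covered because Python's unicodedata module does not expose it.

```python
import unicodedata

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def char_class(ch: str) -> str:
    """Classify one character as 'letter', 'digit', 'other_alpha', or 'space'."""
    if ord(ch) > 0xFFFF:
        # Simplification: anything outside the Basic Multilingual Plane counts as whitespace.
        return "space"
    category = unicodedata.category(ch)
    if category in LETTER_CATEGORIES:
        return "letter"
    if category == "Nd":
        return "digit"
    if category == "Nl":
        return "other_alpha"
    return "space"


def normalize(text: str) -> str:
    """Lowercase the text and collapse each run of non-matching characters into one space."""
    kept = [ch.lower() if char_class(ch) != "space" else " " for ch in text]
    return " ".join("".join(kept).split())


def dictionary_matches(phrase: str, text: str) -> bool:
    """Return True if the phrase occurs in the text under the simplified rules."""
    needle, haystack = normalize(phrase), normalize(text)
    if not needle:
        return False
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        before = haystack[start - 1] if start > 0 else " "
        after = haystack[end] if end < len(haystack) else " "
        # Neighbors must differ in type from the adjacent characters inside the match.
        if (char_class(before) != char_class(needle[0])
                and char_class(after) != char_class(needle[-1])):
            return True
        start = haystack.find(needle, start + 1)
    return False


print(dictionary_matches("Abby Abernathy", "Abby (ABERNATHY)"))
print(dictionary_matches("Abi", "Abi904"))
print(dictionary_matches("Abi", "Abigail"))
```

Under this model the first two calls report a match and the third does not, in line with the Abby Abernathy and Abi examples.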


Examples

Simple word list

Suppose you have data that includes which hospital room a patient was treated in during a visit. These locations may be considered sensitive in a particular dataset, but they are not something that the built-in detectors of Sensitive Data Protection would pick up.

The rooms were listed as:

  • "RM-Orange"
  • "RM-Yellow"
  • "RM-Green"

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class DeidentifyWithSimpleWordList
{
    public static DeidentifyContentResponse Deidentify(string projectId, string text)
    {
        // Instantiate a client.
        var dlp = DlpServiceClient.Create();

        var contentItem = new ContentItem { Value = text };

        // Construct the word list to be detected.
        var wordList = new CustomInfoType.Types.Dictionary.Types.WordList
        {
            Words = { new string[] { "RM-GREEN", "RM-YELLOW", "RM-ORANGE" } }
        };

        var infoType = new InfoType
        {
            Name = "CUSTOM_ROOM_ID"
        };

        var customInfoType = new CustomInfoType
        {
            InfoType = infoType,
            Dictionary = new CustomInfoType.Types.Dictionary
            {
                WordList = wordList
            }
        };

        var inspectConfig = new InspectConfig
        {
            CustomInfoTypes = { customInfoType }
        };

        // Replace each finding with the name of its infoType.
        var primitiveTransformation = new PrimitiveTransformation
        {
            ReplaceWithInfoTypeConfig = new ReplaceWithInfoTypeConfig { }
        };

        var transformation = new InfoTypeTransformations.Types.InfoTypeTransformation
        {
            InfoTypes = { infoType },
            PrimitiveTransformation = primitiveTransformation
        };

        var deidentifyConfig = new DeidentifyConfig
        {
            InfoTypeTransformations = new InfoTypeTransformations
            {
                Transformations = { transformation }
            }
        };

        var request = new DeidentifyContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            InspectConfig = inspectConfig,
            DeidentifyConfig = deidentifyConfig,
            Item = contentItem
        };

        // Call the API.
        var response = dlp.DeidentifyContent(request);

        // Inspect the results.
        Console.WriteLine($"Deidentified content: {response.Item.Value}");
        return response;
    }
}

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// deidentifyWithWordList matches against a custom simple word list to de-identify sensitive
// data based on the input
func deidentifyWithWordList(w io.Writer, projectID, input string, infoTypeName string, wordList []string) error {
	// projectID := "my-project-id"
	// input := "Patient was seen in RM-YELLOW then transferred to rm green."
	// wordList := []string{"RM-GREEN", "RM-YELLOW", "RM-ORANGE"}

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}
	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify what content you want the service to DeIdentify.
	item := &dlppb.ContentItem{
		DataItem: &dlppb.ContentItem_Value{
			Value: input,
		},
	}

	// Specify the word list custom info type the inspection will look for.
	infoType := &dlppb.InfoType{
		Name: infoTypeName,
	}

	customInfoType := &dlppb.CustomInfoType{
		InfoType: infoType,
		Type: &dlppb.CustomInfoType_Dictionary_{
			Dictionary: &dlppb.CustomInfoType_Dictionary{
				Source: &dlppb.CustomInfoType_Dictionary_WordList_{
					// Construct the word list to be detected
					WordList: &dlppb.CustomInfoType_Dictionary_WordList{
						Words: wordList,
					},
				},
			},
		},
	}

	// Define type of de-identification as replacement.
	primitiveTransformation := &dlppb.PrimitiveTransformation{
		Transformation: &dlppb.PrimitiveTransformation_ReplaceWithInfoTypeConfig{
			ReplaceWithInfoTypeConfig: &dlppb.ReplaceWithInfoTypeConfig{},
		},
	}

	infoTypeTransformation := &dlppb.InfoTypeTransformations_InfoTypeTransformation{
		InfoTypes:               []*dlppb.InfoType{infoType},
		PrimitiveTransformation: primitiveTransformation,
	}

	infoTypeTransformations := &dlppb.InfoTypeTransformations{
		// Associate de-identification type with info type.
		Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
			infoTypeTransformation,
		},
	}

	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: &dlppb.InspectConfig{
			CustomInfoTypes: []*dlppb.CustomInfoType{
				customInfoType,
			},
		},
		// Construct the configuration for the de-identify request and list all desired transformations.
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: infoTypeTransformations,
			},
		},
		// The item to analyze.
		Item: item,
	}

	// Send the request.
	resp, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return err
	}

	// Print the result.
	fmt.Fprintf(w, "output : %v", resp.GetItem().GetValue())
	return nil
}

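import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.ReplaceWithInfoTypeConfig;
import java.io.IOException;

public class DeIdentifyWithSimpleWordList {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToDeIdentify = "Patient was seen in RM-YELLOW then transferred to rm green.";
    deidentifyWithSimpleWordList(projectId, textToDeIdentify);
  }

  public static void deidentifyWithSimpleWordList(String projectId, String textToDeIdentify)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Specify what content you want the service to DeIdentify.
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToDeIdentify).build();

      // Construct the word list to be detected
      Dictionary wordList =
          Dictionary.newBuilder()
              .setWordList(
                  WordList.newBuilder().addWords("RM-GREEN").addWords("RM-YELLOW").addWords("RM-ORANGE"))
              .build();

      // Specify the word list custom info type the inspection will look for.
      InfoType infoType = InfoType.newBuilder().setName("CUSTOM_ROOM_ID").build();
      CustomInfoType customInfoType =
          CustomInfoType.newBuilder().setInfoType(infoType).setDictionary(wordList).build();
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addCustomInfoTypes(customInfoType).build();

      // Define type of deidentification as replacement.
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setReplaceWithInfoTypeConfig(ReplaceWithInfoTypeConfig.getDefaultInstance())
              .build();

      // Associate deidentification type with info type.
      InfoTypeTransformation transformation =
          InfoTypeTransformation.newBuilder()
              .addInfoTypes(infoType)
              .setPrimitiveTransformation(primitiveTransformation)
              .build();

      // Construct the configuration for the de-identify request and list all desired transformations.
      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder()
              .setInfoTypeTransformations(
                  InfoTypeTransformations.newBuilder().addTransformations(transformation))
              .build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(contentItem)
              .setInspectConfig(inspectConfig)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Print the results
      System.out.println(
          "Text after replace with infotype config: " + response.getItem().getValue());
    }
  }
}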

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// TODO(developer): Replace these variables before running the sample.
// const projectId = "your-project-id";

// The string to de-identify
// const textToInspect = 'Patient was seen in RM-YELLOW then transferred to rm green.';

// Words to look for during inspection
// const words = ['RM-GREEN', 'RM-YELLOW', 'RM-ORANGE'];

// Name of the custom info type
// const customInfoTypeName = 'CUSTOM_ROOM_ID';

async function deIdentifyWithSimpleWordList() {
  // Initialize client that will be used to send requests. This client only needs to be created
  // once, and can be reused for multiple requests. After completing all of your requests, call
  // the "close" method on the client to safely clean up any remaining background resources.
  const dlp = new DLP.DlpServiceClient();

  // Construct the word list to be detected
  const wordList = {
    words: words,
  };

  // Specify the word list custom info type the inspection will look for.
  const infoType = {
    name: customInfoTypeName,
  };
  const customInfoType = {
    infoType,
    dictionary: {
      wordList,
    },
  };

  // Construct de-identify configuration
  const deidentifyConfig = {
    infoTypeTransformations: {
      transformations: [
        {
          primitiveTransformation: {
            replaceWithInfoTypeConfig: {},
          },
        },
      ],
    },
  };

  // Construct inspect configuration
  const inspectConfig = {
    customInfoTypes: [customInfoType],
  };

  // Construct Item
  const item = {
    value: textToInspect,
  };

  // Combine configurations into a request for the service.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    item,
    deidentifyConfig,
    inspectConfig,
  };

  // Send the request and receive response from the service
  const [response] = await dlp.deidentifyContent(request);

  // Print the results
  console.log(
    `Text after replace with infotype config: ${response.item.value}`
  );
}

deIdentifyWithSimpleWordList();

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CustomInfoType;
use Google\Cloud\Dlp\V2\CustomInfoType\Dictionary;
use Google\Cloud\Dlp\V2\CustomInfoType\Dictionary\WordList;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\DeidentifyContentRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\ReplaceWithInfoTypeConfig;

/**
 * De-identify sensitive data with a simple word list
 * Matches against a custom simple word list to de-identify sensitive data.
 *
 * @param string $callingProjectId  The Google Cloud project id to use as a parent resource.
 * @param string $string            The string to deidentify (will be treated as text).
 */
function deidentify_simple_word_list(
    // TODO(developer): Replace sample parameters before running the code.
    string $callingProjectId,
    string $string = 'Patient was seen in RM-YELLOW then transferred to rm green.'
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$callingProjectId/locations/global";

    $content = (new ContentItem())
        ->setValue($string);

    // Construct the word list to be detected
    $wordList = (new Dictionary())
        ->setWordList((new WordList())
            ->setWords(['RM-GREEN', 'RM-YELLOW', 'RM-ORANGE']));

    // The infoTypes of information to mask
    $customRoomIdInfoType = (new InfoType())
        ->setName('CUSTOM_ROOM_ID');
    $customInfoType = (new CustomInfoType())
        ->setInfoType($customRoomIdInfoType)
        ->setDictionary($wordList);

    // Create the configuration object
    $inspectConfig = (new InspectConfig())
        ->setCustomInfoTypes([$customInfoType]);

    // Create the information transform configuration objects
    $primitiveTransformation = (new PrimitiveTransformation())
        ->setReplaceWithInfoTypeConfig(new ReplaceWithInfoTypeConfig());

    $infoTypeTransformation = (new InfoTypeTransformation())
        ->setPrimitiveTransformation($primitiveTransformation)
        ->setInfoTypes([$customRoomIdInfoType]);

    $infoTypeTransformations = (new InfoTypeTransformations())
        ->setTransformations([$infoTypeTransformation]);

    // Create the deidentification configuration object
    $deidentifyConfig = (new DeidentifyConfig())
        ->setInfoTypeTransformations($infoTypeTransformations);

    // Run request
    $deidentifyContentRequest = (new DeidentifyContentRequest())
        ->setParent($parent)
        ->setDeidentifyConfig($deidentifyConfig)
        ->setInspectConfig($inspectConfig)
        ->setItem($content);
    $response = $dlp->deidentifyContent($deidentifyContentRequest);

    // Print the results
    printf('Deidentified content: %s', $response->getItem()->getValue());
}

from google.cloud import dlp_v2


def deidentify_with_simple_word_list(
    project: str,
    input_str: str,
    custom_info_type_name: str,
    word_list: list[str],
) -> None:
    """Uses the Data Loss Prevention API to de-identify sensitive data in a
      string by matching against custom word list.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        custom_info_type_name: The name of the custom info type to use.
        word_list: The list of strings to match against.
    """

    # Instantiate a client.
    dlp = dlp_v2.DlpServiceClient()

    # Prepare custom_info_types by parsing word lists
    word_list = {"words": word_list}
    custom_info_types = [
        {
            "info_type": {"name": custom_info_type_name},
            "dictionary": {"word_list": word_list},
        }
    ]

    # Construct the configuration dictionary
    inspect_config = {
        "custom_info_types": custom_info_types,
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    }

    # Construct the `item`.
    item = {"value": input_str}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    print(f"De-identified Content: {response.item.value}")

The following JSON example creates a custom dictionary that you can use to de-identify custom room numbers.

    "value":"Patient was seen in RM-YELLOW then transferred to rm green."

When we send the JSON input to content:deidentify, the following JSON response is returned:

    "value":"Patient was seen in [CUSTOM_ROOM_ID] then transferred to [CUSTOM_ROOM_ID]."

Sensitive Data Protection correctly identified the room numbers specified in the WordList message of the custom dictionary. Note that items match even when the case differs and the hyphen (-) is missing, as in the second example, "rm green".
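A request body of this kind can also be assembled in code. The following Python sketch builds the payload for content:deidentify with the REST field names in camelCase. The helper name build_word_list_request is hypothetical, and the sketch only constructs and prints the JSON; it does not send anything to the service.

```python
import json


def build_word_list_request(text, info_type_name, words):
    """Return a request body that replaces dictionary matches with the infoType name."""
    return {
        "item": {"value": text},
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {"primitiveTransformation": {"replaceWithInfoTypeConfig": {}}}
                ]
            }
        },
        "inspectConfig": {
            "customInfoTypes": [
                {
                    "infoType": {"name": info_type_name},
                    "dictionary": {"wordList": {"words": list(words)}},
                }
            ]
        },
    }


body = build_word_list_request(
    "Patient was seen in RM-YELLOW then transferred to rm green.",
    "CUSTOM_ROOM_ID",
    ["RM-GREEN", "RM-YELLOW", "RM-ORANGE"],
)
print(json.dumps(body, indent=2))
```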


Exception list

Suppose you have log data that includes customer identifiers such as email addresses, and you want to redact this information. However, these logs also include the email addresses of internal developers, and you don't want to redact those.

using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class DeidentifyWithExceptionList
{
    public static DeidentifyContentResponse Deidentify(string projectId, string text)
    {
        // Instantiate a client.
        var dlp = DlpServiceClient.Create();

        var contentItem = new ContentItem { Value = text };

        // TODO(developer): Replace the empty strings with the email addresses to exclude.
        var wordList = new CustomInfoType.Types.Dictionary.Types.WordList
        {
            Words = { new string[] { "", "" } }
        };

        // Exclude findings that fully match an entry in the word list.
        var exclusionRule = new ExclusionRule
        {
            MatchingType = MatchingType.FullMatch,
            Dictionary = new CustomInfoType.Types.Dictionary
            {
                WordList = wordList
            }
        };

        var infoType = new InfoType { Name = "EMAIL_ADDRESS" };

        var inspectionRuleSet = new InspectionRuleSet
        {
            InfoTypes = { infoType },
            Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
        };

        var inspectConfig = new InspectConfig
        {
            InfoTypes = { infoType },
            RuleSet = { inspectionRuleSet }
        };

        var primitiveTransformation = new PrimitiveTransformation
        {
            ReplaceWithInfoTypeConfig = new ReplaceWithInfoTypeConfig { }
        };

        var transformation = new InfoTypeTransformations.Types.InfoTypeTransformation
        {
            InfoTypes = { infoType },
            PrimitiveTransformation = primitiveTransformation
        };

        var deidentifyConfig = new DeidentifyConfig
        {
            InfoTypeTransformations = new InfoTypeTransformations
            {
                Transformations = { transformation }
            }
        };

        var request = new DeidentifyContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            InspectConfig = inspectConfig,
            DeidentifyConfig = deidentifyConfig,
            Item = contentItem
        };

        // Call the API.
        var response = dlp.DeidentifyContent(request);

        // Inspect the results.
        Console.WriteLine($"Deidentified content: {response.Item.Value}");
        return response;
    }
}

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// deidentifyExceptionList creates an exception list for a regular custom dictionary detector.
func deidentifyExceptionList(w io.Writer, projectID, input string) error {
	// projectID := "my-project-id"
	// input := " accessed customer record of"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %w", err)
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify what content you want the service to DeIdentify.
	item := &dlppb.ContentItem{
		DataItem: &dlppb.ContentItem_Value{
			Value: input,
		},
	}

	// Specify the word list custom info type and built-in info type the inspection will look for.
	infoTypes := []*dlppb.InfoType{
		{Name: "EMAIL_ADDRESS"},
	}

	// TODO(developer): Replace the empty strings with the email addresses to exclude.
	dictionary := &dlppb.CustomInfoType_Dictionary{
		Source: &dlppb.CustomInfoType_Dictionary_WordList_{
			WordList: &dlppb.CustomInfoType_Dictionary_WordList{
				Words: []string{"", ""},
			},
		},
	}

	exclusionRule := &dlppb.ExclusionRule{
		MatchingType: dlppb.MatchingType_MATCHING_TYPE_FULL_MATCH,
		Type: &dlppb.ExclusionRule_Dictionary{
			Dictionary: dictionary,
		},
	}

	inspectRuleSet := &dlppb.InspectionRuleSet{
		InfoTypes: infoTypes,
		Rules: []*dlppb.InspectionRule{
			{
				Type: &dlppb.InspectionRule_ExclusionRule{
					ExclusionRule: exclusionRule,
				},
			},
		},
	}

	// Construct the configuration for the de-id request and list all desired transformations.
	primitiveTransformation := &dlppb.PrimitiveTransformation{
		Transformation: &dlppb.PrimitiveTransformation_ReplaceWithInfoTypeConfig{
			ReplaceWithInfoTypeConfig: &dlppb.ReplaceWithInfoTypeConfig{},
		},
	}

	infoTypeTransformation := &dlppb.InfoTypeTransformations{
		Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
			{
				PrimitiveTransformation: primitiveTransformation,
			},
		},
	}

	deIdentifyConfig := &dlppb.DeidentifyConfig{
		Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
			InfoTypeTransformations: infoTypeTransformation,
		},
	}

	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent:           fmt.Sprintf("projects/%s/locations/global", projectID),
		DeidentifyConfig: deIdentifyConfig,
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes:       infoTypes,
			CustomInfoTypes: []*dlppb.CustomInfoType{},
			RuleSet:         []*dlppb.InspectionRuleSet{inspectRuleSet},
		},
		// The item to analyze.
		Item: item,
	}

	// Send the request.
	resp, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return err
	}

	// Print the result.
	fmt.Fprintf(w, "output : %v", resp.GetItem().GetValue())
	return nil
}


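import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.ReplaceWithInfoTypeConfig;
import java.io.IOException;

public class DeIdentifyWithExceptionList {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToDeIdentify = " accessed customer record of";
    deIdentifyWithExceptionList(projectId, textToDeIdentify);
  }

  public static void deIdentifyWithExceptionList(String projectId, String textToDeIdentify)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Specify what content you want the service to DeIdentify.
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToDeIdentify).build();

      // Construct the custom word list to be detected.
      // TODO(developer): Replace the empty strings with the email addresses to exclude.
      Dictionary wordList =
          Dictionary.newBuilder()
              .setWordList(WordList.newBuilder().addWords("").addWords(""))
              .build();

      // Construct the custom dictionary detector associated with the word list.
      InfoType developerEmail = InfoType.newBuilder().setName("DEVELOPER_EMAIL").build();
      CustomInfoType customInfoType =
          CustomInfoType.newBuilder().setInfoType(developerEmail).setDictionary(wordList).build();

      ExclusionRule exclusionRule =
          ExclusionRule.newBuilder()
              .setDictionary(wordList)
              .setMatchingType(MatchingType.MATCHING_TYPE_FULL_MATCH)
              .build();

      InspectionRule inspectionRule =
          InspectionRule.newBuilder().setExclusionRule(exclusionRule).build();

      // Specify the word list custom info type and built-in info type the inspection will look for.
      InfoType emailAddress = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();

      InspectionRuleSet inspectionRuleSet =
          InspectionRuleSet.newBuilder().addInfoTypes(emailAddress).addRules(inspectionRule).build();

      InspectConfig inspectConfig =
          InspectConfig.newBuilder()
              .addInfoTypes(emailAddress)
              .addCustomInfoTypes(customInfoType)
              .addRuleSet(inspectionRuleSet)
              .build();

      // Define type of deidentification as replacement.
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setReplaceWithInfoTypeConfig(ReplaceWithInfoTypeConfig.getDefaultInstance())
              .build();

      // Associate de-identification type with info type.
      InfoTypeTransformation transformation =
          InfoTypeTransformation.newBuilder()
              .addInfoTypes(emailAddress)
              .setPrimitiveTransformation(primitiveTransformation)
              .build();

      // Construct the configuration for the de-id request and list all desired transformations.
      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder()
              .setInfoTypeTransformations(
                  InfoTypeTransformations.newBuilder().addTransformations(transformation))
              .build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(contentItem)
              .setInspectConfig(inspectConfig)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Print the results
      System.out.println(
          "Text after replace with infotype config: " + response.getItem().getValue());
    }
  }
}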

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
const dlp = new DLP.DlpServiceClient();

// TODO(developer): Replace these variables before running the sample.
// const projectId = "your-project-id";

// The string to deidentify
// const textToInspect = ' accessed customer record of';

// Words to exclude during inspection
// const words = ['', ''];

// The infoTypes of information to match
// See the infoType detector reference for more information
// about supported infoTypes.
// const infoTypes = [{ name: 'EMAIL_ADDRESS' }];

async function deIdentifyWithExceptionList() {
  // Construct item to inspect
  const item = {value: textToInspect};

  // Construct the custom dictionary detector associated with the word list.
  const wordListDict = {
    wordList: {
      words: words,
    },
  };

  // Construct a rule set that will only match if the match text does not
  // contain tokens from the exclusion list.
  const ruleSet = [
    {
      infoTypes: infoTypes,
      rules: [
        {
          exclusionRule: {
            dictionary: wordListDict,
            matchingType:
              DLP.protos.google.privacy.dlp.v2.MatchingType
                .MATCHING_TYPE_FULL_MATCH,
          },
        },
      ],
    },
  ];

  // Combine configurations to construct inspect config.
  const inspectConfig = {
    infoTypes: infoTypes,
    ruleSet: ruleSet,
  };

  // Define type of de-identification as replacement & associate de-identification type with info type.
  const transformation = {
    infoTypes: [],
    primitiveTransformation: {
      replaceWithInfoTypeConfig: {},
    },
  };

  // Construct the configuration for the de-identification request and list all desired transformations.
  const deidentifyConfig = {
    infoTypeTransformations: {
      transformations: [transformation],
    },
  };

  // Combine configurations into a request for the service.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    item: item,
    inspectConfig: inspectConfig,
    deidentifyConfig: deidentifyConfig,
  };

  // Send the request and receive response from the service.
  const [response] = await dlp.deidentifyContent(request);

  // Print the results
  console.log(
    `Text after replace with infotype config: ${response.item.value}`
  );
}

deIdentifyWithExceptionList();

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CustomInfoType\Dictionary;
use Google\Cloud\Dlp\V2\CustomInfoType\Dictionary\WordList;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\DeidentifyContentRequest;
use Google\Cloud\Dlp\V2\ExclusionRule;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectionRule;
use Google\Cloud\Dlp\V2\InspectionRuleSet;
use Google\Cloud\Dlp\V2\MatchingType;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\ReplaceWithInfoTypeConfig;

/**
 * Create an exception list for de-identification
 * Create an exception list for a regular custom dictionary detector.
 *
 * @param string $callingProjectId  The project ID to run the API call under
 * @param string $textToDeIdentify  The String you want the service to DeIdentify
 */
function deidentify_exception_list(
    // TODO(developer): Replace sample parameters before running the code.
    string $callingProjectId,
    string $textToDeIdentify = ' accessed customer record of'
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // Specify what content you want the service to DeIdentify.
    $contentItem = (new ContentItem())
        ->setValue($textToDeIdentify);

    // Construct the custom word list to be detected.
    // TODO(developer): Replace the empty strings with the email addresses to exclude.
    $wordList = (new Dictionary())
        ->setWordList((new WordList())
            ->setWords(['', '']));

    // Specify the exclusion rule and built-in info type the inspection will look for.
    $exclusionRule = (new ExclusionRule())
        ->setMatchingType(MatchingType::MATCHING_TYPE_FULL_MATCH)
        ->setDictionary($wordList);

    $emailAddress = (new InfoType())
        ->setName('EMAIL_ADDRESS');
    $inspectionRuleSet = (new InspectionRuleSet())
        ->setInfoTypes([$emailAddress])
        ->setRules([
            (new InspectionRule())
                ->setExclusionRule($exclusionRule),
        ]);

    $inspectConfig = (new InspectConfig())
        ->setInfoTypes([$emailAddress])
        ->setRuleSet([$inspectionRuleSet]);

    // Define type of deidentification as replacement.
    $primitiveTransformation = (new PrimitiveTransformation())
        ->setReplaceWithInfoTypeConfig(new ReplaceWithInfoTypeConfig());

    // Associate de-identification type with info type.
    $transformation = (new InfoTypeTransformation())
        ->setInfoTypes([$emailAddress])
        ->setPrimitiveTransformation($primitiveTransformation);

    // Construct the configuration for the de-id request and list all desired transformations.
    $deidentifyConfig = (new DeidentifyConfig())
        ->setInfoTypeTransformations(
            (new InfoTypeTransformations())
                ->setTransformations([$transformation])
        );

    // Send the request and receive response from the service
    $parent = "projects/$callingProjectId/locations/global";
    $deidentifyContentRequest = (new DeidentifyContentRequest())
        ->setParent($parent)
        ->setDeidentifyConfig($deidentifyConfig)
        ->setInspectConfig($inspectConfig)
        ->setItem($contentItem);
    $response = $dlp->deidentifyContent($deidentifyContentRequest);

    // Print the results
    printf('Text after replace with infotype config: %s', $response->getItem()->getValue());
}

from typing import List

from google.cloud import dlp_v2


def deidentify_with_exception_list(
    project: str, content_string: str, info_types: List[str], exception_list: List[str]
) -> None:
    """Uses the Data Loss Prevention API to de-identify sensitive data in a
      string but ignore matches against custom list.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        exception_list: The list of strings to ignore matches on.

    Returns:
          None; the response from the API is printed to the terminal.
    """

    # Instantiate a client
    dlp = dlp_v2.DlpServiceClient()

    # Construct a list of infoTypes for DLP to locate in `content_string`. See
    # the infoType detector reference for more information
    # about supported infoTypes.

    info_types = [{"name": info_type} for info_type in info_types]

    # Construct a rule set that will only match on info_type
    # if the matched text is not in the exception list.
    rule_set = [
        {
            "info_types": info_types,
            "rules": [
                {
                    "exclusion_rule": {
                        "dictionary": {"word_list": {"words": exception_list}},
                        "matching_type": dlp_v2.MatchingType.MATCHING_TYPE_FULL_MATCH,
                    }
                }
            ],
        }
    ]

    # Construct the configuration dictionary
    inspect_config = {
        "info_types": info_types,
        "rule_set": rule_set,
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    print(response.item.value)

The following JSON example creates a custom dictionary that lists a subset of email addresses in the WordList message and assigns them the custom infoType name DEVELOPER_EMAIL. This JSON instructs Sensitive Data Protection to ignore the listed email addresses while replacing every other detected email address with a string that corresponds to its infoType (in this case, EMAIL_ADDRESS):


  "item": {
    "value":" accessed customer record of"
  },

  "inspectConfig": {
    "ruleSet": [
      {
        "infoTypes": [
          {
            "name": "EMAIL_ADDRESS"
          }
        ],
        "rules": [
          {
            "exclusionRule": {
              "excludeInfoTypes": {
                "infoTypes": [
                  {
                    "name": "DEVELOPER_EMAIL"
                  }
                ]
              },
              "matchingType": "MATCHING_TYPE_FULL_MATCH"
            }
          }
        ]
      }
    ]
  }

Sending this JSON to content:deidentify returns the following JSON response:

    "value":" accessed customer record of [EMAIL_ADDRESS]"


The output correctly identifies one address as a match for the EMAIL_ADDRESS infoType detector and the other as a match for the DEVELOPER_EMAIL custom infoType detector. Because only EMAIL_ADDRESS findings are transformed, the address matched by DEVELOPER_EMAIL is left unchanged.
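
As a rough sketch of the request body described above, the following Python assembles the JSON as a plain dictionary without calling the API. The function name build_deidentify_request and the example.com addresses are placeholders invented for illustration, not values from this page. Restricting the transformation to EMAIL_ADDRESS is an assumption, based on the explanation that only EMAIL_ADDRESS findings are transformed.

```python
import json
from typing import Any, Dict, Iterable


def build_deidentify_request(text: str, developer_emails: Iterable[str]) -> Dict[str, Any]:
    """Build a content:deidentify request body (hypothetical helper).

    `developer_emails` are the addresses that should be left intact.
    """
    return {
        "item": {"value": text},
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        # Assumption: only EMAIL_ADDRESS findings are replaced.
                        "infoTypes": [{"name": "EMAIL_ADDRESS"}],
                        "primitiveTransformation": {"replaceWithInfoTypeConfig": {}},
                    }
                ]
            }
        },
        "inspectConfig": {
            # Custom dictionary detector that holds the addresses to ignore.
            "customInfoTypes": [
                {
                    "infoType": {"name": "DEVELOPER_EMAIL"},
                    "dictionary": {"wordList": {"words": list(developer_emails)}},
                }
            ],
            "infoTypes": [{"name": "EMAIL_ADDRESS"}],
            # Exclude EMAIL_ADDRESS findings that fully match a DEVELOPER_EMAIL finding.
            "ruleSet": [
                {
                    "infoTypes": [{"name": "EMAIL_ADDRESS"}],
                    "rules": [
                        {
                            "exclusionRule": {
                                "excludeInfoTypes": {
                                    "infoTypes": [{"name": "DEVELOPER_EMAIL"}]
                                },
                                "matchingType": "MATCHING_TYPE_FULL_MATCH",
                            }
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder addresses; replace them with your own data.
    body = build_deidentify_request(
        "dev@example.com accessed customer record of customer@example.com",
        ["dev@example.com"],
    )
    print(json.dumps(body, indent=2))
```

Keeping the body as a dictionary makes it easy to serialize with json.dumps and send with whichever HTTP client you already use.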

Augment a built-in infoType detector

Consider a scenario in which a built-in infoType detector isn't returning the correct values. For example, you want to return matches on person names, but Sensitive Data Protection's built-in PERSON_NAME detector fails to return matches on some person names that are common in your dataset.

Sensitive Data Protection lets you augment built-in infoType detectors by including a built-in detector in the declaration for a custom infoType detector, as shown in the following example. This snippet shows how to configure Sensitive Data Protection so that the PERSON_NAME built-in infoType detector additionally matches the name "Quasimodo":

using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Protobuf;
using System;
using System.Linq;
using static Google.Cloud.Dlp.V2.CustomInfoType.Types;

public class InspectDataUsingAugmentInfoTypes
{
    public static InspectContentResponse InspectData(
        string projectId,
        string text,
        InfoType infoType = null)
    {
        // Instantiate the dlp client.
        var dlp = DlpServiceClient.Create();

        // Specify the type of info to be inspected and construct the infotype.
        var infotype = infoType ?? new InfoType { Name = "PERSON_NAME" };

        // Construct the custom infoTypes with dictionary.
        var customInfoTypes = new CustomInfoType
        {
            InfoType = infotype,
            Dictionary = new Dictionary
            {
                WordList = new Dictionary.Types.WordList
                {
                    Words = { new string[] { "quasimodo" } }
                }
            }
        };

        // Construct the inspect config using custom infoTypes.
        var inspectConfig = new InspectConfig
        {
            CustomInfoTypes = { customInfoTypes },
            IncludeQuote = true,
            InfoTypes = { infotype }
        };

        // Construct the request.
        var request = new InspectContentRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            InspectConfig = inspectConfig,
            Item = new ContentItem
            {
                ByteItem = new ByteContentItem
                {
                    Data = ByteString.CopyFromUtf8(text),
                    Type = ByteContentItem.Types.BytesType.TextUtf8
                }
            }
        };

        // Call the API.
        InspectContentResponse response = dlp.InspectContent(request);

        // Parse the response.
        var findings = response.Result.Findings;
        Console.WriteLine($"Findings: {findings.Count}");

        foreach (var f in findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// inspectAugmentInfoTypes performs info type augmentation using Google Cloud DLP.
// It enhances data inspection by supplementing existing info types with custom-defined ones,
// expanding the ability to identify sensitive information in different contexts.
func inspectAugmentInfoTypes(w io.Writer, projectID, textToInspect string, wordList []string) error {
	// projectID := "your-project-id"
	// textToInspect := "The patient's name is quasimodo"
	// wordList := []string{"quasimodo"}

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the content to be inspected.
	item := &dlppb.ContentItem{
		DataItem: &dlppb.ContentItem_Value{
			Value: textToInspect,
		},
	}

	// Construct the custom word list to be detected.
	dictionary := &dlppb.CustomInfoType_Dictionary{
		Source: &dlppb.CustomInfoType_Dictionary_WordList_{
			WordList: &dlppb.CustomInfoType_Dictionary_WordList{
				Words: wordList,
			},
		},
	}

	// Specify the type of info the inspection will look for.
	// See the infoType detector reference for the complete list of info types.
	infoType := &dlppb.InfoType{
		Name: "PERSON_NAME",
	}

	// Construct a custom infoType detector by augmenting the PERSON_NAME detector with a word list.
	customInfoType := &dlppb.CustomInfoType{
		InfoType: infoType,
		Type: &dlppb.CustomInfoType_Dictionary_{
			Dictionary: dictionary,
		},
	}

	// Specify the inspect config: the custom info type to apply and whether
	// each finding should include the matched quote.
	inspectConfig := &dlppb.InspectConfig{
		CustomInfoTypes: []*dlppb.CustomInfoType{
			customInfoType,
		},
		IncludeQuote: true,
	}

	// Construct the Inspect request to be sent by the client.
	req := &dlppb.InspectContentRequest{
		Parent:        fmt.Sprintf("projects/%s/locations/global", projectID),
		Item:          item,
		InspectConfig: inspectConfig,
	}

	// Send the request configured above.
	resp, err := client.InspectContent(ctx, req)
	if err != nil {
		return err
	}

	// Process the results.
	result := resp.Result
	fmt.Fprintf(w, "Findings: %d\n", len(result.Findings))
	for _, f := range result.Findings {
		fmt.Fprintf(w, "\tQuote: %s\n", f.Quote)
		fmt.Fprintf(w, "\tInfo type: %s\n", f.InfoType.Name)
		fmt.Fprintf(w, "\tLikelihood: %s\n", f.Likelihood)
	}
	return nil
}

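import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class InspectStringAugmentInfoType {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    // The Google Cloud project id to use as a parent resource.
    String projectId = "your-project-id";
    // The string to inspect.
    String textToInspect = "The patient's name is quasimodo";
    // The string to be additionally matched.
    List<String> wordList = Arrays.asList("quasimodo");
    inspectStringAugmentInfoType(projectId, textToInspect, wordList);
  }

  // Inspects the text using new custom words added to the dictionary.
  public static void inspectStringAugmentInfoType(
      String projectId, String textToInspect, List<String> wordList) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Construct the custom word list to be detected.
      CustomInfoType.Dictionary dictionary =
          CustomInfoType.Dictionary.newBuilder()
              .setWordList(
                  CustomInfoType.Dictionary.WordList.newBuilder().addAllWords(wordList).build())
              .build();

      InfoType infoType = InfoType.newBuilder().setName("PERSON_NAME").build();
      // Construct a custom infotype detector by augmenting the PERSON_NAME detector with a word
      // list.
      CustomInfoType customInfoType =
          CustomInfoType.newBuilder().setInfoType(infoType).setDictionary(dictionary).build();

      InspectConfig inspectConfig =
          InspectConfig.newBuilder()
              .addCustomInfoTypes(customInfoType)
              .setIncludeQuote(true)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(inspectConfig)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}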
// Imports the Google Cloud client library
const DLP = require('@google-cloud/dlp');
// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const projectId = 'my-project';

// The string to inspect
// const string = "The patient's name is quasimodo";

// Word list
// const words = ['quasimodo'];

async function inspectStringAugmentInfoType() {
  // Specify the type and content to be inspected.
  const byteItem = {
    type: 'BYTES',
    data: Buffer.from(string),
  };
  const item = {byteItem: byteItem};

  // Construct the custom word list to be detected.
  const dictionary = {
    wordList: {
      words: words,
    },
  };

  // Construct a custom infotype detector by augmenting the PERSON_NAME detector with a word list.
  const customInfoType = {
    infoType: {name: 'PERSON_NAME'},
    dictionary: dictionary,
  };

  const inspectConfig = {
    customInfoTypes: [customInfoType],
    includeQuote: true,
  };

  // Construct the Inspect request to be sent by the client.
  const inspectRequest = {
    parent: `projects/${projectId}/locations/global`,
    inspectConfig: inspectConfig,
    item: item,
  };

  // Use the client to send the API request.
  const [response] = await dlp.inspectContent(inspectRequest);

  // Print Findings.
  const findings = response.result.findings;
  if (findings.length > 0) {
    console.log(`Findings: ${findings.length}\n`);
    findings.forEach(finding => {
      console.log(`InfoType: ${finding.infoType.name}`);
      console.log(`\tQuote: ${finding.quote}`);
      console.log(`\tLikelihood: ${finding.likelihood} \n`);
    });
  } else {
    console.log('No findings.');
  }
}

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CustomInfoType;
use Google\Cloud\Dlp\V2\CustomInfoType\Dictionary;
use Google\Cloud\Dlp\V2\CustomInfoType\Dictionary\WordList;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\InspectContentRequest;
use Google\Cloud\Dlp\V2\Likelihood;

/**
 * Augment a built-in infotype detector.
 * Consider a scenario in which a built-in infoType detector isn’t returning the correct values.
 * For example, you want to return matches on person names, but Cloud DLP's built-in
 * PERSON_NAME detector is failing to return matches on some person names that are common in your dataset.
 * Cloud DLP allows you to augment built-in infoType detectors by including a built-in detector in the
 * declaration for a custom infoType detector, as shown in the following example. This snippet
 * illustrates how to configure Cloud DLP so that the PERSON_NAME built-in infoType detector will
 * additionally match the name “Quasimodo”.
 *
 * @param string $projectId         The Google Cloud project id to use as a parent resource.
 * @param string $textToInspect     The string to inspect.
 * @param array  $matchWordList     Specify the set of words to match.
 */
function inspect_augment_infotypes(
    // TODO(developer): Replace sample parameters before running the code.
    string $projectId,
    string $textToInspect = 'Smith and Quasimodo are good cricketer',
    array  $matchWordList = ['quasimodo']
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$projectId/locations/global";

    // Specify what content you want the service to Inspect.
    $item = (new ContentItem())
        ->setValue($textToInspect);

    // The infoTypes of information to match.
    $personNameInfoType = (new InfoType())
        ->setName('PERSON_NAME');

    // Construct the word list to be detected.
    $wordList = (new Dictionary())
        ->setWordList((new WordList())
            ->setWords($matchWordList));

    // Construct the custom infotype detector.
    $customInfoType = (new CustomInfoType())
        ->setInfoType($personNameInfoType)
        ->setDictionary($wordList);

    // Construct the configuration for the Inspect request.
    $inspectConfig = (new InspectConfig())
        ->setCustomInfoTypes([$customInfoType])
        ->setIncludeQuote(true);

    // Run request.
    $inspectContentRequest = (new InspectContentRequest())
        ->setParent($parent)
        ->setInspectConfig($inspectConfig)
        ->setItem($item);
    $response = $dlp->inspectContent($inspectContentRequest);

    // Print the results.
    $findings = $response->getResult()->getFindings();
    if (count($findings) == 0) {
        printf('No findings.' . PHP_EOL);
    } else {
        printf('Findings:' . PHP_EOL);
        foreach ($findings as $finding) {
            printf('  Quote: %s' . PHP_EOL, $finding->getQuote());
            printf('  Info type: %s' . PHP_EOL, $finding->getInfoType()->getName());
            printf('  Likelihood: %s' . PHP_EOL, Likelihood::name($finding->getLikelihood()));
        }
    }
}

from typing import List

import google.cloud.dlp


def inspect_string_augment_infotype(
    project: str,
    input_str: str,
    info_type: str,
    word_list: List[str],
) -> None:
    """Uses the Data Loss Prevention API to augment built-in infoType
    detector and inspect the content string with augmented infoType.

    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to inspect using augmented infoType
            (will be treated as text).
        info_type: A string representing built-in infoType to augment.
            A full list of infoType categories can be fetched from the API.
        word_list: List of words or phrases to be added to extend the behaviour
            of built-in infoType.
    """

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct the custom infoTypes dictionary with declaration of a built-in detector.
    custom_info_types = [
        {
            "info_type": {"name": info_type},
            "dictionary": {"word_list": {"words": word_list}},
        }
    ]

    # Construct inspect configuration dictionary with the custom info type.
    inspect_config = {
        "custom_info_types": custom_info_types,
        "include_quote": True,
    }

    # Construct the `item` to be inspected.
    item = {"value": input_str}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={
            "parent": parent,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            print(f"Quote: {finding.quote}")
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood} \n")
    else:
        print("No findings.")
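
When the word list comes from user input or a file, it can help to tidy it before building the inspect configuration. The sketch below is one possible way to do that in plain Python without calling the API. The helper name augmented_inspect_config is hypothetical, and dropping blank entries and case-insensitive duplicates is a simplifying choice made here, not something the samples above require.

```python
from typing import Any, Dict, Iterable, List


def augmented_inspect_config(info_type: str, words: Iterable[str]) -> Dict[str, Any]:
    """Return an inspect_config dict that adds `words` to a built-in detector.

    Hypothetical helper: it only assembles the dictionary and sends nothing.
    """
    # Drop blank entries and case-insensitive duplicates, keeping first-seen order.
    seen = set()
    cleaned: List[str] = []
    for word in words:
        word = word.strip()
        key = word.casefold()
        if word and key not in seen:
            seen.add(key)
            cleaned.append(word)
    if not cleaned:
        raise ValueError("word list must contain at least one non-empty entry")

    return {
        "custom_info_types": [
            {
                # Naming a built-in infoType here augments that detector.
                "info_type": {"name": info_type},
                "dictionary": {"word_list": {"words": cleaned}},
            }
        ],
        "include_quote": True,
    }


if __name__ == "__main__":
    print(augmented_inspect_config("PERSON_NAME", ["quasimodo", " Quasimodo ", ""]))
```

The returned dictionary has the same shape as the inspect_config built in the Python sample above, so it could be passed in the request to inspect_content in its place.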


What's next

Learn more about large custom dictionaries.