Modifying infoType detectors to refine scan results

Cloud Data Loss Prevention (DLP)'s built-in infoType detectors are effective at finding common types of sensitive data. Custom infoType detectors enable you to fully customize your own sensitive data detector. Inspection rules help refine the scan results that the Cloud DLP returns by modifying the detection mechanism of a given infoType detector.

If you want to exclude or include more values from the results that are returned by a built-in infoType detector, you can create a new custom infoType from scratch and define all the criteria that Cloud DLP should look for. Alternatively, you can refine the findings that Cloud DLP's built-in or custom detectors return according to criteria that you specify. You can do this by adding inspection rules that can help reduce noise, increase precision and recall, or adjust likelihood certainty of scan findings.

This topic discusses how to use the two types of inspection rules to either exclude certain findings or add additional findings, all according to custom criteria that you specify. Presented in this topic are several scenarios in which you might want to alter an existing infoType detector.

The two types of inspection rules are:

Exclusion rules

Exclusion rules are useful in situations like the following:

  • You want to exclude duplicate scan matches in results that are caused by overlapping infoType detectors. For example, you're scanning for email addresses and phone numbers, but you are receiving two hits for email addresses with phone numbers in them, such as "206-555-0764@example.org."
  • You're experiencing noise in your scan findings. For example, you're seeing the same dummy email address (such as example@example.com") or domain (such as "example.com") returned an inordinate number of times by a scan for legitimate email addresses.
  • You have a list of terms, phrases, or combination of characters that you want to exclude from findings.

Exclusion rules API overview

Cloud DLP defines an exclusion rule in the ExclusionRule object. Within ExclusionRule you specify one of the following:

  • A Dictionary object, which indicates that the exclusion rule is a regular dictionary rule.
  • A Regex object, which indicates that the exclusion rule is a regular expression rule.
  • An ExcludeInfoTypes object, which contains an array of infoType detectors. If a finding is matched by any of the infoType detectors listed here, the finding will be excluded from the scan results.

Exclusion rule example scenarios

Each of the following JSON snippets illustrates how to configure Cloud DLP for the given scenario.

Omit specific email address from EMAIL_ADDRESS detector scan

The following JSON snippet and code in several languages illustrates how to indicate to Cloud DLP using an InspectConfig that it should avoid matching on "example@example.com" in a scan that uses the infoType detector EMAIL_ADDRESS:

Protocol

See the JSON quickstart for more information about using the Cloud DLP API with JSON.

...
    "inspectConfig":{
      "ruleSet":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "rules":[
            {
              "exclusionRule":{
                "dictionary":{
                  "wordList":{
                    "words":[
                      "example@example.com"
                    ]
                  }
                },
                "matchingType": "MATCHING_TYPE_FULL_MATCH"
              }
            }
          ]
        }
      ]
    }
...

Java

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InspectStringWithExclusionDict {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "Some email addresses: gary@example.com, example@example.com";
    List<String> excludedMatchList = Arrays.asList("example@example.com");
    inspectStringWithExclusionDict(projectId, textToInspect, excludedMatchList);
  }

  // Inspects the provided text, avoiding matches specified in the exclusion list.
  public static void inspectStringWithExclusionDict(
      String projectId, String textToInspect, List<String> excludedMatchList) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
      List<InfoType> infoTypes = new ArrayList<>();
      for (String typeName : new String[] {"PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"}) {
        infoTypes.add(InfoType.newBuilder().setName(typeName).build());
      }

      // Exclude matches from the specified excludedMatchList.
      ExclusionRule exclusionRule =
          ExclusionRule.newBuilder()
              .setMatchingType(MatchingType.MATCHING_TYPE_FULL_MATCH)
              .setDictionary(
                  Dictionary.newBuilder()
                      .setWordList(WordList.newBuilder().addAllWords(excludedMatchList)))
              .build();

      // Construct a ruleset that applies the exclusion rule to the EMAIL_ADDRESSES infotype.
      InspectionRuleSet ruleSet =
          InspectionRuleSet.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS"))
              .addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
              .build();

      // Construct the configuration for the Inspect request, including the ruleset.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addAllInfoTypes(infoTypes)
              .setIncludeQuote(true)
              .addRuleSet(ruleSet)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

C#

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


using System;
using System.Collections.Generic;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class InspectStringWithExclusionDict
{
    public static InspectContentResponse Inspect(string projectId, string textToInspect, List<String> excludedMatchList)
    {
        var dlp = DlpServiceClient.Create();

        var byteContentItem = new ByteContentItem
        {
            Type = ByteContentItem.Types.BytesType.TextUtf8,
            Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
        };

        var contentItem = new ContentItem { ByteItem = byteContentItem };

        var infoTypes = new string[] { "PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER" }.Select(it => new InfoType { Name = it });

        var exclusionRule = new ExclusionRule
        {
            MatchingType = MatchingType.FullMatch,
            Dictionary = new CustomInfoType.Types.Dictionary
            {
                WordList = new CustomInfoType.Types.Dictionary.Types.WordList
                {
                    Words = { excludedMatchList }
                }
            }
        };

        var ruleSet = new InspectionRuleSet
        {
            InfoTypes = { new InfoType { Name = "EMAIL_ADDRESS" } },
            Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
        };

        var config = new InspectConfig
        {
            InfoTypes = { infoTypes },
            IncludeQuote = true,
            RuleSet = { ruleSet }
        };

        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = contentItem,
            InspectConfig = config
        };

        var response = dlp.InspectContent(request);

        Console.WriteLine($"Findings: {response.Result.Findings.Count}");
        foreach (var f in response.Result.Findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Omit email addresses ending with a specific domain from EMAIL_ADDRESS detector scan

The following JSON snippet and code in several languages illustrates how to indicate to Cloud DLP using an InspectConfig that it should avoid matching on any email addresses that end with "@example.com" in a scan that uses the infoType detector EMAIL_ADDRESS:

Protocol

See the JSON quickstart for more information about using the Cloud DLP API with JSON.

...
    "inspectConfig":{
      "ruleSet":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "rules":[
            {
              "exclusionRule":{
                "regex":{
                  "pattern":".+@example.com"
                },
                "matchingType": "MATCHING_TYPE_FULL_MATCH"
              }
            }
          ]
        }
      ]
    }
...

Java

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class InspectStringWithExclusionRegex {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "Some email addresses: gary@example.com, bob@example.org";
    String excludedRegex = ".+@example.com";
    inspectStringWithExclusionRegex(projectId, textToInspect, excludedRegex);
  }

  // Inspects the provided text, avoiding matches specified in the exclusion list.
  public static void inspectStringWithExclusionRegex(
      String projectId, String textToInspect, String excludedRegex) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
      List<InfoType> infoTypes = new ArrayList<>();
      for (String typeName : new String[] {"PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"}) {
        infoTypes.add(InfoType.newBuilder().setName(typeName).build());
      }

      // Exclude matches from the specified excludedMatchList.
      ExclusionRule exclusionRule =
          ExclusionRule.newBuilder()
              .setMatchingType(MatchingType.MATCHING_TYPE_FULL_MATCH)
              .setRegex(Regex.newBuilder().setPattern(excludedRegex))
              .build();

      // Construct a ruleset that applies the exclusion rule to the EMAIL_ADDRESSES infotype.
      InspectionRuleSet ruleSet =
          InspectionRuleSet.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS"))
              .addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
              .build();

      // Construct the configuration for the Inspect request, including the ruleset.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addAllInfoTypes(infoTypes)
              .setIncludeQuote(true)
              .addRuleSet(ruleSet)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

C#

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


using System;
using System.Collections.Generic;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class InspectStringWithExclusionRegex
{
    public static InspectContentResponse Inspect(string projectId, string textToInspect, string excludedRegex)
    {
        var dlp = DlpServiceClient.Create();

        var byteContentItem = new ByteContentItem
        {
            Type = ByteContentItem.Types.BytesType.TextUtf8,
            Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
        };

        var contentItem = new ContentItem { ByteItem = byteContentItem };

        var infoTypes = new string[] { "PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER" }.Select(it => new InfoType { Name = it });

        var exclusionRule = new ExclusionRule
        {
            MatchingType = MatchingType.FullMatch,
            Regex = new CustomInfoType.Types.Regex { Pattern = excludedRegex }
        };

        var ruleSet = new InspectionRuleSet
        {
            InfoTypes = { new InfoType { Name = "EMAIL_ADDRESS" } },
            Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
        };

        var config = new InspectConfig
        {
            InfoTypes = { infoTypes },
            IncludeQuote = true,
            RuleSet = { ruleSet }
        };

        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = contentItem,
            InspectConfig = config
        };

        var response = dlp.InspectContent(request);

        Console.WriteLine($"Findings: {response.Result.Findings.Count}");
        foreach (var f in response.Result.Findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Omit scan matches that include the substring "TEST"

The following JSON snippet and code in several languages illustrates how to indicate to Cloud DLP using an InspectConfig that it should exclude any findings that include the token "TEST" from the specified list of infoTypes.

Note that this matches on "TEST" as a token, not a substring, so that although something like "TEST@email.com" will match, "TESTER@email.com" will not. If matching on a substring is desired, use a regex in the exclusion rule instead of a dictionary.

Protocol

See the JSON quickstart for more information about using the Cloud DLP API with JSON.

...
    "inspectConfig":{
      "ruleSet":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            },
            {
              "name":"DOMAIN_NAME"
            },
            {
              "name":"PHONE_NUMBER"
            },
            {
              "name":"PERSON_NAME"
            }
          ],
          "rules":[
            {
              "exclusionRule":{
                "dictionary":{
                  "wordList":{
                    "words":[
                      "TEST"
                    ]
                  }
                },
                "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
              }
            }
          ]
        }
      ]
    }
...

Java

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InspectStringWithExclusionDictSubstring {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "Some email addresses: gary@example.com, TEST@example.com";
    List<String> excludedSubstringList = Arrays.asList("TEST");
    inspectStringWithExclusionDictSubstring(projectId, textToInspect, excludedSubstringList);
  }

  // Inspects the provided text, avoiding matches specified in the exclusion list.
  public static void inspectStringWithExclusionDictSubstring(
      String projectId, String textToInspect, List<String> excludedSubstringList)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
      List<InfoType> infoTypes = new ArrayList<>();
      for (String typeName :
          new String[] {"EMAIL_ADDRESS", "DOMAIN_NAME", "PHONE_NUMBER", "PERSON_NAME"}) {
        infoTypes.add(InfoType.newBuilder().setName(typeName).build());
      }

      // Exclude partial matches from the specified excludedSubstringList.
      ExclusionRule exclusionRule =
          ExclusionRule.newBuilder()
              .setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
              .setDictionary(
                  Dictionary.newBuilder()
                      .setWordList(WordList.newBuilder().addAllWords(excludedSubstringList)))
              .build();

      // Construct a ruleset that applies the exclusion rule to the EMAIL_ADDRESSES infotype.
      InspectionRuleSet ruleSet =
          InspectionRuleSet.newBuilder()
              .addAllInfoTypes(infoTypes)
              .addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
              .build();

      // Construct the configuration for the Inspect request, including the ruleset.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addAllInfoTypes(infoTypes)
              .setIncludeQuote(true)
              .addRuleSet(ruleSet)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

C#

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


using System;
using System.Collections.Generic;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class InspectStringWithExclusionDictSubstring
{
    public static InspectContentResponse Inspect(string projectId, string textToInspect, List<String> excludedSubstringList)
    {
        var dlp = DlpServiceClient.Create();

        var byteItem = new ByteContentItem
        {
            Type = ByteContentItem.Types.BytesType.TextUtf8,
            Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
        };

        var contentItem = new ContentItem { ByteItem = byteItem };

        var infoTypes = new string[]
        {
            "EMAIL_ADDRESS",
            "DOMAIN_NAME",
            "PHONE_NUMBER",
            "PERSON_NAME"
        }.Select(it => new InfoType { Name = it });

        var exclusionRule = new ExclusionRule
        {
            MatchingType = MatchingType.PartialMatch,
            Dictionary = new CustomInfoType.Types.Dictionary
            {
                WordList = new CustomInfoType.Types.Dictionary.Types.WordList
                {
                    Words = { excludedSubstringList }
                }
            }
        };

        var ruleSet = new InspectionRuleSet
        {
            InfoTypes = { infoTypes },
            Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
        };

        var config = new InspectConfig
        {
            InfoTypes = { infoTypes },
            IncludeQuote = true,
            RuleSet = { ruleSet }
        };

        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = contentItem,
            InspectConfig = config
        };

        var response = dlp.InspectContent(request);

        Console.WriteLine($"Findings: {response.Result.Findings.Count}");
        foreach (var f in response.Result.Findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Omit scan matches that include the substring "Jimmy" from a custom infoType detector scan

The following JSON snippet and code in several languages illustrates how to indicate to Cloud DLP using an InspectConfig that it should avoid matching on the name "Jimmy" in a scan that uses the specified custom regex detector:

Protocol

See the JSON quickstart for more information about using the Cloud DLP API with JSON.

...
    "inspectConfig":{
      "customInfoTypes":[
        {
          "infoType":{
            "name":"CUSTOM_NAME_DETECTOR"
          },
          "regex":{
            "pattern":"[A-Z][a-z]{1,15}, [A-Z][a-z]{1,15}"
          }
        }
      ],
      "ruleSet":[
        {
          "infoTypes":[
            {
              "name":"CUSTOM_NAME_DETECTOR"
            }
          ],
          "rules":[
            {
              "exclusionRule":{
                "dictionary":{
                  "wordList":{
                    "words":[
                      "Jimmy"
                    ]
                  }
                },
                "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
              }
            }
          ]
        }
      ]
    }
...

Java

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class InspectStringCustomExcludingSubstring {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "Name: Doe, John. Name: Example, Jimmy";
    String customDetectorPattern = "[A-Z][a-z]{1,15}, [A-Z][a-z]{1,15}";
    List<String> excludedSubstringList = Arrays.asList("Jimmy");
    inspectStringCustomExcludingSubstring(
        projectId, textToInspect, customDetectorPattern, excludedSubstringList);
  }

  // Inspects the provided text, avoiding matches specified in the exclusion list.
  public static void inspectStringCustomExcludingSubstring(
      String projectId,
      String textToInspect,
      String customDetectorPattern,
      List<String> excludedSubstringList)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Specify the type of info the inspection will look for.
      InfoType infoType = InfoType.newBuilder().setName("CUSTOM_NAME_DETECTOR").build();
      CustomInfoType customInfoType =
          CustomInfoType.newBuilder()
              .setInfoType(infoType)
              .setRegex(Regex.newBuilder().setPattern(customDetectorPattern))
              .build();

      // Exclude partial matches from the specified excludedSubstringList.
      ExclusionRule exclusionRule =
          ExclusionRule.newBuilder()
              .setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
              .setDictionary(
                  Dictionary.newBuilder()
                      .setWordList(WordList.newBuilder().addAllWords(excludedSubstringList)))
              .build();

      // Construct a ruleset that applies the exclusion rule to the EMAIL_ADDRESSES infotype.
      InspectionRuleSet ruleSet =
          InspectionRuleSet.newBuilder()
              .addInfoTypes(infoType)
              .addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
              .build();

      // Construct the configuration for the Inspect request, including the ruleset.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addCustomInfoTypes(customInfoType)
              .setIncludeQuote(true)
              .addRuleSet(ruleSet)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

C#

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


using System;
using System.Collections.Generic;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class InspectStringCustomExcludingSubstring
{
    public static InspectContentResponse Inspect(string projectId, string textToInspect, string customDetectorPattern, List<String> excludedSubstringList)
    {
        var dlp = DlpServiceClient.Create();

        var byteContentItem = new ByteContentItem
        {
            Type = ByteContentItem.Types.BytesType.TextUtf8,
            Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
        };

        var contentItem = new ContentItem { ByteItem = byteContentItem };

        var infoType = new InfoType
        {
            Name = "CUSTOM_NAME_DETECTOR"
        };
        var customInfoType = new CustomInfoType
        {
            InfoType = infoType,
            Regex = new CustomInfoType.Types.Regex { Pattern = customDetectorPattern }
        };

        var exclusionRule = new ExclusionRule
        {
            MatchingType = MatchingType.PartialMatch,
            Dictionary = new CustomInfoType.Types.Dictionary
            {
                WordList = new CustomInfoType.Types.Dictionary.Types.WordList
                {
                    Words = { excludedSubstringList }
                }
            }
        };

        var ruleSet = new InspectionRuleSet
        {
            InfoTypes = { infoType },
            Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
        };

        var config = new InspectConfig
        {
            CustomInfoTypes = { customInfoType },
            IncludeQuote = true,
            RuleSet = { ruleSet }
        };

        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = contentItem,
            InspectConfig = config
        };

        var response = dlp.InspectContent(request);

        Console.WriteLine($"Findings: {response.Result.Findings.Count}");
        foreach (var f in response.Result.Findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Omit scan matches from a PERSON_NAME detector scan that overlap with a custom detector

In this scenario, the user does not want a match from a Cloud DLP scan using the PERSON_NAME built-in detector returned if the match would also be matched in a scan using the custom regex detector defined in the first part of the snippet.

The following JSON snippet and code in several languages specifies both a custom regex detector and an exclusion rule in the InspectConfig. The custom regex detector specifies the names to exclude from results. The exclusion rule specifies that if any results returned from a scan for PERSON_NAME would also be matched by the custom regex detector, they are omitted. Note that VIP_DETECTOR in this case is marked as EXCLUSION_TYPE_EXCLUDE, so it will not produce results itself. It will only affect results produced by the PERSON_NAME detector.

Protocol

See the JSON quickstart for more information about using the Cloud DLP API with JSON.

...
    "inspectConfig":{
      "customInfoTypes":[
        {
          "infoType":{
            "name":"VIP_DETECTOR"
          },
          "regex":{
            "pattern":"Larry Page|Sergey Brin"
          },
          "exclusionType":"EXCLUSION_TYPE_EXCLUDE"
        }
      ],
      "ruleSet":[
        {
          "infoTypes":[
            {
              "name":"PERSON_NAME"
            }
          ],
          "rules":[
            {
              "exclusionRule":{
                "excludeInfoTypes":{
                  "infoTypes":[
                    {
                      "name":"VIP_DETECTOR"
                    }
                  ]
                },
                "matchingType": "MATCHING_TYPE_FULL_MATCH"
              }
            }
          ]
        }
      ]
    }
...

Java

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.CustomInfoType.ExclusionType;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.ExcludeInfoTypes;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;

public class InspectStringCustomOmitOverlap {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "Name: Jane Doe. Name: Larry Page.";
    inspectStringCustomOmitOverlap(projectId, textToInspect);
  }

  // Inspects the provided text, avoiding matches specified in the exclusion list.
  public static void inspectStringCustomOmitOverlap(String projectId, String textToInspect)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Construct the custom infotype.
      CustomInfoType customInfoType =
          CustomInfoType.newBuilder()
              .setInfoType(InfoType.newBuilder().setName("VIP_DETECTOR"))
              .setRegex(Regex.newBuilder().setPattern("Larry Page|Sergey Brin"))
              .setExclusionType(ExclusionType.EXCLUSION_TYPE_EXCLUDE)
              .build();

      // Exclude matches that also match the custom infotype.
      ExclusionRule exclusionRule =
          ExclusionRule.newBuilder()
              .setExcludeInfoTypes(
                  ExcludeInfoTypes.newBuilder().addInfoTypes(customInfoType.getInfoType()))
              .setMatchingType(MatchingType.MATCHING_TYPE_FULL_MATCH)
              .build();

      // Construct a ruleset that applies the exclusion rule to the PERSON_NAME infotype.
      InspectionRuleSet ruleSet =
          InspectionRuleSet.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
              .addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
              .build();

      // Construct the configuration for the Inspect request, including the ruleset.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
              .addCustomInfoTypes(customInfoType)
              .setIncludeQuote(true)
              .addRuleSet(ruleSet)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

C#

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using static Google.Cloud.Dlp.V2.CustomInfoType.Types;

public class InspectStringCustomOmitOverlap
{
    public static InspectContentResponse Inspect(string projectId, string textToInspect)
    {
        var dlp = DlpServiceClient.Create();

        var byteItem = new ByteContentItem
        {
            Type = ByteContentItem.Types.BytesType.TextUtf8,
            Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
        };

        var contentItem = new ContentItem { ByteItem = byteItem };

        var customInfoType = new CustomInfoType
        {
            InfoType = new InfoType { Name = "VIP_DETECTOR" },
            Regex = new CustomInfoType.Types.Regex { Pattern = "Larry Page|Sergey Brin" },
            ExclusionType = ExclusionType.Exclude
        };

        var exclusionRule = new ExclusionRule
        {
            ExcludeInfoTypes = new ExcludeInfoTypes { InfoTypes = { customInfoType.InfoType } },
            MatchingType = MatchingType.FullMatch
        };

        var ruleSet = new InspectionRuleSet
        {
            InfoTypes = { new InfoType { Name = "PERSON_NAME" } },
            Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
        };

        var config = new InspectConfig
        {
            InfoTypes = { new InfoType { Name = "PERSON_NAME" } },
            CustomInfoTypes = { customInfoType },
            IncludeQuote = true,
            RuleSet = { ruleSet }
        };

        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = contentItem,
            InspectConfig = config
        };

        var response = dlp.InspectContent(request);

        Console.WriteLine($"Findings: {response.Result.Findings.Count}");
        foreach (var f in response.Result.Findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Omit matches on PERSON_NAME detector if also matched by EMAIL_ADDRESS detector

The following JSON snippet and code in several languages illustrate how to indicate to Cloud DLP using an InspectConfig that it should only return one match in the case that matches for the PERSON_NAME detector overlap with matches for the EMAIL_ADDRESS detector. Doing this is to avoid the situation where an email address such as "james@example.com" matches on both the PERSON_NAME and EMAIL_ADDRESS detectors.

Protocol

See the JSON quickstart for more information about using the Cloud DLP API with JSON.

...
    "inspectConfig":{
      "ruleSet":[
        {
          "infoTypes":[
            {
              "name":"PERSON_NAME"
            }
          ],
          "rules":[
            {
              "exclusionRule":{
                "excludeInfoTypes":{
                  "infoTypes":[
                    {
                      "name":"EMAIL_ADDRESS"
                    }
                  ]
                },
                "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
              }
            }
          ]
        }
      ]
    }
...

Python

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.

def omit_name_if_also_email(
    project, content_string,
):
    """Marches PERSON_NAME and EMAIL_ADDRESS, but not both.

    Uses the Data Loss Prevention API omit matches on PERSON_NAME if the
    EMAIL_ADDRESS detector also matches.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.

    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a list of infoTypes for DLP to locate in `content_string`. See
    # https://cloud.google.com/dlp/docs/concepts-infotypes for more information
    # about supported infoTypes.
    info_types_to_locate = [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"}]

    # Construct the configuration dictionary that will only match on PERSON_NAME
    # if the EMAIL_ADDRESS doesn't also match. This configuration helps reduce
    # the total number of findings when there is a large overlap between different
    # infoTypes.
    inspect_config = {
        "info_types": info_types_to_locate,
        "rule_set": [
            {
                "info_types": [{"name": "PERSON_NAME"}],
                "rules": [
                    {
                        "exclusion_rule": {
                            "exclude_info_types": {
                                "info_types": [{"name": "EMAIL_ADDRESS"}]
                            },
                            "matching_type": google.cloud.dlp_v2.MatchingType.MATCHING_TYPE_PARTIAL_MATCH,
                        }
                    }
                ],
            }
        ],
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    return [f.info_type.name for f in response.result.findings]

Java

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.ExcludeInfoTypes;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class InspectStringOmitOverlap {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "james@example.com";
    inspectStringOmitOverlap(projectId, textToInspect);
  }

  // Inspects the provided text, avoiding matches specified in the exclusion list.
  public static void inspectStringOmitOverlap(String projectId, String textToInspect)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
      List<InfoType> infoTypes = new ArrayList<>();
      for (String typeName : new String[] {"PERSON_NAME", "EMAIL_ADDRESS"}) {
        infoTypes.add(InfoType.newBuilder().setName(typeName).build());
      }

      // Exclude EMAIL_ADDRESS matches
      ExclusionRule exclusionRule =
          ExclusionRule.newBuilder()
              .setExcludeInfoTypes(
                  ExcludeInfoTypes.newBuilder()
                      .addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS")))
              .setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
              .build();

      // Construct a ruleset that applies the exclusion rule to the PERSON_NAME infotype.
      // If a PERSON_NAME match overlaps with an EMAIL_ADDRESS match, the PERSON_NAME match will
      // be excluded.
      InspectionRuleSet ruleSet =
          InspectionRuleSet.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
              .addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
              .build();

      // Construct the configuration for the Inspect request, including the ruleset.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addAllInfoTypes(infoTypes)
              .setIncludeQuote(true)
              .addRuleSet(ruleSet)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

C#

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


using System;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class InspectStringOmitOverlap
{
    public static InspectContentResponse Inspect(string projectId, string textToInspect)
    {
        // Initialize client that will be used to send requests. This client only needs to be created
        // once, and can be reused for multiple requests. After completing all of your requests, call
        // the "close" method on the client to safely clean up any remaining background resources.
        var dlp = DlpServiceClient.Create();

        // Specify the type and content to be inspected.
        var byteItem = new ByteContentItem
        {
            Type = ByteContentItem.Types.BytesType.TextUtf8,
            Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
        };
        var contentItem = new ContentItem { ByteItem = byteItem };

        // Specify the type of info the inspection will look for.
        // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
        var infoTypes = new string[] { "PERSON_NAME", "EMAIL_ADDRESS" }.Select(it => new InfoType { Name = it });

        // Exclude EMAIL_ADDRESS matches
        var exclusionRule = new ExclusionRule
        {
            ExcludeInfoTypes = new ExcludeInfoTypes { InfoTypes = { new InfoType { Name = "EMAIL_ADDRESS" } } },
            MatchingType = MatchingType.PartialMatch
        };

        // Construct a ruleset that applies the exclusion rule to the PERSON_NAME infotype.
        // If a PERSON_NAME match overlaps with an EMAIL_ADDRESS match, the PERSON_NAME match will
        // be excluded.
        var ruleSet = new InspectionRuleSet
        {
            InfoTypes = { new InfoType { Name = "PERSON_NAME" } },
            Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
        };

        // Construct the configuration for the Inspect request, including the ruleset.
        var config = new InspectConfig
        {
            InfoTypes = { infoTypes },
            IncludeQuote = true,
            RuleSet = { ruleSet }
        };

        // Construct the Inspect request to be sent by the client.
        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = contentItem,
            InspectConfig = config
        };

        // Use the client to send the API request.
        var response = dlp.InspectContent(request);

        // Parse the response and process results
        Console.WriteLine($"Findings: {response.Result.Findings.Count}");
        foreach (var f in response.Result.Findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Omit matches on domain names that are part of email addresses in a DOMAIN_NAME detector scan

The following JSON snippet and code in several languages illustrate how to indicate to Cloud DLP using an InspectConfig that it should only return matches for a DOMAIN_NAME detector scan if the match does not overlap with a match in an EMAIL_ADDRESS detector scan. In this scenario, the main scan is a DOMAIN_NAME detector scan. The user does not want a domain name match returned in findings if the domain name is used in an email address:

Protocol

See the JSON quickstart for more information about using the Cloud DLP API with JSON.

...
    "inspectConfig":{
      "infoTypes":[
        {
          "name":"DOMAIN_NAME"
        },
        {
          "name":"EMAIL_ADDRESS"
        }
      ],
      "customInfoTypes":[
        {
          "infoType":{
            "name":"EMAIL_ADDRESS"
          },
          "exclusionType":"EXCLUSION_TYPE_EXCLUDE"
        }
      ],
      "ruleSet":[
        {
          "infoTypes":[
            {
              "name":"DOMAIN_NAME"
            }
          ],
          "rules":[
            {
              "exclusionRule":{
                "excludeInfoTypes":{
                  "infoTypes":[
                    {
                      "name":"EMAIL_ADDRESS"
                    }
                  ]
                },
                "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
              }
            }
          ]
        }
      ]
    }
...

Java

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.CustomInfoType.ExclusionType;
import com.google.privacy.dlp.v2.ExcludeInfoTypes;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class InspectStringWithoutOverlap {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "example.com is a domain, james@example.org is an email.";
    inspectStringWithoutOverlap(projectId, textToInspect);
  }

  // Inspects the provided text, avoiding matches specified in the exclusion list.
  public static void inspectStringWithoutOverlap(String projectId, String textToInspect)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
      List<InfoType> infoTypes = new ArrayList<>();
      for (String typeName : new String[] {"DOMAIN_NAME", "EMAIL_ADDRESS"}) {
        infoTypes.add(InfoType.newBuilder().setName(typeName).build());
      }

      // Define a custom info type to exclude email addresses
      CustomInfoType customInfoType =
          CustomInfoType.newBuilder()
              .setInfoType(InfoType.newBuilder().setName("EMAIL_ADDRESS"))
              .setExclusionType(ExclusionType.EXCLUSION_TYPE_EXCLUDE)
              .build();

      // Exclude EMAIL_ADDRESS matches
      ExclusionRule exclusionRule =
          ExclusionRule.newBuilder()
              .setExcludeInfoTypes(
                  ExcludeInfoTypes.newBuilder()
                      .addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS")))
              .setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
              .build();

      // Construct a ruleset that applies the exclusion rule to the DOMAIN_NAME infotype.
      // If a DOMAIN_NAME match is part of an EMAIL_ADDRESS match, the DOMAIN_NAME match will
      // be excluded.
      InspectionRuleSet ruleSet =
          InspectionRuleSet.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("DOMAIN_NAME"))
              .addRules(InspectionRule.newBuilder().setExclusionRule(exclusionRule))
              .build();

      // Construct the configuration for the Inspect request, including the ruleset.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addAllInfoTypes(infoTypes)
              .addCustomInfoTypes(customInfoType)
              .setIncludeQuote(true)
              .addRuleSet(ruleSet)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

C#

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using static Google.Cloud.Dlp.V2.CustomInfoType.Types;

public class InspectStringWithoutOverlap
{
    public static InspectContentResponse Inspect(string projectId, string textToInspect)
    {
        // Initialize client that will be used to send requests. This client only needs to be created
        // once, and can be reused for multiple requests. After completing all of your requests, call
        // the "close" method on the client to safely clean up any remaining background resources.
        var dlp = DlpServiceClient.Create();

        // Specify the type and content to be inspected.
        var byteContentItem = new ByteContentItem
        {
            Type = ByteContentItem.Types.BytesType.TextUtf8,
            Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
        };
        var contentItem = new ContentItem
        {
            ByteItem = byteContentItem
        };

        // Specify the type of info the inspection will look for.
        // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
        var infoTypes = new string[] { "DOMAIN_NAME", "EMAIL_ADDRESS" }.Select(it => new InfoType { Name = it });

        // Define a custom info type to exclude email addresses
        var customInfoType = new CustomInfoType
        {
            InfoType = new InfoType { Name = "EMAIL_ADDRESS" },
            ExclusionType = ExclusionType.Exclude
        };

        // Exclude EMAIL_ADDRESS matches
        var exclusionRule = new ExclusionRule
        {
            ExcludeInfoTypes = new ExcludeInfoTypes
            {
                InfoTypes = { new InfoType { Name = "EMAIL_ADDRESS" } }
            },
            MatchingType = MatchingType.PartialMatch
        };

        // Construct a ruleset that applies the exclusion rule to the DOMAIN_NAME infotype.
        // If a DOMAIN_NAME match is part of an EMAIL_ADDRESS match, the DOMAIN_NAME match will
        // be excluded.
        var ruleSet = new InspectionRuleSet
        {
            InfoTypes = { new InfoType { Name = "DOMAIN_NAME" } },
            Rules = { new InspectionRule { ExclusionRule = exclusionRule } }
        };

        // Construct the configuration for the Inspect request, including the ruleset.
        var config = new InspectConfig
        {
            InfoTypes = { infoTypes },
            CustomInfoTypes = { customInfoType },
            IncludeQuote = true,
            RuleSet = { ruleSet }
        };

        // Construct the Inspect request to be sent by the client.
        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = contentItem,
            InspectConfig = config
        };

        // Use the client to send the API request.
        var response = dlp.InspectContent(request);

        return response;
    }
}

Hotword rules

Hotword rules are useful in situations such as the following:

  • You want to change likelihood values assigned to scan matches based on the match's proximity to a hotword. For example, you want to set the likelihood value higher for matches on patient names depending on the names' proximity to the word "patient."
  • When inspecting structured, tabular data, you want to change likelihood values assigned to matches based on a column header name. For example, you want to set the likelihood value higher for US_SOCIAL_SECURITY_NUMBER when found in a column with header ACCOUNT_ID.

Hotword rules API overview

Within Cloud DLP's InspectionRule object, you specify a HotwordRule object, which adjusts the likelihood of findings within a certain proximity of hotwords.

InspectionRule objects are grouped as a "rule set" in an InspectionRuleSet object, along with a list of infoType detectors the rule set applies to. Rules within a rule set are applied in the order specified.

Hotword rule example scenarios

The following code snippet illustrates how to configure Cloud DLP for the given scenario.

Increase the likelihood of a PERSON_NAME match if there is the hotword "patient" nearby

The following JSON snippet and code in several languages illustrate using the InspectConfig property for the purpose of scanning a medical database for patient names. You can use Cloud DLP's built-in PERSON_NAME infoType detector, but that will cause Cloud DLP to match on all names of people, not just names of patients. To fix this, you can include a hotword rule that looks for the word "patient" within a certain character proximity from the first character of potential matches. You can then assign findings matching this pattern a likelihood of "very likely," since they correspond to your special criteria. Setting the minimum Likelihood to VERY_LIKELY within InspectConfig ensures that only matches to this configuration are returned in findings.

Protocol

See the JSON quickstart for more information about using the Cloud DLP API with JSON.

...
  "inspectConfig":{
    "ruleSet":[
      {
        "infoTypes":[
          {
            "name":"PERSON_NAME"
          }
        ],
        "rules":[
          {
            "hotwordRule":{
              "hotwordRegex":{
                "pattern":"patient"
              },
              "proximity":{
                "windowBefore":50
              },
              "likelihoodAdjustment":{
                "fixedLikelihood":"VERY_LIKELY"
              }
            }
          }
        ]
      }
    ],
    "minLikelihood":"VERY_LIKELY"
  }
...

Java

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.HotwordRule;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.LikelihoodAdjustment;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.Proximity;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.Likelihood;
import com.google.privacy.dlp.v2.LocationName;
import com.google.protobuf.ByteString;
import java.io.IOException;

public class InspectStringCustomHotword {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "patient name: John Doe";
    String customHotword = "patient";
    inspectStringCustomHotword(projectId, textToInspect, customHotword);
  }

  // Inspects the provided text.
  public static void inspectStringCustomHotword(
      String projectId, String textToInspect, String customHotword) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Increase likelihood of matches that have customHotword nearby
      HotwordRule hotwordRule =
          HotwordRule.newBuilder()
              .setHotwordRegex(Regex.newBuilder().setPattern(customHotword))
              .setProximity(Proximity.newBuilder().setWindowBefore(50))
              .setLikelihoodAdjustment(
                  LikelihoodAdjustment.newBuilder().setFixedLikelihood(Likelihood.VERY_LIKELY))
              .build();

      // Construct a ruleset that applies the hotword rule to the PERSON_NAME infotype.
      InspectionRuleSet ruleSet =
          InspectionRuleSet.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME").build())
              .addRules(InspectionRule.newBuilder().setHotwordRule(hotwordRule))
              .build();

      // Construct the configuration for the Inspect request.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME").build())
              .setIncludeQuote(true)
              .addRuleSet(ruleSet)
              .setMinLikelihood(Likelihood.VERY_LIKELY)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

C#

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using System;
using static Google.Cloud.Dlp.V2.CustomInfoType.Types;

public class InspectStringCustomHotword
{
    public static InspectContentResponse Inspect(string projectId, string textToInspect, string customHotword)
    {
        var dlp = DlpServiceClient.Create();

        var byteContentItem = new ByteContentItem
        {
            Type = ByteContentItem.Types.BytesType.TextUtf8,
            Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
        };

        var contentItem = new ContentItem
        {
            ByteItem = byteContentItem
        };

        var hotwordRule = new DetectionRule.Types.HotwordRule
        {
            HotwordRegex = new Regex { Pattern = customHotword },
            Proximity = new DetectionRule.Types.Proximity { WindowBefore = 50 },
            LikelihoodAdjustment = new DetectionRule.Types.LikelihoodAdjustment { FixedLikelihood = Likelihood.VeryLikely }
        };

        var infoType = new InfoType { Name = "PERSON_NAME" };

        var inspectionRuleSet = new InspectionRuleSet
        {
            InfoTypes = { infoType },
            Rules = { new InspectionRule { HotwordRule = hotwordRule } }
        };

        var inspectConfig = new InspectConfig
        {
            InfoTypes = { infoType },
            IncludeQuote = true,
            RuleSet = { inspectionRuleSet },
            MinLikelihood = Likelihood.VeryLikely
        };

        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = contentItem,
            InspectConfig = inspectConfig
        };

        var response = dlp.InspectContent(request);

        Console.WriteLine($"Findings: {response.Result.Findings.Count}");
        foreach (var f in response.Result.Findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Python

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.

def inspect_with_person_name_w_custom_hotword(
    project, content_string, custom_hotword="patient"
):
    """Uses the Data Loss Prevention API increase likelihood for matches on
       PERSON_NAME if the user specified custom hotword is present. Only
       includes findings with the increased likelihood by setting a minimum
       likelihood threshold of VERY_LIKELY.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        content_string: The string to inspect.
        custom_hotword: The custom hotword used for likelihood boosting.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library.
    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Construct a rule set with caller provided hotword, with a likelihood
    # boost to VERY_LIKELY when the hotword are present within the 50 character-
    # window preceding the PII finding.
    hotword_rule = {
        "hotword_regex": {"pattern": custom_hotword},
        "likelihood_adjustment": {
            "fixed_likelihood": google.cloud.dlp_v2.Likelihood.VERY_LIKELY
        },
        "proximity": {"window_before": 50},
    }

    rule_set = [
        {
            "info_types": [{"name": "PERSON_NAME"}],
            "rules": [{"hotword_rule": hotword_rule}],
        }
    ]

    # Construct the configuration dictionary with the custom regex info type.
    inspect_config = {
        "rule_set": rule_set,
        "min_likelihood": google.cloud.dlp_v2.Likelihood.VERY_LIKELY,
    }

    # Construct the `item`.
    item = {"value": content_string}

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Call the API.
    response = dlp.inspect_content(
        request={"parent": parent, "inspect_config": inspect_config, "item": item}
    )

    # Print out the results.
    if response.result.findings:
        for finding in response.result.findings:
            try:
                if finding.quote:
                    print(f"Quote: {finding.quote}")
            except AttributeError:
                pass
            print(f"Info type: {finding.info_type.name}")
            print(f"Likelihood: {finding.likelihood}")
    else:
        print("No findings.")

For more detailed information about hotwords, see Customizing match likelihood.

Multiple inspection rules scenario

The following InspectConfig JSON snippet and code in several languages illustrate applying both exclusion and hotword rules. This snippet's rule set includes both hotword rules and dictionary and regex exclusion rules. Notice that the four rules are specified in an array within the rules element.

Protocol

See the JSON quickstart for more information about using the Cloud DLP API with JSON.

...
  "inspectConfig":{
    "ruleSet":[
      {
        "infoTypes":[
          {
            "name":"PERSON_NAME"
          }
        ],
        "rules":[
          {
            "hotwordRule":{
              "hotwordRegex":{
                "pattern":"patient"
              },
              "proximity":{
                "windowBefore":10
              },
              "likelihoodAdjustment":{
                "fixedLikelihood":"VERY_LIKELY"
              }
            }
          },
          {
            "hotwordRule":{
              "hotwordRegex":{
                "pattern":"doctor"
              },
              "proximity":{
                "windowBefore":10
              },
              "likelihoodAdjustment":{
                "fixedLikelihood":"UNLIKELY"
              }
            }
          },
          {
            "exclusionRule":{
              "dictionary":{
                "wordList":{
                  "words":[
                    "Quasimodo"
                  ]
                }
              },
              "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
            }
          },
          {
            "exclusionRule":{
              "regex":{
                "pattern":"REDACTED"
              },
              "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
            }
          }
        ]
      }
    ]
  }
...

Java

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ByteContentItem;
import com.google.privacy.dlp.v2.ByteContentItem.BytesType;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.HotwordRule;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.LikelihoodAdjustment;
import com.google.privacy.dlp.v2.CustomInfoType.DetectionRule.Proximity;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary;
import com.google.privacy.dlp.v2.CustomInfoType.Dictionary.WordList;
import com.google.privacy.dlp.v2.CustomInfoType.Regex;
import com.google.privacy.dlp.v2.ExclusionRule;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.InspectionRule;
import com.google.privacy.dlp.v2.InspectionRuleSet;
import com.google.privacy.dlp.v2.Likelihood;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.MatchingType;
import com.google.protobuf.ByteString;
import java.io.IOException;

public class InspectStringMultipleRules {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "patient: Jane Doe";
    inspectStringMultipleRules(projectId, textToInspect);
  }

  // Inspects the provided text, avoiding matches specified in the exclusion list.
  public static void inspectStringMultipleRules(String projectId, String textToInspect)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the type and content to be inspected.
      ByteContentItem byteItem =
          ByteContentItem.newBuilder()
              .setType(BytesType.TEXT_UTF8)
              .setData(ByteString.copyFromUtf8(textToInspect))
              .build();
      ContentItem item = ContentItem.newBuilder().setByteItem(byteItem).build();

      // Construct hotword rules
      HotwordRule patientRule =
          HotwordRule.newBuilder()
              .setHotwordRegex(Regex.newBuilder().setPattern("patient"))
              .setProximity(Proximity.newBuilder().setWindowBefore(10))
              .setLikelihoodAdjustment(
                  LikelihoodAdjustment.newBuilder().setFixedLikelihood(Likelihood.VERY_LIKELY))
              .build();

      HotwordRule doctorRule =
          HotwordRule.newBuilder()
              .setHotwordRegex(Regex.newBuilder().setPattern("doctor"))
              .setProximity(Proximity.newBuilder().setWindowBefore(10))
              .setLikelihoodAdjustment(
                  LikelihoodAdjustment.newBuilder().setFixedLikelihood(Likelihood.UNLIKELY))
              .build();

      // Construct exclusion rules
      ExclusionRule quasimodoRule =
          ExclusionRule.newBuilder()
              .setDictionary(
                  Dictionary.newBuilder().setWordList(WordList.newBuilder().addWords("Quasimodo")))
              .setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
              .build();

      ExclusionRule redactedRule =
          ExclusionRule.newBuilder()
              .setRegex(Regex.newBuilder().setPattern("REDACTED"))
              .setMatchingType(MatchingType.MATCHING_TYPE_PARTIAL_MATCH)
              .build();

      // Construct a ruleset that applies the rules to the PERSON_NAME infotype.
      InspectionRuleSet ruleSet =
          InspectionRuleSet.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
              .addRules(InspectionRule.newBuilder().setHotwordRule(patientRule))
              .addRules(InspectionRule.newBuilder().setHotwordRule(doctorRule))
              .addRules(InspectionRule.newBuilder().setExclusionRule(quasimodoRule))
              .addRules(InspectionRule.newBuilder().setExclusionRule(redactedRule))
              .build();

      // Construct the configuration for the Inspect request, including the ruleset.
      InspectConfig config =
          InspectConfig.newBuilder()
              .addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
              .setIncludeQuote(true)
              .addRuleSet(ruleSet)
              .build();

      // Construct the Inspect request to be sent by the client.
      InspectContentRequest request =
          InspectContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setInspectConfig(config)
              .build();

      // Use the client to send the API request.
      InspectContentResponse response = dlp.inspectContent(request);

      // Parse the response and process results
      System.out.println("Findings: " + response.getResult().getFindingsCount());
      for (Finding f : response.getResult().getFindingsList()) {
        System.out.println("\tQuote: " + f.getQuote());
        System.out.println("\tInfo type: " + f.getInfoType().getName());
        System.out.println("\tLikelihood: " + f.getLikelihood());
      }
    }
  }
}

C#

To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.


using System;
using System.Text.RegularExpressions;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using static Google.Cloud.Dlp.V2.CustomInfoType.Types;

public class InspectStringMultipleRules
{
    public static InspectContentResponse Inspect(string projectId, string textToInspect)
    {
        var dlp = DlpServiceClient.Create();

        var byteContentItem = new ByteContentItem
        {
            Type = ByteContentItem.Types.BytesType.TextUtf8,
            Data = Google.Protobuf.ByteString.CopyFromUtf8(textToInspect)
        };

        var contentItem = new ContentItem
        {
            ByteItem = byteContentItem
        };

        var patientRule = new DetectionRule.Types.HotwordRule
        {
            HotwordRegex = new CustomInfoType.Types.Regex { Pattern = "patient" },
            Proximity = new DetectionRule.Types.Proximity { WindowBefore = 10 },
            LikelihoodAdjustment = new DetectionRule.Types.LikelihoodAdjustment { FixedLikelihood = Likelihood.VeryLikely }
        };

        var doctorRule = new DetectionRule.Types.HotwordRule
        {
            HotwordRegex = new CustomInfoType.Types.Regex { Pattern = "doctor" },
            Proximity = new DetectionRule.Types.Proximity { WindowBefore = 10 },
            LikelihoodAdjustment = new DetectionRule.Types.LikelihoodAdjustment { FixedLikelihood = Likelihood.Unlikely }
        };

        // Construct exclusion rules
        var quasimodoRule = new ExclusionRule
        {
            Dictionary = new Dictionary { WordList = new Dictionary.Types.WordList { Words = { "Quasimodo" } } },
            MatchingType = MatchingType.PartialMatch
        };

        var redactedRule = new ExclusionRule
        {
            Regex = new CustomInfoType.Types.Regex { Pattern = "REDACTED" },
            MatchingType = MatchingType.PartialMatch
        };

        var infoType = new InfoType { Name = "PERSON_NAME" };

        var inspectionRuleSet = new InspectionRuleSet
        {
            InfoTypes = { infoType },
            Rules =
            {
                new InspectionRule { HotwordRule = patientRule },
                new InspectionRule { HotwordRule = doctorRule},
                new InspectionRule { ExclusionRule = quasimodoRule },
                new InspectionRule { ExclusionRule = redactedRule }
            }
        };

        var inspectConfig = new InspectConfig
        {
            InfoTypes = { infoType },
            IncludeQuote = true,
            RuleSet = { inspectionRuleSet }
        };

        var request = new InspectContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            Item = contentItem,
            InspectConfig = inspectConfig
        };

        var response = dlp.InspectContent(request);

        Console.WriteLine($"Findings: {response.Result.Findings.Count}");
        foreach (var f in response.Result.Findings)
        {
            Console.WriteLine("\tQuote: " + f.Quote);
            Console.WriteLine("\tInfo type: " + f.InfoType.Name);
            Console.WriteLine("\tLikelihood: " + f.Likelihood);
        }

        return response;
    }
}

Overlapping infoType detectors

It is possible to define a custom infoType detector that has the same name as a built-in infoType detector. As shown in the example in the "Hotword rule example scenarios" section, when you create a custom infoType detector with the same name as a built-in infoType, any findings detected by the new infoType detector are added to those detected by the built-in detector. This is only true as long as the built-in infoType is specified in the list of infoTypes in the InspectConfig object.

When creating new custom infoType detectors, test them thoroughly on example content to ensure they work as you intend.