Creating custom infoType detectors

Cloud Data Loss Prevention (DLP) contains many built-in infoType detectors, but you can also create your own. Custom infoType detectors let you customize detection behavior so that Cloud DLP inspects and redacts sensitive data that matches patterns you specify. There are three kinds of custom infoType detectors:

  • Regular custom dictionary detectors are simple word and phrase lists that Cloud DLP matches on. Use regular custom dictionary detectors when you have at most several hundred thousand words.
  • Stored custom dictionary detectors are generated by Cloud DLP using large lists of words or phrases stored in either Cloud Storage or BigQuery. Use stored custom dictionary detectors when you have a large list of words or phrases, up to tens of millions.
  • Regular expressions (regex) enable Cloud DLP to detect matches based on a regular expression pattern.
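As a rough sketch of the two simplest kinds, the snippet below builds a regular custom dictionary detector and a regex detector as plain Python dictionaries shaped like a JSON request body. The detector names, the word list, the pattern, and the sample text are all hypothetical, and the field names follow the CustomInfoType fields described later on this page. The local `re` check only illustrates what the pattern is meant to match; Cloud DLP evaluates the pattern itself, and its regex dialect may differ from Python's.

```python
import json
import re

# Hypothetical detector names and values, chosen only for illustration.
dictionary_detector = {
    "infoType": {"name": "EXAMPLE_PROJECT_CODENAME"},
    "dictionary": {"wordList": {"words": ["bluebird", "red falcon"]}},
}

regex_detector = {
    "infoType": {"name": "EXAMPLE_RECORD_NUMBER"},
    "regex": {"pattern": r"[1-9]{3}-[1-9]{1}-[1-9]{5}"},
}

# Custom detectors are listed in the inspection configuration.
inspect_config = {"customInfoTypes": [dictionary_detector, regex_detector]}

# Local sanity check with Python's re module. This is NOT a call to
# Cloud DLP; it only shows the kind of text the pattern targets.
sample = "Patient record 444-5-22222 was updated."
local_match = re.search(regex_detector["regex"]["pattern"], sample)

print(json.dumps(inspect_config, indent=2))
```

A stored custom dictionary detector is omitted here because it depends on a dictionary that Cloud DLP has already generated from Cloud Storage or BigQuery.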

In addition, Cloud DLP includes the concept of inspection rules, which enable you to fine-tune scan results using the following:

  • Exclusion rules enable you to exclude false or unwanted findings by adding rules to a built-in or custom infoType detector.
  • Hotword rules enable you to increase the quantity or accuracy of findings returned by adding rules to a built-in or custom infoType detector.
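The two rule types can be sketched as request-body fragments. The example below assumes the `ruleSet` layout of the DLP v2 REST API as commonly documented (`exclusionRule`, `hotwordRule`, `matchingType`, `proximity`, `likelihoodAdjustment`); treat the exact field names as assumptions to verify against the API reference. The excluded email address, the hotword pattern, and the window size are hypothetical.

```python
import json

# Hypothetical values throughout; only the field layout is the point.

# Exclusion rule: drop findings that exactly match a known, harmless value.
exclusion_rule = {
    "exclusionRule": {
        "dictionary": {"wordList": {"words": ["support@example.com"]}},
        "matchingType": "MATCHING_TYPE_FULL_MATCH",
    }
}

# Hotword rule: raise the likelihood of a finding when a cue word
# appears within 50 characters before it.
hotword_rule = {
    "hotwordRule": {
        "hotwordRegex": {"pattern": r"(?i)\bpatient\b"},
        "proximity": {"windowBefore": 50},
        "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
    }
}

# Each rule set names the detectors that its rules apply to.
inspect_config = {
    "infoTypes": [{"name": "EMAIL_ADDRESS"}, {"name": "PERSON_NAME"}],
    "ruleSet": [
        {"infoTypes": [{"name": "EMAIL_ADDRESS"}], "rules": [exclusion_rule]},
        {"infoTypes": [{"name": "PERSON_NAME"}], "rules": [hotword_rule]},
    ],
}

print(json.dumps(inspect_config, indent=2))
```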

To learn more about custom infoType detectors, see the InfoTypes and infoType detectors concept page. For several examples that you can use or alter as you see fit, see Examples of custom infoType detectors. The rest of this topic describes how to use Cloud DLP to create your own custom infoType detectors.

Where to use custom infoType detectors

Custom infoType detectors are defined in the CustomInfoType object. You specify a CustomInfoType in the InspectConfig object when configuring the following:

API overview

The CustomInfoType object allows you to create a custom infoType detector for new content or to fine-tune the results returned by built-in infoType detectors.

The CustomInfoType object consists of the following fields:

  • "infoType": The name of the custom infoType detector, contained in an InfoType object.
  • "likelihood": The default Likelihood value to return for this custom infoType detector. You can specify alternate Likelihood values in "detectionRules" that will supersede this base Likelihood if the finding meets the criteria specified by the rule. If you don't include the "likelihood" field, the custom infoType detector defaults to VERY_LIKELY.
  • "detectionRules": A set of DetectionRule objects to additionally apply to all findings of this custom infoType detector. This is where you specify hotword rules, as HotwordRule objects. Rules are applied in the order in which they are specified. This field does not apply to SurrogateType objects.
  • One of the following fields, depending on the kind of custom infoType detector you're creating:

    • "dictionary": A Dictionary object, which contains a list of words or phrases to search for.
    • "regex": A Regex object, which contains a single pattern defining the regular expression.
    • "surrogateType": A SurrogateType object that, if present, indicates that the custom infoType detector is a surrogate. For more information about how to use surrogate custom infoType detectors, see Pseudonymization.
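To make the field list above concrete, here is a minimal sketch of a single CustomInfoType expressed as a Python dictionary. It lowers the base likelihood to POSSIBLE and adds one hotword rule in "detectionRules" that supersedes it when a cue phrase appears nearby. The detector name, pattern, hotword, and window size are hypothetical, and the inner layout of the hotword rule is an assumption to check against the API reference. The `detector_kind` helper is also hypothetical: it is a local convenience that mirrors the "one of the following fields" rule as listed above, not part of any Cloud DLP library.

```python
# The three kind-selecting fields listed above.
DETECTOR_KIND_FIELDS = ("dictionary", "regex", "surrogateType")


def detector_kind(custom_info_type):
    """Return which kind-selecting field is set (hypothetical local helper).

    Raises ValueError unless exactly one of the listed fields is present.
    """
    present = [f for f in DETECTOR_KIND_FIELDS if f in custom_info_type]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {DETECTOR_KIND_FIELDS}, found {present}"
        )
    return present[0]


# Hypothetical detector: name, pattern, and hotword are illustrative only.
custom_info_type = {
    "infoType": {"name": "EXAMPLE_RECORD_NUMBER"},
    # Base likelihood; if omitted, the detector defaults to VERY_LIKELY.
    "likelihood": "POSSIBLE",
    "regex": {"pattern": r"[1-9]{3}-[1-9]{1}-[1-9]{5}"},
    # Rules are applied in the order in which they are specified.
    "detectionRules": [
        {
            "hotwordRule": {
                "hotwordRegex": {"pattern": r"(?i)record number"},
                "proximity": {"windowBefore": 30},
                "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
            }
        }
    ],
}

print(detector_kind(custom_info_type))
```

Running the helper before sending a request is one way to catch a detector that accidentally sets both "dictionary" and "regex", or neither.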

Next steps

Learn more about creating custom infoTypes from the following topics:
