Custom infoType detectors

Sensitive Data Protection includes many built-in infoType detectors, but you can also define your own custom infoType detectors so that Sensitive Data Protection inspects or de-identifies sensitive data that matches patterns you specify. The following are the types of custom infoType detectors:

  • Regular custom dictionary detectors are simple word and phrase lists that Sensitive Data Protection matches on. Use regular custom dictionary detectors when you have at most several hundred thousand words.
  • Large custom dictionary detectors are generated by Sensitive Data Protection using large lists of words or phrases stored in either Cloud Storage or BigQuery. Use large custom dictionary detectors when you have a large list of words or phrases, up to tens of millions.
  • Regular expression (regex) detectors enable Sensitive Data Protection to detect matches based on a regular expression pattern.
  • Surrogate infoType detectors detect output from the Sensitive Data Protection de-identification transformation CryptoReplaceFfxFpeConfig. This custom infoType detector is used only with the content:reidentify method to reverse de-identification that was performed using format-preserving encryption (FPE) in FFX mode. For this reason, surrogates are not extensively described in these topics. For more information about how and when to use surrogate custom infoType detectors, see Pseudonymization.
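
As a rough illustration of the first and third detector kinds, the following sketch builds plain Python dictionaries that mirror the JSON shape of a CustomInfoType object. The infoType names, the word list, and the pattern are made-up examples, and no API call is made. The field names follow the REST representation as understood here, so check them against the API reference before use.

```python
import json
import re

# Regular custom dictionary detector: a short inline word list.
# The detector name and the words are hypothetical examples.
dictionary_detector = {
    "infoType": {"name": "MEDICATION_NAME"},
    "dictionary": {"wordList": {"words": ["amoxicillin", "ibuprofen"]}},
}

# Regex detector: a hypothetical record-number format (3-4-2 digits).
regex_detector = {
    "infoType": {"name": "MEDICAL_RECORD_NUMBER"},
    "regex": {"pattern": r"[0-9]{3}-[0-9]{4}-[0-9]{2}"},
}

# Local sanity check of the pattern using Python's re module.
# This only approximates the service's matching; it is a quick way
# to catch typos in a simple pattern before sending a request.
sample = "Patient MRN 123-4567-89 was prescribed ibuprofen."
local_match = re.search(regex_detector["regex"]["pattern"], sample)

# Print the fragments as they might appear in a JSON request body.
print(json.dumps([dictionary_detector, regex_detector], indent=2))
```

Testing the pattern locally first is only a convenience. The findings that Sensitive Data Protection returns can differ from a local regex match.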

In addition, Sensitive Data Protection includes the concept of inspection rules, which enable you to fine-tune scan results using the following:

  • Exclusion rules enable you to exclude false or unwanted findings by adding rules to a built-in or custom infoType detector.
  • Hotword rules enable you to increase the quantity or accuracy of findings returned by adding rules to a built-in or custom infoType detector.
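
The two kinds of inspection rules can be sketched as a rule set attached to built-in infoType detectors. The example below is a minimal sketch using plain Python dictionaries that mirror the JSON request shape. The patterns, the proximity window, and the chosen infoTypes are illustrative assumptions, and the field names should be verified against the InspectConfig reference.

```python
# InspectConfig fragment with one exclusion rule and one hotword rule.
inspect_config_with_rules = {
    "infoTypes": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
    "ruleSet": [
        {
            # Exclusion rule: drop email findings on a hypothetical test domain.
            "infoTypes": [{"name": "EMAIL_ADDRESS"}],
            "rules": [
                {
                    "exclusionRule": {
                        "regex": {"pattern": r".+@example\.com"},
                        "matchingType": "MATCHING_TYPE_FULL_MATCH",
                    }
                }
            ],
        },
        {
            # Hotword rule: raise the likelihood of phone-number findings
            # that appear shortly after the words "phone" or "mobile".
            "infoTypes": [{"name": "PHONE_NUMBER"}],
            "rules": [
                {
                    "hotwordRule": {
                        "hotwordRegex": {"pattern": "(?i)(phone|mobile)"},
                        "proximity": {"windowBefore": 30},
                        "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
                    }
                }
            ],
        },
    ],
}

# List which kind of rule each entry defines, in order.
rule_kinds = [
    next(iter(rule))
    for rule_set in inspect_config_with_rules["ruleSet"]
    for rule in rule_set["rules"]
]
print(rule_kinds)
```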

To learn more about custom infoType detectors, see the InfoTypes and infoType detectors concept page. For several examples that you can use or alter as you see fit, see Examples of custom infoType detectors. The rest of this topic describes how to use Sensitive Data Protection to create your own custom infoType detectors.

Where to use custom infoType detectors

Custom infoType detectors are defined in the CustomInfoType object. You specify a CustomInfoType in the InspectConfig object when configuring the following:

API overview

The CustomInfoType object allows you to create a custom infoType detector for new content or to fine-tune the results returned by pre-defined infoType detectors.

The CustomInfoType object consists of the following fields, which are set as described:

  • "infoType": The name of the custom infoType detector, contained in an InfoType object.
  • "likelihood": The default Likelihood value to return for this custom infoType detector. You can specify alternate Likelihood values in "detectionRules" that will supersede this base Likelihood if the finding meets the criteria specified by the rule. If you don't include the "likelihood" field, the custom infoType detector defaults to VERY_LIKELY. For more information about likelihood, see the Likelihood concept page.
  • "detectionRules": A set of DetectionRule objects to additionally apply to all findings of this custom infoType detector. This is where you specify hotword rules, as HotwordRule objects. Rules are applied in the order in which they are specified. This field does not apply to SurrogateType objects.
  • "sensitivityScore": The SensitivityScore value to return for this custom infoType detector. If you don't include the "sensitivityScore" field, the custom infoType detector defaults to HIGH.

    Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.

  • One of the following fields, depending on the kind of custom infoType detector you're creating:

    • "dictionary": A Dictionary object, which contains a list of words or phrases to search for.
    • "regex": A Regex object, which contains a single pattern defining the regular expression.
    • "surrogateType": A SurrogateType object which, if present, indicates that the custom infoType detector is a surrogate. For more information about how to use surrogate custom infoType detectors, see Pseudonymization.
    • "storedType": A reference to an existing StoredInfoType object. This field is required when creating a large custom dictionary detector. Although you can create regular dictionary detectors or regular expression detectors by defining this field, it's simpler to create those by defining the dictionary field or regex field respectively.
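
The fields described above can be combined as in the following sketch, which again uses a plain Python dictionary that mirrors the JSON request shape. The detector name, pattern, hotword, and chosen values are hypothetical, and `detector_kind` is an illustrative helper written for this example, not part of any client library. It only checks the "one of the following fields" constraint locally.

```python
# The four mutually exclusive detector-kind fields of CustomInfoType.
DETECTOR_FIELDS = ("dictionary", "regex", "surrogateType", "storedType")


def detector_kind(custom_info_type: dict) -> str:
    """Return the single detector-kind field that is set.

    Hypothetical helper: raises ValueError unless exactly one of the
    detector-kind fields is present in the dictionary.
    """
    present = [field for field in DETECTOR_FIELDS if field in custom_info_type]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {DETECTOR_FIELDS}, found {present}"
        )
    return present[0]


# A regex-based custom infoType detector with a base likelihood,
# one hotword detection rule, and a sensitivity score.
custom_info_type = {
    "infoType": {"name": "EMPLOYEE_ID"},
    "likelihood": "POSSIBLE",
    "regex": {"pattern": r"E-[0-9]{6}"},
    "detectionRules": [
        {
            "hotwordRule": {
                "hotwordRegex": {"pattern": "(?i)employee"},
                "proximity": {"windowBefore": 20},
                # Supersedes the base likelihood when the hotword is nearby.
                "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
            }
        }
    ],
    "sensitivityScore": {"score": "SENSITIVITY_MODERATE"},
}

# The detector is supplied through the InspectConfig object.
inspect_config = {"customInfoTypes": [custom_info_type]}

print(detector_kind(custom_info_type))
```

A local check like this can catch a configuration that sets both "dictionary" and "regex" before the request is sent. The service performs its own validation regardless.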

Next steps

Learn more about creating custom infoTypes from the following topics: