Sensitive Data Protection contains many built-in infoType detectors, but you can also define custom infoType detectors so that Sensitive Data Protection inspects or de-identifies sensitive data that matches patterns that you specify. The following are the types of custom infoType detectors:
- Regular custom dictionary detectors are simple word and phrase lists that Sensitive Data Protection matches on. Use regular custom dictionary detectors when you have at most several hundred thousand words.
- Large custom dictionary detectors are generated by Sensitive Data Protection using large lists of words or phrases stored in either Cloud Storage or BigQuery. Use large custom dictionary detectors when you have a large list of words or phrases, up to tens of millions.
- Regular expression (regex) detectors enable Sensitive Data Protection to detect matches based on a regular expression pattern.
- Surrogate infoType detectors detect output from the Sensitive Data Protection de-identification transformation CryptoReplaceFfxFpeConfig. This custom infoType detector is used only with the content:reidentify method to reverse de-identification using format-preserving encryption (FPE) in FFX mode. For this reason, surrogates are not extensively described in these topics. For more information about how and when to use surrogate custom infoType detectors, see Pseudonymization.
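As a rough illustration of the dictionary, regex, and surrogate detector types listed above, the following sketch builds their JSON definitions as plain Python dictionaries. The detector names, word list, and pattern are hypothetical, and the field layout follows the v2 REST API as understood here, so check it against the CustomInfoType reference before relying on it.

```python
import json
import re

# Hypothetical detector names, word list, and pattern, chosen for illustration.
dictionary_detector = {
    "infoType": {"name": "MEDICAL_TERM"},
    "dictionary": {"wordList": {"words": ["angioplasty", "stent"]}},
}

regex_detector = {
    "infoType": {"name": "EMPLOYEE_ID"},
    "regex": {"pattern": r"EMP-\d{6}"},
}

# A surrogate detector carries an empty surrogateType object.
surrogate_detector = {
    "infoType": {"name": "EMPLOYEE_ID_TOKEN"},
    "surrogateType": {},
}

# Rough local check of the pattern with Python's re module; the service's
# own regex engine may behave differently in edge cases.
sample_matches = (
    re.fullmatch(regex_detector["regex"]["pattern"], "EMP-123456") is not None
)

custom_info_types = [dictionary_detector, regex_detector]
print(json.dumps(custom_info_types, indent=2))
```

Large custom dictionary detectors are not shown here because they reference a stored infoType that must be created first.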
In addition, Sensitive Data Protection includes the concept of inspection rules, which enable you to fine-tune scan results using the following:
- Exclusion rules enable you to exclude false or unwanted findings by adding rules to a built-in or custom infoType detector.
- Hotword rules enable you to increase the quantity or accuracy of findings returned by adding rules to a built-in or custom infoType detector.
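The two kinds of inspection rules can be sketched as a rule set attached to a built-in detector. This is a minimal example, assuming the v2 REST field names (ruleSet, exclusionRule, hotwordRule); the excluded domain, hotword pattern, and proximity window are hypothetical values picked for illustration.

```python
import json

# Exclusion rule: drop findings that fully match a hypothetical test domain.
exclusion_rule = {
    "exclusionRule": {
        "regex": {"pattern": r".+@example\.com"},
        "matchingType": "MATCHING_TYPE_FULL_MATCH",
    }
}

# Hotword rule: raise likelihood when a hypothetical hotword appears
# within 50 characters before the finding.
hotword_rule = {
    "hotwordRule": {
        "hotwordRegex": {"pattern": "(?i)contact"},
        "proximity": {"windowBefore": 50},
        "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
    }
}

inspect_config = {
    "infoTypes": [{"name": "EMAIL_ADDRESS"}],
    "ruleSet": [
        {
            # The rules apply only to the infoTypes named here.
            "infoTypes": [{"name": "EMAIL_ADDRESS"}],
            "rules": [exclusion_rule, hotword_rule],
        }
    ],
}

print(json.dumps(inspect_config, indent=2))
```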
To learn more about custom infoType detectors, see the InfoTypes and infoType detectors concept page. For several examples that you can use or alter as you see fit, see Examples of custom infoType detectors. The rest of this topic describes how to use Sensitive Data Protection to create your own custom infoType detectors.
Where to use custom infoType detectors
Custom infoType detectors are defined in the CustomInfoType object. You specify a CustomInfoType in the InspectConfig object when configuring the following:
- Inspection using projects.content.inspect.
- Inspection jobs inside InspectJobConfig.
- Inspection templates inside InspectTemplate.
- De-identification using projects.content.deidentify.
- De-identification templates inside DeidentifyTemplate.
- Re-identification of content that has been de-identified with FPE in FFX mode using projects.content.reidentify. This scenario is specific to surrogate custom infoType detectors.
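For the first case in the list, a request body for projects.content.inspect might be assembled as follows. This is a sketch only: build_inspect_request is a hypothetical helper, the detector is made up, and sending the request (endpoint, authentication) is left out.

```python
import json


def build_inspect_request(text, custom_info_types):
    """Return a content.inspect request body as a plain dict (hypothetical helper)."""
    return {
        "item": {"value": text},
        "inspectConfig": {
            # Custom detectors sit alongside any built-in infoTypes.
            "customInfoTypes": custom_info_types,
            "includeQuote": True,
        },
    }


# Hypothetical detector used only for illustration.
employee_id = {
    "infoType": {"name": "EMPLOYEE_ID"},
    "regex": {"pattern": r"EMP-\d{6}"},
}

request_body = build_inspect_request("Badge EMP-123456 was issued.", [employee_id])
print(json.dumps(request_body, indent=2))
```

The same inspectConfig dictionary could be reused inside an inspection job or template, since each of those wraps an InspectConfig object.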
API overview
The CustomInfoType object allows you to create a custom infoType detector for new content or to fine-tune the results returned by predefined infoType detectors.
The CustomInfoType object consists of the following fields, which are set as described:
- "infoType": The name of the custom infoType detector, contained in an InfoType object.
- "likelihood": The default Likelihood value to return for this custom infoType detector. You can specify alternate Likelihood values in "detectionRules" that will supersede this base Likelihood if the finding meets the criteria specified by the rule. If you don't include the "likelihood" field, the custom infoType detector defaults to VERY_LIKELY. For more information about likelihood, see the Likelihood concept page.
- "detectionRules": A set of DetectionRule objects to additionally apply to all findings of this custom infoType detector. This is where you specify hotword rules, as HotwordRule objects. Rules are applied in the order in which they are specified. This field does not apply to SurrogateType objects.
- "sensitivityScore": The SensitivityScore value to return for this custom infoType detector. If you don't include the "sensitivityScore" field, the custom infoType detector defaults to HIGH. Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.
- One of the following fields, depending on the kind of custom infoType detector you're creating:
  - "dictionary": A Dictionary object, which contains a list of words or phrases to search for.
  - "regex": A Regex object, which contains a single pattern defining the regular expression.
  - "surrogateType": A SurrogateType object that, if present, indicates that the custom infoType detector is a surrogate. For more information about how to use surrogate custom infoType detectors, see Pseudonymization.
  - "storedType": A reference to an existing StoredInfoType object. This field is required when creating a large custom dictionary detector. Although you can create regular dictionary detectors or regular expression detectors by defining this field, it's simpler to create those by defining the dictionary field or regex field respectively.
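The fields described above can be combined into a single detector definition, as in the following sketch. The detector name, pattern, and rule values are hypothetical, the enum strings follow the v2 REST API as understood here, and check_custom_info_type is an illustrative local helper that mirrors the "one of the following fields" wording; it is not part of any client library.

```python
import json

# The four alternative "type" fields of a CustomInfoType.
TYPE_FIELDS = ("dictionary", "regex", "surrogateType", "storedType")


def check_custom_info_type(detector):
    """Hypothetical local check: return the single type field that is set."""
    present = [field for field in TYPE_FIELDS if field in detector]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {TYPE_FIELDS}, got {present}")
    return present[0]


detector = {
    "infoType": {"name": "EMPLOYEE_ID"},
    # Base likelihood; the hotword rule below can supersede it.
    "likelihood": "POSSIBLE",
    "detectionRules": [
        {
            "hotwordRule": {
                "hotwordRegex": {"pattern": "(?i)employee"},
                "proximity": {"windowBefore": 30},
                "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
            }
        }
    ],
    "sensitivityScore": {"score": "SENSITIVITY_MODERATE"},
    "regex": {"pattern": r"EMP-\d{6}"},
}

kind = check_custom_info_type(detector)
print(kind)
print(json.dumps(detector, indent=2))
```

Running the check locally before sending a request is a design convenience: it catches a definition that sets both "dictionary" and "regex", or neither, without a round trip to the service.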
Next steps
Learn more about creating custom infoTypes from the following topics:
- Creating a regular custom dictionary detector: Learn how to create a custom infoType detector to match findings on a list of words and phrases.
- Creating a large custom dictionary detector: Learn how to match findings on a very large list of words and phrases. Stored custom infoType detectors can match on up to tens of millions of words.
- Creating a custom regex detector: Learn how to create a custom infoType detector to match findings on a regular expression.
- Modifying infoType detectors to refine scan results: Learn how to create modifiers for both built-in and custom infoType detectors that can fine-tune scan results.
- Customizing match likelihood: Learn how to use detection rules and hotwords to customize the likelihood values that are assigned to custom detector matches.
- Examples of custom infoType detectors: Several example JSON custom infoType detector definitions that you can use or alter as you see fit.