Cloud Data Loss Prevention contains many built-in infoType detectors, but you can also create your own. You can customize detection behavior by defining your own custom infoType detectors, so that Cloud DLP will inspect or de-identify sensitive data that matches patterns that you specify. The following are the types of custom infoType detectors:
- Regular custom dictionary detectors are simple word and phrase lists that Cloud DLP matches on. Use regular custom dictionary detectors when you have at most several hundred thousand words.
- Large custom dictionary detectors are generated by Cloud DLP using large lists of words or phrases stored in either Cloud Storage or BigQuery. Use large custom dictionary detectors when you have a large list of words or phrases—up to tens of millions.
- Regular expression (regex) detectors enable Cloud DLP to detect matches based on a regular expression pattern.
- Surrogate infoType detectors detect output produced by the Cloud DLP CryptoReplaceFfxFpeConfig transformation. This custom infoType detector is only used with the content:reidentify method to reverse de-identification using format-preserving encryption (FPE) in FFX mode. For this reason, surrogates are not extensively described in these topics. For more information about how and when to use surrogate custom infoType detectors, see Pseudonymization.
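To make the difference between a regular dictionary detector and a regex detector concrete, here is a simplified, local Python approximation of what each one looks for. It is not Cloud DLP's matching engine. The case-insensitive, whole-word dictionary behavior and the helper names find_dictionary_matches and find_regex_matches are assumptions made for illustration, and details such as how Cloud DLP treats punctuation inside dictionary entries are not modeled.

```python
import re


def find_dictionary_matches(text, words):
    """Return (start, end, matched_text) for each whole-word, case-insensitive hit.

    Hypothetical helper: a simplified stand-in for a regular custom dictionary detector.
    """
    matches = []
    for word in words:
        # \b anchors approximate "match the whole word or phrase, not a fragment of one".
        pattern = r"\b" + re.escape(word) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), m.group(0)))
    return sorted(matches)


def find_regex_matches(text, pattern):
    """Return (start, end, matched_text) for each match of a single regex pattern.

    Hypothetical helper: a simplified stand-in for a regex custom detector.
    """
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    sample = "Ticket for Jane Doe, employee ID E-12345, filed by jane doe."
    print(find_dictionary_matches(sample, ["Jane Doe"]))  # [(11, 19, 'Jane Doe'), (51, 59, 'jane doe')]
    print(find_regex_matches(sample, r"E-\d{5}"))         # [(33, 40, 'E-12345')]
```

A dictionary detector suits a fixed, enumerable set of terms, while a regex detector suits values that follow a pattern but cannot be listed in advance, such as the hypothetical E-12345 employee ID format used here.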
In addition, Cloud DLP includes the concept of inspection rules, which enable you to fine-tune scan results using the following:
- Exclusion rules enable you to exclude false or unwanted findings by adding rules to a built-in or custom infoType detector.
- Hotword rules enable you to increase the quantity or accuracy of findings returned by adding rules to a built-in or custom infoType detector.
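The effect of the two rule types on a set of findings can be pictured with a simplified local model. In the Python sketch below, an exclusion rule drops findings whose whole quoted text is on an exclusion word list, and a hotword rule raises a finding's likelihood by a number of levels when a hotword regex occurs within a fixed number of characters before it. Cloud DLP hotword rules can either set a fixed likelihood or shift it by a relative number of levels; this sketch models only the relative case. The Likelihood level names are Cloud DLP's, but the Finding class, the helper functions, and the sample data are hypothetical and are not Cloud DLP's implementation.

```python
import re
from dataclasses import dataclass, replace

# Likelihood levels in ascending order, using the names Cloud DLP uses.
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


@dataclass(frozen=True)
class Finding:
    """Hypothetical, simplified finding: infoType name, matched text, offset, likelihood."""
    info_type: str
    quote: str
    start: int
    likelihood: str


def apply_exclusion(findings, excluded_words):
    """Exclusion rule sketch: drop findings whose whole quote is on the word list."""
    excluded = {w.lower() for w in excluded_words}
    return [f for f in findings if f.quote.lower() not in excluded]


def adjust(likelihood, steps):
    """Move a likelihood up (steps > 0) or down (steps < 0), clamped to LEVELS."""
    index = LEVELS.index(likelihood) + steps
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]


def apply_hotword(findings, text, hotword_pattern, window_before, steps):
    """Hotword rule sketch: adjust likelihood if the hotword occurs shortly before a finding."""
    hotword = re.compile(hotword_pattern, re.IGNORECASE)
    adjusted = []
    for f in findings:
        # Only the `window_before` characters immediately preceding the finding are searched.
        window = text[max(0, f.start - window_before):f.start]
        if hotword.search(window):
            f = replace(f, likelihood=adjust(f.likelihood, steps))
        adjusted.append(f)
    return adjusted


if __name__ == "__main__":
    text = "Contact: test@example.com or jdoe@corp.example"
    findings = [
        Finding("EMAIL_ADDRESS", "test@example.com", 9, "POSSIBLE"),
        Finding("EMAIL_ADDRESS", "jdoe@corp.example", 29, "POSSIBLE"),
    ]
    kept = apply_exclusion(findings, ["test@example.com"])  # drops the placeholder address
    for f in apply_hotword(kept, text, r"contact", window_before=30, steps=1):
        print(f.quote, f.likelihood)  # jdoe@corp.example LIKELY
```

Because relative adjustments are clamped at both ends in this model, a finding that is already VERY_LIKELY stays VERY_LIKELY after a +1 adjustment.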
To learn more about custom infoType detectors, see the InfoTypes and infoType detectors concept page. For several examples that you can use or alter as you see fit, see Examples of custom infoType detectors. The rest of this topic describes how to use Cloud DLP to create your own custom infoType detectors.
Where to use custom infoType detectors
Custom infoType detectors are defined in the CustomInfoType object. You specify a CustomInfoType in the inspection configuration when configuring the following:
- Inspection of content
- Inspection jobs
- Inspection templates
- De-identification of content
- De-identification templates
- Re-identification, using projects.content.reidentify, of content that has been de-identified with FPE in FFX mode. This scenario is specific to surrogate custom infoType detectors.
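As a sketch of where a CustomInfoType sits in a request, the Python snippet below builds a JSON request body of the kind sent to projects.content.inspect, with one regex-based custom infoType listed under inspectConfig.customInfoTypes next to a built-in infoType. The field names follow the Cloud DLP REST API as we understand it; the infoType name EMPLOYEE_ID, its pattern, and the sample text are hypothetical. The snippet only builds and prints the body; it does not call the API.

```python
import json


def build_inspect_request(text):
    """Build a content.inspect request body with one custom regex infoType (hypothetical)."""
    custom_info_type = {
        "infoType": {"name": "EMPLOYEE_ID"},  # hypothetical custom infoType name
        "regex": {"pattern": r"E-\d{5}"},     # hypothetical employee ID format
        "likelihood": "LIKELY",
    }
    return {
        "item": {"value": text},
        "inspectConfig": {
            "infoTypes": [{"name": "EMAIL_ADDRESS"}],  # a built-in detector, for comparison
            "customInfoTypes": [custom_info_type],
            "includeQuote": True,
        },
    }


if __name__ == "__main__":
    body = build_inspect_request("Employee E-12345 can be reached at jane@example.com")
    print(json.dumps(body, indent=2))
```

Built-in and custom detectors are listed side by side in the same configuration, so one request can report findings from both.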
The CustomInfoType object allows you to create a custom infoType detector for new content or to fine-tune the results returned by predefined infoType detectors.
The CustomInfoType object is made up of the following fields:
- "infoType": The name of the custom infoType detector, contained in an InfoType object.
- "likelihood": The default Likelihood value to return for this custom infoType detector. You can specify alternate Likelihood values in "detectionRules" that supersede this base Likelihood if the finding meets the criteria specified by the rule. If you don't include the "likelihood" field, the custom infoType detector defaults to VERY_LIKELY. For more information about likelihood, see the Likelihood concept page.
- "detectionRules": A set of DetectionRule objects to additionally apply to all findings of this custom infoType detector. This is where you specify hotword rules, as HotwordRule objects. Rules are applied in the order in which they are specified. This field does not apply to
- "sensitivityScore": The SensitivityScore value to return for this custom infoType detector. If you don't include the "sensitivityScore" field, the custom infoType detector defaults to
Sensitivity scores are used in data profiles. When profiling your data, Cloud DLP uses the sensitivity scores of the infoTypes to calculate the sensitivity level.
- One of the following fields, depending on the kind of custom infoType detector you're creating:
  - A Dictionary object, which contains a list of words or phrases to search for.
  - A Regex object, which contains a single pattern defining the regular expression.
  - A SurrogateType object, which, if present, indicates that the custom infoType detector is a surrogate. For more information about how to use surrogate custom infoType detectors, see Pseudonymization.
- "storedType": A reference to an existing StoredInfoType object. This field is required when creating a large custom dictionary detector. Although you can create regular dictionary detectors or regular expression detectors by defining this field, it's simpler to create those by defining a Dictionary or Regex object directly.
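To tie these fields together, the Python sketch below writes three custom infoType definitions as plain dictionaries that mirror the JSON shape: a dictionary detector, a regex detector with a sensitivity score and a hotword detection rule, and a large dictionary detector that references a stored infoType. It also includes a small check that exactly one detector-kind field is set and a helper that applies the VERY_LIKELY default when "likelihood" is omitted. The infoType names, word list, patterns, hotword settings, sensitivity level, and the storedType resource name are hypothetical placeholders, and the helpers are illustrative, not part of any Google client library.

```python
import json

# Fields that select the kind of custom infoType detector. Following the text above,
# this sketch assumes exactly one of them is set per definition.
KIND_FIELDS = ("dictionary", "regex", "surrogateType", "storedType")

dictionary_detector = {
    "infoType": {"name": "PROJECT_CODENAME"},  # hypothetical
    "dictionary": {"wordList": {"words": ["Bluebird", "Nightjar"]}},
}

regex_detector = {
    "infoType": {"name": "EMPLOYEE_ID"},  # hypothetical
    "regex": {"pattern": r"E-\d{5}"},
    "likelihood": "POSSIBLE",
    "sensitivityScore": {"score": "SENSITIVITY_MODERATE"},
    "detectionRules": [{
        "hotwordRule": {
            # If "employee" appears within 30 characters before a match, report VERY_LIKELY.
            "hotwordRegex": {"pattern": "(?i)employee"},
            "proximity": {"windowBefore": 30},
            "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
        }
    }],
}

large_dictionary_detector = {
    "infoType": {"name": "CUSTOMER_SURNAME"},  # hypothetical
    # Placeholder resource name; the large dictionary must already exist as a StoredInfoType.
    "storedType": {"name": "projects/my-project/storedInfoTypes/my-surnames"},
}


def detector_kind(custom_info_type):
    """Return the single kind field that is set, or raise if zero or several are set."""
    present = [k for k in KIND_FIELDS if k in custom_info_type]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {KIND_FIELDS}, found {present}")
    return present[0]


def base_likelihood(custom_info_type):
    """Likelihood reported when no detection rule applies; an omitted field means VERY_LIKELY."""
    return custom_info_type.get("likelihood", "VERY_LIKELY")


if __name__ == "__main__":
    for detector in (dictionary_detector, regex_detector, large_dictionary_detector):
        print(detector["infoType"]["name"], detector_kind(detector), base_likelihood(detector))
    print(json.dumps(regex_detector, indent=2))
```

Keeping the detector kind to a single field mirrors the "one of the following fields" wording above: a definition that sets both "regex" and "dictionary" is rejected by the check rather than silently preferring one.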
Learn more about creating custom infoTypes from the following topics:
- Creating a regular custom dictionary detector: Learn how to create a custom infoType detector to match findings on a list of words and phrases.
- Creating a large custom dictionary detector: Learn how to match findings on a very large list of words and phrases. Stored custom infoType detectors can match on up to tens of millions of words.
- Creating a custom regex detector: Learn how to create a custom infoType detector to match findings on a regular expression.
- Modifying infoType detectors to refine scan results: Learn how to create modifiers for both built-in and custom infoType detectors that can fine-tune scan results.
- Customizing match likelihood: Learn how to use detection rules and hotwords to customize the likelihood values that are assigned to custom detector matches.
- Examples of custom infoType detectors: Several example JSON custom infoType detector definitions that you can use or alter as you see fit.