Hotword rules let you extend built-in and custom infoType detectors with context rules. A hotword rule adds a regex and a proximity window to an existing infoType detector, and adjusts the likelihood of findings that occur near a match. A hotword rule is a kind of inspection rule, and inspection rules are specified in rule sets. Each rule set applies to a set of infoTypes, which can be either custom or built-in.
Anatomy of a hotword rule
An infoType detector can have zero or more hotword rules. You define each hotword rule (HotwordRule object) inside an inspection rule (InspectionRule object). Each inspection rule is specified within an inspection rule set (InspectionRuleSet object), which in turn is contained in an InspectConfig object.
As a JSON object, a single hotword rule inside an "inspectionRules" array looks like this:
"inspectionRules":[
  {
    "hotwordRule":{
      "hotwordRegex":{
        "pattern":"[REGEX_PATTERN]"
      },
      "proximity":{
        "windowAfter":"[NUM_CHARS_TO_CONSIDER_AFTER_FINDING]",
        "windowBefore":"[NUM_CHARS_TO_CONSIDER_BEFORE_FINDING]"
      },
      "likelihoodAdjustment":{
        "fixedLikelihood":"[LIKELIHOOD_VALUE]"
        -- OR --
        "relativeLikelihood":"[LIKELIHOOD_ADJUSTMENT]"
      }
    }
  },
  ...
]
Each hotword rule has three components:
- "hotwordRegex": A regex pattern (Regex object) defining what qualifies as a hotword.
- "proximity": The proximity of the finding within which the entire hotword must be contained. This field contains a Proximity object, which has two values:
  - "windowBefore": Number of characters before the finding to consider.
  - "windowAfter": Number of characters after the finding to consider.
- "likelihoodAdjustment": The adjustment to the likelihood of a finding. This field contains a LikelihoodAdjustment object, which can be set to one of two values:
  - "fixedLikelihood": A fixed Likelihood value to set the finding to.
  - "relativeLikelihood": A number that indicates the levels by which to increase or decrease the likelihood of the finding. For example, if a finding would be POSSIBLE without the detection rule and relativeLikelihood is 1, then it is upgraded to LIKELY, while a value of -1 would downgrade it to UNLIKELY. Likelihood may never drop below VERY_UNLIKELY or exceed VERY_LIKELY, so applying an adjustment of 1 followed by an adjustment of -1 when the base likelihood is VERY_LIKELY results in a final likelihood of LIKELY.
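The clamping behavior of "relativeLikelihood" can be illustrated with a small Python sketch. It is only a local model of the arithmetic described above, not Cloud DLP code; the function name adjust_likelihood and the LEVELS list are hypothetical names introduced for this illustration.

```python
# Likelihood levels in ascending order, as named in the text above.
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def adjust_likelihood(base, relative):
    """Shift `base` by `relative` levels, clamped to the valid range.

    Hypothetical helper that models the described behavior locally.
    """
    index = LEVELS.index(base) + relative
    # Never drop below VERY_UNLIKELY or exceed VERY_LIKELY.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


print(adjust_likelihood("POSSIBLE", 1))   # LIKELY
print(adjust_likelihood("POSSIBLE", -1))  # UNLIKELY

# An adjustment of 1 on VERY_LIKELY is clamped, so a following -1 lands on LIKELY.
clamped = adjust_likelihood("VERY_LIKELY", 1)
print(adjust_likelihood(clamped, -1))     # LIKELY
```

Because the first adjustment is clamped rather than remembered, the two adjustments do not cancel out.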
Hotword example: Match medical record numbers
Suppose you want to detect a custom infoType such as a medical record number in the form "###-#-#####", and you want to boost the match likelihood of a Cloud DLP finding when the hotword "MRN" appears before, but not after, this number. Therefore:
- 123-4-56789 would match as POSSIBLE.
- MRN 123-4-56789 would match as VERY_LIKELY.
The JSON example and code snippets below show the custom regex defined as explained in Creating a regex infoType detector, with the appropriate hotword rule added:
Protocol
See the JSON quickstart for more information about using the Cloud DLP API with JSON.
JSON input:
POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:inspect?key={YOUR_API_KEY}
{
  "item":{
    "value":"Patient's MRN 444-5-22222 and just a number 333-2-33333"
  },
  "inspectConfig":{
    "customInfoTypes":[
      {
        "infoType":{
          "name":"C_MRN"
        },
        "regex":{
          "pattern":"[0-9]{3}-[0-9]{1}-[0-9]{5}"
        },
        "likelihood":"POSSIBLE"
      }
    ],
    "ruleSet":[
      {
        "infoTypes": [{"name" : "C_MRN"}],
        "rules":[
          {
            "hotwordRule":{
              "hotwordRegex":{
                "pattern":"(?i)(mrn|medical)(?-i)"
              },
              "likelihoodAdjustment":{
                "fixedLikelihood":"VERY_LIKELY"
              },
              "proximity":{
                "windowBefore":10
              }
            }
          }
        ]
      }
    ]
  }
}
JSON output (abbreviated):
{
  "result": {
    "findings": [
      {
        "infoType": {
          "name": "C_MRN"
        },
        "likelihood": "VERY_LIKELY",
        "location": {
          "byteRange": {
            "start": "14",
            "end": "25"
          },
          "codepointRange": { ... }
        }
      },
      {
        "infoType": {
          "name": "C_MRN"
        },
        "likelihood": "POSSIBLE",
        "location": {
          "byteRange": {
            "start": "44",
            "end": "55"
          },
          "codepointRange": { ... }
        }
      }
    ]
  }
}
The output shows that, using the custom infoType detector named C_MRN and its custom regex, Cloud DLP identified both numbers that match the medical record number pattern. Because of the context matching in the hotword rule, Cloud DLP assigned the first finding (which has MRN close by) a likelihood of VERY_LIKELY, as configured. The second finding lacked that context, so its likelihood stayed at POSSIBLE.
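To make the proximity logic concrete, the following Python sketch approximates the same behavior locally with the standard re module. It is an illustration under stated assumptions, not how Cloud DLP is implemented: the function name inspect_locally and the constants are hypothetical, and the trailing (?-i) from the hotword pattern is dropped because Python's re module does not accept that form.

```python
import re

# Same patterns as in the request above. The trailing "(?-i)" is omitted
# because Python's re module does not support turning a flag off that way.
FINDING_RE = re.compile(r"[0-9]{3}-[0-9]{1}-[0-9]{5}")
HOTWORD_RE = re.compile(r"(?i)(mrn|medical)")
WINDOW_BEFORE = 10  # Mirrors "windowBefore": 10.


def inspect_locally(text):
    """Hypothetical local approximation of the hotword rule (not the DLP API)."""
    results = []
    for match in FINDING_RE.finditer(text):
        # Only the WINDOW_BEFORE characters before the finding are considered,
        # and the whole hotword must fall inside that window.
        window = text[max(0, match.start() - WINDOW_BEFORE):match.start()]
        likelihood = "VERY_LIKELY" if HOTWORD_RE.search(window) else "POSSIBLE"
        results.append((match.group(), match.start(), match.end(), likelihood))
    return results


text = "Patient's MRN 444-5-22222 and just a number 333-2-33333"
for finding in inspect_locally(text):
    print(finding)
# ('444-5-22222', 14, 25, 'VERY_LIKELY')
# ('333-2-33333', 44, 55, 'POSSIBLE')
```

The offsets match the byteRange values in the output above because the sample text is plain ASCII, so character and byte positions coincide.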
Java
To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.
Python
To learn how to install and use the client library for Cloud DLP, see the Cloud DLP Client Libraries.
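As a rough sketch of how the JSON request above might be expressed with the Python client library, the example below builds the same configuration using the snake_case field names the library expects. It assumes the google-cloud-dlp package is installed and that application default credentials are configured; the function names build_inspect_config and inspect_mrn are hypothetical, and the code should be checked against the current client library reference before use.

```python
def build_inspect_config(likelihood):
    """Build the InspectConfig as a dict with snake_case field names.

    `likelihood` is any object that supports lookup by level name. With the
    real library, pass the dlp_v2.Likelihood enum class.
    """
    return {
        "custom_info_types": [
            {
                "info_type": {"name": "C_MRN"},
                "regex": {"pattern": "[0-9]{3}-[0-9]{1}-[0-9]{5}"},
                "likelihood": likelihood["POSSIBLE"],
            }
        ],
        "rule_set": [
            {
                "info_types": [{"name": "C_MRN"}],
                "rules": [
                    {
                        "hotword_rule": {
                            "hotword_regex": {"pattern": "(?i)(mrn|medical)(?-i)"},
                            "likelihood_adjustment": {
                                "fixed_likelihood": likelihood["VERY_LIKELY"]
                            },
                            "proximity": {"window_before": 10},
                        }
                    }
                ],
            }
        ],
    }


def inspect_mrn(project_id, text):
    """Send `text` to Cloud DLP and return (infoType name, likelihood) pairs."""
    # Imported here so the config builder works without the library installed.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.inspect_content(
        request={
            "parent": f"projects/{project_id}",
            "inspect_config": build_inspect_config(dlp_v2.Likelihood),
            "item": {"value": text},
        }
    )
    return [(f.info_type.name, f.likelihood.name) for f in response.result.findings]
```

Calling inspect_mrn with your project ID and the sample sentence from the JSON example would be expected to return one finding per matched number, with the likelihoods described above.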