Customizing Match Likelihood

Hotword rules let you extend built-in and custom infoType detectors with context rules. A hotword rule adds a regex and a proximity window to an existing infoType detector, and adjusts the match likelihood of a finding when the regex matches within that window.

Anatomy of a hotword rule

An infoType detector can have zero or more hotword rules. You define each hotword rule (HotwordRule object) inside an inspection rule (InspectionRule object). Each inspection rule is specified within an InspectConfig object.

Hotword rules go where indicated by [HOTWORDRULE_OBJECT] in the example JSON in Creating a dictionary custom infoType and Creating a regex custom infoType. As a JSON object, a single hotword rule inside a "detectionRules" array looks like this:

"detectionRules":[
  {
    "hotwordRule":{
      "hotwordRegex":{
        "pattern":"[REGEX_PATTERN]"
      },
      "proximity":{
        "windowAfter":"[NUM_CHARS_TO_CONSIDER_AFTER_FINDING]",
        "windowBefore":"[NUM_CHARS_TO_CONSIDER_BEFORE_FINDING]"
      },
      "likelihoodAdjustment":{
        "fixedLikelihood":"[LIKELIHOOD_VALUE]"
             -- OR --
        "relativeLikelihood":"[LIKELIHOOD_ADJUSTMENT]"
      }
    }
  },
  ...
]

Each hotword rule has three components:

  • "hotwordRegex": A regex pattern (Regex object) defining what qualifies as a hotword.
  • "proximity": The window around the finding within which the entire hotword must be contained. This field contains a Proximity object, which consists of two values:

    • "windowBefore": Number of characters before the finding to consider.
    • "windowAfter": Number of characters after the finding to consider.
  • "likelihoodAdjustment": The adjustment to the likelihood of a finding. This field contains a LikelihoodAdjustment object, which can be set to one of two values:

    • "fixedLikelihood": A fixed Likelihood value to set the finding to.
    • "relativeLikelihood": A number that indicates how many levels to increase or decrease the likelihood of the finding. For example, if a finding would be POSSIBLE without the detection rule and relativeLikelihood is 1, it is upgraded to LIKELY, while a value of -1 would downgrade it to UNLIKELY. Likelihood never drops below VERY_UNLIKELY or exceeds VERY_LIKELY, so applying an adjustment of 1 followed by an adjustment of -1 to a base likelihood of VERY_LIKELY yields LIKELY: the first adjustment is capped at VERY_LIKELY, and the second then lowers it by one level.
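
The capping behavior described for relativeLikelihood can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the service's implementation; the function name adjust_likelihood and the list LIKELIHOOD_LEVELS are hypothetical, and the list omits the API's LIKELIHOOD_UNSPECIFIED value.

```python
# Likelihood levels from lowest to highest, as named in this document.
LIKELIHOOD_LEVELS = [
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def adjust_likelihood(base, relative):
    """Shift `base` by `relative` levels, capped at both ends of the scale."""
    index = LIKELIHOOD_LEVELS.index(base) + relative
    index = max(0, min(index, len(LIKELIHOOD_LEVELS) - 1))
    return LIKELIHOOD_LEVELS[index]


# Adjustments are applied one at a time, so the cap takes effect between them.
print(adjust_likelihood("POSSIBLE", 1))
print(adjust_likelihood(adjust_likelihood("VERY_LIKELY", 1), -1))
```

Because each adjustment is capped before the next one is applied, the order of adjustments can change the final result.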

Hotword example: Match medical record numbers

Suppose you want to detect a custom infoType such as a medical record number in the form ###-#-#####, and you want to boost the match likelihood of the DLP API finding when the hotword "MRN" appears before the number, but not when it appears after it. Therefore:

  • 123-4-56789 would match as POSSIBLE.
  • MRN 123-4-56789 would match as VERY_LIKELY.
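
To make the expected behavior concrete, here is a minimal local simulation in Python. It does not call the DLP API and is not how the service is implemented; it only mimics the rule described here under two assumptions: the hotword must lie entirely within the 10 characters before the finding, and the hotword regex is rewritten in Python's re syntax as a case-insensitive match on "mrn" or "medical". The names simulate_findings, MRN_REGEX, and HOTWORD_REGEX are hypothetical.

```python
import re

# Custom infoType regex for numbers in the form ###-#-#####.
MRN_REGEX = re.compile(r"[0-9]{3}-[0-9]{1}-[0-9]{5}")

# Hotword regex in Python syntax (assumed equivalent for this illustration).
HOTWORD_REGEX = re.compile(r"\b(?:mrn|medical)", re.IGNORECASE)


def simulate_findings(text, window_before=10,
                      base="POSSIBLE", boosted="VERY_LIKELY"):
    """Return (match, likelihood) pairs, boosting when a hotword precedes."""
    findings = []
    for match in MRN_REGEX.finditer(text):
        window_start = max(0, match.start() - window_before)
        # Search only the window before the finding; the hotword must end
        # before the finding starts to count.
        hotword = HOTWORD_REGEX.search(text, window_start, match.start())
        findings.append((match.group(), boosted if hotword else base))
    return findings


print(simulate_findings("123-4-56789"))
print(simulate_findings("MRN 123-4-56789"))
print(simulate_findings("123-4-56789 MRN"))
```

Only the second input is boosted, because no windowAfter is considered in this sketch.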

The JSON example below shows the custom regex defined as explained in Creating a regex infoType detector, but with the appropriate hotword rule added on:

JSON Input:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:inspect?key={YOUR_API_KEY}

{
  "item":{
    "value":"Patient's MRN 444-5-22222"
  },
  "inspectConfig":{
    "customInfoTypes":[
      {
        "infoType":{
          "name":"C_MRN"
        },
        "regex":{
          "pattern":"[0-9]{3}-[0-9]{1}-[0-9]{5}"
        },
        "likelihood":"POSSIBLE",
        "detectionRules":[
          {
            "hotwordRule":{
              "hotwordRegex":{
                "pattern":"\\b(?i)mrn(?-i)|\\b(?i)Medical(?-i)"
              },
              "likelihoodAdjustment":{
                "fixedLikelihood":"VERY_LIKELY"
              },
              "proximity":{
                "windowBefore":10
              }
            }
          }
        ]
      }
    ]
  }
}

JSON Output:

{
  "result":{
    "findings":[
      {
        "infoType":{
          "name":"C_MRN"
        },
        "likelihood":"VERY_LIKELY",
        "location":{
          "byteRange":{
            "start":"14",
            "end":"25"
          },
          "codepointRange":{
            "start":"14",
            "end":"25"
          }
        },
        "createTime":"2018-11-13T18:50:44.337Z"
      }
    ]
  }
}

The output shows that, using the custom infoType detector named C_MRN and its custom regex, the DLP API has correctly identified the medical record number. Further, because the hotword in the detection rule matched within the configured window, the DLP API assigned the finding a likelihood of VERY_LIKELY, as configured.
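
If you assemble the request body in code instead of writing JSON by hand, a sketch like the following may help. It builds the same body as a Python dictionary and serializes it with the standard json module; no request is sent, and the names build_inspect_request and HOTWORD_PATTERN are illustrative rather than part of the DLP API. One detail worth noting: in raw JSON text, \b is the escape for a backspace character, so a regex word boundary has to be written as \\b. json.dumps applies that escaping automatically.

```python
import json

# The regex as the service should receive it: a literal backslash then "b".
HOTWORD_PATTERN = r"\b(?i)mrn(?-i)|\b(?i)Medical(?-i)"


def build_inspect_request(text):
    """Return the inspect request body from the example as a Python dict."""
    return {
        "item": {"value": text},
        "inspectConfig": {
            "customInfoTypes": [
                {
                    "infoType": {"name": "C_MRN"},
                    "regex": {"pattern": "[0-9]{3}-[0-9]{1}-[0-9]{5}"},
                    "likelihood": "POSSIBLE",
                    "detectionRules": [
                        {
                            "hotwordRule": {
                                "hotwordRegex": {"pattern": HOTWORD_PATTERN},
                                "likelihoodAdjustment": {
                                    "fixedLikelihood": "VERY_LIKELY"
                                },
                                "proximity": {"windowBefore": 10},
                            }
                        }
                    ],
                }
            ]
        },
    }


body = build_inspect_request("Patient's MRN 444-5-22222")
serialized = json.dumps(body, indent=2)
print(serialized)
```

The serialized text contains \\b in the pattern, which a JSON parser decodes back to the two-character sequence \b that the regex engine expects.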
