Creating a Custom Regex Detector

A regular expression (regex) custom infoType detector lets you define your own detector so that the DLP API finds matches based on a regex pattern. For example, suppose that you have medical record numbers in the form ###-#-#####. You could define a regex pattern such as the following:

[0-9]{3}-[0-9]{1}-[0-9]{5}

The DLP API would then match items like the following:

012-4-56789
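Before sending a pattern to the DLP API, it can help to try it locally. The following is a minimal sketch that uses Python's built-in re module as a stand-in. The DLP API documents its regex syntax as RE2, so results can differ for advanced patterns, but a simple pattern like this one should behave the same way. The helper name find_mrns and the sample strings are illustrative assumptions, not part of the API.

```python
import re

# Pattern from the text above: three digits, one digit, five digits,
# separated by hyphens.
MRN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{1}-[0-9]{5}")


def find_mrns(text):
    """Return every substring of `text` that matches the MRN pattern.

    Hypothetical local helper for experimenting with the pattern;
    it does not call the DLP API.
    """
    return MRN_PATTERN.findall(text)


# Illustrative inputs: two that contain a well-formed MRN, two that do not.
samples = ["012-4-56789", "MRN: 444-5-22222", "12-4-56789", "0124-56789"]
for sample in samples:
    print(sample, "->", find_mrns(sample))
```

Note that the pattern is not anchored, so it also matches an MRN-shaped run of digits inside a longer string of digits. If that produces false positives, word boundaries or a lower base likelihood are possible refinements.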

Anatomy of a regex custom infoType detector

As summarized in API Overview, to create a custom regex infoType detector, you define a CustomInfoType object that contains:

  • The name you want to give the custom infoType detector, within an InfoType object.
  • An optional Likelihood value. If you omit this, regex matches will return a default likelihood of VERY_LIKELY. If you notice a regex custom infoType detector returning too many false positives, try reducing the base likelihood and using detection rules to boost the likelihood using contextual information. To learn more, see Customizing finding likelihood.
  • Optional DetectionRules, or hotword rules. These rules adjust the likelihood of findings within a given proximity of specified hotwords. Learn more about hotword rules in Customizing finding likelihood.
  • A Regex object consisting of a single pattern defining the regular expression.

As a JSON object, a regex custom infoType detector that includes all optional components looks like this:

{
  "customInfoTypes":[
    {
      "infoType":{
        "name":"custom-infoType-name"
      },
      "likelihood":"likelihood-value",
      "detectionRules":[
        {
          "hotwordRule":{
            HotwordRule-object
          }
        },
        ...
      ],
      "regex":{
        "pattern":"regex-pattern"
      }
    }
  ],
  ...
}
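The structure above can also be assembled programmatically. The following is a rough sketch, assuming you build the request body as plain Python dictionaries and serialize it yourself. The function build_regex_custom_info_type is a hypothetical helper, not part of any DLP client library. The field names come from the JSON shown above.

```python
import json


def build_regex_custom_info_type(name, pattern, likelihood=None,
                                 detection_rules=None):
    """Assemble one entry for the "customInfoTypes" array.

    Hypothetical helper: `likelihood` and `detection_rules` are the
    optional components described above and are left out of the result
    when they are not supplied.
    """
    custom_info_type = {
        "infoType": {"name": name},
        "regex": {"pattern": pattern},
    }
    if likelihood is not None:
        custom_info_type["likelihood"] = likelihood
    if detection_rules:
        custom_info_type["detectionRules"] = detection_rules
    return custom_info_type


# Example: a detector with only the optional likelihood set.
config = {
    "customInfoTypes": [
        build_regex_custom_info_type(
            "C_MRN", "[0-9]{3}-[0-9]{1}-[0-9]{5}", likelihood="POSSIBLE"
        )
    ]
}
print(json.dumps(config, indent=2))
```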

Regex example: Match medical record numbers

The JSON input example below shows a regular expression custom infoType detector that instructs the DLP API to match a medical record number (MRN) in the input text "Patients MRN 444-5-22222" and assign each match a likelihood of POSSIBLE.

JSON Input:

{
 "item": {
  "value": "Patients MRN 444-5-22222"
 },
 "inspectConfig": {
  "customInfoTypes": [
   {
    "infoType": {
     "name": "C_MRN"
    },
    "regex": {
     "pattern": "[0-9]{3}-[0-9]{1}-[0-9]{5}"
    },
    "likelihood": "POSSIBLE"
   }
  ]
 }
}

URL:

POST https://dlp.googleapis.com/v2/{parent=projects/*}/content:inspect
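As an illustration of how the input and URL fit together, the sketch below builds the HTTP request with Python's standard library without sending it. It assumes that {parent=projects/*} expands to projects/ followed by your project ID, and that you already hold an OAuth 2.0 access token. The values "my-project" and "ACCESS_TOKEN" are placeholders, and build_inspect_request is a hypothetical helper rather than part of a DLP client library. The pattern used is the one introduced at the top of this page.

```python
import json
import urllib.request


def build_inspect_request(project_id, access_token, body):
    """Build (but do not send) a content:inspect POST request.

    Hypothetical helper: `project_id` and `access_token` must be
    supplied by the caller.
    """
    url = (
        "https://dlp.googleapis.com/v2/projects/"
        f"{project_id}/content:inspect"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


body = {
    "item": {"value": "Patients MRN 444-5-22222"},
    "inspectConfig": {
        "customInfoTypes": [
            {
                "infoType": {"name": "C_MRN"},
                "regex": {"pattern": "[0-9]{3}-[0-9]{1}-[0-9]{5}"},
                "likelihood": "POSSIBLE",
            }
        ]
    },
}

# Placeholder project ID and token; replace both before sending.
request = build_inspect_request("my-project", "ACCESS_TOKEN", body)
print(request.get_method(), request.full_url)

# To actually send the request (requires valid credentials):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```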

JSON Output:

{
 "result": {
  "findings": [
   {
    "infoType": {
     "name": "C_MRN"
    },
    "likelihood": "POSSIBLE",
    "location": {
     "byteRange": {
      "start": "13",
      "end": "24"
     },
     "codepointRange": {
      "start": "13",
      "end": "24"
     }
    },
    "createTime": "2018-05-18T22:39:49.382Z"
   }
  ]
 }
}

The output shows that, using the custom infoType detector named C_MRN and its custom regex, the DLP API correctly identified the medical record number and assigned it a likelihood of POSSIBLE, as specified.
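To confirm which text a finding refers to, you can map the reported ranges back onto the input. The sketch below assumes the offsets are zero-based with an exclusive end, which is consistent with this example (start 13, end 24, an 11-character match). It also converts the offsets from strings, since they appear as quoted values in the JSON output. The helper names are illustrative.

```python
# Input value and location copied from the JSON input and output above.
text = "Patients MRN 444-5-22222"
location = {
    "byteRange": {"start": "13", "end": "24"},
    "codepointRange": {"start": "13", "end": "24"},
}


def slice_by_byte_range(value, byte_range):
    """Return the matched text using UTF-8 byte offsets."""
    start, end = int(byte_range["start"]), int(byte_range["end"])
    return value.encode("utf-8")[start:end].decode("utf-8")


def slice_by_codepoint_range(value, codepoint_range):
    """Return the matched text using code point offsets.

    Python 3 strings are indexed by code point, so a plain slice works.
    """
    start, end = int(codepoint_range["start"]), int(codepoint_range["end"])
    return value[start:end]


print(slice_by_byte_range(text, location["byteRange"]))
print(slice_by_codepoint_range(text, location["codepointRange"]))
```

The two ranges agree here because the input is pure ASCII. For text containing multi-byte characters, the byte offsets and code point offsets would differ.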

Customizing match likelihood builds on this example to include context words.
