A regular expression (regex) custom infoType detector allows you to create your
own detectors that enable Cloud DLP to detect matches based on a regex pattern.
For example, suppose that you have medical record numbers in the form
###-#-#####. You could define a regex pattern such as the following:
[0-9]{3}-[0-9]{1}-[0-9]{5}
Cloud DLP would then match items like the following:
012-4-56789
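To try the pattern locally before sending it to Cloud DLP, here is a minimal Python sketch using the standard `re` module. Cloud DLP uses RE2 syntax; this sketch assumes Python's `re` behaves the same for a simple pattern built only from character classes and counted repetition, which is true here, although the two engines differ on some advanced features. `find_mrns` is a hypothetical helper name.

```python
import re

# MRN pattern from the text: three digits, hyphen, one digit, hyphen, five digits.
MRN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{1}-[0-9]{5}")


def find_mrns(text: str) -> list[str]:
    """Return every non-overlapping substring of text that matches the MRN pattern."""
    return [m.group(0) for m in MRN_PATTERN.finditer(text)]


# "12-3-4567" has the wrong number of digits in each group, so it is not returned.
print(find_mrns("Records 012-4-56789 and 123-0-45678; not 12-3-4567."))
```

Because the pattern is unanchored, Python's `re` would also return `012-4-56789` from inside a longer string such as `9012-4-56789`; adding `\b` word boundaries is one way to tighten it if that matters for your data.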
Anatomy of a regex custom infoType detector
As summarized in API Overview, to create a custom regex infoType detector, you
define a CustomInfoType object that contains:
- The name you want to give the custom infoType detector, within an InfoType object.
- An optional Likelihood value. If you omit this, regex matches will return a default likelihood of VERY_LIKELY. If you notice a regex custom infoType detector returning too many false positives, try reducing the base likelihood and using detection rules to raise the likelihood based on contextual information. To learn more, see Customizing finding likelihood.
- Optional DetectionRules, or hotword rules. These rules adjust the likelihood of findings within a given proximity of specified hotwords. Learn more about hotword rules in Customizing finding likelihood.
- A Regex object consisting of a single pattern defining the regular expression.
As a JSON object, a regex custom infoType detector that includes all optional components looks like this:
{
  "customInfoTypes":[
    {
      "infoType":{
        "name":"[CUSTOM_INFOTYPE_NAME]"
      },
      "likelihood":"[LIKELIHOOD_VALUE]",
      "detectionRules":[
        {
          "hotwordRule":{
            [HOTWORDRULE_OBJECT]
          }
        },
        ...
      ],
      "regex":{
        "pattern":"[REGEX_PATTERN]"
      }
    }
  ],
  ...
}
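The optional components above can be assembled programmatically. The following Python sketch uses a hypothetical helper, `build_regex_custom_info_type`, that returns a dictionary in the same camelCase shape as the JSON template and simply omits `likelihood` and `detectionRules` when they are not supplied. The sample hotword rule assumes the DLP v2 HotwordRule fields `hotwordRegex`, `proximity.windowBefore`, and `likelihoodAdjustment.fixedLikelihood`; the hotword pattern and window size are illustrative values only.

```python
import json
from typing import Any, Optional


def build_regex_custom_info_type(
    name: str,
    pattern: str,
    likelihood: Optional[str] = None,
    hotword_rules: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build one regex CustomInfoType dict shaped like the template."""
    custom_info_type: dict[str, Any] = {
        "infoType": {"name": name},
        "regex": {"pattern": pattern},
    }
    # Likelihood is optional; if it is omitted, Cloud DLP reports regex matches as VERY_LIKELY.
    if likelihood is not None:
        custom_info_type["likelihood"] = likelihood
    # Each detection rule wraps one hotword rule object.
    if hotword_rules:
        custom_info_type["detectionRules"] = [{"hotwordRule": rule} for rule in hotword_rules]
    return custom_info_type


# Illustrative hotword rule: raise the likelihood when "mrn" appears shortly before a match.
mrn_hotword_rule = {
    "hotwordRegex": {"pattern": "(?i)mrn"},
    "proximity": {"windowBefore": 10},
    "likelihoodAdjustment": {"fixedLikelihood": "VERY_LIKELY"},
}

inspect_config = {
    "customInfoTypes": [
        build_regex_custom_info_type(
            "C_MRN", r"[0-9]{3}-[0-9]{1}-[0-9]{5}", "POSSIBLE", [mrn_hotword_rule]
        )
    ]
}
print(json.dumps(inspect_config, indent=2))
```

Leaving the optional arguments as `None` keeps the generated JSON minimal, so the API's own defaults apply instead of values hard-coded on the client side.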
Regex example: Match medical record numbers
The following JSON snippet and the code samples below show a regular expression
custom infoType detector that instructs Cloud DLP to match a medical record
number (MRN) in the input text "Patients MRN 444-5-22222" and assign each match
a likelihood of POSSIBLE.
Protocol
See the JSON quickstart for more information about using the DLP API with JSON.
JSON Input:
POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:inspect?key={YOUR_API_KEY}
{
  "item":{
    "value":"Patients MRN 444-5-22222"
  },
  "inspectConfig":{
    "customInfoTypes":[
      {
        "infoType":{
          "name":"C_MRN"
        },
        "regex":{
          "pattern":"[0-9]{3}-[0-9]{1}-[0-9]{5}"
        },
        "likelihood":"POSSIBLE"
      }
    ]
  }
}
JSON Output:
{
  "result":{
    "findings":[
      {
        "infoType":{
          "name":"C_MRN"
        },
        "likelihood":"POSSIBLE",
        "location":{
          "byteRange":{
            "start":"13",
            "end":"24"
          },
          "codepointRange":{
            "start":"13",
            "end":"24"
          }
        },
        "createTime":"2018-11-30T01:29:37.799Z"
      }
    ]
  }
}
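The start and end values in the output can be checked by hand. The following Python sketch searches the request's input string with the ###-#-##### pattern from the top of this page (444-5-22222 contains no zeros, so a [1-9] variant of the pattern finds the same span). Python's `span()` returns a start index and an exclusive end index, which matches the API's range convention of an inclusive start and an exclusive end.

```python
import re

# Input value from the request and the ###-#-##### MRN pattern.
text = "Patients MRN 444-5-22222"
mrn = re.compile(r"[0-9]{3}-[0-9]{1}-[0-9]{5}")

match = mrn.search(text)
if match is None:
    raise SystemExit("no MRN found")

# Code point offsets: span() is [start, end) with an exclusive end.
start, end = match.span()
print(start, end, match.group(0))

# Byte offsets: count the UTF-8 bytes before each boundary.
byte_start = len(text[:start].encode("utf-8"))
byte_end = len(text[:end].encode("utf-8"))
print(byte_start, byte_end)
```

Both lines print 13 and 24 because the input is plain ASCII; with multi-byte characters before the match, the byte offsets would be larger than the code point offsets.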
The output shows that, using the custom infoType detector named C_MRN and its
custom regex, Cloud DLP correctly identified the medical record number and
assigned it a likelihood of POSSIBLE, as specified.
Customizing match likelihood builds on this example to include context words.
Java
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
Python
To learn how to install and use the client library for Cloud DLP, see Cloud DLP client libraries.
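As a rough sketch of the same request through the Python client library, the following assumes the `google-cloud-dlp` package (module `google.cloud.dlp_v2`) and Application Default Credentials; it is not the official sample. The function names `build_inspect_request` and `inspect_mrn` are hypothetical. The client accepts a request dictionary with snake_case field names, so the builder mirrors the JSON input above using those names, and the library is imported inside `inspect_mrn` so the builder can be exercised without it installed.

```python
from typing import Any


def build_inspect_request(project_id: str, text: str, likelihood: Any) -> dict[str, Any]:
    """Hypothetical helper: mirror the JSON input above using snake_case field names."""
    return {
        "parent": f"projects/{project_id}",
        "item": {"value": text},
        "inspect_config": {
            "custom_info_types": [
                {
                    "info_type": {"name": "C_MRN"},
                    "regex": {"pattern": r"[0-9]{3}-[0-9]{1}-[0-9]{5}"},
                    "likelihood": likelihood,
                }
            ]
        },
    }


def inspect_mrn(project_id: str, text: str) -> None:
    """Hypothetical helper: send the request with the DLP client and print each finding."""
    # Imported lazily so build_inspect_request works without google-cloud-dlp installed.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    request = build_inspect_request(project_id, text, dlp_v2.Likelihood.POSSIBLE)
    response = client.inspect_content(request=request)
    for finding in response.result.findings:
        byte_range = finding.location.byte_range
        print(finding.info_type.name, finding.likelihood, byte_range.start, byte_range.end)
```

With credentials configured, calling `inspect_mrn("your-project-id", "Patients MRN 444-5-22222")` sends the same detector definition as the JSON request, where `your-project-id` is a placeholder for your own project ID.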