Using hotword rules, you can further extend built-in and custom infoType detectors with powerful context rules. A hotword rule instructs Sensitive Data Protection to adjust the likelihood of a finding, depending on whether a hotword occurs near that finding. A hotword rule is a kind of inspection rule, which is specified in rule sets. Each rule is applied to a set of built-in or custom infoTypes.
Anatomy of a hotword rule
An infoType detector can have zero or more hotword rules. In your inspection configuration, you define each HotwordRule object inside the rules array, as follows:
"rules":[
  {
    "hotwordRule":{
      "hotwordRegex":{
        "pattern":"REGEX_PATTERN"
      },
      "proximity":{
        "windowAfter":"NUM_CHARS_TO_CONSIDER_AFTER_FINDING",
        "windowBefore":"NUM_CHARS_TO_CONSIDER_BEFORE_FINDING"
      },
      "likelihoodAdjustment":{
        "fixedLikelihood":"LIKELIHOOD_VALUE"
        -- OR --
        "relativeLikelihood":"LIKELIHOOD_ADJUSTMENT"
      }
    }
  },
  ...
]
Replace the following:
- REGEX_PATTERN: a regular expression (Regex object) that defines what qualifies as a hotword.
- NUM_CHARS_TO_CONSIDER_AFTER_FINDING: the number of characters after the finding to consider. Sensitive Data Protection analyzes this range to determine whether a hotword occurs near the finding.
- NUM_CHARS_TO_CONSIDER_BEFORE_FINDING: the number of characters before the finding to consider. Sensitive Data Protection analyzes this range to determine whether a hotword occurs near the finding.
- LIKELIHOOD_VALUE: a fixed Likelihood level to set the finding to.
- LIKELIHOOD_ADJUSTMENT: a number that indicates how much Sensitive Data Protection must increase or decrease the likelihood of the finding. A positive integer increases the likelihood level, and a negative integer decreases it. For example, if a finding would be POSSIBLE without the detection rule and relativeLikelihood is 1, then the finding is upgraded to LIKELY. If relativeLikelihood is -1, then the finding is downgraded to UNLIKELY. Likelihood can never drop lower than VERY_UNLIKELY or exceed VERY_LIKELY. In these cases, the likelihood level remains the same. For example, if the base likelihood is VERY_LIKELY and the relativeLikelihood is 1, the final likelihood remains VERY_LIKELY.
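The relativeLikelihood arithmetic described above can be sketched locally. This is an illustrative model only, not the service's implementation: the function name `apply_relative_likelihood` is hypothetical, and it assumes that an adjustment past either end of the scale is clamped to the nearest valid level, which is consistent with the VERY_LIKELY example in the text.

```python
# Likelihood levels in ascending order, as named in the text above.
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def apply_relative_likelihood(base: str, adjustment: int) -> str:
    """Shift `base` by `adjustment` levels (hypothetical helper).

    Assumption: results outside the scale are clamped to
    VERY_UNLIKELY or VERY_LIKELY.
    """
    index = LEVELS.index(base) + adjustment
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


print(apply_relative_likelihood("POSSIBLE", 1))     # LIKELY
print(apply_relative_likelihood("POSSIBLE", -1))    # UNLIKELY
print(apply_relative_likelihood("VERY_LIKELY", 1))  # VERY_LIKELY
```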
Hotword example: Match medical record numbers
Suppose you want to detect a custom infoType such as a medical record number (MRN) in the form "###-#-#####". Also, you want Sensitive Data Protection to increase the match likelihood of each finding that follows the hotword "MRN".
Example values:
- 123-4-56789 would match as POSSIBLE.
- MRN 123-4-56789 would match as VERY_LIKELY.
The following JSON example and code snippets show you how to configure the hotword rule. This example uses a custom regular expression detector.
In this example, note the following:
- The request defines the C_MRN custom infoType, which is a detector for any string that matches the regular expression [0-9]{3}-[0-9]{1}-[0-9]{5}.
- The regular expression (?i)(mrn|medical)(?-i) defines the hotword. Sensitive Data Protection searches for this hotword within the range of characters defined in the proximity field.
- For each C_MRN finding that has a hotword within the set proximity, Sensitive Data Protection sets the likelihood level to VERY_LIKELY.
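To build intuition for how the detector, the hotword, and the proximity window interact, the following sketch imitates the rule locally with Python's re module. It is a rough approximation rather than the service's matching logic: the function name `simulate_hotword_rule` is hypothetical, the window of 10 characters before the finding is taken from the REST example in this section, and the sketch assumes the hotword must lie entirely inside that window.

```python
import re

TEXT = "Patient's MRN 444-5-22222 and just a number 333-2-33333"

# Same pattern as the C_MRN custom infoType.
MRN_PATTERN = re.compile(r"[0-9]{3}-[0-9]{1}-[0-9]{5}")
# Python's re module does not accept a bare "(?-i)" switch, so the
# case-insensitive flag is passed separately here.
HOTWORD_PATTERN = re.compile(r"mrn|medical", re.IGNORECASE)
WINDOW_BEFORE = 10  # characters to look at before each finding


def simulate_hotword_rule(text):
    """Return (quote, start, end, likelihood) for each MRN-shaped match."""
    findings = []
    for match in MRN_PATTERN.finditer(text):
        window_start = max(0, match.start() - WINDOW_BEFORE)
        window = text[window_start:match.start()]
        # Fixed likelihood when a hotword is nearby, base likelihood otherwise.
        if HOTWORD_PATTERN.search(window):
            likelihood = "VERY_LIKELY"
        else:
            likelihood = "POSSIBLE"
        findings.append((match.group(), match.start(), match.end(), likelihood))
    return findings


for finding in simulate_hotword_rule(TEXT):
    print(finding)
```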
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
REST
See the JSON quickstart for more information about using the DLP API with JSON.
HTTP method and URL:
POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect
Replace PROJECT_ID with the project ID.
JSON input:
{
  "item":{
    "value":"Patient's MRN 444-5-22222 and just a number 333-2-33333"
  },
  "inspectConfig":{
    "customInfoTypes":[
      {
        "infoType":{
          "name":"C_MRN"
        },
        "regex":{
          "pattern":"[0-9]{3}-[0-9]{1}-[0-9]{5}"
        },
        "likelihood":"POSSIBLE"
      }
    ],
    "ruleSet":[
      {
        "infoTypes": [{"name" : "C_MRN"}],
        "rules":[
          {
            "hotwordRule":{
              "hotwordRegex":{
                "pattern":"(?i)(mrn|medical)(?-i)"
              },
              "likelihoodAdjustment":{
                "fixedLikelihood":"VERY_LIKELY"
              },
              "proximity":{
                "windowBefore":10
              }
            }
          }
        ]
      }
    ]
  }
}
JSON output (abbreviated):
{
  "result": {
    "findings": [
      {
        "infoType": { "name": "C_MRN" },
        "likelihood": "VERY_LIKELY",
        "location": {
          "byteRange": { "start": "14", "end": "25" },
          "codepointRange": { ... }
        }
      },
      {
        "infoType": { "name": "C_MRN" },
        "likelihood": "POSSIBLE",
        "location": {
          "byteRange": { "start": "44", "end": "55" },
          "codepointRange": { ... }
        }
      }
    ]
  }
}
The output shows that Sensitive Data Protection correctly identified both medical record numbers using the C_MRN custom infoType detector. Further, because of the context matching in the hotword rule, Sensitive Data Protection assigned the first result, which had the hotword "MRN" within the set proximity, a likelihood of VERY_LIKELY, as configured. The second finding lacked the context, so the likelihood stayed at POSSIBLE.
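If you prefer to assemble the request body in code instead of writing JSON by hand, a small builder can also enforce that a rule sets either fixedLikelihood or relativeLikelihood, but not both. The following is a minimal sketch using only the standard library; `build_hotword_rule` is a hypothetical helper, not part of any client library, and the script only prints the JSON body without sending it.

```python
import json


def build_hotword_rule(pattern, *, fixed_likelihood=None,
                       relative_likelihood=None,
                       window_before=None, window_after=None):
    """Build one entry for the "rules" array (hypothetical helper)."""
    # Exactly one of the two adjustment kinds must be given.
    if (fixed_likelihood is None) == (relative_likelihood is None):
        raise ValueError(
            "Set exactly one of fixed_likelihood or relative_likelihood")
    if fixed_likelihood is not None:
        adjustment = {"fixedLikelihood": fixed_likelihood}
    else:
        adjustment = {"relativeLikelihood": relative_likelihood}
    proximity = {}
    if window_before is not None:
        proximity["windowBefore"] = window_before
    if window_after is not None:
        proximity["windowAfter"] = window_after
    return {
        "hotwordRule": {
            "hotwordRegex": {"pattern": pattern},
            "likelihoodAdjustment": adjustment,
            "proximity": proximity,
        }
    }


request_body = {
    "item": {
        "value": "Patient's MRN 444-5-22222 and just a number 333-2-33333"
    },
    "inspectConfig": {
        "customInfoTypes": [{
            "infoType": {"name": "C_MRN"},
            "regex": {"pattern": "[0-9]{3}-[0-9]{1}-[0-9]{5}"},
            "likelihood": "POSSIBLE",
        }],
        "ruleSet": [{
            "infoTypes": [{"name": "C_MRN"}],
            "rules": [build_hotword_rule(
                "(?i)(mrn|medical)(?-i)",
                fixed_likelihood="VERY_LIKELY",
                window_before=10,
            )],
        }],
    },
}

print(json.dumps(request_body, indent=2))
```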
Hotword example: Set the match likelihood of a table column
This example demonstrates how you can set the match likelihood of an entire column of data. This approach is helpful, for example, if you want to exclude a column of data from inspection results.
Consider the following table. One column contains placeholder Social Security numbers (SSNs), and another contains real SSNs.
Fake Social Security Number | Real Social Security Number |
---|---|
111-11-1111 | 222-22-2222 |
To minimize noise in inspection results, you can exclude any findings in the Fake Social Security Number column. Assign a low likelihood level to this column. Then, configure the request such that matches with that likelihood level are excluded from the results.
In this example, note the following:
- The hotword rule is applied to the US_SOCIAL_SECURITY_NUMBER infoType.
- The hotword regular expression (Fake Social Security Number) contains the name of the column that has the placeholder values.
- The windowBefore property is set to 1, which means that the hotword is in a column header, and the findings must be in the column.
- For each US_SOCIAL_SECURITY_NUMBER finding in this column, Sensitive Data Protection sets the likelihood level to VERY_UNLIKELY.
- The minLikelihood property is set to POSSIBLE, which means that any finding that has a likelihood level lower than POSSIBLE is excluded from the inspection results.
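The combined effect of the column hotword and minLikelihood can be modeled in a few lines. The sketch below is a local imitation under stated assumptions, not the service's behavior: `inspect_table` is a hypothetical function, a simple \d{3}-\d{2}-\d{4} pattern stands in for the built-in US_SOCIAL_SECURITY_NUMBER detector, and the base likelihood of an unadjusted finding is assumed to be VERY_LIKELY.

```python
import re

# Likelihood levels in ascending order.
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Stand-in for the built-in detector (assumption, much simpler than the real one).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
HOTWORD_PATTERN = re.compile(r"(Fake Social Security Number)")

HEADERS = ["Fake Social Security Number", "Real Social Security Number"]
ROWS = [["111-11-1111", "222-22-2222"]]


def inspect_table(headers, rows, min_likelihood="POSSIBLE",
                  assumed_base="VERY_LIKELY"):
    """Return the findings that survive the minLikelihood filter."""
    results = []
    for row in rows:
        for header, value in zip(headers, row):
            for match in SSN_PATTERN.finditer(value):
                # A hotword in the column header applies to every cell below it.
                if HOTWORD_PATTERN.search(header):
                    likelihood = "VERY_UNLIKELY"
                else:
                    likelihood = assumed_base
                if LEVELS.index(likelihood) >= LEVELS.index(min_likelihood):
                    results.append({
                        "quote": match.group(),
                        "column": header,
                        "likelihood": likelihood,
                    })
    return results


print(inspect_table(HEADERS, ROWS))
```

Lowering `min_likelihood` to "VERY_UNLIKELY" in this sketch brings the placeholder value back into the results, which shows that the finding is filtered out rather than never detected.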
REST
See the JSON quickstart for more information about using the DLP API with JSON.
HTTP method and URL:
POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect
Replace PROJECT_ID with the project ID.
JSON input:
{
  "item": {
    "table": {
      "headers": [
        {
          "name": "Fake Social Security Number"
        },
        {
          "name": "Real Social Security Number"
        }
      ],
      "rows": [
        {
          "values": [
            {
              "stringValue": "111-11-1111"
            },
            {
              "stringValue": "222-22-2222"
            }
          ]
        }
      ]
    }
  },
  "inspectConfig": {
    "infoTypes": [
      {
        "name": "US_SOCIAL_SECURITY_NUMBER"
      }
    ],
    "includeQuote": true,
    "ruleSet": [
      {
        "infoTypes": [
          {
            "name": "US_SOCIAL_SECURITY_NUMBER"
          }
        ],
        "rules": [
          {
            "hotwordRule": {
              "hotwordRegex": {
                "pattern": "(Fake Social Security Number)"
              },
              "likelihoodAdjustment": {
                "fixedLikelihood": "VERY_UNLIKELY"
              },
              "proximity": {
                "windowBefore": 1
              }
            }
          }
        ]
      }
    ],
    "minLikelihood": "POSSIBLE"
  }
}
JSON output:
{
  "result": {
    "findings": [
      {
        "quote": "222-22-2222",
        "infoType": { "name": "US_SOCIAL_SECURITY_NUMBER" },
        "likelihood": "VERY_LIKELY",
        "location": {
          "byteRange": { "end": "11" },
          "codepointRange": { "end": "11" },
          "contentLocations": [
            {
              "recordLocation": {
                "fieldId": { "name": "Real Social Security Number" },
                "tableLocation": {}
              }
            }
          ]
        },
        "createTime": "TIMESTAMP",
        "findingId": "TIMESTAMP"
      }
    ]
  }
}
The value 111-11-1111, which is in the Fake Social Security Number column, matched the hotword rule, so Sensitive Data Protection assigned it the VERY_UNLIKELY likelihood level. This level is lower than the minimum likelihood set in the inspection configuration (POSSIBLE), so this finding is excluded from the inspection result.
You can experiment with this example by removing the rule set. Without the hotword rule, Sensitive Data Protection includes 111-11-1111 in the results.