Text classification
Given the following text input:
Please update my records with the following information: Email address: foo@example.com National Provider Identifier: 1245319599 Driver's license: AC333991
The output is a list of findings, organized into the following categories:
- InfoType
- Likelihood
- Offset (where in the string the potential InfoType was found)
Example output is shown in the table below.
InfoType | Likelihood | Offset |
---|---|---|
US_HEALTHCARE_NPI | VERY_LIKELY | 122 |
EMAIL_ADDRESS | LIKELY | 72 |
US_DRIVERS_LICENSE_NUMBER | LIKELY | 155 |
CANADA_BC_PHN | VERY_UNLIKELY | 122 |
UK_TAXPAYER_REFERENCE | VERY_UNLIKELY | 122 |
CANADA_PASSPORT | VERY_UNLIKELY | 155 |
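Several findings in the example share an offset but differ in likelihood, so a consumer typically keeps only the more confident ones. The following is a minimal sketch of that filtering step, not the service's client API: the `Finding` class, the `FINDINGS` list, and the `filter_findings` helper are hypothetical names introduced for illustration, and the data is copied from the example table.

```python
from dataclasses import dataclass

# Likelihood levels ordered from least to most confident.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


@dataclass(frozen=True)
class Finding:
    """Hypothetical container for one row of the example output."""
    info_type: str
    likelihood: str
    offset: int


# Findings copied from the example table above.
FINDINGS = [
    Finding("US_HEALTHCARE_NPI", "VERY_LIKELY", 122),
    Finding("EMAIL_ADDRESS", "LIKELY", 72),
    Finding("US_DRIVERS_LICENSE_NUMBER", "LIKELY", 155),
    Finding("CANADA_BC_PHN", "VERY_UNLIKELY", 122),
    Finding("UK_TAXPAYER_REFERENCE", "VERY_UNLIKELY", 122),
    Finding("CANADA_PASSPORT", "VERY_UNLIKELY", 155),
]


def filter_findings(findings, min_likelihood="LIKELY"):
    """Keep findings whose likelihood is at least min_likelihood."""
    threshold = LIKELIHOOD_ORDER.index(min_likelihood)
    return [
        f for f in findings
        if LIKELIHOOD_ORDER.index(f.likelihood) >= threshold
    ]


if __name__ == "__main__":
    for finding in filter_findings(FINDINGS):
        print(finding.info_type, finding.likelihood, finding.offset)
```

With the default threshold of `LIKELY`, only the NPI, email address, and driver's license findings remain; the three `VERY_UNLIKELY` findings are dropped.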
Automatic text redaction
Instead of giving you a list of findings, automatic redaction returns the input text with each sensitive data match removed or replaced by a placeholder.
Example automatic redaction input:
Please update my records with the following information: Email address: foo@example.com National Provider Identifier: 1245319599 Driver's license: AC333991
Example output using a placeholder of "***":
Please update my records with the following information: Email address: *** National Provider Identifier: *** Driver's license: ***
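The placeholder substitution shown above can be sketched locally as follows. This is an illustration only and does not call Sensitive Data Protection: the regular expressions in `PATTERNS` are simplified, hypothetical stand-ins for the service's detectors, and `find_spans` and `redact` are names assumed for this example.

```python
import re

# Hypothetical, simplified stand-ins for the real InfoType detectors.
# They are only good enough to match the values in the example input.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_HEALTHCARE_NPI": re.compile(r"\b\d{10}\b"),
    "US_DRIVERS_LICENSE_NUMBER": re.compile(r"\b[A-Z]{2}\d{6}\b"),
}

TEXT = (
    "Please update my records with the following information: "
    "Email address: foo@example.com "
    "National Provider Identifier: 1245319599 "
    "Driver's license: AC333991"
)


def find_spans(text):
    """Return (start, end, info_type) tuples sorted by start position."""
    spans = []
    for info_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), info_type))
    return sorted(spans)


def redact(text, placeholder="***"):
    """Replace every matched span with the placeholder string."""
    pieces = []
    cursor = 0
    for start, end, _info_type in find_spans(text):
        if start < cursor:
            continue  # skip a span that overlaps one already redacted
        pieces.append(text[cursor:start])
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    print(redact(TEXT))
```

Building the output from slices between spans, rather than calling `str.replace` on each matched value, keeps the substitution tied to the positions that were found, so a value that also appears in non-sensitive context elsewhere is not replaced by accident.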
Resources
- For more information about using Sensitive Data Protection to redact text, see Redacting Sensitive Data From Text Content.
- For more information about using Sensitive Data Protection to de-identify sensitive data in text content—which includes "masking" sensitive data, replacing sensitive data with a "token" string, and encrypting and replacing sensitive data using a randomly generated or pre-determined key—see De-identifying sensitive data in text content.