Text classification and redaction

Text classification

Given the following text input:

Please update my records with the following information:
Email address: foo@example.com

National Provider Identifier: 1245319599

Driver's license: AC333991

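The output is a list of findings. Each finding reports the infoType that was detected, the likelihood that the match really is that infoType, and the offset where the match begins in the input text. The same span of text can be reported under more than one infoType, each with its own likelihood.

Example output is shown in the table below.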

InfoType                   Likelihood     Offset
US_HEALTHCARE_NPI          VERY_LIKELY    122
EMAIL_ADDRESS              LIKELY         72
US_DRIVERS_LICENSE_NUMBER  LIKELY         155
CANADA_BC_PHN              VERY_UNLIKELY  122
UK_TAXPAYER_REFERENCE      VERY_UNLIKELY  122
CANADA_PASSPORT            VERY_UNLIKELY  155
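To make the likelihood column concrete, the following Python sketch loads the findings from the example table and keeps only those at or above a minimum likelihood. The ordering of Likelihood mirrors the service's likelihood levels, from VERY_UNLIKELY up to VERY_LIKELY; the names Finding and filter_findings are hypothetical and only for illustration. Sensitive Data Protection also lets you set a minimum likelihood on the inspection request itself, so client-side filtering like this is just a way to show the idea.

```python
from dataclasses import dataclass
from enum import IntEnum


class Likelihood(IntEnum):
    # Ordered from least to most confident, mirroring the likelihood
    # levels reported by Sensitive Data Protection.
    LIKELIHOOD_UNSPECIFIED = 0
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


@dataclass(frozen=True)
class Finding:
    # Hypothetical container for one row of the findings table.
    info_type: str
    likelihood: Likelihood
    offset: int


# The findings from the example table, copied as-is.
FINDINGS = [
    Finding("US_HEALTHCARE_NPI", Likelihood.VERY_LIKELY, 122),
    Finding("EMAIL_ADDRESS", Likelihood.LIKELY, 72),
    Finding("US_DRIVERS_LICENSE_NUMBER", Likelihood.LIKELY, 155),
    Finding("CANADA_BC_PHN", Likelihood.VERY_UNLIKELY, 122),
    Finding("UK_TAXPAYER_REFERENCE", Likelihood.VERY_UNLIKELY, 122),
    Finding("CANADA_PASSPORT", Likelihood.VERY_UNLIKELY, 155),
]


def filter_findings(findings, min_likelihood):
    """Keep findings at or above min_likelihood, ordered by offset."""
    kept = [f for f in findings if f.likelihood >= min_likelihood]
    # Sort by position; for ties, put the most likely infoType first.
    return sorted(kept, key=lambda f: (f.offset, -f.likelihood))


for f in filter_findings(FINDINGS, Likelihood.LIKELY):
    print(f.info_type, f.likelihood.name, f.offset)
```

With a minimum of LIKELY, the three VERY_UNLIKELY findings are dropped, leaving one finding per sensitive value.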

Automatic text redaction

Instead of giving you a list of findings, automatic redaction returns the input text with sensitive data matches removed or replaced with a placeholder.

Example automatic redaction input:

Please update my records with the following information:
Email address: foo@example.com

National Provider Identifier: 1245319599

Driver's license: AC333991

Example output using a placeholder of "***":

Please update my records with the following information:
Email address: ***

National Provider Identifier: ***

Driver's license: ***
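The following Python sketch shows the replacement step in isolation: given the start and end of each match, it swaps each matched span for the placeholder. It is an illustration under assumptions, not how Sensitive Data Protection performs redaction internally. The findings table shows only a start offset, so the sketch assumes (start, end) spans and locates them with str.index as stand-ins for real findings; the function name redact is hypothetical. Because one value can be reported under several infoTypes, overlapping spans are merged first so each value is replaced only once.

```python
def redact(text, spans, placeholder="***"):
    """Replace each (start, end) span of text with placeholder.

    Overlapping or duplicate spans (for example, one value matched by
    several infoTypes) are merged first so each value is replaced once.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it instead of redacting twice.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Rebuild the string left to right from the untouched pieces.
    out = []
    cursor = 0
    for start, end in merged:
        out.append(text[cursor:start])
        out.append(placeholder)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = (
    "Please update my records with the following information:\n"
    "Email address: foo@example.com\n\n"
    "National Provider Identifier: 1245319599\n\n"
    "Driver's license: AC333991\n"
)

# Hypothetical spans standing in for inspection findings. The NPI span
# appears twice, as it would if two infoTypes matched the same value.
spans = []
for value in ["foo@example.com", "1245319599", "1245319599", "AC333991"]:
    start = text.index(value)
    spans.append((start, start + len(value)))

print(redact(text, spans))
```

Rebuilding the string from slices keeps the original offsets valid throughout, since the input is never modified while the spans are being processed.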

Resources

  • For more information about using Sensitive Data Protection to redact text, see Redacting Sensitive Data From Text Content.
  • For more information about using Sensitive Data Protection to de-identify sensitive data in text content, see De-identifying sensitive data in text content. De-identification techniques include masking sensitive data, replacing it with a token string, and encrypting and replacing it using a randomly generated or predetermined key.