InfoTypes and infoType detectors

An infoType in the Cloud Data Loss Prevention (DLP) API is a type of sensitive data. For example, the EMAIL_ADDRESS infoType corresponds to an email address, such as jsmith@example.org.

Every infoType has a corresponding detector. The DLP API uses infoType detectors in configuration to determine what to inspect for and how to transform findings. InfoType names are also used when displaying or reporting scan results.

The DLP API supports both built-in infoType detectors and custom infoType detectors. You define custom infoType detectors yourself using the DLP API.

Built-in infoType detectors

Built-in infoType detectors are built into the DLP API. They include detectors for country- or region-specific sensitive data, such as the French Numéro d'Inscription au Répertoire (NIR), UK driver's license numbers, or US Social Security numbers, and detectors for globally applicable sensitive data, such as credit card numbers or email addresses. To detect content that corresponds to infoTypes, the DLP API uses techniques including pattern matching, checksums, machine learning, and context analysis.

The list of built-in infoType detectors is always being updated. For a complete list of currently supported built-in infoType detectors, see InfoType detector reference.

You can also view a complete list of all built-in infoType detectors by calling the DLP API's infoTypes.list method.
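As a rough sketch, the infoTypes.list method can be reached over REST. The snippet below only builds the request URL with the Python standard library. The base URL, the `infoTypes` path, and the `languageCode` query parameter are assumptions based on the v2 REST surface and should be checked against the current API reference; `build_list_info_types_url` is a hypothetical helper name. Sending the request also requires credentials, which are omitted here.

```python
from urllib.parse import urlencode

# Assumed base URL for the v2 REST surface; verify against the API reference.
DLP_BASE_URL = "https://dlp.googleapis.com/v2"


def build_list_info_types_url(language_code="en-US"):
    """Return the URL for an infoTypes.list request (hypothetical helper)."""
    # languageCode is assumed to select the language of the returned display names.
    query = urlencode({"languageCode": language_code})
    return f"{DLP_BASE_URL}/infoTypes?{query}"


url = build_list_info_types_url()
print(url)
```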

Built-in infoType detectors are not 100% accurate, and they can't guarantee compliance with regulatory requirements. You must decide what data is sensitive and how best to protect it. Google recommends that you test your settings to make sure your configuration meets your requirements.

Custom infoType detectors

There are two kinds of custom infoType detectors:

  • Dictionaries
  • Regular expressions (regex)

In addition, the DLP API includes the following detector extension, which allows you to fine-tune results by adjusting the likelihood based on other content in the vicinity of a potential finding:

  • Hotword rules

API overview

In the DLP API, custom infoType detectors are defined in the CustomInfoType object, within the InspectConfig object. The CustomInfoType object allows you to create a custom infoType detector for new content or to fine-tune the results returned by existing infoType detectors.

You can use custom infoType detectors in inspection (the projects.content.inspect method) or de-identification (the projects.content.deidentify method), and in DLP jobs and their templates.
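To make the nesting concrete, here is a minimal sketch of a JSON request body for an inspection call, built as a plain Python dictionary. It places two CustomInfoType entries inside InspectConfig: one regex-based and one dictionary-based. The field names follow the v2 REST representation as an assumption to verify against the API reference; the infoType names `MEDICAL_RECORD_NUMBER` and `CLINIC_NAME`, the dictionary words, and the helper `build_inspect_request` are hypothetical.

```python
import json


def build_inspect_request(text):
    """Build an inspection request body with two custom infoTypes (sketch)."""
    return {
        # The content to inspect.
        "item": {"value": text},
        "inspectConfig": {
            "customInfoTypes": [
                {
                    # Hypothetical name for a regex-based custom infoType.
                    "infoType": {"name": "MEDICAL_RECORD_NUMBER"},
                    "regex": {"pattern": "[1-9]{3}-[1-9]{1}-[1-9]{5}"},
                    # Likelihood assigned to every match of this detector.
                    "likelihood": "POSSIBLE",
                },
                {
                    # Hypothetical name for a dictionary-based custom infoType.
                    "infoType": {"name": "CLINIC_NAME"},
                    "dictionary": {
                        "wordList": {"words": ["Example Clinic", "Sample Hospital"]}
                    },
                },
            ],
        },
    }


body = build_inspect_request("Patient 123-4-56789 visited Example Clinic.")
print(json.dumps(body, indent=2))
```

The same `inspectConfig` structure would be reused wherever an inspection configuration is accepted, such as in a de-identification request or a template.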

Dictionaries

Use custom dictionaries to match a list of words or phrases. A dictionary can act as its own unique detector.

For more details about how dictionary custom infoType detectors work, as well as examples in action, see Creating a dictionary custom infoType detector.

Regular expressions

A regular expression (regex) custom infoType detector lets the DLP API detect matches based on a regex pattern that you define. For example, suppose that you have medical record numbers in the form ###-#-#####. You could define a regex pattern such as the following:

[1-9]{3}-[1-9]{1}-[1-9]{5}

The DLP API would then match items like the following:

123-4-56789
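The pattern above can be tried locally before it is placed in a detector. The following sketch uses Python's `re` module, which is not the engine the DLP API runs, but this pattern uses only constructs common to most regex dialects. Note that the character class `[1-9]` excludes the digit 0, so a number containing a zero does not match.

```python
import re

# The pattern from the example above, compiled with Python's re module.
MRN_PATTERN = re.compile(r"[1-9]{3}-[1-9]{1}-[1-9]{5}")

samples = ["123-4-56789", "120-4-56789", "12-34-56789"]
results = {}
for sample in samples:
    # fullmatch requires the whole string to fit the pattern.
    results[sample] = MRN_PATTERN.fullmatch(sample) is not None
    print(f"{sample}: {results[sample]}")
```

If record numbers can contain zeros, a class such as `[0-9]` would be the natural adjustment.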

You can also specify a likelihood to assign to each custom infoType match. That is, when the DLP API matches the sequence you specify, it assigns the likelihood that you indicated. This is useful when your custom regex defines a sequence common enough that it could easily match unrelated text. In that case, you would not want the DLP API to label every match as VERY_LIKELY, because doing so would erode confidence in scan results and could cause the wrong information to be de-identified.

For more information about regular expression custom infoType detectors, and to see them in action, see Creating a regex custom infoType detector.

Hotword rules

Hotword rules allow you to further extend dictionary and regex custom infoType detectors with context rules. Suppose you want to detect a custom infoType such as a medical record number in the form ###-#-#####, and you want to boost the match likelihood of a finding when the hotword "MRN" appears before (but not after) the number. In that case:

  • 123-4-56789 would match as POSSIBLE.
  • MRN 123-4-56789 would match as VERY_LIKELY.
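The behavior in the two cases above can be imitated locally to show the idea. The sketch below is a plain Python simulation, not the DLP API's implementation: it looks for a hotword in a fixed window of characters before each regex match and raises the likelihood when one is found. The window size `WINDOW_BEFORE` and the function name `find_with_hotword` are hypothetical.

```python
import re

MRN_PATTERN = re.compile(r"[1-9]{3}-[1-9]{1}-[1-9]{5}")
HOTWORD_PATTERN = re.compile(r"MRN")
WINDOW_BEFORE = 10  # hypothetical: characters inspected before each match


def find_with_hotword(text):
    """Return (match, likelihood) pairs, boosted when a hotword precedes the match."""
    findings = []
    for match in MRN_PATTERN.finditer(text):
        # Only text before the match is examined, so a trailing hotword is ignored.
        window_start = max(0, match.start() - WINDOW_BEFORE)
        preceding = text[window_start:match.start()]
        if HOTWORD_PATTERN.search(preceding):
            likelihood = "VERY_LIKELY"
        else:
            likelihood = "POSSIBLE"
        findings.append((match.group(), likelihood))
    return findings


for example in ["123-4-56789", "MRN 123-4-56789", "123-4-56789 MRN"]:
    print(example, "->", find_with_hotword(example))
```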

Hotword rules enable the DLP API to make this distinction. To learn how to configure them, see Customizing match likelihood.
