Image inspection and redaction

Sensitive Data Protection can inspect for and redact sensitive text from an image according to criteria that you specify.

Using infoType detectors and optical character recognition (OCR), Sensitive Data Protection inspects a base64-encoded image for text and detects sensitive data within the text. It can then return information about the location of sensitive data within the image, or redact the sensitive data by masking it with an opaque rectangle.

Inspection and redaction are two distinct actions:

  • Inspection: Sensitive Data Protection inspects the submitted base64-encoded image for the specified infoTypes. It returns the detected infoTypes, along with one or more sets of pixel coordinates and dimensions. Each set of pixel coordinate and dimension values indicates the bottom-left corner and the dimensions of a bounding box, respectively. Each bounding box corresponds to all or part of a Sensitive Data Protection finding.
  • Redaction: Sensitive Data Protection inspects the submitted base64-encoded image for the specified infoTypes. Sensitive Data Protection redacts any sensitive data findings by masking them with opaque rectangles. It returns the redacted base64-encoded image in the same image format as the original image. You can also configure the color of the redaction boxes in the request.

About inspection

Sensitive Data Protection's image inspection takes a base64-encoded image, recognizes any text in the image, and then searches the text for any data that matches its inspection criteria. Finally, Sensitive Data Protection returns the locations of any sensitive data that it detected.

Consider the following image. This image is an example of a typical image file generated from a scan of a paper document.

Original unredacted image.

If you instruct Sensitive Data Protection to inspect this image for US Social Security numbers, it goes through the process illustrated in the following diagram.

Image inspection process.
  1. The base64-encoded image is streamed to Sensitive Data Protection using the content.inspect method.
  2. Using optical character recognition (OCR), Sensitive Data Protection recognizes text in the document.
  3. Sensitive Data Protection scans the recognized text using the sensitive data detection configuration you set previously and identifies any matches.
  4. Sensitive Data Protection returns the coordinates and dimensions of the regions within the image where it found sensitive data according to your detection criteria.
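
The first step above can be sketched in Python as building the JSON body for a content.inspect call. This is a minimal sketch, not a definitive client: the helper name build_inspect_request is hypothetical, the field names (item.byteItem, inspectConfig.infoTypes) and the IMAGE_PNG type value are assumed to follow the v2 REST request format, and the placeholder bytes stand in for a real image file. No network call is made.

```python
import base64
import json


def build_inspect_request(image_bytes, info_types, image_type="IMAGE_PNG"):
    """Build a JSON-serializable body for a content.inspect call.

    Hypothetical helper: the field names are assumed to match the
    v2 REST request format and should be checked against the API reference.
    """
    return {
        "item": {
            "byteItem": {
                "type": image_type,
                # The image travels as base64 text inside the JSON body.
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        },
        "inspectConfig": {
            # One entry per infoType detector to run on the recognized text.
            "infoTypes": [{"name": name} for name in info_types],
        },
    }


# Placeholder bytes stand in for the contents of a real scanned image.
body = build_inspect_request(b"\x89PNG placeholder", ["US_SOCIAL_SECURITY_NUMBER"])
print(json.dumps(body, indent=2))
```

In a real request, this body would be sent with an authenticated POST to the content.inspect endpoint of your project.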

The returned coordinates indicate where to find the sensitive data. Be aware that Sensitive Data Protection often uses multiple boxes to indicate where a single instance of sensitive data is in the image. This is especially true when the text is written by hand, as in this example.
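
Because one finding can span several boxes, a caller may want a single rectangle that encloses all of them. The following sketch shows one way to compute it. It assumes the response nests boxes under result.findings[].location.contentLocations[].imageLocation.boundingBoxes[] with top, left, width, and height fields, as in the v2 REST format; the function name and the sample response are hypothetical.

```python
def enclosing_boxes(response):
    """Return one (info_type, left, top, width, height) tuple per finding.

    Assumes each box carries `top`, `left`, `width`, and `height` in pixels.
    Missing fields are treated as 0, since JSON responses can omit zero values.
    """
    merged = []
    for finding in response.get("result", {}).get("findings", []):
        boxes = [
            box
            for loc in finding.get("location", {}).get("contentLocations", [])
            for box in loc.get("imageLocation", {}).get("boundingBoxes", [])
        ]
        if not boxes:
            continue
        # Smallest and largest extents along each axis across all boxes.
        x_min = min(b.get("left", 0) for b in boxes)
        y_min = min(b.get("top", 0) for b in boxes)
        x_max = max(b.get("left", 0) + b.get("width", 0) for b in boxes)
        y_max = max(b.get("top", 0) + b.get("height", 0) for b in boxes)
        merged.append(
            (finding["infoType"]["name"], x_min, y_min, x_max - x_min, y_max - y_min)
        )
    return merged


# Hypothetical response: one handwritten number reported as two boxes.
sample_response = {
    "result": {
        "findings": [
            {
                "infoType": {"name": "US_SOCIAL_SECURITY_NUMBER"},
                "location": {
                    "contentLocations": [
                        {
                            "imageLocation": {
                                "boundingBoxes": [
                                    {"top": 100, "left": 50, "width": 40, "height": 20},
                                    {"top": 98, "left": 95, "width": 60, "height": 24},
                                ]
                            }
                        }
                    ]
                },
            }
        ]
    }
}

print(enclosing_boxes(sample_response))
# [('US_SOCIAL_SECURITY_NUMBER', 50, 98, 105, 24)]
```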

If Sensitive Data Protection doesn't find any data in the image that corresponds to your detection criteria, it returns an empty, successful HTTP 200 response.
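
Client code therefore should not treat a missing findings list as an error. The sketch below is one defensive way to read the parsed response body. The function name findings_of is hypothetical, and the result and findings keys are assumed from the v2 REST response format.

```python
def findings_of(response):
    """Return the list of findings, or an empty list when nothing matched.

    Tolerates a completely empty body as well as a body whose `result`
    object has no `findings` key (both key names are assumptions).
    """
    return (response.get("result") or {}).get("findings") or []


for parsed in ({}, {"result": {}}, {"result": {"findings": [{"quote": "x"}]}}):
    found = findings_of(parsed)
    print("no sensitive data detected" if not found else f"{len(found)} finding(s)")
```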

About redaction

Image redaction follows the same process as image inspection, but the final step differs. After Sensitive Data Protection identifies the locations of sensitive data within the image, it doesn't return the coordinates of the areas that contain the data. Instead, it fills those areas on the image and returns a redacted, base64-encoded image.

Again consider the original image from the previous section. If you instruct Sensitive Data Protection to redact all US Social Security numbers from the image, it goes through the process illustrated in the following diagram.

Image redaction process.
  1. The base64-encoded image is streamed to Sensitive Data Protection using the image.redact method.
  2. Using optical character recognition (OCR), Sensitive Data Protection recognizes text in the document.
  3. Sensitive Data Protection scans the recognized text using the sensitive data detection configuration you set previously and identifies any matches.
  4. Sensitive Data Protection redacts all detected sensitive data by covering it with an opaque rectangle. It then encodes the image in base64 and returns it in the response.

If Sensitive Data Protection doesn't find any data in the image that corresponds to your detection criteria, it returns the base64-encoded image unchanged.
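
The redaction round trip can be sketched as building the request body, including an optional box color, and decoding the image from the response. This is a sketch under stated assumptions: the helper names are hypothetical, the field names (byteItem, inspectConfig, imageRedactionConfigs, redactionColor, redactedImage) are assumed to follow the v2 REST format, color channels are assumed to be floats from 0 to 1, and the response shown is fabricated locally for illustration rather than returned by the service.

```python
import base64


def build_redact_request(image_bytes, info_types, rgb=(0.0, 0.0, 0.0),
                         image_type="IMAGE_PNG"):
    """Build a JSON-serializable body for an image.redact call.

    Hypothetical helper; `rgb` holds channel values assumed to be in [0, 1].
    """
    red, green, blue = rgb
    return {
        "byteItem": {
            "type": image_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
        "inspectConfig": {"infoTypes": [{"name": n} for n in info_types]},
        # One redaction rule per infoType, all using the same box color.
        "imageRedactionConfigs": [
            {
                "infoType": {"name": n},
                "redactionColor": {"red": red, "green": green, "blue": blue},
            }
            for n in info_types
        ],
    }


def decode_redacted_image(response):
    """Decode the base64 image carried in the (assumed) `redactedImage` field."""
    return base64.b64decode(response["redactedImage"])


request_body = build_redact_request(
    b"\x89PNG placeholder", ["US_SOCIAL_SECURITY_NUMBER"], rgb=(0.5, 0.5, 0.5)
)

# Stand-in for a service response; a real one comes back from image.redact.
fake_response = {"redactedImage": base64.b64encode(b"redacted bytes").decode("ascii")}
image_out = decode_redacted_image(fake_response)
print(len(image_out), "bytes ready to write to disk")
```

The decoded bytes can be written directly to a file with the same extension as the original image, since the service returns the same image format it received.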

What's next