Sensitive Data Protection can inspect for and redact sensitive text from an image according to criteria that you specify.
Using infoType detectors and optical character recognition (OCR), Sensitive Data Protection inspects a base64-encoded image for text and detects sensitive data within the text. It can then return information about the location of sensitive data within the image, or redact the sensitive data by masking it with an opaque rectangle.
Inspection and redaction are two distinct actions:
- Inspection: Sensitive Data Protection inspects the submitted base64-encoded image for the specified infoTypes. It returns the detected infoTypes, along with one or more sets of pixel coordinates and dimensions. Each set of pixel coordinate and dimension values indicates the bottom-left corner and the dimensions of a bounding box, respectively. Each bounding box corresponds to all or part of a Sensitive Data Protection finding.
- Redaction: Sensitive Data Protection inspects the submitted base64-encoded image for the specified infoTypes. Sensitive Data Protection redacts any sensitive data findings by masking them with opaque rectangles. It returns the redacted base64-encoded image in the same image format as the original image. You can also configure the color of the redaction boxes in the request.
About inspection
Sensitive Data Protection's image inspection takes a base64-encoded image, recognizes any text in the image, and then searches the text for any data that matches its inspection criteria. Finally, Sensitive Data Protection returns the locations of any sensitive data that it detected.
Consider the following image. This image is an example of a typical image file generated from a scan of a paper document.
If you instruct Sensitive Data Protection to inspect this image for US Social Security numbers, it goes through the process illustrated in the following diagram.
- The base64-encoded image is streamed to Sensitive Data Protection using the content.inspect method.
- Using optical character recognition (OCR), Sensitive Data Protection recognizes text in the document.
- Sensitive Data Protection scans the recognized text using the sensitive data detection configuration you set previously and identifies any matches.
- Sensitive Data Protection returns the coordinates and dimensions of the regions within the image where it found sensitive data according to your detection criteria.
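The first step above can be sketched as a request body. This is a minimal illustration, not an official client: the helper name build_inspect_request is hypothetical, and the field names (item, byteItem, type, data, inspectConfig, infoTypes) and the IMAGE_PNG and US_SOCIAL_SECURITY_NUMBER values reflect my understanding of the v2 REST API, so verify them against the API reference before use. The sketch only builds the JSON body and does not send it.

```python
import base64
import json


def build_inspect_request(image_bytes, info_type_names, image_type="IMAGE_PNG"):
    """Build a JSON-serializable body for inspecting an image (hypothetical helper)."""
    return {
        "item": {
            "byteItem": {
                # Assumed enum value; other image formats use other values.
                "type": image_type,
                # JSON cannot carry raw bytes, so the image travels as base64 text.
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        },
        "inspectConfig": {
            # One entry per infoType detector to run on the recognized text.
            "infoTypes": [{"name": name} for name in info_type_names],
        },
    }


# Placeholder bytes stand in for the contents of a real scanned image file.
placeholder_image = b"\x89PNG\r\n\x1a\n placeholder"
body = build_inspect_request(placeholder_image, ["US_SOCIAL_SECURITY_NUMBER"])
print(json.dumps(body, indent=2))
```

In practice the bytes would come from reading the image file in binary mode, and the body would be sent with an authenticated HTTP client or a client library.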
The returned coordinates indicate where to find the sensitive data. Be aware that Sensitive Data Protection often uses multiple boxes to indicate where a single instance of sensitive data is in the image. This is especially true when the text is written by hand, as in this example.
If Sensitive Data Protection doesn't find any data in the image that corresponds to your detection criteria, it returns an empty, successful HTTP 200 response.
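The following sketch shows one way to read an inspection response, including the case where a single finding has several bounding boxes and the case where nothing is found. The response shape (result, findings, location, contentLocations, imageLocation, boundingBoxes, with top, left, width, and height fields) is assumed from the v2 REST API as I understand it. The sample response is hand-written for illustration and is not real service output, and collect_bounding_boxes is a hypothetical helper.

```python
def collect_bounding_boxes(response):
    """Return one entry per finding, each with all of its bounding boxes.

    Returns an empty list when the response contains no findings.
    """
    findings = response.get("result", {}).get("findings", [])
    collected = []
    for finding in findings:
        boxes = []
        locations = finding.get("location", {}).get("contentLocations", [])
        for content_location in locations:
            image_location = content_location.get("imageLocation", {})
            for box in image_location.get("boundingBoxes", []):
                # Assumption: zero-valued fields may be omitted from the JSON,
                # so missing keys default to 0.
                boxes.append(
                    (
                        box.get("left", 0),
                        box.get("top", 0),
                        box.get("width", 0),
                        box.get("height", 0),
                    )
                )
        collected.append(
            {"info_type": finding.get("infoType", {}).get("name"), "boxes": boxes}
        )
    return collected


# Hand-written sample: one finding covered by two boxes (illustrative values).
sample_response = {
    "result": {
        "findings": [
            {
                "infoType": {"name": "US_SOCIAL_SECURITY_NUMBER"},
                "location": {
                    "contentLocations": [
                        {
                            "imageLocation": {
                                "boundingBoxes": [
                                    {"top": 64, "left": 120, "width": 40, "height": 18},
                                    {"top": 64, "left": 165, "width": 30, "height": 18},
                                ]
                            }
                        }
                    ]
                },
            }
        ]
    }
}

for entry in collect_bounding_boxes(sample_response):
    print(entry["info_type"], entry["boxes"])

# An empty response yields no findings rather than an error.
print(collect_bounding_boxes({}))
```

Grouping boxes by finding keeps the boxes that belong to one piece of sensitive data together, which helps when handwritten text is split across several boxes.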
About redaction
Image redaction is identical to image inspection, with one additional step. Once Sensitive Data Protection has identified the locations of sensitive data within the image, instead of returning the coordinates of the areas that contain the data, it fills those areas on the image and returns a redacted, base64-encoded image.
Again consider the original image from the previous section. If you instruct Sensitive Data Protection to redact all US Social Security numbers from the image, it goes through the process illustrated in the following diagram.
- The base64-encoded image is streamed to Sensitive Data Protection using the image.redact method.
- Using optical character recognition (OCR), Sensitive Data Protection recognizes text in the document.
- Sensitive Data Protection scans the recognized text using the sensitive data detection configuration you set previously and identifies any matches.
- Sensitive Data Protection redacts all detected sensitive data by covering it with an opaque rectangle. It then encodes the image in base64 and returns it in the response.
If Sensitive Data Protection doesn't find any data in the image that corresponds to your detection criteria, it returns the base64-encoded image unchanged.
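As a rough sketch of the redaction flow, the following builds a redaction request body with a configurable box color and decodes the image from a response. The helper names are hypothetical. The field names (byteItem, inspectConfig, imageRedactionConfigs, redactionColor, redactedImage) and the convention that color channels are values between 0 and 1 are assumptions based on my reading of the v2 REST API, so check them against the API reference. No request is sent here.

```python
import base64


def build_redact_request(image_bytes, info_type_names, rgb=(0.0, 0.0, 0.0),
                         image_type="IMAGE_PNG"):
    """Build a JSON-serializable body for redacting an image (hypothetical helper)."""
    red, green, blue = rgb
    return {
        "byteItem": {
            "type": image_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
        "inspectConfig": {
            "infoTypes": [{"name": name} for name in info_type_names],
        },
        # One redaction config per infoType, all using the same box color.
        "imageRedactionConfigs": [
            {
                "infoType": {"name": name},
                "redactionColor": {"red": red, "green": green, "blue": blue},
            }
            for name in info_type_names
        ],
    }


def decode_redacted_image(response):
    """Decode the base64 image from a redaction response into raw bytes."""
    return base64.b64decode(response["redactedImage"])


# Placeholder bytes stand in for a real image; gray boxes instead of black.
request_body = build_redact_request(
    b"placeholder image bytes", ["US_SOCIAL_SECURITY_NUMBER"], rgb=(0.5, 0.5, 0.5)
)
print(request_body["imageRedactionConfigs"])
```

Because the image is returned in its original format, the decoded bytes can be written to a file with the same extension as the input.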
What's next
- Learn how to inspect images for sensitive data using Sensitive Data Protection.
- Learn how to redact sensitive data from images using Sensitive Data Protection.
- Learn more about creating a de-identified copy of data in storage.