Redacting sensitive data from images

The Cloud Data Loss Prevention (DLP) API can redact sensitive text from an image. The API detects sensitive data, such as personally identifiable information (PII), and then obscures it with an opaque rectangle.

For example, consider the following image:

Image before redaction

After submitting the image to the DLP API using default settings, the image appears as follows:

Image after redaction

The API’s image.redact method takes the following as arguments:

  • A base64-encoded image (byteItem). The DLP API currently supports the following image types: IMAGE_JPEG, IMAGE_BMP, and IMAGE_PNG.
  • An inspectConfig argument (required), which specifies detection configuration information (InspectConfig) such as which infoTypes to look for.
  • An imageRedactionConfigs[] argument (optional), which specifies redaction configuration information (ImageRedactionConfig) such as which color to use to obscure redacted text, and whether to redact only text of certain infoTypes or all text.
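
The three arguments above can be assembled into a request body as in the following minimal Python sketch. It uses only the standard library and does not call the API. The helper name build_redact_request, the extension-to-type table, and the file name scan.png are hypothetical and exist only for illustration; the field names follow the JSON sample on this page.

```python
import base64
import json
import os

# Hypothetical lookup table: file extension -> BytesType name listed above.
BYTES_TYPES = {
    ".jpg": "IMAGE_JPEG",
    ".jpeg": "IMAGE_JPEG",
    ".bmp": "IMAGE_BMP",
    ".png": "IMAGE_PNG",
}


def build_redact_request(filename, image_bytes, info_type_names):
    """Return a dict shaped like the image.redact JSON request body."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in BYTES_TYPES:
        raise ValueError("unsupported image type: " + filename)
    info_types = [{"name": name} for name in info_type_names]
    return {
        "byteItem": {
            "type": BYTES_TYPES[extension],
            # JSON cannot carry raw bytes, so the image is base64-encoded.
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
        "inspectConfig": {"infoTypes": info_types},
        # One redaction config per infoType; redactionColor is omitted here.
        "imageRedactionConfigs": [{"infoType": t} for t in info_types],
    }


# Placeholder bytes stand in for a real image file.
body = build_redact_request("scan.png", b"\x89PNG fake bytes", ["PHONE_NUMBER"])
print(json.dumps(body, indent=2))
```

Rejecting unknown extensions up front is a design choice of this sketch, not documented API behavior; it simply avoids sending a request with an unspecified image type.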

The API returns the image you gave it, in the same format, with any text identified as containing sensitive information according to your criteria redacted.

REST example

The following JSON includes a base64-encoded image and sets several parameters for using the image redaction functionality of the DLP API. The image before redaction is shown here:

Image before redaction

The image after redaction is shown here:

Image after redaction

See the JSON quickstart for more information about using the DLP API with JSON.

Sample input:

{
  "byteItem": 
  {
    "data": "/9j/4AAQSkZJRgABAQAASABIAAD/4QBARXhpZgAATU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAAqACAAQAAAABAAAAe6ADAAQAAAABAAAADAAAAAD/7QA4UGhvdG9zaG9wIDMuMAA4QklNBAQAAAAAAAA4QklNBCUAAAAAABDUHYzZjwCyBOmACZjs+EJ+/8AAEQgADAB7AwEiAAIRAQMRAf/EAB8AAAEFAQEBAQEBAAAAAAAAAAABAgMEBQYHCAkKC//EALUQAAIBAwMCBAMFBQQEAAABfQECAwAEEQUSITFBBhNRYQcicRQygZGhCCNCscEVUtHwJDNicoIJChYXGBkaJSYnKCkqNDU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6g4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2drh4uPk5ebn6Onq8fLz9PX29/j5+v/EAB8BAAMBAQEBAQEBAQEAAAAAAAABAgMEBQYHCAkKC//EALURAAIBAgQEAwQHBQQEAAECdwABAgMRBAUhMQYSQVEHYXETIjKBCBRCkaGxwQkjM1LwFWJy0QoWJDThJfEXGBkaJicoKSo1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpzdHV2d3h5eoKDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uLj5OXm5+jp6vLz9PX29/j5+v/bAEMAHBwcHBwcMBwcMEQwMDBEXERERERcdFxcXFxcdIx0dHR0dHSMjIyMjIyMjKioqKioqMTExMTE3Nzc3Nzc3Nzc3P/bAEMBIiQkODQ4YDQ0YOacgJzm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5ubm5v/dAAQACP/aAAwDAQACEQMRAD8A2UZ5JJGLlQjY2jGMAA85Heojcs4G35cMQcd/k3DqBVwwxM/mMilvUjmhYYV+6ij6AUAVZppFtk8s4kcAA4z2yTikW5LsHGQGWPj0LMQavbVGCAOBge1NEcY6KPXp75/nzQBWN03lLJsz5h+UAknoTzgcdKklkIVQuQW/PsP5mpPJhwRsXBOTwOTQ8SOoU8Y6UARQSNIh3H6Hj/8AVUQeUhV3nDvhWwM7dufTHUenSrYjULsxkHrwOab5EITywi7TzjAxQBTS4kdkL7gu1M4xgljjnPOPpVmR3dCIdwYEdsHGecbhjpUvlxkhtoyvQ46U5lVxtcAj0NAFMyMfKkDt5ZA5wvJJGM8fyp6O6yuJS3cgYBGM9sDNT+XHkHaPl6cdPpShEUllABPU+tAFSeZ8ZjJUBGbkYyVxxyOnNOHmNcYWRsAZZflwM9B0z79asPGkmN6hscjIzTtq88D5uvvQBRV5ipGW+V8HgFgMZ7DHX9KtQO0kCSN1ZQTS+TDt2bF29cY4qTpwKAP/2Q==",
    "type": "IMAGE_JPEG"
  },
  "imageRedactionConfigs": 
  [
    {
      "infoType": 
      {
        "name": "PHONE_NUMBER"
      },
      "redactionColor": 
      {
        "blue": 0.1,
        "green": 0.1,
        "red": 0.8
      }
    },
    {
      "infoType": 
      {
        "name": "PERSON_NAME"
      },
      "redactionColor": 
      {
        "blue": 0.1,
        "green": 0.8,
        "red": 0.1
      }
    }
  ],
  "inspectConfig": 
  {
    "infoTypes": 
    [
      {
        "name": "PHONE_NUMBER"
      },
      {
        "name": "PERSON_NAME"
      }
    ]
  }
}
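
In the sample input, each redactionColor channel is written as a fraction (0.1 or 0.8) rather than as a 0 to 255 value. Assuming channels are expected in the range 0 to 1, as those values suggest, the following sketch converts a familiar hex color into that shape. The helper name redaction_color is hypothetical.

```python
def redaction_color(hex_color):
    """Convert '#RRGGBB' into a redactionColor dict with channels in [0, 1]."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a color like '#CC1A1A'")
    # Parse each two-digit hex pair into an integer between 0 and 255.
    red, green, blue = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return {
        "red": round(red / 255, 3),
        "green": round(green / 255, 3),
        "blue": round(blue / 255, 3),
    }


# Roughly the red used for PHONE_NUMBER in the sample input.
config = {
    "infoType": {"name": "PHONE_NUMBER"},
    "redactionColor": redaction_color("#CC1A1A"),
}
print(config)
```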

URL:

POST https://dlp.googleapis.com/v2/{parent=projects/*}/image:redact
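
As a rough illustration of how the URL template expands, the following sketch fills in a project ID and prepares (but does not send) the POST request using Python's standard library. The helper name, the project ID my-project, and the ACCESS_TOKEN placeholder are hypothetical, and authenticating with an OAuth 2.0 bearer token is an assumption not stated on this page.

```python
import json
import urllib.request


def build_redact_http_request(project_id, body, access_token):
    """Prepare a POST request for image:redact without sending it."""
    url = "https://dlp.googleapis.com/v2/projects/{}/image:redact".format(
        project_id
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json; charset=utf-8",
            # Assumption: the caller already holds a valid bearer token.
            "Authorization": "Bearer " + access_token,
        },
        method="POST",
    )


req = build_redact_http_request("my-project", {"byteItem": {}}, "ACCESS_TOKEN")
print(req.get_method(), req.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it; that step is left out so the sketch runs without credentials or network access.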

Sample output:

{
 "redactedImage": "/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAMAHsDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD1C0mu9T1PWJpdSubWOxu/KFrD5fliNY43+femRuyfoD1qjL4nubwRNb/6MI7p4ZBFz5v+hm4T/WRp7fl+XRzaTpdxqYvZ9LspbsDHnvCC/wCdWLbQtItx/o+lWMP/AFzt0X+lAHkutyvPqKzSvvkktrd3fGNzGFCTjtk1nVu+MIkg8U3cUShI0Eaqo6ACNawq+RxP8afq/wAz4XGf7xU/xP8AMK9Z1TUpore2ijM0UlwOMf6wcpH/AOhyivJq9o1XTLWfT0RkIWDmPB6V6mT/AG/l+p7OQ/8ALz5fqeaeO9SuNQ8D27Tv5gF+myT5eR5cn935fyry+vYPifbx2/giw8tVBe8RnIRQWPlvycAV4/XRiv4h+r5F/ua9WWbC7WyuHlcyANDLF+7xnLxsg69stz3xnHNVq09AVW1GUMoYfYrs4I7i3kxWZWH2T1F/Ffov1CiiipNTT16RZNQiZd+BZWq/MpU5FvGDwR044Pcc81mVreIv+QnD/wBeNn/6TR1k1UviZlQ/hR9EFfS3hFi/g7RiwGfscQ/AKAK+aa+mfCyj/hEdF4/5cYP/AEWK6sH8TPB4j/hQ9T//2Q=="
}
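
The redactedImage value in the response is base64 text, so it has to be decoded before it can be opened as an image. The following sketch shows one way to do that with the standard library. The helper name save_redacted_image is hypothetical, and a small stand-in response replaces the full sample output.

```python
import base64
import json
import os
import tempfile


def save_redacted_image(response_text, output_path):
    """Decode the redactedImage field and write it to output_path."""
    response = json.loads(response_text)
    image_bytes = base64.b64decode(response["redactedImage"])
    with open(output_path, "wb") as output_file:
        output_file.write(image_bytes)
    return len(image_bytes)


# Stand-in response: a few bytes beginning with the JPEG start-of-image marker.
fake_response = json.dumps(
    {"redactedImage": base64.b64encode(b"\xff\xd8\xff\xe0demo").decode("ascii")}
)
output_path = os.path.join(tempfile.mkdtemp(), "redacted.jpg")
written = save_redacted_image(fake_response, output_path)
print("Wrote", written, "bytes to", output_path)
```

The sample output begins with "/9j/", which is how the JPEG start-of-image bytes look once base64-encoded, consistent with the response using the same format as the JPEG input.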

Code examples

Following is sample code in several languages that demonstrates how to use the DLP API to redact sensitive text from an image.

Java

/**
 * Redact sensitive data from an image using the Data Loss Prevention API.
 *
 * @param filePath The path to a local file to inspect. Can be a JPG, BMP, or PNG image file.
 * @param minLikelihood The minimum likelihood required before redacting a match.
 * @param infoTypes The infoTypes of information to redact.
 * @param outputPath The local path to save the resulting image to.
 * @param projectId The project ID to run the API call under.
 */
private static void redactImage(
    String filePath,
    Likelihood minLikelihood,
    List<InfoType> infoTypes,
    String outputPath,
    String projectId)
    throws Exception {

  // Instantiate the DLP client
  try (DlpServiceClient dlpClient = DlpServiceClient.create()) {
    String mimeType = URLConnection.guessContentTypeFromName(filePath);
    if (mimeType == null) {
      mimeType = MimetypesFileTypeMap.getDefaultFileTypeMap().getContentType(filePath);
    }

    ByteContentItem.BytesType bytesType;
    switch (mimeType) {
      case "image/jpeg":
        bytesType = ByteContentItem.BytesType.IMAGE_JPEG;
        break;
      case "image/bmp":
        bytesType = ByteContentItem.BytesType.IMAGE_BMP;
        break;
      case "image/png":
        bytesType = ByteContentItem.BytesType.IMAGE_PNG;
        break;
      case "image/svg+xml":
        bytesType = ByteContentItem.BytesType.IMAGE_SVG;
        break;
      default:
        bytesType = ByteContentItem.BytesType.BYTES_TYPE_UNSPECIFIED;
        break;
    }

    byte[] data = Files.readAllBytes(Paths.get(filePath));

    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addAllInfoTypes(infoTypes)
            .setMinLikelihood(minLikelihood)
            .build();

    ByteContentItem byteContentItem =
        ByteContentItem.newBuilder()
            .setType(bytesType)
            .setData(ByteString.copyFrom(data))
            .build();

    List<RedactImageRequest.ImageRedactionConfig> imageRedactionConfigs =
        infoTypes
            .stream()
            .map(
                infoType ->
                    RedactImageRequest.ImageRedactionConfig.newBuilder()
                        .setInfoType(infoType)
                        .build())
            .collect(Collectors.toList());

    RedactImageRequest redactImageRequest =
        RedactImageRequest.newBuilder()
            .setParent(ProjectName.of(projectId).toString())
            .addAllImageRedactionConfigs(imageRedactionConfigs)
            .setByteItem(byteContentItem)
            .setInspectConfig(inspectConfig)
            .build();

    RedactImageResponse redactImageResponse = dlpClient.redactImage(redactImageRequest);

    // Write the redacted image data to the output file; try-with-resources
    // closes the stream even if the write fails.
    ByteString redactedImageData = redactImageResponse.getRedactedImage();
    try (FileOutputStream outputStream = new FileOutputStream(outputPath)) {
      outputStream.write(redactedImageData.toByteArray());
    }
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Imports required Node.js libraries
const mime = require('mime');
const fs = require('fs');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The path to a local file to inspect. Can be a JPG, BMP, or PNG image file.
// const filepath = 'path/to/image.png';

// The minimum likelihood required before redacting a match
// const minLikelihood = 'LIKELIHOOD_UNSPECIFIED';

// The infoTypes of information to redact
// const infoTypes = [{ name: 'EMAIL_ADDRESS' }, { name: 'PHONE_NUMBER' }];

// The local path to save the resulting image to.
// const outputPath = 'result.png';

const imageRedactionConfigs = infoTypes.map(infoType => {
  return {infoType: infoType};
});

// Load image
// Map the MIME type to its BytesType enum value (0 means unspecified)
const fileTypeConstant =
  ['image/jpeg', 'image/bmp', 'image/png', 'image/svg+xml'].indexOf(
    mime.getType(filepath)
  ) + 1;
const fileBytes = Buffer.from(fs.readFileSync(filepath)).toString('base64');

// Construct image redaction request
const request = {
  parent: dlp.projectPath(callingProjectId),
  byteItem: {
    type: fileTypeConstant,
    data: fileBytes,
  },
  inspectConfig: {
    minLikelihood: minLikelihood,
    infoTypes: infoTypes,
  },
  imageRedactionConfigs: imageRedactionConfigs,
};

// Run image redaction request
dlp
  .redactImage(request)
  .then(response => {
    const image = response[0].redactedImage;
    fs.writeFileSync(outputPath, image);
    console.log(`Saved image redaction results to path: ${outputPath}`);
  })
  .catch(err => {
    console.log(`Error in redactImage: ${err.message || err}`);
  });

Python

def redact_image(project, filename, output_filename,
                 info_types, min_likelihood=None, mime_type=None):
    """Uses the Data Loss Prevention API to redact protected data in an image.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        filename: The path to the file to inspect.
        output_filename: The path to which the redacted image will be written.
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        min_likelihood: A string representing the minimum likelihood threshold
            that constitutes a match. One of: 'LIKELIHOOD_UNSPECIFIED',
            'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE', 'LIKELY', 'VERY_LIKELY'.
        mime_type: The MIME type of the file. If not specified, the type is
            inferred via the Python standard library's mimetypes module.
    Returns:
        None; the redacted image is written to output_filename and a
        summary line is printed to the terminal.
    """
    # Import the client library and the standard-library MIME type helper
    # used below to guess the file type.
    import mimetypes

    import google.cloud.dlp

    # Instantiate a client.
    dlp = google.cloud.dlp.DlpServiceClient()

    # Prepare info_types by converting the list of strings into a list of
    # dictionaries (protos are also accepted).
    info_types = [{'name': info_type} for info_type in info_types]

    # Prepare image_redaction_configs, a list of dictionaries. Each dictionary
    # contains an info_type and optionally the color used for the replacement.
    # The color is omitted in this sample, so the default (black) will be used.
    image_redaction_configs = []

    if info_types is not None:
        for info_type in info_types:
            image_redaction_configs.append({'info_type': info_type})

    # Construct the configuration dictionary. Keys which are None may
    # optionally be omitted entirely.
    inspect_config = {
        'min_likelihood': min_likelihood,
        'info_types': info_types,
    }

    # If mime_type is not specified, guess it from the filename.
    if mime_type is None:
        mime_guess = mimetypes.MimeTypes().guess_type(filename)
        mime_type = mime_guess[0] or 'application/octet-stream'

    # Select the content type index from the list of supported types.
    supported_content_types = {
        None: 0,  # "Unspecified"
        'image/jpeg': 1,
        'image/bmp': 2,
        'image/png': 3,
        'image/svg+xml': 4,
        'text/plain': 5,
    }
    content_type_index = supported_content_types.get(mime_type, 0)

    # Construct the byte_item, containing the file's byte data.
    with open(filename, mode='rb') as f:
        byte_item = {'type': content_type_index, 'data': f.read()}

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Call the API.
    response = dlp.redact_image(
        parent, inspect_config=inspect_config,
        image_redaction_configs=image_redaction_configs,
        byte_item=byte_item)

    # Write out the results.
    with open(output_filename, mode='wb') as f:
        f.write(response.redacted_image)
    print("Wrote {byte_count} bytes to {filename}".format(
        byte_count=len(response.redacted_image), filename=output_filename))

Go

// redactImage blacks out the identified portions of the input image (with type bytesType)
// and stores the result in outputPath.
func redactImage(w io.Writer, client *dlp.Client, project string, minLikelihood dlppb.Likelihood, infoTypes []string, bytesType dlppb.ByteContentItem_BytesType, inputPath, outputPath string) {
	// Convert the info type strings to a list of InfoTypes.
	var i []*dlppb.InfoType
	for _, it := range infoTypes {
		i = append(i, &dlppb.InfoType{Name: it})
	}

	// Convert the info type strings to a list of types to redact in the image.
	var ir []*dlppb.RedactImageRequest_ImageRedactionConfig
	for _, it := range infoTypes {
		ir = append(ir, &dlppb.RedactImageRequest_ImageRedactionConfig{
			Target: &dlppb.RedactImageRequest_ImageRedactionConfig_InfoType{
				InfoType: &dlppb.InfoType{Name: it},
			},
		})
	}

	// Read the input file.
	b, err := ioutil.ReadFile(inputPath)
	if err != nil {
		log.Fatalf("error reading file: %v", err)
	}

	// Create a configured request.
	req := &dlppb.RedactImageRequest{
		Parent: "projects/" + project,
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes:     i,
			MinLikelihood: minLikelihood,
		},
		// The item to analyze.
		ByteItem: &dlppb.ByteContentItem{
			Type: bytesType,
			Data: b,
		},
		ImageRedactionConfigs: ir,
	}
	// Send the request.
	resp, err := client.RedactImage(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}
	// Write the output file.
	if err := ioutil.WriteFile(outputPath, resp.GetRedactedImage(), 0644); err != nil {
		log.Fatal(err)
	}
	fmt.Fprintf(w, "Wrote output to %s", outputPath)
}

PHP

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\RedactImageRequest_ImageRedactionConfig;
use Google\Cloud\Dlp\V2\Likelihood;
use Google\Cloud\Dlp\V2\ByteContentItem;

/**
 * Redact sensitive data from an image.
 *
 * @param string $callingProjectId The GCP Project ID to run the API call under
 * @param string $imagePath The local filepath of the image to inspect
 * @param string $outputPath The local filepath to save the resulting image to
 */
function redact_image(
    $callingProjectId,
    $imagePath,
    $outputPath
) {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // The infoTypes of information to match
    $phoneNumberInfoType = (new InfoType())
        ->setName('PHONE_NUMBER');
    $infoTypes = [$phoneNumberInfoType];

    // The minimum likelihood required before redacting a match
    $minLikelihood = Likelihood::LIKELIHOOD_UNSPECIFIED;

    // Create the configuration object
    $inspectConfig = (new InspectConfig())
        ->setMinLikelihood($minLikelihood)
        ->setInfoTypes($infoTypes);

    // Read image file into a buffer
    $imageRef = fopen($imagePath, 'rb');
    $imageBytes = fread($imageRef, filesize($imagePath));
    fclose($imageRef);

    // Get the image's content type
    $typeConstant = (int) array_search(
        mime_content_type($imagePath),
        [false, 'image/jpeg', 'image/bmp', 'image/png', 'image/svg']
    );

    // Create the byte-storing object
    $byteContent = (new ByteContentItem())
        ->setType($typeConstant)
        ->setData($imageBytes);

    // Create the image redaction config objects
    $imageRedactionConfigs = [];
    foreach ($infoTypes as $infoType) {
        $config = (new RedactImageRequest_ImageRedactionConfig())
            ->setInfoType($infoType);
        $imageRedactionConfigs[] = $config;
    }

    $parent = $dlp->projectName($callingProjectId);

    // Run request
    $response = $dlp->redactImage($parent, [
        'inspectConfig' => $inspectConfig,
        'byteItem' => $byteContent,
        'imageRedactionConfigs' => $imageRedactionConfigs
    ]);

    // Save result to file
    file_put_contents($outputPath, $response->getRedactedImage());

    // Print completion message
    print('Redacted image saved to ' . $outputPath . PHP_EOL);
}

C#

        public static object RedactFromImage(string projectId, string originalImagePath, string redactedImagePath)
        {
            var request = new RedactImageRequest
            {
                ParentAsProjectName = new Google.Cloud.Dlp.V2.ProjectName(projectId),
                InspectConfig = new InspectConfig
                {
                    MinLikelihood = Likelihood.Likely,
                    Limits = new InspectConfig.Types.FindingLimits() { MaxFindingsPerItem = 5 },
                    IncludeQuote = true,
                    InfoTypes =
                    {
                        new InfoType { Name = "PHONE_NUMBER" },
                        new InfoType { Name = "EMAIL_ADDRESS" }
                    }
                },
                ByteItem = new ByteContentItem
                {
                    Type = ByteContentItem.Types.BytesType.ImagePng,
                    Data = ByteString.CopyFrom(File.ReadAllBytes(originalImagePath))
                },
            };

            DlpServiceClient client = DlpServiceClient.Create();
            var response = client.RedactImage(request);

            Console.WriteLine($"Extracted text: {response.ExtractedText}");

            // Write the redacted image to a file, closing the stream when done
            using (var outputStream = new FileStream(redactedImagePath, FileMode.Create, FileAccess.Write))
            {
                response.RedactedImage.WriteTo(outputStream);
            }

            return 0;
        }
