Redacting sensitive data from text

Sensitive Data Protection can redact or obfuscate sensitive data from a string of text. You can feed textual information to the API using JSON over HTTP, or use one of the client libraries to do so using several popular programming languages.

The projects.content.deidentify API takes the following as arguments:

  • A string of text.
  • Placeholder text that will replace any sensitive data detected. In this example, the data is replaced with its corresponding infoType.
  • A list of one or more infoTypes that you want to redact.

Sensitive Data Protection returns the string with any sensitive data replaced by your chosen placeholder.

Example text redaction

For more information about using JSON with the DLP API, see the JSON quickstart.

C#

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


using System;
using System.Collections.Generic;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class DeidentifyWithReplaceInfotypes
{
    public static DeidentifyContentResponse DeidentifyInfo(
        string projectId,
        string text,
        IEnumerable<InfoType> infoTypes = null)
    {
        // Instantiate the client.
        var dlp = DlpServiceClient.Create();

        // Construct the inspect config by specifying the type of info to be inspected.
        var inspectConfig = new InspectConfig
        {
            InfoTypes =
            {
                infoTypes ?? new InfoType[] { new InfoType { Name = "EMAIL_ADDRESS" } }
            }
        };

        // Construct the replace info types config.
        var replaceInfoTypesConfig = new ReplaceWithInfoTypeConfig();

        // Construct the deidentify config using replace config.
        var deidentifyConfig = new DeidentifyConfig
        {
            InfoTypeTransformations = new InfoTypeTransformations
            {
                Transformations =
                {
                    new InfoTypeTransformations.Types.InfoTypeTransformation
                    {
                        PrimitiveTransformation = new PrimitiveTransformation
                        {
                            ReplaceWithInfoTypeConfig = replaceInfoTypesConfig
                        }
                    }
                }
            }
        };

        // Construct the request.
        var request = new DeidentifyContentRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            DeidentifyConfig = deidentifyConfig,
            InspectConfig = inspectConfig,
            Item = new ContentItem { Value = text }
        };

        // Call the API.
        var response = dlp.DeidentifyContent(request);

        // Check the de-identified content.
        Console.WriteLine($"De-identified content: {response.Item.Value}");
        return response;
    }
}

Go

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// deidentifyWithInfotype de-identifies sensitive data by replacing infoType.
func deidentifyWithInfotype(w io.Writer, projectID, item string, infoTypeNames []string) error {
	// projectId := "your-project-id"
	// item := "My email is test@example.com"
	// infoTypeNames := "[]string{"EMAIL_ADDRESS"}"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the content to be de-identified.
	input := &dlppb.ContentItem{
		DataItem: &dlppb.ContentItem_Value{
			Value: item,
		},
	}

	// Specify the type of info the inspection will look for.
	// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}

	//  Associate de-identification type with info type.
	transformation := &dlppb.DeidentifyConfig_InfoTypeTransformations{
		InfoTypeTransformations: &dlppb.InfoTypeTransformations{
			Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
				{
					PrimitiveTransformation: &dlppb.PrimitiveTransformation{
						Transformation: &dlppb.PrimitiveTransformation_ReplaceWithInfoTypeConfig{},
					},
				},
			},
		},
	}

	// Construct the de-identification request to be sent by the client.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: infoTypes,
		},
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: transformation,
		},
		Item: input,
	}

	// Send the request.
	resp, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return err
	}

	// Print the results.
	fmt.Fprintf(w, "output : %v", resp.GetItem().GetValue())
	return nil
}

Java

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.ReplaceWithInfoTypeConfig;
import java.io.IOException;

public class DeIdentifyWithInfoType {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "My email is test@example.com";
    deIdentifyWithInfoType(projectId, textToInspect);
  }

  public static void deIdentifyWithInfoType(String projectId, String textToRedact)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the content to be inspected.
      ContentItem item = ContentItem.newBuilder().setValue(textToRedact).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
      InspectConfig inspectConfig = InspectConfig.newBuilder().addInfoTypes(infoType).build();
      // Specify replacement string to be used for the finding.
      ReplaceWithInfoTypeConfig replaceWithInfoTypeConfig =
          ReplaceWithInfoTypeConfig.newBuilder().build();
      // Define type of deidentification as replacement with info type.
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setReplaceWithInfoTypeConfig(replaceWithInfoTypeConfig)
              .build();
      // Associate deidentification type with info type.
      InfoTypeTransformation transformation =
          InfoTypeTransformation.newBuilder()
              .addInfoTypes(infoType)
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      // Construct the configuration for the Redact request and list all desired transformations.
      DeidentifyConfig redactConfig =
          DeidentifyConfig.newBuilder()
              .setInfoTypeTransformations(
                  InfoTypeTransformations.newBuilder().addTransformations(transformation))
              .build();

      // Construct the Redact request to be sent by the client.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(redactConfig)
              .setInspectConfig(inspectConfig)
              .build();

      // Use the client to send the API request.
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Parse the response and process results
      System.out.println("Text after redaction: " + response.getItem().getValue());
    }
  }
}

Node.js

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const projectId = 'my-project';

// The string to deidentify
// const string = 'My email is test@example.com';

// The string to replace sensitive information with
// const infoTypes = [{name: 'EMAIL_ADDRESS'}];

async function deIdentifyWithInfoTypeReplace() {
  // Define type of deidentification as replacement with info type.
  const primitiveTransformation = {
    replaceWithInfoTypeConfig: {},
  };

  // Associate deidentification type with info type.
  const deidentifyConfig = {
    infoTypeTransformations: {
      transformations: [
        {
          primitiveTransformation: primitiveTransformation,
        },
      ],
    },
  };

  // Specify inspect confiugration using infotypes.
  const inspectConfig = {
    infoTypes: infoTypes,
  };

  // Specify the content to be inspected.
  const item = {
    value: string,
  };

  // Combine configurations into a request for the service.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    item,
    deidentifyConfig,
    inspectConfig,
  };

  // Send the request and receive response from the service.
  const [response] = await dlp.deidentifyContent(request);
  // Print the results
  console.log(`Text after replacement: ${response.item.value}`);
}

deIdentifyWithInfoTypeReplace();

PHP

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\ReplaceWithInfoTypeConfig;

/**
 * De-identify sensitive data by replacing with infoType.
 * Uses the Data Loss Prevention API to deidentify sensitive data in a string by replacing it with
 * the info type.
 *
 * @param string $callingProjectId  The Google Cloud project id to use as a parent resource.
 * @param string $string            The string to deidentify (will be treated as text).
 */

function deidentify_replace_infotype(
    // TODO(developer): Replace sample parameters before running the code.
    string $callingProjectId,
    string $string
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$callingProjectId/locations/global";

    // Specify what content you want the service to de-identify.
    $content = (new ContentItem())
        ->setValue($string);

    // The infoTypes of information to mask.
    $phoneNumberinfoType = (new InfoType())
        ->setName('PHONE_NUMBER');
    $personNameinfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $infoTypes = [$phoneNumberinfoType, $personNameinfoType];

    // Create the configuration object.
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes($infoTypes);

    // Create the information transform configuration objects.
    $primitiveTransformation = (new PrimitiveTransformation())
        ->setReplaceWithInfoTypeConfig(new ReplaceWithInfoTypeConfig());

    $infoTypeTransformation = (new InfoTypeTransformation())
        ->setPrimitiveTransformation($primitiveTransformation);

    $infoTypeTransformations = (new InfoTypeTransformations())
        ->setTransformations([$infoTypeTransformation]);

    // Create the deidentification configuration object.
    $deidentifyConfig = (new DeidentifyConfig())
        ->setInfoTypeTransformations($infoTypeTransformations);

    // Run request.
    $response = $dlp->deidentifyContent([
        'parent' => $parent,
        'deidentifyConfig' => $deidentifyConfig,
        'item' => $content,
        'inspectConfig' => $inspectConfig
    ]);

    // Print the results.
    printf('Text after replace with infotype config: %s', $response->getItem()->getValue());
}

Python

To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.

To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from typing import List

import google.cloud.dlp


def deidentify_with_replace_infotype(
    project: str, item: str, info_types: List[str]
) -> None:
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by replacing it with the info type.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        item: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Construct inspect configuration dictionary
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    }

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": {"value": item},
        }
    )

    # Print out the results.
    print(response.item.value)

REST

JSON input:

{
  "item": {
     "value":"My email is test@example.com",
   },
   "deidentifyConfig": {
     "infoTypeTransformations":{
          "transformations": [
            {
              "primitiveTransformation": {
                "replaceWithInfoTypeConfig": {}
              }
            }
          ]
        }
    },
    "inspectConfig": {
      "infoTypes": {
        "name": "EMAIL_ADDRESS"
      }
    }
}

URL:

https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify

Sensitive Data Protection returns the following after receiving the request:

JSON output:

{
  "item":{
    "value":"My email is [EMAIL_ADDRESS]"
  },
  "overview":{
    "transformedBytes":"16",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "replaceWithInfoTypeConfig":{

          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"16"
      }
    ]
  }
}

You can try this out yourself using the API Explorer embedded here.

Next steps

Redaction is one form of de-identification. To learn more about how to de-identify content, see De-identifying sensitive data in text content.