隐去文本中的敏感数据

Sensitive Data Protection 可以对文本字符串中的敏感数据进行隐去或模糊处理。您可以使用 JSON 通过 HTTP 向 API 提供文字信息,也可以采用多种常用编程语言使用某一客户端库提供文字信息。

projects.content.deidentify 将以下各项作为参数:

  • 文本字符串。
  • 用于替换检测到的敏感数据的占位符文本。在此示例中,数据被替换为对应的 infoType。
  • 要隐去的一个或多个 infoType 的列表

敏感数据保护功能会返回一个字符串,其中所有敏感数据都已替换为您选择的占位符。

文本隐去示例

如需详细了解如何将 JSON 与 DLP API 结合使用,请参阅 JSON 快速入门

C#

如需了解如何安装和使用敏感数据保护客户端库,请参阅 敏感数据保护客户端库

如需向 Sensitive Data Protection 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证


using System;
using System.Collections.Generic;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class DeidentifyWithReplaceInfotypes
{
    public static DeidentifyContentResponse DeidentifyInfo(
        string projectId,
        string text,
        IEnumerable<InfoType> infoTypes = null)
    {
        // Instantiate the client.
        var dlp = DlpServiceClient.Create();

        // Construct the inspect config by specifying the type of info to be inspected.
        var inspectConfig = new InspectConfig
        {
            InfoTypes =
            {
                infoTypes ?? new InfoType[] { new InfoType { Name = "EMAIL_ADDRESS" } }
            }
        };

        // Construct the replace info types config.
        var replaceInfoTypesConfig = new ReplaceWithInfoTypeConfig();

        // Construct the deidentify config using replace config.
        var deidentifyConfig = new DeidentifyConfig
        {
            InfoTypeTransformations = new InfoTypeTransformations
            {
                Transformations =
                {
                    new InfoTypeTransformations.Types.InfoTypeTransformation
                    {
                        PrimitiveTransformation = new PrimitiveTransformation
                        {
                            ReplaceWithInfoTypeConfig = replaceInfoTypesConfig
                        }
                    }
                }
            }
        };

        // Construct the request.
        var request = new DeidentifyContentRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            DeidentifyConfig = deidentifyConfig,
            InspectConfig = inspectConfig,
            Item = new ContentItem { Value = text }
        };

        // Call the API.
        var response = dlp.DeidentifyContent(request);

        // Check the de-identified content.
        Console.WriteLine($"De-identified content: {response.Item.Value}");
        return response;
    }
}

Go

如需了解如何安装和使用用于 Sensitive Data Protection 的客户端库,请参阅 Sensitive Data Protection 客户端库

如需向 Sensitive Data Protection 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// deidentifyWithInfotype de-identifies sensitive data by replacing infoType.
func deidentifyWithInfotype(w io.Writer, projectID, item string, infoTypeNames []string) error {
	// projectId := "your-project-id"
	// item := "My email is test@example.com"
	// infoTypeNames := "[]string{"EMAIL_ADDRESS"}"

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return err
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the content to be de-identified.
	input := &dlppb.ContentItem{
		DataItem: &dlppb.ContentItem_Value{
			Value: item,
		},
	}

	// Specify the type of info the inspection will look for.
	// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}

	//  Associate de-identification type with info type.
	transformation := &dlppb.DeidentifyConfig_InfoTypeTransformations{
		InfoTypeTransformations: &dlppb.InfoTypeTransformations{
			Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
				{
					PrimitiveTransformation: &dlppb.PrimitiveTransformation{
						Transformation: &dlppb.PrimitiveTransformation_ReplaceWithInfoTypeConfig{},
					},
				},
			},
		},
	}

	// Construct the de-identification request to be sent by the client.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: infoTypes,
		},
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: transformation,
		},
		Item: input,
	}

	// Send the request.
	resp, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return err
	}

	// Print the results.
	fmt.Fprintf(w, "output : %v", resp.GetItem().GetValue())
	return nil
}

Java

如需了解如何安装和使用敏感数据保护客户端库,请参阅 敏感数据保护客户端库

如需向 Sensitive Data Protection 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.ReplaceWithInfoTypeConfig;
import java.io.IOException;

public class DeIdentifyWithInfoType {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect = "My email is test@example.com";
    deIdentifyWithInfoType(projectId, textToInspect);
  }

  public static void deIdentifyWithInfoType(String projectId, String textToRedact)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the content to be inspected.
      ContentItem item = ContentItem.newBuilder().setValue(textToRedact).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
      InspectConfig inspectConfig = InspectConfig.newBuilder().addInfoTypes(infoType).build();
      // Specify replacement string to be used for the finding.
      ReplaceWithInfoTypeConfig replaceWithInfoTypeConfig =
          ReplaceWithInfoTypeConfig.newBuilder().build();
      // Define type of deidentification as replacement with info type.
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setReplaceWithInfoTypeConfig(replaceWithInfoTypeConfig)
              .build();
      // Associate deidentification type with info type.
      InfoTypeTransformation transformation =
          InfoTypeTransformation.newBuilder()
              .addInfoTypes(infoType)
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      // Construct the configuration for the Redact request and list all desired transformations.
      DeidentifyConfig redactConfig =
          DeidentifyConfig.newBuilder()
              .setInfoTypeTransformations(
                  InfoTypeTransformations.newBuilder().addTransformations(transformation))
              .build();

      // Construct the Redact request to be sent by the client.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(redactConfig)
              .setInspectConfig(inspectConfig)
              .build();

      // Use the client to send the API request.
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Parse the response and process results
      System.out.println("Text after redaction: " + response.getItem().getValue());
    }
  }
}

Node.js

如需了解如何安装和使用用于 Sensitive Data Protection 的客户端库,请参阅 Sensitive Data Protection 客户端库

如需向 Sensitive Data Protection 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const projectId = 'my-project';

// The string to deidentify
// const string = 'My email is test@example.com';

// The string to replace sensitive information with
// const infoTypes = [{name: 'EMAIL_ADDRESS'}];

async function deIdentifyWithInfoTypeReplace() {
  // Define type of deidentification as replacement with info type.
  const primitiveTransformation = {
    replaceWithInfoTypeConfig: {},
  };

  // Associate deidentification type with info type.
  const deidentifyConfig = {
    infoTypeTransformations: {
      transformations: [
        {
          primitiveTransformation: primitiveTransformation,
        },
      ],
    },
  };

  // Specify inspect confiugration using infotypes.
  const inspectConfig = {
    infoTypes: infoTypes,
  };

  // Specify the content to be inspected.
  const item = {
    value: string,
  };

  // Combine configurations into a request for the service.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    item,
    deidentifyConfig,
    inspectConfig,
  };

  // Send the request and receive response from the service.
  const [response] = await dlp.deidentifyContent(request);
  // Print the results
  console.log(`Text after replacement: ${response.item.value}`);
}

deIdentifyWithInfoTypeReplace();

PHP

如需了解如何安装和使用敏感数据保护客户端库,请参阅 敏感数据保护客户端库

如需向 Sensitive Data Protection 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\ReplaceWithInfoTypeConfig;

/**
 * De-identify sensitive data by replacing with infoType.
 * Uses the Data Loss Prevention API to deidentify sensitive data in a string by replacing it with
 * the info type.
 *
 * @param string $callingProjectId  The Google Cloud project id to use as a parent resource.
 * @param string $string            The string to deidentify (will be treated as text).
 */

function deidentify_replace_infotype(
    // TODO(developer): Replace sample parameters before running the code.
    string $callingProjectId,
    string $string
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    $parent = "projects/$callingProjectId/locations/global";

    // Specify what content you want the service to de-identify.
    $content = (new ContentItem())
        ->setValue($string);

    // The infoTypes of information to mask.
    $phoneNumberinfoType = (new InfoType())
        ->setName('PHONE_NUMBER');
    $personNameinfoType = (new InfoType())
        ->setName('PERSON_NAME');
    $infoTypes = [$phoneNumberinfoType, $personNameinfoType];

    // Create the configuration object.
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes($infoTypes);

    // Create the information transform configuration objects.
    $primitiveTransformation = (new PrimitiveTransformation())
        ->setReplaceWithInfoTypeConfig(new ReplaceWithInfoTypeConfig());

    $infoTypeTransformation = (new InfoTypeTransformation())
        ->setPrimitiveTransformation($primitiveTransformation);

    $infoTypeTransformations = (new InfoTypeTransformations())
        ->setTransformations([$infoTypeTransformation]);

    // Create the deidentification configuration object.
    $deidentifyConfig = (new DeidentifyConfig())
        ->setInfoTypeTransformations($infoTypeTransformations);

    // Run request.
    $response = $dlp->deidentifyContent([
        'parent' => $parent,
        'deidentifyConfig' => $deidentifyConfig,
        'item' => $content,
        'inspectConfig' => $inspectConfig
    ]);

    // Print the results.
    printf('Text after replace with infotype config: %s', $response->getItem()->getValue());
}

Python

如需了解如何安装和使用用于 Sensitive Data Protection 的客户端库,请参阅 Sensitive Data Protection 客户端库

如需向 Sensitive Data Protection 进行身份验证,请设置应用默认凭据。 如需了解详情,请参阅为本地开发环境设置身份验证

from typing import List

import google.cloud.dlp


def deidentify_with_replace_infotype(
    project: str, item: str, info_types: List[str]
) -> None:
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by replacing it with the info type.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        item: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Construct inspect configuration dictionary
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    }

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": {"value": item},
        }
    )

    # Print out the results.
    print(response.item.value)

REST

JSON 输入:

{
  "item": {
     "value":"My email is test@example.com",
   },
   "deidentifyConfig": {
     "infoTypeTransformations":{
          "transformations": [
            {
              "primitiveTransformation": {
                "replaceWithInfoTypeConfig": {}
              }
            }
          ]
        }
    },
    "inspectConfig": {
      "infoTypes": {
        "name": "EMAIL_ADDRESS"
      }
    }
}

网址

https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify

收到敏感数据保护后,敏感数据保护会返回以下内容 请求:

JSON 输出:

{
  "item":{
    "value":"My email is [EMAIL_ADDRESS]"
  },
  "overview":{
    "transformedBytes":"16",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "replaceWithInfoTypeConfig":{

          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"16"
      }
    ]
  }
}

您可以使用此处嵌入的 API Explorer 来亲自尝试此操作。

后续步骤

遮盖是去标识化的一种形式。如需详细了解如何对内容进行去标识化,请参阅对文本内容中的敏感数据进行去标识化