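De-identifying sensitive data

Cloud Data Loss Prevention (DLP) can de-identify sensitive data in text content, including text stored in container structures such as tables. De-identification is the process of removing identifying information from data. The API detects sensitive data such as personally identifiable information (PII), and then uses a de-identification transformation to mask, delete, or otherwise obscure the data. For example, de-identification techniques can include any of the following:

  • Masking sensitive data by partially or fully replacing characters with a symbol, such as an asterisk (*) or hash (#).
  • Replacing each instance of sensitive data with a token, or surrogate, string.
  • Encrypting and replacing sensitive data using a randomly generated or pre-determined key.

After you de-identify data using the CryptoReplaceFfxFpeConfig or CryptoDeterministicConfig infoType transformations, you can re-identify that data, as long as you have the CryptoKey that was originally used to de-identify it.

You can send information to the API as JSON over HTTPS, or through the CLI and several programming languages using the DLP client libraries. To set up the CLI, refer to the quickstart. For more information about submitting information in JSON format, see the JSON quickstart.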

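API overview

To de-identify sensitive data, use Cloud DLP's content.deidentify method.

A de-identification API call has three parts:

  • The data to inspect: A string or table structure (ContentItem object) for the API to inspect.
  • What to inspect for: Detection configuration information (InspectConfig), such as what types of data (or infoTypes) to look for, whether to filter findings that are above a certain likelihood threshold, whether to return no more than a certain number of results, and so on. Not specifying any infoTypes in an InspectConfig argument is equivalent to specifying all built-in infoTypes. Doing so is not recommended, because it can cause decreased performance and increased cost.
  • What to do with the inspection findings: Configuration information (DeidentifyConfig) that defines how you want the sensitive data de-identified. This argument is covered in more detail later in this document.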
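The three parts listed above can be sketched as a plain Python dictionary. This is only an illustration: the helper name build_deidentify_request and the project ID are hypothetical, the snake_case keys follow the Python samples elsewhere on this page, and the parent format follows the "locations/global" pattern used by the Java and Node.js samples. Nothing is sent to the API.

```python
def build_deidentify_request(project_id, text, info_type_names, replacement):
    """Assemble the three parts of a content.deidentify request as a dict.

    Hypothetical helper for illustration; it only builds the request body.
    """
    info_types = [{"name": name} for name in info_type_names]
    return {
        "parent": f"projects/{project_id}/locations/global",
        # Part 1: the data to inspect (a ContentItem).
        "item": {"value": text},
        # Part 2: what to inspect for (an InspectConfig).
        "inspect_config": {"info_types": info_types},
        # Part 3: what to do with the findings (a DeidentifyConfig).
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": info_types,
                        "primitive_transformation": {
                            "replace_config": {
                                "new_value": {"string_value": replacement}
                            }
                        },
                    }
                ]
            }
        },
    }


request = build_deidentify_request(
    "my-project",  # placeholder project ID
    "My email address is aabernathy@example.com.",
    ["EMAIL_ADDRESS"],
    "[email-address]",
)
print(sorted(request))
```

Listing the infoTypes explicitly, as done here, avoids the cost of scanning for every built-in infoType.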

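The API returns the same items you gave it, in the same format, but with any text identified as containing sensitive information according to your criteria de-identified.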

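Specifying detection criteria

Information type (or "infoType") detectors are the mechanisms that Cloud DLP uses to find sensitive data.

Cloud DLP includes several kinds of infoType detectors, summarized here:

  • Built-in infoType detectors - Built into Cloud DLP. They include detectors for country- or region-specific sensitive data types as well as globally applicable data types.
  • Custom infoType detectors - Detectors that you create yourself. There are three kinds of custom infoType detectors:
    • Regular custom dictionary detectors - Simple word lists that Cloud DLP matches on. Use regular custom dictionary detectors when you have a list of up to several tens of thousands of words or phrases. They are also the preferred choice if you don't expect your word list to change significantly.
    • Stored custom dictionary detectors - Generated by Cloud DLP from large lists of words or phrases stored in Cloud Storage or BigQuery. Use stored custom dictionary detectors when you have a large list of up to tens of millions of words or phrases.
    • Regular expression (regex) detectors - Enable Cloud DLP to detect matches based on a regular expression pattern.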
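As a rough sketch of how custom detectors sit next to built-in ones, the following builds an inspect configuration dictionary with one regular custom dictionary detector and one regex detector. The infoType names INTERNAL_CODENAME and EMPLOYEE_ID, the word list, and the pattern are made-up examples, and the helper name is hypothetical; the keys follow the snake_case form used by the Python samples on this page.

```python
def build_custom_info_types(words, pattern):
    """Return two custom infoType detectors: a word list and a regex.

    The detector names below are invented for illustration.
    """
    return [
        {
            # Regular custom dictionary detector: matches a simple word list.
            "info_type": {"name": "INTERNAL_CODENAME"},
            "dictionary": {"word_list": {"words": list(words)}},
        },
        {
            # Regex detector: matches a regular expression pattern.
            "info_type": {"name": "EMPLOYEE_ID"},
            "regex": {"pattern": pattern},
        },
    ]


inspect_config = {
    # Built-in detector.
    "info_types": [{"name": "EMAIL_ADDRESS"}],
    # Custom detectors defined inline with the request.
    "custom_info_types": build_custom_info_types(
        ["bluebird", "redwing"], r"E-\d{6}"
    ),
}
print(len(inspect_config["custom_info_types"]))
```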

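In addition, Cloud DLP includes the concept of inspection rules, which let you fine-tune scan results using the following:

  • Exclusion rules - Let you decrease the number of findings returned by adding rules to a built-in or custom infoType detector.
  • Hotword rules - Let you increase the number of findings returned, or change their likelihood value, by adding rules to a built-in or custom infoType detector.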
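The two rule types can be sketched as a rule set attached to an infoType inside the inspect configuration. This is an illustrative dictionary only: the helper name, the excluded address, the hotword pattern, and the 50-character window are assumptions chosen for the example, not recommended values.

```python
def build_rule_set(info_type_name, excluded_words, hotword_pattern):
    """Build one inspection rule set with an exclusion rule and a hotword rule."""
    return {
        "info_types": [{"name": info_type_name}],
        "rules": [
            {
                # Exclusion rule: drop findings that fully match a listed word.
                "exclusion_rule": {
                    "dictionary": {"word_list": {"words": list(excluded_words)}},
                    "matching_type": "MATCHING_TYPE_FULL_MATCH",
                }
            },
            {
                # Hotword rule: raise the likelihood of findings that appear
                # shortly after text matching the hotword pattern.
                "hotword_rule": {
                    "hotword_regex": {"pattern": hotword_pattern},
                    "proximity": {"window_before": 50},
                    "likelihood_adjustment": {"fixed_likelihood": "VERY_LIKELY"},
                }
            },
        ],
    }


inspect_config = {
    "info_types": [{"name": "EMAIL_ADDRESS"}],
    "rule_set": [
        build_rule_set("EMAIL_ADDRESS", ["support@example.com"], r"(?i)contact")
    ],
}
print([next(iter(rule)) for rule in inspect_config["rule_set"][0]["rules"]])
```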

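De-identification transformations

When you set the de-identification configuration (DeidentifyConfig), you must specify one or more transformations. There are two categories of transformations:

  • InfoTypeTransformations: Transformations that are applied only to values within submitted text that are identified as a specific infoType.
  • RecordTransformations: Transformations that are applied only to values within submitted tabular text data that are identified as a specific infoType, or to an entire column of tabular data.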
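To make the difference between the two categories concrete, here is a minimal sketch of both shapes as Python dictionaries. The column name "email" is a made-up example, and the same masking transformation is reused in both configurations purely for comparison.

```python
# A primitive transformation shared by both examples: mask with "#".
MASK = {"character_mask_config": {"masking_character": "#"}}

# InfoTypeTransformations: applied to findings of a given infoType in text.
info_type_config = {
    "info_type_transformations": {
        "transformations": [
            {
                "info_types": [{"name": "EMAIL_ADDRESS"}],
                "primitive_transformation": MASK,
            }
        ]
    }
}

# RecordTransformations: applied to a whole column of tabular data.
# "email" is a hypothetical column name.
record_config = {
    "record_transformations": {
        "field_transformations": [
            {
                "fields": [{"name": "email"}],
                "primitive_transformation": MASK,
            }
        ]
    }
}

print(list(info_type_config), list(record_config))
```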

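InfoType transformations

You can specify one or more infoType transformations per request. Within each InfoTypeTransformation object, you specify both of the following:

  • The infoTypes to which the transformation applies (infoTypes[]).
  • The primitive transformation to apply (a PrimitiveTransformation object).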

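Note that specifying an infoType is optional. However, if you don't specify at least one infoType in the InspectConfig argument, the transformation is applied to all built-in infoTypes that don't have a transformation provided. Doing so is not recommended, because it can cause decreased performance and increased cost.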
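A short sketch of a transformation list that pairs one transformation with a specific infoType and leaves a second one without any infoType, so that it acts as the catch-all described above. The helper name info_type_transformation is hypothetical.

```python
def info_type_transformation(primitive, info_type_names=()):
    """Pair a primitive transformation with zero or more infoTypes."""
    transformation = {"primitive_transformation": primitive}
    if info_type_names:
        transformation["info_types"] = [{"name": n} for n in info_type_names]
    return transformation


transformations = [
    # Applies only to EMAIL_ADDRESS findings.
    info_type_transformation(
        {"replace_config": {"new_value": {"string_value": "[email-address]"}}},
        ["EMAIL_ADDRESS"],
    ),
    # No infoTypes listed: applies to findings that have no other transformation.
    info_type_transformation({"redact_config": {}}),
]

deidentify_config = {
    "info_type_transformations": {"transformations": transformations}
}
print(len(transformations))
```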

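Primitive transformations

You must specify at least one primitive transformation to apply to the input, regardless of whether you're applying it only to certain infoTypes or to the entire text string. Several transformation options are available; they are summarized below.

Full list of transformations you can choose from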

replaceConfig

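Setting replaceConfig to a ReplaceValueConfig object replaces matched input values with a value you specify.

For example, suppose you've set replaceConfig to "[email-address]" for all EMAIL_ADDRESS infoTypes, and the following string is sent to Cloud DLP: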

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

The returned string is the following:

My name is Alicia Abernathy, and my email address is [email-address].

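The following JSON example and code in several languages demonstrate how to form the API request and what the Cloud DLP API returns:

Protocol

For more information about using the Cloud DLP API with JSON, see the JSON quickstart.

JSON input: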

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "primitiveTransformation":{
            "replaceConfig":{
              "newValue":{
                "stringValue":"[email-address]"
              }
            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"EMAIL_ADDRESS"
      }
    ]
  }
}

JSON output:

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is [email-address]."
  },
  "overview":{
    "transformedBytes":"22",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "replaceConfig":{
            "newValue":{
              "stringValue":"[email-address]"
            }
          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"22"
      }
    ]
  }
}

Python

def deidentify_with_replace(
    project,
    input_str,
    info_types,
    replacement_str="REPLACEMENT_STR",
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by replacing matched input values with a value you specify.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
        replacement_str: The string to replace all values that match given
            info types.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "replace_config": {
                            "new_value": {
                                "string_value": replacement_str,
                            }
                        }
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API. Recent versions of the client library take a single
    # request dictionary instead of positional arguments.
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    print(response.item.value)

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.RedactConfig;
import com.google.privacy.dlp.v2.ReplaceValueConfig;
import com.google.privacy.dlp.v2.Value;

public class DeIdentifyWithReplacement {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect =
        "My name is Alicia Abernathy, and my email address is aabernathy@example.com.";
    deIdentifyWithReplacement(projectId, textToInspect);
  }

  // De-identifies the provided text by replacing findings with a fixed string.
  public static void deIdentifyWithReplacement(String projectId, String textToRedact) {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the content to be inspected.
      ContentItem item = ContentItem.newBuilder()
          .setValue(textToRedact).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
      InspectConfig inspectConfig = InspectConfig.newBuilder().addInfoTypes(infoType).build();
      // Specify replacement string to be used for the finding.
      ReplaceValueConfig replaceValueConfig = ReplaceValueConfig.newBuilder()
          .setNewValue(Value.newBuilder().setStringValue("[email-address]").build())
          .build();
      // Define type of deidentification as replacement.
      PrimitiveTransformation primitiveTransformation = PrimitiveTransformation.newBuilder()
          .setReplaceConfig(replaceValueConfig)
          .build();
      // Associate deidentification type with info type.
      InfoTypeTransformation transformation = InfoTypeTransformation.newBuilder()
          .addInfoTypes(infoType)
          .setPrimitiveTransformation(primitiveTransformation)
          .build();
      // Construct the configuration for the Redact request and list all desired transformations.
      DeidentifyConfig redactConfig = DeidentifyConfig.newBuilder()
          .setInfoTypeTransformations(InfoTypeTransformations.newBuilder()
              .addTransformations(transformation))
          .build();

      // Construct the Redact request to be sent by the client.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(redactConfig)
              .setInspectConfig(inspectConfig)
              .build();

      // Use the client to send the API request.
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Parse the response and process results
      System.out.println("Text after redaction: " + response.getItem().getValue());
    } catch (Exception e) {
      System.out.println("Error during deIdentifyWithReplacement: \n" + e.toString());
    }
  }
}

redactConfig

If you specify redactConfig, a given value is redacted by removing it completely. The redactConfig message takes no arguments; specifying it enables the transformation.

For example, suppose you've specified redactConfig for all EMAIL_ADDRESS infoTypes, and the following string is sent to Cloud DLP:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

The returned string is the following:

My name is Alicia Abernathy, and my email address is .

The following examples show how to form the API request and what the DLP API returns:

Protocol

JSON input:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "primitiveTransformation":{
            "redactConfig":{

            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"EMAIL_ADDRESS"
      }
    ]
  }
}

JSON output:

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is ."
  },
  "overview":{
    "transformedBytes":"22",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "redactConfig":{

          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"22"
      }
    ]
  }
}

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.RedactConfig;

public class DeIdentifyWithRedaction {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect =
        "My name is Alicia Abernathy, and my email address is aabernathy@example.com.";
    deIdentifyWithRedaction(projectId, textToInspect);
  }

  // De-identifies the provided text by redacting (removing) findings.
  public static void deIdentifyWithRedaction(String projectId, String textToRedact) {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the content to be inspected.
      ContentItem item = ContentItem.newBuilder()
          .setValue(textToRedact).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
      InspectConfig inspectConfig = InspectConfig.newBuilder().addInfoTypes(infoType).build();
      // Define type of deidentification.
      PrimitiveTransformation primitiveTransformation = PrimitiveTransformation.newBuilder()
          .setRedactConfig(RedactConfig.getDefaultInstance())
          .build();
      // Associate deidentification type with info type.
      InfoTypeTransformation transformation = InfoTypeTransformation.newBuilder()
          .addInfoTypes(infoType)
          .setPrimitiveTransformation(primitiveTransformation)
          .build();
      // Construct the configuration for the Redact request and list all desired transformations.
      DeidentifyConfig redactConfig = DeidentifyConfig.newBuilder()
          .setInfoTypeTransformations(InfoTypeTransformations.newBuilder()
              .addTransformations(transformation))
          .build();

      // Construct the Redact request to be sent by the client.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(redactConfig)
              .setInspectConfig(inspectConfig)
              .build();

      // Use the client to send the API request.
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Parse the response and process results
      System.out.println("Text after redaction: " + response.getItem().getValue());
    } catch (Exception e) {
      System.out.println("Error during deIdentifyWithRedaction: \n" + e.toString());
    }
  }
}

Python

def deidentify_with_redact(
    project,
    input_str,
    info_types,
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by redacting matched input values.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "redact_config": {}
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API. Recent versions of the client library take a single
    # request dictionary instead of positional arguments.
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    print(response.item.value)

characterMaskConfig

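Setting characterMaskConfig to a CharacterMaskConfig object partially masks a string by replacing a given number of characters with a fixed character. Masking can start from the beginning or the end of the string. This transformation also works with number types such as long integers.

The CharacterMaskConfig object has several arguments of its own:

  • maskingCharacter: The character used to mask each character of a sensitive value. For example, you could specify an asterisk (*) or hash (#) to mask a series of digits such as those in a credit card number.
  • numberToMask: The number of characters to mask. If you don't set this value, all matching characters are masked.
  • reverseOrder: Whether to mask characters in reverse order. If reverseOrder is set to true, characters in matched values are masked from the end toward the beginning of the value. If it is set to false, masking begins at the start of the value.
  • charactersToIgnore[]: One or more characters to skip when masking values. For example, specify a hyphen here to leave the hyphens in place when masking a telephone number. You can also specify a group of common characters (CharsToIgnore) to ignore when masking.

For example, suppose you've set characterMaskConfig to mask the EMAIL_ADDRESS infoType with "#", except for the "." and "@" characters. If the following string is sent to Cloud DLP: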

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

The returned string is the following:

My name is Alicia Abernathy, and my email address is ##########@#######.###.
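The effect of the four CharacterMaskConfig arguments can be approximated locally with a few lines of Python. This is a sketch for building intuition, not the service's implementation: the function name mask_value is hypothetical, it operates on an already-detected value rather than detecting infoTypes, and it assumes that skipped characters do not count toward numberToMask.

```python
def mask_value(value, masking_character="*", number_to_mask=0,
               reverse_order=False, characters_to_ignore=""):
    """Locally approximate character masking on one matched value."""
    chars = list(value)
    if reverse_order:
        indices = range(len(chars) - 1, -1, -1)  # walk from the end
    else:
        indices = range(len(chars))  # walk from the start
    masked = 0
    for i in indices:
        if chars[i] in characters_to_ignore:
            continue  # ignored characters are kept and not counted
        if number_to_mask and masked >= number_to_mask:
            break  # limit reached; 0 means "mask everything"
        chars[i] = masking_character
        masked += 1
    return "".join(chars)


print(mask_value("aabernathy@example.com", "#", characters_to_ignore=".@"))
# prints ##########@#######.###
print(mask_value("555-123-4567", "*", number_to_mask=4,
                 reverse_order=True, characters_to_ignore="-"))
# prints 555-123-****
```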

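The following examples demonstrate how to use the Cloud DLP API to de-identify sensitive data using masking techniques.

Protocol

The following JSON example shows how to form the API request and what the DLP API returns:

JSON input: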

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "primitiveTransformation":{
            "characterMaskConfig":{
              "maskingCharacter":"#",
              "reverseOrder":false,
              "charactersToIgnore":[
                {
                  "charactersToSkip":".@"
                }
              ]
            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"EMAIL_ADDRESS"
      }
    ]
  }
}

JSON output:

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is ##########@#######.###."
  },
  "overview":{
    "transformedBytes":"22",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "characterMaskConfig":{
            "maskingCharacter":"#",
            "charactersToIgnore":[
              {
                "charactersToSkip":".@"
              }
            ]
          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"22"
      }
    ]
  }
}

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.CharacterMaskConfig;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.ReplaceWithInfoTypeConfig;
import java.io.IOException;
import java.util.Arrays;

public class DeIdentifyWithMasking {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToDeIdentify = "My SSN is 372819127";
    deIdentifyWithMasking(projectId, textToDeIdentify);
  }

  public static void deIdentifyWithMasking(String projectId, String textToDeIdentify)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Specify what content you want the service to DeIdentify
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToDeIdentify).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("US_SOCIAL_SECURITY_NUMBER").build();
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(Arrays.asList(infoType)).build();

      // Specify how the info from the inspection should be masked.
      CharacterMaskConfig characterMaskConfig =
          CharacterMaskConfig.newBuilder()
              .setMaskingCharacter("X") // Character to replace the found info with
              .setNumberToMask(5) // How many characters should be masked
              .build();
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setCharacterMaskConfig(characterMaskConfig)
              .build();
      InfoTypeTransformation infoTypeTransformation =
          InfoTypeTransformation.newBuilder()
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      InfoTypeTransformations transformations =
          InfoTypeTransformations.newBuilder().addTransformations(infoTypeTransformation).build();

      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder().setInfoTypeTransformations(transformations).build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(contentItem)
              .setInspectConfig(inspectConfig)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Print the results
      System.out.println("Text after masking: " + response.getItem().getValue());
    }
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The string to deidentify
// const string = 'My SSN is 372819127';

// (Optional) The maximum number of sensitive characters to mask in a match
// If omitted from the request or set to 0, the API will mask any matching characters
// const numberToMask = 5;

// (Optional) The character to mask matching sensitive data with
// const maskingCharacter = 'x';

// `await` is only valid inside an async function in CommonJS modules,
// so the request is built and sent from within one.
async function deidentifyWithMask() {
  // Construct deidentification request
  const item = {value: string};
  const request = {
    parent: `projects/${callingProjectId}/locations/global`,
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [
          {
            primitiveTransformation: {
              characterMaskConfig: {
                maskingCharacter: maskingCharacter,
                numberToMask: numberToMask,
              },
            },
          },
        ],
      },
    },
    item: item,
  };

  try {
    // Run deidentification request
    const [response] = await dlp.deidentifyContent(request);
    const deidentifiedItem = response.item;
    console.log(deidentifiedItem.value);
  } catch (err) {
    console.log(`Error in deidentifyWithMask: ${err.message || err}`);
  }
}

deidentifyWithMask();

Python

def deidentify_with_mask(
    project, input_str, info_types, masking_character=None, number_to_mask=0
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by masking it with a character.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
        masking_character: The character to mask matching sensitive data with.
        number_to_mask: The maximum number of sensitive characters to mask in
            a match. If omitted or set to zero, the API will default to no
            maximum.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "character_mask_config": {
                            "masking_character": masking_character,
                            "number_to_mask": number_to_mask,
                        }
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API. Recent versions of the client library take a single
    # request dictionary instead of positional arguments.
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    print(response.item.value)

Go

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	dlppb "google.golang.org/genproto/googleapis/privacy/dlp/v2"
)

// mask deidentifies the input by masking all provided info types with maskingCharacter
// and prints the result to w.
func mask(w io.Writer, projectID, input string, infoTypeNames []string, maskingCharacter string, numberToMask int32) error {
	// projectID := "my-project-id"
	// input := "My SSN is 111222333"
	// infoTypeNames := []string{"US_SOCIAL_SECURITY_NUMBER"}
	// maskingCharacter := "+"
	// numberToMask := 6
	// Will print "My SSN is ++++++333"

	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %v", err)
	}
	// Release the client's underlying connections when the function returns.
	defer client.Close()
	// Convert the info type strings to a list of InfoTypes.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}
	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: infoTypes,
		},
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: &dlppb.InfoTypeTransformations{
					Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
						{
							InfoTypes: []*dlppb.InfoType{}, // Match all info types.
							PrimitiveTransformation: &dlppb.PrimitiveTransformation{
								Transformation: &dlppb.PrimitiveTransformation_CharacterMaskConfig{
									CharacterMaskConfig: &dlppb.CharacterMaskConfig{
										MaskingCharacter: maskingCharacter,
										NumberToMask:     numberToMask,
									},
								},
							},
						},
					},
				},
			},
		},
		// The item to analyze.
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: input,
			},
		},
	}
	// Send the request.
	r, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return fmt.Errorf("DeidentifyContent: %v", err)
	}
	// Print the result.
	fmt.Fprint(w, r.GetItem().GetValue())
	return nil
}

PHP

/**
 * Deidentify sensitive data in a string by masking it with a character.
 */
use Google\Cloud\Dlp\V2\CharacterMaskConfig;
use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\ContentItem;

/** Uncomment and populate these variables in your code */
// $callingProjectId = 'The GCP Project ID to run the API call under';
// $string = 'The string to deidentify';
// $numberToMask = 0; // (Optional) The maximum number of sensitive characters to mask in a match
// $maskingCharacter = 'x'; // (Optional) The character to mask matching sensitive data with

// Instantiate a client.
$dlp = new DlpServiceClient();

// The infoTypes of information to mask
$ssnInfoType = (new InfoType())
    ->setName('US_SOCIAL_SECURITY_NUMBER');
$infoTypes = [$ssnInfoType];

// Create the masking configuration object
$maskConfig = (new CharacterMaskConfig())
    ->setMaskingCharacter($maskingCharacter)
    ->setNumberToMask($numberToMask);

// Create the information transform configuration objects
$primitiveTransformation = (new PrimitiveTransformation())
    ->setCharacterMaskConfig($maskConfig);

$infoTypeTransformation = (new InfoTypeTransformation())
    ->setPrimitiveTransformation($primitiveTransformation)
    ->setInfoTypes($infoTypes);

$infoTypeTransformations = (new InfoTypeTransformations())
    ->setTransformations([$infoTypeTransformation]);

// Create the deidentification configuration object
$deidentifyConfig = (new DeidentifyConfig())
    ->setInfoTypeTransformations($infoTypeTransformations);

$item = (new ContentItem())
    ->setValue($string);

$parent = "projects/$callingProjectId/locations/global";

// Run request
$response = $dlp->deidentifyContent($parent, [
    'deidentifyConfig' => $deidentifyConfig,
    'item' => $item
]);

// Print the results
$deidentifiedValue = $response->getItem()->getValue();
print($deidentifiedValue);

C#


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class DeidentifyWithMasking
{
    public static DeidentifyContentResponse Deidentify(string projectId, string text)
    {
        // Instantiate a client.
        var dlp = DlpServiceClient.Create();

        // Construct a request.
        var transformation = new InfoTypeTransformations.Types.InfoTypeTransformation
        {
            PrimitiveTransformation = new PrimitiveTransformation
            {
                CharacterMaskConfig = new CharacterMaskConfig
                {
                    MaskingCharacter = "*",
                    NumberToMask = 5,
                    ReverseOrder = false,
                }
            }
        };
        var request = new DeidentifyContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            InspectConfig = new InspectConfig
            {
                InfoTypes =
                {
                    new InfoType { Name = "US_SOCIAL_SECURITY_NUMBER" }
                }
            },
            DeidentifyConfig = new DeidentifyConfig
            {
                InfoTypeTransformations = new InfoTypeTransformations
                {
                    Transformations = { transformation }
                }
            },
            Item = new ContentItem { Value = text }
        };

        // Call the API.
        var response = dlp.DeidentifyContent(request);

        // Inspect the results.
        Console.WriteLine($"Deidentified content: {response.Item.Value}");
        return response;
    }
}

cryptoHashConfig

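Setting cryptoHashConfig to a CryptoHashConfig object performs pseudonymization on an input value by generating a surrogate value using cryptographic hashing.

This method replaces the input value with an encrypted "digest," or hash value. The digest is computed by taking the SHA-256 hash of the input value. The cryptographic key used to make the hash is a CryptoKey object, and must be either 32 or 64 bytes in size.

The method outputs a base64-encoded representation of the hashed output. Currently, only string and integer values can be hashed.

For example, suppose you've specified cryptoHashConfig for all EMAIL_ADDRESS infoTypes, and the CryptoKey object consists of a randomly generated key (a TransientCryptoKey). Then, the following string is sent to Cloud DLP: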

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

The cryptographically generated returned string will look like the following:

My name is Alicia Abernathy, and my email address is 41D1567F7F99F1DC2A5FAB886DEE5BEE.

Of course, the hex string is cryptographically generated and will differ from the one shown here.
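The idea of a keyed, one-way surrogate can be sketched with the Python standard library. This is not the service's algorithm: the text above only states that SHA-256 is used with a 32- or 64-byte key, so the use of HMAC here is an assumption, and the function name pseudonymize is hypothetical. The random key plays the role of a transient key that is discarded after use.

```python
import base64
import hashlib
import hmac
import secrets


def pseudonymize(value, key):
    """Return a base64-encoded keyed SHA-256 digest of value.

    Assumption: HMAC-SHA-256 stands in for the keyed hash.
    """
    if len(key) not in (32, 64):
        raise ValueError("key must be 32 or 64 bytes")
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


key = secrets.token_bytes(32)  # randomly generated, like a transient key
token = pseudonymize("aabernathy@example.com", key)

# The same input and key always give the same surrogate, so joins and
# counts on the de-identified column still work.
print(token == pseudonymize("aabernathy@example.com", key))  # prints True
print(len(token))  # prints 44
```

Because the digest is one-way, the original value cannot be recovered from the surrogate, unlike the reversible crypto-based transformations.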

dateShiftConfig

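Setting dateShiftConfig to a DateShiftConfig object performs date shifting on a date input value by shifting the date by a random number of days.

Date shifting techniques randomly shift a set of dates but preserve the sequence and duration of a period of time. Shifting dates is usually done in the context of an individual or an entity. That is, you shift all of the dates for a specific individual by the same shift differential, but use a different shift differential for each other individual.

For more information about date shifting, see the date shifting concept topic.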
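The property described above (one shift per individual, with sequence and duration preserved) can be illustrated with a small local sketch. It is not Cloud DLP's algorithm: deriving the shift from an HMAC of the context value is an assumption made for illustration, and shift_for_context, shift_dates, and the demo key are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta


def shift_for_context(context: str, key: bytes, lower: int, upper: int) -> int:
    """Derive a repeatable shift in [lower, upper] days from a context value."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    digest = hmac.new(key, context.encode("utf-8"), hashlib.sha256).digest()
    span = upper - lower + 1
    return lower + int.from_bytes(digest[:8], "big") % span


def shift_dates(dates, context, key, lower=-30, upper=30):
    """Shift every date for one individual by the same number of days."""
    offset = timedelta(days=shift_for_context(context, key, lower, upper))
    return [d + offset for d in dates]


key = b"0" * 32  # hypothetical demo key; use a real secret in practice
visits = [date(2020, 1, 10), date(2020, 1, 17)]
shifted = shift_dates(visits, "patient-42", key)
# The two visits stay exactly 7 days apart after shifting.
print(shifted, shifted[1] - shifted[0])
```

Because the offset depends only on the key and the context value, every date for the same individual moves together, while a different individual generally gets a different offset.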

The following sample code in several languages demonstrates how to use the Cloud DLP API to de-identify dates using date shifting.

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.common.base.Splitter;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DateShiftConfig;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.FieldTransformation;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.RecordTransformations;
import com.google.privacy.dlp.v2.Table;
import com.google.privacy.dlp.v2.Value;
import com.google.type.Date;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeIdentifyWithDateShift {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    Path inputCsvFile = Paths.get("path/to/your/input/file.csv");
    Path outputCsvFile = Paths.get("path/to/your/output/file.csv");
    deIdentifyWithDateShift(projectId, inputCsvFile, outputCsvFile);
  }

  public static void deIdentifyWithDateShift(
      String projectId, Path inputCsvFile, Path outputCsvFile) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Read the contents of the CSV file into a Table
      List<FieldId> headers;
      List<Table.Row> rows;
      try (BufferedReader input = Files.newBufferedReader(inputCsvFile)) {
        // Parse and convert the first line into header names
        headers =
            Arrays.stream(input.readLine().split(","))
                .map(header -> FieldId.newBuilder().setName(header).build())
                .collect(Collectors.toList());
        // Parse the remainder of the file as Table.Rows
        rows =
            input.lines().map(DeIdentifyWithDateShift::parseLineAsRow).collect(Collectors.toList());
      }
      Table table = Table.newBuilder().addAllHeaders(headers).addAllRows(rows).build();
      ContentItem item = ContentItem.newBuilder().setTable(table).build();

      // Set the inclusive range, in days, for the shift. Negative values shift dates earlier and
      // positive values shift them later; equal bounds shift every date by exactly that amount.
      DateShiftConfig dateShiftConfig =
          DateShiftConfig.newBuilder().setLowerBoundDays(5).setUpperBoundDays(5).build();
      PrimitiveTransformation transformation =
          PrimitiveTransformation.newBuilder().setDateShiftConfig(dateShiftConfig).build();
      // Specify which fields the DateShift should apply to
      List<FieldId> dateFields = Arrays.asList(headers.get(1), headers.get(3));
      FieldTransformation fieldTransformation =
          FieldTransformation.newBuilder()
              .addAllFields(dateFields)
              .setPrimitiveTransformation(transformation)
              .build();
      RecordTransformations recordTransformations =
          RecordTransformations.newBuilder().addFieldTransformations(fieldTransformation).build();
      // Specify the config for the de-identify request
      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder().setRecordTransformations(recordTransformations).build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Write the results to the target CSV file
      try (BufferedWriter writer = Files.newBufferedWriter(outputCsvFile)) {
        Table outTable = response.getItem().getTable();
        String headerOut =
            outTable.getHeadersList().stream()
                .map(FieldId::getName)
                .collect(Collectors.joining(","));
        writer.write(headerOut + "\n");

        List<String> rowOutput =
            outTable.getRowsList().stream()
                .map(row -> joinRow(row.getValuesList()))
                .collect(Collectors.toList());
        for (String line : rowOutput) {
          writer.write(line + "\n");
        }
        System.out.println("Content written to file: " + outputCsvFile.toString());
      }
    }
  }

  // Convert the string from the csv file into com.google.type.Date
  public static Date parseAsDate(String s) {
    LocalDate date = LocalDate.parse(s, DateTimeFormatter.ofPattern("MM/dd/yyyy"));
    return Date.newBuilder()
        .setDay(date.getDayOfMonth())
        .setMonth(date.getMonthValue())
        .setYear(date.getYear())
        .build();
  }

  // Each row is in the format: Name,BirthDate,CreditCardNumber,RegisterDate
  public static Table.Row parseLineAsRow(String line) {
    List<String> values = Splitter.on(",").splitToList(line);
    Value name = Value.newBuilder().setStringValue(values.get(0)).build();
    Value birthDate = Value.newBuilder().setDateValue(parseAsDate(values.get(1))).build();
    Value creditCardNumber = Value.newBuilder().setStringValue(values.get(2)).build();
    Value registerDate = Value.newBuilder().setDateValue(parseAsDate(values.get(3))).build();
    return Table.Row.newBuilder()
        .addValues(name)
        .addValues(birthDate)
        .addValues(creditCardNumber)
        .addValues(registerDate)
        .build();
  }

  public static String formatDate(Date d) {
    return String.format("%s/%s/%s", d.getMonth(), d.getDay(), d.getYear());
  }

  public static String joinRow(List<Value> values) {
    String name = values.get(0).getStringValue();
    String birthDate = formatDate(values.get(1).getDateValue());
    String creditCardNumber = values.get(2).getStringValue();
    String registerDate = formatDate(values.get(3).getDateValue());
    return String.join(",", name, birthDate, creditCardNumber, registerDate);
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// Import other required libraries
const fs = require('fs');

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The path to the CSV file to deidentify
// The first row of the file must specify column names, and all other rows
// must contain valid values
// const inputCsvFile = '/path/to/input/file.csv';

// The path to save the date-shifted CSV file to
// const outputCsvFile = '/path/to/output/file.csv';

// The list of (date) fields in the CSV file to date shift
// const dateFields = [{ name: 'birth_date'}, { name: 'register_date' }];

// The maximum number of days to shift a date backward
// const lowerBoundDays = 1;

// The maximum number of days to shift a date forward
// const upperBoundDays = 1;

// (Optional) The column to determine date shift amount based on
// If this is not specified, a random shift amount will be used for every row
// If this is specified, then 'wrappedKey' and 'keyName' must also be set
// const contextFieldId = 'user_id';

// (Optional) The name of the Cloud KMS key used to encrypt ('wrap') the AES-256 key
// If this is specified, then 'wrappedKey' and 'contextFieldId' must also be set
// const keyName = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME';

// (Optional) The encrypted ('wrapped') AES-256 key to use when shifting dates
// This key should be encrypted using the Cloud KMS key specified above
// If this is specified, then 'keyName' and 'contextFieldId' must also be set
// const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY'

// Helper function for converting CSV rows to Protobuf types
const rowToProto = row => {
  const values = row.split(',');
  const convertedValues = values.map(value => {
    if (Date.parse(value)) {
      const date = new Date(value);
      return {
        dateValue: {
          year: date.getFullYear(),
          month: date.getMonth() + 1,
          day: date.getDate(),
        },
      };
    } else {
      // Convert all non-date values to strings
      return {stringValue: value.toString()};
    }
  });
  return {values: convertedValues};
};

// Read and parse a CSV file
const csvLines = fs
  .readFileSync(inputCsvFile)
  .toString()
  .split('\n')
  .filter(line => line.includes(','));
const csvHeaders = csvLines[0].split(',');
const csvRows = csvLines.slice(1);

// Construct the table object
const tableItem = {
  table: {
    headers: csvHeaders.map(header => {
      return {name: header};
    }),
    rows: csvRows.map(row => rowToProto(row)),
  },
};

// Construct DateShiftConfig
const dateShiftConfig = {
  lowerBoundDays: lowerBoundDays,
  upperBoundDays: upperBoundDays,
};

if (contextFieldId && keyName && wrappedKey) {
  dateShiftConfig.context = {name: contextFieldId};
  dateShiftConfig.cryptoKey = {
    kmsWrapped: {
      wrappedKey: wrappedKey,
      cryptoKeyName: keyName,
    },
  };
} else if (contextFieldId || keyName || wrappedKey) {
  throw new Error(
    'You must set either ALL or NONE of {contextFieldId, keyName, wrappedKey}!'
  );
}

// Construct deidentification request
const request = {
  parent: `projects/${callingProjectId}/locations/global`,
  deidentifyConfig: {
    recordTransformations: {
      fieldTransformations: [
        {
          fields: dateFields,
          primitiveTransformation: {
            dateShiftConfig: dateShiftConfig,
          },
        },
      ],
    },
  },
  item: tableItem,
};

try {
  // Run deidentification request
  const [response] = await dlp.deidentifyContent(request);
  const tableRows = response.item.table.rows;

  // Write results to a CSV file
  tableRows.forEach((row, rowIndex) => {
    const rowValues = row.values.map(
      value =>
        value.stringValue ||
        `${value.dateValue.month}/${value.dateValue.day}/${value.dateValue.year}`
    );
    csvLines[rowIndex + 1] = rowValues.join(',');
  });
  csvLines.push('');
  fs.writeFileSync(outputCsvFile, csvLines.join('\n'));

  // Print status
  console.log(`Successfully saved date-shift output to ${outputCsvFile}`);
} catch (err) {
  console.log(`Error in deidentifyWithDateShift: ${err.message || err}`);
}

Python

def deidentify_with_date_shift(
    project,
    input_csv_file=None,
    output_csv_file=None,
    date_fields=None,
    lower_bound_days=None,
    upper_bound_days=None,
    context_field_id=None,
    wrapped_key=None,
    key_name=None,
):
    """Uses the Data Loss Prevention API to deidentify dates in a CSV file by
        pseudorandomly shifting them.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_csv_file: The path to the CSV file to deidentify. The first row
            of the file must specify column names, and all other rows must
            contain valid values.
        output_csv_file: The path to save the date-shifted CSV file.
        date_fields: The list of (date) fields in the CSV file to date shift.
            Example: ['birth_date', 'register_date']
        lower_bound_days: The maximum number of days to shift a date backward
        upper_bound_days: The maximum number of days to shift a date forward
        context_field_id: (Optional) The column to determine date shift amount
            based on. If this is not specified, a random shift amount will be
            used for every row. If this is specified, then 'wrapped_key' and
            'key_name' must also be set. Example:
            context_field_id = 'user_id'
        key_name: (Optional) The name of the Cloud KMS key used to encrypt
            ('wrap') the AES-256 key. Example:
            key_name = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/
            keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME'
        wrapped_key: (Optional) The encrypted ('wrapped') AES-256 key to use.
            This key should be encrypted using the Cloud KMS key specified by
            key_name.
    Returns:
        None; the date-shifted rows are written to output_csv_file.
    """
    # Import the client library
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Convert date field list to Protobuf type
    def map_fields(field):
        return {"name": field}

    if date_fields:
        date_fields = map(map_fields, date_fields)
    else:
        date_fields = []

    # Read and parse the CSV file
    import csv
    from datetime import datetime

    f = []
    with open(input_csv_file, "r") as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            f.append(row)

    #  Helper function for converting CSV rows to Protobuf types
    def map_headers(header):
        return {"name": header}

    def map_data(value):
        try:
            date = datetime.strptime(value, "%m/%d/%Y")
            return {
                "date_value": {
                    "year": date.year,
                    "month": date.month,
                    "day": date.day,
                }
            }
        except ValueError:
            return {"string_value": value}

    def map_rows(row):
        return {"values": map(map_data, row)}

    # Using the helper functions, convert CSV rows to protobuf-compatible
    # dictionaries.
    csv_headers = map(map_headers, f[0])
    csv_rows = map(map_rows, f[1:])

    # Construct the table dict
    table_item = {"table": {"headers": csv_headers, "rows": csv_rows}}
    # Construct date shift config
    date_shift_config = {
        "lower_bound_days": lower_bound_days,
        "upper_bound_days": upper_bound_days,
    }

    # If using a Cloud KMS key, add it to the date_shift_config.
    # The wrapped key is base64-encoded, but the library expects a binary
    # string, so decode it here.
    if context_field_id and key_name and wrapped_key:
        import base64

        date_shift_config["context"] = {"name": context_field_id}
        date_shift_config["crypto_key"] = {
            "kms_wrapped": {
                "wrapped_key": base64.b64decode(wrapped_key),
                "crypto_key_name": key_name,
            }
        }
    elif context_field_id or key_name or wrapped_key:
        raise ValueError(
            """You must set either ALL or NONE of
        [context_field_id, key_name, wrapped_key]!"""
        )

    # Construct Deidentify Config
    deidentify_config = {
        "record_transformations": {
            "field_transformations": [
                {
                    "fields": date_fields,
                    "primitive_transformation": {
                        "date_shift_config": date_shift_config
                    },
                }
            ]
        }
    }

    # Write to CSV helper methods
    def write_header(header):
        return header.name

    def write_data(data):
        return data.string_value or "%s/%s/%s" % (
            data.date_value.month,
            data.date_value.day,
            data.date_value.year,
        )

    # Call the API
    response = dlp.deidentify_content(
        parent, deidentify_config=deidentify_config, item=table_item
    )

    # Write results to CSV file
    with open(output_csv_file, "w") as csvfile:
        write_file = csv.writer(csvfile, delimiter=",")
        write_file.writerow(map(write_header, response.item.table.headers))
        for row in response.item.table.rows:
            write_file.writerow(map(write_data, row.values))
    # Print status
    print("Successfully saved date-shift output to {}".format(output_csv_file))

Go

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	dlppb "google.golang.org/genproto/googleapis/privacy/dlp/v2"
)

// deidentifyDateShift shifts dates found in the input between lowerBoundDays and
// upperBoundDays.
func deidentifyDateShift(w io.Writer, projectID string, lowerBoundDays, upperBoundDays int32, input string) error {
	// projectID := "my-project-id"
	// lowerBoundDays := -1
	// upperBoundDays := -1
	// input := "2016-01-10"
	// Will print "2016-01-09"
	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %v", err)
	}
	// Release the client's resources when the function returns.
	defer client.Close()
	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: &dlppb.InfoTypeTransformations{
					Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
						{
							InfoTypes: []*dlppb.InfoType{}, // Match all info types.
							PrimitiveTransformation: &dlppb.PrimitiveTransformation{
								Transformation: &dlppb.PrimitiveTransformation_DateShiftConfig{
									DateShiftConfig: &dlppb.DateShiftConfig{
										LowerBoundDays: lowerBoundDays,
										UpperBoundDays: upperBoundDays,
									},
								},
							},
						},
					},
				},
			},
		},
		// The InspectConfig is used to identify the DATE fields.
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: []*dlppb.InfoType{
				{
					Name: "DATE",
				},
			},
		},
		// The item to analyze.
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: input,
			},
		},
	}
	// Send the request.
	r, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return fmt.Errorf("DeidentifyContent: %v", err)
	}
	// Print the result.
	fmt.Fprint(w, r.GetItem().GetValue())
	return nil
}

PHP

/**
 * Deidentify dates in a CSV file by pseudorandomly shifting them.
 */
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CryptoKey;
use Google\Cloud\Dlp\V2\DateShiftConfig;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\FieldTransformation;
use Google\Cloud\Dlp\V2\KmsWrappedCryptoKey;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\RecordTransformations;
use Google\Cloud\Dlp\V2\Table;
use Google\Cloud\Dlp\V2\Table\Row;
use Google\Cloud\Dlp\V2\Value;
use Google\Type\Date;

/** Uncomment and populate these variables in your code */
// $callingProjectId = 'The GCP Project ID to run the API call under';
// $inputCsvFile = 'The path to the CSV file to deidentify';
// $outputCsvFile = 'The path to save the date-shifted CSV file to';
// $dateFieldNames = 'The comma-separated list of (date) fields in the CSV file to date shift';
// $lowerBoundDays = 'The maximum number of days to shift a date backward';
// $upperBoundDays = 'The maximum number of days to shift a date forward';
/**
 * If contextFieldName is not specified, a random shift amount will be used for every row.
 * If contextFieldName is specified, then 'wrappedKey' and 'keyName' must also be set
 */
// $contextFieldName = ''; // (Optional) The column to determine date shift amount based on
// $keyName = ''; // (Optional) The name of the Cloud KMS key used to encrypt (wrap) the AES-256 key
// $wrappedKey = ''; // (Optional) The encrypted ('wrapped') AES-256 key to use when shifting dates

// Instantiate a client.
$dlp = new DlpServiceClient();

// Read a CSV file
$csvLines = file($inputCsvFile, FILE_IGNORE_NEW_LINES);
$csvHeaders = explode(',', $csvLines[0]);
$csvRows = array_slice($csvLines, 1);

// Convert CSV file into protobuf objects
$tableHeaders = array_map(function ($csvHeader) {
    return (new FieldId)->setName($csvHeader);
}, $csvHeaders);

$tableRows = array_map(function ($csvRow) {
    $rowValues = array_map(function ($csvValue) {
        if ($csvDate = DateTime::createFromFormat('m/d/Y', $csvValue)) {
            $date = (new Date())
                ->setYear((int) $csvDate->format('Y'))
                ->setMonth((int) $csvDate->format('m'))
                ->setDay((int) $csvDate->format('d'));
            return (new Value())
                ->setDateValue($date);
        } else {
            return (new Value())
                ->setStringValue($csvValue);
        }
    }, explode(',', $csvRow));

    return (new Row())
        ->setValues($rowValues);
}, $csvRows);

// Convert date fields into protobuf objects
$dateFields = array_map(function ($dateFieldName) {
    return (new FieldId())->setName($dateFieldName);
}, explode(',', $dateFieldNames));

// Construct the table object
$table = (new Table())
    ->setHeaders($tableHeaders)
    ->setRows($tableRows);

$item = (new ContentItem())
    ->setTable($table);

// Construct dateShiftConfig
$dateShiftConfig = (new DateShiftConfig())
    ->setLowerBoundDays($lowerBoundDays)
    ->setUpperBoundDays($upperBoundDays);

if ($contextFieldName && $keyName && $wrappedKey) {
    $contextField = (new FieldId())
        ->setName($contextFieldName);

    // Create the wrapped crypto key configuration object
    $kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
        ->setWrappedKey(base64_decode($wrappedKey))
        ->setCryptoKeyName($keyName);

    $cryptoKey = (new CryptoKey())
        ->setKmsWrapped($kmsWrappedCryptoKey);

    $dateShiftConfig
        ->setContext($contextField)
        ->setCryptoKey($cryptoKey);
} elseif ($contextFieldName || $keyName || $wrappedKey) {
    throw new Exception('You must set either ALL or NONE of {$contextFieldName, $keyName, $wrappedKey}!');
}

// Create the information transform configuration objects
$primitiveTransformation = (new PrimitiveTransformation())
    ->setDateShiftConfig($dateShiftConfig);

$fieldTransformation = (new FieldTransformation())
    ->setPrimitiveTransformation($primitiveTransformation)
    ->setFields($dateFields);

$recordTransformations = (new RecordTransformations())
    ->setFieldTransformations([$fieldTransformation]);

// Create the deidentification configuration object
$deidentifyConfig = (new DeidentifyConfig())
    ->setRecordTransformations($recordTransformations);

$parent = "projects/$callingProjectId/locations/global";

// Run request
$response = $dlp->deidentifyContent($parent, [
    'deidentifyConfig' => $deidentifyConfig,
    'item' => $item
]);

// Check for errors
foreach ($response->getOverview()->getTransformationSummaries() as $summary) {
    foreach ($summary->getResults() as $result) {
        if ($details = $result->getDetails()) {
            printf('Error: %s' . PHP_EOL, $details);
            return;
        }
    }
}

// Save the results to a file
$csvRef = fopen($outputCsvFile, 'w');
fputcsv($csvRef, $csvHeaders);
foreach ($response->getItem()->getTable()->getRows() as $tableRow) {
    $values = array_map(function ($tableValue) {
        if ($tableValue->getStringValue()) {
            return $tableValue->getStringValue();
        }
        $protoDate = $tableValue->getDateValue();
        $date = mktime(0, 0, 0, $protoDate->getMonth(), $protoDate->getDay(), $protoDate->getYear());
        return strftime('%D', $date);
    }, iterator_to_array($tableRow->getValues()));
    fputcsv($csvRef, $values);
};
fclose($csvRef);
printf('Deidentified dates written to %s' . PHP_EOL, $outputCsvFile);

C#


using System;
using System.IO;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Protobuf;

public class DeidentifyWithDateShift
{
    public static DeidentifyContentResponse Deidentify(
        string projectId,
        string inputCsvFilePath,
        int lowerBoundDays,
        int upperBoundDays,
        string dateFields,
        string contextField,
        string keyName,
        string wrappedKey)
    {
        var hasKeyName = !string.IsNullOrEmpty(keyName);
        var hasWrappedKey = !string.IsNullOrEmpty(wrappedKey);
        var hasContext = !string.IsNullOrEmpty(contextField);
        bool allFieldsSet = hasKeyName && hasWrappedKey && hasContext;
        bool noFieldsSet = !hasKeyName && !hasWrappedKey && !hasContext;
        if (!(allFieldsSet || noFieldsSet))
        {
            throw new ArgumentException("Must specify ALL or NONE of: {contextFieldId, keyName, wrappedKey}!");
        }

        var dlp = DlpServiceClient.Create();

        // Read file
        var csvLines = File.ReadAllLines(inputCsvFilePath);
        var csvHeaders = csvLines[0].Split(',');
        var csvRows = csvLines.Skip(1).ToArray();

        // Convert dates to protobuf format, and everything else to a string
        var protoHeaders = csvHeaders.Select(header => new FieldId { Name = header });
        var protoRows = csvRows.Select(csvRow =>
        {
            var rowValues = csvRow.Split(',');
            var protoValues = rowValues.Select(rowValue =>
               System.DateTime.TryParse(rowValue, out var parsedDate)
               ? new Value { DateValue = Google.Type.Date.FromDateTime(parsedDate) }
               : new Value { StringValue = rowValue });

            var rowObject = new Table.Types.Row();
            rowObject.Values.Add(protoValues);
            return rowObject;
        });

        var dateFieldList = dateFields
            .Split(',')
            .Select(field => new FieldId { Name = field });

        // Construct + execute the request
        var dateShiftConfig = new DateShiftConfig
        {
            LowerBoundDays = lowerBoundDays,
            UpperBoundDays = upperBoundDays
        };

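        // Only attach the context field and wrapped key when all three optional values are set.
        if (allFieldsSet)
        {
            dateShiftConfig.Context = new FieldId { Name = contextField };
            dateShiftConfig.CryptoKey = new CryptoKey
            {
                KmsWrapped = new KmsWrappedCryptoKey
                {
                    WrappedKey = ByteString.FromBase64(wrappedKey),
                    CryptoKeyName = keyName
                }
            };
        }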

        var deidConfig = new DeidentifyConfig
        {
            RecordTransformations = new RecordTransformations
            {
                FieldTransformations =
                {
                    new FieldTransformation
                    {
                        PrimitiveTransformation = new PrimitiveTransformation
                        {
                            DateShiftConfig = dateShiftConfig
                        },
                        Fields = { dateFieldList }
                    }
                }
            }
        };

        var response = dlp.DeidentifyContent(
            new DeidentifyContentRequest
            {
                Parent = new LocationName(projectId, "global").ToString(),
                DeidentifyConfig = deidConfig,
                Item = new ContentItem
                {
                    Table = new Table
                    {
                        Headers = { protoHeaders },
                        Rows = { protoRows }
                    }
                }
            });

        return response;
    }
}

cryptoReplaceFfxFpeConfig

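Setting cryptoReplaceFfxFpeConfig to a CryptoReplaceFfxFpeConfig object performs pseudonymization on an input value by replacing it with a token. This token:

  • Is the encrypted input value.
  • Is the same length as the input value.
  • Is computed using format-preserving encryption in FFX mode ("FPE-FFX"), keyed on the cryptographic key specified by cryptoKey.
  • Consists of the characters specified by alphabet. Valid options are: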
    • NUMERIC
    • HEXADECIMAL
    • UPPER_CASE_ALPHA_NUMERIC
    • ALPHA_NUMERIC

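The input value must meet the following requirements:

  • It must be at least two characters long (or be the empty string).
  • It must consist of the characters specified by alphabet. The alphabet can contain between 2 and 95 characters. (A 95-character alphabet includes all printable characters in the US-ASCII character set.)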
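The input requirements above can be checked before sending a request. The following is a minimal sketch; the helper name is_valid_fpe_input is hypothetical, and the character sets assigned to each named alphabet option (digits, uppercase hex digits, and so on) are assumptions based on the option names rather than anything stated in this section.

```python
import string

# Assumed character sets for the named alphabet options.
COMMON_ALPHABETS = {
    "NUMERIC": string.digits,
    "HEXADECIMAL": string.digits + "ABCDEF",
    "UPPER_CASE_ALPHA_NUMERIC": string.digits + string.ascii_uppercase,
    "ALPHA_NUMERIC": string.digits + string.ascii_uppercase + string.ascii_lowercase,
}


def is_valid_fpe_input(value: str, alphabet: str) -> bool:
    """Check a value against the length and alphabet rules listed above."""
    if not 2 <= len(set(alphabet)) <= 95:
        raise ValueError("alphabet must contain between 2 and 95 characters")
    if value == "":
        return True  # the empty string is explicitly allowed
    return len(value) >= 2 and all(ch in alphabet for ch in value)


numeric = COMMON_ALPHABETS["NUMERIC"]
print(is_valid_fpe_input("372819127", numeric))    # digits only
print(is_valid_fpe_input("372-81-9127", numeric))  # hyphens are outside the alphabet
```

A check like this helps explain why a value with separators, such as a hyphenated number, needs a custom alphabet or preprocessing before it can be tokenized with a digits-only alphabet.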

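Cloud DLP computes the replacement token using a cryptographic key. You provide this key in one of three ways:

  1. Ask Cloud DLP to generate the key.
  2. Embed it in the API request, encrypted. With this option, the key is wrapped (encrypted) by a Cloud Key Management Service (Cloud KMS) key.
  3. Embed it in the API request, unencrypted. (Not recommended.)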
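The three options above correspond to three shapes of the cryptoKey field in a JSON request. The following is a minimal sketch using plain Python dicts; the helper names are hypothetical, and it assumes that byte fields are carried as base64 strings in JSON, as is usual for Google REST APIs.

```python
import base64


def transient_key(name: str) -> dict:
    """Option 1: Cloud DLP generates the key for the duration of the request."""
    return {"transient": {"name": name}}


def kms_wrapped_key(wrapped_key_b64: str, kms_key_name: str) -> dict:
    """Option 2: the key is embedded in the request, wrapped by a Cloud KMS key."""
    return {
        "kmsWrapped": {
            "wrappedKey": wrapped_key_b64,
            "cryptoKeyName": kms_key_name,
        }
    }


def unwrapped_key(raw_key: bytes) -> dict:
    """Option 3: the key is embedded unencrypted (not recommended)."""
    return {"unwrapped": {"key": base64.b64encode(raw_key).decode("ascii")}}


print(transient_key("example-transient-key"))
print(kms_wrapped_key("YOUR_ENCRYPTED_AES_256_KEY", "YOUR_KMS_KEY_NAME"))
print(unwrapped_key(b"\x00" * 32))
```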

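To create a Cloud KMS wrapped key, send a request that contains a 16-, 24-, or 32-byte plaintext field value to the Cloud KMS projects.locations.keyRings.cryptoKeys.encrypt method. The wrapped key is the value in the ciphertext field of the method's response.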
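Preparing the plaintext for that encrypt call can be sketched as follows. This only generates a random key of a permitted size and builds the JSON body; it does not call Cloud KMS. The helper name build_kms_encrypt_body is hypothetical, and the sketch assumes the plaintext field is sent base64-encoded in the JSON request.

```python
import base64
import secrets


def build_kms_encrypt_body(key_size: int = 32):
    """Generate a random AES key and the encrypt request body that would wrap it."""
    if key_size not in (16, 24, 32):
        raise ValueError("key_size must be 16, 24, or 32 bytes")
    raw_key = secrets.token_bytes(key_size)
    # Assumption: byte fields travel as base64 strings in the JSON body.
    body = {"plaintext": base64.b64encode(raw_key).decode("ascii")}
    return raw_key, body


raw_key, body = build_kms_encrypt_body()
print(len(raw_key), sorted(body))
```

Keep raw_key secret and discard it once the wrapped key has been obtained; only the wrapped key and the Cloud KMS key name need to appear in later Cloud DLP requests.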

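The wrapped key is a base64-encoded string by default. To set this value in Cloud DLP, you must decode it into a byte string. The following code snippets show how to do this in several languages. End-to-end examples follow the snippets.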

Java

KmsWrappedCryptoKey.newBuilder()
    .setWrappedKey(ByteString.copyFrom(BaseEncoding.base64().decode(wrappedKey)))

Python

# The wrapped key is base64-encoded, but the library expects a binary
# string, so decode it here.
import base64
wrapped_key = base64.b64decode(wrapped_key)

PHP

// Create the wrapped crypto key configuration object
$kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
    ->setWrappedKey(base64_decode($wrappedKey))
    ->setCryptoKeyName($keyName);

C#

WrappedKey = ByteString.FromBase64(wrappedKey)

For more information about encrypting and decrypting data using Cloud KMS, see Encrypting and decrypting data.

The following sample code in several languages demonstrates how to use Cloud DLP to de-identify sensitive data by replacing an input value with a token.

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.common.io.BaseEncoding;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CryptoKey;
import com.google.privacy.dlp.v2.CryptoReplaceFfxFpeConfig;
import com.google.privacy.dlp.v2.CryptoReplaceFfxFpeConfig.FfxCommonNativeAlphabet;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.KmsWrappedCryptoKey;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.Arrays;

public class DeIdentifyWithFpe {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToDeIdentify = "My SSN is 372819127";
    String kmsKeyName =
        "projects/YOUR_PROJECT/"
            + "locations/YOUR_KEYRING_REGION/"
            + "keyRings/YOUR_KEYRING_NAME/"
            + "cryptoKeys/YOUR_KEY_NAME";
    String wrappedAesKey = "YOUR_ENCRYPTED_AES_256_KEY";
    deIdentifyWithFpe(projectId, textToDeIdentify, kmsKeyName, wrappedAesKey);
  }

  public static void deIdentifyWithFpe(
      String projectId, String textToDeIdentify, String kmsKeyName, String wrappedAesKey)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify what content you want the service to DeIdentify
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToDeIdentify).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("US_SOCIAL_SECURITY_NUMBER").build();
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(Arrays.asList(infoType)).build();

      // Specify an encrypted AES-256 key and the name of the Cloud KMS key that encrypted it
      KmsWrappedCryptoKey kmsWrappedCryptoKey =
          KmsWrappedCryptoKey.newBuilder()
              .setWrappedKey(ByteString.copyFrom(BaseEncoding.base64().decode(wrappedAesKey)))
              .setCryptoKeyName(kmsKeyName)
              .build();
      CryptoKey cryptoKey = CryptoKey.newBuilder().setKmsWrapped(kmsWrappedCryptoKey).build();

      // Specify how the info from the inspection should be encrypted.
      InfoType surrogateInfoType = InfoType.newBuilder().setName("SSN_TOKEN").build();
      CryptoReplaceFfxFpeConfig cryptoReplaceFfxFpeConfig =
          CryptoReplaceFfxFpeConfig.newBuilder()
              .setCryptoKey(cryptoKey)
              // Set of characters in the input text. For more info, see
              // https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#DeidentifyTemplate.FfxCommonNativeAlphabet
              .setCommonAlphabet(FfxCommonNativeAlphabet.NUMERIC)
              .setSurrogateInfoType(surrogateInfoType)
              .build();
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setCryptoReplaceFfxFpeConfig(cryptoReplaceFfxFpeConfig)
              .build();
      InfoTypeTransformation infoTypeTransformation =
          InfoTypeTransformation.newBuilder()
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      InfoTypeTransformations transformations =
          InfoTypeTransformations.newBuilder().addTransformations(infoTypeTransformation).build();

      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder().setInfoTypeTransformations(transformations).build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(contentItem)
              .setInspectConfig(inspectConfig)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Print the results
      System.out.println(
          "Text after format-preserving encryption: " + response.getItem().getValue());
    }
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The string to deidentify
// const string = 'My SSN is 372819127';

// The set of characters to replace sensitive ones with
// For more information, see https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#ffxcommonnativealphabet
// const alphabet = 'ALPHA_NUMERIC';

// The name of the Cloud KMS key used to encrypt ('wrap') the AES-256 key
// const keyName = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME';

// The encrypted ('wrapped') AES-256 key to use
// This key should be encrypted using the Cloud KMS key specified above
// const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY'

// (Optional) The name of the surrogate custom info type to use
// Only necessary if you want to reverse the deidentification process
// Can be essentially any arbitrary string, as long as it doesn't appear
// in your dataset otherwise.
// const surrogateType = 'SOME_INFO_TYPE_DEID';

// Construct FPE config
const cryptoReplaceFfxFpeConfig = {
  cryptoKey: {
    kmsWrapped: {
      wrappedKey: wrappedKey,
      cryptoKeyName: keyName,
    },
  },
  commonAlphabet: alphabet,
};
if (surrogateType) {
  cryptoReplaceFfxFpeConfig.surrogateInfoType = {
    name: surrogateType,
  };
}

// Construct deidentification request
const item = {value: string};
const request = {
  parent: `projects/${callingProjectId}/locations/global`,
  deidentifyConfig: {
    infoTypeTransformations: {
      transformations: [
        {
          primitiveTransformation: {
            cryptoReplaceFfxFpeConfig: cryptoReplaceFfxFpeConfig,
          },
        },
      ],
    },
  },
  item: item,
};

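// `await` is only valid inside an async function in a CommonJS module,
// so the request is wrapped in one and invoked below.
async function deidentifyWithFpe() {
  try {
    // Run deidentification request
    const [response] = await dlp.deidentifyContent(request);
    const deidentifiedItem = response.item;
    console.log(deidentifiedItem.value);
  } catch (err) {
    console.log(`Error in deidentifyWithFpe: ${err.message || err}`);
  }
}

deidentifyWithFpe();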

Python



def deidentify_with_fpe(
    project,
    input_str,
    info_types,
    alphabet=None,
    surrogate_type=None,
    key_name=None,
    wrapped_key=None,
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string using Format Preserving Encryption (FPE).
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings naming the info types to look for,
            for example ["US_SOCIAL_SECURITY_NUMBER"].
        alphabet: The set of characters to replace sensitive ones with. For
            more information, see https://cloud.google.com/dlp/docs/reference/
            rest/v2/organizations.deidentifyTemplates#ffxcommonnativealphabet
        surrogate_type: The name of the surrogate custom info type to use. Only
            necessary if you want to reverse the deidentification process. Can
            be essentially any arbitrary string, as long as it doesn't appear
            in your dataset otherwise.
        key_name: The name of the Cloud KMS key used to encrypt ('wrap') the
            AES-256 key. Example:
            key_name = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/
            keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME'
        wrapped_key: The encrypted ('wrapped') AES-256 key to use. This key
            should be encrypted using the Cloud KMS key specified by key_name.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library
    import google.cloud.dlp_v2

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # The wrapped key is base64-encoded, but the library expects a binary
    # string, so decode it here.
    import base64

    wrapped_key = base64.b64decode(wrapped_key)

    # Construct FPE configuration dictionary
    crypto_replace_ffx_fpe_config = {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": key_name,
            }
        },
        "common_alphabet": alphabet,
    }

    # Add surrogate type
    if surrogate_type:
        crypto_replace_ffx_fpe_config["surrogate_info_type"] = {
            "name": surrogate_type
        }

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": crypto_replace_ffx_fpe_config
                    }
                }
            ]
        }
    }

    # Convert string to item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        parent,
        inspect_config=inspect_config,
        deidentify_config=deidentify_config,
        item=item,
    )

    # Print results
    print(response.item.value)

Go

import (
	"context"
	"fmt"
	"io"
	"io/ioutil"

	dlp "cloud.google.com/go/dlp/apiv2"
	dlppb "google.golang.org/genproto/googleapis/privacy/dlp/v2"
)

// deidentifyFPE deidentifies the input with FPE (Format Preserving Encryption).
// keyFileName is the file name with the KMS wrapped key and cryptoKeyName is the
// full KMS key resource name used to wrap the key. surrogateInfoType is an
// optional identifier needed for reidentification. surrogateInfoType can be any
// value not found in your input.
// Info types can be found with the infoTypes.list method or on https://cloud.google.com/dlp/docs/infotypes-reference
func deidentifyFPE(w io.Writer, projectID, input string, infoTypeNames []string, keyFileName, cryptoKeyName, surrogateInfoType string) error {
	// projectID := "my-project-id"
	// input := "My SSN is 123456789"
	// infoTypeNames := []string{"US_SOCIAL_SECURITY_NUMBER"}
	// keyFileName := "path/to/wrapped_key_file" // file containing YOUR_ENCRYPTED_AES_256_KEY
	// cryptoKeyName := "projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME"
	// surrogateInfoType := "AGE"
	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %v", err)
	}
	// Release the client's resources when the function returns.
	defer client.Close()
	// Convert the info type strings to a list of InfoTypes.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}
	// Read the key file.
	keyBytes, err := ioutil.ReadFile(keyFileName)
	if err != nil {
		return fmt.Errorf("ReadFile: %v", err)
	}
	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: infoTypes,
		},
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: &dlppb.InfoTypeTransformations{
					Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
						{
							InfoTypes: []*dlppb.InfoType{}, // Match all info types.
							PrimitiveTransformation: &dlppb.PrimitiveTransformation{
								Transformation: &dlppb.PrimitiveTransformation_CryptoReplaceFfxFpeConfig{
									CryptoReplaceFfxFpeConfig: &dlppb.CryptoReplaceFfxFpeConfig{
										CryptoKey: &dlppb.CryptoKey{
											Source: &dlppb.CryptoKey_KmsWrapped{
												KmsWrapped: &dlppb.KmsWrappedCryptoKey{
													WrappedKey:    keyBytes,
													CryptoKeyName: cryptoKeyName,
												},
											},
										},
										// Set the alphabet used for the output.
										Alphabet: &dlppb.CryptoReplaceFfxFpeConfig_CommonAlphabet{
											CommonAlphabet: dlppb.CryptoReplaceFfxFpeConfig_ALPHA_NUMERIC,
										},
										// Set the surrogate info type, used for reidentification.
										SurrogateInfoType: &dlppb.InfoType{
											Name: surrogateInfoType,
										},
									},
								},
							},
						},
					},
				},
			},
		},
		// The item to analyze.
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: input,
			},
		},
	}
	// Send the request.
	r, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return fmt.Errorf("DeidentifyContent: %v", err)
	}
	// Print the result.
	fmt.Fprint(w, r.GetItem().GetValue())
	return nil
}

PHP

/**
 * Deidentify a string using Format-Preserving Encryption (FPE).
 */
use Google\Cloud\Dlp\V2\CryptoReplaceFfxFpeConfig;
use Google\Cloud\Dlp\V2\CryptoReplaceFfxFpeConfig\FfxCommonNativeAlphabet;
use Google\Cloud\Dlp\V2\CryptoKey;
use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\KmsWrappedCryptoKey;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\ContentItem;

/** Uncomment and populate these variables in your code */
// $callingProjectId = 'The GCP Project ID to run the API call under';
// $string = 'The string to deidentify';
// $keyName = 'The name of the Cloud KMS key used to encrypt (wrap) the AES-256 key';
// $wrappedKey = 'The AES-256 key to use, encrypted (wrapped) with the KMS key in $keyName';
// $surrogateTypeName = ''; // (Optional) surrogate custom info type to enable reidentification

// Instantiate a client.
$dlp = new DlpServiceClient();

// The infoTypes of information to mask
$ssnInfoType = (new InfoType())
    ->setName('US_SOCIAL_SECURITY_NUMBER');
$infoTypes = [$ssnInfoType];

// Create the wrapped crypto key configuration object
$kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
    ->setWrappedKey(base64_decode($wrappedKey))
    ->setCryptoKeyName($keyName);

// The set of characters to replace sensitive ones with
// For more information, see https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#ffxcommonnativealphabet
$commonAlphabet = FfxCommonNativeAlphabet::NUMERIC;

// Create the crypto key configuration object
$cryptoKey = (new CryptoKey())
    ->setKmsWrapped($kmsWrappedCryptoKey);

// Create the crypto FFX FPE configuration object
$cryptoReplaceFfxFpeConfig = (new CryptoReplaceFfxFpeConfig())
    ->setCryptoKey($cryptoKey)
    ->setCommonAlphabet($commonAlphabet);

if ($surrogateTypeName) {
    $surrogateType = (new InfoType())
        ->setName($surrogateTypeName);
    $cryptoReplaceFfxFpeConfig->setSurrogateInfoType($surrogateType);
}

// Create the information transform configuration objects
$primitiveTransformation = (new PrimitiveTransformation())
    ->setCryptoReplaceFfxFpeConfig($cryptoReplaceFfxFpeConfig);

$infoTypeTransformation = (new InfoTypeTransformation())
    ->setPrimitiveTransformation($primitiveTransformation)
    ->setInfoTypes($infoTypes);

$infoTypeTransformations = (new InfoTypeTransformations())
    ->setTransformations([$infoTypeTransformation]);

// Create the deidentification configuration object
$deidentifyConfig = (new DeidentifyConfig())
    ->setInfoTypeTransformations($infoTypeTransformations);

$content = (new ContentItem())
    ->setValue($string);

$parent = "projects/$callingProjectId/locations/global";

// Run request
$response = $dlp->deidentifyContent($parent, [
    'deidentifyConfig' => $deidentifyConfig,
    'item' => $content
]);

// Print the results
$deidentifiedValue = $response->getItem()->getValue();
print($deidentifiedValue);

C#


using System;
using System.Collections.Generic;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Protobuf;
using static Google.Cloud.Dlp.V2.CryptoReplaceFfxFpeConfig.Types;

public class DeidentifyWithFpe
{
    public static DeidentifyContentResponse Deidentify(
        string projectId,
        string dataValue,
        IEnumerable<InfoType> infoTypes,
        string keyName,
        string wrappedKey,
        FfxCommonNativeAlphabet alphabet)
    {
        var deidentifyConfig = new DeidentifyConfig
        {
            InfoTypeTransformations = new InfoTypeTransformations
            {
                Transformations =
                {
                    new InfoTypeTransformations.Types.InfoTypeTransformation
                    {
                        PrimitiveTransformation = new PrimitiveTransformation
                        {
                            CryptoReplaceFfxFpeConfig = new CryptoReplaceFfxFpeConfig
                            {
                                CommonAlphabet = alphabet,
                                CryptoKey = new CryptoKey
                                {
                                    KmsWrapped = new KmsWrappedCryptoKey
                                    {
                                        CryptoKeyName = keyName,
                                        WrappedKey = ByteString.FromBase64(wrappedKey)
                                    }
                                },
                                SurrogateInfoType = new InfoType
                                {
                                    Name = "TOKEN"
                                }
                            }
                        }
                    }
                }
            }
        };

        var dlp = DlpServiceClient.Create();
        var response = dlp.DeidentifyContent(
            new DeidentifyContentRequest
            {
                Parent = new LocationName(projectId, "global").ToString(),
                InspectConfig = new InspectConfig
                {
                    InfoTypes = { infoTypes }
                },
                DeidentifyConfig = deidentifyConfig,
                Item = new ContentItem { Value = dataValue }
            });

        Console.WriteLine($"Deidentified content: {response.Item.Value}");
        return response;
    }
}

fixedSizeBucketingConfig

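Bucketing transformations (this one and bucketingConfig) mask numerical data by "bucketing" it into ranges. The resulting number range is a hyphenated string consisting of a lower bound, a hyphen, and an upper bound.

Setting fixedSizeBucketingConfig to a FixedSizeBucketingConfig object buckets input values into fixed-size ranges. The FixedSizeBucketingConfig object consists of the following:

  • lowerBound: The lower bound value of all of the buckets. Values less than this bound are grouped together into a single bucket.
  • upperBound: The upper bound value of all of the buckets. Values greater than this bound are grouped together into a single bucket.
  • bucketSize: The size of each bucket other than the minimum and maximum buckets.

For example, if lowerBound is set to 10, upperBound is set to 89, and bucketSize is set to 10, then the following buckets are used: -10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-89, 89+.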
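The bucket labels in the example above can be reproduced locally. The following is a minimal Python sketch, not Cloud DLP's implementation: the function name fixed_size_bucket is hypothetical, and the handling of values that fall exactly on a boundary (a value equal to the start of a bucket goes into that bucket, and a value equal to upperBound goes into the open-ended top bucket) is an assumption rather than documented behavior.

```python
def fixed_size_bucket(value, lower_bound, upper_bound, bucket_size):
    """Return the hyphenated bucket label for an integer value.

    Local illustration only; boundary handling is an assumption.
    """
    if value < lower_bound:
        # Everything below the lower bound shares one open-ended bucket.
        return f"-{lower_bound}"
    if value >= upper_bound:
        # Everything at or above the upper bound shares one open-ended bucket.
        return f"{upper_bound}+"
    # Index of the fixed-size bucket, counted from the lower bound.
    index = (value - lower_bound) // bucket_size
    start = lower_bound + index * bucket_size
    # The last regular bucket is cut off at the upper bound (80-89 above).
    end = min(start + bucket_size, upper_bound)
    return f"{start}-{end}"


if __name__ == "__main__":
    for age in (5, 10, 37, 85, 95):
        print(age, "->", fixed_size_bucket(age, 10, 89, 10))
```

With lowerBound 10, upperBound 89, and bucketSize 10, the value 37 maps to 30-40 and the value 85 maps to the shortened bucket 80-89.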

For more information about the concept of bucketing, see Generalization and bucketing.

bucketingConfig

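The bucketingConfig transformation is more flexible than the other bucketing transformation, fixedSizeBucketingConfig. Instead of specifying an upper bound, a lower bound, and an interval value used to create equal-sized buckets, you specify the minimum and maximum values of each bucket you want to create. The two values in each minimum and maximum pair must have the same type.

Setting bucketingConfig to a BucketingConfig object specifies custom buckets. The BucketingConfig object consists of a buckets[] array of Bucket objects. Each Bucket object consists of the following:

  • min: The lower bound of the bucket's range. Omit this value to create a bucket that has no lower bound.
  • max: The upper bound of the bucket's range. Omit this value to create a bucket that has no upper bound.
  • replacementValue: The value that replaces values falling between the lower and upper bounds. If you don't provide a replacementValue, a hyphenated min-max range is used instead.

If a value falls outside of the defined ranges, the returned TransformationSummary contains an error message.

For example, consider the following configuration for the bucketingConfig transformation: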

"bucketingConfig":{
  "buckets":[
    {
      "min":{
        "integerValue":"1"
      },
      "max":{
        "integerValue":"30"
      },
      "replacementValue":{
        "stringValue":"LOW"
      }
    },
    {
      "min":{
        "integerValue":"31"
      },
      "max":{
        "integerValue":"65"
      },
      "replacementValue":{
        "stringValue":"MEDIUM"
      }
    },
    {
      "min":{
        "integerValue":"66"
      },
      "max":{
        "integerValue":"100"
      },
      "replacementValue":{
        "stringValue":"HIGH"
      }
    }
  ]
}

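This defines the following behavior:

  • Integer values from 1 to 30 are masked by being replaced with LOW.
  • Integer values from 31 to 65 are masked by being replaced with MEDIUM.
  • Integer values from 66 to 100 are masked by being replaced with HIGH.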
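To make the mapping concrete, here is a small Python sketch that applies the same bucket list to integer values locally. It is an illustration only, not the service's implementation: the function name apply_bucketing is hypothetical, both bounds are treated as inclusive to mirror the description above (check the Bucket reference for the exact inclusivity of max), and the ValueError stands in for the error message that the service reports in the TransformationSummary.

```python
BUCKETING_CONFIG = {
    "buckets": [
        {"min": {"integerValue": "1"}, "max": {"integerValue": "30"},
         "replacementValue": {"stringValue": "LOW"}},
        {"min": {"integerValue": "31"}, "max": {"integerValue": "65"},
         "replacementValue": {"stringValue": "MEDIUM"}},
        {"min": {"integerValue": "66"}, "max": {"integerValue": "100"},
         "replacementValue": {"stringValue": "HIGH"}},
    ]
}


def apply_bucketing(value, config):
    """Return the replacement for an integer value (local illustration)."""
    for bucket in config["buckets"]:
        low = bucket.get("min")
        high = bucket.get("max")
        # A missing min or max means the bucket is open on that side.
        if low is not None and value < int(low["integerValue"]):
            continue
        if high is not None and value > int(high["integerValue"]):
            continue
        replacement = bucket.get("replacementValue")
        if replacement is not None:
            return replacement["stringValue"]
        # Without a replacementValue, fall back to the hyphenated range.
        low_text = low["integerValue"] if low is not None else ""
        high_text = high["integerValue"] if high is not None else ""
        return f"{low_text}-{high_text}"
    # Assumption: an exception stands in for the summary's error message.
    raise ValueError(f"{value} does not fall into any configured bucket")


if __name__ == "__main__":
    for score in (12, 31, 100):
        print(score, "->", apply_bucketing(score, BUCKETING_CONFIG))
```

A value such as 0 or 101 matches none of the three buckets, which is the case where the service reports an error instead of a replacement.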

For more information about the concept of bucketing, see Generalization and bucketing.

replaceWithInfoTypeConfig

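Specifying replaceWithInfoTypeConfig replaces each matched value with the name of its infoType. The replaceWithInfoTypeConfig message takes no arguments; specifying it enables the transformation.

For example, suppose you've specified replaceWithInfoTypeConfig for all EMAIL_ADDRESS infoTypes, and the following string is sent to Cloud DLP:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

The returned string is the following: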

My name is Alicia Abernathy, and my email address is EMAIL_ADDRESS.
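The effect of this transformation can be imitated locally with a few lines of Python. This is a sketch under stated assumptions: the regular expression is a simplified stand-in for the EMAIL_ADDRESS detector (Cloud DLP's own detection logic is not shown in this document), and the function name replace_with_info_type is hypothetical.

```python
import re

# Assumption: a simplified pattern standing in for the EMAIL_ADDRESS detector.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)


def replace_with_info_type(text, detectors):
    """Replace every match of each detector with the infoType's name."""
    for info_type_name, pattern in detectors.items():
        text = pattern.sub(info_type_name, text)
    return text


if __name__ == "__main__":
    original = (
        "My name is Alicia Abernathy, and my email address is "
        "aabernathy@example.com."
    )
    print(replace_with_info_type(original, {"EMAIL_ADDRESS": EMAIL_PATTERN}))
```

Note that the replacement carries no information about the original value, so unlike the cryptographic transformations it cannot be reversed.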
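
timePartConfig

Setting timePartConfig to a TimePartConfig object preserves a portion of a matched value that includes Date, Timestamp, or TimeOfDay values. The TimePartConfig object consists of a partToExtract argument, which can be set to any of the TimePart enumerated values, such as the year, the month, or the day of the month.

For example, suppose you've configured a timePartConfig transformation by setting partToExtract to YEAR. After you send the data in the first column below to Cloud DLP, you end up with the transformed values in the second column: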

Original value    Transformed value
9/21/1976         1976
6/7/1945          1945
1/20/2009         2009
7/4/1776          1776
8/1/1984          1984
4/21/1982         1982
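The table above can be reproduced with a short Python sketch. It is a local illustration, not the service's code: the function name extract_time_part is hypothetical, the month/day/year input format is inferred from the sample values, and only three of the TimePart values (YEAR, MONTH, DAY_OF_MONTH) are handled.

```python
from datetime import datetime

# Maps a subset of TimePart names to the matching datetime attribute.
_EXTRACTORS = {
    "YEAR": lambda parsed: parsed.year,
    "MONTH": lambda parsed: parsed.month,
    "DAY_OF_MONTH": lambda parsed: parsed.day,
}


def extract_time_part(value, part_to_extract, date_format="%m/%d/%Y"):
    """Keep only the requested part of a date string (local illustration)."""
    parsed = datetime.strptime(value, date_format)
    return str(_EXTRACTORS[part_to_extract](parsed))


if __name__ == "__main__":
    for original in ("9/21/1976", "6/7/1945", "1/20/2009", "7/4/1776"):
        print(original, "->", extract_time_part(original, "YEAR"))
```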

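Record transformations

Record transformations (the RecordTransformations object) are applied only to values within tabular data that are identified as a specific infoType. Within RecordTransformations, there are two further subcategories of transformations:

  • fieldTransformations[]: Transformations that apply a variety of field transformations.
  • recordSuppressions[]: Rules that define which records are suppressed completely. Records that match any suppression rule in recordSuppressions[] are omitted from the output.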

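Field transformations

Each FieldTransformation object includes three arguments:

  • fields: One or more input fields (FieldId objects) to apply the transformation to.
  • condition: A condition (a RecordCondition object) that must evaluate to true for the transformation to be applied. For example, apply a bucketing transformation to the age column of a record only if the ZIP code column of the same record is within a specific range. Or, redact a field only if the birth date field puts a person's age at 85 or older.
  • One of two transformation type arguments, infoTypeTransformations or primitiveTransformation. Exactly one of them must be specified.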
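The interplay of fields, condition, and the transformation can be sketched in plain Python. This is a simplified local model, not the API: rows are dictionaries, the condition and the transformation are ordinary callables, and the names apply_field_transformation, in_zip_range, and to_decade as well as the sample column names are hypothetical.

```python
def apply_field_transformation(rows, fields, condition, transform):
    """Apply `transform` to `fields` of each row where `condition` is true."""
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if condition(row):
            for field in fields:
                new_row[field] = transform(row[field])
        result.append(new_row)
    return result


def in_zip_range(row):
    """Condition: the ZIP code column lies within a specific range."""
    return 94000 <= row["zip"] <= 94999


def to_decade(age):
    """Transformation: bucket an age into a ten-year range."""
    start = age // 10 * 10
    return f"{start}-{start + 10}"


if __name__ == "__main__":
    table = [
        {"name": "A", "age": 34, "zip": 94043},
        {"name": "B", "age": 52, "zip": 10001},
    ]
    for row in apply_field_transformation(table, ["age"], in_zip_range, to_decade):
        print(row)
```

Only the first row has its age bucketed, because the second row's ZIP code fails the condition and the row passes through unchanged.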

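Record suppression

In addition to applying transformations to field data, you can instruct Cloud DLP to de-identify data by suppressing records when certain suppression conditions evaluate to true. You can apply both field transformations and record suppressions in the same request.

You set the recordSuppressions message of the RecordTransformations object to an array of one or more RecordSuppression objects.

Each RecordSuppression object contains a single RecordCondition object, which in turn contains a single Expressions object.

An Expressions object contains the following:

  • logicalOperator: One of the LogicalOperator enumerated types.
  • conditions: A Conditions object, which contains an array of one or more Condition objects. A Condition is a comparison of a field value with another value, both of which are of type string, boolean, integer, double, Timestamp, or TimeOfDay.

If the comparison evaluates to true, the record is suppressed; otherwise it is kept. If the compared values are not of the same type, a warning is given and the condition evaluates to false.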
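The suppression rule described above, including the type-mismatch behavior, can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than the service's logic: the flat dictionary shape of each condition is a simplification of the REST structure, the operator names are modeled on the API's RelationalOperator enum, all conditions are combined with a logical AND, and the sketch silently returns false on a type mismatch instead of emitting a warning. The function names are hypothetical.

```python
import operator

# Comparison functions keyed by operator name.
_OPERATORS = {
    "EQUAL_TO": operator.eq,
    "NOT_EQUAL_TO": operator.ne,
    "GREATER_THAN": operator.gt,
    "LESS_THAN": operator.lt,
    "GREATER_THAN_OR_EQUALS": operator.ge,
    "LESS_THAN_OR_EQUALS": operator.le,
}


def condition_holds(row, condition):
    """Evaluate one comparison; mismatched types evaluate to False."""
    field_value = row[condition["field"]]
    other = condition["value"]
    if type(field_value) is not type(other):
        # The service also gives a warning here; this sketch does not.
        return False
    return _OPERATORS[condition["operator"]](field_value, other)


def suppress_records(rows, conditions):
    """Drop every row for which all conditions are true (logical AND)."""
    if not conditions:
        return list(rows)
    return [
        row for row in rows
        if not all(condition_holds(row, c) for c in conditions)
    ]


if __name__ == "__main__":
    table = [
        {"name": "A", "age": 90},
        {"name": "B", "age": 40},
        {"name": "C", "age": "90"},  # string, so the comparison is False
    ]
    rule = [{"field": "age", "operator": "GREATER_THAN_OR_EQUALS", "value": 85}]
    for row in suppress_records(table, rule):
        print(row)
```

Row A is suppressed, row B is kept because the comparison is false, and row C is kept because its age is a string while the compared value is an integer.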