转换参考文档

本主题介绍 Cloud DLP 中提供的去标识化方法(或称转换)。

去标识化方法的类型

应根据要去标识化的数据种类以及对数据进行去标识化的目的来选择使用的去标识化转换。Cloud DLP 支持的去标识化方法分为以下几大类别:

  • 隐去:删除检测到的全部或部分敏感值。
  • 替换:使用指定的代理值替换检测到的敏感值。
  • 遮盖:使用指定的代理字符(例如井号 (#) 或星号 (*))替换敏感值中的若干字符。
  • 基于加密的令牌化:使用加密密钥加密原始敏感数据值。Cloud DLP 支持多种类型的令牌化,包括可以逆转(也称“重标识”)的转换。
  • 分桶:使用某个范围或系列值替换敏感值来对其进行“泛化”处理。(例如,使用年龄范围来替换特定年龄值,或者使用“热”、“温”和“冷”替换相应范围内的温度值。)
  • 日期偏移:将敏感日期值移动一段随机时间。
  • 时间提取:提取或保留日期和时间值的指定部分。

本主题的其余部分将介绍各种不同类型的去标识化转换,并提供其使用示例。

转换方法

下表列出了 Cloud DLP 提供的用于对敏感数据进行去标识化的转换方法:

转换 对象 说明 可以逆转1 参照完整性2 输入类型
隐去 RedactConfig 通过移除来隐去值。 不限
替换 ReplaceValueConfig 使用给定值替换每个输入值。 不限
使用 infoType 替换 ReplaceWithInfoTypeConfig 将输入值替换为其 infoType 的名称。 不限
使用字符遮盖 CharacterMaskConfig 通过将给定数量的字符替换为所指定的固定字符,全部或部分遮盖字符串。 不限
通过将输入值替换为加密哈希实现假名化 CryptoHashConfig 将输入值替换为使用给定数据加密密钥生成的 32 字节十六进制字符串。请参阅假名化概念文档了解详情。 字符串或整数
通过替换为加密保留格式令牌实现假名化 CryptoReplaceFfxFpeConfig 使用 FFX 运算模式的保留格式加密 (FPE) 将输入值替换为长度相同的令牌或代理值。这样就能在具有长度格式验证的系统中使用输出。这对于必须保留字符串长度的旧系统很有用。重要提示:对于长度不等或长度超过 32 个字节的输入,请使用 CryptoDeterministicConfig。 为确保安全,美国国家标准与技术研究院建议遵循以下限制:
  • radix^max_size <= 2^128.
  • radix^min_len >= 100
我们建议对不要求保留输入的字母表空间和大小,且保证参照完整性的所有用例使用 CryptoDeterministicConfig。 请参阅假名化概念文档了解详情。
具有有限字符数或长度统一的字符串或整数。字母表必须至少包含 2 个字符,且不得超过 95 个字符。
通过替换为加密令牌实现假名化 CryptoDeterministicConfig 使用合成初始化矢量模式下的 AES (AES-SIV) 将输入值替换为长度相同的令牌或代理值。与保留格式令牌化不同,此转换方法对受支持的字符串字符集没有任何限制,为相同输入值的每个实例生成相同令牌,并在给定原始加密密钥的情况下使用代理来启用重标识。 不限
基于固定大小范围的分桶值 FixedSizeBucketingConfig 将输入值替换为输入值所在的分桶(或范围)的值。 不限
基于自定义大小范围的分桶值 BucketingConfig 根据用户可配置的范围和替换值将输入值替换为分桶值。 不限
日期偏移 DateShiftConfig 按随机天数偏移日期,可使同一上下文保持一致。
保留顺序和持续时间
日期/时间
提取时间数据 TimePartConfig 提取或保留 DateTimestampTimeOfDay 值的一部分。 日期/时间

脚注

1 可以对可逆转换进行逆转,以使用 content.reidentify 方法重新标识敏感数据。
2 参照完整性允许在对数据去标识化的同时保持记录之间的关系。例如,给定相同的加密密钥和上下文,数据在每次转换时都将被替换为相同的去标识化形式,从而保留记录之间的联系。

隐去

如果只想从输入内容中移除敏感数据,Cloud DLP 支持隐去转换(DLP API 中的 RedactConfig)。

例如,假设您要对所有 EMAIL_ADDRESS infoType 执行简单的隐去处理,且下列字符串已发送到 Cloud DLP:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

返回的字符串如下:

My name is Alicia Abernathy, and my email address is .

多种语言的以下 JSON 示例和代码演示了如何构建 API 请求以及 Cloud DLP API 会返回哪些内容:

协议

JSON 输入:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "primitiveTransformation":{
            "redactConfig":{

            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"EMAIL_ADDRESS"
      }
    ]
  }
}

JSON 输出:

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is ."
  },
  "overview":{
    "transformedBytes":"22",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "redactConfig":{

          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"22"
      }
    ]
  }
}

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.RedactConfig;

public class DeIdentifyWithRedaction {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect =
        "My name is Alicia Abernathy, and my email address is aabernathy@example.com.";
    deIdentifyWithRedaction(projectId, textToInspect);
  }

  // Inspects the provided text.
  public static void deIdentifyWithRedaction(String projectId, String textToRedact) {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the content to be inspected.
      ContentItem item = ContentItem.newBuilder()
          .setValue(textToRedact).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
      InspectConfig inspectConfig = InspectConfig.newBuilder().addInfoTypes(infoType).build();
      // Define type of deidentification.
      PrimitiveTransformation primitiveTransformation = PrimitiveTransformation.newBuilder()
          .setRedactConfig(RedactConfig.getDefaultInstance())
          .build();
      // Associate deidentification type with info type.
      InfoTypeTransformation transformation = InfoTypeTransformation.newBuilder()
          .addInfoTypes(infoType)
          .setPrimitiveTransformation(primitiveTransformation)
          .build();
      // Construct the configuration for the Redact request and list all desired transformations.
      DeidentifyConfig redactConfig = DeidentifyConfig.newBuilder()
          .setInfoTypeTransformations(InfoTypeTransformations.newBuilder()
              .addTransformations(transformation))
          .build();

      // Construct the Redact request to be sent by the client.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(redactConfig)
              .setInspectConfig(inspectConfig)
              .build();

      // Use the client to send the API request.
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Parse the response and process results
      System.out.println("Text after redaction: " + response.getItem().getValue());
    } catch (Exception e) {
      System.out.println("Error during inspectString: \n" + e.toString());
    }
  }
}

Python

def deidentify_with_redact(
    project,
    input_str,
    info_types,
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by redacting matched input values.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "redact_config": {}
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        parent,
        inspect_config=inspect_config,
        deidentify_config=deidentify_config,
        item=item,
    )

    # Print out the results.
    print(response.item.value)

替换

替换转换使用给定令牌值或其 infoType 的名称替换每个输入值。

基本替换

基本替换转换(DLP API 中的 ReplaceValueConfig)会使用您指定的值替换检测到的敏感数据值。例如,假设您已让 Cloud DLP 使用“[fake@example.com]”替换所有检测到的 EMAIL_ADDRESS infoType,且下列字符串已发送到 Cloud DLP:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

返回的字符串如下:

My name is Alicia Abernathy, and my email address is [fake@example.com].

多种语言的以下 JSON 示例和代码演示了如何构建 API 请求以及 Cloud DLP API 会返回哪些内容:

协议

如需详细了解如何将 Cloud DLP API 与 JSON 配合使用,请参阅 JSON 快速入门

JSON 输入:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "primitiveTransformation":{
            "replaceConfig":{
              "newValue":{
                "stringValue":"[email-address]"
              }
            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"EMAIL_ADDRESS"
      }
    ]
  }
}

JSON 输出:

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is [email-address]."
  },
  "overview":{
    "transformedBytes":"22",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "replaceConfig":{
            "newValue":{
              "stringValue":"[email-address]"
            }
          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"22"
      }
    ]
  }
}

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.RedactConfig;
import com.google.privacy.dlp.v2.ReplaceValueConfig;
import com.google.privacy.dlp.v2.Value;

public class DeIdentifyWithReplacement {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect =
        "My name is Alicia Abernathy, and my email address is aabernathy@example.com.";
    deIdentifyWithReplacement(projectId, textToInspect);
  }

  // Inspects the provided text.
  public static void deIdentifyWithReplacement(String projectId, String textToRedact) {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the content to be inspected.
      ContentItem item = ContentItem.newBuilder()
          .setValue(textToRedact).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
      InspectConfig inspectConfig = InspectConfig.newBuilder().addInfoTypes(infoType).build();
      // Specify replacement string to be used for the finding.
      ReplaceValueConfig replaceValueConfig = ReplaceValueConfig.newBuilder()
          .setNewValue(Value.newBuilder().setStringValue("[email-address]").build())
          .build();
      // Define type of deidentification as replacement.
      PrimitiveTransformation primitiveTransformation = PrimitiveTransformation.newBuilder()
          .setReplaceConfig(replaceValueConfig)
          .build();
      // Associate deidentification type with info type.
      InfoTypeTransformation transformation = InfoTypeTransformation.newBuilder()
          .addInfoTypes(infoType)
          .setPrimitiveTransformation(primitiveTransformation)
          .build();
      // Construct the configuration for the Redact request and list all desired transformations.
      DeidentifyConfig redactConfig = DeidentifyConfig.newBuilder()
          .setInfoTypeTransformations(InfoTypeTransformations.newBuilder()
              .addTransformations(transformation))
          .build();

      // Construct the Redact request to be sent by the client.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(redactConfig)
              .setInspectConfig(inspectConfig)
              .build();

      // Use the client to send the API request.
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Parse the response and process results
      System.out.println("Text after redaction: " + response.getItem().getValue());
    } catch (Exception e) {
      System.out.println("Error during inspectString: \n" + e.toString());
    }
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The string to deidentify
// const string = 'My SSN is 372819127';

// The string to replace sensitive information with
// const replacement = "[REDACTED]"

// Construct deidentification request
const item = {value: string};
const request = {
  parent: `projects/${callingProjectId}/locations/global`,
  deidentifyConfig: {
    infoTypeTransformations: {
      transformations: [
        {
          primitiveTransformation: {
            replaceConfig: {
              newValue: {
                stringValue: replacement,
              },
            },
          },
        },
      ],
    },
  },
  item: item,
};

try {
  // Run deidentification request
  const [response] = await dlp.deidentifyContent(request);
  const deidentifiedItem = response.item;
  console.log(deidentifiedItem.value);
} catch (err) {
  console.log(`Error in deidentifyWithReplacement: ${err.message || err}`);
}

Python

def deidentify_with_replace(
    project,
    input_str,
    info_types,
    replacement_str="REPLACEMENT_STR",
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by replacing matched input values with a value you specify.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
        replacement_str: The string to replace all values that match given
            info types.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "replace_config": {
                            "new_value": {
                                "string_value": replacement_str,
                            }
                        }
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        parent,
        inspect_config=inspect_config,
        deidentify_config=deidentify_config,
        item=item,
    )

    # Print out the results.
    print(response.item.value)

infoType 替换

您还可以指定 infoType 替换(DLP API 中的 ReplaceWithInfoTypeConfig)。此转换与基本替换转换的作用相同,但它使用检测到的值的 infoType 替换检测到的每个敏感数据值。

例如,假设您已让 Cloud DLP 检测电子邮件地址和姓氏,并使用值的 infoType 替换每个检测到的值。您将以下字符串发送到 Cloud DLP:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

返回的字符串如下:

My name is Alicia LAST_NAME, and my email address is EMAIL_ADDRESS.

遮盖

您可以将 Cloud DLP 配置为使用固定的单个遮盖字符(例如星号 (*) 或井号 (#))替换每个字符,以完全或部分遮盖检测到的敏感值(DLP API 中的 CharacterMaskConfig)。可以从字符串的开头或结尾开始遮盖。此转换也适用于数字类型,例如长整数。

Cloud DLP 遮盖转换提供了如下可以指定的选项:

  • 遮盖字符(DLP API 中的 maskingCharacter 参数):用于遮盖敏感值中每个字符的字符。例如,您可以指定用星号 (*) 或美元符号 ($) 来遮盖信用卡号中的一连串数字。
  • 需要遮盖的字符数 (numberToMask):如果未指定此值,将遮盖所有字符。
  • 是否逆转顺序 (reverseOrder):是否按反向顺序遮盖字符。如果逆转顺序,将从匹配值的末尾朝着开头遮盖值中的字符。
  • 需要忽略的字符 (charactersToIgnore):遮盖值时要跳过的一个或多个字符。例如,您可以让 Cloud DLP 在遮盖电话号码时保留连字符。您还可以指定一组要在遮盖时忽略的常见字符 (CharsToIgnore)。

假设您将以下字符串发送到 Cloud DLP 并指示其对电子邮件地址使用字符遮盖转换:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

在将遮盖字符发送到“#”,要忽略的字符设置为公共字符集,其他字符设置为默认设置后,Cloud DLP 返回以下内容:

My name is Alicia Abernathy, and my email address is ##########@#######.###.

以下 JSON 和代码示例演示了遮盖转换的工作原理。

协议

JSON 输入:

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "primitiveTransformation":{
            "characterMaskConfig":{
              "maskingCharacter":"#",
              "reverseOrder":false,
              "charactersToIgnore":[
                {
                  "charactersToSkip":".@"
                }
              ]
            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"EMAIL_ADDRESS"
      }
    ]
  }
}

JSON 输出:

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is ##########@#######.###."
  },
  "overview":{
    "transformedBytes":"22",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "characterMaskConfig":{
            "maskingCharacter":"#",
            "charactersToIgnore":[
              {
                "charactersToSkip":".@"
              }
            ]
          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"22"
      }
    ]
  }
}

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.CharacterMaskConfig;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.ReplaceWithInfoTypeConfig;
import java.io.IOException;
import java.util.Arrays;

public class DeIdentifyWithMasking {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToDeIdentify = "My SSN is 372819127";
    deIdentifyWithMasking(projectId, textToDeIdentify);
  }

  public static void deIdentifyWithMasking(String projectId, String textToDeIdentify)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Specify what content you want the service to DeIdentify
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToDeIdentify).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("US_SOCIAL_SECURITY_NUMBER").build();
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(Arrays.asList(infoType)).build();

      // Specify how the info from the inspection should be masked.
      CharacterMaskConfig characterMaskConfig =
          CharacterMaskConfig.newBuilder()
              .setMaskingCharacter("X") // Character to replace the found info with
              .setNumberToMask(5) // How many characters should be masked
              .build();
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setReplaceWithInfoTypeConfig(ReplaceWithInfoTypeConfig.getDefaultInstance())
              .build();
      InfoTypeTransformation infoTypeTransformation =
          InfoTypeTransformation.newBuilder()
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      InfoTypeTransformations transformations =
          InfoTypeTransformations.newBuilder().addTransformations(infoTypeTransformation).build();

      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder().setInfoTypeTransformations(transformations).build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(contentItem)
              .setInspectConfig(inspectConfig)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Print the results
      System.out.println("Text after masking: " + response.getItem().getValue());
    }
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The string to deidentify
// const string = 'My SSN is 372819127';

// (Optional) The maximum number of sensitive characters to mask in a match
// If omitted from the request or set to 0, the API will mask any matching characters
// const numberToMask = 5;

// (Optional) The character to mask matching sensitive data with
// const maskingCharacter = 'x';

// Construct deidentification request
const item = {value: string};
const request = {
  parent: `projects/${callingProjectId}/locations/global`,
  deidentifyConfig: {
    infoTypeTransformations: {
      transformations: [
        {
          primitiveTransformation: {
            characterMaskConfig: {
              maskingCharacter: maskingCharacter,
              numberToMask: numberToMask,
            },
          },
        },
      ],
    },
  },
  item: item,
};

try {
  // Run deidentification request
  const [response] = await dlp.deidentifyContent(request);
  const deidentifiedItem = response.item;
  console.log(deidentifiedItem.value);
} catch (err) {
  console.log(`Error in deidentifyWithMask: ${err.message || err}`);
}

Python

def deidentify_with_mask(
    project, input_str, info_types, masking_character=None, number_to_mask=0
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by masking it with a character.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        masking_character: The character to mask matching sensitive data with.
        number_to_mask: The maximum number of sensitive characters to mask in
            a match. If omitted or set to zero, the API will default to no
            maximum.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Import the client library
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "character_mask_config": {
                            "masking_character": masking_character,
                            "number_to_mask": number_to_mask,
                        }
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        parent,
        inspect_config=inspect_config,
        deidentify_config=deidentify_config,
        item=item,
    )

    # Print out the results.
    print(response.item.value)

Go

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	dlppb "google.golang.org/genproto/googleapis/privacy/dlp/v2"
)

// mask deidentifies the input by masking all provided info types with maskingCharacter
// and prints the result to w.
func mask(w io.Writer, projectID, input string, infoTypeNames []string, maskingCharacter string, numberToMask int32) error {
	// projectID := "my-project-id"
	// input := "My SSN is 111222333"
	// infoTypeNames := []string{"US_SOCIAL_SECURITY_NUMBER"}
	// maskingCharacter := "+"
	// numberToMask := 6
	// Will print "My SSN is ++++++333"

	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %v", err)
	}
	// Convert the info type strings to a list of InfoTypes.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}
	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: infoTypes,
		},
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: &dlppb.InfoTypeTransformations{
					Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
						{
							InfoTypes: []*dlppb.InfoType{}, // Match all info types.
							PrimitiveTransformation: &dlppb.PrimitiveTransformation{
								Transformation: &dlppb.PrimitiveTransformation_CharacterMaskConfig{
									CharacterMaskConfig: &dlppb.CharacterMaskConfig{
										MaskingCharacter: maskingCharacter,
										NumberToMask:     numberToMask,
									},
								},
							},
						},
					},
				},
			},
		},
		// The item to analyze.
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: input,
			},
		},
	}
	// Send the request.
	r, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return fmt.Errorf("DeidentifyContent: %v", err)
	}
	// Print the result.
	fmt.Fprint(w, r.GetItem().GetValue())
	return nil
}

PHP

/**
 * Deidentify sensitive data in a string by masking it with a character.
 */
use Google\Cloud\Dlp\V2\CharacterMaskConfig;
use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\ContentItem;

/** Uncomment and populate these variables in your code */
// $callingProjectId = 'The GCP Project ID to run the API call under';
// $string = 'The string to deidentify';
// $numberToMask = 0; // (Optional) The maximum number of sensitive characters to mask in a match
// $maskingCharacter = 'x'; // (Optional) The character to mask matching sensitive data with

// Instantiate a client.
$dlp = new DlpServiceClient();

// The infoTypes of information to mask
$ssnInfoType = (new InfoType())
    ->setName('US_SOCIAL_SECURITY_NUMBER');
$infoTypes = [$ssnInfoType];

// Create the masking configuration object
$maskConfig = (new CharacterMaskConfig())
    ->setMaskingCharacter($maskingCharacter)
    ->setNumberToMask($numberToMask);

// Create the information transform configuration objects
$primitiveTransformation = (new PrimitiveTransformation())
    ->setCharacterMaskConfig($maskConfig);

$infoTypeTransformation = (new InfoTypeTransformation())
    ->setPrimitiveTransformation($primitiveTransformation)
    ->setInfoTypes($infoTypes);

$infoTypeTransformations = (new InfoTypeTransformations())
    ->setTransformations([$infoTypeTransformation]);

// Create the deidentification configuration object
$deidentifyConfig = (new DeidentifyConfig())
    ->setInfoTypeTransformations($infoTypeTransformations);

$item = (new ContentItem())
    ->setValue($string);

$parent = $dlp->projectName($callingProjectId);

// Run request
$response = $dlp->deidentifyContent($parent, [
    'deidentifyConfig' => $deidentifyConfig,
    'item' => $item
]);

// Print the results
$deidentifiedValue = $response->getItem()->getValue();
print($deidentifiedValue);

C#


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class DeidentifyWithMasking
{
    public static DeidentifyContentResponse Deidentify(string projectId, string text)
    {
        // Instantiate a client.
        var dlp = DlpServiceClient.Create();

        // Construct a request.
        var transformation = new InfoTypeTransformations.Types.InfoTypeTransformation
        {
            PrimitiveTransformation = new PrimitiveTransformation
            {
                CharacterMaskConfig = new CharacterMaskConfig
                {
                    MaskingCharacter = "*",
                    NumberToMask = 5,
                    ReverseOrder = false,
                }
            }
        };
        var request = new DeidentifyContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            InspectConfig = new InspectConfig
            {
                InfoTypes =
                {
                    new InfoType { Name = "US_SOCIAL_SECURITY_NUMBER" }
                }
            },
            DeidentifyConfig = new DeidentifyConfig
            {
                InfoTypeTransformations = new InfoTypeTransformations
                {
                    Transformations = { transformation }
                }
            },
            Item = new ContentItem { Value = text }
        };

        // Call the API.
        var response = dlp.DeidentifyContent(request);

        // Inspect the results.
        Console.WriteLine($"Deidentified content: {response.Item.Value}");
        return response;
    }
}

基于加密的令牌化转换

基于加密的令牌化(也称为“假名化”)转换是指将原始敏感数据值替换为加密值的去标识化方法。Cloud DLP 支持以下类型的令牌化,包括可以逆转并允许重标识的转换:

  • 加密哈希:在给定 CryptoKey 的情况下,Cloud DLP 使用基于 SHA-256 的消息身份验证代码 (HMAC-SHA-256),然后将输入值替换为以 Base64 编码的哈希值。
  • 保留格式加密:使用 FFX 运算模式的保留格式加密 (FPE) 生成令牌替换输入值。此转换方法会生成一个令牌,该令牌被限制为与输入值的字母系统相同,并且与输入值的长度相同。FPE 还支持在给定原始加密密钥的情况下进行重标识。
  • 确定性加密:使用合成初始化矢量模式下的 AES (AES-SIV) 生成令牌替换输入值。此转换方法对受支持的字符串字符集没有任何限制,为相同输入值的每个实例生成相同令牌,并使用代理在给定原始加密密钥的情况下启用重标识。

加密哈希

加密哈希转换(DLP API 中的 CryptoHashConfig)会获取输入值(Cloud DLP 检测到的敏感数据)并将其替换为哈希值。哈希值是通过利用 CryptoKey 对输入值应用基于 SHA-256 的消息身份验证代码 (HMAC-SHA-256) 生成的。

Cloud DLP 会输出以 Base64 编码表示的哈希输入值来替换原始值。

在使用加密哈希技术转换之前,需要考虑以下方面:

  • 输入值使用哈希处理,而不是加密。
  • 此转换无法逆转。也就是说,即使给定转换的哈希输出值和原始密钥,也无法恢复原始值。
  • 目前,只能对字符串和整数值进行哈希处理。
  • 转换的哈希输出始终具有相同长度,具体取决于密钥的大小。例如,如果您对 10 位电话号码使用加密哈希技术转换,每个电话号码都将被替换为固定长度的 Base64 编码的哈希值。

保留格式加密

保留格式加密 (FPE) 转换方法(DLP API 中的 CryptoReplaceFfxFpeConfig)会获取输入值(Cloud DLP 检测到的敏感数据),使用 FFX 模式中的保留格式加密和 CryptoKey 对其进行加密,然后使用加密的值(或称令牌)替换原始值。

输入值需满足以下条件:

  • 必须至少具有两个字符(或为空字符串)。
  • 必须是 ASCII 编码字符。
  • 由按“字母表”指定的字符组成,“字母表”是指允许在输入值中使用的 2 到 64 个字符的集合。如需了解详情,请参阅 CryptoReplaceFfxFpeConfig 中的字母表字段。

生成的令牌:

  • 是加密的输入值。
  • 加密后保留输入值的字符集(“字母表”)和长度。
  • 使用指定加密密钥通过 FFX 模式的保留格式加密计算得出。
  • 不一定是唯一的,因为每个相同的输入值在去标识化后都得到相同的令牌。这实现了参照完整性,可以更高效地搜索去标识化数据。 如上下文中所述,您可以使用上下文“调整”来更改此行为。

如果源内容中的一个输入值存在多个实例,则每个实例都将被去标识化为相同令牌。FPE 会保留长度和字母表空间(字符集),字母表空间被限制为 62 个字符。您可以使用上下文“调整”来更改此行为,这可以提高安全性。通过向转换添加上下文调整,Cloud DLP 可以将相同输入值的多个实例去标识化为不同令牌。如果您不需要保留原始值的长度和字母表空间,请使用确定性加密,如下所述。

Cloud DLP 使用加密密钥计算替换令牌。 您可通过下述三种方式之一提供此密钥:

  1. 将其嵌入 API 请求但不加密。
  2. 请求 Cloud DLP 生成该密钥。
  3. 将其嵌入 API 请求并加密。在此选项中,密钥由 Cloud Key Management Service (Cloud KMS) 密钥进行封装(加密)。

如需创建由 Cloud KMS 封装的密钥,请将包含 16 字节、24 字节或 32 字节的明文字段值的请求发送到 Cloud KMS projects.locations.keyRings.cryptoKeys.encrypt 方法。封装的密钥即为方法响应的 ciphertext 字段中的值。

该值默认为一个 Base64 编码的字符串。如需在 Cloud DLP 中设置此值,必须将其解码为字节字符串。以下代码段重点介绍了如何使用多种语言执行此操作。在这些代码段之后,还提供了端到端示例。

Java

KmsWrappedCryptoKey.newBuilder()
    .setWrappedKey(ByteString.copyFrom(BaseEncoding.base64().decode(wrappedKey)))

Python

# The wrapped key is Base64-encoded, but the library expects a binary
# string, so decode it here.
import base64
wrapped_key = base64.b64decode(wrapped_key)

PHP

// Create the wrapped crypto key configuration object
$kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
    ->setWrappedKey(base64_decode($wrappedKey))
    ->setCryptoKeyName($keyName);

C#

WrappedKey = ByteString.FromBase64(wrappedKey)

如需详细了解如何使用 Cloud KMS 加密和解密数据,请参阅加密和解密数据

下面是多种语言的示例代码,演示了如何使用 Cloud DLP 将输入值替换为令牌,对敏感数据进行去标识化。

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.common.io.BaseEncoding;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CryptoKey;
import com.google.privacy.dlp.v2.CryptoReplaceFfxFpeConfig;
import com.google.privacy.dlp.v2.CryptoReplaceFfxFpeConfig.FfxCommonNativeAlphabet;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.KmsWrappedCryptoKey;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.Arrays;

public class DeIdentifyWithFpe {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToDeIdentify = "I'm Gary and my email is gary@example.com";
    String kmsKeyName =
        "projects/YOUR_PROJECT/"
            + "locations/YOUR_KEYRING_REGION/"
            + "keyRings/YOUR_KEYRING_NAME/"
            + "cryptoKeys/YOUR_KEY_NAME";
    String wrappedAesKey = "YOUR_ENCRYPTED_AES_256_KEY";
    deIdentifyWithFpe(projectId, textToDeIdentify, kmsKeyName, wrappedAesKey);
  }

  public static void deIdentifyWithFpe(
      String projectId, String textToDeIdentify, String kmsKeyName, String wrappedAesKey)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify what content you want the service to DeIdentify
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToDeIdentify).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("US_SOCIAL_SECURITY_NUMBER").build();
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(Arrays.asList(infoType)).build();

      // Specify an encrypted AES-256 key and the name of the Cloud KMS key that encrypted it
      KmsWrappedCryptoKey kmsWrappedCryptoKey =
          KmsWrappedCryptoKey.newBuilder()
              .setWrappedKey(ByteString.copyFrom(BaseEncoding.base64().decode(wrappedAesKey)))
              .setCryptoKeyName(kmsKeyName)
              .build();
      CryptoKey cryptoKey = CryptoKey.newBuilder().setKmsWrapped(kmsWrappedCryptoKey).build();

      // Specify how the info from the inspection should be encrypted.
      InfoType surrogateInfoType = InfoType.newBuilder().setName("SSN_TOKEN").build();
      CryptoReplaceFfxFpeConfig cryptoReplaceFfxFpeConfig =
          CryptoReplaceFfxFpeConfig.newBuilder()
              .setCryptoKey(cryptoKey)
              // Set of characters in the input text. For more info, see
              // https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#DeidentifyTemplate.FfxCommonNativeAlphabet
              .setCommonAlphabet(FfxCommonNativeAlphabet.NUMERIC)
              .setSurrogateInfoType(surrogateInfoType)
              .build();
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setCryptoReplaceFfxFpeConfig(cryptoReplaceFfxFpeConfig)
              .build();
      InfoTypeTransformation infoTypeTransformation =
          InfoTypeTransformation.newBuilder()
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      InfoTypeTransformations transformations =
          InfoTypeTransformations.newBuilder().addTransformations(infoTypeTransformation).build();

      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder().setInfoTypeTransformations(transformations).build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(contentItem)
              .setInspectConfig(inspectConfig)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Print the results
      System.out.println(
          "Text after format-preserving encryption: " + response.getItem().getValue());
    }
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The string to deidentify
// const string = 'My SSN is 372819127';

// The set of characters to replace sensitive ones with
// For more information, see https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#ffxcommonnativealphabet
// const alphabet = 'ALPHA_NUMERIC';

// The name of the Cloud KMS key used to encrypt ('wrap') the AES-256 key
// const keyName = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME';

// The encrypted ('wrapped') AES-256 key to use
// This key should be encrypted using the Cloud KMS key specified above
// const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY'

// (Optional) The name of the surrogate custom info type to use
// Only necessary if you want to reverse the deidentification process
// Can be essentially any arbitrary string, as long as it doesn't appear
// in your dataset otherwise.
// const surrogateType = 'SOME_INFO_TYPE_DEID';

// Construct FPE config
const cryptoReplaceFfxFpeConfig = {
  cryptoKey: {
    kmsWrapped: {
      wrappedKey: wrappedKey,
      cryptoKeyName: keyName,
    },
  },
  commonAlphabet: alphabet,
};
if (surrogateType) {
  cryptoReplaceFfxFpeConfig.surrogateInfoType = {
    name: surrogateType,
  };
}

// Construct deidentification request
const item = {value: string};
const request = {
  parent: `projects/${callingProjectId}/locations/global`,
  deidentifyConfig: {
    infoTypeTransformations: {
      transformations: [
        {
          primitiveTransformation: {
            cryptoReplaceFfxFpeConfig: cryptoReplaceFfxFpeConfig,
          },
        },
      ],
    },
  },
  item: item,
};

try {
  // Run deidentification request
  const [response] = await dlp.deidentifyContent(request);
  const deidentifiedItem = response.item;
  console.log(deidentifiedItem.value);
} catch (err) {
  console.log(`Error in deidentifyWithFpe: ${err.message || err}`);
}

Python



def deidentify_with_fpe(
    project,
    input_str,
    info_types,
    alphabet=None,
    surrogate_type=None,
    key_name=None,
    wrapped_key=None,
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string using Format Preserving Encryption (FPE).
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        alphabet: The set of characters to replace sensitive ones with. For
            more information, see https://cloud.google.com/dlp/docs/reference/
            rest/v2beta2/organizations.deidentifyTemplates#ffxcommonnativealphabet
        surrogate_type: The name of the surrogate custom info type to use. Only
            necessary if you want to reverse the deidentification process. Can
            be essentially any arbitrary string, as long as it doesn't appear
            in your dataset otherwise.
        key_name: The name of the Cloud KMS key used to encrypt ('wrap') the
            AES-256 key. Example:
            key_name = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/
            keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME'
        wrapped_key: The encrypted ('wrapped') AES-256 key to use. This key
            should be encrypted using the Cloud KMS key specified by key_name.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # The wrapped key is base64-encoded, but the library expects a binary
    # string, so decode it here.
    import base64

    wrapped_key = base64.b64decode(wrapped_key)

    # Construct FPE configuration dictionary
    crypto_replace_ffx_fpe_config = {
        "crypto_key": {
            "kms_wrapped": {
                "wrapped_key": wrapped_key,
                "crypto_key_name": key_name,
            }
        },
        "common_alphabet": alphabet,
    }

    # Add surrogate type
    if surrogate_type:
        crypto_replace_ffx_fpe_config["surrogate_info_type"] = {
            "name": surrogate_type
        }

    # Construct inspect configuration dictionary
    inspect_config = {
        "info_types": [{"name": info_type} for info_type in info_types]
    }

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": crypto_replace_ffx_fpe_config
                    }
                }
            ]
        }
    }

    # Convert string to item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        parent,
        inspect_config=inspect_config,
        deidentify_config=deidentify_config,
        item=item,
    )

    # Print results
    print(response.item.value)

Go

import (
	"context"
	"fmt"
	"io"
	"io/ioutil"

	dlp "cloud.google.com/go/dlp/apiv2"
	dlppb "google.golang.org/genproto/googleapis/privacy/dlp/v2"
)

// deidentifyFPE deidentifies the input with FPE (Format Preserving Encryption).
// keyFileName is the file name with the KMS wrapped key and cryptoKeyName is the
// full KMS key resource name used to wrap the key. surrogateInfoType is an
// optional identifier needed for reidentification. surrogateInfoType can be any
// value not found in your input.
// Info types can be found with the infoTypes.list method or on https://cloud.google.com/dlp/docs/infotypes-reference
func deidentifyFPE(w io.Writer, projectID, input string, infoTypeNames []string, keyFileName, cryptoKeyName, surrogateInfoType string) error {
	// projectID := "my-project-id"
	// input := "My SSN is 123456789"
	// infoTypeNames := []string{"US_SOCIAL_SECURITY_NUMBER"}
	// keyFileName := "projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME"
	// cryptoKeyName := "YOUR_ENCRYPTED_AES_256_KEY"
	// surrogateInfoType := "AGE"
	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %v", err)
	}
	// Convert the info type strings to a list of InfoTypes.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}
	// Read the key file.
	keyBytes, err := ioutil.ReadFile(keyFileName)
	if err != nil {
		return fmt.Errorf("ReadFile: %v", err)
	}
	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: infoTypes,
		},
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: &dlppb.InfoTypeTransformations{
					Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
						{
							InfoTypes: []*dlppb.InfoType{}, // Match all info types.
							PrimitiveTransformation: &dlppb.PrimitiveTransformation{
								Transformation: &dlppb.PrimitiveTransformation_CryptoReplaceFfxFpeConfig{
									CryptoReplaceFfxFpeConfig: &dlppb.CryptoReplaceFfxFpeConfig{
										CryptoKey: &dlppb.CryptoKey{
											Source: &dlppb.CryptoKey_KmsWrapped{
												KmsWrapped: &dlppb.KmsWrappedCryptoKey{
													WrappedKey:    keyBytes,
													CryptoKeyName: cryptoKeyName,
												},
											},
										},
										// Set the alphabet used for the output.
										Alphabet: &dlppb.CryptoReplaceFfxFpeConfig_CommonAlphabet{
											CommonAlphabet: dlppb.CryptoReplaceFfxFpeConfig_ALPHA_NUMERIC,
										},
										// Set the surrogate info type, used for reidentification.
										SurrogateInfoType: &dlppb.InfoType{
											Name: surrogateInfoType,
										},
									},
								},
							},
						},
					},
				},
			},
		},
		// The item to analyze.
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: input,
			},
		},
	}
	// Send the request.
	r, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return fmt.Errorf("DeidentifyContent: %v", err)
	}
	// Print the result.
	fmt.Fprint(w, r.GetItem().GetValue())
	return nil
}

PHP

/**
 * Deidentify a string using Format-Preserving Encryption (FPE).
 */
use Google\Cloud\Dlp\V2\CryptoReplaceFfxFpeConfig;
use Google\Cloud\Dlp\V2\CryptoReplaceFfxFpeConfig\FfxCommonNativeAlphabet;
use Google\Cloud\Dlp\V2\CryptoKey;
use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\KmsWrappedCryptoKey;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\ContentItem;

/** Uncomment and populate these variables in your code */
// $callingProjectId = 'The GCP Project ID to run the API call under';
// $string = 'The string to deidentify';
// $keyName = 'The name of the Cloud KMS key used to encrypt (wrap) the AES-256 key';
// $wrappedKey = 'The name of the Cloud KMS key use, encrypted with the KMS key in $keyName';
// $surrogateTypeName = ''; // (Optional) surrogate custom info type to enable reidentification

// Instantiate a client.
$dlp = new DlpServiceClient();

// The infoTypes of information to mask
$ssnInfoType = (new InfoType())
    ->setName('US_SOCIAL_SECURITY_NUMBER');
$infoTypes = [$ssnInfoType];

// Create the wrapped crypto key configuration object
$kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
    ->setWrappedKey(base64_decode($wrappedKey))
    ->setCryptoKeyName($keyName);

// The set of characters to replace sensitive ones with
// For more information, see https://cloud.google.com/dlp/docs/reference/rest/V2/organizations.deidentifyTemplates#ffxcommonnativealphabet
$commonAlphabet = FfxCommonNativeAlphabet::NUMERIC;

// Create the crypto key configuration object
$cryptoKey = (new CryptoKey())
    ->setKmsWrapped($kmsWrappedCryptoKey);

// Create the crypto FFX FPE configuration object
$cryptoReplaceFfxFpeConfig = (new CryptoReplaceFfxFpeConfig())
    ->setCryptoKey($cryptoKey)
    ->setCommonAlphabet($commonAlphabet);

if ($surrogateTypeName) {
    $surrogateType = (new InfoType())
        ->setName($surrogateTypeName);
    $cryptoReplaceFfxFpeConfig->setSurrogateInfoType($surrogateType);
}

// Create the information transform configuration objects
$primitiveTransformation = (new PrimitiveTransformation())
    ->setCryptoReplaceFfxFpeConfig($cryptoReplaceFfxFpeConfig);

$infoTypeTransformation = (new InfoTypeTransformation())
    ->setPrimitiveTransformation($primitiveTransformation)
    ->setInfoTypes($infoTypes);

$infoTypeTransformations = (new InfoTypeTransformations())
    ->setTransformations([$infoTypeTransformation]);

// Create the deidentification configuration object
$deidentifyConfig = (new DeidentifyConfig())
    ->setInfoTypeTransformations($infoTypeTransformations);

$content = (new ContentItem())
    ->setValue($string);

$parent = $dlp->projectName($callingProjectId);

// Run request
$response = $dlp->deidentifyContent($parent, [
    'deidentifyConfig' => $deidentifyConfig,
    'item' => $content
]);

// Print the results
$deidentifiedValue = $response->getItem()->getValue();
print($deidentifiedValue);

C#


using System;
using System.Collections.Generic;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Protobuf;
using static Google.Cloud.Dlp.V2.CryptoReplaceFfxFpeConfig.Types;

public class DeidentifyWithFpe
{
    public static DeidentifyContentResponse Deidentify(
        string projectId,
        string dataValue,
        IEnumerable<InfoType> infoTypes,
        string keyName,
        string wrappedKey,
        FfxCommonNativeAlphabet alphabet)
    {
        var deidentifyConfig = new DeidentifyConfig
        {
            InfoTypeTransformations = new InfoTypeTransformations
            {
                Transformations =
                {
                    new InfoTypeTransformations.Types.InfoTypeTransformation
                    {
                        PrimitiveTransformation = new PrimitiveTransformation
                        {
                            CryptoReplaceFfxFpeConfig = new CryptoReplaceFfxFpeConfig
                            {
                                CommonAlphabet = alphabet,
                                CryptoKey = new CryptoKey
                                {
                                    KmsWrapped = new KmsWrappedCryptoKey
                                    {
                                        CryptoKeyName = keyName,
                                        WrappedKey = ByteString.FromBase64 (wrappedKey)
                                    }
                                },
                                SurrogateInfoType = new InfoType
                                {
                                    Name = "TOKEN"
                                }
                            }
                        }
                    }
                }
            }
        };

        var dlp = DlpServiceClient.Create();
        var response = dlp.DeidentifyContent(
            new DeidentifyContentRequest
            {
                Parent = new LocationName(projectId, "global").ToString(),
                InspectConfig = new InspectConfig
                {
                    InfoTypes = { infoTypes }
                },
                DeidentifyConfig = deidentifyConfig,
                Item = new ContentItem { Value = dataValue }
            });

        Console.WriteLine($"Deidentified content: {response.Item.Value}");
        return response;
    }
}

确定性加密

DLP API 中的确定性加密转换方法 CryptoDeterministicConfig 会获取输入值(Cloud DLP 检测到的敏感数据),使用 AES-SIVCryptoKey 对其进行加密,然后使用以 Base64 编码表示的加密值替换原始值。

使用确定性加密转换可以更高效地搜索加密数据。

输入值需满足以下条件:

  • 长度不得少于 1 个字符。
  • 没有字符集限制。

生成的令牌:

  • 是以 Base64 编码表示的加密值。
  • 加密后不保留输入值的字符集(“字母表”)或长度。
  • 使用 CryptoKey 通过 SIV 模式的 AES 加密 (AES-SIV) 计算得出。
  • 不一定是唯一的,因为每个相同的输入值在去标识化后都得到相同的令牌。这样可以更高效地搜索加密数据。如上下文中所述,您可以使用上下文“调整”来更改此行为。
  • 生成时添加了前缀,形式为 [SURROGATE_TYPE]([LENGTH]):,其中 [SURROGATE_TYPE] 表示用于说明输入值的代理 infoType[LENGTH] 指示其字符长度。借助代理,可使用用于去标识化的原始加密密钥来重标识令牌。

以下是使用确定性加密进行去标识化的 JSON 配置示例。请注意,由于我们是对电话号码进行去标识化,所以选择使用“PHONE_SURROGATE”作为描述性代理类型。 [CRYPTO_KEY] 表示从 Cloud KMS 中获取的未封装的加密密钥。(如需详细了解如何获取 CryptoKey,请参阅上一部分“保留格式加密”。)

{
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"PHONE_NUMBER"
            }
          ],
          "primitiveTransformation":{
            "cryptoDeterministicConfig":{
              "cryptoKey":{
                "unwrapped":{
                  "key":"[CRYPTO_KEY]"
                }
              },
              "surrogateInfoType":{
                "name":"PHONE_SURROGATE"
              }
            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"PHONE_NUMBER"
      }
    ]
  },
  "item":{
    "value":"My phone number is 206-555-0574, call me"
  }
}

使用此转换对字符串“My phone number is 206-555-0574”进行去标识化会产生去标识化的字符串,如下所示:

My phone number is PHONE_SURROGATE(36):ATZBu5OCCSwo+e94xSYnKYljk1OQpkW7qhzx, call me

如需重标识此字符串,您可以使用如下所示的 JSON 请求,其中 [CRYPTO_KEY] 是用于对内容进行去标识化的同一加密密钥。

{
  "reidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"PHONE_SURROGATE"
            }
          ],
          "primitiveTransformation":{
            "cryptoDeterministicConfig":{
              "cryptoKey":{
                "unwrapped":{
                  "key":"[CRYPTO_KEY]"
                }
              },
              "surrogateInfoType":{
                "name":"PHONE_SURROGATE"
              }
            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "customInfoTypes":[
      {
        "infoType":{
          "name":"PHONE_SURROGATE"
        },
        "surrogateType":{

        }
      }
    ]
  },
  "item":{
    "value":"My phone number is [PHONE_SURROGATE](36):ATZBu5OCCSwo+e94xSYnKYljk1OQpkW7qhzx, call me"
  }
}

重标识此字符串会生成原始字符串:

My phone number is 206-555-0574, call me

分桶

分桶转换用于通过将数值数据“分桶”到不同范围来对其进行去标识化处理。生成的数字范围是一个带连字符的字符串,由下限、连字符和上限组成。

固定大小的分桶

Cloud DLP 可以根据固定大小的范围(DLP API 中的 FixedSizeBucketingConfig)对数字输入值进行分桶。您可以指定以下内容以配置固定大小的分桶:

  • 所有分桶的下限值。任何小于此下限的值都将全部分组到一个分桶中。
  • 所有分桶的上限值。任何大于此上限的值都将全部分组到一个分桶中。
  • 每个分桶的大小(最小和最大分桶除外)。

例如,如果下限设置为 10,上限设置为 89,分桶大小设置为 10,则将使用以下分桶:-10、10-20、20-30、30-40、40-50、50-60、60-70、70-80、80-89、89+。

如需详细了解分桶的概念,请参阅泛化和分桶

可自定义的分桶

与固定大小的分桶相比,可自定义的分桶(DLP API 中的 BucketingConfig)具有更大的灵活性。您无需指定上限值、下限值和用于创建等大分桶的间隔值,而只需为要创建的每个分桶指定最大值和最小值。每个最大值和最小值对必须具有相同类型。

可以通过指定各个分桶来设置可自定义的分桶。每个分桶具有以下属性:

  • 分桶范围的下限。省略此值可创建没有下限的分桶。
  • 分桶范围的上限。省略此值可创建没有上限的分桶。
  • 此分桶范围的替换值。该值用于替换介于下限与上限之间的所有检测到的值。 如果未提供替换值,则会生成带连字符的最小-最大范围。

例如,请考虑以下针对此分桶转换的 JSON 配置:

"bucketingConfig":{
  "buckets":[
    {
      "min":{
        "integerValue":"1"
      },
      "max":{
        "integerValue":"30"
      },
      "replacementValue":{
        "stringValue":"LOW"
      }
    },
    {
      "min":{
        "integerValue":"31"
      },
      "max":{
        "integerValue":"65"
      },
      "replacementValue":{
        "stringValue":"MEDIUM"
      }
    },
    {
      "min":{
        "integerValue":"66"
      },
      "max":{
        "integerValue":"100"
      },
      "replacementValue":{
        "stringValue":"HIGH"
      }
    }
  ]
}

这会定义以下行为:

  • 遮盖 1 至 30 的整数值的方式是将它们替换为 LOW
  • 遮盖 31 至 65 的整数值的方式是将它们替换为 MEDIUM
  • 遮盖 66 至 100 的整数值的方式是将它们替换为 HIGH

如需详细了解分桶的概念,请参阅泛化和分桶

日期偏移

在对日期输入值使用日期偏移转换(DLP API 中的 DateShiftConfig)时,Cloud DLP 会按随机天数偏移日期。

日期偏移技术会随机偏移一组日期,但保留一段时间内的顺序和持续时间。通常在个人或实体的上下文中完成偏移日期。也就是说,您要按同一偏移差分偏移特定个人的所有日期,但对每个人使用不同的偏移差分。

如需详细了解日期偏移,请参阅日期偏移

下面是多种语言的示例代码,演示了如何通过 Cloud DLP API 使用日期偏移对日期进行去标识化。

Java


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.common.base.Splitter;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DateShiftConfig;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.FieldTransformation;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.RecordTransformations;
import com.google.privacy.dlp.v2.Table;
import com.google.privacy.dlp.v2.Value;
import com.google.type.Date;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeIdentifyWithDateShift {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    Path inputCsvFile = Paths.get("path/to/your/input/file.csv");
    Path outputCsvFile = Paths.get("path/to/your/output/file.csv");
    deIdentifyWithDateShift(projectId, inputCsvFile, outputCsvFile);
  }

  public static void deIdentifyWithDateShift(
      String projectId, Path inputCsvFile, Path outputCsvFile) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Read the contents of the CSV file into a Table
      List<FieldId> headers;
      List<Table.Row> rows;
      try (BufferedReader input = Files.newBufferedReader(inputCsvFile)) {
        // Parse and convert the first line into header names
        headers =
            Arrays.stream(input.readLine().split(","))
                .map(header -> FieldId.newBuilder().setName(header).build())
                .collect(Collectors.toList());
        // Parse the remainder of the file as Table.Rows
        rows =
            input.lines().map(DeIdentifyWithDateShift::parseLineAsRow).collect(Collectors.toList());
      }
      Table table = Table.newBuilder().addAllHeaders(headers).addAllRows(rows).build();
      ContentItem item = ContentItem.newBuilder().setTable(table).build();

      // Set the maximum days to shift dates backwards (lower bound) or forward (upper bound)
      DateShiftConfig dateShiftConfig =
          DateShiftConfig.newBuilder().setLowerBoundDays(5).setUpperBoundDays(5).build();
      PrimitiveTransformation transformation =
          PrimitiveTransformation.newBuilder().setDateShiftConfig(dateShiftConfig).build();
      // Specify which fields the DateShift should apply too
      List<FieldId> dateFields = Arrays.asList(headers.get(1), headers.get(3));
      FieldTransformation fieldTransformation =
          FieldTransformation.newBuilder()
              .addAllFields(dateFields)
              .setPrimitiveTransformation(transformation)
              .build();
      RecordTransformations recordTransformations =
          RecordTransformations.newBuilder().addFieldTransformations(fieldTransformation).build();
      // Specify the config for the de-identify request
      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder().setRecordTransformations(recordTransformations).build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Write the results to the target CSV file
      try (BufferedWriter writer = Files.newBufferedWriter(outputCsvFile)) {
        Table outTable = response.getItem().getTable();
        String headerOut =
            outTable.getHeadersList().stream()
                .map(FieldId::getName)
                .collect(Collectors.joining(","));
        writer.write(headerOut + "\n");

        List<String> rowOutput =
            outTable.getRowsList().stream()
                .map(row -> joinRow(row.getValuesList()))
                .collect(Collectors.toList());
        for (String line : rowOutput) {
          writer.write(line + "\n");
        }
        System.out.println("Content written to file: " + outputCsvFile.toString());
      }
    }
  }

  // Convert the string from the csv file into com.google.type.Date
  public static Date parseAsDate(String s) {
    LocalDate date = LocalDate.parse(s, DateTimeFormatter.ofPattern("MM/dd/yyyy"));
    return Date.newBuilder()
        .setDay(date.getDayOfMonth())
        .setMonth(date.getMonthValue())
        .setYear(date.getYear())
        .build();
  }

  // Each row is in the format: Name,BirthDate,CreditCardNumber,RegisterDate
  public static Table.Row parseLineAsRow(String line) {
    List<String> values = Splitter.on(",").splitToList(line);
    Value name = Value.newBuilder().setStringValue(values.get(0)).build();
    Value birthDate = Value.newBuilder().setDateValue(parseAsDate(values.get(1))).build();
    Value creditCardNumber = Value.newBuilder().setStringValue(values.get(2)).build();
    Value registerDate = Value.newBuilder().setDateValue(parseAsDate(values.get(3))).build();
    return Table.Row.newBuilder()
        .addValues(name)
        .addValues(birthDate)
        .addValues(creditCardNumber)
        .addValues(registerDate)
        .build();
  }

  public static String formatDate(Date d) {
    return String.format("%s/%s/%s", d.getMonth(), d.getDay(), d.getYear());
  }

  public static String joinRow(List<Value> values) {
    String name = values.get(0).getStringValue();
    String birthDate = formatDate(values.get(1).getDateValue());
    String creditCardNumber = values.get(2).getStringValue();
    String registerDate = formatDate(values.get(3).getDateValue());
    return String.join(",", name, birthDate, creditCardNumber, registerDate);
  }
}

Node.js

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// Import other required libraries
const fs = require('fs');

// The project ID to run the API call under
// const callingProjectId = process.env.GCLOUD_PROJECT;

// The path to the CSV file to deidentify
// The first row of the file must specify column names, and all other rows
// must contain valid values
// const inputCsvFile = '/path/to/input/file.csv';

// The path to save the date-shifted CSV file to
// const outputCsvFile = '/path/to/output/file.csv';

// The list of (date) fields in the CSV file to date shift
// const dateFields = [{ name: 'birth_date'}, { name: 'register_date' }];

// The maximum number of days to shift a date backward
// const lowerBoundDays = 1;

// The maximum number of days to shift a date forward
// const upperBoundDays = 1;

// (Optional) The column to determine date shift amount based on
// If this is not specified, a random shift amount will be used for every row
// If this is specified, then 'wrappedKey' and 'keyName' must also be set
// const contextFieldId = [{ name: 'user_id' }];

// (Optional) The name of the Cloud KMS key used to encrypt ('wrap') the AES-256 key
// If this is specified, then 'wrappedKey' and 'contextFieldId' must also be set
// const keyName = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME';

// (Optional) The encrypted ('wrapped') AES-256 key to use when shifting dates
// This key should be encrypted using the Cloud KMS key specified above
// If this is specified, then 'keyName' and 'contextFieldId' must also be set
// const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY'

// Helper function for converting CSV rows to Protobuf types
const rowToProto = row => {
  const values = row.split(',');
  const convertedValues = values.map(value => {
    if (Date.parse(value)) {
      const date = new Date(value);
      return {
        dateValue: {
          year: date.getFullYear(),
          month: date.getMonth() + 1,
          day: date.getDate(),
        },
      };
    } else {
      // Convert all non-date values to strings
      return {stringValue: value.toString()};
    }
  });
  return {values: convertedValues};
};

// Read and parse a CSV file
const csvLines = fs
  .readFileSync(inputCsvFile)
  .toString()
  .split('\n')
  .filter(line => line.includes(','));
const csvHeaders = csvLines[0].split(',');
const csvRows = csvLines.slice(1);

// Construct the table object
const tableItem = {
  table: {
    headers: csvHeaders.map(header => {
      return {name: header};
    }),
    rows: csvRows.map(row => rowToProto(row)),
  },
};

// Construct DateShiftConfig
const dateShiftConfig = {
  lowerBoundDays: lowerBoundDays,
  upperBoundDays: upperBoundDays,
};

if (contextFieldId && keyName && wrappedKey) {
  dateShiftConfig.context = {name: contextFieldId};
  dateShiftConfig.cryptoKey = {
    kmsWrapped: {
      wrappedKey: wrappedKey,
      cryptoKeyName: keyName,
    },
  };
} else if (contextFieldId || keyName || wrappedKey) {
  throw new Error(
    'You must set either ALL or NONE of {contextFieldId, keyName, wrappedKey}!'
  );
}

// Construct deidentification request
const request = {
  parent: `projects/${callingProjectId}/locations/global`,
  deidentifyConfig: {
    recordTransformations: {
      fieldTransformations: [
        {
          fields: dateFields,
          primitiveTransformation: {
            dateShiftConfig: dateShiftConfig,
          },
        },
      ],
    },
  },
  item: tableItem,
};

try {
  // Run deidentification request
  const [response] = await dlp.deidentifyContent(request);
  const tableRows = response.item.table.rows;

  // Write results to a CSV file
  tableRows.forEach((row, rowIndex) => {
    const rowValues = row.values.map(
      value =>
        value.stringValue ||
        `${value.dateValue.month}/${value.dateValue.day}/${value.dateValue.year}`
    );
    csvLines[rowIndex + 1] = rowValues.join(',');
  });
  csvLines.push('');
  fs.writeFileSync(outputCsvFile, csvLines.join('\n'));

  // Print status
  console.log(`Successfully saved date-shift output to ${outputCsvFile}`);
} catch (err) {
  console.log(`Error in deidentifyWithDateShift: ${err.message || err}`);
}

Python

def deidentify_with_date_shift(
    project,
    input_csv_file=None,
    output_csv_file=None,
    date_fields=None,
    lower_bound_days=None,
    upper_bound_days=None,
    context_field_id=None,
    wrapped_key=None,
    key_name=None,
):
    """Uses the Data Loss Prevention API to deidentify dates in a CSV file by
        pseudorandomly shifting them.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_csv_file: The path to the CSV file to deidentify. The first row
            of the file must specify column names, and all other rows must
            contain valid values.
        output_csv_file: The path to save the date-shifted CSV file.
        date_fields: The list of (date) fields in the CSV file to date shift.
            Example: ['birth_date', 'register_date']
        lower_bound_days: The maximum number of days to shift a date backward
        upper_bound_days: The maximum number of days to shift a date forward
        context_field_id: (Optional) The column to determine date shift amount
            based on. If this is not specified, a random shift amount will be
            used for every row. If this is specified, then 'wrappedKey' and
            'keyName' must also be set. Example:
            contextFieldId = [{ 'name': 'user_id' }]
        key_name: (Optional) The name of the Cloud KMS key used to encrypt
            ('wrap') the AES-256 key. Example:
            key_name = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/
            keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME'
        wrapped_key: (Optional) The encrypted ('wrapped') AES-256 key to use.
            This key should be encrypted using the Cloud KMS key specified by
            key_name.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    # Import the client library
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = dlp.project_path(project)

    # Convert date field list to Protobuf type
    def map_fields(field):
        return {"name": field}

    if date_fields:
        date_fields = map(map_fields, date_fields)
    else:
        date_fields = []

    # Read and parse the CSV file
    import csv
    from datetime import datetime

    f = []
    with open(input_csv_file, "r") as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            f.append(row)

    #  Helper function for converting CSV rows to Protobuf types
    def map_headers(header):
        return {"name": header}

    def map_data(value):
        try:
            date = datetime.strptime(value, "%m/%d/%Y")
            return {
                "date_value": {
                    "year": date.year,
                    "month": date.month,
                    "day": date.day,
                }
            }
        except ValueError:
            return {"string_value": value}

    def map_rows(row):
        return {"values": map(map_data, row)}

    # Using the helper functions, convert CSV rows to protobuf-compatible
    # dictionaries.
    csv_headers = map(map_headers, f[0])
    csv_rows = map(map_rows, f[1:])

    # Construct the table dict
    table_item = {"table": {"headers": csv_headers, "rows": csv_rows}}
    # Construct date shift config
    date_shift_config = {
        "lower_bound_days": lower_bound_days,
        "upper_bound_days": upper_bound_days,
    }

    # If using a Cloud KMS key, add it to the date_shift_config.
    # The wrapped key is base64-encoded, but the library expects a binary
    # string, so decode it here.
    if context_field_id and key_name and wrapped_key:
        import base64

        date_shift_config["context"] = {"name": context_field_id}
        date_shift_config["crypto_key"] = {
            "kms_wrapped": {
                "wrapped_key": base64.b64decode(wrapped_key),
                "crypto_key_name": key_name,
            }
        }
    elif context_field_id or key_name or wrapped_key:
        raise ValueError(
            """You must set either ALL or NONE of
        [context_field_id, key_name, wrapped_key]!"""
        )

    # Construct Deidentify Config
    deidentify_config = {
        "record_transformations": {
            "field_transformations": [
                {
                    "fields": date_fields,
                    "primitive_transformation": {
                        "date_shift_config": date_shift_config
                    },
                }
            ]
        }
    }

    # Write to CSV helper methods
    def write_header(header):
        return header.name

    def write_data(data):
        return data.string_value or "%s/%s/%s" % (
            data.date_value.month,
            data.date_value.day,
            data.date_value.year,
        )

    # Call the API
    response = dlp.deidentify_content(
        parent, deidentify_config=deidentify_config, item=table_item
    )

    # Write results to CSV file
    with open(output_csv_file, "w") as csvfile:
        write_file = csv.writer(csvfile, delimiter=",")
        write_file.writerow(map(write_header, response.item.table.headers))
        for row in response.item.table.rows:
            write_file.writerow(map(write_data, row.values))
    # Print status
    print("Successfully saved date-shift output to {}".format(output_csv_file))

Go

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	dlppb "google.golang.org/genproto/googleapis/privacy/dlp/v2"
)

// deidentifyDateShift shifts dates found in the input between lowerBoundDays and
// upperBoundDays.
func deidentifyDateShift(w io.Writer, projectID string, lowerBoundDays, upperBoundDays int32, input string) error {
	// projectID := "my-project-id"
	// lowerBoundDays := -1
	// upperBound := -1
	// input := "2016-01-10"
	// Will print "2016-01-09"
	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %v", err)
	}
	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: &dlppb.InfoTypeTransformations{
					Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
						{
							InfoTypes: []*dlppb.InfoType{}, // Match all info types.
							PrimitiveTransformation: &dlppb.PrimitiveTransformation{
								Transformation: &dlppb.PrimitiveTransformation_DateShiftConfig{
									DateShiftConfig: &dlppb.DateShiftConfig{
										LowerBoundDays: lowerBoundDays,
										UpperBoundDays: upperBoundDays,
									},
								},
							},
						},
					},
				},
			},
		},
		// The InspectConfig is used to identify the DATE fields.
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: []*dlppb.InfoType{
				{
					Name: "DATE",
				},
			},
		},
		// The item to analyze.
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: input,
			},
		},
	}
	// Send the request.
	r, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return fmt.Errorf("DeidentifyContent: %v", err)
	}
	// Print the result.
	fmt.Fprint(w, r.GetItem().GetValue())
	return nil
}

PHP

/**
 * Deidentify dates in a CSV file by pseudorandomly shifting them.
 */
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CryptoKey;
use Google\Cloud\Dlp\V2\DateShiftConfig;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\DlpServiceClient;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\FieldTransformation;
use Google\Cloud\Dlp\V2\KmsWrappedCryptoKey;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\RecordTransformations;
use Google\Cloud\Dlp\V2\Table;
use Google\Cloud\Dlp\V2\Table\Row;
use Google\Cloud\Dlp\V2\Value;
use Google\Type\Date;

/** Uncomment and populate these variables in your code */
// $callingProject = 'The GCP Project ID to run the API call under';
// $inputCsvFile = 'The path to the CSV file to deidentify';
// $outputCsvFile = 'The path to save the date-shifted CSV file to';
// $dateFieldNames = 'The comma-separated list of (date) fields in the CSV file to date shift';
// $lowerBoundDays = 'The maximum number of days to shift a date backward';
// $upperBoundDays = 'The maximum number of days to shift a date forward';
/**
 * If contextFieldName is not specified, a random shift amount will be used for every row.
 * If contextFieldName is specified, then 'wrappedKey' and 'keyName' must also be set
 */
// $contextFieldName = ''; (Optional) The column to determine date shift amount based on
// $keyName = ''; // Optional) The encrypted ('wrapped') AES-256 key to use when shifting dates
// $wrappedKey = ''; // (Optional) The name of the Cloud KMS key used to encrypt (wrap) the AES-256 key

// Instantiate a client.
$dlp = new DlpServiceClient();

// Read a CSV file
$csvLines = file($inputCsvFile, FILE_IGNORE_NEW_LINES);
$csvHeaders = explode(',', $csvLines[0]);
$csvRows = array_slice($csvLines, 1);

// Convert CSV file into protobuf objects
$tableHeaders = array_map(function ($csvHeader) {
    return (new FieldId)->setName($csvHeader);
}, $csvHeaders);

$tableRows = array_map(function ($csvRow) {
    $rowValues = array_map(function ($csvValue) {
        if ($csvDate = DateTime::createFromFormat('m/d/Y', $csvValue)) {
            $date = (new Date())
                ->setYear((int) $csvDate->format('Y'))
                ->setMonth((int) $csvDate->format('m'))
                ->setDay((int) $csvDate->format('d'));
            return (new Value())
                ->setDateValue($date);
        } else {
            return (new Value())
                ->setStringValue($csvValue);
        }
    }, explode(',', $csvRow));

    return (new Row())
        ->setValues($rowValues);
}, $csvRows);

// Convert date fields into protobuf objects
$dateFields = array_map(function ($dateFieldName) {
    return (new FieldId())->setName($dateFieldName);
}, explode(',', $dateFieldNames));

// Construct the table object
$table = (new Table())
    ->setHeaders($tableHeaders)
    ->setRows($tableRows);

$item = (new ContentItem())
    ->setTable($table);

// Construct dateShiftConfig
$dateShiftConfig = (new DateShiftConfig())
    ->setLowerBoundDays($lowerBoundDays)
    ->setUpperBoundDays($upperBoundDays);

if ($contextFieldName && $keyName && $wrappedKey) {
    $contextField = (new FieldId())
        ->setName($contextFieldName);

    // Create the wrapped crypto key configuration object
    $kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
        ->setWrappedKey(base64_decode($wrappedKey))
        ->setCryptoKeyName($keyName);

    $cryptoKey = (new CryptoKey())
        ->setKmsWrapped($kmsWrappedCryptoKey);

    $dateShiftConfig
        ->setContext($contextField)
        ->setCryptoKey($cryptoKey);
} elseif ($contextFieldName || $keyName || $wrappedKey) {
    throw new Exception('You must set either ALL or NONE of {$contextFieldName, $keyName, $wrappedKey}!');
}

// Create the information transform configuration objects
$primitiveTransformation = (new PrimitiveTransformation())
    ->setDateShiftConfig($dateShiftConfig);

$fieldTransformation = (new FieldTransformation())
    ->setPrimitiveTransformation($primitiveTransformation)
    ->setFields($dateFields);

$recordTransformations = (new RecordTransformations())
    ->setFieldTransformations([$fieldTransformation]);

// Create the deidentification configuration object
$deidentifyConfig = (new DeidentifyConfig())
    ->setRecordTransformations($recordTransformations);

$parent = $dlp->projectName($callingProjectId);

// Run request
$response = $dlp->deidentifyContent($parent, [
    'deidentifyConfig' => $deidentifyConfig,
    'item' => $item
]);

// Check for errors
foreach ($response->getOverview()->getTransformationSummaries() as $summary) {
    foreach ($summary->getResults() as $result) {
        if ($details = $result->getDetails()) {
            printf('Error: %s' . PHP_EOL, $details);
            return;
        }
    }
}

// Save the results to a file
$csvRef = fopen($outputCsvFile, 'w');
fputcsv($csvRef, $csvHeaders);
foreach ($response->getItem()->getTable()->getRows() as $tableRow) {
    $values = array_map(function ($tableValue) {
        if ($tableValue->getStringValue()) {
            return $tableValue->getStringValue();
        }
        $protoDate = $tableValue->getDateValue();
        $date = mktime(0, 0, 0, $protoDate->getMonth(), $protoDate->getDay(), $protoDate->getYear());
        return strftime('%D', $date);
    }, iterator_to_array($tableRow->getValues()));
    fputcsv($csvRef, $values);
};
fclose($csvRef);
printf('Deidentified dates written to %s' . PHP_EOL, $outputCsvFile);

C#


using System;
using System.IO;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Protobuf;

public class DeidentifyWithDateShift
{
    public static DeidentifyContentResponse Deidentify(
        string projectId,
        string inputCsvFilePath,
        int lowerBoundDays,
        int upperBoundDays,
        string dateFields,
        string contextField,
        string keyName,
        string wrappedKey)
    {
        var hasKeyName = !string.IsNullOrEmpty(keyName);
        var hasWrappedKey = !string.IsNullOrEmpty(wrappedKey);
        var hasContext = !string.IsNullOrEmpty(contextField);
        bool allFieldsSet = hasKeyName && hasWrappedKey && hasContext;
        bool noFieldsSet = !hasKeyName && !hasWrappedKey && !hasContext;
        if (!(allFieldsSet || noFieldsSet))
        {
            throw new ArgumentException("Must specify ALL or NONE of: {contextFieldId, keyName, wrappedKey}!");
        }

        var dlp = DlpServiceClient.Create();

        // Read file
        var csvLines = File.ReadAllLines(inputCsvFilePath);
        var csvHeaders = csvLines[0].Split(',');
        var csvRows = csvLines.Skip(1).ToArray();

        // Convert dates to protobuf format, and everything else to a string
        var protoHeaders = csvHeaders.Select(header => new FieldId { Name = header });
        var protoRows = csvRows.Select(csvRow =>
        {
            var rowValues = csvRow.Split(',');
            var protoValues = rowValues.Select(rowValue =>
               System.DateTime.TryParse(rowValue, out var parsedDate)
               ? new Value { DateValue = Google.Type.Date.FromDateTime(parsedDate) }
               : new Value { StringValue = rowValue });

            var rowObject = new Table.Types.Row();
            rowObject.Values.Add(protoValues);
            return rowObject;
        });

        var dateFieldList = dateFields
            .Split(',')
            .Select(field => new FieldId { Name = field });

        // Construct + execute the request
        var dateShiftConfig = new DateShiftConfig
        {
            LowerBoundDays = lowerBoundDays,
            UpperBoundDays = upperBoundDays
        };

        dateShiftConfig.Context = new FieldId { Name = contextField };
        dateShiftConfig.CryptoKey = new CryptoKey
        {
            KmsWrapped = new KmsWrappedCryptoKey
            {
                WrappedKey = ByteString.FromBase64(wrappedKey),
                CryptoKeyName = keyName
            }
        };

        var deidConfig = new DeidentifyConfig
        {
            RecordTransformations = new RecordTransformations
            {
                FieldTransformations =
                {
                    new FieldTransformation
                    {
                        PrimitiveTransformation = new PrimitiveTransformation
                        {
                            DateShiftConfig = dateShiftConfig
                        },
                        Fields = { dateFieldList }
                    }
                }
            }
        };

        var response = dlp.DeidentifyContent(
            new DeidentifyContentRequest
            {
                Parent = new LocationName(projectId, "global").ToString(),
                DeidentifyConfig = deidConfig,
                Item = new ContentItem
                {
                    Table = new Table
                    {
                        Headers = { protoHeaders },
                        Rows = { protoRows }
                    }
                }
            });

        return response;
    }
}

时间提取

执行时间提取(DLP API 中的 TimePartConfig)对象会保留日期、时间或时间戳匹配值的一部分。您应向 Cloud DLP 指定要提取的时间值类型,包括年、月、日等(在 TimePart 对象中枚举)。

例如,假设您已通过将要提取的时间部分设为 YEAR 来配置 timePartConfig 转换。在将下面第一列中的数据发送到 Cloud DLP 后,您最终将获得第二列中显示的转换后的值:

原始值 转换后的值
9/21/1976 1976
6/7/1945 1945
1/20/2009 2009
7/4/1776 1776
8/1/1984 1984
4/21/1982 1982