Cloud Data Loss Prevention (Cloud DLP) fait désormais partie de la protection des données sensibles. Le nom de l'API reste le même: API Cloud Data Loss Prevention (API DLP). Pour en savoir plus sur les services qui composent la protection des données sensibles, consultez Présentation de la protection des données sensibles.

Anonymiser les données sensibles

La protection des données sensibles peut anonymiser ces données dans du contenu textuel, y compris dans du texte stocké dans des structures de conteneurs telles que des tables. L'anonymisation est le processus qui consiste à éliminer les informations personnelles contenues dans les données. L'API détecte les données sensibles telles que les informations personnelles, puis procède à une transformation d'anonymisation pour masquer, supprimer ou dissimuler les données. Voici des exemples de techniques de suppression :

Masquer les données sensibles en remplaçant partiellement ou entièrement les caractères par un symbole, tel qu'un astérisque (*) ou un dièse (#)
Remplacer chaque instance de données sensibles par une chaîne de type "jeton" ou de substitution
Chiffrer et remplacer les données sensibles à l'aide d'une clé générée de manière aléatoire ou prédéterminée

Vous pouvez transmettre des informations à l'API en utilisant JSON sur HTTPS, ainsi que la CLI et plusieurs langages de programmation à l'aide des bibliothèques clientes de protection des données sensibles. Pour configurer l'interface de ligne de commande, consultez le quickstart. Pour en savoir plus sur l'envoi d'informations au format JSON, consultez le guide de démarrage rapide JSON.

Présentation de l'API

Pour anonymiser les données sensibles, utilisez la méthode content.deidentify de la protection des données sensibles.

Un appel d'API d'anonymisation est construit autour de trois éléments :

Les données à inspecter : Structure de chaîne ou de table (objet ContentItem) que l'API doit inspecter.
Les éléments à rechercher : Informations de configuration de la détection (InspectConfig ) telles que les types de données (ou InfoTypes), le filtrage des résultats dépassant un certain seuil de probabilité, les limitations éventuelles sur le nombre de résultats à renvoyer, etc. Si vous ne spécifiez aucun infoType dans un argument InspectConfig, cela équivaut à spécifier tous les infoTypes intégrés. Cette pratique n'est pas recommandée, car cela peut entraîner une baisse des performances et une augmentation des coûts.
Que faire des résultats de l'inspection : Informations de configuration (DeidentifyConfig) qui définissent la manière dont vous souhaitez anonymiser les données sensibles. Cet argument est abordé plus en détail dans la section suivante.

L'API renvoie des éléments quasiment identiques à ceux que vous lui avez fournis et dans le même format. Cependant, le texte reconnu comme contenant des informations sensibles (selon les critères que vous avez définis) a été anonymisé.

Spécifier les critères de détection

Les détecteurs de types d'informations (ou "infoTypes") sont les mécanismes utilisés par le service de protection des données sensibles pour identifier ces données.

La protection des données sensibles comprend plusieurs types de détecteurs d'infoTypes, qui sont tous résumés ici:

Les détecteurs d'infoTypes intégrés font partie intégrante du service de protection des données sensibles. Ils incluent des détecteurs pour les types de données sensibles spécifiques à un pays ou à une région, ainsi que les types de données applicables au niveau mondial.
Les détecteurs d'infoTypes personnalisés sont des détecteurs que vous créez vous-même. Il existe trois types de détecteurs d'infoTypes personnalisés :
- Les détecteurs de dictionnaires personnalisés standards sont de simples listes de mots dont se base la protection des données sensibles. Utilisez des détecteurs de dictionnaires personnalisés standards lorsque vous avez une liste qui contient au maximum plusieurs dizaines de milliers de mots ou d'expressions. Les détecteurs de dictionnaires personnalisés standards sont recommandés si vous pensez que votre liste de mots ne changera pas de manière significative.
- Les détecteurs de dictionnaires personnalisés stockés sont générés par la protection des données sensibles à l'aide de listes volumineuses de mots ou d'expressions stockés dans Cloud Storage ou BigQuery. Utilisez des détecteurs de dictionnaires personnalisés stockés lorsque vous avez une longue liste de mots ou d'expressions, pouvant atteindre plusieurs dizaines de millions d'éléments.
- Les détecteurs d'expressions régulières permettent à la protection des données sensibles de détecter les correspondances basées sur un modèle d'expression régulière.

En outre, la protection des données sensibles inclut le concept de règles d'inspection, qui vous permettent d'affiner les résultats de l'analyse à l'aide des éléments suivants:

Les règles d'exclusion vous permettent de réduire le nombre de résultats renvoyés en ajoutant des règles à un détecteur d'infoType intégré ou personnalisé.
Les règles relatives aux mots clés vous permettent d'augmenter la quantité ou de modifier la valeur de probabilité des résultats renvoyés en ajoutant des règles à un détecteur d'infoType intégré ou personnalisé.

Transformations d'anonymisation

Lorsque vous définissez la configuration d'anonymisation (DeidentifyConfig), vous devez spécifier une ou plusieurs transformations. Il existe deux catégories de transformations.

InfoTypeTransformations : transformations qui ne sont appliquées qu'aux valeurs identifiées comme un infoType spécifique au sein du texte envoyé
RecordTransformations : transformations qui ne sont appliquées qu'aux valeurs identifiées comme un infoType spécifique au sein des données textuelles tabulaires envoyées, ou à une colonne entière de données tabulaires

Transformations d'infoType

Vous pouvez spécifier une ou plusieurs transformations d'infoTypes par requête. Dans chaque objet InfoTypeTransformation, vous spécifiez les deux éléments suivants :

Un ou plusieurs infoTypes auxquels une transformation doit être appliquée (objet de tableau infoTypes[])
Une transformation primitive (objet PrimitiveTransformation)

Notez que vous n'êtes pas obligé de spécifier un infoType. Toutefois, si vous ne définissez aucun infoType dans un argument InspectConfig, la transformation s'applique à tous les infoTypes intégrés pour lesquels aucune transformation n'est fournie. Cette pratique n'est pas recommandée, car cela peut entraîner une baisse des performances et une augmentation des coûts.

Transformations primitives

Vous devez spécifier au moins une transformation primitive à appliquer à l'entrée, qu'elle soit appliquée uniquement à certains infoTypes ou à l'ensemble de la chaîne de texte. Les sections suivantes décrivent des exemples de méthodes de transformation que vous pouvez utiliser. Pour obtenir la liste de toutes les méthodes de transformation proposées par la protection des données sensibles, consultez la documentation de référence sur les transformations.

replaceConfig

Si vous définissez la valeur de replaceConfig sur un objet ReplaceValueConfig, les valeurs détectées sont remplacées par une valeur que vous spécifiez.

Supposons par exemple que vous ayez défini replaceConfig sur "[email-address]" pour tous les infoTypes EMAIL_ADDRESS et que la chaîne suivante soit envoyée au service de protection des données sensibles:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

La chaîne renvoyée sera la suivante :

My name is Alicia Abernathy, and my email address is [email-address].

L'exemple de code JSON suivant et le code dans plusieurs langages montrent comment former la requête API et ce que renvoie l'API DLP:

Python

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Pour vous authentifier auprès de la protection des données sensibles, configurez les Identifiants par défaut de l'application. Pour en savoir plus, consultez Configurer l'authentification pour un environnement de développement local.

Afficher sur GitHub Commentaires

from typing import List

import google.cloud.dlp

def deidentify_with_replace(
    project: str,
    input_str: str,
    info_types: List[str],
    replacement_str: str = "REPLACEMENT_STR",
) -> None:
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by replacing matched input values with a value you specify.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
        replacement_str: The string to replace all values that match given
            info types.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Construct inspect configuration dictionary
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "replace_config": {
                            "new_value": {"string_value": replacement_str}
                        }
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    print(response.item.value)

Java

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.ReplaceValueConfig;
import com.google.privacy.dlp.v2.Value;

public class DeIdentifyWithReplacement {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect =
        "My name is Alicia Abernathy, and my email address is aabernathy@example.com.";
    deIdentifyWithReplacement(projectId, textToInspect);
  }

  // Inspects the provided text.
  public static void deIdentifyWithReplacement(String projectId, String textToRedact) {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the content to be inspected.
      ContentItem item = ContentItem.newBuilder().setValue(textToRedact).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
      InspectConfig inspectConfig = InspectConfig.newBuilder().addInfoTypes(infoType).build();
      // Specify replacement string to be used for the finding.
      ReplaceValueConfig replaceValueConfig =
          ReplaceValueConfig.newBuilder()
              .setNewValue(Value.newBuilder().setStringValue("[email-address]").build())
              .build();
      // Define type of deidentification as replacement.
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder().setReplaceConfig(replaceValueConfig).build();
      // Associate deidentification type with info type.
      InfoTypeTransformation transformation =
          InfoTypeTransformation.newBuilder()
              .addInfoTypes(infoType)
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      // Construct the configuration for the Redact request and list all desired transformations.
      DeidentifyConfig redactConfig =
          DeidentifyConfig.newBuilder()
              .setInfoTypeTransformations(
                  InfoTypeTransformations.newBuilder().addTransformations(transformation))
              .build();

      // Construct the Redact request to be sent by the client.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(redactConfig)
              .setInspectConfig(inspectConfig)
              .build();

      // Use the client to send the API request.
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Parse the response and process results
      System.out.println("Text after redaction: " + response.getItem().getValue());
    } catch (Exception e) {
      System.out.println("Error during inspectString: \n" + e.toString());
    }
  }
}

REST

Pour découvrir comment utiliser le format JSON avec l'API DLP, consultez le démarrage rapide JSON.

Entrée JSON :

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "primitiveTransformation":{
            "replaceConfig":{
              "newValue":{
                "stringValue":"[email-address]"
              }
            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"EMAIL_ADDRESS"
      }
    ]
  }
}

Sortie JSON :

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is [email-address]."
  },
  "overview":{
    "transformedBytes":"22",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "replaceConfig":{
            "newValue":{
              "stringValue":"[email-address]"
            }
          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"22"
      }
    ]
  }
}

redactConfig

Si vous spécifiez redactConfig, vous masquez une valeur donnée en la supprimant complètement. Le message redactConfig n'accepte aucun argument, sa simple présence active la transformation.

Supposons par exemple que vous ayez spécifié redactConfig pour tous les infoTypes EMAIL_ADDRESS et que la chaîne suivante soit envoyée à la protection des données sensibles:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

La chaîne renvoyée sera la suivante :

My name is Alicia Abernathy, and my email address is .

Les exemples suivants montrent comment former la requête API et ce que renvoie l'API DLP:

C#

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires


using System;
using System.Collections.Generic;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class DeidentifyDataUsingRedactWithMatchedInputValues
{
    public static DeidentifyContentResponse Deidentify(
        string projectId,
        string text,
        IEnumerable<InfoType> infoTypes = null)
    {
        // Instantiate the client.
        var dlp = DlpServiceClient.Create();

        // Construct inspect config.
        var inspectConfig = new InspectConfig
        {
            InfoTypes = { infoTypes ?? new InfoType[] { new InfoType { Name = "EMAIL_ADDRESS" } } },
        };

        // Construct redact config.
        var redactConfig = new RedactConfig();

        // Construct deidentify config using redact config.
        var deidentifyConfig = new DeidentifyConfig
        {
            InfoTypeTransformations = new InfoTypeTransformations
            {
                Transformations =
                {
                    new InfoTypeTransformations.Types.InfoTypeTransformation
                    {
                        PrimitiveTransformation = new PrimitiveTransformation
                        {
                            RedactConfig = redactConfig
                        }
                    }
                }
            }
        };

        // Construct a request.
        var request = new DeidentifyContentRequest
        {
            ParentAsLocationName = new LocationName(projectId, "global"),
            DeidentifyConfig = deidentifyConfig,
            InspectConfig = inspectConfig,
            Item = new ContentItem { Value = text }
        };

        // Call the API.
        var response = dlp.DeidentifyContent(request);

        // Check the deidentified content.
        Console.WriteLine($"Deidentified content: {response.Item.Value}");
        return response;
    }
}

Go

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// deidentifyWithRedact de-identify the data by redacting with matched input values
func deidentifyWithRedact(w io.Writer, projectID, inputStr string, infoTypeNames []string) error {
	// projectID := "my-project-id"
	// inputStr := "My name is Alicia Abernathy, and my email address is aabernathy@example.com."
	// infoTypeNames := []string{"EMAIL_ADDRESS"}

	ctx := context.Background()

	// Initialize a client once and reuse it to send multiple requests. Clients
	// are safe to use across goroutines. When the client is no longer needed,
	// call the Close method to cleanup its resources.
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %w", err)
	}

	// Closing the client safely cleans up background resources.
	defer client.Close()

	// Specify the content to be inspected.
	contentItem := &dlppb.ContentItem{
		DataItem: &dlppb.ContentItem_Value{
			Value: inputStr,
		},
	}

	// Specify the type of info the inspection will look for.
	// See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}
	inspectConfig := &dlppb.InspectConfig{
		InfoTypes: infoTypes,
	}

	// Define type of de-identification.
	primitiveTransformation := &dlppb.PrimitiveTransformation{
		Transformation: &dlppb.PrimitiveTransformation_RedactConfig{
			RedactConfig: &dlppb.RedactConfig{},
		},
	}

	// Associate de-identification type with info type.
	transformation := &dlppb.InfoTypeTransformations_InfoTypeTransformation{
		InfoTypes:               infoTypes,
		PrimitiveTransformation: primitiveTransformation,
	}

	// Construct the configuration for the Redact request and list all desired transformations.
	redactConfig := &dlppb.DeidentifyConfig{
		Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
			InfoTypeTransformations: &dlppb.InfoTypeTransformations{
				Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
					transformation,
				},
			},
		},
	}

	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent:           fmt.Sprintf("projects/%s/locations/global", projectID),
		DeidentifyConfig: redactConfig,
		InspectConfig:    inspectConfig,
		Item:             contentItem,
	}

	// Send the request.
	resp, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return err
	}

	// Print the result.
	fmt.Fprintf(w, "output: %v", resp.GetItem().GetValue())
	return nil
}

Java

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.RedactConfig;

public class DeIdentifyWithRedaction {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToInspect =
        "My name is Alicia Abernathy, and my email address is aabernathy@example.com.";
    deIdentifyWithRedaction(projectId, textToInspect);
  }

  // Inspects the provided text.
  public static void deIdentifyWithRedaction(String projectId, String textToRedact) {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify the content to be inspected.
      ContentItem item = ContentItem.newBuilder().setValue(textToRedact).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("EMAIL_ADDRESS").build();
      InspectConfig inspectConfig = InspectConfig.newBuilder().addInfoTypes(infoType).build();
      // Define type of deidentification.
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setRedactConfig(RedactConfig.getDefaultInstance())
              .build();
      // Associate deidentification type with info type.
      InfoTypeTransformation transformation =
          InfoTypeTransformation.newBuilder()
              .addInfoTypes(infoType)
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      // Construct the configuration for the Redact request and list all desired transformations.
      DeidentifyConfig redactConfig =
          DeidentifyConfig.newBuilder()
              .setInfoTypeTransformations(
                  InfoTypeTransformations.newBuilder().addTransformations(transformation))
              .build();

      // Construct the Redact request to be sent by the client.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(redactConfig)
              .setInspectConfig(inspectConfig)
              .build();

      // Use the client to send the API request.
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Parse the response and process results
      System.out.println("Text after redaction: " + response.getItem().getValue());
    } catch (Exception e) {
      System.out.println("Error during inspectString: \n" + e.toString());
    }
  }
}

Node.js

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// TODO(developer): Replace these variables before running the sample.
// const projectId = "your-project-id";

// The string to deidentify
// const string =
//   'My name is Alicia Abernathy, and my email address is aabernathy@example.com.';

// The infoTypes of information to match
// See https://cloud.google.com/dlp/docs/concepts-infotypes for more information
// about supported infoTypes.
// const infoTypes = [{name: 'EMAIL_ADDRESS'}];

async function deIdentifyRedaction() {
  // Construct deidentify configuration
  const deidentifyConfig = {
    infoTypeTransformations: {
      transformations: [
        {
          infoTypes: infoTypes,
          primitiveTransformation: {
            redactConfig: {},
          },
        },
      ],
    },
  };

  // Construct inspect configuration
  const inspectConfig = {
    infoTypes: infoTypes,
  };

  // Construct Item
  const item = {
    value: string,
  };

  // Combine configurations into a request for the service.
  const request = {
    parent: `projects/${projectId}/locations/global`,
    item,
    deidentifyConfig,
    inspectConfig,
  };

  // Send the request and receive response from the service
  const [response] = await dlp.deidentifyContent(request);

  // Print the results
  console.log(`Text after redaction: ${response.item.value}`);
}

deIdentifyRedaction();

PHP

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\DeidentifyContentRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\InspectConfig;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\RedactConfig;

/**
 * De-identify data: Redacting with matched input values
 * Uses the Data Loss Prevention API to de-identify sensitive data in a string by redacting matched input values.
 *
 * @param string $callingProjectId      The Google Cloud project id to use as a parent resource.
 * @param string $textToInspect         The string to deidentify (will be treated as text).
 */
function deidentify_redact(
    // TODO(developer): Replace sample parameters before running the code.
    string $callingProjectId,
    string $textToInspect = 'My name is Alicia Abernathy, and my email address is aabernathy@example.com.'

): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // Specify the content to be de-identify.
    $contentItem = (new ContentItem())
        ->setValue($textToInspect);

    // Specify the type of info the inspection will look for.
    $infoType = (new InfoType())
        ->setName('EMAIL_ADDRESS');
    $inspectConfig = (new InspectConfig())
        ->setInfoTypes([$infoType]);

    // Define type of de-identification.
    $primitiveTransformation = (new PrimitiveTransformation())
        ->setRedactConfig(new RedactConfig());

    // Associate de-identification type with info type.
    $transformation = (new InfoTypeTransformation())
        ->setInfoTypes([$infoType])
        ->setPrimitiveTransformation($primitiveTransformation);

    // Construct the configuration for the Redact request and list all desired transformations.
    $deidentifyConfig = (new DeidentifyConfig())
        ->setInfoTypeTransformations((new InfoTypeTransformations())
            ->setTransformations([$transformation]));

    $parent = "projects/$callingProjectId/locations/global";

    // Run request
    $deidentifyContentRequest = (new DeidentifyContentRequest())
        ->setParent($parent)
        ->setDeidentifyConfig($deidentifyConfig)
        ->setInspectConfig($inspectConfig)
        ->setItem($contentItem);
    $response = $dlp->deidentifyContent($deidentifyContentRequest);

    // Print results
    printf('Text after redaction: %s', $response->getItem()->getValue());
}

Python

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

from typing import List

import google.cloud.dlp

def deidentify_with_redact(
    project: str,
    input_str: str,
    info_types: List[str],
) -> None:
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by redacting matched input values.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Construct inspect configuration dictionary
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [{"primitive_transformation": {"redact_config": {}}}]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    print(response.item.value)

REST

Entrée JSON :

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "primitiveTransformation":{
            "redactConfig":{

            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"EMAIL_ADDRESS"
      }
    ]
  }
}

Sortie JSON :

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is ."
  },
  "overview":{
    "transformedBytes":"22",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "redactConfig":{

          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"22"
      }
    ]
  }
}

characterMaskConfig

Si vous définissez characterMaskConfig sur un objet CharacterMaskConfig, vous masquez une partie de la chaîne en remplaçant un nombre donné de caractères par un caractère fixe. Le masquage peut être effectué à partir du début ou de la fin de la chaîne. Cette transformation fonctionne également avec les types de nombres tels que les entiers longs.

L'objet CharacterMaskConfig possède plusieurs arguments qui lui sont propres :

maskingCharacter : caractère permettant de masquer chaque caractère d'une valeur sensible. Par exemple, vous pouvez spécifier un astérisque (*) ou un dièse (#) pour masquer une série de nombres, comme un numéro de carte de crédit.
numberToMask : nombre de caractères à masquer. Si vous ne définissez pas cette valeur, tous les caractères correspondants sont masqués.
reverseOrder : indique si les caractères doivent être masqués dans l'ordre inverse ou non. Vous pouvez définir reverseOrder sur "true" pour masquer les caractères des valeurs détectées en commençant par la fin de la valeur. Si vous indiquez "false", le masquage s'effectue du début vers la fin de la valeur.
charactersToIgnore[] : un ou plusieurs caractères à ignorer lors du masquage des valeurs. Vous pouvez par exemple spécifier un trait d'union. Ainsi, les traits d'union figurant dans un numéro de téléphone ne sont pas masqués. Vous pouvez également spécifier un groupe de caractères communs (CharsToIgnore) à ignorer lors du masquage.

Par exemple, supposons que vous ayez défini characterMaskConfig pour qu'il soit masqué avec "#" pour les infotypes EMAIL_ADDRESS, à l'exception des caractères ".". et "@". Si la chaîne suivante est envoyée à la protection des données sensibles:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

La chaîne renvoyée sera la suivante :

My name is Alicia Abernathy, and my email address is ##########@#######.###.

Vous trouverez ci-dessous des exemples qui montrent comment utiliser l'API DLP pour anonymiser des données sensibles à l'aide de techniques de masquage.

Java

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.CharacterMaskConfig;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import java.io.IOException;
import java.util.Arrays;

public class DeIdentifyWithMasking {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToDeIdentify = "My SSN is 372819127";
    deIdentifyWithMasking(projectId, textToDeIdentify);
  }

  public static void deIdentifyWithMasking(String projectId, String textToDeIdentify)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {

      // Specify what content you want the service to DeIdentify
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToDeIdentify).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("US_SOCIAL_SECURITY_NUMBER").build();
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(Arrays.asList(infoType)).build();

      // Specify how the info from the inspection should be masked.
      CharacterMaskConfig characterMaskConfig =
          CharacterMaskConfig.newBuilder()
              .setMaskingCharacter("X") // Character to replace the found info with
              .setNumberToMask(5) // How many characters should be masked
              .build();
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setCharacterMaskConfig(characterMaskConfig)
              .build();
      InfoTypeTransformation infoTypeTransformation =
          InfoTypeTransformation.newBuilder()
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      InfoTypeTransformations transformations =
          InfoTypeTransformations.newBuilder().addTransformations(infoTypeTransformation).build();

      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder().setInfoTypeTransformations(transformations).build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(contentItem)
              .setInspectConfig(inspectConfig)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Print the results
      System.out.println("Text after masking: " + response.getItem().getValue());
    }
  }
}

Node.js

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const projectId = 'my-project-id';

// The string to deidentify
// const string = 'My SSN is 372819127';

// (Optional) The maximum number of sensitive characters to mask in a match
// If omitted from the request or set to 0, the API will mask any matching characters
// const numberToMask = 5;

// (Optional) The character to mask matching sensitive data with
// const maskingCharacter = 'x';

// Construct deidentification request
const item = {value: string};

async function deidentifyWithMask() {
  const request = {
    parent: `projects/${projectId}/locations/global`,
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [
          {
            primitiveTransformation: {
              characterMaskConfig: {
                maskingCharacter: maskingCharacter,
                numberToMask: numberToMask,
              },
            },
          },
        ],
      },
    },
    item: item,
  };

  // Run deidentification request
  const [response] = await dlp.deidentifyContent(request);
  const deidentifiedItem = response.item;
  console.log(deidentifiedItem.value);
}

deidentifyWithMask();

Python

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

from typing import List

import google.cloud.dlp

def deidentify_with_mask(
    project: str,
    input_str: str,
    info_types: List[str],
    masking_character: str = None,
    number_to_mask: int = 0,
) -> None:
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by masking it with a character.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
            A full list of info type categories can be fetched from the API.
        masking_character: The character to mask matching sensitive data with.
        number_to_mask: The maximum number of sensitive characters to mask in
            a match. If omitted or set to zero, the API will default to no
            maximum.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Construct inspect configuration dictionary
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "character_mask_config": {
                            "masking_character": masking_character,
                            "number_to_mask": number_to_mask,
                        }
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    print(response.item.value)

Go

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// mask deidentifies the input by masking all provided info types with maskingCharacter
// and prints the result to w.
func mask(w io.Writer, projectID, input string, infoTypeNames []string, maskingCharacter string, numberToMask int32) error {
	// projectID := "my-project-id"
	// input := "My SSN is 111222333"
	// infoTypeNames := []string{"US_SOCIAL_SECURITY_NUMBER"}
	// maskingCharacter := "+"
	// numberToMask := 6
	// Will print "My SSN is ++++++333"

	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %w", err)
	}
	defer client.Close()
	// Convert the info type strings to a list of InfoTypes.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}
	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: infoTypes,
		},
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: &dlppb.InfoTypeTransformations{
					Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
						{
							InfoTypes: []*dlppb.InfoType{}, // Match all info types.
							PrimitiveTransformation: &dlppb.PrimitiveTransformation{
								Transformation: &dlppb.PrimitiveTransformation_CharacterMaskConfig{
									CharacterMaskConfig: &dlppb.CharacterMaskConfig{
										MaskingCharacter: maskingCharacter,
										NumberToMask:     numberToMask,
									},
								},
							},
						},
					},
				},
			},
		},
		// The item to analyze.
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: input,
			},
		},
	}
	// Send the request.
	r, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return fmt.Errorf("DeidentifyContent: %w", err)
	}
	// Print the result.
	fmt.Fprint(w, r.GetItem().GetValue())
	return nil
}

PHP

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

use Google\Cloud\Dlp\V2\CharacterMaskConfig;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\DeidentifyContentRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;

/**
 * Deidentify sensitive data in a string by masking it with a character.
 *
 * @param string $callingProjectId The GCP Project ID to run the API call under
 * @param string $string           The string to deidentify
 * @param int    $numberToMask     (Optional) The maximum number of sensitive characters to mask in a match
 * @param string $maskingCharacter (Optional) The character to mask matching sensitive data with (defaults to "x")
 */
function deidentify_mask(
    string $callingProjectId,
    string $string,
    int $numberToMask = 0,
    string $maskingCharacter = 'x'
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // The infoTypes of information to mask
    $ssnInfoType = (new InfoType())
        ->setName('US_SOCIAL_SECURITY_NUMBER');
    $infoTypes = [$ssnInfoType];

    // Create the masking configuration object
    $maskConfig = (new CharacterMaskConfig())
        ->setMaskingCharacter($maskingCharacter)
        ->setNumberToMask($numberToMask);

    // Create the information transform configuration objects
    $primitiveTransformation = (new PrimitiveTransformation())
        ->setCharacterMaskConfig($maskConfig);

    $infoTypeTransformation = (new InfoTypeTransformation())
        ->setPrimitiveTransformation($primitiveTransformation)
        ->setInfoTypes($infoTypes);

    $infoTypeTransformations = (new InfoTypeTransformations())
        ->setTransformations([$infoTypeTransformation]);

    // Create the deidentification configuration object
    $deidentifyConfig = (new DeidentifyConfig())
        ->setInfoTypeTransformations($infoTypeTransformations);

    $item = (new ContentItem())
        ->setValue($string);

    $parent = "projects/$callingProjectId/locations/global";

    // Run request
    $deidentifyContentRequest = (new DeidentifyContentRequest())
        ->setParent($parent)
        ->setDeidentifyConfig($deidentifyConfig)
        ->setItem($item);
    $response = $dlp->deidentifyContent($deidentifyContentRequest);

    // Print the results
    $deidentifiedValue = $response->getItem()->getValue();
    print($deidentifiedValue);
}

C#

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires


using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;

public class DeidentifyWithMasking
{
    public static DeidentifyContentResponse Deidentify(string projectId, string text)
    {
        // Instantiate a client.
        var dlp = DlpServiceClient.Create();

        // Construct a request.
        var transformation = new InfoTypeTransformations.Types.InfoTypeTransformation
        {
            PrimitiveTransformation = new PrimitiveTransformation
            {
                CharacterMaskConfig = new CharacterMaskConfig
                {
                    MaskingCharacter = "*",
                    NumberToMask = 5,
                    ReverseOrder = false,
                }
            }
        };
        var request = new DeidentifyContentRequest
        {
            Parent = new LocationName(projectId, "global").ToString(),
            InspectConfig = new InspectConfig
            {
                InfoTypes =
                {
                    new InfoType { Name = "US_SOCIAL_SECURITY_NUMBER" }
                }
            },
            DeidentifyConfig = new DeidentifyConfig
            {
                InfoTypeTransformations = new InfoTypeTransformations
                {
                    Transformations = { transformation }
                }
            },
            Item = new ContentItem { Value = text }
        };

        // Call the API.
        var response = dlp.DeidentifyContent(request);

        // Inspect the results.
        Console.WriteLine($"Deidentified content: {response.Item.Value}");
        return response;
    }
}

REST

L'exemple JSON suivant montre comment former la requête API et ce que renvoie l'API DLP :

Entrée JSON :

POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig":{
    "infoTypeTransformations":{
      "transformations":[
        {
          "infoTypes":[
            {
              "name":"EMAIL_ADDRESS"
            }
          ],
          "primitiveTransformation":{
            "characterMaskConfig":{
              "maskingCharacter":"#",
              "reverseOrder":false,
              "charactersToIgnore":[
                {
                  "charactersToSkip":".@"
                }
              ]
            }
          }
        }
      ]
    }
  },
  "inspectConfig":{
    "infoTypes":[
      {
        "name":"EMAIL_ADDRESS"
      }
    ]
  }
}

Sortie JSON :

{
  "item":{
    "value":"My name is Alicia Abernathy, and my email address is ##########@#######.###."
  },
  "overview":{
    "transformedBytes":"22",
    "transformationSummaries":[
      {
        "infoType":{
          "name":"EMAIL_ADDRESS"
        },
        "transformation":{
          "characterMaskConfig":{
            "maskingCharacter":"#",
            "charactersToIgnore":[
              {
                "charactersToSkip":".@"
              }
            ]
          }
        },
        "results":[
          {
            "count":"1",
            "code":"SUCCESS"
          }
        ],
        "transformedBytes":"22"
      }
    ]
  }
}

cryptoHashConfig

Si vous définissez cryptoHashConfig sur un objet CryptoHashConfig, vous effectuez une pseudonymisation sur une valeur d'entrée en générant une valeur de substitution à l'aide d'un hachage cryptographique.

Cette méthode remplace la valeur d'entrée par un "condensé" chiffré ou une valeur de hachage. Le condensé est calculé d'après le hachage SHA-256 de la valeur d'entrée. La clé cryptographique utilisée pour créer le hachage est un objet CryptoKey, qui doit avoir une taille de 32 ou 64 octets.

La méthode génère une représentation encodée en base64 du résultat haché. Actuellement, seules les chaînes et les valeurs entières peuvent être hachées.

Par exemple, supposons que vous ayez spécifié cryptoHashConfig pour tous les infoTypes EMAIL_ADDRESS et que l'objet CryptoKey consiste en une clé générée de manière aléatoire (une TransientCryptoKey). La chaîne suivante est ensuite envoyée à la protection des données sensibles:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

La chaîne générée de manière chiffrée qui est renvoyée ressemblera à ceci :

My name is Alicia Abernathy, and my email address is 41D1567F7F99F1DC2A5FAB886DEE5BEE.

Bien sûr, la chaîne hexadécimale sera générée de manière chiffrée et sera différente de celle montrée ici.

dateShiftConfig

Si vous définissez dateShiftConfig sur un objet DateShiftConfig, vous effectuez un changement de date sur une valeur d'entrée en décalant les dates d'un nombre aléatoire de jours.

Les techniques de changement de date modifient un ensemble de dates de manière aléatoire, tout en préservant la séquence et la durée d'une période. Le changement de date s'effectue généralement dans le contexte d'un individu ou d'une entité. Vous pouvez par exemple modifier toutes les dates pour un individu spécifique à l'aide du même différentiel de modification, mais appliquer un différentiel distinct pour un autre individu.

Pour savoir comment changer de date, consultez la page Changement de date.

Voici un exemple de code dans plusieurs langages qui montre comment utiliser l'API DLP pour anonymiser des dates à l'aide du changement de date.

Java

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.common.base.Splitter;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DateShiftConfig;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.FieldTransformation;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.RecordTransformations;
import com.google.privacy.dlp.v2.Table;
import com.google.privacy.dlp.v2.Value;
import com.google.type.Date;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DeIdentifyWithDateShift {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    Path inputCsvFile = Paths.get("path/to/your/input/file.csv");
    Path outputCsvFile = Paths.get("path/to/your/output/file.csv");
    deIdentifyWithDateShift(projectId, inputCsvFile, outputCsvFile);
  }

  public static void deIdentifyWithDateShift(
      String projectId, Path inputCsvFile, Path outputCsvFile) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Read the contents of the CSV file into a Table
      List<FieldId> headers;
      List<Table.Row> rows;
      try (BufferedReader input = Files.newBufferedReader(inputCsvFile)) {
        // Parse and convert the first line into header names
        headers =
            Arrays.stream(input.readLine().split(","))
                .map(header -> FieldId.newBuilder().setName(header).build())
                .collect(Collectors.toList());
        // Parse the remainder of the file as Table.Rows
        rows =
            input.lines().map(DeIdentifyWithDateShift::parseLineAsRow).collect(Collectors.toList());
      }
      Table table = Table.newBuilder().addAllHeaders(headers).addAllRows(rows).build();
      ContentItem item = ContentItem.newBuilder().setTable(table).build();

      // Set the maximum days to shift dates backwards (lower bound) or forward (upper bound)
      DateShiftConfig dateShiftConfig =
          DateShiftConfig.newBuilder().setLowerBoundDays(5).setUpperBoundDays(5).build();
      PrimitiveTransformation transformation =
          PrimitiveTransformation.newBuilder().setDateShiftConfig(dateShiftConfig).build();
      // Specify which fields the DateShift should apply too
      List<FieldId> dateFields = Arrays.asList(headers.get(1), headers.get(3));
      FieldTransformation fieldTransformation =
          FieldTransformation.newBuilder()
              .addAllFields(dateFields)
              .setPrimitiveTransformation(transformation)
              .build();
      RecordTransformations recordTransformations =
          RecordTransformations.newBuilder().addFieldTransformations(fieldTransformation).build();
      // Specify the config for the de-identify request
      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder().setRecordTransformations(recordTransformations).build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(item)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Write the results to the target CSV file
      try (BufferedWriter writer = Files.newBufferedWriter(outputCsvFile)) {
        Table outTable = response.getItem().getTable();
        String headerOut =
            outTable.getHeadersList().stream()
                .map(FieldId::getName)
                .collect(Collectors.joining(","));
        writer.write(headerOut + "\n");

        List<String> rowOutput =
            outTable.getRowsList().stream()
                .map(row -> joinRow(row.getValuesList()))
                .collect(Collectors.toList());
        for (String line : rowOutput) {
          writer.write(line + "\n");
        }
        System.out.println("Content written to file: " + outputCsvFile.toString());
      }
    }
  }

  // Convert the string from the csv file into com.google.type.Date
  public static Date parseAsDate(String s) {
    LocalDate date = LocalDate.parse(s, DateTimeFormatter.ofPattern("MM/dd/yyyy"));
    return Date.newBuilder()
        .setDay(date.getDayOfMonth())
        .setMonth(date.getMonthValue())
        .setYear(date.getYear())
        .build();
  }

  // Each row is in the format: Name,BirthDate,CreditCardNumber,RegisterDate
  public static Table.Row parseLineAsRow(String line) {
    List<String> values = Splitter.on(",").splitToList(line);
    Value name = Value.newBuilder().setStringValue(values.get(0)).build();
    Value birthDate = Value.newBuilder().setDateValue(parseAsDate(values.get(1))).build();
    Value creditCardNumber = Value.newBuilder().setStringValue(values.get(2)).build();
    Value registerDate = Value.newBuilder().setDateValue(parseAsDate(values.get(3))).build();
    return Table.Row.newBuilder()
        .addValues(name)
        .addValues(birthDate)
        .addValues(creditCardNumber)
        .addValues(registerDate)
        .build();
  }

  public static String formatDate(Date d) {
    return String.format("%s/%s/%s", d.getMonth(), d.getDay(), d.getYear());
  }

  public static String joinRow(List<Value> values) {
    String name = values.get(0).getStringValue();
    String birthDate = formatDate(values.get(1).getDateValue());
    String creditCardNumber = values.get(2).getStringValue();
    String registerDate = formatDate(values.get(3).getDateValue());
    return String.join(",", name, birthDate, creditCardNumber, registerDate);
  }
}

Node.js

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// Import other required libraries
const fs = require('fs');

// The project ID to run the API call under
// const projectId = 'my-project';

// The path to the CSV file to deidentify
// The first row of the file must specify column names, and all other rows
// must contain valid values
// const inputCsvFile = '/path/to/input/file.csv';

// The path to save the date-shifted CSV file to
// const outputCsvFile = '/path/to/output/file.csv';

// The list of (date) fields in the CSV file to date shift
// const dateFields = [{ name: 'birth_date'}, { name: 'register_date' }];

// The maximum number of days to shift a date backward
// const lowerBoundDays = 1;

// The maximum number of days to shift a date forward
// const upperBoundDays = 1;

// (Optional) The column to determine date shift amount based on
// If this is not specified, a random shift amount will be used for every row
// If this is specified, then 'wrappedKey' and 'keyName' must also be set
// const contextFieldId = [{ name: 'user_id' }];

// (Optional) The name of the Cloud KMS key used to encrypt ('wrap') the AES-256 key
// If this is specified, then 'wrappedKey' and 'contextFieldId' must also be set
// const keyName = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME';

// (Optional) The encrypted ('wrapped') AES-256 key to use when shifting dates
// This key should be encrypted using the Cloud KMS key specified above
// If this is specified, then 'keyName' and 'contextFieldId' must also be set
// const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY'

// Helper function for converting CSV rows to Protobuf types
const rowToProto = row => {
  const values = row.split(',');
  const convertedValues = values.map(value => {
    if (Date.parse(value)) {
      const date = new Date(value);
      return {
        dateValue: {
          year: date.getFullYear(),
          month: date.getMonth() + 1,
          day: date.getDate(),
        },
      };
    } else {
      // Convert all non-date values to strings
      return {stringValue: value.toString()};
    }
  });
  return {values: convertedValues};
};

async function deidentifyWithDateShift() {
  // Read and parse a CSV file
  const csvLines = fs
    .readFileSync(inputCsvFile)
    .toString()
    .split('\n')
    .filter(line => line.includes(','));
  const csvHeaders = csvLines[0].split(',');
  const csvRows = csvLines.slice(1);

  // Construct the table object
  const tableItem = {
    table: {
      headers: csvHeaders.map(header => {
        return {name: header};
      }),
      rows: csvRows.map(row => rowToProto(row)),
    },
  };

  // Construct DateShiftConfig
  const dateShiftConfig = {
    lowerBoundDays: lowerBoundDays,
    upperBoundDays: upperBoundDays,
  };

  if (contextFieldId && keyName && wrappedKey) {
    dateShiftConfig.context = {name: contextFieldId};
    dateShiftConfig.cryptoKey = {
      kmsWrapped: {
        wrappedKey: wrappedKey,
        cryptoKeyName: keyName,
      },
    };
  } else if (contextFieldId || keyName || wrappedKey) {
    throw new Error(
      'You must set either ALL or NONE of {contextFieldId, keyName, wrappedKey}!'
    );
  }

  // Construct deidentification request
  const request = {
    parent: `projects/${projectId}/locations/global`,
    deidentifyConfig: {
      recordTransformations: {
        fieldTransformations: [
          {
            fields: dateFields,
            primitiveTransformation: {
              dateShiftConfig: dateShiftConfig,
            },
          },
        ],
      },
    },
    item: tableItem,
  };

  // Run deidentification request
  const [response] = await dlp.deidentifyContent(request);
  const tableRows = response.item.table.rows;

  // Write results to a CSV file
  tableRows.forEach((row, rowIndex) => {
    const rowValues = row.values.map(
      value =>
        value.stringValue ||
        `${value.dateValue.month}/${value.dateValue.day}/${value.dateValue.year}`
    );
    csvLines[rowIndex + 1] = rowValues.join(',');
  });
  csvLines.push('');
  fs.writeFileSync(outputCsvFile, csvLines.join('\n'));

  // Print status
  console.log(`Successfully saved date-shift output to ${outputCsvFile}`);
}

deidentifyWithDateShift();

Python

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

import base64
import csv
from datetime import datetime
from typing import List

import google.cloud.dlp
from google.cloud.dlp_v2 import types

def deidentify_with_date_shift(
    project: str,
    input_csv_file: str = None,
    output_csv_file: str = None,
    date_fields: List[str] = None,
    lower_bound_days: int = None,
    upper_bound_days: int = None,
    context_field_id: str = None,
    wrapped_key: str = None,
    key_name: str = None,
) -> None:
    """Uses the Data Loss Prevention API to deidentify dates in a CSV file by
        pseudorandomly shifting them.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_csv_file: The path to the CSV file to deidentify. The first row
            of the file must specify column names, and all other rows must
            contain valid values.
        output_csv_file: The path to save the date-shifted CSV file.
        date_fields: The list of (date) fields in the CSV file to date shift.
            Example: ['birth_date', 'register_date']
        lower_bound_days: The maximum number of days to shift a date backward
        upper_bound_days: The maximum number of days to shift a date forward
        context_field_id: (Optional) The column to determine date shift amount
            based on. If this is not specified, a random shift amount will be
            used for every row. If this is specified, then 'wrappedKey' and
            'keyName' must also be set. Example:
            contextFieldId = [{ 'name': 'user_id' }]
        key_name: (Optional) The name of the Cloud KMS key used to encrypt
            ('wrap') the AES-256 key. Example:
            key_name = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/
            keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME'
        wrapped_key: (Optional) The encrypted ('wrapped') AES-256 key to use.
            This key should be encrypted using the Cloud KMS key specified by
            key_name.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # Convert date field list to Protobuf type
    def map_fields(field: str) -> dict:
        return {"name": field}

    if date_fields:
        date_fields = map(map_fields, date_fields)
    else:
        date_fields = []

    f = []
    with open(input_csv_file) as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            f.append(row)

    #  Helper function for converting CSV rows to Protobuf types
    def map_headers(header: str) -> dict:
        return {"name": header}

    def map_data(value: str) -> dict:
        try:
            date = datetime.strptime(value, "%m/%d/%Y")
            return {
                "date_value": {"year": date.year, "month": date.month, "day": date.day}
            }
        except ValueError:
            return {"string_value": value}

    def map_rows(row: str) -> dict:
        return {"values": map(map_data, row)}

    # Using the helper functions, convert CSV rows to protobuf-compatible
    # dictionaries.
    csv_headers = map(map_headers, f[0])
    csv_rows = map(map_rows, f[1:])

    # Construct the table dict
    table_item = {"table": {"headers": csv_headers, "rows": csv_rows}}

    # Construct date shift config
    date_shift_config = {
        "lower_bound_days": lower_bound_days,
        "upper_bound_days": upper_bound_days,
    }

    # If using a Cloud KMS key, add it to the date_shift_config.
    # The wrapped key is base64-encoded, but the library expects a binary
    # string, so decode it here.
    if context_field_id and key_name and wrapped_key:
        date_shift_config["context"] = {"name": context_field_id}
        date_shift_config["crypto_key"] = {
            "kms_wrapped": {
                "wrapped_key": base64.b64decode(wrapped_key),
                "crypto_key_name": key_name,
            }
        }
    elif context_field_id or key_name or wrapped_key:
        raise ValueError(
            """You must set either ALL or NONE of
        [context_field_id, key_name, wrapped_key]!"""
        )

    # Construct Deidentify Config
    deidentify_config = {
        "record_transformations": {
            "field_transformations": [
                {
                    "fields": date_fields,
                    "primitive_transformation": {
                        "date_shift_config": date_shift_config
                    },
                }
            ]
        }
    }

    # Write to CSV helper methods
    def write_header(header: types.storage.FieldId) -> str:
        return header.name

    def write_data(data: types.storage.Value) -> str:
        return data.string_value or "{}/{}/{}".format(
            data.date_value.month,
            data.date_value.day,
            data.date_value.year,
        )

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "item": table_item,
        }
    )

    # Write results to CSV file
    with open(output_csv_file, "w") as csvfile:
        write_file = csv.writer(csvfile, delimiter=",")
        write_file.writerow(map(write_header, response.item.table.headers))
        for row in response.item.table.rows:
            write_file.writerow(map(write_data, row.values))
    # Print status
    print(f"Successfully saved date-shift output to {output_csv_file}")

Go

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

import (
	"context"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// deidentifyDateShift shifts dates found in the input between lowerBoundDays and
// upperBoundDays.
func deidentifyDateShift(w io.Writer, projectID string, lowerBoundDays, upperBoundDays int32, input string) error {
	// projectID := "my-project-id"
	// lowerBoundDays := -1
	// upperBound := -1
	// input := "2016-01-10"
	// Will print "2016-01-09"
	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %w", err)
	}
	defer client.Close()
	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: &dlppb.InfoTypeTransformations{
					Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
						{
							InfoTypes: []*dlppb.InfoType{}, // Match all info types.
							PrimitiveTransformation: &dlppb.PrimitiveTransformation{
								Transformation: &dlppb.PrimitiveTransformation_DateShiftConfig{
									DateShiftConfig: &dlppb.DateShiftConfig{
										LowerBoundDays: lowerBoundDays,
										UpperBoundDays: upperBoundDays,
									},
								},
							},
						},
					},
				},
			},
		},
		// The InspectConfig is used to identify the DATE fields.
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: []*dlppb.InfoType{
				{
					Name: "DATE",
				},
			},
		},
		// The item to analyze.
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: input,
			},
		},
	}
	// Send the request.
	r, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return fmt.Errorf("DeidentifyContent: %w", err)
	}
	// Print the result.
	fmt.Fprint(w, r.GetItem().GetValue())
	return nil
}

PHP

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

use DateTime;
use Exception;
use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CryptoKey;
use Google\Cloud\Dlp\V2\DateShiftConfig;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\DeidentifyContentRequest;
use Google\Cloud\Dlp\V2\FieldId;
use Google\Cloud\Dlp\V2\FieldTransformation;
use Google\Cloud\Dlp\V2\KmsWrappedCryptoKey;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;
use Google\Cloud\Dlp\V2\RecordTransformations;
use Google\Cloud\Dlp\V2\Table;
use Google\Cloud\Dlp\V2\Table\Row;
use Google\Cloud\Dlp\V2\Value;
use Google\Type\Date;

/**
 * Deidentify dates in a CSV file by pseudorandomly shifting them.
 * If contextFieldName is not specified, a random shift amount will be used for every row.
 * If contextFieldName is specified, then 'wrappedKey' and 'keyName' must also be set.
 *
 * @param string $callingProjectId  The GCP Project ID to run the API call under
 * @param string $inputCsvFile      The path to the CSV file to deidentify
 * @param string $outputCsvFile     The path to save the date-shifted CSV file to
 * @param string $dateFieldNames    The comma-separated list of (date) fields in the CSV file to date shift
 * @param int    $lowerBoundDays    The maximum number of days to shift a date backward
 * @param int    $upperBoundDays    The maximum number of days to shift a date forward
 * @param string $contextFieldName  (Optional) The column to determine date shift amount based on
 * @param string $keyName           (Optional) The encrypted ('wrapped') AES-256 key to use when shifting dates
 * @param string $wrappedKey        (Optional) The name of the Cloud KMS key used to encrypt (wrap) the AES-256 key
 */
function deidentify_dates(
    string $callingProjectId,
    string $inputCsvFile,
    string $outputCsvFile,
    string $dateFieldNames,
    int $lowerBoundDays,
    int $upperBoundDays,
    string $contextFieldName = '',
    string $keyName = '',
    string $wrappedKey = ''
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // Read a CSV file
    $csvLines = file($inputCsvFile, FILE_IGNORE_NEW_LINES);
    $csvHeaders = explode(',', $csvLines[0]);
    $csvRows = array_slice($csvLines, 1);

    // Convert CSV file into protobuf objects
    $tableHeaders = array_map(function ($csvHeader) {
        return (new FieldId)->setName($csvHeader);
    }, $csvHeaders);

    $tableRows = array_map(function ($csvRow) {
        $rowValues = array_map(function ($csvValue) {
            if ($csvDate = DateTime::createFromFormat('m/d/Y', $csvValue)) {
                $date = (new Date())
                    ->setYear((int) $csvDate->format('Y'))
                    ->setMonth((int) $csvDate->format('m'))
                    ->setDay((int) $csvDate->format('d'));
                return (new Value())
                    ->setDateValue($date);
            } else {
                return (new Value())
                    ->setStringValue($csvValue);
            }
        }, explode(',', $csvRow));

        return (new Row())
            ->setValues($rowValues);
    }, $csvRows);

    // Convert date fields into protobuf objects
    $dateFields = array_map(function ($dateFieldName) {
        return (new FieldId())->setName($dateFieldName);
    }, explode(',', $dateFieldNames));

    // Construct the table object
    $table = (new Table())
        ->setHeaders($tableHeaders)
        ->setRows($tableRows);

    $item = (new ContentItem())
        ->setTable($table);

    // Construct dateShiftConfig
    $dateShiftConfig = (new DateShiftConfig())
        ->setLowerBoundDays($lowerBoundDays)
        ->setUpperBoundDays($upperBoundDays);

    if ($contextFieldName && $keyName && $wrappedKey) {
        $contextField = (new FieldId())
            ->setName($contextFieldName);

        // Create the wrapped crypto key configuration object
        $kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
            ->setWrappedKey(base64_decode($wrappedKey))
            ->setCryptoKeyName($keyName);

        $cryptoKey = (new CryptoKey())
            ->setKmsWrapped($kmsWrappedCryptoKey);

        $dateShiftConfig
            ->setContext($contextField)
            ->setCryptoKey($cryptoKey);
    } elseif ($contextFieldName || $keyName || $wrappedKey) {
        throw new Exception('You must set either ALL or NONE of {$contextFieldName, $keyName, $wrappedKey}!');
    }

    // Create the information transform configuration objects
    $primitiveTransformation = (new PrimitiveTransformation())
        ->setDateShiftConfig($dateShiftConfig);

    $fieldTransformation = (new FieldTransformation())
        ->setPrimitiveTransformation($primitiveTransformation)
        ->setFields($dateFields);

    $recordTransformations = (new RecordTransformations())
        ->setFieldTransformations([$fieldTransformation]);

    // Create the deidentification configuration object
    $deidentifyConfig = (new DeidentifyConfig())
        ->setRecordTransformations($recordTransformations);

    $parent = "projects/$callingProjectId/locations/global";

    // Run request
    $deidentifyContentRequest = (new DeidentifyContentRequest())
        ->setParent($parent)
        ->setDeidentifyConfig($deidentifyConfig)
        ->setItem($item);
    $response = $dlp->deidentifyContent($deidentifyContentRequest);

    // Check for errors
    foreach ($response->getOverview()->getTransformationSummaries() as $summary) {
        foreach ($summary->getResults() as $result) {
            if ($details = $result->getDetails()) {
                printf('Error: %s' . PHP_EOL, $details);
                return;
            }
        }
    }

    // Save the results to a file
    $csvRef = fopen($outputCsvFile, 'w');
    fputcsv($csvRef, $csvHeaders);
    foreach ($response->getItem()->getTable()->getRows() as $tableRow) {
        $values = array_map(function ($tableValue) {
            if ($tableValue->getStringValue()) {
                return $tableValue->getStringValue();
            }
            $protoDate = $tableValue->getDateValue();
            $date = mktime(0, 0, 0, $protoDate->getMonth(), $protoDate->getDay(), $protoDate->getYear());
            return strftime('%D', $date);
        }, iterator_to_array($tableRow->getValues()));
        fputcsv($csvRef, $values);
    };
    fclose($csvRef);
    printf('Deidentified dates written to %s' . PHP_EOL, $outputCsvFile);
}

C#

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires


using System;
using System.IO;
using System.Linq;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Protobuf;

public class DeidentifyWithDateShift
{
    public static DeidentifyContentResponse Deidentify(
        string projectId,
        string inputCsvFilePath,
        int lowerBoundDays,
        int upperBoundDays,
        string dateFields,
        string contextField,
        string keyName,
        string wrappedKey)
    {
        var hasKeyName = !string.IsNullOrEmpty(keyName);
        var hasWrappedKey = !string.IsNullOrEmpty(wrappedKey);
        var hasContext = !string.IsNullOrEmpty(contextField);
        bool allFieldsSet = hasKeyName && hasWrappedKey && hasContext;
        bool noFieldsSet = !hasKeyName && !hasWrappedKey && !hasContext;
        if (!(allFieldsSet || noFieldsSet))
        {
            throw new ArgumentException("Must specify ALL or NONE of: {contextFieldId, keyName, wrappedKey}!");
        }

        var dlp = DlpServiceClient.Create();

        // Read file
        var csvLines = File.ReadAllLines(inputCsvFilePath);
        var csvHeaders = csvLines[0].Split(',');
        var csvRows = csvLines.Skip(1).ToArray();

        // Convert dates to protobuf format, and everything else to a string
        var protoHeaders = csvHeaders.Select(header => new FieldId { Name = header });
        var protoRows = csvRows.Select(csvRow =>
        {
            var rowValues = csvRow.Split(',');
            var protoValues = rowValues.Select(rowValue =>
               System.DateTime.TryParse(rowValue, out var parsedDate)
               ? new Value { DateValue = Google.Type.Date.FromDateTime(parsedDate) }
               : new Value { StringValue = rowValue });

            var rowObject = new Table.Types.Row();
            rowObject.Values.Add(protoValues);
            return rowObject;
        });

        var dateFieldList = dateFields
            .Split(',')
            .Select(field => new FieldId { Name = field });

        // Construct + execute the request
        var dateShiftConfig = new DateShiftConfig
        {
            LowerBoundDays = lowerBoundDays,
            UpperBoundDays = upperBoundDays
        };

        dateShiftConfig.Context = new FieldId { Name = contextField };
        dateShiftConfig.CryptoKey = new CryptoKey
        {
            KmsWrapped = new KmsWrappedCryptoKey
            {
                WrappedKey = ByteString.FromBase64(wrappedKey),
                CryptoKeyName = keyName
            }
        };

        var deidConfig = new DeidentifyConfig
        {
            RecordTransformations = new RecordTransformations
            {
                FieldTransformations =
                {
                    new FieldTransformation
                    {
                        PrimitiveTransformation = new PrimitiveTransformation
                        {
                            DateShiftConfig = dateShiftConfig
                        },
                        Fields = { dateFieldList }
                    }
                }
            }
        };

        var response = dlp.DeidentifyContent(
            new DeidentifyContentRequest
            {
                Parent = new LocationName(projectId, "global").ToString(),
                DeidentifyConfig = deidConfig,
                Item = new ContentItem
                {
                    Table = new Table
                    {
                        Headers = { protoHeaders },
                        Rows = { protoRows }
                    }
                }
            });

        return response;
    }
}

cryptoReplaceFfxFpeConfig

Si vous définissez cryptoReplaceFfxFpeConfig sur un objet CryptoReplaceFfxFpeConfig, vous effectuez une pseudonymisation d'une valeur d'entrée en la remplaçant par un jeton. Celui-ci est défini de la manière suivante :

Il correspond à la valeur d'entrée chiffrée.
Il est aussi long que la valeur saisie.
Calculé à l'aide du chiffrement préservant le format en mode FFX ("FPE-FFX") sur la clé cryptographique spécifiée par cryptoKey.
Il est composé des caractères spécifiés par alphabet. Les options valides sont :
- NUMERIC
- HEXADECIMAL
- UPPER_CASE_ALPHA_NUMERIC
- ALPHA_NUMERIC

La valeur d'entrée est soumise aux conditions suivantes :

Elle doit comporter au moins deux caractères (ou la chaîne vide).
Elle doit être composée des caractères spécifiés par un alphabet. L'alphabet peut comporter entre 2 et 95 caractères. (Un alphabet de 95 caractères inclut tous les caractères imprimables du jeu de caractères US-ASCII.)

La protection des données sensibles calcule le jeton de remplacement à l'aide d'une clé cryptographique. Vous fournissez cette clé de l'une des trois manières suivantes :

En l'incorporant sous forme non chiffrée à la requête API. Cette action n'est pas recommandée.
En demandant à la protection des données sensibles de la générer.
En l'incorporant sous forme chiffrée à la requête API.

Si vous choisissez d'intégrer la clé dans la requête API, vous devez créer une clé et l'encapsuler (chiffrer) à l'aide d'une clé Cloud Key Management Service (Cloud KMS). La valeur renvoyée par défaut est une chaîne codée en base64. Pour définir cette valeur dans la protection des données sensibles, vous devez la décoder en une chaîne d'octets. Les extraits de code suivants indiquent comment procéder dans plusieurs langages. Des exemples de bout en bout sont fournis à la suite de ces extraits.

Java

KmsWrappedCryptoKey.newBuilder()
    .setWrappedKey(ByteString.copyFrom(BaseEncoding.base64().decode(wrappedKey)))

Python

# The wrapped key is base64-encoded, but the library expects a binary
# string, so decode it here.
import base64
wrapped_key = base64.b64decode(wrapped_key)

PHP

// Create the wrapped crypto key configuration object
$kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
    ->setWrappedKey(base64_decode($wrappedKey))
    ->setCryptoKeyName($keyName);

C#

WrappedKey = ByteString.FromBase64(wrappedKey)

Pour en savoir plus sur le chiffrement et le déchiffrement des données à l'aide de Cloud KMS, consultez la page Chiffrer et déchiffrer des données .

Par sa conception, ce mode préserve la longueur et le jeu de caractères du texte d'entrée. Cela signifie qu'il manque d'authentification et de vecteur d'initialisation, ce qui entraînerait une extension de longueur dans le jeton de sortie. D'autres méthodes, telles que AES-SIV, fournissent ces garanties de sécurité plus strictes et sont recommandées pour les cas d'utilisation de tokenisation, à moins que des exigences de longueur et de définition de jeu de caractères ne soient strictes, par exemple pour assurer la rétrocompatibilité avec un ancien système de données.

Voici un exemple de code dans plusieurs langages qui montre comment anonymiser des données sensibles en remplaçant une valeur d'entrée par un jeton à l'aide de la protection des données sensibles.

Java

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires


import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.common.io.BaseEncoding;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.CryptoKey;
import com.google.privacy.dlp.v2.CryptoReplaceFfxFpeConfig;
import com.google.privacy.dlp.v2.CryptoReplaceFfxFpeConfig.FfxCommonNativeAlphabet;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.KmsWrappedCryptoKey;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.util.Arrays;

public class DeIdentifyWithFpe {

  public static void main(String[] args) throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String textToDeIdentify = "I'm Gary and my SSN is 552096781";
    String kmsKeyName =
        "projects/YOUR_PROJECT/"
            + "locations/YOUR_KEYRING_REGION/"
            + "keyRings/YOUR_KEYRING_NAME/"
            + "cryptoKeys/YOUR_KEY_NAME";
    String wrappedAesKey = "YOUR_ENCRYPTED_AES_256_KEY";
    deIdentifyWithFpe(projectId, textToDeIdentify, kmsKeyName, wrappedAesKey);
  }

  public static void deIdentifyWithFpe(
      String projectId, String textToDeIdentify, String kmsKeyName, String wrappedAesKey)
      throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // Specify what content you want the service to DeIdentify
      ContentItem contentItem = ContentItem.newBuilder().setValue(textToDeIdentify).build();

      // Specify the type of info the inspection will look for.
      // See https://cloud.google.com/dlp/docs/infotypes-reference for complete list of info types
      InfoType infoType = InfoType.newBuilder().setName("US_SOCIAL_SECURITY_NUMBER").build();
      InspectConfig inspectConfig =
          InspectConfig.newBuilder().addAllInfoTypes(Arrays.asList(infoType)).build();

      // Specify an encrypted AES-256 key and the name of the Cloud KMS key that encrypted it
      KmsWrappedCryptoKey kmsWrappedCryptoKey =
          KmsWrappedCryptoKey.newBuilder()
              .setWrappedKey(ByteString.copyFrom(BaseEncoding.base64().decode(wrappedAesKey)))
              .setCryptoKeyName(kmsKeyName)
              .build();
      CryptoKey cryptoKey = CryptoKey.newBuilder().setKmsWrapped(kmsWrappedCryptoKey).build();

      // Specify how the info from the inspection should be encrypted.
      InfoType surrogateInfoType = InfoType.newBuilder().setName("SSN_TOKEN").build();
      CryptoReplaceFfxFpeConfig cryptoReplaceFfxFpeConfig =
          CryptoReplaceFfxFpeConfig.newBuilder()
              .setCryptoKey(cryptoKey)
              // Set of characters in the input text. For more info, see
              // https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#DeidentifyTemplate.FfxCommonNativeAlphabet
              .setCommonAlphabet(FfxCommonNativeAlphabet.NUMERIC)
              .setSurrogateInfoType(surrogateInfoType)
              .build();
      PrimitiveTransformation primitiveTransformation =
          PrimitiveTransformation.newBuilder()
              .setCryptoReplaceFfxFpeConfig(cryptoReplaceFfxFpeConfig)
              .build();
      InfoTypeTransformation infoTypeTransformation =
          InfoTypeTransformation.newBuilder()
              .setPrimitiveTransformation(primitiveTransformation)
              .build();
      InfoTypeTransformations transformations =
          InfoTypeTransformations.newBuilder().addTransformations(infoTypeTransformation).build();

      DeidentifyConfig deidentifyConfig =
          DeidentifyConfig.newBuilder().setInfoTypeTransformations(transformations).build();

      // Combine configurations into a request for the service.
      DeidentifyContentRequest request =
          DeidentifyContentRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setItem(contentItem)
              .setInspectConfig(inspectConfig)
              .setDeidentifyConfig(deidentifyConfig)
              .build();

      // Send the request and receive response from the service
      DeidentifyContentResponse response = dlp.deidentifyContent(request);

      // Print the results
      System.out.println(
          "Text after format-preserving encryption: " + response.getItem().getValue());
    }
  }
}

Node.js

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

// Imports the Google Cloud Data Loss Prevention library
const DLP = require('@google-cloud/dlp');

// Instantiates a client
const dlp = new DLP.DlpServiceClient();

// The project ID to run the API call under
// const projectId = 'my-project';

// The string to deidentify
// const string = 'My SSN is 372819127';

// The set of characters to replace sensitive ones with
// For more information, see https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#ffxcommonnativealphabet
// const alphabet = 'ALPHA_NUMERIC';

// The name of the Cloud KMS key used to encrypt ('wrap') the AES-256 key
// const keyName = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME';

// The encrypted ('wrapped') AES-256 key to use
// This key should be encrypted using the Cloud KMS key specified above
// const wrappedKey = 'YOUR_ENCRYPTED_AES_256_KEY'

// (Optional) The name of the surrogate custom info type to use
// Only necessary if you want to reverse the deidentification process
// Can be essentially any arbitrary string, as long as it doesn't appear
// in your dataset otherwise.
// const surrogateType = 'SOME_INFO_TYPE_DEID';

async function deidentifyWithFpe() {
  // Construct FPE config
  const cryptoReplaceFfxFpeConfig = {
    cryptoKey: {
      kmsWrapped: {
        wrappedKey: wrappedKey,
        cryptoKeyName: keyName,
      },
    },
    commonAlphabet: alphabet,
  };
  if (surrogateType) {
    cryptoReplaceFfxFpeConfig.surrogateInfoType = {
      name: surrogateType,
    };
  }

  // Construct deidentification request
  const item = {value: string};
  const request = {
    parent: `projects/${projectId}/locations/global`,
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [
          {
            primitiveTransformation: {
              cryptoReplaceFfxFpeConfig: cryptoReplaceFfxFpeConfig,
            },
          },
        ],
      },
    },
    item: item,
  };

  // Run deidentification request
  const [response] = await dlp.deidentifyContent(request);
  const deidentifiedItem = response.item;
  console.log(deidentifiedItem.value);
}
deidentifyWithFpe();

Python

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

import base64
from typing import List

import google.cloud.dlp

def deidentify_with_fpe(
    project: str,
    input_str: str,
    info_types: List[str],
    alphabet: str = None,
    surrogate_type: str = None,
    key_name: str = None,
    wrapped_key: str = None,
) -> None:
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string using Format Preserving Encryption (FPE).
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
        alphabet: The set of characters to replace sensitive ones with. For
            more information, see https://cloud.google.com/dlp/docs/reference/
            rest/v2beta2/organizations.deidentifyTemplates#ffxcommonnativealphabet
        surrogate_type: The name of the surrogate custom info type to use. Only
            necessary if you want to reverse the deidentification process. Can
            be essentially any arbitrary string, as long as it doesn't appear
            in your dataset otherwise.
        key_name: The name of the Cloud KMS key used to encrypt ('wrap') the
            AES-256 key. Example:
            key_name = 'projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/
            keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME'
        wrapped_key: The encrypted ('wrapped') AES-256 key to use. This key
            should be encrypted using the Cloud KMS key specified by key_name.
    Returns:
        None; the response from the API is printed to the terminal.
    """

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient()

    # Convert the project id into a full resource id.
    parent = f"projects/{project}/locations/global"

    # The wrapped key is base64-encoded, but the library expects a binary
    # string, so decode it here.
    wrapped_key = base64.b64decode(wrapped_key)

    # Construct FPE configuration dictionary
    crypto_replace_ffx_fpe_config = {
        "crypto_key": {
            "kms_wrapped": {"wrapped_key": wrapped_key, "crypto_key_name": key_name}
        },
        "common_alphabet": alphabet,
    }

    # Add surrogate type
    if surrogate_type:
        crypto_replace_ffx_fpe_config["surrogate_info_type"] = {"name": surrogate_type}

    # Construct inspect configuration dictionary
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": crypto_replace_ffx_fpe_config
                    }
                }
            ]
        }
    }

    # Convert string to item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print results
    print(response.item.value)

Go

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

import (
	"context"
	"encoding/base64"
	"fmt"
	"io"

	dlp "cloud.google.com/go/dlp/apiv2"
	"cloud.google.com/go/dlp/apiv2/dlppb"
)

// deidentifyFPE deidentifies the input with FPE (Format Preserving Encryption).
// keyFileName is the file name with the KMS wrapped key and cryptoKeyName is the
// full KMS key resource name used to wrap the key. surrogateInfoType is an
// optional identifier needed for reidentification. surrogateInfoType can be any
// value not found in your input.
// Info types can be found with the infoTypes.list method or on https://cloud.google.com/dlp/docs/infotypes-reference
func deidentifyFPE(w io.Writer, projectID, input string, infoTypeNames []string, kmsKeyName, wrappedAESKey, surrogateInfoType string) error {
	// projectID := "my-project-id"
	// input := "My SSN is 123456789"
	// infoTypeNames := []string{"US_SOCIAL_SECURITY_NUMBER"}
	// kmsKeyName := "projects/YOUR_GCLOUD_PROJECT/locations/YOUR_LOCATION/keyRings/YOUR_KEYRING_NAME/cryptoKeys/YOUR_KEY_NAME"
	// wrappedAESKey := "YOUR_ENCRYPTED_AES_256_KEY"
	// surrogateInfoType := "AGE"
	ctx := context.Background()
	client, err := dlp.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("dlp.NewClient: %w", err)
	}
	defer client.Close()
	// Convert the info type strings to a list of InfoTypes.
	var infoTypes []*dlppb.InfoType
	for _, it := range infoTypeNames {
		infoTypes = append(infoTypes, &dlppb.InfoType{Name: it})
	}

	// Specify an encrypted AES-256 key and the name of the Cloud KMS key that encrypted it.
	kmsWrappedCryptoKey, err := base64.StdEncoding.DecodeString(wrappedAESKey)
	if err != nil {
		return err
	}

	// Create a configured request.
	req := &dlppb.DeidentifyContentRequest{
		Parent: fmt.Sprintf("projects/%s/locations/global", projectID),
		InspectConfig: &dlppb.InspectConfig{
			InfoTypes: infoTypes,
		},
		DeidentifyConfig: &dlppb.DeidentifyConfig{
			Transformation: &dlppb.DeidentifyConfig_InfoTypeTransformations{
				InfoTypeTransformations: &dlppb.InfoTypeTransformations{
					Transformations: []*dlppb.InfoTypeTransformations_InfoTypeTransformation{
						{
							InfoTypes: []*dlppb.InfoType{}, // Match all info types.
							PrimitiveTransformation: &dlppb.PrimitiveTransformation{
								Transformation: &dlppb.PrimitiveTransformation_CryptoReplaceFfxFpeConfig{
									CryptoReplaceFfxFpeConfig: &dlppb.CryptoReplaceFfxFpeConfig{
										CryptoKey: &dlppb.CryptoKey{
											Source: &dlppb.CryptoKey_KmsWrapped{
												KmsWrapped: &dlppb.KmsWrappedCryptoKey{
													WrappedKey:    kmsWrappedCryptoKey,
													CryptoKeyName: kmsKeyName,
												},
											},
										},
										// Set the alphabet used for the output.
										Alphabet: &dlppb.CryptoReplaceFfxFpeConfig_CommonAlphabet{
											CommonAlphabet: dlppb.CryptoReplaceFfxFpeConfig_ALPHA_NUMERIC,
										},
										// Set the surrogate info type, used for reidentification.
										SurrogateInfoType: &dlppb.InfoType{
											Name: surrogateInfoType,
										},
									},
								},
							},
						},
					},
				},
			},
		},
		// The item to analyze.
		Item: &dlppb.ContentItem{
			DataItem: &dlppb.ContentItem_Value{
				Value: input,
			},
		},
	}
	// Send the request.
	r, err := client.DeidentifyContent(ctx, req)
	if err != nil {
		return fmt.Errorf("DeidentifyContent: %w", err)
	}
	// Print the result.
	fmt.Fprint(w, r.GetItem().GetValue())
	return nil
}

PHP

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires

use Google\Cloud\Dlp\V2\Client\DlpServiceClient;
use Google\Cloud\Dlp\V2\ContentItem;
use Google\Cloud\Dlp\V2\CryptoKey;
use Google\Cloud\Dlp\V2\CryptoReplaceFfxFpeConfig;
use Google\Cloud\Dlp\V2\CryptoReplaceFfxFpeConfig\FfxCommonNativeAlphabet;
use Google\Cloud\Dlp\V2\DeidentifyConfig;
use Google\Cloud\Dlp\V2\DeidentifyContentRequest;
use Google\Cloud\Dlp\V2\InfoType;
use Google\Cloud\Dlp\V2\InfoTypeTransformations;
use Google\Cloud\Dlp\V2\InfoTypeTransformations\InfoTypeTransformation;
use Google\Cloud\Dlp\V2\KmsWrappedCryptoKey;
use Google\Cloud\Dlp\V2\PrimitiveTransformation;

/**
 * Deidentify a string using Format-Preserving Encryption (FPE).
 *
 * @param string $callingProjectId  The GCP Project ID to run the API call under
 * @param string $string            The string to deidentify
 * @param string $keyName           The name of the Cloud KMS key used to encrypt (wrap) the AES-256 key
 * @param string $wrappedKey        The name of the Cloud KMS key use, encrypted with the KMS key in $keyName
 * @param string $surrogateTypeName (Optional) surrogate custom info type to enable reidentification
 */
function deidentify_fpe(
    string $callingProjectId,
    string $string,
    string $keyName,
    string $wrappedKey,
    string $surrogateTypeName = ''
): void {
    // Instantiate a client.
    $dlp = new DlpServiceClient();

    // The infoTypes of information to mask
    $ssnInfoType = (new InfoType())
        ->setName('US_SOCIAL_SECURITY_NUMBER');
    $infoTypes = [$ssnInfoType];

    // Create the wrapped crypto key configuration object
    $kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
        ->setWrappedKey(base64_decode($wrappedKey))
        ->setCryptoKeyName($keyName);

    // The set of characters to replace sensitive ones with
    // For more information, see https://cloud.google.com/dlp/docs/reference/rest/V2/organizations.deidentifyTemplates#ffxcommonnativealphabet
    $commonAlphabet = FfxCommonNativeAlphabet::NUMERIC;

    // Create the crypto key configuration object
    $cryptoKey = (new CryptoKey())
        ->setKmsWrapped($kmsWrappedCryptoKey);

    // Create the crypto FFX FPE configuration object
    $cryptoReplaceFfxFpeConfig = (new CryptoReplaceFfxFpeConfig())
        ->setCryptoKey($cryptoKey)
        ->setCommonAlphabet($commonAlphabet);

    if ($surrogateTypeName) {
        $surrogateType = (new InfoType())
            ->setName($surrogateTypeName);
        $cryptoReplaceFfxFpeConfig->setSurrogateInfoType($surrogateType);
    }

    // Create the information transform configuration objects
    $primitiveTransformation = (new PrimitiveTransformation())
        ->setCryptoReplaceFfxFpeConfig($cryptoReplaceFfxFpeConfig);

    $infoTypeTransformation = (new InfoTypeTransformation())
        ->setPrimitiveTransformation($primitiveTransformation)
        ->setInfoTypes($infoTypes);

    $infoTypeTransformations = (new InfoTypeTransformations())
        ->setTransformations([$infoTypeTransformation]);

    // Create the deidentification configuration object
    $deidentifyConfig = (new DeidentifyConfig())
        ->setInfoTypeTransformations($infoTypeTransformations);

    $content = (new ContentItem())
        ->setValue($string);

    $parent = "projects/$callingProjectId/locations/global";

    // Run request
    $deidentifyContentRequest = (new DeidentifyContentRequest())
        ->setParent($parent)
        ->setDeidentifyConfig($deidentifyConfig)
        ->setItem($content);
    $response = $dlp->deidentifyContent($deidentifyContentRequest);

    // Print the results
    $deidentifiedValue = $response->getItem()->getValue();
    print($deidentifiedValue);
}

C#

Pour savoir comment installer et utiliser la bibliothèque cliente pour la protection des données sensibles, consultez Bibliothèques clientes pour la protection des données sensibles.

Afficher sur GitHub Commentaires


using System;
using System.Collections.Generic;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.Dlp.V2;
using Google.Protobuf;
using static Google.Cloud.Dlp.V2.CryptoReplaceFfxFpeConfig.Types;

public class DeidentifyWithFpe
{
    public static DeidentifyContentResponse Deidentify(
        string projectId,
        string dataValue,
        IEnumerable<InfoType> infoTypes,
        string keyName,
        string wrappedKey,
        FfxCommonNativeAlphabet alphabet)
    {
        var deidentifyConfig = new DeidentifyConfig
        {
            InfoTypeTransformations = new InfoTypeTransformations
            {
                Transformations =
                {
                    new InfoTypeTransformations.Types.InfoTypeTransformation
                    {
                        PrimitiveTransformation = new PrimitiveTransformation
                        {
                            CryptoReplaceFfxFpeConfig = new CryptoReplaceFfxFpeConfig
                            {
                                CommonAlphabet = alphabet,
                                CryptoKey = new CryptoKey
                                {
                                    KmsWrapped = new KmsWrappedCryptoKey
                                    {
                                        CryptoKeyName = keyName,
                                        WrappedKey = ByteString.FromBase64 (wrappedKey)
                                    }
                                },
                                SurrogateInfoType = new InfoType
                                {
                                    Name = "TOKEN"
                                }
                            }
                        }
                    }
                }
            }
        };

        var dlp = DlpServiceClient.Create();
        var response = dlp.DeidentifyContent(
            new DeidentifyContentRequest
            {
                Parent = new LocationName(projectId, "global").ToString(),
                InspectConfig = new InspectConfig
                {
                    InfoTypes = { infoTypes }
                },
                DeidentifyConfig = deidentifyConfig,
                Item = new ContentItem { Value = dataValue }
            });

        Console.WriteLine($"Deidentified content: {response.Item.Value}");
        return response;
    }
}

Pour obtenir des exemples de code montrant comment utiliser la protection des données sensibles afin de restaurer l'identification des données sensibles anonymisées via la méthode de transformation CryptoReplaceFfxFpeConfig, consultez Chiffrement préservant le format: exemples de restauration de l'identification.

fixedSizeBucketingConfig

Les transformations de binning (celle-ci et bucketingConfig) permettent de masquer des données numériques en les convertissant en plages de buckets. La plage de nombres qui en résulte est une chaîne composée d'une limite inférieure, d'un trait d'union et d'une limite supérieure.

Si vous définissez fixedSizeBucketingConfig sur un objet FixedSizeBucketingConfig, les valeurs d'entrée seront présentées sous forme de buckets en fonction de plages de taille fixe. L'objet FixedSizeBucketingConfig comprend les éléments suivants :

lowerBound : limite inférieure de tous les buckets. Les valeurs inférieures à celle-ci sont regroupées dans un même bucket.
upperBound : limite supérieure de tous les buckets. Les valeurs supérieures à celle-ci sont regroupées dans un même bucket.
bucketSize : taille de chaque bucket autre que les buckets présentant la valeur minimale et maximale.

Par exemple, si lowerBound est défini sur 10, upperBoundsur 89 et bucketSize sur 10, les buckets suivants doivent être définis sur -10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-89, 89+.

Pour en savoir plus sur le concept de binning, consultez la page Généralisation et binning.

bucketingConfig

La transformation de type bucketingConfig offre une plus grande flexibilité que l'autre transformation de binning fixedSizeBucketingConfig. Plutôt que de spécifier des limites supérieure et inférieure ainsi qu'une valeur d'intervalle avec laquelle créer de buckets de taille égale, vous spécifiez les valeurs maximale et minimale pour chaque bucket que vous souhaitez créer. Chaque paire de valeurs maximale et minimale doit partager le même type.

Vous pouvez définir bucketingConfig sur un objet BucketingConfig pour spécifier des buckets personnalisés. L'objet BucketingConfig se compose d'un tableau buckets[] d'objets Bucket. Chaque objet Bucket comprend les éléments suivants :

min : limite inférieure de la plage du bucket. Omettez cette valeur pour créer un bucket sans limite inférieure.
max : limite supérieure de la plage du bucket. Omettez cette valeur pour créer un bucket sans limite supérieure.
replacementValue : valeur permettant de remplacer les valeurs comprises entre les limites inférieure et supérieure. Si vous ne spécifiez pas de valeur replacementValue, une plage min-max est utilisée à la place.

Si une valeur se situe en dehors des plages définies, le TransformationSummary renvoyé contient un message d'erreur.

Prenons l'exemple de configuration suivant pour la transformation bucketingConfig :

"bucketingConfig":{
  "buckets":[
    {
      "min":{
        "integerValue":"1"
      },
      "max":{
        "integerValue":"30"
      },
      "replacementValue":{
        "stringValue":"LOW"
      }
    },
    {
      "min":{
        "integerValue":"31"
      },
      "max":{
        "integerValue":"65"
      },
      "replacementValue":{
        "stringValue":"MEDIUM"
      }
    },
    {
      "min":{
        "integerValue":"66"
      },
      "max":{
        "integerValue":"100"
      },
      "replacementValue":{
        "stringValue":"HIGH"
      }
    }
  ]
}

Cette configuration définit le comportement suivant :

Les valeurs entières comprises entre 1 et 30 sont masquées et remplacées par LOW.
Les valeurs entières comprises entre 31 et 65 sont masquées et remplacées par MEDIUM.
Les valeurs entières comprises entre 66 et 100 sont masquées et remplacées par HIGH.

Pour en savoir plus sur le concept de binning, consultez la page Généralisation et binning.

replaceWithInfoTypeConfig

Si vous spécifiez replaceWithInfoTypeConfig, vous remplacez chaque valeur détectée par le nom de l'infoType. Le message replaceWithInfoTypeConfig n'accepte aucun argument, sa simple présence active la transformation.

Supposons par exemple que vous ayez spécifié replaceWithInfoTypeConfig pour tous les infoTypes EMAIL_ADDRESS et que la chaîne suivante soit envoyée à la protection des données sensibles:

My name is Alicia Abernathy, and my email address is aabernathy@example.com.

La chaîne renvoyée sera la suivante :

My name is Alicia Abernathy, and my email address is EMAIL_ADDRESS.

timePartConfig

Si vous définissez timePartConfig sur un objet TimePartConfig, une partie de la valeur détectée est conservée, ce qui inclut les valeurs Date, Timestamp et TimeOfDay. L'objet TimePartConfig comporte un argument partToExtract qui peut être défini sur l'une des valeurs énumérées TimePart, y compris l'année, le mois, le jour du mois, etc.

Supposons par exemple que vous ayez configuré une transformation timePartConfig en définissant partToExtract sur YEAR. Après avoir envoyé les données de la première colonne en dessous à la protection des données sensibles, vous obtenez les valeurs transformées dans la deuxième colonne:

Valeurs d'origine	Valeurs transformées
`9/21/1976`	`1976`
`6/7/1945`	`1945`
`1/20/2009`	`2009`
`7/4/1776`	`1776`
`8/1/1984`	`1984`
`4/21/1982`	`1982`

Transformations d'enregistrement

Les transformations d'enregistrement (objet RecordTransformations) ne sont appliquées qu'aux valeurs identifiées comme un infoType spécifique au sein de données tabulaires. L'objet RecordTransformations se divise en deux autres sous-catégories de transformations :

fieldTransformations[] : transformations qui appliquent diverses transformations de champ.
recordSuppressions[] : règles définissant les enregistrements qui sont complètement supprimés. Les enregistrements qui correspondent à une règle de suppression spécifiée dans recordSuppressions[] sont omis du résultat.

Transformations de champ

Chaque objet FieldTransformation comprend trois arguments :

fields : un ou plusieurs champs d'entrée (objets FieldID) auxquels appliquer la transformation.
condition : une condition (objet RecordCondition) qui doit renvoyer la valeur "true" pour que la transformation soit appliquée. Vous pouvez par exemple appliquer une transformation de bucket à une colonne "Âge" d'un enregistrement seulement si la colonne "Code postal" du même enregistrement se situe dans une plage spécifique. Vous pouvez également ne masquer un champ que si le champ "Date de naissance" indique qu'une personne a 85 ans ou plus.
L'un des deux arguments suivants de type transformation. Vous devez en spécifier au moins un :
- infoTypeTransformations : traite le contenu du champ en tant que texte libre et n'applique une transformation PrimitiveTransformation qu'au contenu correspondant à un InfoType. Nous avons abordé ces transformations plus haut sur cette page.
- primitiveTransformation : applique la transformation primitive spécifiée (objet PrimitiveTransformation) à l'ensemble du champ. Nous avons abordé ces transformations plus haut sur cette page.

Exemple de transformations de champs

L'exemple suivant envoie une requête projects.content.deidentify avec deux transformations de champ:

La première transformation de champ s'applique aux deux premières colonnes (column1 et column2). Comme son type de transformation est un objet primitiveTransformation (plus précisément, un CryptoDeterministicConfig), la protection des données sensibles transforme l'ensemble du champ.

Avertissement :Cet exemple utilise une clé non encapsulée à des fins de démonstration. N'utilisez pas cette clé pour des charges de travail sensibles réelles. Google vous recommande d'utiliser plutôt une clé encapsulée. Pour savoir comment créer une clé encapsulée et l'utiliser dans des requêtes d'anonymisation et de restauration de l'identification, consultez la section Anonymiser et restaurer l'identification des données sensibles.
La deuxième transformation de champ s'applique à la troisième colonne (column3). Comme son type de transformation est un objet infoTypeTransformations, la protection des données sensibles applique la transformation primitive (plus précisément, un ReplaceWithInfoTypeConfig) uniquement au contenu correspondant à l'infoType défini dans la configuration d'inspection.

Avant d'utiliser les données de la requête, effectuez les remplacements suivants:

PROJECT_ID : ID de votre projet Google Cloud. Les ID de projet sont des chaînes alphanumériques, comme my-project.

Méthode HTTP et URL :

POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:deidentify

Corps JSON de la requête :

{
  "item": {
    "table": {
      "headers": [
        {
          "name": "column1"
        },
        {
          "name": "column2"
        },
        {
          "name": "column3"
        }
      ],
      "rows": [
        {
          "values": [
            {
              "stringValue": "Example string 1"
            },
            {
              "stringValue": "Example string 2"
            },
            {
              "stringValue": "My email address is dani@example.org"
            }
          ]
        },
        {
          "values": [
            {
              "stringValue": "Example string 1"
            },
            {
              "stringValue": "Example string 3"
            },
            {
              "stringValue": "My email address is cruz@example.org"
            }
          ]
        }
      ]
    }
  },
  "deidentifyConfig": {
    "recordTransformations": {
      "fieldTransformations": [
        {
          "fields": [
            {
              "name": "column1"
            },
            {
              "name": "column2"
            }
          ],
          "primitiveTransformation": {
            "cryptoDeterministicConfig": {
              "cryptoKey": {
                "unwrapped": {
                  "key": "YWJjZGVmZ2hpamtsbW5vcA=="
                }
              }
            }
          }
        },
        {
          "fields": [
            {
              "name": "column3"
            }
          ],
          "infoTypeTransformations": {
            "transformations": [
              {
                "primitiveTransformation": {
                  "replaceWithInfoTypeConfig": {}
                }
              }
            ]
          }
        }
      ]
    }
  },
  "inspectConfig": {
    "infoTypes": [
      {
        "name": "EMAIL_ADDRESS"
      }
    ]
  }
}

Pour envoyer votre requête, développez l'une des options suivantes :

curl (Linux, macOS ou Cloud Shell)

Remarque : La commande suivante suppose que vous êtes connecté à la CLI gcloud avec votre compte utilisateur en exécutant la commande gcloud init ou gcloud auth login, ou en utilisant Cloud Shell, qui vous connecte automatiquement à la CLI gcloud. Vous pouvez exécuter gcloud auth list pour vérifier le compte actuellement actif.

Enregistrez le corps de la requête dans un fichier nommé deidentify-tabular-content-request.json, puis exécutez la commande suivante:

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "x-goog-user-project: PROJECT_ID" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @deidentify-tabular-content-request.json \
    "https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:deidentify"

PowerShell (Windows)

Enregistrez le corps de la requête dans un fichier nommé deidentify-tabular-content-request.json, puis exécutez la commande suivante:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile deidentify-tabular-content-request.json `
    -Uri "https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:deidentify" | Select-Object -Expand Content

Vous devriez recevoir une réponse JSON de ce type :

{
  "item": {
    "table": {
      "headers": [
        {
          "name": "column1"
        },
        {
          "name": "column2"
        },
        {
          "name": "column3"
        }
      ],
      "rows": [
        {
          "values": [
            {
              "stringValue": "AWttmGlln6Z2MFOMqcOzDdNJS52XFxOOZsg0ckDeZzfc"
            },
            {
              "stringValue": "AUBTE+sQB6eKZ5iD3Y0Ss682zANXbijuFl9KL9ExVOTF"
            },
            {
              "stringValue": "My email address is [EMAIL_ADDRESS]"
            }
          ]
        },
        {
          "values": [
            {
              "stringValue": "AWttmGlln6Z2MFOMqcOzDdNJS52XFxOOZsg0ckDeZzfc"
            },
            {
              "stringValue": "AU+oD2pnqUDTLNItE8RplY3E0fTHeO4rZkX4GeFHN2CI"
            },
            {
              "stringValue": "My email address is [EMAIL_ADDRESS]"
            }
          ]
        }
      ]
    }
  },
  "overview": {
    "transformedBytes": "96",
    "transformationSummaries": [
      {
        "field": {
          "name": "column1"
        },
        "results": [
          {
            "count": "2",
            "code": "SUCCESS"
          }
        ],
        "fieldTransformations": [
          {
            "fields": [
              {
                "name": "column1"
              },
              {
                "name": "column2"
              }
            ],
            "primitiveTransformation": {
              "cryptoDeterministicConfig": {
                "cryptoKey": {
                  "unwrapped": {
                    "key": "YWJjZGVmZ2hpamtsbW5vcA=="
                  }
                }
              }
            }
          }
        ],
        "transformedBytes": "32"
      },
      {
        "field": {
          "name": "column2"
        },
        "results": [
          {
            "count": "2",
            "code": "SUCCESS"
          }
        ],
        "fieldTransformations": [
          {
            "fields": [
              {
                "name": "column1"
              },
              {
                "name": "column2"
              }
            ],
            "primitiveTransformation": {
              "cryptoDeterministicConfig": {
                "cryptoKey": {
                  "unwrapped": {
                    "key": "YWJjZGVmZ2hpamtsbW5vcA=="
                  }
                }
              }
            }
          }
        ],
        "transformedBytes": "32"
      },
      {
        "infoType": {
          "name": "EMAIL_ADDRESS",
          "sensitivityScore": {
            "score": "SENSITIVITY_MODERATE"
          }
        },
        "field": {
          "name": "column3"
        },
        "results": [
          {
            "count": "2",
            "code": "SUCCESS"
          }
        ],
        "fieldTransformations": [
          {
            "fields": [
              {
                "name": "column3"
              }
            ],
            "infoTypeTransformations": {
              "transformations": [
                {
                  "primitiveTransformation": {
                    "replaceWithInfoTypeConfig": {}
                  }
                }
              ]
            }
          }
        ],
        "transformedBytes": "32"
      }
    ]
  }
}

Suppression d'enregistrements

En plus d'appliquer des transformations aux données de champ, vous pouvez également demander à la protection des données sensibles d'anonymiser les données en supprimant simplement les enregistrements lorsque certaines conditions de suppression sont remplies. Vous pouvez appliquer des transformations de champ et des suppressions d'enregistrement dans la même requête.

Vous définissez le message recordSuppressions de l'objet RecordTransformations sur un tableau comprenant un ou plusieurs objets RecordSuppression.

Chaque objet RecordSuppression contient un seul objet RecordCondition, qui à son tour contient un seul objet Expressions.

Un objet Expressions contient :

logicalOperator : l'un des types LogicalOperator énumérés.
conditions : un objet Conditions comprenant un tableau d'un ou de plusieurs objets Condition. Un objet Condition correspond à une comparaison d'une valeur de champ à une autre valeur, de type string, boolean, integer, double, Timestamp ou TimeofDay.

Si la comparaison prend la valeur "true", l'enregistrement est supprimé et inversement. Si les valeurs comparées ne partagent pas le même type, vous recevez un avertissement, et la condition prend la valeur "false".

Transformations réversibles

Lorsque vous anonymisez des données à l'aide des transformations d'infoTypes CryptoReplaceFfxFpeConfig ou CryptoDeterministicConfig, vous pouvez désanonymiser ces données, à condition de disposer de la clé CryptoKey utilisée à l'origine pour les anonymiser. Pour en savoir plus, consultez la page Transformations de la tokenisation basée sur le chiffrement.

Limite du nombre de résultats

Si votre requête génère plus de 3 000 résultats, la protection des données sensibles renvoie le message suivant:

Too many findings to de-identify. Retry with a smaller request.

La liste des résultats renvoyés par la protection des données sensibles correspond à un sous-ensemble arbitraire de tous les résultats de la requête. Pour obtenir tous les résultats, divisez votre requête en plus petits lots.

Étapes suivantes

Découvrez comment un workflow d'anonymisation s'intègre aux déploiements réels.
Suivez l'atelier de programmation Masquer les données sensibles avec la protection des données sensibles.
Parcourez un exemple qui montre comment créer une clé encapsulée, tokeniser du contenu et restaurer l'identification du contenu tokenisé.
Découvrez comment créer une copie anonymisée des données dans l'espace de stockage.