Quickstart: De-identifying and re-identifying sensitive text

This quickstart shows you how to use Cloud Data Loss Prevention (DLP) to de-identify and re-identify sensitive data in text content. In the process, it takes you through using Cloud Key Management Service to create a wrapped key. You need this key in your de-identify and re-identify requests.

The process described in this quickstart is called pseudonymization (or tokenization). In this process, Cloud DLP uses a cryptographic key to convert (de-identify) sensitive text into a token. To restore (re-identify) that text, you need both the token and the cryptographic key that you used during de-identification.

Cloud DLP supports both reversible and non-reversible cryptographic methods. In order to re-identify content, you need to choose a reversible method.

The cryptographic method described here is called deterministic encryption using AES-SIV (Advanced Encryption Standard in Synthetic Initialization Vector mode). We recommend this among all the reversible cryptographic methods that Cloud DLP supports, because it provides the highest level of security.

You can complete the steps in this topic in 10 to 20 minutes, not including the Before you begin steps.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the Cloud DLP and Cloud KMS APIs.

  5. Create a service account:

    1. In the Cloud Console, go to the Create service account page.

    2. Select a project.
    3. In the Service account name field, enter a name. The Cloud Console fills in the Service account ID field based on this name.

      In the Service account description field, enter a description. For example, Service account for quickstart.

    4. Click Create.
    5. Click the Select a role field.

      Under Quick access, click Basic, then click Owner.

    6. Click Continue.
    7. Click Done to finish creating the service account.

      Do not close your browser window. You will use it in the next step.

  6. Create a service account key:

    1. In the Cloud Console, click the email address for the service account that you created.
    2. Click Keys.
    3. Click Add key, then click Create new key.
    4. Click Create. A JSON key file is downloaded to your computer.
    5. Click Close.
  7. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.

  8. Install and initialize the Cloud SDK.
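For step 7 above, the variable can be set as in the following sketch. The path is a hypothetical placeholder, not a value from this quickstart; use the location of the JSON key file that you downloaded in step 6.

```shell
# Hypothetical path: replace it with the location of your downloaded key file.
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-key.json"

# Print the value to confirm that it is set in the current shell session.
echo "GOOGLE_APPLICATION_CREDENTIALS=${GOOGLE_APPLICATION_CREDENTIALS}"
```

Because the variable applies only to the current session, run the export command again in each new shell.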

Step 1: Create a key ring and a key

Before you start this procedure, decide where you want Cloud DLP to process your de-identification and re-identification requests. When you create a Cloud KMS key, you must store it either in global or in the same region that you use for your Cloud DLP requests. Otherwise, the Cloud DLP requests fail.

You can find a list of supported locations in Cloud DLP locations. Take note of the name of your chosen region (for example, us-west1).

This procedure uses global as the location for all API requests. If you want to use a different region, replace global with the region name.

  1. Create a key ring:

    gcloud kms keyrings create "dlp-keyring" \
        --location "global"
    
  2. Create a key:

    gcloud kms keys create "dlp-key" \
        --location "global" \
        --keyring "dlp-keyring" \
        --purpose "encryption"
    
  3. List your key ring and key:

    gcloud kms keys list \
        --location "global" \
        --keyring "dlp-keyring"
    

    You get the following output:

    NAME                                                                                   PURPOSE          ALGORITHM                    PROTECTION_LEVEL  LABELS  PRIMARY_ID  PRIMARY_STATE
    projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key  ENCRYPT_DECRYPT  GOOGLE_SYMMETRIC_ENCRYPTION  SOFTWARE                  1           ENABLED
    

    In this output, PROJECT_ID is the ID of your project.

    The path under NAME is the full resource name of your Cloud KMS key. Take note of it because the de-identify and re-identify requests require it.
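Because later steps reuse the full resource name, it can help to keep it in a shell variable. The following is a minimal sketch that assembles the name from its parts, following the format shown under NAME. The variable names and the project ID my-project-id are assumptions made for illustration.

```shell
# Hypothetical project ID: replace it with the ID of your project.
PROJECT_ID="my-project-id"

# Location, key ring, and key names used throughout this quickstart.
LOCATION="global"
KEYRING="dlp-keyring"
KEY="dlp-key"

# Full resource name of the Cloud KMS key, in the format shown under NAME.
KMS_KEY_NAME="projects/${PROJECT_ID}/locations/${LOCATION}/keyRings/${KEYRING}/cryptoKeys/${KEY}"
echo "${KMS_KEY_NAME}"
```

If you chose a region other than global, change LOCATION to match it.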

Step 2: Create a base64-encoded AES key

This section describes how to create an Advanced Encryption Standard (AES) key and encode it in base64 format.

  1. Create a 128-, 192-, or 256-bit AES key. The following command uses openssl to create a 256-bit key in the current directory:

    openssl rand -out "./aes_key.bin" 32
    

    The file aes_key.bin is added to your current directory.

  2. Encode the AES key as a base64 string:

    base64 -i ./aes_key.bin
    

    You get an output similar to the following:

    uEDo6/yKx+zCg2cZ1DBwpwvzMVNk/c+jWs7OwpkMc/s=
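Before wrapping the key, you may want to confirm that it has the expected size and that the encoded string contains no line breaks. The following sketch checks both. It uses openssl base64 with the -A option in place of the base64 command shown above, and the variable names are assumptions made for illustration.

```shell
KEY_FILE="./aes_key.bin"

# Create the key only if it does not exist yet, so an existing key is not overwritten.
[ -f "${KEY_FILE}" ] || openssl rand -out "${KEY_FILE}" 32

# A 256-bit key is 32 bytes long. tr removes the padding that some wc versions print.
KEY_BYTES=$(wc -c < "${KEY_FILE}" | tr -d ' ')
echo "Key length: ${KEY_BYTES} bytes"

# Encode on a single line (-A) so that the value contains no line breaks.
BASE64_ENCODED_AES_KEY=$(openssl base64 -A -in "${KEY_FILE}")
echo "${BASE64_ENCODED_AES_KEY}"
```

A 32-byte key encodes to 44 base64 characters, including one trailing = padding character.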
    

Step 3: Wrap the AES key using the Cloud KMS key

This section describes how to use the Cloud KMS key that you created in Step 1 to wrap the base64-encoded AES key that you created in Step 2.

To wrap the AES key, use curl to send the following request to the Cloud KMS API method projects.locations.keyRings.cryptoKeys.encrypt:

curl "https://cloudkms.googleapis.com/v1/projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key:encrypt" \
  --request "POST" \
  --header "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  --header "Content-Type: application/json" \
  --data "{\"plaintext\": \"BASE64_ENCODED_AES_KEY\"}"

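Replace the following:

  • PROJECT_ID: the ID of your project.
  • BASE64_ENCODED_AES_KEY: the base64-encoded string that you created in Step 2.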

The response that you get from Cloud KMS is similar to the following JSON:

{
  "name": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key/cryptoKeyVersions/1",
  "ciphertext": "CiQAYuuIGo5DVaqdE0YLioWxEhC8LbTmq7Uy2G3qOJlZB7WXBw0SSQAjdwP8ZusZJ3Kr8GD9W0vaFPMDksmHEo6nTDaW/j5sSYpHa1ym2JHk+lUgkC3Zw5bXhfCNOkpXUdHGZKou1893O8BDby/82HY=",
  "ciphertextCrc32c": "901327763",
  "protectionLevel": "SOFTWARE"
}

In this output, PROJECT_ID is the ID of your project.

Take note of the value of ciphertext in the response that you get. That is your wrapped key.
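If you save the response to a file, you can pull the wrapped key out of it instead of copying it by hand. The following sketch works on the sample response shown above. The filename encrypt-response.json and the variable name WRAPPED_KEY are assumptions made for illustration. The grep and cut approach relies on the layout of this particular response; a JSON-aware tool is more robust if you have one installed.

```shell
# Sample response copied from above. In practice, save the curl output to this file.
cat > encrypt-response.json <<'EOF'
{
  "name": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key/cryptoKeyVersions/1",
  "ciphertext": "CiQAYuuIGo5DVaqdE0YLioWxEhC8LbTmq7Uy2G3qOJlZB7WXBw0SSQAjdwP8ZusZJ3Kr8GD9W0vaFPMDksmHEo6nTDaW/j5sSYpHa1ym2JHk+lUgkC3Zw5bXhfCNOkpXUdHGZKou1893O8BDby/82HY=",
  "ciphertextCrc32c": "901327763",
  "protectionLevel": "SOFTWARE"
}
EOF

# Select the "ciphertext" field (not "ciphertextCrc32c"), then take the text
# between the third and fourth double quotes, which is the field value.
WRAPPED_KEY=$(grep -o '"ciphertext": *"[^"]*"' encrypt-response.json | cut -d '"' -f 4)
echo "${WRAPPED_KEY}"
```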

Step 4: Send a de-identify request to the Cloud DLP API

This section describes how to de-identify sensitive data in text content.

To complete this task, you need the following:

  • The full resource name of the Cloud KMS key that you created in Step 1.
  • The wrapped key that you created in Step 3.

To de-identify sensitive data in text content, follow these steps:

  1. Create a JSON request file with the following text.

    {
      "item": {
        "value": "My name is Alicia Abernathy, and my email address is aabernathy@example.com."
      },
      "deidentifyConfig": {
        "infoTypeTransformations": {
          "transformations": [
            {
              "infoTypes": [
                {
                  "name": "EMAIL_ADDRESS"
                }
              ],
              "primitiveTransformation": {
                "cryptoDeterministicConfig": {
                  "cryptoKey": {
                    "kmsWrapped": {
                      "cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key",
                      "wrappedKey": "WRAPPED_KEY"
                    }
                  },
                  "surrogateInfoType": {
                    "name": "EMAIL_ADDRESS_TOKEN"
                  }
                }
              }
            }
          ]
        }
      },
      "inspectConfig": {
        "infoTypes": [
          {
            "name": "EMAIL_ADDRESS"
          }
        ]
      }
    }
    

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • WRAPPED_KEY: the wrapped key that you created in Step 3.

    Make sure that the resulting value of cryptoKeyName forms the full resource name of your Cloud KMS key.

    For more information on the components of this JSON request, see projects.locations.content.deidentify. After you complete this quickstart, try experimenting with different inputs for this request. You can use curl as described here. Alternatively, you can use the API Explorer on that API reference page under Try this API.

  2. Save the file as deidentify-request.json.

  3. Use curl to make a projects.locations.content.deidentify request:

    curl -s \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json" \
    https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/global/content:deidentify \
    -d @deidentify-request.json
    

    Replace PROJECT_ID with the ID of your project.

    To pass a file to curl, use the -d option (for data) and precede the filename with an @ sign. The file must be in the directory from which you run the curl command.

    The response that you get from Cloud DLP is similar to the following JSON:

    {
     "item": {
       "value": "My name is Alicia Abernathy, and my email address is EMAIL_ADDRESS_TOKEN(52):AVAx2eIEnIQP5jbNEr2j9wLOAd5m4kpSBR/0jjjGdAOmryzZbE/q."
     },
     "overview": {
       "transformedBytes": "22",
       "transformationSummaries": [
         {
           "infoType": {
             "name": "EMAIL_ADDRESS"
           },
           "transformation": {
             "cryptoDeterministicConfig": {
               "cryptoKey": {
                 "kmsWrapped": {
                   "wrappedKey": "CiQAYuuIGo5DVaqdE0YLioWxEhC8LbTmq7Uy2G3qOJlZB7WXBw0SSQAjdwP8ZusZJ3Kr8GD9W0vaFPMDksmHEo6nTDaW/j5sSYpHa1ym2JHk+lUgkC3Zw5bXhfCNOkpXUdHGZKou1893O8BDby/82HY=",
                   "cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key"
                 }
               },
               "surrogateInfoType": {
                 "name": "EMAIL_ADDRESS_TOKEN"
               }
             }
           },
           "results": [
             {
               "count": "1",
               "code": "SUCCESS"
             }
           ],
           "transformedBytes": "22"
         }
       ]
     }
    }
    

    In the item field, the email address is replaced with a token like EMAIL_ADDRESS_TOKEN(52):AVAx2eIEnIQP5jbNEr2j9wLOAd5m4kpSBR/0jjjGdAOmryzZbE/q. To re-identify this content, you must pass the entire token, including the EMAIL_ADDRESS_TOKEN(52): prefix, in the re-identify request.
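The token can be separated from the surrounding text with a pattern match, as in the following sketch. It runs on the sample value from the response above, and the variable names are assumptions made for illustration. In this sample, the number in parentheses equals the character count of the string after the colon, so the sketch also compares the two as a sanity check.

```shell
# Sample de-identified text copied from the response above.
DEIDENTIFIED='My name is Alicia Abernathy, and my email address is EMAIL_ADDRESS_TOKEN(52):AVAx2eIEnIQP5jbNEr2j9wLOAd5m4kpSBR/0jjjGdAOmryzZbE/q.'

# Match the surrogate name, the number in parentheses, and the base64
# characters that follow. The sentence's final period is not part of the token.
TOKEN=$(printf '%s\n' "${DEIDENTIFIED}" | grep -o 'EMAIL_ADDRESS_TOKEN([0-9]*):[A-Za-z0-9+/=]*')
echo "${TOKEN}"

# Read the number in parentheses and compare it with the length of the
# string after the colon.
DECLARED_LENGTH=$(printf '%s\n' "${TOKEN}" | sed 's/^[^(]*(\([0-9]*\)).*$/\1/')
SURROGATE="${TOKEN#*:}"
echo "Declared length: ${DECLARED_LENGTH}, actual length: ${#SURROGATE}"
```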

Step 5: Send a re-identify request to the Cloud DLP API

This section describes how to re-identify tokenized data in text content.

To complete this task, you need the following:

  • The full resource name of the Cloud KMS key that you created in Step 1.
  • The wrapped key that you created in Step 3.
  • The token that you received in Step 4.

To re-identify tokenized content, follow these steps:

  1. Create a JSON request file with the following text.

    {
      "reidentifyConfig":{
        "infoTypeTransformations":{
          "transformations":[
            {
              "infoTypes":[
                {
                  "name":"EMAIL_ADDRESS_TOKEN"
                }
              ],
              "primitiveTransformation":{
                "cryptoDeterministicConfig":{
                  "cryptoKey":{
                    "kmsWrapped": {
                      "cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key",
                      "wrappedKey": "WRAPPED_KEY"
                    }
                  },
                  "surrogateInfoType":{
                    "name":"EMAIL_ADDRESS_TOKEN"
                  }
                }
              }
            }
          ]
        }
      },
      "inspectConfig":{
        "customInfoTypes":[
          {
            "infoType":{
              "name":"EMAIL_ADDRESS_TOKEN"
            },
            "surrogateType":{
            }
          }
        ]
      },
      "item":{
        "value": "My name is Alicia Abernathy, and my email address is TOKEN."
      }
    }
    

    Replace the following:

    • PROJECT_ID: the ID of your project.
    • WRAPPED_KEY: the wrapped key that you created in Step 3.
    • TOKEN: the token that you received in Step 4—for example, EMAIL_ADDRESS_TOKEN(52):AVAx2eIEnIQP5jbNEr2j9wLOAd5m4kpSBR/0jjjGdAOmryzZbE/q.

    Make sure that the resulting value of cryptoKeyName forms the full resource name of your Cloud KMS key.

    For more information on the components of this JSON request, see projects.locations.content.reidentify. After you complete this quickstart, try experimenting with different inputs for this request. You can use curl as described here. Alternatively, you can use the API Explorer on that API reference page under Try this API.

  2. Save the file as reidentify-request.json.

  3. Use curl to make a projects.locations.content.reidentify request:

    curl -s \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json" \
    https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/global/content:reidentify \
    -d @reidentify-request.json
    

    Replace PROJECT_ID with the ID of your project.

    To pass a file to curl, use the -d option (for data) and precede the filename with an @ sign. The file must be in the directory from which you run the curl command.

    The response that you get from Cloud DLP is similar to the following JSON:

    {
     "item": {
       "value": "My name is Alicia Abernathy, and my email address is aabernathy@example.com."
     },
     "overview": {
       "transformedBytes": "70",
       "transformationSummaries": [
         {
           "infoType": {
             "name": "EMAIL_ADDRESS"
           },
           "transformation": {
             "cryptoDeterministicConfig": {
               "cryptoKey": {
                 "kmsWrapped": {
                   "wrappedKey": "CiQAYuuIGo5DVaqdE0YLioWxEhC8LbTmq7Uy2G3qOJlZB7WXBw0SSQAjdwP8ZusZJ3Kr8GD9W0vaFPMDksmHEo6nTDaW/j5sSYpHa1ym2JHk+lUgkC3Zw5bXhfCNOkpXUdHGZKou1893O8BDby/82HY=",
                   "cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key"
                 }
               },
               "surrogateInfoType": {
                 "name": "EMAIL_ADDRESS_TOKEN"
               }
             }
           },
           "results": [
             {
               "count": "1",
               "code": "SUCCESS"
             }
           ],
           "transformedBytes": "70"
         }
       ]
     }
    }
    

    In the item field, the email address token is replaced with the actual email address from the original text.

    You've just de-identified and re-identified sensitive data in text content using deterministic encryption.
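The placeholder replacement in step 1 can also be scripted. The following sketch uses sed on an abbreviated stand-in for the request file; the real file contains the full request from step 1. The filename reidentify-template.json, the variable names, and the project ID my-project-id are assumptions made for illustration. Note that the TOKEN placeholder is matched together with its surrounding words, because the bare string TOKEN also occurs inside the infoType name EMAIL_ADDRESS_TOKEN.

```shell
# Abbreviated stand-in for the request file; it keeps only the lines that matter here.
cat > reidentify-template.json <<'EOF'
{
  "surrogateInfoType": {"name": "EMAIL_ADDRESS_TOKEN"},
  "cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key",
  "wrappedKey": "WRAPPED_KEY",
  "item": {"value": "My name is Alicia Abernathy, and my email address is TOKEN."}
}
EOF

# Hypothetical project ID. The wrapped key and token are the sample values from this quickstart.
PROJECT_ID="my-project-id"
WRAPPED_KEY="CiQAYuuIGo5DVaqdE0YLioWxEhC8LbTmq7Uy2G3qOJlZB7WXBw0SSQAjdwP8ZusZJ3Kr8GD9W0vaFPMDksmHEo6nTDaW/j5sSYpHa1ym2JHk+lUgkC3Zw5bXhfCNOkpXUdHGZKou1893O8BDby/82HY="
TOKEN="EMAIL_ADDRESS_TOKEN(52):AVAx2eIEnIQP5jbNEr2j9wLOAd5m4kpSBR/0jjjGdAOmryzZbE/q"

# Use | as the sed delimiter because base64 values can contain "/".
# The last expression matches "address is TOKEN." so that the infoType
# name EMAIL_ADDRESS_TOKEN is left unchanged.
sed -e "s|PROJECT_ID|${PROJECT_ID}|" \
    -e "s|WRAPPED_KEY|${WRAPPED_KEY}|" \
    -e "s|address is TOKEN\.|address is ${TOKEN}.|" \
    reidentify-template.json > reidentify-request.json

cat reidentify-request.json
```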

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this quickstart, follow these steps.

Destroy your key version

If you no longer want to use the key you created in this quickstart, destroy its version.

List the versions available for your key:

gcloud kms keys versions list \
    --location "global" \
    --keyring "dlp-keyring" \
    --key "dlp-key"

To destroy a version, run the following command:

gcloud kms keys versions destroy KEY_VERSION \
    --location "global" \
    --keyring "dlp-keyring" \
    --key "dlp-key"

Replace KEY_VERSION with the number of the version to be destroyed.

Delete the project

The easiest way to eliminate billing is to delete the project that you created for the quickstart.

To delete the project:

  1. In the Cloud Console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next