De-identify and re-identify sensitive data
The process described in this quickstart is called pseudonymization (or tokenization). In this process, Sensitive Data Protection uses a cryptographic key to convert (de-identify) sensitive text into a token. To restore (re-identify) that text, you need both the token and the cryptographic key that you used during de-identification.
Sensitive Data Protection supports both reversible and non-reversible cryptographic methods. To re-identify content, you must choose a reversible method.
The cryptographic method described here is called deterministic encryption using AES-SIV (Advanced Encryption Standard in Synthetic Initialization Vector mode). Of the reversible cryptographic methods that Sensitive Data Protection supports, we recommend this one because it provides the highest level of security.
You can complete the steps in this topic in 10 to 20 minutes, not including the Before you begin steps.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
  gcloud init
- Create or select a Google Cloud project.
  - Create a Google Cloud project:
    gcloud projects create PROJECT_ID
    Replace PROJECT_ID with a name for the Google Cloud project you are creating.
  - Select the Google Cloud project that you created:
    gcloud config set project PROJECT_ID
    Replace PROJECT_ID with your Google Cloud project name.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the Sensitive Data Protection and Cloud KMS APIs:
  gcloud services enable dlp.googleapis.com cloudkms.googleapis.com
- Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/dlp.user
  gcloud projects add-iam-policy-binding PROJECT_ID --member="USER_IDENTIFIER" --role=ROLE
  - Replace PROJECT_ID with your project ID.
  - Replace USER_IDENTIFIER with the identifier for your user account. For example, user:myemail@example.com.
  - Replace ROLE with each individual role.
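Because the role-granting command runs once per role, a small loop can keep the invocations consistent. The following is a minimal sketch, not part of the official procedure: the function name grant_roles, the DRY_RUN variable, and the project name my-project are assumptions made for illustration. By default it only prints each command so that you can review it before running anything.

```shell
# Print (or run) one add-iam-policy-binding command per role.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
grant_roles() {
  project_id="$1"   # hypothetical example: my-project
  member="$2"       # for example: user:myemail@example.com
  shift 2
  for role in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "gcloud projects add-iam-policy-binding $project_id --member=\"$member\" --role=$role"
    else
      gcloud projects add-iam-policy-binding "$project_id" \
        --member="$member" --role="$role"
    fi
  done
}

# Review the command for the role listed above without running it.
DRY_RUN=1 grant_roles "my-project" "user:myemail@example.com" "roles/dlp.user"
```

Set DRY_RUN=0 when you call the function to execute the commands instead of printing them.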
Step 1: Create a key ring and a key
Before you start this procedure, decide where you want Sensitive Data Protection to process your de-identification and re-identification requests. When you create a Cloud KMS key, you must store it either in global or in the same region that you will use for your Sensitive Data Protection requests. Otherwise, the Sensitive Data Protection requests will fail.
You can find a list of supported locations in Sensitive Data Protection locations. Take note of the name of your chosen region (for example, us-west1).
This procedure uses global as the location for all API requests. If you want to use a different region, replace global with the region name.
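The location rule above can be checked before any keys are created. The following sketch encodes only what this section states: a key works if it is stored in global or in the same region as the requests. The function name key_location_ok and the region us-east1 are assumptions chosen for illustration.

```shell
# Succeed when a Cloud KMS key location can serve requests sent to a given
# Sensitive Data Protection location: the key is in "global" or in the
# same region as the requests.
key_location_ok() {
  key_location="$1"
  request_location="$2"
  [ "$key_location" = "global" ] || [ "$key_location" = "$request_location" ]
}

key_location_ok "global" "us-west1" && echo "ok: global key, us-west1 requests"
key_location_ok "us-west1" "us-west1" && echo "ok: same region"
key_location_ok "us-east1" "us-west1" || echo "mismatch: requests would fail"
```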
Create a key ring:
gcloud kms keyrings create "dlp-keyring" \
--location "global"
Create a key:
gcloud kms keys create "dlp-key" \
--location "global" \
--keyring "dlp-keyring" \
--purpose "encryption"
List your key ring and key:
gcloud kms keys list \
--location "global" \
--keyring "dlp-keyring"
You get the following output:
NAME                                                                          PURPOSE          ALGORITHM                    PROTECTION_LEVEL  LABELS  PRIMARY_ID  PRIMARY_STATE
projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key  ENCRYPT_DECRYPT  GOOGLE_SYMMETRIC_ENCRYPTION  SOFTWARE                  1           ENABLED
In this output, PROJECT_ID is the ID of your project.
The path under NAME is the full resource name of your Cloud KMS key. Take note of it because the de-identify and re-identify requests require it.
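The full resource name follows a fixed pattern, so it can also be assembled from its parts and kept in a variable for the later steps. This is a sketch under assumptions: kms_key_name is a hypothetical helper and my-project is a placeholder project ID.

```shell
# Assemble the full resource name of a Cloud KMS key from its parts:
# project ID, location, key ring name, key name.
kms_key_name() {
  printf 'projects/%s/locations/%s/keyRings/%s/cryptoKeys/%s\n' "$1" "$2" "$3" "$4"
}

# "my-project" is a placeholder; use your own project ID.
KMS_KEY_NAME="$(kms_key_name "my-project" "global" "dlp-keyring" "dlp-key")"
echo "$KMS_KEY_NAME"
# prints: projects/my-project/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key
```

The result should match the NAME column printed by the list command, which remains the authoritative value.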
Step 2: Create a base64-encoded AES key
This section describes how to create an Advanced Encryption Standard (AES) key and encode it in base64 format.
Create a 128-, 192-, or 256-bit AES key. The following command uses openssl to create a 256-bit key in the current directory:
openssl rand -out "./aes_key.bin" 32
The file aes_key.bin is added to your current directory.
Encode the AES key as a base64 string:
base64 -i ./aes_key.bin
You get an output similar to the following:
uEDo6/yKx+zCg2cZ1DBwpwvzMVNk/c+jWs7OwpkMc/s=
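Before wrapping the key, it can be worth confirming that the base64 string really decodes to 16, 24, or 32 bytes (128, 192, or 256 bits). The helpers below are a hypothetical sketch, not part of the product; they assume a base64 tool that accepts -d for decoding, and some platforms spell that flag differently.

```shell
# Print the number of raw bytes encoded in a base64 string.
b64_byte_count() {
  printf '%s' "$1" | base64 -d | wc -c | tr -d ' '
}

# Succeed only for 16, 24, or 32 bytes (128-, 192-, or 256-bit keys).
is_valid_aes_key() {
  case "$(b64_byte_count "$1")" in
    16|24|32) return 0 ;;
    *) return 1 ;;
  esac
}

# The sample output shown above decodes to a 256-bit key.
SAMPLE_KEY="uEDo6/yKx+zCg2cZ1DBwpwvzMVNk/c+jWs7OwpkMc/s="
b64_byte_count "$SAMPLE_KEY"
# prints: 32
is_valid_aes_key "$SAMPLE_KEY" && echo "key length is valid"
```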
Step 3: Wrap the AES key using the Cloud KMS key
This section describes how to use the Cloud KMS key that you created in Step 1 to wrap the base64-encoded AES key that you created in Step 2.
To wrap the AES key, use curl to send the following request to the Cloud KMS API projects.locations.keyRings.cryptoKeys.encrypt:
curl "https://cloudkms.googleapis.com/v1/projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key:encrypt" \
--request "POST" \
--header "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
--header "content-type: application/json" \
--data "{\"plaintext\": \"BASE64_ENCODED_AES_KEY\"}"
Replace the following:
- PROJECT_ID: the ID of your project.
- BASE64_ENCODED_AES_KEY: the base64-encoded string returned in Step 2.
The response that you get from Cloud KMS is similar to the following JSON:
{
  "name": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key/cryptoKeyVersions/1",
  "ciphertext": "CiQAYuuIGo5DVaqdE0YLioWxEhC8LbTmq7Uy2G3qOJlZB7WXBw0SSQAjdwP8ZusZJ3Kr8GD9W0vaFPMDksmHEo6nTDaW/j5sSYpHa1ym2JHk+lUgkC3Zw5bXhfCNOkpXUdHGZKou1893O8BDby/82HY=",
  "ciphertextCrc32c": "901327763",
  "protectionLevel": "SOFTWARE"
}
In this output, PROJECT_ID is the ID of your project.
Take note of the value of ciphertext in the response that you get. That is your wrapped key.
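Copying the wrapped key by hand is error-prone, so it can help to pull the ciphertext value out of the response with a script. The sketch below uses a plain sed text match rather than a real JSON parser, and extract_wrapped_key is a hypothetical helper name; if jq is installed, jq -r .ciphertext is a more robust choice.

```shell
# Print the value of the "ciphertext" field from a Cloud KMS encrypt
# response read on stdin. This is a text match, not a JSON parser; it
# does not match the neighboring "ciphertextCrc32c" field.
extract_wrapped_key() {
  sed -n 's/.*"ciphertext": *"\([^"]*\)".*/\1/p'
}

# Demo with the sample response from this step.
extract_wrapped_key <<'EOF'
{ "name": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key/cryptoKeyVersions/1", "ciphertext": "CiQAYuuIGo5DVaqdE0YLioWxEhC8LbTmq7Uy2G3qOJlZB7WXBw0SSQAjdwP8ZusZJ3Kr8GD9W0vaFPMDksmHEo6nTDaW/j5sSYpHa1ym2JHk+lUgkC3Zw5bXhfCNOkpXUdHGZKou1893O8BDby/82HY=", "ciphertextCrc32c": "901327763", "protectionLevel": "SOFTWARE" }
EOF
```

In practice, you could pipe the output of the curl command straight into extract_wrapped_key and store the result in a shell variable.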
Step 4: Send a de-identify request to the DLP API
This section describes how to de-identify sensitive data in text content.
To complete this task, you need the following:
- The full resource name of the Cloud KMS key that you created in Step 1.
- The wrapped key that you created in Step 3.
This section requires you to save the sample request in a JSON file. If you're using Cloud Shell, you can use the Cloud Shell Editor to create the file. To launch the editor, click Open Editor on the toolbar of the Cloud Shell window.
To de-identify sensitive data in text content, follow these steps:
Create a JSON request file with the following text.
{
  "item": {
    "value": "My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "deidentifyConfig": {
    "infoTypeTransformations": {
      "transformations": [
        {
          "infoTypes": [
            { "name": "EMAIL_ADDRESS" }
          ],
          "primitiveTransformation": {
            "cryptoDeterministicConfig": {
              "cryptoKey": {
                "kmsWrapped": {
                  "cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key",
                  "wrappedKey": "WRAPPED_KEY"
                }
              },
              "surrogateInfoType": { "name": "EMAIL_ADDRESS_TOKEN" }
            }
          }
        }
      ]
    }
  },
  "inspectConfig": {
    "infoTypes": [
      { "name": "EMAIL_ADDRESS" }
    ]
  }
}
Replace the following:
- PROJECT_ID: the ID of your project.
- WRAPPED_KEY: the wrapped key that you created in Step 3.
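If you keep the request text as a template, the two placeholders can be filled in with sed instead of by hand. This is a sketch under assumptions: fill_placeholders is a hypothetical helper, my-project and the wrapped key value in the demo are stand-ins, and the substitution is only safe because project IDs and base64 strings never contain the characters |, &, or a backslash.

```shell
# Replace the PROJECT_ID and WRAPPED_KEY placeholders in text read on stdin.
# $1: project ID; $2: wrapped key (base64).
# Assumes neither value contains "|", "&", or "\", which holds for
# project IDs and base64 strings.
fill_placeholders() {
  sed -e "s|PROJECT_ID|$1|g" -e "s|WRAPPED_KEY|$2|g"
}

# Demo on the two lines of the request that carry placeholders.
# Both argument values are stand-ins for illustration.
fill_placeholders "my-project" "EXAMPLE+wrapped/key=" <<'EOF'
"cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key",
"wrappedKey": "WRAPPED_KEY"
EOF
```

With a template saved in a file, the same function can write the final request, for example by redirecting the template into it and redirecting its output to deidentify-request.json.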
Make sure that the resulting value of cryptoKeyName forms the full resource name of your Cloud KMS key.
For more information on the components of this JSON request, see projects.locations.content.deidentify. After you complete this quickstart, try experimenting with different inputs for this request. You can use curl as described here. Alternatively, you can use the API Explorer on that API reference page under Try this API.
Save the file as deidentify-request.json.
Use curl to make a projects.locations.content.deidentify request:
curl -s \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/global/content:deidentify \
-d @deidentify-request.json
Replace PROJECT_ID with the ID of your project.
To pass a filename to curl, you use the -d option (for data) and precede the filename with an @ sign. This file must be in the same directory where you execute the curl command.
The response that you get from Sensitive Data Protection is similar to the following JSON:
{
  "item": {
    "value": "My name is Alicia Abernathy, and my email address is EMAIL_ADDRESS_TOKEN(52):AVAx2eIEnIQP5jbNEr2j9wLOAd5m4kpSBR/0jjjGdAOmryzZbE/q."
  },
  "overview": {
    "transformedBytes": "22",
    "transformationSummaries": [
      {
        "infoType": { "name": "EMAIL_ADDRESS" },
        "transformation": {
          "cryptoDeterministicConfig": {
            "cryptoKey": {
              "kmsWrapped": {
                "wrappedKey": "CiQAYuuIGo5DVaqdE0YLioWxEhC8LbTmq7Uy2G3qOJlZB7WXBw0SSQAjdwP8ZusZJ3Kr8GD9W0vaFPMDksmHEo6nTDaW/j5sSYpHa1ym2JHk+lUgkC3Zw5bXhfCNOkpXUdHGZKou1893O8BDby/82HY=",
                "cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key"
              }
            },
            "surrogateInfoType": { "name": "EMAIL_ADDRESS_TOKEN" }
          }
        },
        "results": [
          { "count": "1", "code": "SUCCESS" }
        ],
        "transformedBytes": "22"
      }
    ]
  }
}
In the item field, the email address is replaced with a token like EMAIL_ADDRESS_TOKEN(52):AVAx2eIEnIQP5jbNEr2j9wLOAd5m4kpSBR/0jjjGdAOmryzZbE/q. To re-identify this content, you must pass the entire token in the re-identify request.
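Because the whole token must be passed back, a truncated copy is an easy mistake to make. In the sample above, the number in parentheses (52) equals the number of characters after the colon, so that prefix can serve as a sanity check. The helpers below are a hypothetical sketch; they assume the token value uses only base64 characters, as in the sample, and that grep supports the -o option.

```shell
# Print every token of the form NAME(LENGTH):VALUE found on stdin.
# $1: the surrogate infoType name, for example EMAIL_ADDRESS_TOKEN.
extract_tokens() {
  grep -o "$1([0-9]*):[A-Za-z0-9+/=]*"
}

# Succeed when the LENGTH prefix of a single token matches the number
# of characters in its VALUE part.
token_length_ok() {
  declared="${1#*"("}"
  declared="${declared%%")"*}"
  value="${1#*:}"
  [ "$declared" = "${#value}" ]
}

SAMPLE='My name is Alicia Abernathy, and my email address is EMAIL_ADDRESS_TOKEN(52):AVAx2eIEnIQP5jbNEr2j9wLOAd5m4kpSBR/0jjjGdAOmryzZbE/q.'
TOKEN="$(printf '%s\n' "$SAMPLE" | extract_tokens "EMAIL_ADDRESS_TOKEN")"
echo "$TOKEN"
token_length_ok "$TOKEN" && echo "length prefix matches"
```

Note that the sentence-ending period is not part of the token, which is why the match stops at the last base64 character.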
Step 5: Send a re-identify request to the DLP API
This section describes how to re-identify tokenized data in text content.
To complete this task, you need the following:
- The full resource name of the Cloud KMS key that you created in Step 1.
- The wrapped key that you created in Step 3.
- The token that you received in Step 4.
To re-identify tokenized content, follow these steps:
Create a JSON request file with the following text.
{
  "reidentifyConfig": {
    "infoTypeTransformations": {
      "transformations": [
        {
          "infoTypes": [
            { "name": "EMAIL_ADDRESS_TOKEN" }
          ],
          "primitiveTransformation": {
            "cryptoDeterministicConfig": {
              "cryptoKey": {
                "kmsWrapped": {
                  "cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key",
                  "wrappedKey": "WRAPPED_KEY"
                }
              },
              "surrogateInfoType": { "name": "EMAIL_ADDRESS_TOKEN" }
            }
          }
        }
      ]
    }
  },
  "inspectConfig": {
    "customInfoTypes": [
      {
        "infoType": { "name": "EMAIL_ADDRESS_TOKEN" },
        "surrogateType": {}
      }
    ]
  },
  "item": {
    "value": "My name is Alicia Abernathy, and my email address is TOKEN."
  }
}
Replace the following:
- PROJECT_ID: the ID of your project.
- WRAPPED_KEY: the wrapped key that you created in Step 3.
- TOKEN: the token that you received in Step 4, for example, EMAIL_ADDRESS_TOKEN(52):AVAx2eIEnIQP5jbNEr2j9wLOAd5m4kpSBR/0jjjGdAOmryzZbE/q.
Make sure that the resulting value of cryptoKeyName forms the full resource name of your Cloud KMS key.
For more information on the components of this JSON request, see projects.locations.content.reidentify. After you complete this quickstart, try experimenting with different inputs for this request. You can use curl as described here. Alternatively, you can use the API Explorer on that API reference page under Try this API.
Save the file as reidentify-request.json.
Use curl to make a projects.locations.content.reidentify request:
curl -s \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/global/content:reidentify \
-d @reidentify-request.json
Replace PROJECT_ID with the ID of your project.
To pass a filename to curl, you use the -d option (for data) and precede the filename with an @ sign. This file must be in the same directory where you execute the curl command.
The response that you get from Sensitive Data Protection is similar to the following JSON:
{
  "item": {
    "value": "My name is Alicia Abernathy, and my email address is aabernathy@example.com."
  },
  "overview": {
    "transformedBytes": "70",
    "transformationSummaries": [
      {
        "infoType": { "name": "EMAIL_ADDRESS" },
        "transformation": {
          "cryptoDeterministicConfig": {
            "cryptoKey": {
              "kmsWrapped": {
                "wrappedKey": "CiQAYuuIGo5DVaqdE0YLioWxEhC8LbTmq7Uy2G3qOJlZB7WXBw0SSQAjdwP8ZusZJ3Kr8GD9W0vaFPMDksmHEo6nTDaW/j5sSYpHa1ym2JHk+lUgkC3Zw5bXhfCNOkpXUdHGZKou1893O8BDby/82HY=",
                "cryptoKeyName": "projects/PROJECT_ID/locations/global/keyRings/dlp-keyring/cryptoKeys/dlp-key"
              }
            },
            "surrogateInfoType": { "name": "EMAIL_ADDRESS_TOKEN" }
          }
        },
        "results": [
          { "count": "1", "code": "SUCCESS" }
        ],
        "transformedBytes": "70"
      }
    ]
  }
}
In the item field, the email address token is replaced with the actual email address from the original text.
You've just de-identified and re-identified sensitive data in text content using deterministic encryption.
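A quick way to confirm the round trip is to check that the re-identified text contains the original value and no leftover token. The following sketch assumes you have the response text in a shell variable; roundtrip_ok is a hypothetical helper, and it looks for the token prefix followed by an opening parenthesis so that the surrogateInfoType name echoed in the response overview does not count as a leftover token.

```shell
# Succeed when the re-identified text contains the original value and
# no leftover surrogate token.
# $1: response text; $2: original value; $3: surrogate infoType name.
roundtrip_ok() {
  case "$1" in
    *"$3("*) return 1 ;;   # a token of the form NAME(LENGTH):... is still present
  esac
  case "$1" in
    *"$2"*) return 0 ;;    # the original value is back
    *) return 1 ;;
  esac
}

RESPONSE='{ "item": { "value": "My name is Alicia Abernathy, and my email address is aabernathy@example.com." }, "overview": { "transformationSummaries": [ { "transformation": { "cryptoDeterministicConfig": { "surrogateInfoType": { "name": "EMAIL_ADDRESS_TOKEN" } } } } ] } }'
roundtrip_ok "$RESPONSE" "aabernathy@example.com" "EMAIL_ADDRESS_TOKEN" && echo "round trip ok"
```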
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, delete the Google Cloud project with the resources.
Destroy your key version
If you no longer want to use the key you created in this quickstart, destroy its version.
List the versions available for your key:
gcloud kms keys versions list \
--location "global" \
--keyring "dlp-keyring" \
--key "dlp-key"
To destroy a version, run the following command:
gcloud kms keys versions destroy KEY_VERSION \
--location "global" \
--keyring "dlp-keyring" \
--key "dlp-key"
Replace KEY_VERSION with the number of the version to be destroyed.
Delete the project
If you created a new project for this quickstart, the easiest way to prevent additional charges is to delete the project.
Delete a Google Cloud project:
gcloud projects delete PROJECT_ID
Revoke your credentials
Optional: Revoke credentials from the gcloud CLI.
gcloud auth revoke
What's next
- For more in-depth information on how to de-identify sensitive content, see De-identifying sensitive data.
- For information on how a de-identification workflow fits into real-life deployments, see De-identification and re-identification of PII in large-scale datasets using Sensitive Data Protection.
- For conceptual information on tokenizing data through a cryptographic key, see Pseudonymization.