Sensitive Data Protection can de-identify sensitive data in text content, including text stored in container structures such as tables. De-identification is the process of removing identifying information from data. The API detects sensitive data such as personally identifiable information (PII), and then uses a de-identification transformation to mask, delete, or otherwise obscure the data. For example, de-identification techniques can include any of the following:
- Masking sensitive data by partially or fully replacing characters with a symbol, such as an asterisk (*) or hash (#).
- Replacing each instance of sensitive data with a token, or surrogate, string.
- Encrypting and replacing sensitive data using a randomly generated or pre-determined key.
You can send data to the API using JSON over HTTPS, the CLI, or one of several programming languages through the Sensitive Data Protection client libraries. To set up the CLI, refer to the quickstart. For more information about submitting information in JSON format, see the JSON quickstart.
API overview
To de-identify sensitive data, use Sensitive Data Protection's content.deidentify method.
There are three parts to a de-identification API call:
- The data to inspect: A string or table structure (ContentItem object) for the API to inspect.
- What to inspect for: Detection configuration information (InspectConfig) such as what types of data (or infoTypes) to look for, whether to return only findings at or above a certain likelihood threshold, whether to return no more than a certain number of results, and so on. Not specifying at least one infoType in an InspectConfig argument is equivalent to specifying all built-in infoTypes. Doing so is not recommended, as it can cause decreased performance and increased cost.
- What to do with the inspection findings: Configuration information (DeidentifyConfig) that defines how you want the sensitive data de-identified. This argument is covered in more detail in the following section.
The API returns the same items you gave it, in the same format, but any text identified as containing sensitive information according to your criteria has been de-identified.
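As a minimal sketch of how the three parts fit together, the following Python builds a request body with the same shape as the JSON examples later in this topic. The helper name build_deidentify_request is hypothetical, and the snippet only assembles and prints the body; it does not send anything to the API.

```python
import json


def build_deidentify_request(text, info_type_names, primitive_transformation):
    """Assemble the three parts of a content.deidentify request body."""
    info_types = [{"name": name} for name in info_type_names]
    return {
        # 1. The data to inspect (a ContentItem).
        "item": {"value": text},
        # 2. What to inspect for (an InspectConfig).
        "inspectConfig": {"infoTypes": info_types},
        # 3. What to do with the findings (a DeidentifyConfig).
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        "infoTypes": info_types,
                        "primitiveTransformation": primitive_transformation,
                    }
                ]
            }
        },
    }


request_body = build_deidentify_request(
    "My email address is aabernathy@example.com.",
    ["EMAIL_ADDRESS"],
    {"replaceWithInfoTypeConfig": {}},
)
print(json.dumps(request_body, indent=2))
```

Listing the infoTypes explicitly in inspectConfig avoids the "all built-in infoTypes" default that the list above warns against.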
Specifying detection criteria
Information type (or "infoType") detectors are the mechanisms that Sensitive Data Protection uses to find sensitive data.
Sensitive Data Protection includes several kinds of infoType detectors, all of which are summarized here:
- Built-in infoType detectors are built into Sensitive Data Protection. They include detectors for country- or region-specific sensitive data types as well as globally applicable data types.
- Custom infoType detectors are detectors that you create yourself. There are three kinds of custom infoType detectors:
  - Regular custom dictionary detectors are simple word lists that Sensitive Data Protection matches on. Use regular custom dictionary detectors when you have a list of up to several tens of thousands of words or phrases. Regular custom dictionary detectors are preferred if you don't anticipate your word list changing significantly.
  - Stored custom dictionary detectors are generated by Sensitive Data Protection using large lists of words or phrases stored in either Cloud Storage or BigQuery. Use stored custom dictionary detectors when you have a large list of words or phrases (up to tens of millions).
  - Regular expressions (regex) detectors enable Sensitive Data Protection to detect matches based on a regular expression pattern.
In addition, Sensitive Data Protection includes the concept of inspection rules, which enable you to fine-tune scan results using the following:
- Exclusion rules enable you to decrease the number of findings returned by adding rules to a built-in or custom infoType detector.
- Hotword rules enable you to increase the quantity or change the likelihood value of findings returned by adding rules to a built-in or custom infoType detector.
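To make the interaction between a regex detector and an exclusion rule concrete, here is a small local simulation in Python. It does not call the service, and both patterns (the EMP-12345 employee ID format and the test-account range) are hypothetical examples rather than anything Sensitive Data Protection defines.

```python
import re

# Hypothetical custom regex detector: employee IDs such as "EMP-12345".
EMPLOYEE_ID = re.compile(r"\bEMP-\d{5}\b")
# Hypothetical exclusion rule: IDs in the 00xxx range belong to test accounts.
TEST_ACCOUNT = re.compile(r"EMP-00\d{3}")


def find_employee_ids(text):
    """Return regex findings, minus any that fully match the exclusion pattern."""
    findings = [match.group(0) for match in EMPLOYEE_ID.finditer(text)]
    return [f for f in findings if not TEST_ACCOUNT.fullmatch(f)]


sample = "Badges EMP-00123 and EMP-48213 were issued today."
print(find_employee_ids(sample))  # ['EMP-48213']
```

The exclusion step only ever removes findings, which mirrors how exclusion rules decrease the number of findings returned.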
De-identification transformations
You must specify one or more transformations when you set the de-identification configuration (DeidentifyConfig).
There are two categories of transformations:
- InfoTypeTransformations: Transformations that are only applied to values within submitted text that are identified as a specific infoType.
- RecordTransformations: Transformations that are only applied to values within submitted tabular text data that are identified as a specific infoType, or on an entire column of tabular data.
InfoType transformations
You can specify one or more infoType transformations per request. Within each InfoTypeTransformation object, you specify both of the following:
- One or more infoTypes to which a transformation should be applied (the infoTypes[] array object).
- A primitive transformation (the PrimitiveTransformation object).
Note that specifying an infoType is optional, but not specifying at least one infoType in an InspectConfig argument causes the transformation to apply to all built-in infoTypes that don't have a transformation provided. Doing so is not recommended, as it can cause decreased performance and increased cost.
Primitive transformations
You must specify at least one primitive transformation to apply to the input, regardless of whether you're applying it only to certain infoTypes or to the entire text string. The following sections describe examples of transformation methods that you can use. For a list of all transformation methods that Sensitive Data Protection offers, see Transformation reference.
replaceConfig
Setting replaceConfig to a ReplaceValueConfig object replaces matched input values with a value you specify.
For example, suppose you've set replaceConfig to "[email-address]" for all EMAIL_ADDRESS infoTypes, and the following string is sent to Sensitive Data Protection:
My name is Alicia Abernathy, and my email address is aabernathy@example.com.
The returned string will be the following:
My name is Alicia Abernathy, and my email address is [email-address].
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries. To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
The following JSON example shows how to form the API request and what the DLP API returns:
REST
See the JSON quickstart for more information about using the DLP API with JSON.
JSON Input:
POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}
{
"item":{
"value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
},
"deidentifyConfig":{
"infoTypeTransformations":{
"transformations":[
{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
],
"primitiveTransformation":{
"replaceConfig":{
"newValue":{
"stringValue":"[email-address]"
}
}
}
}
]
}
},
"inspectConfig":{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
]
}
}
JSON Output:
{
"item":{
"value":"My name is Alicia Abernathy, and my email address is [email-address]."
},
"overview":{
"transformedBytes":"22",
"transformationSummaries":[
{
"infoType":{
"name":"EMAIL_ADDRESS"
},
"transformation":{
"replaceConfig":{
"newValue":{
"stringValue":"[email-address]"
}
}
},
"results":[
{
"count":"1",
"code":"SUCCESS"
}
],
"transformedBytes":"22"
}
]
}
}
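As a rough local illustration of the replaceConfig behavior shown above, the following Python substitutes a fixed value for each email address and counts the bytes replaced. It is a sketch only: the regular expression is a simplified stand-in for the EMAIL_ADDRESS detector, and the function name replace_findings is hypothetical.

```python
import re

# Simplified stand-in for the EMAIL_ADDRESS detector (an assumption; the
# service's real detection logic is not a single regular expression).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def replace_findings(text, new_value):
    """Replace each match with new_value and count the bytes that were replaced."""
    transformed_bytes = 0

    def substitute(match):
        nonlocal transformed_bytes
        transformed_bytes += len(match.group(0).encode("utf-8"))
        return new_value

    return EMAIL.sub(substitute, text), transformed_bytes


text = "My name is Alicia Abernathy, and my email address is aabernathy@example.com."
result, byte_count = replace_findings(text, "[email-address]")
print(result)      # My name is Alicia Abernathy, and my email address is [email-address].
print(byte_count)  # 22
```

The byte count of 22 is the length of aabernathy@example.com, which lines up with the transformedBytes value in the JSON output above.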
redactConfig
Specifying redactConfig redacts a given value by removing it completely. The redactConfig message has no arguments; specifying it enables its transformation.
For example, suppose you've specified redactConfig for all EMAIL_ADDRESS infoTypes, and the following string is sent to Sensitive Data Protection:
My name is Alicia Abernathy, and my email address is aabernathy@example.com.
The returned string will be the following:
My name is Alicia Abernathy, and my email address is .
The following examples show how to form the API request and what the DLP API returns:
REST
JSON Input:
POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}
{
"item":{
"value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
},
"deidentifyConfig":{
"infoTypeTransformations":{
"transformations":[
{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
],
"primitiveTransformation":{
"redactConfig":{
}
}
}
]
}
},
"inspectConfig":{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
]
}
}
JSON Output:
{
"item":{
"value":"My name is Alicia Abernathy, and my email address is ."
},
"overview":{
"transformedBytes":"22",
"transformationSummaries":[
{
"infoType":{
"name":"EMAIL_ADDRESS"
},
"transformation":{
"redactConfig":{
}
},
"results":[
{
"count":"1",
"code":"SUCCESS"
}
],
"transformedBytes":"22"
}
]
}
}
characterMaskConfig
Setting characterMaskConfig to a CharacterMaskConfig object partially masks a string by replacing a given number of characters with a fixed character. Masking can start from the beginning or end of the string. This transformation also works with number types such as long integers.
The CharacterMaskConfig object has several of its own arguments:
- maskingCharacter: The character to use to mask each character of a sensitive value. For example, you could specify an asterisk (*) or hash (#) to mask a series of numbers such as those in a credit card number.
- numberToMask: The number of characters to mask. If you don't set this value, all matching characters will be masked.
- reverseOrder: Whether to mask characters in reverse order. Setting reverseOrder to true causes characters in matched values to be masked from the end toward the beginning of the value. Setting it to false causes masking to begin at the start of the value.
- charactersToIgnore[]: One or more characters to skip when masking values. For example, specify a hyphen here to leave the hyphens in place when masking a telephone number. You can also specify a group of common characters (CharsToIgnore) to ignore when masking.
For example, suppose you've set characterMaskConfig to mask with '#' for EMAIL_ADDRESS infoTypes, except for the '.' and '@' characters, and the following string is sent to Sensitive Data Protection:
My name is Alicia Abernathy, and my email address is aabernathy@example.com.
The returned string will be the following:
My name is Alicia Abernathy, and my email address is ##########@#######.###.
Following are examples that demonstrate how to use the DLP API to de-identify sensitive data using masking techniques.
REST
The following JSON example shows how to form the API request and what the DLP API returns:
JSON Input:
POST https://dlp.googleapis.com/v2/projects/[PROJECT_ID]/content:deidentify?key={YOUR_API_KEY}
{
"item":{
"value":"My name is Alicia Abernathy, and my email address is aabernathy@example.com."
},
"deidentifyConfig":{
"infoTypeTransformations":{
"transformations":[
{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
],
"primitiveTransformation":{
"characterMaskConfig":{
"maskingCharacter":"#",
"reverseOrder":false,
"charactersToIgnore":[
{
"charactersToSkip":".@"
}
]
}
}
}
]
}
},
"inspectConfig":{
"infoTypes":[
{
"name":"EMAIL_ADDRESS"
}
]
}
}
JSON Output:
{
"item":{
"value":"My name is Alicia Abernathy, and my email address is ##########@#######.###."
},
"overview":{
"transformedBytes":"22",
"transformationSummaries":[
{
"infoType":{
"name":"EMAIL_ADDRESS"
},
"transformation":{
"characterMaskConfig":{
"maskingCharacter":"#",
"charactersToIgnore":[
{
"charactersToSkip":".@"
}
]
}
},
"results":[
{
"count":"1",
"code":"SUCCESS"
}
],
"transformedBytes":"22"
}
]
}
}
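The masking arguments described in this section can be sketched locally as follows. This is an illustration of the semantics, not the service's implementation; the function name mask_value is hypothetical, and the choice that skipped characters don't count toward numberToMask is an assumption to verify against the API reference.

```python
def mask_value(value, masking_character="*", number_to_mask=None,
               reverse_order=False, characters_to_skip=""):
    """Mask characters of value, mimicking the CharacterMaskConfig arguments."""
    chars = list(value)
    if reverse_order:
        indexes = range(len(chars) - 1, -1, -1)  # walk from the end
    else:
        indexes = range(len(chars))              # walk from the start
    masked = 0
    for i in indexes:
        if number_to_mask is not None and masked >= number_to_mask:
            break
        if chars[i] in characters_to_skip:
            continue  # assumption: skipped characters are kept and not counted
        chars[i] = masking_character
        masked += 1
    return "".join(chars)


print(mask_value("aabernathy@example.com", "#", characters_to_skip=".@"))
# ##########@#######.###
print(mask_value("555-867-5309", "*", number_to_mask=4,
                 reverse_order=True, characters_to_skip="-"))
# 555-867-****
```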
cryptoHashConfig
Setting cryptoHashConfig to a CryptoHashConfig object performs pseudonymization on an input value by generating a surrogate value using cryptographic hashing.
This method replaces the input value with an encrypted "digest," or hash value. The digest is computed by taking the SHA-256 hash of the input value. The cryptographic key used to make the hash is a CryptoKey object, and must be either 32 or 64 bytes in size.
The method outputs a base64-encoded representation of the hashed output. Currently, only string and integer values can be hashed.
For example, suppose you've specified cryptoHashConfig for all EMAIL_ADDRESS infoTypes, and the CryptoKey object consists of a randomly-generated key (a TransientCryptoKey). Then, the following string is sent to Sensitive Data Protection:
My name is Alicia Abernathy, and my email address is aabernathy@example.com.
The returned string will look like the following:
My name is Alicia Abernathy, and my email address is 41D1567F7F99F1DC2A5FAB886DEE5BEE.
Of course, the returned value is cryptographically generated and will differ from the one shown here.
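For intuition, keyed hashing of this kind can be sketched locally with Python's standard library. The snippet below uses HMAC with SHA-256 and base64-encodes the digest; treating the keyed construction as HMAC is an assumption made for illustration, since this topic only states that SHA-256 and a CryptoKey are involved. The function name hash_surrogate and the fixed demo key are hypothetical.

```python
import base64
import hashlib
import hmac


def hash_surrogate(value, key):
    """Return a base64-encoded keyed SHA-256 digest of value."""
    if len(key) not in (32, 64):
        raise ValueError("key must be 32 or 64 bytes")
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


demo_key = bytes(range(32))  # fixed demo key; use a random key in practice
token = hash_surrogate("aabernathy@example.com", demo_key)
print(len(token))  # 44
```

The same input and key always produce the same surrogate, which keeps joins on the hashed column possible, while a different key yields an unrelated value.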
dateShiftConfig
Setting dateShiftConfig to a DateShiftConfig object performs date shifting on a date input value by shifting the dates by a random number of days.
Date shifting techniques randomly shift a set of dates but preserve the sequence and duration of a period of time. Shifting dates is usually done in context to an individual or an entity. That is, you want to shift all of the dates for a specific individual using the same shift differential, but use a separate shift differential for each other individual.
For more information about date shifting, see the date shifting concept topic.
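The idea of one shift differential per individual can be sketched locally as shown here. This is not the service's algorithm: deriving the offset from an HMAC of the entity ID is an assumption chosen so that the same individual always gets the same shift, and the names shift_for, shift_dates, and the bound parameters are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta


def shift_for(entity_id, key, lower_bound_days, upper_bound_days):
    """Derive a stable per-entity offset (in days) within the given bounds."""
    digest = hmac.new(key, entity_id.encode("utf-8"), hashlib.sha256).digest()
    span = upper_bound_days - lower_bound_days + 1
    return lower_bound_days + int.from_bytes(digest[:8], "big") % span


def shift_dates(dates, entity_id, key, lower_bound_days=-30, upper_bound_days=30):
    """Shift every date for one entity by the same number of days."""
    delta = timedelta(days=shift_for(entity_id, key, lower_bound_days, upper_bound_days))
    return [d + delta for d in dates]


visits = [date(2023, 1, 10), date(2023, 2, 14)]
shifted = shift_dates(visits, "patient-1", b"k" * 32)
# The gap between the two visits is unchanged by the shift.
print(shifted[1] - shifted[0] == visits[1] - visits[0])  # True
```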
cryptoReplaceFfxFpeConfig
Setting cryptoReplaceFfxFpeConfig to a CryptoReplaceFfxFpeConfig object performs pseudonymization on an input value by replacing an input value with a token. This token is:
- The encrypted input value.
- The same length as the input value.
- Computed using format-preserving encryption in FFX mode ("FPE-FFX") keyed on the cryptographic key specified by cryptoKey.
- Comprised of the characters specified by alphabet. Valid options:
  - NUMERIC
  - HEXADECIMAL
  - UPPER_CASE_ALPHA_NUMERIC
  - ALPHA_NUMERIC
The input value:
- Must be at least two characters long (or the empty string).
- Must be comprised of the characters specified by an alphabet. The alphabet can be comprised of between 2 and 95 characters. (An alphabet with 95 characters includes all printable characters in the US-ASCII character set.)
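The input constraints above can be checked before sending a request. The following Python is a sketch under stated assumptions: the character sets assigned to each alphabet name reflect the usual reading of those names (for example, HEXADECIMAL as 0-9 and A-F) and should be confirmed against the API reference, and is_valid_fpe_input is a hypothetical helper.

```python
import string

# Assumed character sets for the named alphabets listed above.
COMMON_ALPHABETS = {
    "NUMERIC": string.digits,
    "HEXADECIMAL": string.digits + "ABCDEF",
    "UPPER_CASE_ALPHA_NUMERIC": string.digits + string.ascii_uppercase,
    "ALPHA_NUMERIC": string.digits + string.ascii_uppercase + string.ascii_lowercase,
}


def is_valid_fpe_input(value, alphabet_name):
    """Check the length and character-set rules for an FPE-FFX input value."""
    alphabet = COMMON_ALPHABETS[alphabet_name]
    if value == "":
        return True  # the empty string is explicitly allowed
    return len(value) >= 2 and all(ch in alphabet for ch in value)


print(is_valid_fpe_input("4155551234", "NUMERIC"))  # True
print(is_valid_fpe_input("415-555", "NUMERIC"))     # False
print(is_valid_fpe_input("7", "NUMERIC"))           # False
```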
Sensitive Data Protection computes the replacement token using a cryptographic key. You provide this key in one of three ways:
- By embedding it unencrypted in the API request. This is not recommended.
- By requesting that Sensitive Data Protection generate it.
- By embedding it encrypted in the API request.
If you choose to embed the key in the API request, you need to create a key and wrap (encrypt) it using a Cloud Key Management Service (Cloud KMS) key. The value returned is a base64-encoded string by default. To set this value in Sensitive Data Protection, you must decode it into a byte string. The following code snippets highlight how to do this in several languages. End-to-end examples are provided following these snippets.
Java
KmsWrappedCryptoKey.newBuilder()
.setWrappedKey(ByteString.copyFrom(BaseEncoding.base64().decode(wrappedKey)))
Python
# The wrapped key is base64-encoded, but the library expects a binary
# string, so decode it here.
import base64
wrapped_key = base64.b64decode(wrapped_key)
PHP
// Create the wrapped crypto key configuration object
$kmsWrappedCryptoKey = (new KmsWrappedCryptoKey())
->setWrappedKey(base64_decode($wrappedKey))
->setCryptoKeyName($keyName);
C#
WrappedKey = ByteString.FromBase64(wrappedKey)
For more information about encrypting and decrypting data using Cloud KMS, see Encrypting and Decrypting Data.
By design, FPE-FFX preserves the length and character set of the input text. This means that it lacks authentication and an initialization vector, which would cause a length expansion in the output token. Other methods like AES-SIV provide these stronger security guarantees and are recommended for tokenization use cases unless length and character set preservation are strict requirements—for example, for backward compatibility with a legacy data system.
For code samples that demonstrate how to use Sensitive Data Protection to re-identify sensitive data that was de-identified through the CryptoReplaceFfxFpeConfig transformation method, see Format-preserving encryption: re-identification examples.
fixedSizeBucketingConfig
The bucketing transformations (this one and bucketingConfig) serve to mask numerical data by "bucketing" it into ranges. The resulting number range is a hyphenated string consisting of a lower bound, a hyphen, and an upper bound.
Setting fixedSizeBucketingConfig to a FixedSizeBucketingConfig object buckets input values based on fixed size ranges. The FixedSizeBucketingConfig object consists of the following:
- lowerBound: The lower bound value of all of the buckets. Values less than this one are grouped together in a single bucket.
- upperBound: The upper bound value of all of the buckets. Values greater than this one are grouped together in a single bucket.
- bucketSize: The size of each bucket other than the minimum and maximum buckets.
For example, if lowerBound is set to 10, upperBound is set to 89, and bucketSize is set to 10, then the following buckets would be used: -10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-89, 89+.
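The bucket labels in this example can be reproduced with a few lines of Python. This is a local sketch, not the service's code; the function name fixed_size_bucket is hypothetical, and the handling of values that fall exactly on a boundary (assigned to the bucket that starts there) is an assumption.

```python
def fixed_size_bucket(value, lower_bound, upper_bound, bucket_size):
    """Return the bucket label for value, using the label format shown above."""
    if value < lower_bound:
        return f"-{lower_bound}"   # everything below the lower bound
    if value >= upper_bound:
        return f"{upper_bound}+"   # assumption: the bound itself lands here
    start = lower_bound + ((value - lower_bound) // bucket_size) * bucket_size
    end = min(start + bucket_size, upper_bound)  # last bucket is cut at the bound
    return f"{start}-{end}"


for age in (5, 10, 42, 85, 120):
    print(age, fixed_size_bucket(age, 10, 89, 10))
# 5 -10
# 10 10-20
# 42 40-50
# 85 80-89
# 120 89+
```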
For more information about the concept of bucketing, see Generalization and Bucketing.
bucketingConfig
The bucketingConfig transformation offers more flexibility than the other bucketing transformation, fixedSizeBucketingConfig. Instead of specifying upper and lower bounds and an interval value with which to create equal-sized buckets, you specify the maximum and minimum values for each bucket you want created. Each maximum and minimum value pair must have the same type.
Setting bucketingConfig to a BucketingConfig object specifies custom buckets. The BucketingConfig object consists of a buckets[] array of Bucket objects. Each Bucket object consists of the following:
- min: The lower bound of the bucket's range. Omit this value to create a bucket that has no lower bound.
- max: The upper bound of the bucket's range. Omit this value to create a bucket that has no upper bound.
- replacementValue: The value with which to replace values that fall within the lower and upper bounds. If you don't provide a replacementValue, a hyphenated min-max range will be used instead.
If a value falls outside of the defined ranges, the TransformationSummary returned will contain an error message.
For example, consider the following configuration for the bucketingConfig transformation:
"bucketingConfig":{
"buckets":[
{
"min":{
"integerValue":"1"
},
"max":{
"integerValue":"30"
},
"replacementValue":{
"stringValue":"LOW"
}
},
{
"min":{
"integerValue":"31"
},
"max":{
"integerValue":"65"
},
"replacementValue":{
"stringValue":"MEDIUM"
}
},
{
"min":{
"integerValue":"66"
},
"max":{
"integerValue":"100"
},
"replacementValue":{
"stringValue":"HIGH"
}
}
]
}
This defines the following behavior:
- Integer values falling between 1 and 30 are masked by being replaced with LOW.
- Integer values falling between 31 and 65 are masked by being replaced with MEDIUM.
- Integer values falling between 66 and 100 are masked by being replaced with HIGH.
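The behavior listed above can be mimicked locally as a quick check of a bucket layout. The sketch below treats both bounds as inclusive to match the wording of the list; whether the service treats the upper bound as inclusive or exclusive is an assumption to confirm in the Bucket reference. The names BUCKETS and bucket_value are hypothetical.

```python
BUCKETS = [
    {"min": 1, "max": 30, "replacement": "LOW"},
    {"min": 31, "max": 65, "replacement": "MEDIUM"},
    {"min": 66, "max": 100, "replacement": "HIGH"},
]


def bucket_value(value, buckets=BUCKETS):
    """Return the replacement for value, or None if no bucket covers it."""
    for bucket in buckets:
        low, high = bucket.get("min"), bucket.get("max")
        # A missing bound means the bucket is open on that side.
        if (low is None or value >= low) and (high is None or value <= high):
            label = f"{'' if low is None else low}-{'' if high is None else high}"
            return bucket.get("replacement", label)
    return None  # outside every range; the service reports an error instead


print(bucket_value(15), bucket_value(40), bucket_value(99), bucket_value(250))
# LOW MEDIUM HIGH None
```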
For more information about the concept of bucketing, see Generalization and Bucketing.
replaceWithInfoTypeConfig
Specifying replaceWithInfoTypeConfig replaces each matched value with the name of the infoType. The replaceWithInfoTypeConfig message has no arguments; specifying it enables its transformation.
For example, suppose you've specified replaceWithInfoTypeConfig for all EMAIL_ADDRESS infoTypes, and the following string is sent to Sensitive Data Protection:
My name is Alicia Abernathy, and my email address is aabernathy@example.com.
The returned string will be the following:
My name is Alicia Abernathy, and my email address is [EMAIL_ADDRESS].
timePartConfig
Setting timePartConfig to a TimePartConfig object preserves a portion of a matched value that includes Date, Timestamp, and TimeOfDay values. The TimePartConfig object consists of a partToExtract argument, which can be set to any of the TimePart enumerated values, including year, month, day of the month, and so on.
For example, suppose you've configured a timePartConfig transformation by setting partToExtract to YEAR. After sending the data in the first column below to Sensitive Data Protection, you'd end up with the transformed values in the second column:
Original values | Transformed values |
---|---|
9/21/1976 | 1976 |
6/7/1945 | 1945 |
1/20/2009 | 2009 |
7/4/1776 | 1776 |
8/1/1984 | 1984 |
4/21/1982 | 1982 |
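The table above can be reproduced locally with Python's datetime module. This is only an illustration of what extracting a time part means; the function name extract_time_part is hypothetical, the month/day/year input format is assumed from the table, and only three of the TimePart values are handled.

```python
from datetime import datetime


def extract_time_part(value, part_to_extract):
    """Parse a M/D/YYYY date string and return the requested part as text."""
    parsed = datetime.strptime(value, "%m/%d/%Y")
    parts = {
        "YEAR": parsed.year,
        "MONTH": parsed.month,
        "DAY_OF_MONTH": parsed.day,
    }
    return str(parts[part_to_extract])


for original in ("9/21/1976", "6/7/1945", "7/4/1776"):
    print(original, "|", extract_time_part(original, "YEAR"))
# 9/21/1976 | 1976
# 6/7/1945 | 1945
# 7/4/1776 | 1776
```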
Record transformations
Record transformations (the RecordTransformations object) are only applied to values within tabular data that are identified as a specific infoType. Within RecordTransformations, there are two further subcategories of transformations:
- fieldTransformations[]: Transformations that apply various field transformations.
- recordSuppressions[]: Rules defining which records get suppressed completely. Records that match any suppression rule within recordSuppressions[] are omitted from the output.
Field transformations
Each FieldTransformation object includes three arguments:
- fields: One or more input fields (FieldID objects) to apply the transformation to.
- condition: A condition (a RecordCondition object) that must evaluate to true for the transformation to be applied. For example, apply a bucket transformation to an age column of a record only if the ZIP code column for the same record is within a specific range. Or, redact a field only if the birthdate field puts a person's age at 85 or above.
- One of the following two transformation type arguments. Specifying one is required:
  - infoTypeTransformations: Treat the contents of the field as free text, and apply a PrimitiveTransformation only to content that matches an InfoType. These transformations were discussed earlier in this topic.
  - primitiveTransformation: Apply the specified primitive transformation (PrimitiveTransformation object) to the entire field. These transformations were discussed earlier in this topic.
Field transformations example
The following example sends a projects.content.deidentify request with two field transformations:
- The first field transformation applies to the first two columns (column1 and column2). Because its transformation type is a primitiveTransformation object (specifically, a CryptoDeterministicConfig), Sensitive Data Protection transforms the entire field.
- The second field transformation applies to the third column (column3). Because its transformation type is an infoTypeTransformations object, Sensitive Data Protection applies the primitive transformation (specifically, a ReplaceWithInfoTypeConfig) to only the content that matches the infoType set in the inspection configuration.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your Google Cloud project ID. Project IDs are alphanumeric strings, like my-project.
HTTP method and URL:
POST https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:deidentify
Request JSON body:
{
  "item": {
    "table": {
      "headers": [
        { "name": "column1" },
        { "name": "column2" },
        { "name": "column3" }
      ],
      "rows": [
        {
          "values": [
            { "stringValue": "Example string 1" },
            { "stringValue": "Example string 2" },
            { "stringValue": "My email address is dani@example.org" }
          ]
        },
        {
          "values": [
            { "stringValue": "Example string 1" },
            { "stringValue": "Example string 3" },
            { "stringValue": "My email address is cruz@example.org" }
          ]
        }
      ]
    }
  },
  "deidentifyConfig": {
    "recordTransformations": {
      "fieldTransformations": [
        {
          "fields": [
            { "name": "column1" },
            { "name": "column2" }
          ],
          "primitiveTransformation": {
            "cryptoDeterministicConfig": {
              "cryptoKey": {
                "unwrapped": { "key": "YWJjZGVmZ2hpamtsbW5vcA==" }
              }
            }
          }
        },
        {
          "fields": [
            { "name": "column3" }
          ],
          "infoTypeTransformations": {
            "transformations": [
              {
                "primitiveTransformation": {
                  "replaceWithInfoTypeConfig": {}
                }
              }
            ]
          }
        }
      ]
    }
  },
  "inspectConfig": {
    "infoTypes": [
      { "name": "EMAIL_ADDRESS" }
    ]
  }
}
You should receive a JSON response similar to the following:
{
  "item": {
    "table": {
      "headers": [
        { "name": "column1" },
        { "name": "column2" },
        { "name": "column3" }
      ],
      "rows": [
        {
          "values": [
            { "stringValue": "AWttmGlln6Z2MFOMqcOzDdNJS52XFxOOZsg0ckDeZzfc" },
            { "stringValue": "AUBTE+sQB6eKZ5iD3Y0Ss682zANXbijuFl9KL9ExVOTF" },
            { "stringValue": "My email address is [EMAIL_ADDRESS]" }
          ]
        },
        {
          "values": [
            { "stringValue": "AWttmGlln6Z2MFOMqcOzDdNJS52XFxOOZsg0ckDeZzfc" },
            { "stringValue": "AU+oD2pnqUDTLNItE8RplY3E0fTHeO4rZkX4GeFHN2CI" },
            { "stringValue": "My email address is [EMAIL_ADDRESS]" }
          ]
        }
      ]
    }
  },
  "overview": {
    "transformedBytes": "96",
    "transformationSummaries": [
      {
        "field": { "name": "column1" },
        "results": [
          { "count": "2", "code": "SUCCESS" }
        ],
        "fieldTransformations": [
          {
            "fields": [
              { "name": "column1" },
              { "name": "column2" }
            ],
            "primitiveTransformation": {
              "cryptoDeterministicConfig": {
                "cryptoKey": {
                  "unwrapped": { "key": "YWJjZGVmZ2hpamtsbW5vcA==" }
                }
              }
            }
          }
        ],
        "transformedBytes": "32"
      },
      {
        "field": { "name": "column2" },
        "results": [
          { "count": "2", "code": "SUCCESS" }
        ],
        "fieldTransformations": [
          {
            "fields": [
              { "name": "column1" },
              { "name": "column2" }
            ],
            "primitiveTransformation": {
              "cryptoDeterministicConfig": {
                "cryptoKey": {
                  "unwrapped": { "key": "YWJjZGVmZ2hpamtsbW5vcA==" }
                }
              }
            }
          }
        ],
        "transformedBytes": "32"
      },
      {
        "infoType": {
          "name": "EMAIL_ADDRESS",
          "sensitivityScore": { "score": "SENSITIVITY_MODERATE" }
        },
        "field": { "name": "column3" },
        "results": [
          { "count": "2", "code": "SUCCESS" }
        ],
        "fieldTransformations": [
          {
            "fields": [
              { "name": "column3" }
            ],
            "infoTypeTransformations": {
              "transformations": [
                {
                  "primitiveTransformation": {
                    "replaceWithInfoTypeConfig": {}
                  }
                }
              ]
            }
          }
        ],
        "transformedBytes": "32"
      }
    ]
  }
}
Record suppressions
In addition to applying transformations to field data, you can also instruct Sensitive Data Protection to de-identify data by simply suppressing records when certain suppression conditions evaluate to true. You can apply both field transformations and record suppressions in the same request.
You set the recordSuppressions message of the RecordTransformations object to an array of one or more RecordSuppression objects.
Each RecordSuppression object contains a single RecordCondition object, which in turn contains a single Expressions object.
An Expressions object contains:
- logicalOperator: One of the LogicalOperator enumerated types.
- conditions: A Conditions object, containing an array of one or more Condition objects. A Condition is a comparison of a field value and another value, both of which must be of type string, boolean, integer, double, Timestamp, or TimeOfDay.
If the comparison evaluates to true, the record is suppressed, and vice-versa. If the compared values are not the same type, a warning is given and the condition evaluates to false.
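The suppression logic can be pictured with a small local model. The Python below is a sketch, not the service's evaluator: it combines conditions with AND, treats a type mismatch as false (as described above), and uses illustrative operator names and hypothetical helpers (condition_holds, suppress_records).

```python
import operator

# Illustrative comparison operators; the names are assumptions for this sketch.
OPERATORS = {
    "EQUAL_TO": operator.eq,
    "NOT_EQUAL_TO": operator.ne,
    "GREATER_THAN": operator.gt,
    "LESS_THAN": operator.lt,
}


def condition_holds(record, condition):
    """Evaluate one field-versus-value comparison for a record."""
    field_value = record[condition["field"]]
    target = condition["value"]
    if type(field_value) is not type(target):
        return False  # mismatched types evaluate to false
    return OPERATORS[condition["operator"]](field_value, target)


def suppress_records(records, conditions):
    """Drop records for which every condition holds (logical AND)."""
    return [r for r in records
            if not all(condition_holds(r, c) for c in conditions)]


records = [
    {"name": "A", "age": 91},
    {"name": "B", "age": 42},
    {"name": "C", "age": "unknown"},
]
rule = [{"field": "age", "operator": "GREATER_THAN", "value": 89}]
print([r["name"] for r in suppress_records(records, rule)])  # ['B', 'C']
```

Record C survives because its age is a string, so the comparison against an integer evaluates to false rather than raising an error.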
Reversible transformations
When you de-identify data using the CryptoReplaceFfxFpeConfig or CryptoDeterministicConfig infoType transformations, you can re-identify that data, as long as you have the CryptoKey used to originally de-identify the data. For more information, see Crypto-based tokenization transformations.
Limit on the number of findings
If your request has more than 3,000 findings, Sensitive Data Protection returns the following message:
Too many findings to de-identify. Retry with a smaller request.
The list of findings that Sensitive Data Protection returns is an arbitrary subset of all findings in the request. To get all of the findings, break up your request into smaller batches.
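One way to break up a large request is to split the input rows into fixed-size batches and send one request per batch. The helper below is a minimal sketch; split_into_batches is a hypothetical name, and a suitable batch size is an assumption that depends on how many findings each row tends to produce.

```python
def split_into_batches(rows, batch_size):
    """Split rows into consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


rows = [f"row {n}" for n in range(10)]
batches = split_into_batches(rows, 4)
print([len(batch) for batch in batches])  # [4, 4, 2]
```

Each batch would then become the rows of its own table item in a separate de-identification request.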
What's next
- Learn more about how a de-identification workflow fits into real-life deployments.
- Work through the Redacting Sensitive Data with Sensitive Data Protection codelab.
- Work through an example that demonstrates how to create a wrapped key, tokenize content, and re-identify tokenized content.
- Learn more about creating a de-identified copy of data in storage.