Date shifting

Dates are a very common type of data. In cases where dates can be considered sensitive data or personally identifiable information (PII), you may need to generalize, obfuscate, or redact them.

One method for doing this is generalization, or bucketing. Depending on the use case and configuration, though, bucketing can reduce the utility of the dates. For example, if you generalize all dates to just a year, then you lose the order in which events happened within that year. An alternative method for obfuscating dates that addresses this problem is date shifting.

Date shifting techniques randomly shift a set of dates but preserve the sequence of events and the durations between them. Shifting dates is usually done in the context of an individual or an entity. That is, each individual's dates are shifted by an amount of time that is unique to that individual.

Date shifting example

Consider the following data:

user_id  date        action
1        2009-06-09  run
1        2009-06-03  walk
1        2009-05-23  crawl
2        2010-11-03  crawl
2        2010-11-22  walk
...      ...         ...

If you generalize these dates to year, then you get:

user_id  date_year  action
1        2009       run
1        2009       walk
1        2009       crawl
2        2010       crawl
2        2010       walk
...      ...        ...

But now you've lost any sense of the sequence per user.
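The loss of ordering can be seen in a minimal Python sketch. It uses user 1's rows from the table above; the function name generalize_to_year is hypothetical and only illustrates the idea, not how Cloud DLP implements bucketing.

```python
from datetime import date

# User 1's rows from the example table: (user_id, date, action).
rows = [
    (1, date(2009, 6, 9), "run"),
    (1, date(2009, 6, 3), "walk"),
    (1, date(2009, 5, 23), "crawl"),
]


def generalize_to_year(rows):
    """Replace each date with its year only (hypothetical helper)."""
    return [(user_id, d.year, action) for user_id, d, action in rows]


bucketed = generalize_to_year(rows)

# Every row for this user now carries the same value, so the rows
# can no longer be ordered by date.
distinct_years = {year for _, year, _ in bucketed}
```

Because distinct_years contains a single value for this user, sorting by the generalized column cannot recover that crawl came before walk, or walk before run.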

Instead, try date shifting:

user_id  date        action
1        2009-07-17  run
1        2009-07-11  walk
1        2009-06-30  crawl
2        2011-01-26  crawl
2        2011-02-14  walk
...      ...         ...

Note how the dates are different but the sequence and the durations between events are preserved. The size of the shift differs between the two users: the dates for user_id 1 moved forward by 38 days, and the dates for user_id 2 moved forward by 84 days.
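As a rough illustration of the technique, the following Python sketch picks one random offset per user_id and applies it to all of that user's dates. The function shift_dates and its parameters are hypothetical names chosen for this example, and the default bounds of -100 and 100 days are assumptions; the sketch does not reflect Cloud DLP's internal implementation.

```python
import random
from datetime import date, timedelta

# Rows from the example table: (user_id, date, action).
rows = [
    (1, date(2009, 6, 9), "run"),
    (1, date(2009, 6, 3), "walk"),
    (1, date(2009, 5, 23), "crawl"),
    (2, date(2010, 11, 3), "crawl"),
    (2, date(2010, 11, 22), "walk"),
]


def shift_dates(rows, lower_bound_days=-100, upper_bound_days=100, rng=None):
    """Shift every date by a random offset chosen once per user_id."""
    rng = rng or random.Random()
    offsets = {}  # user_id -> number of days to shift
    shifted = []
    for user_id, d, action in rows:
        if user_id not in offsets:
            # randint includes both bounds.
            offsets[user_id] = rng.randint(lower_bound_days, upper_bound_days)
        shifted.append((user_id, d + timedelta(days=offsets[user_id]), action))
    return shifted


shifted = shift_dates(rows)

# The gap between any two events for the same user is unchanged.
gap_before = rows[0][1] - rows[1][1]
gap_after = shifted[0][1] - shifted[1][1]
```

Because the same offset is added to every date for a given user, gap_before and gap_after are always equal, which is why both sequence and duration survive the shift.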

Date shifting in Cloud DLP

The following configuration for Cloud DLP's content.deidentify method applies date shifting to the date field (shown here in protocol buffer text format):

deidentify_config {
  record_transformations {
    field_transformations {
      fields {
        name: "date"
      }
      primitive_transformation {
        date_shift_config {
          upper_bound_days: 100
          lower_bound_days: -100
          entity_field_id {
            name: "user_id"
          }
          crypto_key {
            unwrapped {
              key: "123456789012345678901234567890ab"
            }
          }
        }
      }
    }
  }
}

The upper and lower bounds of the shift, in days, are specified by the upper_bound_days and lower_bound_days values, respectively. The context, or scope, within which a single shift applies is based on the entity_field_id value, which in this case is "user_id". Every date that belongs to the same user_id is therefore shifted by the same amount.

Note the use of a crypto_key as well. This is similar to how it's used in pseudonymization. The key lets you keep the date shifts consistent across multiple requests or data runs, so that the same entity receives the same shift each time.
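One way to see why a key makes shifts repeatable is the sketch below, which derives the offset from an HMAC of the entity ID instead of a fresh random draw. This is an assumption made for illustration only: the source does not describe how Cloud DLP computes the shift from the key, and keyed_shift_days is a hypothetical helper.

```python
import hashlib
import hmac


def keyed_shift_days(key, entity_id, lower_bound_days, upper_bound_days):
    """Map (key, entity_id) to a repeatable offset within the given bounds."""
    digest = hmac.new(key, entity_id.encode("utf-8"), hashlib.sha256).digest()
    span = upper_bound_days - lower_bound_days + 1
    # Reduce the first 8 bytes of the digest into the allowed range.
    return lower_bound_days + int.from_bytes(digest[:8], "big") % span


# Same placeholder key as in the configuration above.
key = b"123456789012345678901234567890ab"

first_run = keyed_shift_days(key, "1", -100, 100)
second_run = keyed_shift_days(key, "1", -100, 100)
```

With the same key and entity ID, first_run and second_run are identical, so data processed in separate requests stays aligned. Someone without the key cannot recompute the offset from the entity ID alone.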

Resources

For more information about how to de-identify data using date shifting and other methods in Cloud DLP, see:

For API reference information about primitive transformations in Cloud DLP, see:
