Date shifting

Dates are a common and useful type of data. In some cases, however, dates need to be generalized, obfuscated, or redacted. One method for doing this is generalization, or bucketing. Depending on the use case and configuration, though, bucketing can remove the utility of the dates. For example, if you generalize all dates to just a year, you can lose the order in which events happened within that year. An alternative method for obfuscating dates that addresses this problem is date shifting.

Date shifting techniques randomly shift a set of dates but preserve the sequence of events and the duration between them. Shifting dates is usually done in the context of an individual or an entity. That is, you shift all of the dates for a specific individual using the same shift differential, but use a separate shift differential for each other individual.
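As a minimal sketch of the idea (not Cloud DLP's implementation), the following Python picks one random offset per entity and applies it to every date for that entity. The function name `shift_dates_per_entity`, the tuple layout of the rows, and the default bounds of -100 and 100 days are assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def shift_dates_per_entity(rows, lower_bound_days=-100, upper_bound_days=100, seed=None):
    """Shift each row's date by a random offset that is chosen once per entity.

    rows is an iterable of (entity_id, date, payload) tuples.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    offsets = {}  # entity_id -> shift in days, reused for every row of that entity
    shifted = []
    for entity_id, day, payload in rows:
        if entity_id not in offsets:
            # randint is inclusive on both ends
            offsets[entity_id] = rng.randint(lower_bound_days, upper_bound_days)
        shifted.append((entity_id, day + timedelta(days=offsets[entity_id]), payload))
    return shifted


rows = [
    (1, date(2009, 6, 9), "run"),
    (1, date(2009, 6, 3), "walk"),
    (1, date(2009, 5, 23), "crawl"),
    (2, date(2010, 11, 3), "crawl"),
    (2, date(2010, 11, 22), "walk"),
]
shifted = shift_dates_per_entity(rows, seed=42)
for row in shifted:
    print(row)
```

Because every date for a given entity moves by the same number of days, the gap between any two of that entity's events is unchanged, while the absolute dates no longer match the originals.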

Date shifting example

Consider the following data:

user_id  date        action
1        2009-06-09  run
1        2009-06-03  walk
1        2009-05-23  crawl
2        2010-11-03  crawl
2        2010-11-22  walk
...      ...         ...

If we generalized these dates to the year, we would get:

user_id  date_year  action
1        2009       run
1        2009       walk
1        2009       crawl
2        2010       crawl
2        2010       walk
...      ...        ...

Now we have lost any sense of the sequence per user.
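The loss of ordering can be checked with a short sketch. It assumes the rows for user_id 1 are held as Python tuples (an illustrative layout, not a Cloud DLP format) and counts how many distinct values remain before and after generalizing to the year.

```python
from datetime import date

# Rows for user_id 1 from the example data.
events = [
    (1, date(2009, 6, 9), "run"),
    (1, date(2009, 6, 3), "walk"),
    (1, date(2009, 5, 23), "crawl"),
]

# Generalize each date to its year only.
generalized = [(user_id, day.year, action) for user_id, day, action in events]

distinct_dates = {day for _, day, _ in events}
distinct_years = {year for _, year, _ in generalized}

print(len(distinct_dates), len(distinct_years))  # 3 1
```

With only one distinct value left, sorting by the generalized field can no longer tell which action came first.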

Instead we'll try date shifting:

user_id  date        action
1        2009-07-17  run
1        2009-07-11  walk
1        2009-06-30  crawl
2        2011-01-26  crawl
2        2011-02-14  walk
...      ...         ...

Note how the dates are different but the sequence and duration are preserved. The size of the shift also differs between users: the dates for user_id 1 were shifted by 38 days, and the dates for user_id 2 by 84 days.
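The following sketch checks this claim against the two tables above. It computes the offset between each original date and its shifted counterpart and confirms that the offset is constant within each user. The helper name `shift_in_days` is hypothetical.

```python
from datetime import date

# Dates per user_id, in the same row order as the tables above.
original = {
    1: [date(2009, 6, 9), date(2009, 6, 3), date(2009, 5, 23)],
    2: [date(2010, 11, 3), date(2010, 11, 22)],
}
shifted = {
    1: [date(2009, 7, 17), date(2009, 7, 11), date(2009, 6, 30)],
    2: [date(2011, 1, 26), date(2011, 2, 14)],
}


def shift_in_days(before, after):
    """Return the single offset (in days) applied to every date, or raise if it varies."""
    offsets = {(new - old).days for old, new in zip(before, after)}
    if len(offsets) != 1:
        raise ValueError(f"dates were not shifted uniformly: {sorted(offsets)}")
    return offsets.pop()


offsets = {user_id: shift_in_days(original[user_id], shifted[user_id]) for user_id in original}
print(offsets)  # {1: 38, 2: 84}
```

A uniform offset per user is what keeps both the order of events and the number of days between them intact.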

Date shifting in Cloud DLP

The configuration to do this using the Cloud DLP API would look something like the following:

deidentify_config {
  record_transformations {
    field_transformations {
      fields {
        name: "date"
      }
      primitive_transformation {
        date_shift_config {
          upper_bound_days: 100
          lower_bound_days: -100
          entity_field_id {
            name: "user_id"
          }
          crypto_key {
            unwrapped {
              key: "123456789012345678901234567890ab"
            }
          }
        }
      }
    }
  }
}

The upper and lower bounds of the shift are specified by the upper_bound_days and lower_bound_days values, respectively. The context or scope that the shift applies to is based on the entity_field_id value, which in this example is the user_id field.

Note the use of a crypto_key as well. This is similar to how it's used in pseudonymization. The key lets you maintain the integrity of these date shifts across multiple requests or data runs.
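To illustrate why a key helps keep shifts consistent across runs, here is a minimal sketch that derives each entity's offset from a keyed hash instead of a fresh random draw. This is not Cloud DLP's actual algorithm: the use of HMAC-SHA256, the modulo mapping into the bounds (which introduces a slight bias), and the names `keyed_offset_days` and `shift_date` are all assumptions for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta


def keyed_offset_days(key: bytes, entity_id: str, lower_bound_days: int, upper_bound_days: int) -> int:
    """Derive a repeatable shift for an entity from a secret key (illustrative only)."""
    digest = hmac.new(key, entity_id.encode("utf-8"), hashlib.sha256).digest()
    span = upper_bound_days - lower_bound_days + 1
    # Map the first 8 bytes of the digest into [lower_bound_days, upper_bound_days].
    return lower_bound_days + int.from_bytes(digest[:8], "big") % span


def shift_date(key: bytes, entity_id: str, day: date, lower: int = -100, upper: int = 100) -> date:
    """Shift one date by the entity's key-derived offset."""
    return day + timedelta(days=keyed_offset_days(key, entity_id, lower, upper))


key = b"123456789012345678901234567890ab"  # sample key from the configuration above

first_run = shift_date(key, "1", date(2009, 6, 9))
second_run = shift_date(key, "1", date(2009, 6, 9))
print(first_run == second_run)  # True
```

In this sketch, the same key and entity always produce the same offset, so data processed in separate requests stays aligned, while a different key generally produces a different set of shifts.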

Resources

For more information about how to de-identify data using date shifting and other methods in Cloud DLP, see:

For API reference information about primitive transformations in the Cloud DLP API, see:
