Track Column Changes

The Cloud Dataprep application enables you to easily move between steps in your transform recipe so that you can check the state of your dataset at any point during the transformation. In some cases, you may want to be able to track the changes made to an individual column side-by-side with the original column. This section provides a generalized approach for tracking column changes in this manner.

NOTE: Use this workflow only if it is important to monitor which values have changed in a column. For most use cases, the Transformer page provides sufficient visibility over your sample data to manage column values.

Steps:

In the following sequence, the original column is called String. For numeric columns, you can perform more detailed analysis between original and modified column values.

  1. After you have completed the general setup steps of your recipe, create a copy of the original column:

    derive value:String as:'String_orig'

  2. You now have a copy of the original column before any manipulations were applied to it.
  3. Add any transforms to your recipe, including any that change the values of String. In the example below, the following transform has been applied:

    set col:String value:TRIM(String)

  4. At the point in your recipe where you would like to test the column for changes, insert the following:

    derive value:(String != String_orig) as:'String_changes'

  5. The String_changes column now contains true for each row where the value in String differs from its original value in String_orig.
  6. Before you run your recipe, you may want to remove the tracking columns that you generated (String_orig and String_changes in our example).
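
The logic of the steps above can be sketched outside the application in plain Python. This is only an illustration under stated assumptions, not how the product implements the transforms: the sample rows are invented, each row is modeled as a dictionary, and Python's str.strip() stands in for TRIM.

```python
# Plain-Python sketch of the tracking workflow.
# Assumption: these invented rows stand in for a sampled dataset.
rows = [
    {"String": "  alpha "},
    {"String": "beta"},
    {"String": "gamma  "},
]

for row in rows:
    # Step 1: copy the original value before any other transforms run.
    row["String_orig"] = row["String"]

for row in rows:
    # Step 3: the example transform. Assumption: str.strip() stands in for TRIM.
    row["String"] = row["String"].strip()

for row in rows:
    # Step 4: flag rows whose current value differs from the saved original.
    row["String_changes"] = row["String"] != row["String_orig"]

# Original values of the rows that were flagged as changed.
changed = [row["String_orig"] for row in rows if row["String_changes"]]

# Step 6: keep only the working column, discarding both tracking columns.
final_rows = [{"String": row["String"]} for row in rows]
```

Because the copy is taken before any other transform runs, the comparison always measures the cumulative effect of every step between the copy and the check, not just the most recent one.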


Figure: Example tracking column changes
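
For a numeric column, the comparison can go beyond a true/false flag, for example by computing how far each value moved from its original. The plain-Python sketch below illustrates that idea only. The column names Amount, Amount_orig, and Amount_delta and the sample values are hypothetical and do not come from the application.

```python
# Assumption: a hypothetical numeric column "Amount" with its saved copy "Amount_orig".
rows = [
    {"Amount_orig": 10.0, "Amount": 10.0},
    {"Amount_orig": 12.5, "Amount": 13.0},
    {"Amount_orig": 8.0, "Amount": 6.0},
]

for row in rows:
    # Signed difference: a positive value means the number increased.
    row["Amount_delta"] = row["Amount"] - row["Amount_orig"]

# Rows whose value changed at all.
changed = [row for row in rows if row["Amount_delta"] != 0]

# Size of the largest change in either direction.
largest_change = max(abs(row["Amount_delta"]) for row in rows)
```

A signed difference keeps the direction of each change, which a simple changed/unchanged flag discards.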
