Your dataset contains master product data with product properties stored in two arrays of keys and values.

S001 | Shirts | Crew Neck T-Shirt | ["type", "color", "fabric", "sizes"] | ["crew","blue","cotton","S,M,L","in stock","padded"]
S002 | Shirts | V-Neck T-Shirt | ["type", "color", "fabric", "sizes"] | ["v-neck","white","blend","S,M,L,XL","in stock","discount - seasonal"]
S003 | Shirts | Tanktop | ["type", "color", "fabric", "sizes"] | ["tank","red","mesh","XS,S,M","discount - clearance","in stock"]
S004 | Shirts | Turtleneck | ["type", "color", "fabric", "sizes"] | ["turtle","black","cotton","M,L,XL","out of stock","padded"]


When the above data is loaded into the Transformer page, you might need to clean up the two array columns.

The following transform pairs each element of the first array, as a key, with the element at the same position in the second array, as its value. You might notice that the second array contains more elements than the first. The extra values in the second array, which have no matching key, are grouped under the default key ProdMiscProperties:

derive value: ARRAYSTOMAP(ProdProperties, ProdValues, 'ProdMiscProperties') as: 'prodPropertyMap'
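The pairing behavior described above can be sketched in Python. This is only an illustration of the semantics implied by this example, not how Dataprep implements ARRAYSTOMAP. The function name `arrays_to_map` is hypothetical, and ignoring surplus keys (more keys than values) is an assumption that this example does not cover.

```python
def arrays_to_map(keys, values, default_key):
    """Pair keys with values by position (illustrative sketch only).

    Each value is stored in a list under its key. Values that have no
    key at the same position are collected under default_key.
    Assumption: keys without a matching value are simply ignored.
    """
    result = {}
    for position, value in enumerate(values):
        key = keys[position] if position < len(keys) else default_key
        result.setdefault(key, []).append(value)
    return result


# First row of the example data (S001).
keys = ["type", "color", "fabric", "sizes"]
values = ["crew", "blue", "cotton", "S,M,L", "in stock", "padded"]

prod_property_map = arrays_to_map(keys, values, "ProdMiscProperties")
print(prod_property_map)
```

For the S001 row, the two trailing values "in stock" and "padded" have no key at their positions, so both end up in the list stored under ProdMiscProperties.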

You can now use the following steps to replace the original keys array with a new version generated from the map, which also includes the default key:

drop col: ProdKeys

derive value: KEYS(prodPropertyMap) as: 'ProdKeys'
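As a rough Python analogue of the key-extraction step, the keys of a dictionary can be listed in insertion order. This is a sketch for illustration, assuming that KEYS returns the map's keys in the order in which they were created. The variable names are hypothetical.

```python
# Map for the first row (S001), as produced by the ARRAYSTOMAP step.
prod_property_map = {
    "type": ["crew"],
    "color": ["blue"],
    "fabric": ["cotton"],
    "sizes": ["S,M,L"],
    "ProdMiscProperties": ["in stock", "padded"],
}

# Python dictionaries preserve insertion order (Python 3.7+), so the
# default key appears last, after the four original keys.
prod_keys = list(prod_property_map)
print(prod_keys)
# ['type', 'color', 'fabric', 'sizes', 'ProdMiscProperties']
```

Unlike the original keys array, the regenerated list contains the default key, so every element of the values array is accounted for by some key.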


S001 | Shirts | Crew Neck T-Shirt | ["type", "color", "fabric", "sizes","ProdMiscProperties"] | ["crew","blue","cotton","S,M,L","in stock","padded"] | { "type": [ "crew" ], "color": [ "blue" ], "fabric": [ "cotton" ], "sizes": [ "S,M,L" ], "ProdMiscProperties": [ "in stock", "padded" ] }
S002 | Shirts | V-Neck T-Shirt | ["type", "color", "fabric", "sizes","ProdMiscProperties"] | ["v-neck","white","blend","S,M,L,XL","in stock","discount - seasonal"] | { "type": [ "v-neck" ], "color": [ "white" ], "fabric": [ "blend" ], "sizes": [ "S,M,L,XL" ], "ProdMiscProperties": [ "in stock", "discount - seasonal" ] }
S003 | Shirts | Tanktop | ["type", "color", "fabric", "sizes","ProdMiscProperties"] | ["tank","red","mesh","XS,S,M","discount - clearance","in stock"] | { "type": [ "tank" ], "color": [ "red" ], "fabric": [ "mesh" ], "sizes": [ "XS,S,M" ], "ProdMiscProperties": [ "discount - clearance", "in stock" ] }
S004 | Shirts | Turtleneck | ["type", "color", "fabric", "sizes","ProdMiscProperties"] | ["turtle","black","cotton","M,L,XL","out of stock","padded"] | { "type": [ "turtle" ], "color": [ "black" ], "fabric": [ "cotton" ], "sizes": [ "M,L,XL" ], "ProdMiscProperties": [ "out of stock", "padded" ] }
