Explode data from fields

This page explains how to separate data from a field (a cell) into multiple rows when you prepare data in the Wrangler workspace of the Cloud Data Fusion Studio.

Separate delimited text

You can separate the values from a cell into new rows if the values are separated by the following delimiters:

  • Comma
  • Tab
  • Pipe
  • Whitespace
  • Custom separator

If a cell doesn't contain the chosen delimiter, no new row is inserted.

To split values based on a delimiter, follow these steps:

  1. Go to Wrangler workspace in Cloud Data Fusion.
  2. On the Data tab, go to a column name and click the arrow_drop_down expander arrow.
  3. Click Explode > Delimited text.
  4. Choose a delimiter—for example Pipe.
  5. Click Extract.

Wrangler splits the fields based on the selected delimiter and adds the split-to-row directive to the recipe. When you run the data pipeline, Cloud Data Fusion applies the transformation to all values in the column.

In this example, a dataset has a column of string values containing the comma delimiter:

ID Name
1 Lee,Lucian,Luka
2 Mahan,Noam

To divide the value into separate rows, Wrangler deletes the original column and creates a new column with one row for each value. The other column values from the original row are copied into the new rows:

ID Name_1
1 Lee
1 Lucian
1 Luka
2 Mahan
2 Noam

Separate arrays

The flatten directive separates items in arrays, such as ["ELEMENT_1", "ELEMENT_2", "ELEMENT_3"], into new rows. The other column values from the original record are copied into the new records.

What's next