This section provides simple examples for how to use the IF* functions for data type validation. These functions include the following:
IFNULL- For an input expression or value, this function returns the specified value if the input is a null value. See IFNULL Function.
IFMISSING- Returns the specified value if the input value or expression is a missing value. See IFMISSING Function.
IFMISMATCHED- Returns the specified value if the input value or expression is mismatched against the column's data type. See IFMISMATCHED Function.
IFVALID- Returns the specified value if the input value or expression is valid against the column's data type. See IFVALID Function.
The following simple table lists zip codes by customer identifier:
When the above is imported into the Transformer page, you notice the following:
custZipcolumn is typed as Integer.
- There are two missing and two mismatched values in the
First, you test for valid values in the
custZip column. Using the
IFVALID function, you can validate against any data type:
Fix four-digit zips: In the
derive value:IFVALID(custZip, 'Zipcode', 'ok') as:'status'
statuscolumn are instances of
okfor the top four rows. You notice that the bottom two rows contain four-digit codes.
custZip values were originally imported as Integer, any leading
0 values are dropped. In this case, you can add back the leading zero. Before the previous step, change the data type of
zip to String and insert the following:
derive value:IF(LEN(custZip)==4,'0','') as:'FourDigitZip'
derive value: merge([FourDigitZip,custZip]) as:'custZip2'
set col:zip value:custZip2
Now, when you click the last recipe step, you should see that two more rows in
statusare listed as
For the zip code with the three-digit extension, you can simply remove that extension to make it valid. Click the step above the last one. In the data grid, highlight the value. Click the Replace suggestion card. Select the option that uses the following for the matching pattern:
The above means that all three-digit extensions are dropped from the zip. You can do the same for any two- and one-digit extensions, although there are none in this sample.
Missing and null values: Now, you need to address how to handle missing and null values. The
IFMISSING tests for both missing and null values, while the
IFNULL tests just for null values. In this example, you want to delete null values, which could mean that the data for that row is malformed and to write a status of
missing for missing values.
Click above the last line in the recipe to insert the following:
set col:custZip value:IFNULL(custZip, 'xxxxx')
Now, when you click the last line of the recipe, only the null value is listed as having a status other than
set col:custZip value:IFMISSING(custZip, '00000')
ok. You can use the following to remove this row and all like it:
delete row:(status == 'xxxxx')
As an exercise, you might repeat the above steps starting with the
IFMISMATCHED function determining the value in the
derive value:IFMISMATCHED(custZip, 'Zipcode', 'mismatched') as:'status'