EXAMPLE - String Comparison Functions

The following example demonstrates functions that can be used to compare two strings. These functions include the following:

  • STRINGGREATERTHAN - Evaluates to true if the first string is greater than the second string. See STRINGGREATERTHAN Function.
  • STRINGGREATERTHANEQUAL - Evaluates to true if the first string is greater than or equal to the second string. See STRINGGREATERTHANEQUAL Function.
  • STRINGLESSTHAN - Evaluates to true if the first string is less than the second string. See STRINGLESSTHAN Function.
  • STRINGLESSTHANEQUAL - Evaluates to true if the first string is less than or equal to the second string. See STRINGLESSTHANEQUAL Function.
  • EXACT - Evaluates to true if the first string is an exact match with the second string. See EXACT Function.
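The relationship between the five results can be sketched outside of the product. The following Python snippet is a minimal illustration, not the product's implementation: compare_strings and its key parameter are hypothetical names, and the default ordering is Python's own code-point ordering, which is an assumption for illustration only and differs from the ordering shown in this example (for instance, in how uppercase and lowercase letters rank).

```python
from typing import Any, Callable, Dict


def compare_strings(
    a: str, b: str, key: Callable[[str], Any] = lambda s: s
) -> Dict[str, bool]:
    """Return the five comparison results for one pair of strings.

    `key` maps a string to a sortable value. The default (Python's
    code-point ordering) is an assumption for illustration and is not
    claimed to match the product's ordering.
    """
    ka, kb = key(a), key(b)
    return {
        "greaterThan": ka > kb,
        "greaterThanEqual": ka >= kb,
        "lessThan": ka < kb,
        "lessThanEqual": ka <= kb,
        # EXACT is a plain equality test on the original strings.
        "exactEqual": a == b,
    }


# Two pairs whose results do not depend on case or punctuation rules.
same = compare_strings("a", "a")
diff = compare_strings("a", "b")
```

For equal strings, only the "or equal" results and exactEqual are true. Supplying a different key changes the ordering without changing how the five results relate to one another.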

Source:

The following table contains some example strings to be compared.

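| rowId | stringA | stringB |
|-------|---------|---------|
| 1     | a       | a       |
| 2     | a       | A       |
| 3     | a       | b       |
| 4     | a       | 1       |
| 5     | a       | ;       |
| 6     | ;       | 1       |
| 7     | a       |  a      |
| 8     | a       | aa      |
| 9     | abc     | x       |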

Note that in row #7, stringB begins with a space character.

Transform:

For each pair of strings, the following functions are applied to generate a new column containing the results of the comparison.

derive type:single value: STRINGGREATERTHAN(stringA,stringB) as: 'greaterThan'

derive type:single value: STRINGGREATERTHANEQUAL(stringA,stringB) as: 'greaterThanEqual'

derive type:single value: STRINGLESSTHAN(stringA,stringB) as: 'lessThan'

derive type:single value: STRINGLESSTHANEQUAL(stringA,stringB) as: 'lessThanEqual'

derive type:single value: EXACT(stringA,stringB) as: 'exactEqual'

Results:

In the following table, the Notes column has been added manually.

| rowId | stringA | stringB | lessThanEqual | lessThan | greaterThanEqual | greaterThan | exactEqual | Notes |
|-------|---------|---------|---------------|----------|------------------|-------------|------------|-------|
| 1 | a | a | true | false | true | false | true | The strings are equal, so only the "or equal" functions (STRINGLESSTHANEQUAL, STRINGGREATERTHANEQUAL) and EXACT evaluate to true. |
| 2 | a | A | true | true | false | false | false | Comparisons are case-sensitive. Uppercase letters are greater than lowercase letters. |
| 3 | a | b | true | true | false | false | false | Letters later in the alphabet (b) are greater than earlier letters (a). |
| 4 | a | 1 | false | false | true | true | false | Letters (a) are greater than digits (1). |
| 5 | a | ; | false | false | true | true | false | Letters (a) are greater than non-alphanumerics (;). |
| 6 | ; | 1 | true | true | false | false | false | Digits (1) are greater than non-alphanumerics (;). Therefore, these characters evaluate in the following order, from greatest to least: A > a > 1 > ; |
| 7 | a |  a | false | false | true | true | false | stringB begins with a space. Letters (and any other non-whitespace character) are greater than space characters. |
| 8 | a | aa | true | true | false | false | false | The second string is greater, since it contains one additional character at the end. |
| 9 | abc | x | true | true | false | false | false | The second string is greater, since its first letter is greater than the first letter of the first string. |
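
The ordering implied by these results can be approximated in a few lines of Python. This is a sketch fitted to the nine rows above, not the product's comparison logic: char_rank, sort_key, and ROWS are hypothetical names, and the ranking of characters not covered by the example (for instance, b versus A) is an assumption.

```python
def char_rank(ch: str) -> tuple:
    """Rank one character: space < other non-alphanumerics < digits < letters.

    Letters are ordered alphabetically first, with uppercase placed after
    lowercase (a < A < b < B). This ranking is an assumption fitted to the
    example rows, not a documented collation.
    """
    if ch == " ":
        return (0, ch, 0)
    if ch.isdigit():
        return (2, ch, 0)
    if ch.isalpha():
        return (3, ch.lower(), 1 if ch.isupper() else 0)
    return (1, ch, 0)


def sort_key(s: str) -> list:
    """Rank a string character by character; a strict prefix sorts first."""
    return [char_rank(ch) for ch in s]


# (stringA, stringB) pairs from the Source table, rows 1 through 9.
ROWS = [("a", "a"), ("a", "A"), ("a", "b"), ("a", "1"), ("a", ";"),
        (";", "1"), ("a", " a"), ("a", "aa"), ("abc", "x")]

# Columns: rowId, lessThanEqual, lessThan, greaterThanEqual, greaterThan, exactEqual
for row_id, (a, b) in enumerate(ROWS, start=1):
    ka, kb = sort_key(a), sort_key(b)
    print(row_id, ka <= kb, ka < kb, ka >= kb, ka > kb, a == b)
```

Because Python compares lists element by element and treats a shorter list as smaller when it is a prefix of the longer one, row 8 (a versus aa) needs no special handling.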