REMOVESYMBOLS Function

Removes from a string all characters that are not alpha-numeric, accented Latin characters, or whitespace.

NOTE: Non-Latin letters are also removed.

Tip: This function also removes common punctuation, such as the following:

. , ! & ?

To preserve these characters, you might replace them with an alphanumeric text string. For example, the question mark might be replaced by:

zzQUESTIONMARKzz

After the function has been applied, you can replace these strings with the original values.

Basic Usage

Column reference example:

derive value:REMOVESYMBOLS(MyName)

Output: The value of the MyName column value with all non-alphanumeric characters removed.

String literal example:

derive value:REMOVESYMBOLS('Héllõ, Wõrlds!?!?')

Output: The string Héllõ Wõrlds is written to the new column.

Wildcard example:

set col:* value:REMOVESYMBOLS($col)

Output: Strips all non-alphanumeric or space characters from all columns in the dataset.

Syntax

derive value:REMOVESYMBOLS(column_string)

ArgumentRequired?Data TypeDescription
column_stringYstringName of the column or string literal to be applied to the function

For more information on syntax standards, see Language Documentation Syntax Notes.

column_string

Name of the column or string constant to be trimmed of symbols.

  • Missing string or column values generate missing string results.
  • String constants must be quoted ('Hello, World').
  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal or column referencemyColumn

Examples

Example - String cleanup functions together

The following example demonstrates functions that can be used to clean up strings. These functions include the following:

Source:

In the following (space) and (tab) indicate space keys and tabs, respectively. Carriage return and newline characters are also supported by whitespace functions.

Stringssource
String01this source(space)(space)
String02(tab)(tab)this source
String03(tab)(tab)this source(space)(space)
String04this source's?
String05Why, you @#$%^&*()!
String06this söurce
String07(space)this söurce
String08à mañana

Transform:

The following transforms generate new columns using each of the string cleanup functions:

derive value: TRIM(source) as: 'trim_source'

derive value: REMOVEWHITESPACE(source) as: 'removewhitespace_source'

derive value: REMOVESYMBOLS(source) as: 'removesymbols_source'

Results:

Stringssourceremovesymbols_sourceremovewhitespace_sourcetrim_source
String01this source(space)(space) this source(space)(space) thissourcethis source
String02(tab)(tab)this source(tab)(tab)this sourcethissourcethis source
String03(tab)(tab)this source(space)(space)(tab)(tab)this source(space)(space)thissourcethis source
String04this source's?this sourcesthissource's?this source's?
String05Why, you @#$%^&*()!Why you Why,you@#$%^&*()!Why, you @#$%^&*()!
String06this söurcethis surcethissöurcethis söurce
String07(space)this söurce(space)this surcethissöurcethis söurce
String08à mañana maanaà ma ñanaà ma ñana

Was this page helpful? Let us know how we did:

Send feedback about...

Google Cloud Dataprep Documentation