SUBSTITUTE Function

Replaces found string literal or pattern from a source with a string.

Input can be specified as a column reference, a function returning a string, or a string literal, although string literal usage is rare.

  • A column reference can refer to a column of String type.
  • If no match is found, the function returns the source string.
  • If multiple matches are found in a single string, all replacements are made.

Basic Usage

Column reference example:

set col:myURL value:SUBSTITUTE(myURL,`{ip-address}`,myDomain)

Output: Searches the myURL column values for sub-strings that match valid IP addresses. Where matches are found, they are replaced with the corresponding value in the myDomain column.

Function reference example:

set col:companyName value:SUBSTITUTE(UPPER(companyName),'ACME','New ACME')

Output: Searches the uppercase version of values from the companyName column for the string literal ACME. When found, these matches are replaced by New ACME in the companyName column.

Syntax

derive value:SUBSTITUTE(string_source,string_pattern,replacement_string)

ArgumentRequired?Data TypeDescription
string_sourceYstringName of the column, a function returning a string, or string literal to be applied to the function
string_patternYstringString literal or pattern to find
string_replacementYstringString literal to use as replacement
pattern_beforeNstringString literal or pattern to find before finding the string_pattern value.
pattern_afterNstringString literal or pattern to find after finding the string_pattern value.

For more information on syntax standards, see Language Documentation Syntax Notes.

string_source

Name of the item to be searched. Valid values can be:

  • String literals must be quoted ( 'Hello, World' ).
  • Column reference to any type that can be inferred as a string, which encompasses all values
  • Functions that return string values

Multiple values and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal or column reference (String, Array, or Object)myColumn

string_pattern

String literal or pattern to find. This value can be a string literal, a Cloud Dataprep pattern, or a regular expression.

  • String literals must be quoted ('Hello, World').
  • Multiple values and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal or pattern'Hello'

string_replacement

Value with which to replacement any matched patterns. Value can be a string, a function returning string values, or a column reference containing strings.

  • String literals must be quoted ('Hello, World').
  • column reference to any type that can be inferred as a string, which encompasses all values.

Multiple values and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal or column reference (String, Array, or Object)'##REDACTED##'

pattern_before

String literal or pattern to find in a position before the pattern to match.

Tip: Use this parameter if there are potentially multiple instances of the pattern to match in the source.

Usage Notes:

Required?Data TypeExample Value
NoString literal or pattern`{digit}{3}`

pattern_after

String literal or pattern to find in a position after the pattern to match.

Tip: Use this parameter if there are potentially multiple instances of the pattern to match in the source.

Usage Notes:

Required?Data TypeExample Value
NoString literal or pattern' '

Examples

Example - Partial obfuscation of credit card numbers

Source:

Suppose you have the following transactional data, which contains customer credit card numbers.

TransactionIdCreditCardNumAmtDollars
T0014111-1111-1111-1111100.29
T0025500-0000-0000-0004510.21
T0033400-0000-0000-009162.13
T0043000-0000-0000-04294.12

For security purposes, you wish to redact the first three sets of digits, so only the last set of digits appears.

Transform:

To make the substitution, you must first change the type of the column to be a string:

settype col: CreditCardNum type: 'String'

You can then use the following transform to perform the pattern-based replacement of four-digit sets that end in a dash with XXXX:

set col: CreditCardNum value: substitute(CreditCardNum, `{digit}+\-`, 'XXXX-')

To indicate that the column no longer contains valid information, you might choose to rename it like in the following:

rename mapping: [CreditCardNum,'CreditCardNumOBSCURED']

Results:

TransactionIdCreditCardNumOBSCUREDAmtDollars
T001XXXX-XXXX-XXXX-1111100.29
T002XXXX-XXXX-XXXX-0004510.21
T003XXXX-XXXX-XXXX-009162.13
T004XXXX-XXXX-XXXX-04294.12

Was this page helpful? Let us know how we did:

Send feedback about...

Google Cloud Dataprep Documentation