SUFFIX Function

Finds the suffix value after the domain from a valid URL. Input values must be of URL or String type.This function is part of a set of functions for processing URL data.

Basic Usage

URL literal examples:

derive value: SUFFIX('http://www.example.com' ) as: 'myDomain'

Output: Generates a column containing the value com.

derive value: SUFFIX('http://www.exampl.e.com' ) as: 'myDomain'

Output: Generates a column containing the value com.

Column reference example:

derive value: SUFFIX(myURLs) as: 'myDomain'

Output: Generates the new myDomain column containing the suffix values extracted from the myURLs column.

Syntax

derive value: SUFFIX(column_url)

ArgumentRequired?Data TypeDescription
column_urlYstringName of column or String or URL literal containing the suffix value to extract

For more information on syntax standards, see Language Documentation Syntax Notes.

column_url

Name of the column or URL or String literal whose values are used to extract the suffix value.

  • Missing input values generate missing results.
  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString literal or column reference (URL)http://www.example.com

Examples

Example - Domain, Subdomain, and Suffix functions

This examples illustrates how you can extract component parts of a URL using the following functions:

  • DOMAIN - extracts the domain value from a URL. See DOMAIN Function.
  • SUBDOMAIN - extracts the first group after the protocol identifier and before the domain value. See SUBDOMAIN Function.
  • SUFFIX - extracts the suffix of a URL. See SUFFIX Function.

Source:

Your dataset includes the following values for URLs:

URL
www.example.com
example.com/support
http://www.example.com/products/
http://1.2.3.4
https://www.example.com/free-download
https://www.example.com/about-us/careers
www.app.example.com
www.some.app.example.com
some.app.example.com
some.example.com
example.com

Transform:

When the above data is imported into the application, the column is recognized as a URL. All values are registered as valid, even the IPv4 address.

To extract the domain and subdomain values:

derive value: DOMAIN(URL) as: 'domain_URL'

derive value: SUBDOMAIN(URL) as:'subdomain_URL'

derive value: SUFFIX(URL) as:'suffix_URL'

You can use the Cloud Dataprep pattern in the following transform to extract protocol identifiers, if present, into a new column:

extract col:URL on:`{start}%*://`

To clean this up, you might want to rename the column to protocol_URL.

To extract the path values, you can use the following regular expression:

NOTE: Regular expressions are considered a developer-level method for pattern matching. Please use them with caution. See Text Matching.

extract col: URL on: /[^*:\/\/]\/.*$/

The above transform grabs a little too much of the URL. If you rename the column to path_URL, you can use the following regular expression to clean it up:

extract col:path_URL on:/[!^\/].*$/


If you drop the path_URL column and rename the path_URL1 column to the dropped one, you should have the following results:

Results:

URLpath_URLprotocol_URLsubdomain_URLdomain_URLsuffix_URL
www.example.com wwwexamplecom
example.com/support/support examplecom
http://www.example.com/products//products/http://wwwexamplecom
http://1.2.3.4 http://
https://www.example.com/free-download/free-downloadhttps://wwwexamplecom
https://www.example.com/about-us/careers/about-us/careershttps://wwwexamplecom
www.app.example.com www.appexamplecom
www.some.app.example.com www.some.appexamplecom
some.app.example.com some.appexamplecom
some.example.com someexamplecom
example.com examplecom

Was this page helpful? Let us know how we did:

Send feedback about...

Google Cloud Dataprep Documentation