KTHLARGEST Function

Extracts the ranked value from the values in a column, where k=1 returns the maximum value. The value for k must be between 1 and 1000, inclusive.

For purposes of this calculation, two instances of the same value are treated as separate values. So, if your dataset contains three rows with column values 10, 9, and 9, then KTHLARGEST returns 9 for k=2 and k=3.

When used in a pivot transform, the function is computed for each instance of the value specified in the group parameter. See Pivot Transform.

Input column can be of Integer or Decimal type. Non-numeric data in the column is ignored. If a row contains a missing or null value, it is not factored into the calculation.

Basic Usage

pivot value:KTHLARGEST(myRating, 2) group:postal_code limit:1

Output: Generates a two-column table containing the unique values in the postal_code column and the second highest value from the myRating column for that postal_code value. The limit parameter defines the maximum number of output columns.

Syntax

pivot value:KTHLARGEST(function_col_ref, k_integer) [group:group_col_ref] [limit:limit_count]

ArgumentRequired?Data TypeDescription
function_col_refYstringName of column to which to apply the function
k_integerYinteger (positive)The ranking of the value to extract from the source column

For more information on the group and limit parameters, see Pivot Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

function_col_ref

Name of the column the values of which you want to calculate the mean. Column must contain Integer or Decimal values.

  • Literal values are not supported as inputs.
  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString (column reference)myValues

k_integer

Integer representing the ranking of the value to extract from the source column.

NOTE: The value for k must be an integer between 1 and 1,000 inclusive.

  • k=1 represents the maximum value in the column.
  • If k is greater than or equal to the number of values in the column, the minimum value is returned.
  • Missing and null values are not factored into the ranking of k.

Usage Notes:

Required?Data TypeExample Value
YesInteger (positive)4

Examples

This example explores how you can use aggregation functions to calculate rank of values in a column using the KTHLARGEST and KTHLARGESTUNIQUE functions.

Source:

You have a set of student test scores:

StudentScore
Anna84
Ben71
Caleb76
Danielle87
Evan85
Faith92
Gabe87
Hannah99
Ian73
Jane68

Transform:

You can use the following transforms to extract the 1st through 4th-ranked scores on the test:

derive type:single value:KTHLARGEST(Score, 1) as: '1st'

derive type:single value:KTHLARGEST(Score, 2) as: '2nd'

derive type:single value:KTHLARGEST(Score, 3) as: '3rd'

derive type:single value:KTHLARGEST(Score, 4) as: '4th'

derive type:single value:KTHLARGESTUNIQUE(Score, 3) as: '3rdUnique'

derive type:single value:KTHLARGESTUNIQUE(Score, 4) as: '4thUnique'

Results:

When you reorganize the columns, the dataset might look like the following:

StudentScore1st2nd3rd4th3rdUnique4thUnique
Anna84999287878785
Ben71999287878785
Caleb76999287878785
Danielle87999287878785
Evan85999287878785
Faith92999287878785
Gabe87999287878785
Hannah99999287878785
Ian73999287878785
Jane68999287878785

Notes:

  • The value 87 is both the third and fourth scores.
    • For the KTHLARGEST function, it is the output for the third and fourth ranking.
    • For the KTHLARGESTUNIQUE function, it is the output for the third ranking only.

Was this page helpful? Let us know how we did:

Send feedback about...

Google Cloud Dataprep Documentation
Need help? Visit our support page.