KTHLARGEST Function

Extracts the kth-ranked value from the values in a column, where k=1 returns the maximum value. The value for k must be between 1 and 1000, inclusive. For purposes of this calculation, two instances of the same value are treated as separate values. For example, if your dataset contains three rows with column values 10, 9, and 9, then KTHLARGEST returns 9 for both k=2 and k=3.

When used in an aggregate transform, the function is computed for each instance of the value specified in the group parameter. See Aggregate Transform.

The input column can be of Integer or Decimal type. Non-numeric data in the column is ignored. If a row contains a missing or null value, it is not factored into the calculation.
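The ranking rules above (duplicates counted separately, non-numeric and null values skipped) can be illustrated outside the product with a minimal Python sketch. The helper name kth_largest is hypothetical and is not part of the product; the sketch assumes k is no larger than the number of numeric values, and returning None for a column with no numeric values is an assumption rather than documented behavior.

```python
from numbers import Real


def kth_largest(values, k):
    """Return the k-th largest numeric value; duplicates rank separately.

    Hypothetical helper for illustration only. Assumes
    1 <= k <= number of numeric values in `values`.
    """
    # Keep only numeric values; None, strings, and booleans are skipped.
    numeric = [v for v in values
               if isinstance(v, Real) and not isinstance(v, bool)]
    if not numeric:
        return None  # assumption: no numeric input yields no result
    ranked = sorted(numeric, reverse=True)
    return ranked[k - 1]


print(kth_largest([10, 9, 9], 2))               # 9
print(kth_largest([10, 9, 9], 3))               # 9
print(kth_largest([10, None, "n/a", 9, 9], 1))  # 10
```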

Basic Usage

aggregate value:KTHLARGEST(myRating, 2) group:postal_code

Output: Generates a two-column table containing the unique values in the postal_code column and, for each postal_code value, the second-highest value from the myRating column.
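As a rough illustration of what the grouped form computes, the following Python sketch groups rows by one column and extracts the kth-largest value per group. The function name kth_largest_by_group and the sample rows are hypothetical. Omitting a group that has no numeric values is an assumption, not documented behavior.

```python
from collections import defaultdict


def kth_largest_by_group(rows, group_key, value_key, k):
    """Map each group value to the k-th largest numeric value in that group."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(value_key)
        # Skip null and non-numeric values, as the function description states.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            groups[row[group_key]].append(value)

    result = {}
    for key, vals in groups.items():
        ranked = sorted(vals, reverse=True)
        # If k exceeds the group size, fall back to the group's minimum.
        result[key] = ranked[min(k, len(ranked)) - 1]
    return result


# Hypothetical sample data, not taken from the documentation.
rows = [
    {"postal_code": "94105", "myRating": 5},
    {"postal_code": "94105", "myRating": 3},
    {"postal_code": "94105", "myRating": 4},
    {"postal_code": "10001", "myRating": 2},
    {"postal_code": "10001", "myRating": 2},
    {"postal_code": "10001", "myRating": None},
]
print(kth_largest_by_group(rows, "postal_code", "myRating", 2))
# {'94105': 4, '10001': 2}
```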

Syntax

aggregate value:KTHLARGEST(function_col_ref, k_integer) [group:group_col_ref]

Argument         | Required? | Data Type          | Description
function_col_ref | Y         | string             | Name of column to which to apply the function
k_integer        | Y         | integer (positive) | The ranking of the value to extract from the source column

For more information on the group parameter, see Aggregate Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

function_col_ref

Name of the column from which to extract the ranked value. Column must contain Integer or Decimal values.

  • Literal values are not supported as inputs.
  • Multiple columns and wildcards are not supported.

Usage Notes:

Required? | Data Type                 | Example Value
Yes       | String (column reference) | myValues

k_integer

Integer representing the ranking of the value to extract from the source column.

NOTE: The value for k must be an integer between 1 and 1,000 inclusive.

  • k=1 represents the maximum value in the column.
  • If k is greater than or equal to the number of values in the column, the minimum value is returned.
  • Missing and null values are not factored into the ranking of k.
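The rules for k listed above can be sketched in Python as follows. The function name kth_largest_checked is hypothetical, and raising ValueError for an out-of-range k is an assumption made for illustration; this page does not state how the product reports an invalid k.

```python
def kth_largest_checked(values, k):
    """Apply the documented rules for k to a list of numeric values."""
    # k must be an integer between 1 and 1000, inclusive.
    if isinstance(k, bool) or not isinstance(k, int) or not 1 <= k <= 1000:
        # Assumption: the real product may report this differently.
        raise ValueError("k must be an integer between 1 and 1000, inclusive")

    # Missing and null values do not take part in the ranking.
    ranked = sorted((v for v in values if v is not None), reverse=True)
    if not ranked:
        return None  # assumption: nothing to rank

    # When k meets or exceeds the number of values, the minimum is returned.
    index = min(k, len(ranked)) - 1
    return ranked[index]


print(kth_largest_checked([5, 3, 8], 1))     # 8 (maximum)
print(kth_largest_checked([5, 3, 8], 3))     # 3 (k equals the count)
print(kth_largest_checked([5, 3, 8], 10))    # 3 (k exceeds the count)
print(kth_largest_checked([5, None, 8], 2))  # 5 (null is skipped)
```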

Usage Notes:

Required? | Data Type          | Example Value
Yes       | Integer (positive) | 4

Examples

This example shows how you can use the KTHLARGEST function to extract ranked values from a column.

Source:

You have a set of student test scores:

Student  | Score
Anna     | 84
Ben      | 71
Caleb    | 76
Danielle | 87
Evan     | 85
Faith    | 92
Gabe     | 87
Hannah   | 99
Ian      | 73
Jane     | 68

Transform:

You can use the following transforms to extract the 1st through 4th-ranked scores on the test:

derive value:KTHLARGEST(Score, 1) as: '1st'

derive value:KTHLARGEST(Score, 2) as: '2nd'

derive value:KTHLARGEST(Score, 3) as: '3rd'

derive value:KTHLARGEST(Score, 4) as: '4th'

Results:

When you reorganize the columns, the dataset might look like the following:

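Student  | Score | 1st | 2nd | 3rd | 4th
Anna     | 84    | 99  | 92  | 87  | 87
Ben      | 71    | 99  | 92  | 87  | 87
Caleb    | 76    | 99  | 92  | 87  | 87
Danielle | 87    | 99  | 92  | 87  | 87
Evan     | 85    | 99  | 92  | 87  | 87
Faith    | 92    | 99  | 92  | 87  | 87
Gabe     | 87    | 99  | 92  | 87  | 87
Hannah   | 99    | 99  | 92  | 87  | 87
Ian      | 73    | 99  | 92  | 87  | 87
Jane     | 68    | 99  | 92  | 87  | 87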

Notes:

  • Because the value 87 appears twice in the Score column, it is both the third- and fourth-ranked score, so it is returned by two separate transform steps.
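To cross-check the ranked values in this example outside the product, a short Python sketch over the same test scores might look like the following. The variable names are illustrative only.

```python
# Test scores from the Source table in this example.
scores = {
    "Anna": 84, "Ben": 71, "Caleb": 76, "Danielle": 87, "Evan": 85,
    "Faith": 92, "Gabe": 87, "Hannah": 99, "Ian": 73, "Jane": 68,
}

# Rank every score from highest to lowest; duplicates keep separate ranks.
ranked = sorted(scores.values(), reverse=True)

# Extract the 1st through 4th-ranked scores (k = 1..4).
labels = ["1st", "2nd", "3rd", "4th"]
top_four = {label: ranked[k - 1] for k, label in enumerate(labels, start=1)}

print(top_four)  # {'1st': 99, '2nd': 92, '3rd': 87, '4th': 87}
```

Because each derive step returns one value for the whole column, every row of the result carries the same four ranked scores.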
