LIST Function

Extracts the set of values from a column into an array stored in a new column. This function is typically part of an aggregation.

Tip: To generate unique values for the list, apply the ARRAYUNIQUE function in the next step after this one. See ARRAYUNIQUE Function.

Input column can be of any type.

  • By default, the list is limited to 1000 values. To change the maximum number of values, specify a value for the limit parameter.
  • This function is intended to be used as part of an aggregation to return the distinct set of values by group. See Pivot Transform.

For a version of this function computed over a rolling window of rows, see ROLLINGLIST Function.

Basic Usage

pivot value:LIST(Name, 500) group:Month limit:1

Output: Generates a two-column table containing the values for Month and an array of all values (up to a count of 500) from the Name column for each Month value. The limit parameter defines the maximum number of output columns.

Syntax

pivot value:LIST(function_col_ref, [limit_int]) [group:group_col_ref] [limit:limit_count]

ArgumentRequired?Data TypeDescription
function_col_refYstringName of column to which to apply the function
limit_intNinteger (positive)Maximum number of values to extract into the list array. From 1 to 1000.

For more information on the group and limit parameters, see Pivot Transform.

For more information on syntax standards, see Language Documentation Syntax Notes.

function_col_ref

Name of the column from which to extract the list of values based on the grouping.

  • Literal values are not supported as inputs.
  • Multiple columns and wildcards are not supported.

Usage Notes:

Required?Data TypeExample Value
YesString (column reference)myValues

limit_int

Non-negative integer that defines the maximum number of values to extract into the list array.

NOTE: If specified, this value must between 1 and 1000, inclusive.

NOTE: Do not use the limiting argument in a LIST function call on a flat aggregate, in which all values in a column have been inserted into a single cell. In this case, you might be able to use the limit argument if you also specify a group parameter. Misuse of the LIST function can cause the application to crash.

Usage Notes:

Required?Data TypeExample Value
NoInteger50

Examples

Example - Colors sold this month

You have the following set of orders for two months, and you are interested in identifying the set of colors that have been sold for each product for each month and the total quantity of product sold for each month.

Source:

OrderIdDateItemQtyColor
10011/15/15Pants1red
10021/15/15Shirt2green
10031/15/15Hat3blue
10041/16/15Shirt4yellow
10051/16/15Hat5red
10061/20/15Pants6green
10071/15/15Hat7blue
10084/15/15Shirt8yellow
10094/15/15Shoes9brown
10104/16/15Pants1red
10114/16/15Hat2green
10124/16/15Shirt3blue
10134/20/15Shoes4black
10144/20/15Hat5blue
10154/20/15Pants6black

Transform:

To track by month, you need a column containing the month value extracted from the date:

set col:Date value:DATEFORMAT(Date, 'MMM yyyy')

You can aggregate the data in your dataset, grouped by the reformatted Date values, and apply the LIST function to the Color column. In the same aggregation, you can include a summation function for the Qty column:

pivot value: LIST(Color, 1000) SUM(Qty) group: Date limit:1

Results:

Datelist_Colorsum_Qty
Jan 2015["green","blue","blue","red","green","red","yellow"]28
Apr 2015["brown","blue","red","yellow","black","blue","black","green"] 38

If needed, you can unpack the list array data using the following:

unnest col:list_Color

Was this page helpful? Let us know how we did:

Send feedback about...

Google Cloud Dataprep Documentation