The ML.CENTROIDS function
This document describes the ML.CENTROIDS
function, which lets you return
information about the
centroids
in a
k-means model.
Syntax
ML.CENTROIDS( MODEL `project_id.dataset.model` STRUCT([, standardize AS standardize]))
Arguments
ML.CENTROIDS
takes the following arguments:
project_id
: Your project ID.dataset
: The BigQuery dataset that contains the model.model
: The name of the model.standardize
: aBOOL
value that specifies whether the centroid features should be standardized to assume that all features have a mean of0
and a standard deviation of1
. Standardizing the features allows the absolute magnitude of the values to be compared to each other. The default value isFALSE
.
Output
ML.CENTROIDS
returns the following columns:
trial_id
: anINT64
value that contains the hyperparameter tuning trial ID. This column is only returned if you ran hyperparameter tuning when creating the model.centroid_id
: anINT64
value that contains the centroid ID.feature
: aSTRING
value that contains the feature column name.numerical_value
: aFLOAT64
value that contains the feature value for the centroid thatcentroid_id
identifies if the column identified by thefeature
value is numeric. Otherwise,numerical_value
isNULL
.categorical_value
: anARRAY<STRUCT>
value that contains information about categorical features. Each struct contains the following fields:categorical_value.category
: aSTRING
value that contains the name of each category.categorical_value.value
: aSTRING
value that contains the value ofcategorical_value.category
for the centroid thatcentroid_id
identifies.
geography_value
: aSTRING
value that contains thecategorical_value.category
value for the centroid thatcentroid_id
identifies if the column identified by thefeature
value is of typeGEOGRAPHY
. Otherwise,geography_value
value isNULL
.
The output contains one row per feature per centroid.
Examples
The following examples show how to use ML.CENTROIDS
with and without the
standardize
argument.
Without standardization
Numerical features
The following example retrieves centroid information from the model
mydataset.my_kmeans_model
in your default project. This model only
contains numerical features.
SELECT * FROM ML.CENTROIDS(MODEL `mydataset.my_kmeans_model`)
This query returns results like the following:
+-------------+-------------------+----------------------+---------------------+ | centroid_id | feature | numerical_value | categorical_value | +-------------+-------------------+----------------------+---------------------+ | 3 | x_coordinate | 3095929.0 | [] | | 3 | y_coordinate | 1.0089726307692308E7 | [] | | 2 | x_coordinate | 3117072.65625 | [] | | 2 | y_coordinate | 1.0083220745833334E7 | [] | | 1 | x_coordinate | 3259947.096227731 | [] | | 1 | y_coordinate | 1.0105690227895036E7 | [] | | 4 | x_coordinate | 3109887.9056603773 | [] | | 4 | y_coordinate | 1.0057112358490566E7 | [] | +-------------+-------------------+----------------------+---------------------+
Categorical features
The following example retrieves centroid information from the model
mydataset.my_kmeans_model
in your default project. This model contains
categorical features.
SELECT * FROM ML.CENTROIDS(MODEL `mydataset.my_kmeans_model`) ORDER BY centroid_id;
This query returns results like the following:
+-------------+-------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | centroid_id | feature |numerical_value| categorical_value | +-------------+-------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | department | NULL | [{"category":"Medieval Art","feature_value":"1.0"}] | | 1 | medium | NULL | [{"category":"Iron","feature_value":"0.21602160216021601"},{"category":"Glass, ceramic","feature_value":"0.3933393339333933"},{"category":"Copper alloy","feature_value":"0.39063906390639064"}] | | 2 | medium | NULL | [{"category":"Wood, gesso, paint","feature_value":"0.15"},{"category":"Carnelian","feature_value":"0.2692307692307692"},{"category":"Papyrus, ink","feature_value":"0.2653846153846154"},{"category":"Steatite, glazed","feature_value":"0.3153846153846154"}] | | 2 | department | NULL | [{"category":"Egyptian Art","feature_value":"1.0"}] | | 3 | medium | NULL | [{"category":"Faience","feature_value":"1.0"}] | | 3 | department | NULL | [{"category":"Egyptian Art","feature_value":"1.0"}] | | 4 | medium | NULL | [{"category":"Steatite","feature_value":"1.0"}] | | 4 | department | NULL | [{"category":"Egyptian Art","feature_value":"1.0"}] | | 5 | medium | NULL | [{"category":"Red quartzite","feature_value":"0.20316027088036118"},{"category":"Bronze or copper alloy","feature_value":"0.3476297968397291"},{"category":"Gold","feature_value":"0.4492099322799097"}] | | 5 | department | NULL | [{"category":"Egyptian Art","feature_value":"1.0"}] | +-------------+-------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Numerical and categorical features
The following are the results from the same query against a k-means model with both numerical and categorical features.
+-------------+--------------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | centroid_id | feature | numerical_value | categorical_value | +-------------+--------------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | start_station_name | NULL | [{"category":"Toomey Rd @ South Lamar","value":"0.5714285714285714"},{"category":"State Capitol @ 14th & Colorado","value":"0.42857142857142855"}] | | 1 | duration_minutes | 9.142857142857142 | [] | | 2 | duration_minutes | 9.0 | [] | | 2 | start_station_name | NULL | [{"category":"Rainey @ River St","value":"0.14285714285714285"},{"category":"11th & San Jacinto","value":"0.42857142857142855"},{"category":"ACC - West & 12th Street","value":"0.14285714285714285"},{"category":"East 11th St. at Victory Grill","value":"0.2857142857142857"}] | +-------------+--------------------+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
With standardization
The following example retrieves centroid information from the model
mydataset.my_kmeans_model
in your default project. The query in this example
assumes that all features have a mean of 0
and a standard deviation of 1
.
SELECT * FROM ML.CENTROIDS(MODEL `mydataset.my_kmeans_model`, STRUCT(TRUE AS standardize))
What's next
- For information about model weights support in BigQuery ML, see BigQuery ML model weights overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.