Create a clustering model with BigQuery DataFrames

Create a k-means clustering model on the lengths and sex of penguins using the BigQuery DataFrames API.

Explore further

For detailed documentation that includes this code sample, see the following:

Code sample

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from bigframes.ml.cluster import KMeans
import bigframes.pandas as bpd

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Create the KMeans model
cluster_model = KMeans(n_clusters=10)
cluster_model.fit(bq_df["culmen_length_mm"], bq_df["sex"])

# Predict using the model
result = cluster_model.predict(bq_df)
# Score the model
score = cluster_model.score(bq_df)

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.