Choose a text embedding model
This document provides a benchmark of the performance and cost of the text embedding models available in BigQuery ML. You can use this information to help you decide which model is best for your use case.
Models
The following types of models are covered in this benchmark:
- A remote model that targets the Vertex AI textembedding-gecko@001 foundation model. This model works with the ML.GENERATE_EMBEDDING function to generate embeddings.
- A remote model that targets a BERT model deployed as a Vertex AI endpoint. This model works with the ML.PREDICT function to generate embeddings. The BERT model is configured as described in the Vertex AI Model Garden:
  - Machine type: n1-highmem-8
  - Accelerator type: NVIDIA_TESLA_T4
  - Accelerator count: 1
- Imported TensorFlow models that implement the NNLM and SWIVEL models. These models work with the ML.PREDICT function to generate embeddings.
The benchmark uses the syntax described in Embed text by using the ML.GENERATE_EMBEDDING function to process the ML.GENERATE_EMBEDDING query.
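As a minimal sketch of that flow (not the benchmark's exact query), the following GoogleSQL creates a remote model over textembedding-gecko@001 and embeds rows from the test table. The model name mydataset.gecko_embedding_model and the connection us.my_connection are hypothetical, the ENDPOINT option is an assumption about how your BigQuery release names the target model (check the CREATE MODEL reference for remote models), and hacker_news.large is assumed to be in your default project. ML.GENERATE_EMBEDDING reads its input from a column named content, so the query aliases the Hacker News text column to that name.
```sql
-- Hypothetical names: mydataset.gecko_embedding_model and connection us.my_connection.
-- Assumes the ENDPOINT option names the Vertex AI model in your BigQuery release.
CREATE OR REPLACE MODEL `mydataset.gecko_embedding_model`
  REMOTE WITH CONNECTION `us.my_connection`
  OPTIONS (ENDPOINT = 'textembedding-gecko@001');

-- ML.GENERATE_EMBEDDING embeds the column named content.
SELECT
  content,
  ml_generate_embedding_result,  -- ARRAY<FLOAT64> when flatten_json_output is TRUE
  ml_generate_embedding_status   -- empty string for rows that succeeded
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.gecko_embedding_model`,
  (
    SELECT text AS content
    FROM `hacker_news.large`
    WHERE text IS NOT NULL
    LIMIT 100000  -- smallest benchmark size
  ),
  STRUCT(TRUE AS flatten_json_output)
);
```
Each ml_generate_embedding_result array should hold 768 values, matching the Embedding dimensions column for this model in the benchmark table.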
The benchmark uses the syntax described in Generate text embeddings to process the ML.PREDICT queries.
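For the imported TensorFlow models, a comparable sketch imports a SavedModel and calls ML.PREDICT on it. The Cloud Storage path gs://my-bucket/swivel/* and the model name mydataset.swivel_model are placeholders, and the input column name sentences is an assumption: ML.PREDICT matches input columns by name, so the alias has to match whatever input name the model's serving signature declares.
```sql
-- Hypothetical model name and Cloud Storage path for an exported SWIVEL SavedModel.
CREATE OR REPLACE MODEL `mydataset.swivel_model`
  OPTIONS (
    MODEL_TYPE = 'TENSORFLOW',
    MODEL_PATH = 'gs://my-bucket/swivel/*'
  );

-- "sentences" is an assumed input name; use the one your model's signature expects.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.swivel_model`,
  (
    SELECT text AS sentences
    FROM `hacker_news.large`
    WHERE text IS NOT NULL
    LIMIT 100000
  )
);
```
Because the model runs inside BigQuery, this query incurs only BigQuery compute cost, which is why the SWIVEL and NNLM rows in the benchmark table list no Vertex AI charge.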
Cost calculation
The benchmark calculates BigQuery costs based on BigQuery on-demand compute pricing (US $6.25 per TiB). The calculation doesn't account for the fact that the first 1 TiB of compute processing used per month is free.
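The BigQuery cost follows directly from bytes processed. As a worked check against two SWIVEL rows in the benchmark table (whose Cost column appears consistent with treating 1 TB as 1,024 GB in this calculation):
```latex
\begin{aligned}
\text{BigQuery cost (US\$)} &= \text{bytes processed (TiB)} \times 6.25 \\
\text{SWIVEL, 10 billion rows:}\quad 3.12 \times 6.25 &= 19.5 \\
\text{SWIVEL, 1 billion rows:}\quad \tfrac{312.35}{1024} \times 6.25 &\approx 1.90643
\end{aligned}
```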
The Vertex AI costs associated with calling the BERT model are calculated using the n1-highmem-8 prediction rate.
The Vertex AI costs associated with calling the textembedding-gecko model are calculated using the Embeddings for Text prediction rate.
For information about BigQuery ML pricing, see BigQuery ML pricing.
Benchmark data
The benchmark uses the bigquery-public-data.hacker_news.full public dataset, prepared as follows:
Copied the data into a test table, duplicating each row 100 times:
```sql
-- myproject is a placeholder project ID.
CREATE OR REPLACE TABLE `myproject.hacker_news.large` AS
SELECT base.*
FROM `bigquery-public-data.hacker_news.full` AS base,
  UNNEST(GENERATE_ARRAY(1, 100)) AS repeat_number;  -- 100 copies of each row
```
Created additional test tables of different sizes to use in the benchmark, based on the hacker_news.large table. Test tables of the following sizes were used:
- 100,000 rows
- 1,000,000 rows
- 10,000,000 rows
- 100,000,000 rows
- 1,000,000,000 rows
- 10,000,000,000 rows
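The exact method used to derive these tables isn't stated. One way to sketch it is a BigQuery script that loops over the target sizes and creates one table per size with LIMIT. The hacker_news.rows_<n> naming scheme is hypothetical, and the approach assumes hacker_news.large (in your default project) has at least as many rows as each target size, because LIMIT silently returns fewer rows otherwise.
```sql
-- Hypothetical naming scheme: hacker_news.rows_<row count>.
-- Assumes hacker_news.large holds at least as many rows as each size listed.
FOR item IN (SELECT n FROM UNNEST([100000, 1000000, 10000000]) AS n) DO
  EXECUTE IMMEDIATE FORMAT(
    "CREATE OR REPLACE TABLE hacker_news.rows_%d AS SELECT * FROM hacker_news.large LIMIT %d",
    item.n, item.n);
END FOR;
```
Extending the array adds the remaining sizes; LIMIT picks an arbitrary subset of rows, which is acceptable here because each row is only used as embedding input.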
Benchmark
The following table contains the benchmark data:
Model | Embedding dimensions | Number of rows | Run time | Total slot milliseconds | Bytes processed | Service used | Cost in US$ |
---|---|---|---|---|---|---|---|
SWIVEL | 20 | 100,000 | 5 seconds | 6,128 | 37 MB | BigQuery | 0.00022 |
SWIVEL | 20 | 1 million | 1 minute, 1 second | 97,210 | 341 MB | BigQuery | 0.00203 |
SWIVEL | 20 | 10 million | 28 seconds | 1,203,838 | 3.21 GB | BigQuery | 0.01959 |
SWIVEL | 20 | 100 million | 32 seconds | 11,755,909 | 31.9 GB | BigQuery | 0.19470 |
SWIVEL | 20 | 1 billion | 2 minutes, 3 seconds | 135,754,696 | 312.35 GB | BigQuery | 1.90643 |
SWIVEL | 20 | 10 billion | 19 minutes, 55 seconds | 1,257,462,851 | 3.12 TB | BigQuery | 19.5 |
NNLM | 50 | 100,000 | 18 seconds | 66,112 | 227 MB | BigQuery | 0.00135 |
NNLM | 50 | 1 million | 1 minute, 1 second | 666,875 | 531 MB | BigQuery | 0.00316 |
NNLM | 50 | 10 million | 19 seconds | 4,140,396 | 3.39 GB | BigQuery | 0.02069 |
NNLM | 50 | 100 million | 27 seconds | 14,971,248 | 32.08 GB | BigQuery | 0.19580 |
NNLM | 50 | 1 billion | 8 minutes, 16 seconds | 288,221,149 | 312.54 GB | BigQuery | 1.90759 |
NNLM | 50 | 10 billion | 19 minutes, 28 seconds | 1,655,252,687 | 3.12 TB | BigQuery | 19.5 |
BERT¹ | 768 | 100,000 | 29 minutes, 37 seconds | 2,731,868 | 38 MB | BigQuery | 0.00022 |
BERT¹ | 768 | 100,000 |  |  |  | Vertex AI | 8.11 |
BERT¹ | 768 | 1 million | 5 hours, 10 seconds | 28,905,706 | 339 MB | BigQuery | 0.00196 |
BERT¹ | 768 | 1 million |  |  |  | Vertex AI | 9.98 |
Vertex AI textembedding-gecko@001 LLM² | 768 | 100,000 | 14 minutes, 14 seconds | 1,495,297 | 38 MB | BigQuery | 0.00022 |
Vertex AI textembedding-gecko@001 LLM² | 768 | 100,000 |  |  |  | Vertex AI | 0.73 |
Vertex AI textembedding-gecko@001 LLM² | 768 | 1 million | 2 hours, 24 minutes | 17,342,114 | 339 MB | BigQuery | 0.00196 |
Vertex AI textembedding-gecko@001 LLM² | 768 | 1 million |  |  |  | Vertex AI | 2.97 |
¹ BigQuery query jobs are limited to 6 hours, so this model was benchmarked with at most 1 million rows. To let the job process more rows within the 6-hour limit, you can use more computational resources from Vertex AI Model Garden; for example, you can increase the number of accelerators.
² BigQuery query jobs are limited to 6 hours, so this model was benchmarked with at most 1 million rows. To let the job process more rows within the 6-hour limit, you can request a higher quota. You can also use this set of SQL scripts or this Dataform package to iterate through inference calls beyond the 6-hour limit.
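To collect comparable per-job figures for your own runs (run time, total slot milliseconds, bytes processed, and the derived BigQuery cost), one option is to query the INFORMATION_SCHEMA.JOBS view. The region-us qualifier and the one-day window are assumptions to adjust to where and when your jobs ran. The cost column applies the same US$6.25 per TiB on-demand rate as the benchmark and, like the benchmark, ignores the monthly free tier; Vertex AI charges are not included.
```sql
-- Adjust `region-us` to the region where your jobs ran.
SELECT
  job_id,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS run_time_seconds,
  total_slot_ms,
  total_bytes_processed,
  -- On-demand rate used by the benchmark: US$6.25 per TiB (2^40 bytes).
  ROUND(total_bytes_processed / POW(2, 40) * 6.25, 5) AS estimated_bigquery_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_type = 'QUERY'
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY creation_time DESC;
```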