Choose a text embedding model

This document provides a benchmark of the performance and cost of the text embedding models available in BigQuery ML. You can use this information to help you decide which model is best for your use case.

Models

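The following models, all of which appear in the benchmark table, are covered in this benchmark:

  • The Vertex AI textembedding-gecko@001 model, which is called through the ML.GENERATE_EMBEDDING function. The benchmark uses the syntax described at Embed text by using the ML.GENERATE_EMBEDDING function for processing the ML.GENERATE_EMBEDDING query.

  • The SWIVEL, NNLM, and BERT models, which are called through the ML.PREDICT function. The benchmark uses the syntax described at Generate text embeddings for processing the ML.PREDICT queries.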

Cost calculation

The benchmark calculates BigQuery costs based on BigQuery on-demand compute pricing (US $6.25 per TiB). The calculation doesn't account for the fact that the first 1 TiB of compute processing used per month is free.
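
As a rough illustration of the on-demand calculation, the following Python sketch converts an amount of processed data into a cost at the quoted rate. It assumes that the sizes reported in the benchmark use binary units (1 MB = 2^20 bytes, 1 TB = 2^40 bytes); under that assumption it gives 0.00022 for 37 MB and 19.5 for 3.12 TB, matching the benchmark table. The function and constant names are hypothetical and are not part of any BigQuery API.

```python
# Sketch of the on-demand cost arithmetic; all names are illustrative only.
PRICE_PER_TIB_USD = 6.25  # on-demand compute rate quoted in this document

# Assumption: reported sizes use binary units (1 MB = 2**20 bytes, and so on).
UNIT_BYTES = {"MB": 2**20, "GB": 2**30, "TB": 2**40}
TIB = 2**40


def on_demand_cost_usd(amount: float, unit: str) -> float:
    """Return the on-demand cost in US$ of processing `amount` of `unit`."""
    bytes_processed = amount * UNIT_BYTES[unit]
    return bytes_processed / TIB * PRICE_PER_TIB_USD


if __name__ == "__main__":
    # Sample sizes taken from the "Bytes processed" column of the benchmark.
    for amount, unit in [(37, "MB"), (3.21, "GB"), (3.12, "TB")]:
        print(f"{amount} {unit}: ${on_demand_cost_usd(amount, unit):.5f}")
```

Like the benchmark, this sketch ignores the free first 1 TiB of processing per month.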

The Vertex AI costs associated with calling the BERT model are calculated using the n1-highmem-8 prediction rate.

The Vertex AI costs associated with calling the textembedding-gecko model are calculated using the Embeddings for Text prediction rate.

For information about BigQuery ML pricing, see BigQuery ML pricing.

Benchmark data

The benchmark uses the bigquery-public-data.hacker_news.full public dataset, prepared as follows:

  • Copied the data into a test table, duplicating each row 100 times:

    CREATE OR REPLACE TABLE `mydataset.hacker_news.large` AS
      SELECT base.*
      FROM `bigquery-public-data.hacker_news.full` AS base,
      UNNEST(GENERATE_ARRAY(1, 100)) AS repeat_number;
    
  • Created additional test tables of different sizes to use in the benchmark, based on the hacker_news.large table. Test tables of the following sizes were used:

    • 100,000 rows
    • 1,000,000 rows
    • 10,000,000 rows
    • 100,000,000 rows
    • 1,000,000,000 rows
    • 10,000,000,000 rows

Benchmark

The following table contains the benchmark results:

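| Model | Embedding dimensions | Number of rows | Run time | Total slot milliseconds | Bytes processed | Service used | Cost in US$ |
|---|---|---|---|---|---|---|---|
| SWIVEL | 20 | 100,000 | 5 seconds | 6,128 | 37 MB | BigQuery | 0.00022 |
| SWIVEL | 20 | 1 million | 1 minute, 1 second | 97,210 | 341 MB | BigQuery | 0.00203 |
| SWIVEL | 20 | 10 million | 28 seconds | 1,203,838 | 3.21 GB | BigQuery | 0.01959 |
| SWIVEL | 20 | 100 million | 32 seconds | 11,755,909 | 31.9 GB | BigQuery | 0.19470 |
| SWIVEL | 20 | 1 billion | 2 minutes, 3 seconds | 135,754,696 | 312.35 GB | BigQuery | 1.90643 |
| SWIVEL | 20 | 10 billion | 19 minutes, 55 seconds | 1,257,462,851 | 3.12 TB | BigQuery | 19.5 |
| NNLM | 50 | 100,000 | 18 seconds | 66,112 | 227 MB | BigQuery | 0.00135 |
| NNLM | 50 | 1 million | 1 minute, 1 second | 666,875 | 531 MB | BigQuery | 0.00316 |
| NNLM | 50 | 10 million | 19 seconds | 4,140,396 | 3.39 GB | BigQuery | 0.02069 |
| NNLM | 50 | 100 million | 27 seconds | 14,971,248 | 32.08 GB | BigQuery | 0.19580 |
| NNLM | 50 | 1 billion | 8 minutes, 16 seconds | 288,221,149 | 312.54 GB | BigQuery | 1.90759 |
| NNLM | 50 | 10 billion | 19 minutes, 28 seconds | 1,655,252,687 | 3.12 TB | BigQuery | 19.5 |
| BERT [1] | 768 | 100,000 | 29 minutes, 37 seconds | 2,731,868 | 38 MB | BigQuery | 0.00022 |
| | | | | | | Vertex AI | 8.11 |
| BERT [1] | 768 | 1 million | 5 hours, 10 seconds | 28,905,706 | 339 MB | BigQuery | 0.00196 |
| | | | | | | Vertex AI | 9.98 |
| Vertex AI textembedding-gecko@001 LLM [2] | 768 | 100,000 | 14 minutes, 14 seconds | 1,495,297 | 38 MB | BigQuery | 0.00022 |
| | | | | | | Vertex AI | 0.73 |
| Vertex AI textembedding-gecko@001 LLM [2] | 768 | 1 million | 2 hours, 24 minutes | 17,342,114 | 339 MB | BigQuery | 0.00196 |
| | | | | | | Vertex AI | 2.97 |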

[1] BigQuery query jobs are limited to 6 hours, so this model is benchmarked only up to 1 million rows. To let the job process more rows within the 6-hour limit, you can use more compute resources from Vertex AI Model Garden. For example, you can increase the number of accelerators.

[2] BigQuery query jobs are limited to 6 hours, so this model is benchmarked only up to 1 million rows. To let the job process more rows within the 6-hour limit, you can request a higher quota. You can also use this set of SQL scripts or this Dataform package to iterate through inference calls beyond the 6-hour limit.
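
To see why the 6-hour limit stops these two models at 1 million rows, the following Python sketch takes the 1-million-row run times from the benchmark table and estimates how many rows would fit in a single query job. It assumes that run time scales linearly with row count, which the benchmark doesn't claim, so treat the result as a rough estimate only. The function names are hypothetical.

```python
# Rough estimate only: assumes run time scales linearly with row count.
QUERY_LIMIT_SECONDS = 6 * 60 * 60  # 6-hour query job limit from the footnotes


def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a run time given in hours, minutes, and seconds to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def max_rows_within_limit(rows: int, run_seconds: int) -> int:
    """Extrapolate how many rows fit in the limit at the observed throughput."""
    return rows * QUERY_LIMIT_SECONDS // run_seconds


# Run times for 1 million rows, taken from the benchmark table.
observed = {
    "BERT": to_seconds(hours=5, seconds=10),
    "textembedding-gecko@001": to_seconds(hours=2, minutes=24),
}

if __name__ == "__main__":
    for model, run_seconds in observed.items():
        estimate = max_rows_within_limit(1_000_000, run_seconds)
        print(f"{model}: about {estimate:,} rows in 6 hours")
```

Under this assumption, both estimates fall well short of 10 million rows, the next table size in the benchmark.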