Embed text with pretrained TensorFlow models

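This tutorial shows you how to generate NNLM, SWIVEL, and BERT text embeddings in BigQuery by using pretrained TensorFlow models. A text embedding is a dense vector representation of a piece of text. If two pieces of text are semantically similar, their embeddings are close together in the embedding vector space.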
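To make "close together" concrete, the following minimal sketch compares vectors with cosine similarity. The tutorial does not prescribe a distance measure, so cosine similarity is an assumption here, and the 4-dimensional vectors are made up for illustration; they are not output from any of the models.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors, for illustration only (not real model output).
movie_review_a = [0.9, 0.1, 0.3, 0.0]
movie_review_b = [0.8, 0.2, 0.4, 0.1]
weather_report = [-0.2, 0.9, -0.5, 0.7]

# Similar texts should score near 1; unrelated texts score lower.
print(cosine_similarity(movie_review_a, movie_review_b))
print(cosine_similarity(movie_review_a, weather_report))
```

A score near 1 means the vectors point in almost the same direction, and a score near 0 or below means they do not.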

The NNLM, SWIVEL, and BERT models

The NNLM, SWIVEL, and BERT models vary in size, accuracy, scalability, and cost. Use the following table to decide which model to use:

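Model	Model size	Embedding dimensions	Use case	Description
NNLM	<150MB	50	Short phrases, news, tweets, reviews	Neural Network Language Model
SWIVEL	<150MB	20	Short phrases, news, tweets, reviews	Submatrix-wise Vector Embedding Learner
BERT	~200MB	768	Short phrases, news, tweets, reviews, short paragraphs	Bidirectional Encoder Representations from Transformers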

In this tutorial, the NNLM and SWIVEL models are imported TensorFlow models, and the BERT model is a remote model on Vertex AI.
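The embedding dimensions in the table also determine how much data each model's output adds to your query results. The following rough sketch estimates the size of the embedding column alone. It assumes that each element is stored as a BigQuery FLOAT64 value (8 logical bytes) and uses 500 rows to match the queries later in this tutorial.

```python
# Embedding dimensions taken from the model comparison table.
EMBEDDING_DIMENSIONS = {"NNLM": 50, "SWIVEL": 20, "BERT": 768}

# Assumption: each element is a BigQuery FLOAT64, which is 8 logical bytes.
BYTES_PER_FLOAT64 = 8


def embedding_bytes(model_name, num_rows):
    """Estimate the logical bytes used by the embedding column alone."""
    return EMBEDDING_DIMENSIONS[model_name] * BYTES_PER_FLOAT64 * num_rows


for name in EMBEDDING_DIMENSIONS:
    print(name, embedding_bytes(name, 500))
```

The estimate ignores the content column and any storage overhead, so treat it as a lower bound.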

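Required permissions

To create the dataset, you need the bigquery.datasets.create Identity and Access Management (IAM) permission.
To create the bucket, you need the storage.buckets.create IAM permission.
To upload the model to Cloud Storage, you need the storage.objects.create and storage.objects.get IAM permissions.
To create the connection resource, you need the following IAM permissions:
- bigquery.connections.create
- bigquery.connections.get
To load the model into BigQuery ML, you need the following IAM permissions:
- bigquery.jobs.create
- bigquery.models.create
- bigquery.models.getData
- bigquery.models.updateData
To run inference, you need the following IAM permissions:
- bigquery.tables.getData on the object table
- bigquery.models.getData on the model
- bigquery.jobs.create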
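If you track permissions in a script, the list above can be written as a lookup table. The following sketch is only a bookkeeping aid: the step names are hypothetical labels, and the code does not call IAM, so you must supply the set of granted permissions yourself.

```python
# Permissions copied from the list above. The step names are hypothetical labels.
REQUIRED_PERMISSIONS = {
    "create_dataset": {"bigquery.datasets.create"},
    "create_bucket": {"storage.buckets.create"},
    "upload_model": {"storage.objects.create", "storage.objects.get"},
    "create_connection": {
        "bigquery.connections.create",
        "bigquery.connections.get",
    },
    "load_model": {
        "bigquery.jobs.create",
        "bigquery.models.create",
        "bigquery.models.getData",
        "bigquery.models.updateData",
    },
    "run_inference": {
        "bigquery.tables.getData",
        "bigquery.models.getData",
        "bigquery.jobs.create",
    },
}


def missing_permissions(granted):
    """Map each step to the sorted permissions that `granted` does not cover."""
    granted = set(granted)
    gaps = {}
    for step, needed in REQUIRED_PERMISSIONS.items():
        missing = sorted(needed - granted)
        if missing:
            gaps[step] = missing
    return gaps


# Made-up example: a principal that can only run jobs and read models.
example_granted = {"bigquery.jobs.create", "bigquery.models.getData"}
print(missing_permissions(example_granted)["run_inference"])
```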

Costs

In this document, you use the following billable components of Google Cloud:

BigQuery: You incur costs for the queries that you run in BigQuery.
BigQuery ML: You incur costs for the model that you create and the inference that you perform in BigQuery ML.
Cloud Storage: You incur costs for the objects that you store in Cloud Storage.
Vertex AI: If you follow the instructions for generating the BERT model, then you incur costs for deploying the model to an endpoint.

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

Before you begin

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Verify that billing is enabled for your Google Cloud project.

Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.

Create a dataset

To create a dataset named tf_models_tutorial to store the models that you create, select one of the following options:

SQL

Use the CREATE SCHEMA statement:

In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
```
CREATE SCHEMA `PROJECT_ID.tf_models_tutorial`;
```
Replace PROJECT_ID with your project ID.
Click Run.

For more information about how to run queries, see Run an interactive query.

bq

In the Google Cloud console, activate Cloud Shell.

To create the dataset, run the bq mk command:
```
bq mk --dataset --location=us PROJECT_ID:tf_models_tutorial
```
Replace PROJECT_ID with your project ID.
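If you prefer to script this step, the following sketch builds the same CREATE SCHEMA statement in Python and, optionally, runs it. The helper names are hypothetical. Running the statement assumes that the google-cloud-bigquery package is installed and that application default credentials are configured; neither is covered by this tutorial.

```python
def create_schema_statement(project_id, dataset_id="tf_models_tutorial"):
    """Build the CREATE SCHEMA statement shown in the SQL option."""
    return f"CREATE SCHEMA `{project_id}.{dataset_id}`;"


def create_dataset(project_id):
    """Run the statement with the BigQuery client library.

    Assumes google-cloud-bigquery is installed and credentials are configured.
    """
    # Imported lazily so that building the statement works without the package.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    # result() blocks until the query job finishes.
    client.query(create_schema_statement(project_id)).result()


# "my-project" is a placeholder project ID.
print(create_schema_statement("my-project"))
```

Calling create_dataset("my-project") submits the statement as a query job, which is why the bigquery.jobs.create permission also applies.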

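Generate and upload a model to Cloud Storage

For more detailed instructions on generating text embeddings with pretrained TensorFlow models, see the Colab notebook. Otherwise, select one of the following models: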

NNLM

Install the bigquery-ml-utils library by using pip:
```
pip install bigquery-ml-utils
```

Generate an NNLM model. The following Python code loads an NNLM model from TensorFlow Hub and prepares it for BigQuery:

from bigquery_ml_utils import model_generator
import tensorflow_text

# Establish an instance of TextEmbeddingModelGenerator.
text_embedding_model_generator = model_generator.TextEmbeddingModelGenerator()

# Generate an NNLM model.
text_embedding_model_generator.generate_text_embedding_model('nnlm', OUTPUT_MODEL_PATH)

Replace OUTPUT_MODEL_PATH with the path to a local folder where you can temporarily store the model.

Optional: Print the generated model's signature:

import tensorflow as tf

reload_embedding_model = tf.saved_model.load(OUTPUT_MODEL_PATH)
print(reload_embedding_model.signatures["serving_default"])

To copy the generated model from your local folder to a Cloud Storage bucket, use the Google Cloud CLI:
```
gcloud storage cp OUTPUT_MODEL_PATH gs://BUCKET_PATH/nnlm_model --recursive
```
Replace BUCKET_PATH with the name of the Cloud Storage bucket that you are copying the model to.
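Before you copy the folder, you might want to confirm that it looks like a TensorFlow SavedModel. The following sketch checks for the saved_model.pb file and the variables folder that TensorFlow 2 normally writes. The helper name is hypothetical, and the check is a quick sanity test under that layout assumption, not a full validation of the model.

```python
import os
import tempfile


def looks_like_saved_model(path):
    """Return True if path contains saved_model.pb and a variables/ folder."""
    has_graph = os.path.isfile(os.path.join(path, "saved_model.pb"))
    has_variables = os.path.isdir(os.path.join(path, "variables"))
    return has_graph and has_variables


# Demonstrate on a throwaway folder that mimics the SavedModel layout.
with tempfile.TemporaryDirectory() as demo_dir:
    empty_result = looks_like_saved_model(demo_dir)
    open(os.path.join(demo_dir, "saved_model.pb"), "wb").close()
    os.mkdir(os.path.join(demo_dir, "variables"))
    populated_result = looks_like_saved_model(demo_dir)

print(empty_result, populated_result)
```

In practice, you would call looks_like_saved_model with the same local folder that you used for OUTPUT_MODEL_PATH.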

SWIVEL

Install the bigquery-ml-utils library by using pip:
```
pip install bigquery-ml-utils
```

Generate a SWIVEL model. The following Python code loads a SWIVEL model from TensorFlow Hub and prepares it for BigQuery:

from bigquery_ml_utils import model_generator
import tensorflow_text

# Establish an instance of TextEmbeddingModelGenerator.
text_embedding_model_generator = model_generator.TextEmbeddingModelGenerator()

# Generate a SWIVEL model.
text_embedding_model_generator.generate_text_embedding_model('swivel', OUTPUT_MODEL_PATH)

Replace OUTPUT_MODEL_PATH with the path to a local folder where you can temporarily store the model.

Optional: Print the generated model's signature:

import tensorflow as tf

reload_embedding_model = tf.saved_model.load(OUTPUT_MODEL_PATH)
print(reload_embedding_model.signatures["serving_default"])

To copy the generated model from your local folder to a Cloud Storage bucket, use the Google Cloud CLI:
```
gcloud storage cp OUTPUT_MODEL_PATH gs://BUCKET_PATH/swivel_model --recursive
```
Replace BUCKET_PATH with the name of the Cloud Storage bucket that you are copying the model to.

BERT

Install the bigquery-ml-utils library by using pip:
```
pip install bigquery-ml-utils
```

Generate a BERT model. The following Python code loads a BERT model from TensorFlow Hub and prepares it for BigQuery:

from bigquery_ml_utils import model_generator
import tensorflow_text

# Establish an instance of TextEmbeddingModelGenerator.
text_embedding_model_generator = model_generator.TextEmbeddingModelGenerator()

# Generate a BERT model.
text_embedding_model_generator.generate_text_embedding_model('bert', OUTPUT_MODEL_PATH)

Replace OUTPUT_MODEL_PATH with the path to a local folder where you can temporarily store the model.

Optional: Print the generated model's signature:

import tensorflow as tf

reload_embedding_model = tf.saved_model.load(OUTPUT_MODEL_PATH)
print(reload_embedding_model.signatures["serving_default"])

To copy the generated model from your local folder to a Cloud Storage bucket, use the Google Cloud CLI:
```
gcloud storage cp OUTPUT_MODEL_PATH gs://BUCKET_PATH/bert_model --recursive
```
Replace BUCKET_PATH with the name of the Cloud Storage bucket that you are copying the model to.

Load the model into BigQuery

Select one of the following models:

NNLM

Use the CREATE MODEL statement:

In the Google Cloud console, go to the BigQuery page.

In the query editor, enter the following statement:

CREATE OR REPLACE MODEL tf_models_tutorial.nnlm_model
OPTIONS (
  model_type = 'TENSORFLOW',
  model_path = 'gs://BUCKET_NAME/nnlm_model/*');

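Replace BUCKET_NAME with the name of the bucket that you previously created.

Click Run.

For more information about how to run queries, see Run an interactive query.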

SWIVEL

Use the CREATE MODEL statement:

In the Google Cloud console, go to the BigQuery page.

In the query editor, enter the following statement:

CREATE OR REPLACE MODEL tf_models_tutorial.swivel_model
OPTIONS (
  model_type = 'TENSORFLOW',
  model_path = 'gs://BUCKET_NAME/swivel_model/*');

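Replace BUCKET_NAME with the name of the bucket that you previously created.

Click Run.

For more information about how to run queries, see Run an interactive query.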
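The upload step uses the placeholder BUCKET_PATH and the load step uses BUCKET_NAME, and both are described as the bucket name. The following sketch assumes that they refer to the same bucket and derives the upload destination and the model_path value from one input, so the two steps cannot drift apart. The helper names and the example bucket name are hypothetical.

```python
# Folder names used for the imported models in this tutorial.
MODEL_FOLDERS = {"nnlm": "nnlm_model", "swivel": "swivel_model"}


def model_locations(bucket_name, model_key):
    """Return (upload destination, model_path pattern) for one model."""
    folder = MODEL_FOLDERS[model_key]
    upload_uri = f"gs://{bucket_name}/{folder}"
    # CREATE MODEL reads every object under the folder, hence the wildcard.
    model_path = f"{upload_uri}/*"
    return upload_uri, model_path


def create_model_statement(bucket_name, model_key, dataset_id="tf_models_tutorial"):
    """Build the CREATE MODEL statement shown above for an imported model."""
    folder = MODEL_FOLDERS[model_key]
    _, model_path = model_locations(bucket_name, model_key)
    return (
        f"CREATE OR REPLACE MODEL {dataset_id}.{folder}\n"
        "OPTIONS (\n"
        "  model_type = 'TENSORFLOW',\n"
        f"  model_path = '{model_path}');"
    )


# "my-bucket" is a placeholder bucket name.
for key in MODEL_FOLDERS:
    print(create_model_statement("my-bucket", key))
```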

BERT

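To load the BERT model into BigQuery, import the BERT model to Vertex AI, deploy the model to a Vertex AI endpoint, create a connection, and then create a remote model in BigQuery.

To import the BERT model to Vertex AI, follow these steps:

In the Google Cloud console, go to the Vertex AI Model registry page.

Click Import, and then do the following:
- For Name, enter BERT.
- For Region, select a region that matches your Cloud Storage bucket's region.
Click Continue, and then do the following:
- For Model framework version, select 2.8.
- For Model artifact location, enter the path to the Cloud Storage bucket where you stored the model files. For example, gs://BUCKET_PATH/bert_model.
Click Import. After the import is complete, your model appears on the Model registry page.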

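To deploy the BERT model to a Vertex AI endpoint and connect it to BigQuery, follow these steps:

In the Google Cloud console, go to the Vertex AI Model registry page.

Click the name of your model.
Click Deploy & test.
Click Deploy to endpoint.
For Endpoint name, enter bert_model_endpoint.
Click Continue.
Select your compute resources.
Click Deploy.
Create a BigQuery Cloud resource connection and grant access to the connection's service account.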

To create a remote model based on the Vertex AI endpoint, use the CREATE MODEL statement:

In the Google Cloud console, go to the BigQuery page.

In the query editor, enter the following statement:
```
CREATE OR REPLACE MODEL tf_models_tutorial.bert_model
INPUT(content STRING)
OUTPUT(embedding ARRAY<FLOAT64>)
REMOTE WITH CONNECTION `PROJECT_ID.CONNECTION_LOCATION.CONNECTION_ID`
OPTIONS (
  ENDPOINT = "https://ENDPOINT_LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/ENDPOINT_LOCATION/endpoints/ENDPOINT_ID");
```
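Replace the following:
- PROJECT_ID: your project ID
- CONNECTION_LOCATION: the location of your BigQuery connection
- CONNECTION_ID: the ID of your BigQuery connection
  When you view the connection details in the Google Cloud console, this is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection
- ENDPOINT_LOCATION: the location of your Vertex AI endpoint. For example: "us-central1".
- ENDPOINT_ID: the ID of your model endpoint
Click Run.

For more information about how to run queries, see Run an interactive query.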
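The placeholder values can be assembled programmatically. The following sketch extracts CONNECTION_ID from a fully qualified connection ID and builds the ENDPOINT value in the format used by the statement above. The function names are hypothetical, and the endpoint ID in the example is made up.

```python
def connection_id_from_full_id(full_connection_id):
    """Return the last section of a fully qualified connection ID."""
    parts = full_connection_id.strip("/").split("/")
    # Expected shape: projects/<project>/locations/<location>/connections/<id>
    if (
        len(parts) != 6
        or parts[0] != "projects"
        or parts[2] != "locations"
        or parts[4] != "connections"
    ):
        raise ValueError(f"unexpected connection ID format: {full_connection_id!r}")
    return parts[5]


def endpoint_url(project_id, endpoint_location, endpoint_id):
    """Build the ENDPOINT option value for the remote model."""
    return (
        f"https://{endpoint_location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{endpoint_location}/endpoints/{endpoint_id}"
    )


full_id = "projects/myproject/locations/connection_location/connections/myconnection"
print(connection_id_from_full_id(full_id))

# "1234567890" is a made-up endpoint ID.
print(endpoint_url("myproject", "us-central1", "1234567890"))
```

Note that the endpoint location appears twice in the URL: once in the host name and once in the resource path.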

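Generate text embeddings

In this section, you use the ML.PREDICT() inference function to generate text embeddings of the review column from the public dataset bigquery-public-data.imdb.reviews. The query limits the table to 500 rows to reduce the amount of data processed.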

NNLM

SELECT
  *
FROM
  ML.PREDICT(
    MODEL `tf_models_tutorial.nnlm_model`,
    (
    SELECT
      review AS content
    FROM
      `bigquery-public-data.imdb.reviews`
    LIMIT
      500)
  );

The result is similar to the following:

+-----------------------+----------------------------------------+
| embedding             | content                                |
+-----------------------+----------------------------------------+
|  0.08599445223808289  | Isabelle Huppert must be one of the... |
| -0.04862852394580841  |                                        |
| -0.017750458791851997 |                                        |
|  0.8658871650695801   |                                        |
| ...                   |                                        |
+-----------------------+----------------------------------------+

SWIVEL

SELECT
  *
FROM
  ML.PREDICT(
    MODEL `tf_models_tutorial.swivel_model`,
    (
    SELECT
      review AS content
    FROM
      `bigquery-public-data.imdb.reviews`
    LIMIT
      500)
  );

The result is similar to the following:

+----------------------+----------------------------------------+
| embedding            | content                                |
+----------------------+----------------------------------------+
|  2.5952553749084473  | Isabelle Huppert must be one of the... |
| -4.015787601470947   |                                        |
|  3.6275434494018555  |                                        |
| -6.045154333114624   |                                        |
| ...                  |                                        |
+----------------------+----------------------------------------+

BERT

SELECT
  *
FROM
  ML.PREDICT(
    MODEL `tf_models_tutorial.bert_model`,
    (
    SELECT
      review AS content
    FROM
      `bigquery-public-data.imdb.reviews`
    LIMIT
      500)
  );

The result is similar to the following:

+--------------+---------------------+----------------------------------------+
| embedding    | remote_model_status | content                                |
+--------------+---------------------+----------------------------------------+
| -0.694072425 | null                | Isabelle Huppert must be one of the... |
|  0.439208865 |                     |                                        |
|  0.99988997  |                     |                                        |
| -0.993487895 |                     |                                        |
| ...          |                     |                                        |
+--------------+---------------------+----------------------------------------+
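After you run a query, a quick sanity check is to confirm that every embedding has the dimension listed in the model comparison table. The following sketch assumes that you have already fetched the result rows into Python dictionaries with embedding and content keys; how you fetch them is up to you. The function name and the sample rows are made up for illustration.

```python
# Expected dimensions from the model comparison table, keyed by model name.
EXPECTED_DIMENSIONS = {"nnlm_model": 50, "swivel_model": 20, "bert_model": 768}


def rows_with_wrong_dimension(model_name, rows):
    """Return the indexes of rows whose embedding length is unexpected."""
    expected = EXPECTED_DIMENSIONS[model_name]
    return [i for i, row in enumerate(rows) if len(row["embedding"]) != expected]


# Made-up rows shaped like the ML.PREDICT output (not real model output).
sample_rows = [
    {"embedding": [0.0] * 20, "content": "first review"},
    {"embedding": [0.0] * 19, "content": "second review"},
]

print(rows_with_wrong_dimension("swivel_model", sample_rows))
```

An empty list means that every row has the expected dimension.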

Clean up

In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.