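The Text Embeddings API converts text data into numerical vectors. These vector representations are designed to capture the semantic meaning and context of the words they represent.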
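To make "capturing semantic meaning" concrete: texts with similar meaning are expected to map to vectors that point in similar directions, which is commonly measured with cosine similarity. The following is a minimal sketch, not part of the API. The function name `cosine_similarity` and the three toy vectors are assumptions chosen for illustration; real embeddings have many more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# The two related items score higher than the unrelated pair.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

In practice, the inputs would be the `values` lists returned for each text rather than hand-written numbers.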
Supported models:
- English models
  - textembedding-gecko@001
  - textembedding-gecko@002
  - textembedding-gecko@003
  - textembedding-gecko@latest
  - text-embedding-preview-0409
- Multilingual models
  - textembedding-gecko-multilingual@001
  - textembedding-gecko-multilingual@latest
  - text-multilingual-embedding-preview-0409
Syntax
- PROJECT_ID = PROJECT_ID
- REGION = us-central1
- MODEL_ID = MODEL_ID
curl
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
  -d '{
    "instances": [
      ...
    ],
    "parameters": {
      ...
    }
  }'
Python
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project=PROJECT_ID, location=REGION)

model = TextEmbeddingModel.from_pretrained(MODEL_ID)
embeddings = model.get_embeddings(...)
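Parameter list
Parameter | Description |
---|---|
texts (instances in REST requests) | The texts to generate embeddings for. Each item is either a plain string or a TextEmbeddingInput. |
auto_truncate (autoTruncate in REST requests) | Optional: When set to true, the input text is truncated. When set to false, an error is returned if the input text is longer than the maximum length that the model supports. Defaults to true. |
output_dimensionality (outputDimensionality in REST requests) | Optional: Specifies the output embedding size. If set, the output embeddings are truncated to the specified size. |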
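The two optional parameters can be pictured with a small local model of the behavior described in the table. This is only a sketch of the documented semantics, not the service's implementation. The helper names, the `max_tokens` limit, the use of a plain token list in place of the model's tokenizer, and the choice to keep the leading elements when truncating are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def apply_auto_truncate(
    tokens: Sequence[str], max_tokens: int, auto_truncate: bool = True
) -> List[str]:
    """Truncate over-long input, or raise if truncation is disabled."""
    if len(tokens) <= max_tokens:
        return list(tokens)
    if auto_truncate:
        # Assumption: truncation keeps the leading tokens.
        return list(tokens[:max_tokens])
    raise ValueError(
        f"input has {len(tokens)} tokens, but the limit is {max_tokens}"
    )


def apply_output_dimensionality(
    vector: Sequence[float], output_dimensionality: Optional[int] = None
) -> List[float]:
    """Return the vector unchanged, or cut down to the requested size."""
    if output_dimensionality is None:
        return list(vector)
    if output_dimensionality <= 0:
        raise ValueError("output_dimensionality must be positive")
    # Assumption: truncation keeps the leading elements.
    return list(vector[:output_dimensionality])


tokens = "what is the meaning of life".split()
print(apply_auto_truncate(tokens, max_tokens=4))  # ['what', 'is', 'the', 'meaning']
print(apply_output_dimensionality([0.1, 0.2, 0.3, 0.4], 2))  # [0.1, 0.2]
```

With `auto_truncate=False`, the same six-token input and a limit of four raises `ValueError`, which mirrors the error case described for the parameter.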
TextEmbeddingInput
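The text that you want to generate embeddings for, together with optional hints for the model.
Parameter | Description |
---|---|
text (content in REST requests) | The text that you want to generate embeddings for. |
task_type | Optional: Conveys the intended downstream application so that the model can produce better embeddings. |
title | Optional: Helps the model produce better embeddings. |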
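The fields above map onto the JSON body sent to the `:predict` endpoint. The sketch below assembles such a body in Python so that optional fields are simply left out when unset and the result is always valid JSON. The helper names `build_instance` and `build_predict_body` are hypothetical; the field names (`content`, `task_type`, `title`, `autoTruncate`, `outputDimensionality`) are taken from the request examples in this document.

```python
import json
from typing import Any, Dict, List, Optional


def build_instance(
    content: str, task_type: Optional[str] = None, title: Optional[str] = None
) -> Dict[str, str]:
    """Build one entry of the "instances" array, omitting unset fields."""
    instance = {"content": content}
    if task_type is not None:
        instance["task_type"] = task_type
    if title is not None:
        instance["title"] = title
    return instance


def build_predict_body(
    instances: List[Dict[str, str]],
    auto_truncate: Optional[bool] = None,
    output_dimensionality: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the request body, adding "parameters" only when needed."""
    body: Dict[str, Any] = {"instances": list(instances)}
    parameters: Dict[str, Any] = {}
    if auto_truncate is not None:
        parameters["autoTruncate"] = auto_truncate
    if output_dimensionality is not None:
        parameters["outputDimensionality"] = output_dimensionality
    if parameters:
        body["parameters"] = parameters
    return body


body = build_predict_body(
    [build_instance("What is life?", task_type="RETRIEVAL_DOCUMENT", title="life question")],
    auto_truncate=False,
    output_dimensionality=256,
)
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` avoids hand-written JSON mistakes such as trailing commas, which strict parsers reject.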
Examples
- PROJECT_ID = PROJECT_ID
- REGION = us-central1
- MODEL_ID = MODEL_ID
Basic use case
The following example shows how to get the embedding of a text string.
curl
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
  -d '{
    "instances": [
      { "content": "What is life?" }
    ]
  }'
Python
import vertexai
from vertexai.language_models import TextEmbeddingModel

# PROJECT_ID, REGION, and MODEL_ID are the placeholder values listed under Examples.
vertexai.init(project=PROJECT_ID, location=REGION)

model = TextEmbeddingModel.from_pretrained(MODEL_ID)
embeddings = model.get_embeddings(["What is life?"])

# Each result exposes its embedding as a list of floats in .values.
vector = embeddings[0].values
print(f"Length of Embedding Vector: {len(vector)}")
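Advanced use case
The following example demonstrates some advanced capabilities:
- Use task_type and title to improve embedding quality.
- Use parameters to control the behavior of the API.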
curl
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:predict \
  -d '{
    "instances": [
      {
        "content": "What is life?",
        "task_type": "RETRIEVAL_DOCUMENT",
        "title": "life question"
      }
    ],
    "parameters": {
      "autoTruncate": false,
      "outputDimensionality": 256
    }
  }'
Python
import vertexai
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

# PROJECT_ID, REGION, and MODEL_ID are the placeholder values listed under Examples.
vertexai.init(project=PROJECT_ID, location=REGION)

model = TextEmbeddingModel.from_pretrained(MODEL_ID)
embeddings = model.get_embeddings(
    texts=[
        TextEmbeddingInput(
            text="What is life?",
            task_type="RETRIEVAL_DOCUMENT",
            title="life question",
        )
    ],
    auto_truncate=False,  # return an error instead of truncating long input
    output_dimensionality=256,  # truncate each embedding to 256 values
)
print("embeddings\n", embeddings)
Explore further
For detailed documentation, see the following: