Tune an embedding model using the specified parameters
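This code sample demonstrates how to fine-tune an embedding model using Vertex AI. The sample starts from a pre-trained model and tunes it on a specific dataset.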
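The dataset the sample tunes on is passed as separate files: a corpus, a set of queries, and label files that pair them (the `corpus_path`, `queries_path`, `train_label_path`, and `test_label_path` parameters). The following is a minimal sketch of what a toy version of those files might look like. The field names (`_id`, `title`, `text`) and label columns (`query-id`, `corpus-id`, `score`) are assumptions based on the layout described in the tuning guide, not something this page states, so check them against that guide. `write_toy_dataset` is a hypothetical helper.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_toy_dataset(out_dir: Path) -> dict[str, Path]:
    """Write a tiny corpus, query, and label file set (assumed layout)."""
    # Assumed corpus fields: "_id", optional "title", and "text".
    corpus = [
        {"_id": "doc1", "title": "Revenue", "text": "Revenue grew year over year."},
        {"_id": "doc2", "title": "Risk", "text": "Competition may affect results."},
    ]
    # Assumed query fields: "_id" and "text".
    queries = [{"_id": "q1", "text": "How did revenue change?"}]
    # Assumed label columns: query-id, corpus-id, score (score > 0 marks a relevant pair).
    labels = [("q1", "doc1", 1)]

    paths = {
        "corpus": out_dir / "corpus.jsonl",
        "queries": out_dir / "queries.jsonl",
        "train": out_dir / "train.tsv",
    }
    # JSONL: one JSON object per line.
    paths["corpus"].write_text(
        "".join(json.dumps(row) + "\n" for row in corpus), encoding="utf-8"
    )
    paths["queries"].write_text(
        "".join(json.dumps(row) + "\n" for row in queries), encoding="utf-8"
    )
    # TSV with a header row.
    with paths["train"].open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["query-id", "corpus-id", "score"])
        writer.writerows(labels)
    return paths


toy_paths = write_toy_dataset(Path(tempfile.mkdtemp()))
```

The sample takes `gs://` URIs, so files like these would need to be uploaded to a Cloud Storage bucket before they could be passed to it.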
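Explore further
For detailed documentation that includes this code sample, see the following:
- [Tune text embeddings](/vertex-ai/generative-ai/docs/models/tune-embeddings)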
Code sample
Python

Before trying this sample, follow the Python setup instructions in the [Vertex AI quickstart using client libraries](/vertex-ai/docs/start/client-libraries). For more information, see the [Vertex AI Python API reference documentation](/python/docs/reference/aiplatform/latest).

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see [Set up authentication for a local development environment](/docs/authentication/set-up-adc-local-dev-environment).

    import re

    from google.cloud.aiplatform import initializer as aiplatform_init
    from vertexai.language_models import TextEmbeddingModel


    def tune_embedding_model(
        api_endpoint: str,
        base_model_name: str = "text-embedding-005",
        corpus_path: str = "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/corpus.jsonl",
        queries_path: str = "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/queries.jsonl",
        train_label_path: str = "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/train.tsv",
        test_label_path: str = "gs://cloud-samples-data/ai-platform/embedding/goog-10k-2024/r11/test.tsv",
    ):  # noqa: ANN201
        """Tune an embedding model using the specified parameters.
        Args:
            api_endpoint (str): The API endpoint for the Vertex AI service.
            base_model_name (str): The name of the base model to use for tuning.
            corpus_path (str): GCS URI of the JSONL file containing the corpus data.
            queries_path (str): GCS URI of the JSONL file containing the queries data.
            train_label_path (str): GCS URI of the TSV file containing the training labels.
            test_label_path (str): GCS URI of the TSV file containing the test labels.
        """
        # Take the leading "<word>-<word>" prefix of the endpoint as the location,
        # falling back to us-central1 when the endpoint has no such prefix.
        match = re.search(r"^(\w+-\w+)", api_endpoint)
        location = match.group(1) if match else "us-central1"
        base_model = TextEmbeddingModel.from_pretrained(base_model_name)
        tuning_job = base_model.tune_model(
            task_type="DEFAULT",
            corpus_data=corpus_path,
            queries_data=queries_path,
            training_data=train_label_path,
            test_data=test_label_path,
            batch_size=128,  # The batch size to use for training.
            train_steps=1000,  # The number of training steps.
            tuned_model_location=location,
            output_dimensionality=768,  # The dimensionality of the output embeddings.
            learning_rate_multiplier=1.0,  # The multiplier for the learning rate.
        )
        return tuning_job

What's next

To search and filter code samples for other Google Cloud products, see the [Google Cloud sample browser](/docs/samples?product=generativeaionvertexai).
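The sample derives `tuned_model_location` from the leading part of `api_endpoint` with a regular expression and falls back to `us-central1` when nothing matches. The sketch below isolates that step so its behavior can be checked without calling Vertex AI. `location_from_endpoint` is a hypothetical helper name, and the example hostnames are illustrative assumptions rather than values taken from this page.

```python
import re

DEFAULT_LOCATION = "us-central1"


def location_from_endpoint(api_endpoint: str, default: str = DEFAULT_LOCATION) -> str:
    """Return the leading "<word>-<word>" prefix of the endpoint, or the default."""
    # Same pattern as the sample: anchored at the start of the string.
    match = re.search(r"^(\w+-\w+)", api_endpoint)
    return match.group(1) if match else default


# Illustrative inputs (assumed hostnames, not taken from the sample).
for endpoint in (
    "us-central1-aiplatform.googleapis.com",
    "europe-west4-aiplatform.googleapis.com",
    "aiplatform.googleapis.com",
    "https://europe-west4-aiplatform.googleapis.com",
):
    print(endpoint, "->", location_from_endpoint(endpoint))
```

Two properties of this pattern are worth keeping in mind. Because the match is anchored at the start of the string, an endpoint passed with a scheme such as `https://` silently falls back to the default location. The pattern also accepts any hyphenated prefix, so a hostname like `my-proxy.example.com` would yield `my-proxy` as the location.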