Last updated (UTC): 2025-09-04.

# Prepare supervised fine-tuning data for Gemini models

This document describes how to define a supervised fine-tuning dataset for a Gemini model. You can tune [text](/vertex-ai/generative-ai/docs/models/tune_gemini/text_tune), [image](/vertex-ai/generative-ai/docs/models/tune_gemini/image_tune), [audio](/vertex-ai/generative-ai/docs/models/tune_gemini/audio_tune), and [document](/vertex-ai/generative-ai/docs/models/tune_gemini/doc_tune) data types.

About supervised fine-tuning datasets
-------------------------------------

A supervised fine-tuning dataset is used to fine-tune a pre-trained model for a specific task or domain. The input data should be similar to what you expect the model to encounter in real-world use. The output labels should represent the correct answers or outcomes for each input.

**Training dataset**

To tune a model, you provide a *training dataset*. For best results, we recommend that you start with 100 examples. You can scale up to thousands of examples if needed. The quality of the dataset is far more important than the quantity.

**Validation dataset**

We strongly recommend that you provide a validation dataset. A validation dataset helps you measure the effectiveness of a tuning job.

**Limitations**

For limitations on datasets, such as maximum input and output tokens, maximum validation dataset size, and maximum training dataset file size, see [About supervised fine-tuning for Gemini models](/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning#limitations).

### Dataset format

We support the following data formats:

- [Multimodal dataset on Vertex AI (preview)](/vertex-ai/generative-ai/docs/multimodal/datasets).

- [JSON Lines](https://jsonlines.org/) (JSONL) format, where each line contains a single tuning example. Before tuning your model, you must [upload your dataset to a Cloud Storage bucket](#upload-datasets).

Dataset example for Gemini
--------------------------

    {
      "systemInstruction": {
        "role": string,
        "parts": [
          {
            "text": string
          }
        ]
      },
      "contents": [
        {
          "role": string,
          "parts": [
            {
              // Union field data can be only one of the following:
              "text": string,
              "fileData": {
                "mimeType": string,
                "fileUri": string
              }
            }
          ]
        }
      ]
    }

### Parameters

The example contains data with the following parameters:

**`contents`**

Required: `Content`

The content of the current conversation with the model.

For single-turn queries, this is a single instance. For multi-turn queries, this is a repeated field that contains the conversation history and the latest request.

**`systemInstruction`**

Optional: `Content`

Available for `gemini-1.5-flash` and `gemini-1.5-pro`.

Instructions for the model to steer it toward better performance. For example, "Answer as concisely as possible" or "Don't use technical terms in your response."

The `text` strings count toward the token limit.

The `role` field of `systemInstruction` is ignored and doesn't affect the performance of the model.

### Contents

The base structured data type containing multi-part content of a message.

This class consists of two main properties: `role` and `parts`. The `role` property denotes the individual producing the content, while the `parts` property contains multiple elements, each representing a segment of data within a message.

### Parts

A data type containing media that is part of a multi-part `Content` message.

Dataset example
---------------

Each conversation example in a tuning dataset is composed of a required `messages` field and an optional `context` field.

The `messages` field consists of an array of role-content pairs:

- The `role` field refers to the author of the message and is set to either `system`, `user`, or `model`.
The `system` role is optional and can only occur at the first element of the messages list. The `user` and `model` roles are required and can repeat in an alternating manner.
- The `content` field is the content of the message.

For each example, the maximum token length for `context` and `messages` combined is 131,072 tokens. Additionally, each `content` field for the `model` role shouldn't exceed 8,192 tokens.

    {
      "messages": [
        {
          "role": string,
          "content": string
        }
      ]
    }

### Maintain consistency with production data

The examples in your datasets should match your expected production traffic. If your dataset contains specific formatting, keywords, instructions, or information, the production data should be formatted the same way and contain the same instructions.

For example, if the examples in your dataset include a `"question:"` and a `"context:"`, production traffic should also be formatted to include a `"question:"` and a `"context:"` in the same order as they appear in the dataset examples. If you exclude the context, the model won't recognize the pattern, even if the exact question appeared in a dataset example.

### Upload tuning datasets to Cloud Storage

To run a tuning job, you need to upload one or more datasets to a Cloud Storage bucket. You can either [create a new Cloud Storage bucket](/storage/docs/creating-buckets#create_a_new_bucket) or use an existing one to store dataset files. The region of the bucket doesn't matter, but we recommend that you use a bucket in the same Google Cloud project where you plan to tune your model.

After your bucket is ready, [upload](/storage/docs/uploading-objects#uploading-an-object) your dataset file to the bucket.

### Follow prompt design best practices

Once you have your training dataset and you've trained the model, it's time to design prompts.
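One way to keep prompts consistent with the training data is to assemble production requests with the same field layout as the tuning examples. The helper below is a minimal sketch, not part of any Vertex AI API; the `question:`/`context:` labels come from the consistency example earlier in this document.

```python
# Hypothetical helper: format production requests with the same
# "question:"/"context:" layout used in the tuning examples, in the
# same order, so the model sees the pattern it was tuned on.
def build_prompt(question: str, context: str) -> str:
    return f"question: {question}\ncontext: {context}"

prompt = build_prompt(
    "What is the refund window?",
    "Orders can be refunded within 30 days of delivery.",
)
print(prompt)
```

Omitting `context` here, as noted above, would break the pattern the model learned during tuning.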
It's important to follow the [best practices of prompt design](/vertex-ai/generative-ai/docs/learn/prompts/prompt-design-strategies) in your training dataset to give a detailed description of the task to be performed and what the output should look like.

What's next
-----------

- Choose a region to [tune a model](/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning-region-settings).
- To learn how supervised fine-tuning can be used in a solution that builds a generative AI knowledge base, see [Jump Start Solution: Generative AI knowledge base](/architecture/ai-ml/generative-ai-knowledge-base).
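As a closing recap, each line of a JSONL tuning file is one self-contained example in the schema shown earlier in this document. The sketch below writes such a file and re-parses it as a sanity check; the file name and example values are hypothetical.

```python
import json

# One tuning example per JSONL line, following the Gemini dataset schema
# described above. Values below are hypothetical.
examples = [
    {
        "systemInstruction": {
            "role": "system",
            "parts": [{"text": "Answer as concisely as possible."}],
        },
        "contents": [
            {"role": "user", "parts": [{"text": "Why is the sky blue?"}]},
            {"role": "model", "parts": [{"text": "Because of Rayleigh scattering."}]},
        ],
    },
]

with open("tuning_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")

# Sanity check before uploading to Cloud Storage: every line must parse
# on its own and contain a "contents" array.
with open("tuning_data.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        assert isinstance(record["contents"], list)
```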