[[["이해하기 쉬움","easyToUnderstand","thumb-up"],["문제가 해결됨","solvedMyProblem","thumb-up"],["기타","otherUp","thumb-up"]],[["이해하기 어려움","hardToUnderstand","thumb-down"],["잘못된 정보 또는 샘플 코드","incorrectInformationOrSampleCode","thumb-down"],["필요한 정보/샘플이 없음","missingTheInformationSamplesINeed","thumb-down"],["번역 문제","translationIssue","thumb-down"],["기타","otherDown","thumb-down"]],["최종 업데이트: 2025-09-04(UTC)"],[],[],null,["# Prepare supervised fine-tuning data for Translation LLM models\n\nThis document describes how to define a supervised fine-tuning dataset for a Translation LLM\nmodel. You can tune text data types.\n\nAbout supervised fine-tuning datasets\n-------------------------------------\n\nA supervised fine-tuning dataset is used to fine-tune a pre-trained model to a\nspecific domain. The input data should be similar to what\nyou expect the model to encounter in real-world use. The output labels should\nrepresent the correct answers or outcomes for each input.\n\n**Training dataset**\n\nTo tune a model, you provide a *training dataset*. For best results, we recommend\nthat you start with 100 examples. You can scale up to thousands of examples if\nneeded. The quality of the dataset is far more important than the quantity.\n\nLimitations:\n\n- Max input and out token per examples: 1,000\n- Max file size of training dataset: Up to 1GB for JSONL.\n\n**Validation dataset**\n\nWe strongly recommend that you provide a validation dataset. A validation dataset\nhelps you measure the effectiveness of a tuning job.\n\nLimitations:\n\n- Max input and out token per examples: 1,000\n- Max numbers of examples in validation dataset: 1024\n- Max file size of training dataset: Up to 1GB for JSONL.\n\n### Dataset format\n\nYour model tuning dataset must be in the [JSON Lines](https://jsonlines.org/) (JSONL) format, where each line contains a single tuning example.\nBefore tuning your model, you must\n[upload your dataset to a Cloud Storage bucket](#upload-datasets). Make sure to upload to us-central1. \n\n {\n \"contents\": [\n {\n \"role\": string,\n \"parts\": [\n {\n \"text\": string,\n }\n ]\n }\n ]\n }\n\n### Parameters\n\nThe example contains data with the following parameters:\n\nDataset example for `translation-llm-002`\n-----------------------------------------\n\n {\n \"contents\": [\n {\n \"role\": \"user\",\n \"parts\": [\n {\n \"text\": \"English: Hello. Spanish:\",\n }\n ]\n }\n {\n \"role\": \"model\"\",\n \"parts\": [\n {\n \"text\": \"Hola.\",\n }\n ]\n }\n ]\n }\n\n### Contents\n\nThe base structured data type containing multi-part content of a message.\n\nThis class consists of two main properties: `role` and `parts`. The `role` property\ndenotes the individual producing the content, while the `parts` property contains\nmultiple elements, each representing a segment of data within a message.\n\n### Parts\n\nA data type containing media that is part of a multi-part `Content` message.\n\n### Upload tuning datasets to Cloud Storage\n\nTo run a tuning job, you need to upload one or more datasets to a\nCloud Storage bucket. You can either\n[create a new Cloud Storage bucket](/storage/docs/creating-buckets#create_a_new_bucket)\nor use an existing one to store dataset files. 
Notebook examples for preparing data
------------------------------------

Here are some Colab notebook examples to help you get started.

### AutoML Translation Dataset

If you already have Translation datasets uploaded to AutoML Translation, you can follow the Colab example to export them for tuning.

### Local Dataset

If your data is in a TSV, CSV, or TMX file locally, you can upload it to Colab for tuning. A minimal TSV-to-JSONL conversion sketch also appears after the What's next section below.

What's next
-----------

- Run a [supervised fine-tuning job](/vertex-ai/generative-ai/docs/models/translation-use-supervised-tuning).
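As referenced in the Local Dataset section, here is a minimal sketch for converting a local TSV file into the JSONL tuning format shown earlier. The file names and column layout are hypothetical: it assumes a headerless two-column TSV with English source text in the first column and the Spanish target text in the second.

    import csv
    import json

    # Hypothetical file names; adjust to your data.
    SOURCE_TSV = "pairs.tsv"
    OUTPUT_JSONL = "train_dataset.jsonl"

    with open(SOURCE_TSV, newline="", encoding="utf-8") as src, \
         open(OUTPUT_JSONL, "w", encoding="utf-8") as dst:
        for source_text, target_text in csv.reader(src, delimiter="\t"):
            # One tuning example per row, in the format shown above:
            # a "user" turn with the source text and a "model" turn
            # with the expected translation.
            example = {
                "contents": [
                    {
                        "role": "user",
                        "parts": [{"text": f"English: {source_text} Spanish:"}],
                    },
                    {
                        "role": "model",
                        "parts": [{"text": target_text}],
                    },
                ]
            }
            dst.write(json.dumps(example, ensure_ascii=False) + "\n")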