Text tuning

This page provides prerequisites and detailed instructions for fine-tuning Gemini on text data using supervised learning. For text tuning examples of classification, sentiment analysis, and extraction use cases, see Model tuning for Gemini text models.

Use cases

Fine-tuning a text model adapts it to specific text-based tasks. In the following use cases, fine-tuning can substantially improve a model's performance:

  • Extracting structured information from chats: Transform multi-turn conversations into organized data by fine-tuning a model to identify key attributes and output them in a structured format like JSONL.
  • Document categorization: Fine-tune a model to accurately classify lengthy documents into predefined categories, enabling efficient organization and retrieval of information.
  • Instruction following: Enhance a model's ability to comprehend and execute instructions, leading to more accurate and reliable task completion.
  • Automated code review: Use fine-tuning to create a model capable of providing insightful code reviews, identifying potential issues, and suggesting improvements.
  • Summarization: Generate concise and informative summaries of long texts by fine-tuning a model to capture the essence of the content.
  • Code and DSL generation: Fine-tune a model to generate code in various programming languages or domain-specific languages (DSLs), automating repetitive coding tasks.
  • Improved RAG performance: Enhance the helpfulness and accuracy of Retrieval-Augmented Generation (RAG) systems by fine-tuning the underlying language model.

Dataset format

The following is an example of a text tuning dataset record.

For the generic format, see Dataset example for Gemini 1.5 Pro and Gemini 1.5 Flash.

{
  "systemInstruction": {
    "role": "system",
    "parts": [
      {
        "text": "You are a pirate dog named Captain Barktholomew."
      }
    ]
  },
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Hi"
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "text": "Argh! What brings ye to my ship?"
        }
      ]
    },
    {
      "role": "user",
      "parts": [
        {
          "text": "What's your name?"
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "text": "I be Captain Barktholomew, the most feared pirate dog of the seven seas."
        }
      ]
    }
  ]
}
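If you generate training data programmatically, a small helper can keep each record in the shape above. The following Python sketch builds one example from (role, text) pairs, runs a few sanity checks, and serializes records as JSON Lines (one object per line). The function names are hypothetical. The checks are our own assumptions about a well-formed conversation, not the service's documented validation rules: turns alternate starting with `user`, and the last turn is a `model` response.

```python
import json
from typing import Optional


def make_example(turns: list[tuple[str, str]], system: Optional[str] = None) -> dict:
    """Build one tuning example in the Gemini 1.5 format shown above.

    `turns` is a list of (role, text) pairs; role is "user" or "model".
    """
    example: dict = {}
    if system is not None:
        # Optional system instruction, placed first as in the example record.
        example["systemInstruction"] = {"role": "system", "parts": [{"text": system}]}
    example["contents"] = [
        {"role": role, "parts": [{"text": text}]} for role, text in turns
    ]
    return example


def check_example(example: dict) -> list[str]:
    """Return problems found by illustrative checks.

    Assumption: these checks are hypothetical, not the service's official rules.
    """
    problems = []
    contents = example.get("contents", [])
    if not contents:
        problems.append("contents is empty")
    for i, turn in enumerate(contents):
        # Expect user/model alternation, starting with a user turn.
        expected = "user" if i % 2 == 0 else "model"
        if turn.get("role") != expected:
            problems.append(f"turn {i}: expected {expected!r}, got {turn.get('role')!r}")
    if contents and contents[-1].get("role") != "model":
        problems.append("last turn is not a model response")
    return problems


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples as JSON Lines: one compact JSON object per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples) + "\n"
```

To produce a training file, write the string returned by `to_jsonl` to disk with UTF-8 encoding.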

Sample datasets for gemini-1.5-pro and gemini-1.5-flash

You can use the following sample datasets to learn how to tune a gemini-1.5-pro or a gemini-1.5-flash model.

To use these datasets, specify the URIs in the applicable parameters when creating a text model supervised fine-tuning job.

For example:

...
"training_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
...
"validation_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_validation_data.jsonl",
...
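The following Python sketch shows where these URIs go. It builds the URL and JSON body for a supervised tuning job request against the Vertex AI `tuningJobs` REST resource. It only constructs the request and doesn't send it. The project ID, region, and display name are placeholders. The camelCase field names (`trainingDatasetUri`, `validationDatasetUri`) reflect our reading of the v1 REST API; the snippet above uses the equivalent snake_case proto field names. Verify them against the API reference before relying on them.

```python
import json

# Placeholders (assumptions): replace with your own project, region, and model.
PROJECT_ID = "your-project-id"      # hypothetical project ID
LOCATION = "us-central1"            # assumed tuning region
BASE_MODEL = "gemini-1.5-pro-002"   # a model version named on this page

TRAIN_URI = "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl"
VALIDATION_URI = "gs://cloud-samples-data/ai-platform/generative_ai/sft_validation_data.jsonl"


def tuning_job_request(project: str, location: str, base_model: str,
                       train_uri: str, validation_uri: str,
                       display_name: str) -> tuple[str, dict]:
    """Return (url, body) for creating a supervised tuning job (not sent)."""
    url = (f"https://{location}-aiplatform.googleapis.com/v1/"
           f"projects/{project}/locations/{location}/tuningJobs")
    body = {
        "baseModel": base_model,
        "supervisedTuningSpec": {
            "trainingDatasetUri": train_uri,
            "validationDatasetUri": validation_uri,
        },
        "tunedModelDisplayName": display_name,
    }
    return url, body


if __name__ == "__main__":
    url, body = tuning_job_request(PROJECT_ID, LOCATION, BASE_MODEL,
                                   TRAIN_URI, VALIDATION_URI, "pirate-dog-tuned")
    print(url)
    print(json.dumps(body, indent=2))
```

To submit the job, you could POST the body to the URL with an OAuth bearer token, such as one printed by `gcloud auth print-access-token`.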

Dataset format example for Gemini 1.0 Pro

{
  "messages": [
    {
      "role": "system",
      "content": "You are a pirate dog named Captain Barktholomew."
    },
    {
      "role": "user",
      "content": "Hi"
    },
    {
      "role": "model",
      "content": "Argh! What brings ye to my ship?"
    },
    {
      "role": "user",
      "content": "What's your name?"
    },
    {
      "role": "model",
      "content": "I be Captain Barktholomew, the most feared pirate dog of the seven seas."
    }
  ]
}
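If you have data in the Gemini 1.0 Pro `messages` format and want to tune a Gemini 1.5 model, you can convert it mechanically by comparing the two examples above. The `system` message becomes `systemInstruction`. Each `user` or `model` message becomes a `contents` entry, and its `content` string moves into `parts`. The following Python sketch performs that conversion. The function names are hypothetical. When a record has several system messages, the sketch merges them into one `systemInstruction` with multiple parts; that behavior is our assumption.

```python
import json


def convert_messages(record: dict) -> dict:
    """Convert one Gemini 1.0 Pro "messages" record to the Gemini 1.5 format."""
    system_parts = []
    contents = []
    for message in record["messages"]:
        part = {"text": message["content"]}
        if message["role"] == "system":
            # Assumption: all system messages are merged into one systemInstruction.
            system_parts.append(part)
        else:
            # "user" and "model" roles carry over unchanged.
            contents.append({"role": message["role"], "parts": [part]})
    converted: dict = {}
    if system_parts:
        converted["systemInstruction"] = {"role": "system", "parts": system_parts}
    converted["contents"] = contents
    return converted


def convert_jsonl(text: str) -> str:
    """Convert a whole JSONL dataset, skipping blank lines."""
    records = (json.loads(line) for line in text.splitlines() if line.strip())
    return "\n".join(json.dumps(convert_messages(r), ensure_ascii=False) for r in records)
```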

Sample datasets for gemini-1.0-pro

You can use the following sample datasets to learn how to tune a gemini-1.0-pro-002 model.

To use these datasets, specify the URIs in the applicable parameters when creating a text model supervised fine-tuning job.

For example:

...
"training_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_train_data.jsonl",
...
"validation_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/sft_validation_data.jsonl",
...

Estimate the cost of tuning with a dataset

The following notebook can help you estimate token counts and tuning costs when running a tuning job for gemini-1.5-pro-002.
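Use the notebook for real token counts and prices. For a quick back-of-the-envelope figure, the following Python sketch makes two assumptions: billable tuning tokens equal the tokens in the training dataset multiplied by the number of epochs, and a token is roughly four characters. Supply the price per million tokens yourself from the current Vertex AI pricing page; this page doesn't give one. For a more accurate count, pass a real tokenizer as `count_tokens`, for example a wrapper around `GenerativeModel.count_tokens` in the Vertex AI SDK.

```python
import json
from typing import Callable, Iterable


def rough_token_count(text: str) -> int:
    """Crude heuristic (assumption): about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def example_texts(example: dict) -> Iterable[str]:
    """Yield every text part in a Gemini 1.5-format example, system instruction included."""
    for part in example.get("systemInstruction", {}).get("parts", []):
        yield part.get("text", "")
    for turn in example.get("contents", []):
        for part in turn.get("parts", []):
            yield part.get("text", "")


def estimate_tuning_cost(jsonl_text: str, epochs: int, price_per_million_tokens: float,
                         count_tokens: Callable[[str], int] = rough_token_count
                         ) -> tuple[int, int, float]:
    """Return (dataset_tokens, billable_tokens, estimated_cost).

    Assumption: billable tokens = dataset tokens x epochs.
    """
    dataset_tokens = sum(
        count_tokens(text)
        for line in jsonl_text.splitlines() if line.strip()
        for text in example_texts(json.loads(line))
    )
    billable_tokens = dataset_tokens * epochs
    cost = billable_tokens / 1_000_000 * price_per_million_tokens
    return dataset_tokens, billable_tokens, cost
```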

What's next