Image tuning

This guide shows you how to prepare data for supervised fine-tuning of a Gemini model for image tasks. This page covers the following topics:

  • Use cases: Learn about common applications for image tuning, such as product catalog enhancement and visual inspection.
  • Limitations: Understand the constraints on image count and file size for tuning.
  • Dataset format: Review the required JSONL structure for your image tuning dataset.
  • Sample datasets: Find sample datasets you can use to get started with image tuning.

Use cases

Fine-tuning lets you adapt base Gemini models for specialized tasks. Common use cases for image tuning include:

  • Product catalog enhancement: Extract key attributes from images (for example, brand, color, and size) to automatically build and enrich your product catalog.
  • Image moderation: Fine-tune a model to detect and flag inappropriate or harmful content in images to help ensure a safer online experience.
  • Visual inspection: Train a model to identify specific objects or defects within images to automate quality control or inspection processes.
  • Image classification: Improve the accuracy of image classification for specific domains, such as medical imaging or satellite imagery analysis.
  • Image-based recommendations: Analyze images to provide personalized recommendations, such as suggesting similar products or complementary items.
  • Table content extraction: Extract data from tables within images and convert it into structured formats like spreadsheets or databases.

Limitations

When you tune an image model, the following limitations apply:

  • Maximum images per example: 30
  • Maximum image file size: 20 MB

To learn more about image sample requirements, see the Image understanding page.
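The per-example image limit can be checked locally before you submit a dataset. The following is a minimal sketch, not an official validator: the helper names `count_images` and `find_over_limit` are hypothetical, and the sketch assumes that every image appears as a `fileData` part with an `image/*` MIME type, as in the dataset format on this page. It does not check file size, because that requires reading each file's metadata from wherever the image is stored.

```python
import json
from typing import Iterable, List

MAX_IMAGES_PER_EXAMPLE = 30  # limit stated in the Limitations list


def count_images(example: dict) -> int:
    """Count fileData parts whose MIME type starts with image/."""
    count = 0
    for message in example.get("contents", []):
        for part in message.get("parts", []):
            file_data = part.get("fileData")
            if file_data and file_data.get("mimeType", "").startswith("image/"):
                count += 1
    return count


def find_over_limit(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers of examples that exceed the image limit."""
    over_limit = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        if count_images(json.loads(line)) > MAX_IMAGES_PER_EXAMPLE:
            over_limit.append(line_number)
    return over_limit
```

Because `find_over_limit` accepts any iterable of strings, you can pass it an open file object for a local copy of the dataset, for example `find_over_limit(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a placeholder file name.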

Dataset format

Your dataset must be in JSON Lines (JSONL) format, where each line is a single training example. The fileUri for your image can be a URI for a file in a Cloud Storage bucket or a publicly available HTTP or HTTPS URL.

For the general dataset structure, see Dataset example for Gemini.

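The following example shows a single training example from an image dataset. It is formatted across multiple lines for readability; in your JSONL file, each example must occupy a single line: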

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "image/jpeg",
            "fileUri": "gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/image/longcap100/100.jpeg"
          }
        },
        {
          "text": "Describe this image in detail that captures the essence of it."
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "text": "A man stands on a road, wearing a blue denim jacket, tan pants, and white sneakers. He has his hands in his pockets and is wearing a white t-shirt under his jacket. The man's pants are cuffed, and his shoes are white. The road is dark grey, and the leaves are green. The man is standing in the shade, and the light is shining on the ground."
        }
      ]
    }
  ]
}
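
One way to produce such a file is to build each example as a dictionary and serialize it with the standard `json` module, which keeps every example on one line. The following is a minimal sketch under that assumption; the helper names `make_example` and `to_jsonl` are hypothetical and are not part of any Google SDK.

```python
import json
from typing import Iterable


def make_example(image_uri: str, mime_type: str, prompt: str, response: str) -> dict:
    """Build one training example in the structure shown above."""
    return {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"fileData": {"mimeType": mime_type, "fileUri": image_uri}},
                    {"text": prompt},
                ],
            },
            {"role": "model", "parts": [{"text": response}]},
        ]
    }


def to_jsonl(examples: Iterable[dict]) -> str:
    """Serialize examples as JSONL: one JSON object per line."""
    # json.dumps escapes newlines inside strings, so each example stays on one line.
    return "".join(json.dumps(example) + "\n" for example in examples)
```

For example, `to_jsonl([make_example("gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/image/longcap100/100.jpeg", "image/jpeg", "Describe this image.", "A man stands on a road.")])` returns a one-line string that you can write to a `.jsonl` file and upload to a Cloud Storage bucket.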

Sample datasets

You can use the following sample datasets to learn how to tune a Gemini model. To use these datasets, specify the URIs in the applicable parameters when you create a supervised fine-tuning job.

  • Training dataset:

    "training_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/text/sft_train_data.jsonl",
    
  • Validation dataset:

    "validation_dataset_uri": "gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/text/sft_validation_data.jsonl",
    

What's next