Document tuning

This page provides prerequisites and detailed instructions for fine-tuning Gemini on document data using supervised learning.

Use cases

Fine-tuning lets you customize powerful language models for your specific needs. Here are some key use cases where fine-tuning with your own set of PDFs can significantly enhance a model's performance:

  • Internal knowledge base: Convert your internal documents into an AI-powered knowledge base that provides instant answers and insights. For example, a sales representative could instantly access product specifications and pricing details from past training materials.
  • Research assistant: Create a research assistant capable of analyzing a collection of research papers, articles, and books. A researcher studying climate change could quickly analyze scientific papers to identify trends in sea level rise or assess the effectiveness of different mitigation strategies.
  • Legal or regulatory compliance: Fine-tuning on legal documents can help automate contract review, flagging potential inconsistencies or areas of risk. This allows legal professionals to focus on higher-level tasks while ensuring compliance.
  • Automated report generation: Automate the analysis of complex financial reports, extracting key performance indicators and generating summaries for stakeholders. This can save time and reduce the risk of errors compared to manual analysis.
  • Content summarization and analysis: Summarize lengthy PDF documents, extract key insights, and analyze trends. For example, a market research team could analyze a collection of customer surveys to identify key themes and sentiment.
  • Document comparison and version control: Compare different versions of a document to identify changes and track revisions. This can be particularly useful in collaborative environments where multiple authors contribute to a document.

Limitations

  • Maximum pages per example: 16
  • Maximum PDF files per example: 4
  • Maximum PDF file size: 20 MB

To learn more about document understanding requirements, see Document understanding.
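The limits above can be checked before a dataset is uploaded. The following is a minimal sketch, not an official validator. `PdfInfo` and `check_example` are hypothetical names, and the caller is assumed to supply each file's size and page count. The sketch also assumes that 20 MB means 20 * 1024 * 1024 bytes and that the page limit applies to the combined pages of all PDFs in one example; this page does not state either.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the list above.
MAX_PAGES_PER_EXAMPLE = 16
MAX_PDFS_PER_EXAMPLE = 4
MAX_PDF_BYTES = 20 * 1024 * 1024  # assumption: 20 MB read as 20 MiB


@dataclass
class PdfInfo:
    """Hypothetical record describing one PDF used in a training example."""
    uri: str
    size_bytes: int
    page_count: int


def check_example(pdfs: List[PdfInfo]) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []

    if len(pdfs) > MAX_PDFS_PER_EXAMPLE:
        problems.append(
            f"{len(pdfs)} PDF files exceeds the limit of {MAX_PDFS_PER_EXAMPLE}"
        )

    # Assumption: the page limit counts pages across all PDFs in the example.
    total_pages = sum(p.page_count for p in pdfs)
    if total_pages > MAX_PAGES_PER_EXAMPLE:
        problems.append(
            f"{total_pages} pages exceeds the limit of {MAX_PAGES_PER_EXAMPLE}"
        )

    for p in pdfs:
        if p.size_bytes > MAX_PDF_BYTES:
            problems.append(f"{p.uri} is larger than {MAX_PDF_BYTES} bytes")

    return problems
```

An empty return value means the example passed all three checks; otherwise each string names the limit that was exceeded.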

Dataset format

The following is a single training example from a document dataset.

For the generic format, see Dataset example for Gemini 1.5 Pro and Gemini 1.5 Flash.

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "fileData": {
            "mimeType": "application/pdf",
            "fileUri": "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"
          }
        },
        {
          "text": "You are a very professional document summarization specialist. Please summarize the given document."
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "text": "The report introduces Gemini 1.5 Pro, a multimodal AI model developed by Google DeepMind. The report positions Gemini 1.5 Pro as a significant advancement in multimodal AI, pushing the boundaries of long-context understanding and opening new avenues for future research and applications."
        }
      ]
    }
  ]
}
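
Examples in this shape can also be generated programmatically. The following is a minimal sketch that builds one example with the same fields as the sample above and serializes it to a single line. `build_document_example` and `to_jsonl_line` are hypothetical helper names. Writing one example per line (JSON Lines) is an assumption here, because this page does not state the dataset file format.

```python
import json


def build_document_example(pdf_uri: str, prompt: str, response: str) -> dict:
    """Build one training example: a user turn (PDF + prompt) and a model turn."""
    return {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {
                        "fileData": {
                            "mimeType": "application/pdf",
                            "fileUri": pdf_uri,
                        }
                    },
                    {"text": prompt},
                ],
            },
            {
                "role": "model",
                "parts": [{"text": response}],
            },
        ]
    }


def to_jsonl_line(example: dict) -> str:
    """Serialize an example as one line of JSON (assumed JSON Lines layout)."""
    return json.dumps(example, ensure_ascii=False)


if __name__ == "__main__":
    example = build_document_example(
        "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
        "Please summarize the given document.",
        "The report introduces Gemini 1.5 Pro, a multimodal AI model.",
    )
    print(to_jsonl_line(example))
```

Building examples through a single helper keeps the field names (`contents`, `role`, `parts`, `fileData`, `text`) consistent across the dataset.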

What's next