Hundreds of organizations are fine-tuning Gemini models. Here are their favorite use cases.
Mikhail Chrestkha
Applied AI Solutions Manager
May Hu
Product Manager, Vertex AI
Many organizations are still experimenting with the best ways to use AI models, particularly for the specific needs of their teams or their sector. Supervised fine-tuning is one effective way to customize model behavior using your unique data.
During fine-tuning, the model learns to perform specialized tasks, adapt to your industry or function, and align output formats and styles with your organization’s needs. On Vertex AI, you can fine-tune Gemini models with a few clicks in the console or with simple client libraries and SDKs (software development kits). All you need to do is prepare a labeled training dataset, choose the Gemini model to tune, and click “start.”
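To make "prepare a labeled training dataset" concrete, here is a minimal sketch of what one might look like: a JSONL file where each line pairs an input with the desired model response. The chat-style `contents` schema below is illustrative; check the Vertex AI tuning documentation for the exact fields your model version expects.

```python
import json

# Each training example pairs a user prompt with the desired model output.
# The schema sketched here is illustrative, not authoritative.
examples = [
    {
        "contents": [
            {
                "role": "user",
                "parts": [{"text": "Extract the brand and color: 'Acme blue running shoes, size 10'"}],
            },
            {
                "role": "model",
                "parts": [{"text": '{"brand": "Acme", "color": "blue"}'}],
            },
        ]
    },
]

# Write one JSON object per line (JSONL), the format tuning jobs typically consume.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

In practice you would collect dozens to hundreds of such examples, upload the file to Cloud Storage, and point the tuning job at it.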
Valuable as fine-tuning is, it's still new enough, like gen AI itself, that many of the teams and leaders we met with said it helps to know where to start. To that end, we've created this handy guide to some of the top tuning use cases, which is itself just a starting point. The best way to discover your organization's AI needs is to dive in and get piloting in Vertex AI. (Google Cloud Consulting is here to help, too.)
The Vertex AI tuning service launched into general availability last quarter, supporting the latest Gemini 1.5 Pro and Gemini 1.5 Flash models, meaning the AI model can now be customized using inputs across text, images, audio, and documents. To help you determine when to use fine-tuning for Gemini, we created a quick introduction on what supervised fine-tuning is, when to embrace it, and how it compares to other methods for optimizing your model's output.
In our discussions with hundreds of enterprise customers, we've been able to identify many of the top tuning use cases, which we share below. These are followed by two case studies, where customers share their detailed experience with Vertex AI tuning and how it helped them achieve real business results.
Lessons from enterprises fine-tuning Gemini
Over the last few months, we’ve helped hundreds of Google Cloud customers explore the potential of fine-tuning to improve Gemini performance on their unique use cases. The majority of customers were looking to improve the accuracy and performance of model output, while others looked to decrease cost or latency, optimize output formatting, and improve domain-specific understanding and factuality.
Tuning has proven very effective at adapting models to perform more complex tasks and at improving performance in specialized domains such as biotech, games, healthcare, transportation, finance, and retail. We also saw many customers move from fine-tuning text-based tasks to fine-tuning multimodal tasks, with Gemini’s tuning capabilities supporting text, documents, images, and audio.
Historically, data scientists have needed separate machine learning training and inference pipelines for individual ML tasks such as image classification, text classification, sentiment analysis, document extraction, audio transcription, and visual inspection, each of which could require learning new libraries, frameworks, and model architectures.
Gemini’s new multimodal fine-tuning capabilities allow teams and developers to consolidate various predictive and generative AI tasks across modalities under a single foundation model API. This simplifies machine learning workflows and accelerates the application of AI to many more use cases across an enterprise.
We’ve summarized the top tuning use cases by modality below. Improving attribute extraction was the most common use case among the customers we worked with, because attribute extraction is a key component of function calling (the capability for models to call APIs, functions, and tools) and of downstream agent-based systems, which depend on highly accurate key-value pairs to pass information reliably to enterprise and external systems. This signals that enterprise AI teams are connecting the power of large language models with their own unique data and application systems.
Top use cases customers fine-tuned Gemini for
Text, Documents, and Code
- Attribute extraction: Transform text and chat logs into organized data by fine-tuning a model to identify key attributes and output them in a structured format like JSON.
- Classify long documents into predefined categories: Fine-tune a model to accurately classify lengthy documents into predefined categories, enabling efficient organization and retrieval of information.
- Code review: Use fine-tuning to create a model capable of providing insightful code reviews, identifying potential issues, and suggesting improvements.
- Code generation and translation: Fine-tune a model to generate code in various programming languages or domain-specific languages, automating repetitive coding tasks.
- Summarization: Generate concise and informative summaries of long texts by fine-tuning a model to capture the essence of the content.
- Improve helpfulness from RAG output: Enhance the helpfulness and accuracy of retrieval-augmented generation (RAG) systems by fine-tuning the underlying language model.
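As a concrete sketch of the attribute-extraction pattern above: once a tuned model reliably emits JSON, downstream code can parse its output into key-value pairs for function calling or enterprise systems. The model call itself is omitted here; `raw_output` is a hypothetical model response used for illustration.

```python
import json

def parse_attributes(raw_output: str) -> dict:
    """Parse a model's JSON attribute output, tolerating markdown code fences."""
    text = raw_output.strip()
    # Models sometimes wrap JSON in ```json fences; strip them if present.
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    return json.loads(text)

# Hypothetical tuned-model response for a support-chat log:
raw_output = '```json\n{"customer_id": "C-1042", "issue": "billing", "sentiment": "negative"}\n```'
attributes = parse_attributes(raw_output)
```

Fine-tuning on examples that always emit this exact structure is what makes a thin parser like this dependable in production.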
Image
- Product catalog enhancement: Extract key attributes from images (e.g., brand, color, size) to automatically build and enrich your product catalog.
- Image moderation: Fine-tune a model to detect and flag inappropriate or harmful content in images, ensuring a safer online experience.
- Visual inspection: Train a model to identify specific objects or defects within images, automating quality control or inspection processes.
- Image classification: Improve the accuracy of image classification for specific domains, such as medical imaging or satellite imagery analysis.
- Table content extraction: Extract data from tables within images and convert it into structured formats like spreadsheets or databases.
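As a sketch of the table content extraction use case above: if a tuned model returns a table's contents as JSON rows (an assumed output format, for illustration), converting them to a spreadsheet-friendly CSV is a few lines of standard-library Python.

```python
import csv
import io

# Hypothetical rows a tuned model might extract from a table in an image.
rows = [
    {"sku": "A-100", "price": "19.99", "qty": "4"},
    {"sku": "B-200", "price": "5.49", "qty": "12"},
]

# Serialize the extracted rows to CSV for spreadsheets or database loads.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["sku", "price", "qty"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```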
Audio
- Audio transcription: Generate highly accurate transcripts, even in noisy environments or for complex technical topics such as legal or medical content.
- Audio classification: Categorize music based on genre, mood, or other characteristics.
You can explore other Gemini prompting and tuning use cases in Vertex AI’s prompt gallery and tuning examples documentation.
Real-world customer examples
Within a few weeks or months, teams were able to fine-tune Gemini models, measure performance improvements, and drive business value, leading to increased productivity, improved customer experience, stronger guideline adherence, and cost savings. These use cases spanned industries including biotech, games, healthcare, transportation, finance, and retail. Below, we look at how NextNet and Augmedix have leveraged fine-tuning to drive model improvements and, more importantly, business impact.
“The fine-tuned Gemini Flash model was able to extract information really coherently, even though the specific sentence didn't have all the necessary information, and the context was linguistically complex. We were able to leverage Gemini fine-tuning to improve accuracy by 80% and reduce cost by 90%.” - Derek Park, Head of Data and Science, NextNet
“Gemini fine-tuning allows us to generate higher quality medical notes between doctors and patients faster than prompt-only based approaches.” - Ian Shakil, Founder, Augmedix
The future is yours to fine-tune
When building AI applications, tuning is one of the key levers users have to boost model performance and drive value from their unique data. Tuning is becoming a core tool in an AI developer’s toolkit, alongside prompt engineering, RAG, and function calling. From the feedback of many Gemini tuning users and enterprises, it has become clear that tuning can drive measurable improvements and outcomes across a wide range of use cases and industries. And with Gemini’s new multimodal tuning capabilities, developers can unlock even more AI-powered tasks across different media types and environments.
Happy tuning!
Ready to get started?
- Learn more about Vertex AI customer use cases and stories.
- Learn about Gemini fine-tuning and best practices.
- Head over to the Vertex AI tuning documentation to try tuning on your text, code, document, image, and audio data.
- Dive into our Generative AI repository and explore tuning notebooks and samples.
Big thank you to Erwin Huizenga, Christos Aniftos, Amir Imani, Advait Bopardikar, Skander Hannachi, and Bethany Wang for contributing to this blog.
Opening image created with Imagen 3 running on Vertex AI, using the prompt: A bunch of hands reaching in with wrenches and other tools... to fine-tune a vast, complex futuristic looking machine... made of widgets, gizmos, and pipes... drawn in a flat, cheerful style... using the colors blue red yellow green.