This page provides examples of how to use the Generative AI on Vertex AI evaluation service to evaluate your generative AI models.
Evaluate your models in real time
The Vertex AI rapid evaluation service lets you evaluate your generative AI models in real time. To learn how to use rapid evaluation, see Run a rapid evaluation.
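As a quick orientation, the following is a minimal sketch of the rapid evaluation flow with the preview SDK (`vertexai.preview.evaluation`). The project ID, dataset rows, metric choices, and experiment name are placeholders to replace with your own.

```python
import pandas as pd
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.preview.evaluation import EvalTask

# Placeholder project and location; replace with your own.
vertexai.init(project="my-project", location="us-central1")

# A small dataset: the prompt template below fills in {instruction},
# and "reference" supplies ground truth for the exact_match metric.
eval_dataset = pd.DataFrame(
    {
        "instruction": ["What is the capital of France?"],
        "reference": ["Paris"],
    }
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["exact_match", "fluency"],
    experiment="my-eval-experiment",  # hypothetical experiment name
)

result = eval_task.evaluate(
    model=GenerativeModel("gemini-1.0-pro"),
    prompt_template="{instruction}",
)
print(result.summary_metrics)  # aggregate scores
print(result.metrics_table)    # per-example scores
```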
Evaluate and optimize prompt template design
Use the rapid evaluation SDK to evaluate the effect of prompt engineering. Examine the statistics corresponding to each prompt template to understand how differences in prompting impact evaluation results.
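One way to do this is to run the same `EvalTask` once per candidate template and compare the summary statistics. In the sketch below, the templates, dataset contents, and run names are illustrative, and the `experiment_run_name` argument is an assumption to verify against the current SDK.

```python
import pandas as pd
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.preview.evaluation import EvalTask

vertexai.init(project="my-project", location="us-central1")

eval_dataset = pd.DataFrame(
    {"context": ["The quarterly report shows revenue grew 12% year over year."]}
)

# Two candidate templates for the same summarization task.
prompt_templates = [
    "Summarize the following: {context}",
    "Summarize the following in one sentence for an executive audience: {context}",
]

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["summarization_quality", "fluency"],
    experiment="prompt-template-comparison",
)

model = GenerativeModel("gemini-1.0-pro")

# Run one evaluation per template, then compare summary statistics.
for i, template in enumerate(prompt_templates):
    result = eval_task.evaluate(
        model=model,
        prompt_template=template,
        experiment_run_name=f"template-{i}",  # assumed parameter; names the run
    )
    print(template, result.summary_metrics)
```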
Evaluate and select LLMs using benchmark metrics
Use the rapid evaluation SDK to score both Gemini Pro and Text Bison models on a benchmark dataset and a task.
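A sketch of this comparison is shown below. It assumes that `evaluate()` accepts either a `GenerativeModel` or a plain callable that maps a prompt string to a response string, which is how a PaLM-family model such as Text Bison can participate in the same evaluation; the dataset and metrics are placeholders.

```python
import pandas as pd
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.language_models import TextGenerationModel
from vertexai.preview.evaluation import EvalTask

vertexai.init(project="my-project", location="us-central1")

# Benchmark dataset with ground-truth references (placeholder rows).
eval_dataset = pd.DataFrame(
    {
        "instruction": ["Translate to French: Hello, world."],
        "reference": ["Bonjour, le monde."],
    }
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["exact_match", "bleu", "rouge_l_sum"],
    experiment="model-selection",
)

gemini = GenerativeModel("gemini-1.0-pro")
text_bison = TextGenerationModel.from_pretrained("text-bison@002")

# Wrap the non-Gemini model in a callable that maps prompt -> response.
def text_bison_fn(prompt: str) -> str:
    return text_bison.predict(prompt).text

for name, candidate in [("gemini-pro", gemini), ("text-bison", text_bison_fn)]:
    result = eval_task.evaluate(model=candidate, prompt_template="{instruction}")
    print(name, result.summary_metrics)
```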
Evaluate and select model-generation settings
Use the rapid evaluation SDK to adjust the temperature of Gemini Pro on a summarization task and to evaluate quality, fluency, safety, and verbosity.
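A temperature sweep might look like the sketch below. The `summarization_verbosity` metric name reflects the preview SDK's metric list at the time of writing; confirm it, along with the other metric names, against the current documentation.

```python
import pandas as pd
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel
from vertexai.preview.evaluation import EvalTask

vertexai.init(project="my-project", location="us-central1")

eval_dataset = pd.DataFrame({"context": ["<article text to summarize>"]})

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        "summarization_quality",
        "fluency",
        "safety",
        "summarization_verbosity",  # assumed metric name; verify
    ],
    experiment="temperature-sweep",
)

# Evaluate the same task at several temperature settings.
for temperature in (0.0, 0.4, 0.8):
    model = GenerativeModel(
        "gemini-1.0-pro",
        generation_config=GenerationConfig(temperature=temperature),
    )
    result = eval_task.evaluate(
        model=model,
        prompt_template="Summarize the following article: {context}",
    )
    print(f"temperature={temperature}", result.summary_metrics)
```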
Define your metrics
Use the rapid evaluation SDK to evaluate multiple prompt templates with your own custom-defined metrics.
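The preview SDK exposes a `CustomMetric` class for this. The sketch below assumes its metric function receives one dataset row (including the generated `response`) as a dict and returns a dict keyed by the metric name; the `word_count` metric itself is a made-up example.

```python
import pandas as pd
import vertexai
from vertexai.generative_models import GenerativeModel
from vertexai.preview.evaluation import CustomMetric, EvalTask

vertexai.init(project="my-project", location="us-central1")

# A custom metric: receives one dataset row (including the model's
# "response") and returns a dict keyed by the metric name.
def word_count(row: dict) -> dict:
    return {"word_count": len(row["response"].split())}

word_count_metric = CustomMetric(name="word_count", metric_function=word_count)

eval_dataset = pd.DataFrame({"context": ["<article text to summarize>"]})

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[word_count_metric, "fluency"],  # mix custom and built-in metrics
    experiment="custom-metrics",
)

# Score each prompt template against the same custom metric.
for template in (
    "Summarize: {context}",
    "Summarize in 20 words or fewer: {context}",
):
    result = eval_task.evaluate(
        model=GenerativeModel("gemini-1.0-pro"),
        prompt_template=template,
    )
    print(template, result.summary_metrics)
```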
Evaluate tool use and function calling
Use the rapid evaluation SDK to define an API function and a tool for the Gemini model. You can also use the SDK to evaluate tool use and function-calling quality for Gemini.
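Tool-use quality can be scored in bring-your-own-response mode, where the dataset already contains the model's tool call and the expected call as JSON strings. The sketch below is a rough illustration: the JSON shape of the `response` and `reference` values, and the hypothetical `get_weather` function, are assumptions to check against the SDK's documented schema.

```python
import json
import pandas as pd
import vertexai
from vertexai.preview.evaluation import EvalTask

vertexai.init(project="my-project", location="us-central1")

# Predicted and expected tool calls for a hypothetical get_weather
# function. The exact JSON schema expected by the tool-use metrics is
# illustrative here; check the SDK documentation.
predicted = {"content": "", "tool_calls": [{"name": "get_weather", "arguments": {"location": "Paris"}}]}
expected = {"content": "", "tool_calls": [{"name": "get_weather", "arguments": {"location": "Paris"}}]}

eval_dataset = pd.DataFrame(
    {
        "response": [json.dumps(predicted)],
        "reference": [json.dumps(expected)],
    }
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        "tool_call_valid",
        "tool_name_match",
        "tool_parameter_key_match",
        "tool_parameter_kv_match",
    ],
    experiment="tool-use-eval",
)

# No model argument: the responses are evaluated as provided.
result = eval_task.evaluate()
print(result.summary_metrics)
```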
Evaluate generated answers from RAG for question answering
Use the rapid evaluation SDK to evaluate answers generated by Retrieval-Augmented Generation (RAG) for a question-answering task.
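Because the answers come from your RAG pipeline, this is also a bring-your-own-response evaluation: each row carries the question, the retrieved context, and the generated answer. The column names below follow the preview SDK's conventions, but the required input columns vary per metric, so verify them against the metric documentation.

```python
import pandas as pd
import vertexai
from vertexai.preview.evaluation import EvalTask

vertexai.init(project="my-project", location="us-central1")

# Each row pairs a question and its retrieved context with the answer
# that the RAG pipeline produced (placeholder values shown).
eval_dataset = pd.DataFrame(
    {
        "instruction": ["When was the company founded?"],
        "context": ["The company was founded in 2004 in Mountain View."],
        "response": ["The company was founded in 2004."],
    }
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        "question_answering_quality",
        "question_answering_relevance",
        "groundedness",
    ],
    experiment="rag-qa-eval",
)

# Responses come from the RAG pipeline, so no model is passed here.
result = eval_task.evaluate()
print(result.summary_metrics)
```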
Evaluate an LLM in Vertex AI Model Registry against a third-party model
Use AutoSxS to compare responses from two models and determine a winner. You can either provide the responses yourself or generate them by using Vertex AI batch prediction.
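AutoSxS runs as a Vertex AI Pipelines job. The sketch below submits the pipeline with pre-generated responses; the template path and parameter names follow the AutoSxS documentation at the time of writing, and the bucket paths, column names, and task value are placeholders to confirm against the current template.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

parameters = {
    "evaluation_dataset": "gs://my-bucket/eval_dataset.jsonl",
    "id_columns": ["question"],
    "task": "question_answering",
    "autorater_prompt_parameters": {
        "inference_instruction": {"column": "question"},
        "inference_context": {"column": "context"},
    },
    # Pre-generated responses; alternatively, set model_a/model_b to
    # have AutoSxS generate responses with Vertex AI batch prediction.
    "response_column_a": "response_a",
    "response_column_b": "response_b",
}

job = aiplatform.PipelineJob(
    display_name="autosxs-eval",
    pipeline_root="gs://my-bucket/pipeline_root",
    template_path=(
        "https://us-kfp.pkg.dev/ml-pipeline/google-cloud-registry/"
        "autosxs-template/default"
    ),
    parameter_values=parameters,
)
job.run()
```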
Check autorater alignment against a human-preference dataset
Use AutoSxS to check how well autorater ratings align with a set of human ratings you provide for a particular task. Determine whether AutoSxS is sufficient for your use case or whether it needs further customization.
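This is the same pipeline call as the previous sketch, with one addition: a column of human preferences in the evaluation dataset, passed through the `human_preference_column` parameter so that the pipeline reports alignment metrics. As before, the parameter and column names are assumptions to verify.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

parameters = {
    "evaluation_dataset": "gs://my-bucket/eval_with_human_ratings.jsonl",
    "id_columns": ["question"],
    "task": "question_answering",
    "autorater_prompt_parameters": {
        "inference_instruction": {"column": "question"},
        "inference_context": {"column": "context"},
    },
    "response_column_a": "response_a",
    "response_column_b": "response_b",
    # Column holding the human raters' preferred model per example;
    # supplying it makes the pipeline report autorater-human alignment.
    "human_preference_column": "actual_preference",
}

job = aiplatform.PipelineJob(
    display_name="autosxs-human-alignment",
    pipeline_root="gs://my-bucket/pipeline_root",
    template_path=(
        "https://us-kfp.pkg.dev/ml-pipeline/google-cloud-registry/"
        "autosxs-template/default"
    ),
    parameter_values=parameters,
)
job.run()
```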
Evaluate LangChain chains
Use the rapid evaluation SDK to evaluate your LangChain chains. Prepare your data, set up your LangChain chain, and run your evaluation.
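One approach is to wrap the chain in a plain function, since the SDK can evaluate any callable that maps a prompt string to a response string. The sketch below assumes the `langchain-google-vertexai` and `langchain-core` packages; the chain itself is a deliberately simple example.

```python
import pandas as pd
import vertexai
from vertexai.preview.evaluation import EvalTask

# LangChain imports; assumes the langchain-google-vertexai package.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_vertexai import ChatVertexAI

vertexai.init(project="my-project", location="us-central1")

# A simple chain: prompt -> Gemini -> string output.
chain = (
    ChatPromptTemplate.from_template("Summarize the following: {text}")
    | ChatVertexAI(model_name="gemini-1.0-pro")
    | StrOutputParser()
)

# Wrap the chain in a callable that maps prompt -> response.
def chain_fn(prompt: str) -> str:
    return chain.invoke({"text": prompt})

eval_dataset = pd.DataFrame({"prompt": ["<article text to summarize>"]})

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=["summarization_quality", "fluency"],
    experiment="langchain-eval",
)

result = eval_task.evaluate(model=chain_fn, prompt_template="{prompt}")
print(result.summary_metrics)
```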
What's next
- Learn about generative AI evaluation.
- Learn about online evaluation with rapid evaluation.
- Learn about model-based pairwise evaluation with the AutoSxS pipeline.
- Learn about the computation-based evaluation pipeline.
- Learn how to tune a foundation model.