We provide several examples of how you can use the rapid evaluation SDK to perform evaluations on your generative AI models.
Evaluate your models in real time
The Vertex AI rapid evaluation service lets you evaluate your generative AI models in real time. To learn how to use rapid evaluation, see Run a rapid evaluation.
For an end-to-end example, see the colab notebook for the Vertex AI SDK for Python with rapid evaluation.
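As a minimal sketch of the pattern, a rapid evaluation run pairs a tabular dataset with a list of metrics. This assumes the preview `EvalTask` API from the Vertex AI SDK for Python; the project ID, dataset contents, and metric names here are illustrative:

```python
def build_eval_dataset():
    """Assemble a small evaluation dataset as a list of records.

    Each record supplies the columns a pointwise rapid evaluation
    run expects: a prompt and a model response.
    """
    return [
        {"prompt": "Summarize: The quick brown fox jumps over the lazy dog.",
         "response": "A fox jumps over a dog."},
        {"prompt": "Summarize: Rain is expected across the region tomorrow.",
         "response": "Rain is forecast tomorrow."},
    ]


def run_rapid_eval(dataset):
    """Sketch of a rapid evaluation run.

    Requires google-cloud-aiplatform and GCP credentials, so it is
    defined but not called here. `EvalTask` is the preview rapid
    evaluation entry point in the Vertex AI SDK for Python.
    """
    import pandas as pd
    import vertexai
    from vertexai.preview.evaluation import EvalTask

    vertexai.init(project="my-project", location="us-central1")  # hypothetical project
    eval_task = EvalTask(
        dataset=pd.DataFrame(dataset),
        metrics=["fluency", "coherence", "safety"],
    )
    return eval_task.evaluate().summary_metrics
```

The returned `summary_metrics` aggregates each metric over the dataset, while per-row scores are available on the result's metrics table.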
Evaluate and optimize prompt template design
Use the rapid evaluation SDK to evaluate the effect of prompt engineering. Examine the statistics corresponding to each prompt template to understand how differences in prompting impact evaluation results.
For an end-to-end example, see the notebook Evaluate and Optimize Prompt Template Design for Better Results.
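One way to compare templates is to run the same `EvalTask` once per template and collect the summary metrics side by side. This is a sketch under the same preview-API assumption; the templates and metric names are illustrative:

```python
# Two candidate prompt templates; {context} is filled from each dataset row.
PROMPT_TEMPLATES = [
    "Summarize the following text: {context}",
    "Provide a one-sentence summary of the following text: {context}",
]


def render_prompt(template, row):
    """Fill a template's placeholders from a dataset row."""
    return template.format(**row)


def evaluate_templates(dataset):
    """Cloud-only sketch: evaluate each template against the same
    dataset and metrics. Requires google-cloud-aiplatform and GCP
    credentials, so it is defined but not called here."""
    import pandas as pd
    from vertexai.preview.evaluation import EvalTask
    from vertexai.preview.generative_models import GenerativeModel

    eval_task = EvalTask(dataset=pd.DataFrame(dataset),
                         metrics=["fluency", "coherence"])
    model = GenerativeModel("gemini-pro")
    return {
        t: eval_task.evaluate(model=model, prompt_template=t).summary_metrics
        for t in PROMPT_TEMPLATES
    }
```

Keeping the dataset and metrics fixed across templates means any difference in the summary metrics can be attributed to the prompt wording alone.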
Evaluate and select LLM models using benchmark metrics
Use the rapid evaluation SDK to score the Gemini Pro and Text Bison models on the same benchmark dataset and task.
For an end-to-end example, see the notebook Score and Select LLM Models.
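Model selection reduces to running one evaluation per candidate model and comparing the summaries. A sketch under the same preview-API assumption (model and metric names are illustrative):

```python
def best_model(summaries, metric):
    """Return the model name with the highest score on `metric`.

    `summaries` maps a model name to its summary-metrics dict, such as
    the `summary_metrics` returned by one EvalTask run per model.
    """
    return max(summaries, key=lambda name: summaries[name][metric])


def score_candidate_models(dataset, model_names):
    """Cloud-only sketch: evaluate each candidate model on the same
    benchmark dataset. Requires GCP credentials, so it is defined but
    not called here."""
    import pandas as pd
    from vertexai.preview.evaluation import EvalTask
    from vertexai.preview.generative_models import GenerativeModel

    eval_task = EvalTask(dataset=pd.DataFrame(dataset),
                         metrics=["fluency", "coherence"])
    return {
        name: eval_task.evaluate(model=GenerativeModel(name)).summary_metrics
        for name in model_names
    }
```

With the per-model summaries in hand, `best_model` picks the winner on whichever benchmark metric matters most for the task.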
Evaluate and select model-generation settings
Use the rapid evaluation SDK to adjust the temperature of Gemini Pro on a summarization task and to evaluate quality, fluency, safety, and verbosity.
For an end-to-end example, see the notebook Evaluate and Select Model Generation Settings.
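A temperature sweep can be sketched as one evaluation run per candidate setting, holding everything else fixed. This assumes the preview `EvalTask` and `GenerationConfig` APIs; the temperature values and metric names are illustrative:

```python
TEMPERATURES = [0.2, 0.7, 1.0]


def generation_configs(temperatures):
    """Build one generation config per candidate temperature; other
    settings are held fixed so temperature is the only variable."""
    return [{"temperature": t, "max_output_tokens": 256} for t in temperatures]


def sweep_temperatures(dataset):
    """Cloud-only sketch: evaluate Gemini Pro once per temperature on
    a summarization dataset. Requires GCP credentials, so it is
    defined but not called here."""
    import pandas as pd
    from vertexai.preview.evaluation import EvalTask
    from vertexai.preview.generative_models import GenerationConfig, GenerativeModel

    eval_task = EvalTask(
        dataset=pd.DataFrame(dataset),
        metrics=["summarization_quality", "fluency", "safety", "verbosity"],
    )
    results = {}
    for cfg in generation_configs(TEMPERATURES):
        model = GenerativeModel("gemini-pro",
                                generation_config=GenerationConfig(**cfg))
        results[cfg["temperature"]] = eval_task.evaluate(model=model).summary_metrics
    return results
```

Comparing the summaries across temperatures shows, for example, whether a higher temperature trades summarization quality for verbosity.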
Define your metrics
Use the rapid evaluation SDK to evaluate multiple prompt templates with your custom-defined metrics.
For an end-to-end example, see the notebook Define Your Own Metrics.
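A custom metric is a plain Python function that receives one dataset row as a dict and returns a dict keyed by the metric's name. Below, `brevity` is a hypothetical metric invented for illustration; wrapping it in `CustomMetric` assumes the preview evaluation API:

```python
def brevity_metric(instance):
    """Hypothetical custom metric: reward concise responses.

    Scores 1.0 for responses of 20 words or fewer, decaying
    proportionally for longer responses.
    """
    words = len(instance["response"].split())
    return {"brevity": 1.0 if words <= 20 else 20.0 / words}


def evaluate_with_custom_metric(dataset):
    """Cloud-only sketch: wrap the function in CustomMetric and pass
    it alongside a built-in metric. Requires GCP credentials, so it is
    defined but not called here."""
    import pandas as pd
    from vertexai.preview.evaluation import CustomMetric, EvalTask

    brevity = CustomMetric(name="brevity", metric_function=brevity_metric)
    eval_task = EvalTask(dataset=pd.DataFrame(dataset),
                         metrics=["fluency", brevity])
    return eval_task.evaluate().summary_metrics
```

Because the metric function runs locally on each row, it can encode any scoring logic your task needs, deterministic or otherwise.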
Evaluate tool use
Use the rapid evaluation SDK to define an API function and a tool for the Gemini model. You can also use the SDK to evaluate tool use and function-calling quality for Gemini.
For an end-to-end example, see the notebook Evaluate Generative Model Tool Use and Function Calling.
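Declaring a tool starts from an OpenAPI-style schema for the function the model may call. The `get_weather` function below is hypothetical, and the tool-use metric names are illustrative; the `FunctionDeclaration`/`Tool` usage assumes the preview generative models API:

```python
# OpenAPI-style schema for a hypothetical weather-lookup function
# that the Gemini model may choose to call.
GET_WEATHER = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name."},
        },
        "required": ["city"],
    },
}


def evaluate_tool_use(dataset):
    """Cloud-only sketch: declare the function as a Tool and score
    predicted tool calls against references. Requires GCP credentials,
    so it is defined but not called here."""
    import pandas as pd
    from vertexai.preview.evaluation import EvalTask
    from vertexai.preview.generative_models import FunctionDeclaration, Tool

    tool = Tool(function_declarations=[FunctionDeclaration(**GET_WEATHER)])
    eval_task = EvalTask(
        dataset=pd.DataFrame(dataset),
        metrics=["tool_call_valid", "tool_name_match", "tool_parameter_kv_match"],
    )
    return tool, eval_task
```

The evaluation dataset for a tool-use run pairs each predicted tool call with a reference call, so the metrics can check call validity, function-name agreement, and parameter agreement.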
Evaluate generated answers from RAG for question answering
Use the rapid evaluation SDK to evaluate answers generated by Retrieval-Augmented Generation (RAG) for a question-answering task.
For an end-to-end example, see the notebook Evaluate Generated Answers from RAG for Question Answering.
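For a RAG evaluation, each dataset row pairs a question and its retrieved context with the answer the pipeline generated, so a model-based metric can judge the answer against the retrieved evidence. A sketch under the same preview-API assumption (the toy context and the metric name are illustrative):

```python
def build_rag_eval_dataset():
    """Assemble a toy RAG evaluation dataset: the prompt embeds the
    retrieved context and the question; the response is the answer
    the RAG pipeline generated."""
    context = "Mount Everest is 8,849 meters tall."
    return [
        {
            "prompt": ("Answer using only this context.\n"
                       f"Context: {context}\n"
                       "Question: How tall is Mount Everest?"),
            "response": "Mount Everest is 8,849 meters tall.",
        },
    ]


def evaluate_rag_answers(dataset):
    """Cloud-only sketch: score the generated answers with a
    question-answering quality metric. Requires GCP credentials, so
    it is defined but not called here."""
    import pandas as pd
    from vertexai.preview.evaluation import EvalTask

    eval_task = EvalTask(dataset=pd.DataFrame(dataset),
                         metrics=["question_answering_quality"])
    return eval_task.evaluate().summary_metrics
```

Because the retrieved context travels with each row, the scores reflect how well the generated answer is grounded in what was actually retrieved.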
What's next
- Learn about generative AI evaluation.
- Learn about online evaluation with rapid evaluation.
- Learn about model-based pairwise evaluation with AutoSxS pipeline.
- Learn about the computation-based evaluation pipeline.
- Learn how to tune a foundation model.