Vertex AI provides model evaluation metrics for both predictive AI and generative AI models. This page provides an overview of the Gen AI evaluation service, which you can use to evaluate generative models and applications against your own criteria. To evaluate a predictive AI model, see Model evaluation in Vertex AI.

This document covers the following topics:

The following diagram summarizes the overall workflow for evaluating a generative model:

While public leaderboards offer general insights, the Gen AI evaluation service helps you understand how a model performs on your specific tasks and data. Evaluation is a critical step throughout the generative AI development lifecycle, including model selection, prompt engineering, and model customization. The service is integrated within Vertex AI to help you launch and reuse evaluations as needed.

The Gen AI evaluation service can help you with the following tasks:

To evaluate a generative AI model or application using the Gen AI evaluation service, follow these steps:

The following Vertex AI SDK for Python notebooks demonstrate various generative AI evaluation use cases.

The Gen AI evaluation service supports Google's foundation models, third-party models, and open models. You can provide pre-generated predictions directly, or automatically generate candidate model responses. The following table helps you choose the right integration method for your model.

This section describes the language support for model-based and translation metrics.

For Gemini model-based metrics, the Gen AI evaluation service supports all input languages that are supported by Gemini 2.0 Flash. However, the quality of evaluations for non-English inputs might not be as high as the quality for English inputs.

For translation tasks, you can use the following model-based metrics, which support the languages listed in this section:

MetricX
Supported languages for MetricX:

COMET
Supported languages for COMET:
Gen AI evaluation service capabilities
Evaluation process
<abbr data-title="A reusable object in the Vertex AI evaluation service that encapsulates your evaluation logic, including models, metrics, and dataset.">EvalTask</abbr> to reuse your evaluation logic through Vertex AI.
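As a rough illustration of how an EvalTask bundles a dataset and metrics, the following sketch builds a small dataset of pre-generated responses and wraps it in an EvalTask. It assumes the `vertexai.evaluation` module of the Vertex AI SDK for Python, the column names `prompt`, `response`, and `reference`, and the metric names `exact_match` and `rouge_l_sum`. The project ID, location, and experiment name are placeholders, and the helper names are hypothetical. Verify all of these against the SDK version you have installed.

```python
from typing import Dict, List


def build_eval_dataset(
    prompts: List[str], responses: List[str], references: List[str]
) -> Dict[str, List[str]]:
    """Build a column-oriented dataset of pre-generated responses."""
    if not (len(prompts) == len(responses) == len(references)):
        raise ValueError("prompts, responses, and references must have equal length")
    return {"prompt": prompts, "response": responses, "reference": references}


def run_evaluation(dataset: Dict[str, List[str]], project: str, location: str):
    """Run the evaluation. Needs Google Cloud credentials and the Vertex AI SDK."""
    # Imported lazily so the dataset helper above works without the SDK installed.
    import pandas as pd
    import vertexai
    from vertexai.evaluation import EvalTask  # assumed module path

    vertexai.init(project=project, location=location)
    eval_task = EvalTask(
        dataset=pd.DataFrame(dataset),
        metrics=["exact_match", "rouge_l_sum"],  # assumed built-in metric names
        experiment="my-eval-experiment",  # placeholder experiment name
    )
    # No model argument: the "response" column already holds the predictions.
    result = eval_task.evaluate()
    return result.summary_metrics, result.metrics_table


dataset = build_eval_dataset(
    prompts=["What is the capital of France?"],
    responses=["Paris"],
    references=["Paris"],
)
```

Because the task object holds the dataset and metrics, the same `eval_task` could be evaluated again later, for example after changing a prompt template, without redefining the evaluation logic.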
Notebooks for evaluation use cases
Evaluate models
Evaluate prompt templates
Evaluate Gen AI applications
Evaluate Gen AI agents
Metric customization
Other topics
Supported models
| Model Source | Description | Use Case |
| --- | --- | --- |
| Google's foundation models | Directly use Google's models like Gemini 2.0 Flash for response generation. | When you want to leverage Google's latest models without managing infrastructure. |
| Vertex AI Model Registry | Use any model (custom-trained, imported) that is deployed as an endpoint in the Vertex AI Model Registry. | For evaluating your fine-tuned models or other models managed within Vertex AI. |
| Third-party and open models (via SDK) | Integrate with external model APIs using their respective SDKs. | When your model is hosted outside of Google Cloud and provides an SDK for access. |
| Wrapped model endpoints | Create a wrapper around external model endpoints using the Vertex AI SDK. | For models that are accessible via an API endpoint but don't have a direct SDK integration. |
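The wrapped-endpoint option can be pictured as a thin adapter that turns any endpoint client into a function from prompt text to response text. The sketch below is a minimal, offline illustration of that idea: `wrap_endpoint`, `fake_endpoint`, and the `text` response field are all hypothetical names, and the fake endpoint stands in for a real HTTP or SDK call.

```python
from typing import Callable


def wrap_endpoint(call_endpoint: Callable[[str], dict]) -> Callable[[str], str]:
    """Adapt a client that returns a JSON-like dict into a prompt -> text function."""

    def model_fn(prompt: str) -> str:
        payload = call_endpoint(prompt)
        # "text" is a hypothetical field name; use whatever your endpoint returns.
        return str(payload.get("text", "")).strip()

    return model_fn


def fake_endpoint(prompt: str) -> dict:
    """Stand-in for a real HTTP or SDK call so the sketch runs offline."""
    return {"text": f"  echo: {prompt} "}


model_fn = wrap_endpoint(fake_endpoint)
print(model_fn("Hello"))  # prints "echo: Hello"
```

With the Vertex AI SDK installed, a function shaped like `model_fn` could then be supplied as the model when running an evaluation, assuming your SDK version accepts a callable that maps a prompt string to a response string.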
Supported languages
For model-based metrics
For translation metrics
| Metric | Description |
| --- | --- |
| MetricX | A family of model-based metrics for evaluating text generation tasks, including translation, by comparing model output to a reference. |
| COMET | A neural framework for training multilingual machine translation evaluation models, which has been shown to correlate highly with human judgments of translation quality. |
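To make the translation metrics more concrete, the following sketch converts translation records into the column layout that a translation evaluation might expect and then requests COMET and MetricX scores. The column names `source`, `response`, and `reference`, the `vertexai.preview.evaluation` module path, and the `pointwise_metric.Comet` and `pointwise_metric.MetricX` classes are assumptions based on the preview SDK and should be checked against your installed version. The helper names and sample sentences are illustrative only.

```python
from typing import Dict, List, Tuple

# Each record: (source text, candidate translation, reference translation).
TranslationRecord = Tuple[str, str, str]


def to_translation_columns(records: List[TranslationRecord]) -> Dict[str, List[str]]:
    """Convert row-style records into the column layout used for evaluation."""
    columns: Dict[str, List[str]] = {"source": [], "response": [], "reference": []}
    for source, response, reference in records:
        columns["source"].append(source)
        columns["response"].append(response)
        columns["reference"].append(reference)
    return columns


def run_translation_eval(columns: Dict[str, List[str]], project: str, location: str):
    """Score translations. Needs Google Cloud credentials and the Vertex AI SDK."""
    # Lazy imports: the module paths and class names below are assumptions.
    import pandas as pd
    import vertexai
    from vertexai.preview.evaluation import EvalTask
    from vertexai.preview.evaluation.metrics import pointwise_metric

    vertexai.init(project=project, location=location)
    eval_task = EvalTask(
        dataset=pd.DataFrame(columns),
        metrics=[pointwise_metric.Comet(), pointwise_metric.MetricX()],
    )
    return eval_task.evaluate().summary_metrics


columns = to_translation_columns(
    [
        ("Bonjour le monde", "Hello world", "Hello, world"),
        ("Merci beaucoup", "Thanks a lot", "Thank you very much"),
    ]
)
```

Both metrics compare the candidate translation with a reference, so each record carries all three fields.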
What's next
Gen AI evaluation service overview
Last updated 2025-08-18 UTC.