Evaluates instances based on a given metric.
Endpoint
POST https://{service-endpoint}/v1beta1/{location}:evaluateInstances
Where {service-endpoint} is one of the supported service endpoints.
Path parameters
location
string
Required. The resource name of the Location to evaluate the instances. Format: projects/{project}/locations/{location}
Request body
The request body contains data with the following structure:
metric_inputs
Union type
metric_inputs can be only one of the following:
Auto metric instances.
Instances and metric spec for exact match metric.
Instances and metric spec for bleu metric.
Instances and metric spec for rouge metric.
LLM-based metric instance. General text generation metrics, applicable to other categories.
Input for fluency metric.
Input for coherence metric.
Input for safety metric.
Input for groundedness metric.
Input for fulfillment metric.
Input for summarization quality metric.
Input for pairwise summarization quality metric.
Input for summarization helpfulness metric.
Input for summarization verbosity metric.
Input for question answering quality metric.
Input for pairwise question answering quality metric.
Input for question answering relevance metric.
Input for question answering helpfulness metric.
Input for question answering correctness metric.
Input for pointwise metric.
Input for pairwise metric.
Tool call metric instances.
Input for tool call valid metric.
Input for tool name match metric.
Input for tool parameter key match metric.
Input for tool parameter key value match metric.
Translation metrics.
Input for Comet metric.
Input for trajectory exact match metric.
Input for trajectory in order match metric.
Input for trajectory match any order metric.
Input for trajectory precision metric.
Input for trajectory recall metric.
Input for trajectory single tool use metric.
Example request
Python
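A minimal sketch of how a request for the exact match metric could be assembled. It only builds the URL and JSON body and sends nothing. The `exactMatchInput` and `instances` keys are assumptions inferred from the type names on this page (only `metricSpec` is visible in the JSON shown here), and `build_evaluate_request` together with the endpoint, project, and location values is hypothetical.

```python
import json


def build_evaluate_request(service_endpoint, project, location, prediction, reference):
    """Assemble the URL and JSON body for an exact match evaluation (nothing is sent)."""
    # Path parameter format from this page: projects/{project}/locations/{location}
    location_name = f"projects/{project}/locations/{location}"
    url = f"https://{service_endpoint}/v1beta1/{location_name}:evaluateInstances"
    body = {
        # Assumed key name for the exact match member of the metric_inputs union.
        "exactMatchInput": {
            "metricSpec": {},  # ExactMatchSpec has no fields
            # Assumed key name for the repeated ExactMatchInstance list.
            "instances": [{"prediction": prediction, "reference": reference}],
        }
    }
    return url, json.dumps(body)


# Placeholder values, not real endpoints or projects.
url, payload = build_evaluate_request(
    "example-endpoint.invalid", "my-project", "us-central1", "Paris", "Paris"
)
print(url)
print(payload)
```

Because metric_inputs is a union, a body built this way would carry exactly one metric input per request.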
Response body
Response message for EvaluationService.EvaluateInstances.
If successful, the response body contains data with the following structure:
evaluation_results
Union type
evaluation_results can be only one of the following:
Auto metric evaluation results.
Results for exact match metric.
Results for bleu metric.
Results for rouge metric.
LLM-based metric evaluation result. General text generation metrics, applicable to other categories.
Result for fluency metric.
Result for coherence metric.
Result for safety metric.
Result for groundedness metric.
Result for fulfillment metric.
Summarization only metrics.
Result for summarization quality metric.
Result for pairwise summarization quality metric.
Result for summarization helpfulness metric.
Result for summarization verbosity metric.
Question answering only metrics.
Result for question answering quality metric.
Result for pairwise question answering quality metric.
Result for question answering relevance metric.
Result for question answering helpfulness metric.
Result for question answering correctness metric.
Generic metrics.
Result for pointwise metric.
Result for pairwise metric.
Tool call metrics.
Results for tool call valid metric.
Results for tool name match metric.
Results for tool parameter key match metric.
Results for tool parameter key value match metric.
Translation metrics.
Result for Comet metric.
Result for trajectory exact match metric.
Result for trajectory in order match metric.
Result for trajectory any order match metric.
Result for trajectory precision metric.
Results for trajectory recall metric.
Results for trajectory single tool use metric.
JSON representation |
---|
{ // evaluation_results "exactMatchResults": { object ( |
ExactMatchInput
Input for exact match metric.
Required. Spec for exact match metric.
Required. Repeated exact match instances.
JSON representation |
---|
{ "metricSpec": { object ( |
ExactMatchSpec
This type has no fields.
Spec for exact match metric - returns 1 if prediction and reference match exactly, otherwise 0.
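The scoring rule described above can be illustrated locally with a short sketch. It assumes plain string equality with no normalization; the service may treat case or whitespace differently, and `exact_match_score` is a hypothetical helper name.

```python
def exact_match_score(prediction: str, reference: str) -> float:
    """Return 1.0 when the two strings are identical, otherwise 0.0."""
    # Assumption: plain equality, no case folding or whitespace trimming.
    return 1.0 if prediction == reference else 0.0


print(exact_match_score("Paris", "Paris"))
print(exact_match_score("paris", "Paris"))
```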
ExactMatchInstance
Spec for exact match instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Required. Ground truth used to compare against the prediction.
JSON representation |
---|
{ "prediction": string, "reference": string } |
BleuInput
Input for bleu metric.
Required. Spec for bleu score metric.
Required. Repeated bleu instances.
JSON representation |
---|
{ "metricSpec": { object ( |
BleuSpec
Spec for bleu score metric - calculates the precision of n-grams in the prediction as compared to reference - returns a score ranging between 0 and 1.
useEffectiveOrder
boolean
Optional. Whether to use effective order to compute bleu score.
JSON representation |
---|
{ "useEffectiveOrder": boolean } |
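To make "precision of n-grams" concrete, here is a simplified sketch of clipped n-gram precision for a single order `n`. It is not the full bleu computation: the brevity penalty and the combination of several n-gram orders are omitted, whitespace tokenization is an assumption, and `ngram_precision` is a hypothetical helper.

```python
from collections import Counter


def ngram_precision(prediction: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram precision of prediction against reference (whitespace tokens)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    pred_ngrams = Counter(tuple(pred_tokens[i:i + n]) for i in range(len(pred_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    total = sum(pred_ngrams.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per n-gram, which clips
    # repeated predicted n-grams to how often they occur in the reference.
    matched = sum((pred_ngrams & ref_ngrams).values())
    return matched / total


print(ngram_precision("the cat sat", "the cat sat on the mat", n=1))
print(ngram_precision("a b c", "a b d", n=2))
```

The denominator is the number of predicted n-grams, which is why a short prediction made only of reference words still scores high on precision alone.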
BleuInstance
Spec for bleu instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Required. Ground truth used to compare against the prediction.
JSON representation |
---|
{ "prediction": string, "reference": string } |
RougeInput
Input for rouge metric.
Required. Spec for rouge score metric.
Required. Repeated rouge instances.
JSON representation |
---|
{ "metricSpec": { object ( |
RougeSpec
Spec for rouge score metric - calculates the recall of n-grams in prediction as compared to reference - returns a score ranging between 0 and 1.
rougeType
string
Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
useStemmer
boolean
Optional. Whether to use stemmer to compute rouge score.
splitSummaries
boolean
Optional. Whether to split summaries while using rougeLsum.
JSON representation |
---|
{ "rougeType": string, "useStemmer": boolean, "splitSummaries": boolean } |
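As a counterpart to n-gram precision, the following sketch shows n-gram recall, the idea behind the rougen types. It assumes whitespace tokenization and does not cover rougeL, rougeLsum, stemming, or summary splitting; `rouge_n_recall` is a hypothetical helper.

```python
from collections import Counter


def rouge_n_recall(prediction: str, reference: str, n: int = 1) -> float:
    """Fraction of reference n-grams that also occur in the prediction."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    pred_ngrams = Counter(tuple(pred_tokens[i:i + n]) for i in range(len(pred_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    total = sum(ref_ngrams.values())
    if total == 0:
        return 0.0
    # Overlap is counted per n-gram with the smaller of the two counts.
    overlap = sum((pred_ngrams & ref_ngrams).values())
    return overlap / total


print(rouge_n_recall("the cat sat", "the cat sat on the mat", n=1))
```

Here the denominator is the number of reference n-grams, so a prediction that omits part of the reference is penalized.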
RougeInstance
Spec for rouge instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Required. Ground truth used to compare against the prediction.
JSON representation |
---|
{ "prediction": string, "reference": string } |
FluencyInput
Input for fluency metric.
Required. Spec for fluency score metric.
Required. Fluency instance.
JSON representation |
---|
{ "metricSpec": { object ( |
FluencySpec
Spec for fluency score metric.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "version": integer } |
FluencyInstance
Spec for fluency instance.
prediction
string
Required. Output of the evaluated model.
JSON representation |
---|
{ "prediction": string } |
CoherenceInput
Input for coherence metric.
Required. Spec for coherence score metric.
Required. Coherence instance.
JSON representation |
---|
{ "metricSpec": { object ( |
CoherenceSpec
Spec for coherence score metric.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "version": integer } |
CoherenceInstance
Spec for coherence instance.
prediction
string
Required. Output of the evaluated model.
JSON representation |
---|
{ "prediction": string } |
SafetyInput
Input for safety metric.
Required. Spec for safety metric.
Required. Safety instance.
JSON representation |
---|
{ "metricSpec": { object ( |
SafetySpec
Spec for safety metric.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "version": integer } |
SafetyInstance
Spec for safety instance.
prediction
string
Required. Output of the evaluated model.
JSON representation |
---|
{ "prediction": string } |
GroundednessInput
Input for groundedness metric.
Required. Spec for groundedness metric.
Required. Groundedness instance.
JSON representation |
---|
{ "metricSpec": { object ( |
GroundednessSpec
Spec for groundedness metric.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "version": integer } |
GroundednessInstance
Spec for groundedness instance.
prediction
string
Required. Output of the evaluated model.
context
string
Required. Background information provided in context used to compare against the prediction.
JSON representation |
---|
{ "prediction": string, "context": string } |
FulfillmentInput
Input for fulfillment metric.
Required. Spec for fulfillment score metric.
Required. Fulfillment instance.
JSON representation |
---|
{ "metricSpec": { object ( |
FulfillmentSpec
Spec for fulfillment metric.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "version": integer } |
FulfillmentInstance
Spec for fulfillment instance.
prediction
string
Required. Output of the evaluated model.
instruction
string
Required. Inference instruction prompt to compare prediction with.
JSON representation |
---|
{ "prediction": string, "instruction": string } |
SummarizationQualityInput
Input for summarization quality metric.
Required. Spec for summarization quality score metric.
Required. Summarization quality instance.
JSON representation |
---|
{ "metricSpec": { object ( |
SummarizationQualitySpec
Spec for summarization quality score metric.
useReference
boolean
Optional. Whether to use instance.reference to compute summarization quality.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "useReference": boolean, "version": integer } |
SummarizationQualityInstance
Spec for summarization quality instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Optional. Ground truth used to compare against the prediction.
context
string
Required. Text to be summarized.
instruction
string
Required. Summarization prompt for LLM.
JSON representation |
---|
{ "prediction": string, "reference": string, "context": string, "instruction": string } |
PairwiseSummarizationQualityInput
Input for pairwise summarization quality metric.
Required. Spec for pairwise summarization quality score metric.
Required. Pairwise summarization quality instance.
JSON representation |
---|
{ "metricSpec": { object ( |
PairwiseSummarizationQualitySpec
Spec for pairwise summarization quality score metric.
useReference
boolean
Optional. Whether to use instance.reference to compute pairwise summarization quality.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "useReference": boolean, "version": integer } |
PairwiseSummarizationQualityInstance
Spec for pairwise summarization quality instance.
prediction
string
Required. Output of the candidate model.
baselinePrediction
string
Required. Output of the baseline model.
reference
string
Optional. Ground truth used to compare against the prediction.
context
string
Required. Text to be summarized.
instruction
string
Required. Summarization prompt for LLM.
JSON representation |
---|
{ "prediction": string, "baselinePrediction": string, "reference": string, "context": string, "instruction": string } |
SummarizationHelpfulnessInput
Input for summarization helpfulness metric.
Required. Spec for summarization helpfulness score metric.
Required. Summarization helpfulness instance.
JSON representation |
---|
{ "metricSpec": { object ( |
SummarizationHelpfulnessSpec
Spec for summarization helpfulness score metric.
useReference
boolean
Optional. Whether to use instance.reference to compute summarization helpfulness.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "useReference": boolean, "version": integer } |
SummarizationHelpfulnessInstance
Spec for summarization helpfulness instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Optional. Ground truth used to compare against the prediction.
context
string
Required. Text to be summarized.
instruction
string
Optional. Summarization prompt for LLM.
JSON representation |
---|
{ "prediction": string, "reference": string, "context": string, "instruction": string } |
SummarizationVerbosityInput
Input for summarization verbosity metric.
Required. Spec for summarization verbosity score metric.
Required. Summarization verbosity instance.
JSON representation |
---|
{ "metricSpec": { object ( |
SummarizationVerbositySpec
Spec for summarization verbosity score metric.
useReference
boolean
Optional. Whether to use instance.reference to compute summarization verbosity.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "useReference": boolean, "version": integer } |
SummarizationVerbosityInstance
Spec for summarization verbosity instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Optional. Ground truth used to compare against the prediction.
context
string
Required. Text to be summarized.
instruction
string
Optional. Summarization prompt for LLM.
JSON representation |
---|
{ "prediction": string, "reference": string, "context": string, "instruction": string } |
QuestionAnsweringQualityInput
Input for question answering quality metric.
Required. Spec for question answering quality score metric.
Required. Question answering quality instance.
JSON representation |
---|
{ "metricSpec": { object ( |
QuestionAnsweringQualitySpec
Spec for question answering quality score metric.
useReference
boolean
Optional. Whether to use instance.reference to compute question answering quality.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "useReference": boolean, "version": integer } |
QuestionAnsweringQualityInstance
Spec for question answering quality instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Optional. Ground truth used to compare against the prediction.
context
string
Required. Text to answer the question.
instruction
string
Required. Question Answering prompt for LLM.
JSON representation |
---|
{ "prediction": string, "reference": string, "context": string, "instruction": string } |
PairwiseQuestionAnsweringQualityInput
Input for pairwise question answering quality metric.
Required. Spec for pairwise question answering quality score metric.
Required. Pairwise question answering quality instance.
JSON representation |
---|
{ "metricSpec": { object ( |
PairwiseQuestionAnsweringQualitySpec
Spec for pairwise question answering quality score metric.
useReference
boolean
Optional. Whether to use instance.reference to compute pairwise question answering quality.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "useReference": boolean, "version": integer } |
PairwiseQuestionAnsweringQualityInstance
Spec for pairwise question answering quality instance.
prediction
string
Required. Output of the candidate model.
baselinePrediction
string
Required. Output of the baseline model.
reference
string
Optional. Ground truth used to compare against the prediction.
context
string
Required. Text to answer the question.
instruction
string
Required. Question Answering prompt for LLM.
JSON representation |
---|
{ "prediction": string, "baselinePrediction": string, "reference": string, "context": string, "instruction": string } |
QuestionAnsweringRelevanceInput
Input for question answering relevance metric.
Required. Spec for question answering relevance score metric.
Required. Question answering relevance instance.
JSON representation |
---|
{ "metricSpec": { object ( |
QuestionAnsweringRelevanceSpec
Spec for question answering relevance metric.
useReference
boolean
Optional. Whether to use instance.reference to compute question answering relevance.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "useReference": boolean, "version": integer } |
QuestionAnsweringRelevanceInstance
Spec for question answering relevance instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Optional. Ground truth used to compare against the prediction.
context
string
Optional. Text provided as context to answer the question.
instruction
string
Required. The question asked and other instruction in the inference prompt.
JSON representation |
---|
{ "prediction": string, "reference": string, "context": string, "instruction": string } |
QuestionAnsweringHelpfulnessInput
Input for question answering helpfulness metric.
Required. Spec for question answering helpfulness score metric.
Required. Question answering helpfulness instance.
JSON representation |
---|
{ "metricSpec": { object ( |
QuestionAnsweringHelpfulnessSpec
Spec for question answering helpfulness metric.
useReference
boolean
Optional. Whether to use instance.reference to compute question answering helpfulness.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "useReference": boolean, "version": integer } |
QuestionAnsweringHelpfulnessInstance
Spec for question answering helpfulness instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Optional. Ground truth used to compare against the prediction.
context
string
Optional. Text provided as context to answer the question.
instruction
string
Required. The question asked and other instruction in the inference prompt.
JSON representation |
---|
{ "prediction": string, "reference": string, "context": string, "instruction": string } |
QuestionAnsweringCorrectnessInput
Input for question answering correctness metric.
Required. Spec for question answering correctness score metric.
Required. Question answering correctness instance.
JSON representation |
---|
{ "metricSpec": { object ( |
QuestionAnsweringCorrectnessSpec
Spec for question answering correctness metric.
useReference
boolean
Optional. Whether to use instance.reference to compute question answering correctness.
version
integer
Optional. Which version to use for evaluation.
JSON representation |
---|
{ "useReference": boolean, "version": integer } |
QuestionAnsweringCorrectnessInstance
Spec for question answering correctness instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Optional. Ground truth used to compare against the prediction.
context
string
Optional. Text provided as context to answer the question.
instruction
string
Required. The question asked and other instruction in the inference prompt.
JSON representation |
---|
{ "prediction": string, "reference": string, "context": string, "instruction": string } |
PointwiseMetricInput
Input for pointwise metric.
Required. Spec for pointwise metric.
Required. Pointwise metric instance.
JSON representation |
---|
{ "metricSpec": { object ( |
PointwiseMetricSpec
Spec for pointwise metric.
metricPromptTemplate
string
Required. Metric prompt template for pointwise metric.
JSON representation |
---|
{ "metricPromptTemplate": string } |
PointwiseMetricInstance
Pointwise metric instance. Usually one instance corresponds to one row in an evaluation dataset.
instance
Union type
instance can be only one of the following:
jsonInstance
string
Instance specified as a JSON string. String key-value pairs are expected in the jsonInstance to render PointwiseMetricSpec.metricPromptTemplate.
JSON representation |
---|
{ // instance "jsonInstance": string // Union type } |
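A sketch of how a jsonInstance string could be produced and paired with a metricPromptTemplate. The `{response}` placeholder syntax, the template wording, and the `instance` key on the input object are assumptions for illustration; rendering with `str.format` is only a local preview and may not match how the service fills the template.

```python
import json

# Hypothetical template; the {response} placeholder style is an assumption.
metric_prompt_template = "Rate the fluency of this response from 1 to 5: {response}"

# String key-value pairs serialized into a single JSON string.
json_instance = json.dumps({"response": "The sky is blue."})

pointwise_metric_input = {
    "metricSpec": {"metricPromptTemplate": metric_prompt_template},
    # Assumed key name for the PointwiseMetricInstance.
    "instance": {"jsonInstance": json_instance},
}

# Local preview of how the key-value pairs could fill the template.
rendered = metric_prompt_template.format(**json.loads(json_instance))
print(rendered)
```

Note that jsonInstance is a string containing JSON, not a nested object, so the pairs are serialized before being placed in the request.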
PairwiseMetricInput
Input for pairwise metric.
Required. Spec for pairwise metric.
Required. Pairwise metric instance.
JSON representation |
---|
{ "metricSpec": { object ( |
PairwiseMetricSpec
Spec for pairwise metric.
metricPromptTemplate
string
Required. Metric prompt template for pairwise metric.
JSON representation |
---|
{ "metricPromptTemplate": string } |
PairwiseMetricInstance
Pairwise metric instance. Usually one instance corresponds to one row in an evaluation dataset.
instance
Union type
instance can be only one of the following:
jsonInstance
string
Instance specified as a JSON string. String key-value pairs are expected in the jsonInstance to render PairwiseMetricSpec.metricPromptTemplate.
JSON representation |
---|
{ // instance "jsonInstance": string // Union type } |
ToolCallValidInput
Input for tool call valid metric.
Required. Spec for tool call valid metric.
Required. Repeated tool call valid instances.
JSON representation |
---|
{ "metricSpec": { object ( |
ToolCallValidSpec
This type has no fields.
Spec for tool call valid metric.
ToolCallValidInstance
Spec for tool call valid instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Required. Ground truth used to compare against the prediction.
JSON representation |
---|
{ "prediction": string, "reference": string } |
ToolNameMatchInput
Input for tool name match metric.
Required. Spec for tool name match metric.
Required. Repeated tool name match instances.
JSON representation |
---|
{ "metricSpec": { object ( |
ToolNameMatchSpec
This type has no fields.
Spec for tool name match metric.
ToolNameMatchInstance
Spec for tool name match instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Required. Ground truth used to compare against the prediction.
JSON representation |
---|
{ "prediction": string, "reference": string } |
ToolParameterKeyMatchInput
Input for tool parameter key match metric.
Required. Spec for tool parameter key match metric.
Required. Repeated tool parameter key match instances.
JSON representation |
---|
{ "metricSpec": { object ( |
ToolParameterKeyMatchSpec
This type has no fields.
Spec for tool parameter key match metric.
ToolParameterKeyMatchInstance
Spec for tool parameter key match instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Required. Ground truth used to compare against the prediction.
JSON representation |
---|
{ "prediction": string, "reference": string } |
ToolParameterKVMatchInput
Input for tool parameter key value match metric.
Required. Spec for tool parameter key value match metric.
Required. Repeated tool parameter key value match instances.
JSON representation |
---|
{ "metricSpec": { object ( |
ToolParameterKVMatchSpec
Spec for tool parameter key value match metric.
useStrictStringMatch
boolean
Optional. Whether to use STRICT string match on parameter values.
JSON representation |
---|
{ "useStrictStringMatch": boolean } |
ToolParameterKVMatchInstance
Spec for tool parameter key value match instance.
prediction
string
Required. Output of the evaluated model.
reference
string
Required. Ground truth used to compare against the prediction.
JSON representation |
---|
{ "prediction": string, "reference": string } |
CometInput
Input for Comet metric.
Required. Spec for comet metric.
Required. Comet instance.
JSON representation |
---|
{ "metricSpec": { object ( |
CometSpec
Spec for Comet metric.
sourceLanguage
string
Optional. Source language in BCP-47 format.
targetLanguage
string
Optional. Target language in BCP-47 format. Covers both prediction and reference.
version
enum (CometVersion)
Required. Which version to use for evaluation.
JSON representation |
---|
{
"sourceLanguage": string,
"targetLanguage": string,
"version": enum ( |
CometVersion
Comet version options.
Enums | |
---|---|
COMET_VERSION_UNSPECIFIED | Comet version unspecified. |
COMET_22_SRC_REF | Comet 22 for translation + source + reference (source-reference-combined). |
CometInstance
Spec for Comet instance - the fields used for evaluation depend on the Comet version.
prediction
string
Required. Output of the evaluated model.
reference
string
Optional. Ground truth used to compare against the prediction.
source
string
Optional. Source text in original language.
JSON representation |
---|
{ "prediction": string, "reference": string, "source": string } |
TrajectoryExactMatchInput
Instances and metric spec for TrajectoryExactMatch metric.
Required. Spec for TrajectoryExactMatch metric.
Required. Repeated TrajectoryExactMatch instance.
JSON representation |
---|
{ "metricSpec": { object ( |
TrajectoryExactMatchSpec
This type has no fields.
Spec for TrajectoryExactMatch metric - returns 1 if tool calls in the reference trajectory exactly match the predicted trajectory, else 0.
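The rule above can be sketched locally as a list comparison. The sketch models each tool call as a (toolName, toolInput) pair and assumes that both fields and the order of calls must agree; `trajectory_exact_match` is a hypothetical helper.

```python
from typing import List, Tuple

# A tool call modeled as a (toolName, toolInput) pair.
ToolCall = Tuple[str, str]


def trajectory_exact_match(predicted: List[ToolCall], reference: List[ToolCall]) -> float:
    """Return 1.0 when both trajectories contain the same calls in the same order."""
    # Assumption: calls are compared on both name and input.
    return 1.0 if predicted == reference else 0.0


search = ("search", "query=flights")
book = ("book", "flight_id=42")
print(trajectory_exact_match([search, book], [search, book]))
print(trajectory_exact_match([book, search], [search, book]))
```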
TrajectoryExactMatchInstance
Spec for TrajectoryExactMatch instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
JSON representation |
---|
{ "predictedTrajectory": { object ( |
Trajectory
ToolCall
Spec for tool call.
toolName
string
Required. Spec for tool name.
toolInput
string
Optional. Spec for tool input.
JSON representation |
---|
{ "toolName": string, "toolInput": string } |
TrajectoryInOrderMatchInput
Instances and metric spec for TrajectoryInOrderMatch metric.
Required. Spec for TrajectoryInOrderMatch metric.
Required. Repeated TrajectoryInOrderMatch instance.
JSON representation |
---|
{ "metricSpec": { object ( |
TrajectoryInOrderMatchSpec
This type has no fields.
Spec for TrajectoryInOrderMatch metric - returns 1 if tool calls in the reference trajectory appear in the predicted trajectory in the same order, else 0.
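One way to read this rule is as a subsequence check: the reference calls must appear in the predicted trajectory in the same relative order, with extra predicted calls allowed in between. The sketch below assumes that reading and models tool calls as (toolName, toolInput) pairs; `trajectory_in_order_match` is a hypothetical helper.

```python
from typing import List, Tuple

ToolCall = Tuple[str, str]


def trajectory_in_order_match(predicted: List[ToolCall], reference: List[ToolCall]) -> float:
    """Return 1.0 when reference is a subsequence of predicted, otherwise 0.0."""
    remaining = iter(predicted)
    # `call in remaining` consumes the iterator up to the first match, so each
    # reference call has to be found after the previous one.
    return 1.0 if all(call in remaining for call in reference) else 0.0


search = ("search", "query=flights")
book = ("book", "flight_id=42")
extra = ("log", "")
print(trajectory_in_order_match([search, extra, book], [search, book]))
print(trajectory_in_order_match([book, search], [search, book]))
```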
TrajectoryInOrderMatchInstance
Spec for TrajectoryInOrderMatch instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
JSON representation |
---|
{ "predictedTrajectory": { object ( |
TrajectoryAnyOrderMatchInput
Instances and metric spec for TrajectoryAnyOrderMatch metric.
Required. Spec for TrajectoryAnyOrderMatch metric.
Required. Repeated TrajectoryAnyOrderMatch instance.
JSON representation |
---|
{ "metricSpec": { object ( |
TrajectoryAnyOrderMatchSpec
This type has no fields.
Spec for TrajectoryAnyOrderMatch metric - returns 1 if all tool calls in the reference trajectory appear in the predicted trajectory in any order, else 0.
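A sketch of the any-order rule using multiset containment. It assumes that a reference call repeated k times must occur at least k times in the prediction, which the description above does not state explicitly; tool calls are modeled as (toolName, toolInput) pairs and `trajectory_any_order_match` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Tuple

ToolCall = Tuple[str, str]


def trajectory_any_order_match(predicted: List[ToolCall], reference: List[ToolCall]) -> float:
    """Return 1.0 when every reference call occurs in predicted, ignoring order."""
    # Counter subtraction drops non-positive counts, so anything left over is a
    # reference call that the prediction is missing.
    missing = Counter(reference) - Counter(predicted)
    return 0.0 if missing else 1.0


search = ("search", "query=flights")
book = ("book", "flight_id=42")
extra = ("log", "")
print(trajectory_any_order_match([book, extra, search], [search, book]))
print(trajectory_any_order_match([search], [search, book]))
```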
TrajectoryAnyOrderMatchInstance
Spec for TrajectoryAnyOrderMatch instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
JSON representation |
---|
{ "predictedTrajectory": { object ( |
TrajectoryPrecisionInput
Instances and metric spec for TrajectoryPrecision metric.
Required. Spec for TrajectoryPrecision metric.
Required. Repeated TrajectoryPrecision instance.
JSON representation |
---|
{ "metricSpec": { object ( |
TrajectoryPrecisionSpec
This type has no fields.
Spec for TrajectoryPrecision metric - returns a float score based on average precision of individual tool calls.
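As an illustration, precision can be sketched as the share of predicted tool calls that also occur in the reference trajectory. This is an assumed reading of "average precision of individual tool calls"; the empty-prediction value of 0.0 is also an assumption, and `trajectory_precision` is a hypothetical helper.

```python
from typing import List, Tuple

ToolCall = Tuple[str, str]


def trajectory_precision(predicted: List[ToolCall], reference: List[ToolCall]) -> float:
    """Share of predicted calls that also appear in the reference trajectory."""
    if not predicted:
        return 0.0  # Assumption: an empty prediction scores 0.
    hits = sum(1 for call in predicted if call in reference)
    return hits / len(predicted)


search = ("search", "query=flights")
book = ("book", "flight_id=42")
extra = ("log", "")
print(trajectory_precision([search, extra], [search, book]))
```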
TrajectoryPrecisionInstance
Spec for TrajectoryPrecision instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
JSON representation |
---|
{ "predictedTrajectory": { object ( |
TrajectoryRecallInput
Instances and metric spec for TrajectoryRecall metric.
Required. Spec for TrajectoryRecall metric.
Required. Repeated TrajectoryRecall instance.
JSON representation |
---|
{ "metricSpec": { object ( |
TrajectoryRecallSpec
This type has no fields.
Spec for TrajectoryRecall metric - returns a float score based on average recall of individual tool calls.
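Recall can be sketched the other way around, as the share of reference tool calls that the predicted trajectory contains. This is an assumed reading of the description above; the empty-reference value of 0.0 is also an assumption, and `trajectory_recall` is a hypothetical helper.

```python
from typing import List, Tuple

ToolCall = Tuple[str, str]


def trajectory_recall(predicted: List[ToolCall], reference: List[ToolCall]) -> float:
    """Share of reference calls that also appear in the predicted trajectory."""
    if not reference:
        return 0.0  # Assumption: an empty reference scores 0.
    hits = sum(1 for call in reference if call in predicted)
    return hits / len(reference)


search = ("search", "query=flights")
book = ("book", "flight_id=42")
print(trajectory_recall([search], [search, book]))
```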
TrajectoryRecallInstance
Spec for TrajectoryRecall instance.
Required. Spec for predicted tool call trajectory.
Required. Spec for reference tool call trajectory.
JSON representation |
---|
{ "predictedTrajectory": { object ( |
TrajectorySingleToolUseInput
Instances and metric spec for TrajectorySingleToolUse metric.
Required. Spec for TrajectorySingleToolUse metric.
Required. Repeated TrajectorySingleToolUse instance.
JSON representation |
---|
{ "metricSpec": { object ( |
TrajectorySingleToolUseSpec
Spec for TrajectorySingleToolUse metric - returns 1 if the tool is present in the predicted trajectory, else 0.
toolName
string
Required. Spec for tool name to be checked for in the predicted trajectory.
JSON representation |
---|
{ "toolName": string } |
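A sketch of the single tool use check, which needs only the predicted trajectory and the toolName from the spec. It assumes the tool input is ignored and models tool calls as (toolName, toolInput) pairs; `trajectory_single_tool_use` is a hypothetical helper.

```python
from typing import List, Tuple

ToolCall = Tuple[str, str]


def trajectory_single_tool_use(predicted: List[ToolCall], tool_name: str) -> float:
    """Return 1.0 when any predicted call uses the named tool, otherwise 0.0."""
    # Assumption: only the tool name is compared; the input is ignored.
    return 1.0 if any(name == tool_name for name, _ in predicted) else 0.0


trajectory = [("search", "query=flights"), ("book", "flight_id=42")]
print(trajectory_single_tool_use(trajectory, "book"))
print(trajectory_single_tool_use(trajectory, "cancel"))
```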
TrajectorySingleToolUseInstance
Spec for TrajectorySingleToolUse instance.
Required. Spec for predicted tool call trajectory.
JSON representation |
---|
{
"predictedTrajectory": {
object ( |
ExactMatchResults
Results for exact match metric.
Output only. Exact match metric values.
JSON representation |
---|
{
"exactMatchMetricValues": [
{
object ( |
ExactMatchMetricValue
Exact match metric value for an instance.
score
number
Output only. Exact match score.
JSON representation |
---|
{ "score": number } |
BleuResults
Results for bleu metric.
Output only. Bleu metric values.
JSON representation |
---|
{
"bleuMetricValues": [
{
object ( |
BleuMetricValue
Bleu metric value for an instance.
score
number
Output only. Bleu score.
JSON representation |
---|
{ "score": number } |
RougeResults
Results for rouge metric.
Output only. Rouge metric values.
JSON representation |
---|
{
"rougeMetricValues": [
{
object ( |
RougeMetricValue
Rouge metric value for an instance.
score
number
Output only. Rouge score.
JSON representation |
---|
{ "score": number } |
FluencyResult
Spec for fluency result.
explanation
string
Output only. Explanation for fluency score.
score
number
Output only. Fluency score.
confidence
number
Output only. Confidence for fluency score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
CoherenceResult
Spec for coherence result.
explanation
string
Output only. Explanation for coherence score.
score
number
Output only. Coherence score.
confidence
number
Output only. Confidence for coherence score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
SafetyResult
Spec for safety result.
explanation
string
Output only. Explanation for safety score.
score
number
Output only. Safety score.
confidence
number
Output only. Confidence for safety score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
GroundednessResult
Spec for groundedness result.
explanation
string
Output only. Explanation for groundedness score.
score
number
Output only. Groundedness score.
confidence
number
Output only. Confidence for groundedness score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
FulfillmentResult
Spec for fulfillment result.
explanation
string
Output only. Explanation for fulfillment score.
score
number
Output only. Fulfillment score.
confidence
number
Output only. Confidence for fulfillment score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
SummarizationQualityResult
Spec for summarization quality result.
explanation
string
Output only. Explanation for summarization quality score.
score
number
Output only. Summarization Quality score.
confidence
number
Output only. Confidence for summarization quality score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
PairwiseSummarizationQualityResult
Spec for pairwise summarization quality result.
pairwiseChoice
enum (PairwiseChoice)
Output only. Pairwise summarization prediction choice.
explanation
string
Output only. Explanation for summarization quality score.
confidence
number
Output only. Confidence for summarization quality score.
JSON representation |
---|
{
"pairwiseChoice": enum ( |
PairwiseChoice
Pairwise prediction autorater preference.
Enums | |
---|---|
PAIRWISE_CHOICE_UNSPECIFIED | Unspecified prediction choice. |
BASELINE | Baseline prediction wins. |
CANDIDATE | Candidate prediction wins. |
TIE | Winner cannot be determined. |
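When many pairwise results are collected, the enum values above can be tallied into a candidate win rate. The sketch assumes each result is a dict whose `pairwiseChoice` holds the enum name as a string, and that unspecified choices are excluded from the denominator while ties are counted; `candidate_win_rate` is a hypothetical helper.

```python
from collections import Counter
from enum import Enum


class PairwiseChoice(Enum):
    PAIRWISE_CHOICE_UNSPECIFIED = "PAIRWISE_CHOICE_UNSPECIFIED"
    BASELINE = "BASELINE"
    CANDIDATE = "CANDIDATE"
    TIE = "TIE"


def candidate_win_rate(results):
    """Share of specified choices in which the candidate prediction wins."""
    counts = Counter(PairwiseChoice(r["pairwiseChoice"]) for r in results)
    # Assumption: ties count toward the denominator, unspecified choices do not.
    specified = (
        counts[PairwiseChoice.BASELINE]
        + counts[PairwiseChoice.CANDIDATE]
        + counts[PairwiseChoice.TIE]
    )
    return counts[PairwiseChoice.CANDIDATE] / specified if specified else 0.0


sample_results = [
    {"pairwiseChoice": "CANDIDATE"},
    {"pairwiseChoice": "BASELINE"},
    {"pairwiseChoice": "TIE"},
    {"pairwiseChoice": "CANDIDATE"},
]
print(candidate_win_rate(sample_results))
```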
SummarizationHelpfulnessResult
Spec for summarization helpfulness result.
explanation
string
Output only. Explanation for summarization helpfulness score.
score
number
Output only. Summarization Helpfulness score.
confidence
number
Output only. Confidence for summarization helpfulness score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
SummarizationVerbosityResult
Spec for summarization verbosity result.
explanation
string
Output only. Explanation for summarization verbosity score.
score
number
Output only. Summarization Verbosity score.
confidence
number
Output only. Confidence for summarization verbosity score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
QuestionAnsweringQualityResult
Spec for question answering quality result.
explanation
string
Output only. Explanation for question answering quality score.
score
number
Output only. Question Answering Quality score.
confidence
number
Output only. Confidence for question answering quality score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
PairwiseQuestionAnsweringQualityResult
Spec for pairwise question answering quality result.
pairwiseChoice
enum (PairwiseChoice)
Output only. Pairwise question answering prediction choice.
explanation
string
Output only. Explanation for question answering quality score.
confidence
number
Output only. Confidence for question answering quality score.
JSON representation |
---|
{ "pairwiseChoice": enum (PairwiseChoice), "explanation": string, "confidence": number } |
QuestionAnsweringRelevanceResult
Spec for question answering relevance result.
explanation
string
Output only. Explanation for question answering relevance score.
score
number
Output only. Question Answering Relevance score.
confidence
number
Output only. Confidence for question answering relevance score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
QuestionAnsweringHelpfulnessResult
Spec for question answering helpfulness result.
explanation
string
Output only. Explanation for question answering helpfulness score.
score
number
Output only. Question Answering Helpfulness score.
confidence
number
Output only. Confidence for question answering helpfulness score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
QuestionAnsweringCorrectnessResult
Spec for question answering correctness result.
explanation
string
Output only. Explanation for question answering correctness score.
score
number
Output only. Question Answering Correctness score.
confidence
number
Output only. Confidence for question answering correctness score.
JSON representation |
---|
{ "explanation": string, "score": number, "confidence": number } |
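Results with the `explanation`, `score`, and `confidence` fields can be read into one small client-side type. The sketch below is illustrative only: the `ScoredResult` class, the `is_confident` helper, the `0.7` threshold, and the sample data are all hypothetical, and defaulting an absent `score` to `0.0` is an assumption about how the JSON is serialized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredResult:
    """Hypothetical container for an explanation/score/confidence result."""
    explanation: str
    score: float
    confidence: Optional[float] = None

    @classmethod
    def from_dict(cls, data):
        return cls(
            explanation=data.get("explanation", ""),
            # Assumption: an absent score is read as 0.0.
            score=float(data.get("score", 0.0)),
            confidence=data.get("confidence"),
        )


def is_confident(result, min_confidence=0.7):
    """True when a confidence is present and meets the (arbitrary) threshold."""
    return result.confidence is not None and result.confidence >= min_confidence


# Hypothetical results, hand-written for illustration only.
raw_results = [
    {"explanation": "Answer matches the reference.", "score": 1.0, "confidence": 0.9},
    {"explanation": "Answer is partially correct.", "score": 0.0, "confidence": 0.4},
]

parsed = [ScoredResult.from_dict(r) for r in raw_results]
confident = [r for r in parsed if is_confident(r)]
print(len(confident))
```

Keeping `confidence` optional lets the same type hold results that carry only an explanation and a score.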
PointwiseMetricResult
Spec for pointwise metric result.
explanation
string
Output only. Explanation for pointwise metric score.
score
number
Output only. Pointwise metric score.
JSON representation |
---|
{ "explanation": string, "score": number } |
PairwiseMetricResult
Spec for pairwise metric result.
pairwiseChoice
enum (PairwiseChoice)
Output only. Pairwise metric choice.
explanation
string
Output only. Explanation for pairwise metric score.
JSON representation |
---|
{ "pairwiseChoice": enum (PairwiseChoice), "explanation": string } |
ToolCallValidResults
Results for tool call valid metric.
toolCallValidMetricValues[]
object (ToolCallValidMetricValue)
Output only. Tool call valid metric values.
JSON representation |
---|
{ "toolCallValidMetricValues": [ { object (ToolCallValidMetricValue) } ] } |
ToolCallValidMetricValue
Tool call valid metric value for an instance.
score
number
Output only. Tool call valid score.
JSON representation |
---|
{ "score": number } |
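Because `toolCallValidMetricValues` holds one value per instance, a caller will often want to aggregate the per-instance scores. The following is a rough sketch under stated assumptions: the `tool_call_valid_scores` helper and the sample payload are hypothetical, and reading an absent `score` as `0.0` is an assumption rather than documented behavior.

```python
from statistics import mean


def tool_call_valid_scores(results):
    """Extract per-instance scores from a ToolCallValidResults-shaped dict."""
    values = results.get("toolCallValidMetricValues", [])
    # Assumption: an absent score is read as 0.0.
    return [float(value.get("score", 0.0)) for value in values]


# Hypothetical payload, hand-written for illustration only.
sample = {
    "toolCallValidMetricValues": [
        {"score": 1.0},
        {"score": 0.0},
        {"score": 1.0},
    ]
}

scores = tool_call_valid_scores(sample)
# Guard against an empty list, for which statistics.mean raises an error.
average_score = mean(scores) if scores else 0.0
print(scores, average_score)
```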
ToolNameMatchResults
Results for tool name match metric.
toolNameMatchMetricValues[]
object (ToolNameMatchMetricValue)
Output only. Tool name match metric values.
JSON representation |
---|
{ "toolNameMatchMetricValues": [ { object (ToolNameMatchMetricValue) } ] } |
ToolNameMatchMetricValue
Tool name match metric value for an instance.
score
number
Output only. Tool name match score.
JSON representation |
---|
{ "score": number } |
ToolParameterKeyMatchResults
Results for tool parameter key match metric.
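toolParameterKeyMatchMetricValues[]
object (ToolParameterKeyMatchMetricValue)
Output only. Tool parameter key match metric values.
JSON representation |
---|
{ "toolParameterKeyMatchMetricValues": [ { object (ToolParameterKeyMatchMetricValue) } ] } |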
ToolParameterKeyMatchMetricValue
Tool parameter key match metric value for an instance.
score
number
Output only. Tool parameter key match score.
JSON representation |
---|
{ "score": number } |
ToolParameterKVMatchResults
Results for tool parameter key value match metric.
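toolParameterKvMatchMetricValues[]
object (ToolParameterKVMatchMetricValue)
Output only. Tool parameter key value match metric values.
JSON representation |
---|
{ "toolParameterKvMatchMetricValues": [ { object (ToolParameterKVMatchMetricValue) } ] } |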
ToolParameterKVMatchMetricValue
Tool parameter key value match metric value for an instance.
score
number
Output only. Tool parameter key value match score.
JSON representation |
---|
{ "score": number } |
CometResult
Spec for Comet result. Calculates the Comet score for the given instance using the version specified in the spec.
score
number
Output only. Comet score. Range depends on version.
JSON representation |
---|
{ "score": number } |
TrajectoryExactMatchResults
Results for TrajectoryExactMatch metric.
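trajectoryExactMatchMetricValues[]
object (TrajectoryExactMatchMetricValue)
Output only. TrajectoryExactMatch metric values.
JSON representation |
---|
{ "trajectoryExactMatchMetricValues": [ { object (TrajectoryExactMatchMetricValue) } ] } |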
TrajectoryExactMatchMetricValue
TrajectoryExactMatch metric value for an instance.
score
number
Output only. TrajectoryExactMatch score.
JSON representation |
---|
{ "score": number } |
TrajectoryInOrderMatchResults
Results for TrajectoryInOrderMatch metric.
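trajectoryInOrderMatchMetricValues[]
object (TrajectoryInOrderMatchMetricValue)
Output only. TrajectoryInOrderMatch metric values.
JSON representation |
---|
{ "trajectoryInOrderMatchMetricValues": [ { object (TrajectoryInOrderMatchMetricValue) } ] } |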
TrajectoryInOrderMatchMetricValue
TrajectoryInOrderMatch metric value for an instance.
score
number
Output only. TrajectoryInOrderMatch score.
JSON representation |
---|
{ "score": number } |
TrajectoryAnyOrderMatchResults
Results for TrajectoryAnyOrderMatch metric.
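trajectoryAnyOrderMatchMetricValues[]
object (TrajectoryAnyOrderMatchMetricValue)
Output only. TrajectoryAnyOrderMatch metric values.
JSON representation |
---|
{ "trajectoryAnyOrderMatchMetricValues": [ { object (TrajectoryAnyOrderMatchMetricValue) } ] } |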
TrajectoryAnyOrderMatchMetricValue
TrajectoryAnyOrderMatch metric value for an instance.
score
number
Output only. TrajectoryAnyOrderMatch score.
JSON representation |
---|
{ "score": number } |
TrajectoryPrecisionResults
Results for TrajectoryPrecision metric.
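trajectoryPrecisionMetricValues[]
object (TrajectoryPrecisionMetricValue)
Output only. TrajectoryPrecision metric values.
JSON representation |
---|
{ "trajectoryPrecisionMetricValues": [ { object (TrajectoryPrecisionMetricValue) } ] } |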
TrajectoryPrecisionMetricValue
TrajectoryPrecision metric value for an instance.
score
number
Output only. TrajectoryPrecision score.
JSON representation |
---|
{ "score": number } |
TrajectoryRecallResults
Results for TrajectoryRecall metric.
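trajectoryRecallMetricValues[]
object (TrajectoryRecallMetricValue)
Output only. TrajectoryRecall metric values.
JSON representation |
---|
{ "trajectoryRecallMetricValues": [ { object (TrajectoryRecallMetricValue) } ] } |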
TrajectoryRecallMetricValue
TrajectoryRecall metric value for an instance.
score
number
Output only. TrajectoryRecall score.
JSON representation |
---|
{ "score": number } |
TrajectorySingleToolUseResults
Results for TrajectorySingleToolUse metric.
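trajectorySingleToolUseMetricValues[]
object (TrajectorySingleToolUseMetricValue)
Output only. TrajectorySingleToolUse metric values.
JSON representation |
---|
{ "trajectorySingleToolUseMetricValues": [ { object (TrajectorySingleToolUseMetricValue) } ] } |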
TrajectorySingleToolUseMetricValue
TrajectorySingleToolUse metric value for an instance.
score
number
Output only. TrajectorySingleToolUse score.
JSON representation |
---|
{ "score": number } |
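The trajectory result wrappers all follow one pattern: a single list field whose name ends in `MetricValues`, holding objects with a `score`. One generic helper could therefore read any of them. This is a sketch only: the `per_instance_scores` function and the sample payloads are hypothetical, and detecting the list by its key suffix is a client-side convenience, not something the API prescribes.

```python
def per_instance_scores(results):
    """Return (field_name, scores) for a wrapper with one *MetricValues list."""
    keys = [key for key in results if key.endswith("MetricValues")]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one *MetricValues field, found {keys}")
    field_name = keys[0]
    # Assumption: an absent score is read as 0.0.
    scores = [float(value.get("score", 0.0)) for value in results[field_name]]
    return field_name, scores


# Hypothetical payloads, hand-written for illustration only.
precision_results = {
    "trajectoryPrecisionMetricValues": [{"score": 0.5}, {"score": 1.0}]
}
recall_results = {
    "trajectoryRecallMetricValues": [{"score": 1.0}, {"score": 0.25}]
}

for payload in (precision_results, recall_results):
    field_name, scores = per_instance_scores(payload)
    print(field_name, scores)
```

Raising on zero or multiple matching keys keeps the helper from silently picking the wrong list if it is handed a different message type.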