Method: projects.locations.evaluateInstances

Evaluates instances based on a given metric.

Endpoint

post https://{service-endpoint}/v1beta1/{location}:evaluateInstances

Where {service-endpoint} is one of the supported service endpoints.

Path parameters

location string

Required. The resource name of the Location in which to evaluate the instances. Format: projects/{project}/locations/{location}

Request body

The request body contains data with the following structure:

Fields
Union field metric_inputs. Instances and specs for evaluation. metric_inputs can be only one of the following:
exactMatchInput object (ExactMatchInput)

Auto metric instances. Instances and metric spec for exact match metric.

bleuInput object (BleuInput)

Instances and metric spec for bleu metric.

rougeInput object (RougeInput)

Instances and metric spec for rouge metric.

fluencyInput object (FluencyInput)

LLM-based metric instance. General text generation metrics, applicable to other categories. Input for fluency metric.

coherenceInput object (CoherenceInput)

Input for coherence metric.

safetyInput object (SafetyInput)

Input for safety metric.

groundednessInput object (GroundednessInput)

Input for groundedness metric.

fulfillmentInput object (FulfillmentInput)

Input for fulfillment metric.

summarizationQualityInput object (SummarizationQualityInput)

Input for summarization quality metric.

pairwiseSummarizationQualityInput object (PairwiseSummarizationQualityInput)

Input for pairwise summarization quality metric.

summarizationHelpfulnessInput object (SummarizationHelpfulnessInput)

Input for summarization helpfulness metric.

summarizationVerbosityInput object (SummarizationVerbosityInput)

Input for summarization verbosity metric.

questionAnsweringQualityInput object (QuestionAnsweringQualityInput)

Input for question answering quality metric.

pairwiseQuestionAnsweringQualityInput object (PairwiseQuestionAnsweringQualityInput)

Input for pairwise question answering quality metric.

questionAnsweringRelevanceInput object (QuestionAnsweringRelevanceInput)

Input for question answering relevance metric.

questionAnsweringHelpfulnessInput object (QuestionAnsweringHelpfulnessInput)

Input for question answering helpfulness metric.

questionAnsweringCorrectnessInput object (QuestionAnsweringCorrectnessInput)

Input for question answering correctness metric.

pointwiseMetricInput object (PointwiseMetricInput)

Input for pointwise metric.

pairwiseMetricInput object (PairwiseMetricInput)

Input for pairwise metric.

toolCallValidInput object (ToolCallValidInput)

Tool call metric instances. Input for tool call valid metric.

toolNameMatchInput object (ToolNameMatchInput)

Input for tool name match metric.

toolParameterKeyMatchInput object (ToolParameterKeyMatchInput)

Input for tool parameter key match metric.

toolParameterKvMatchInput object (ToolParameterKVMatchInput)

Input for tool parameter key value match metric.
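
Before the SDK example below, here is a minimal Python sketch of calling the REST method directly with an exactMatchInput body. The regional endpoint, project ID, and sample instances are assumptions for illustration; substitute your own values.

import google.auth
import google.auth.transport.requests
import requests

# Assumed placeholders -- replace with your own project and region.
PROJECT_ID = "your-project-id"
LOCATION = "us-central1"
URL = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/"
    f"projects/{PROJECT_ID}/locations/{LOCATION}:evaluateInstances"
)

# Request body shaped after the exactMatchInput fields documented on this page.
body = {
    "exactMatchInput": {
        "metricSpec": {},  # ExactMatchSpec has no fields
        "instances": [
            {"prediction": "Paris", "reference": "Paris"},
            {"prediction": "Lyon", "reference": "Paris"},
        ],
    }
}

# Obtain an access token from Application Default Credentials.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

response = requests.post(
    URL,
    headers={"Authorization": f"Bearer {credentials.token}"},
    json=body,
)
# Expected shape: {"exactMatchResults": {"exactMatchMetricValues": [{"score": ...}, ...]}}
print(response.json())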

Example request

Python

import pandas as pd

import vertexai
from vertexai.preview.evaluation import EvalTask, MetricPromptTemplateExamples

# TODO(developer): Update and uncomment the line below
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

eval_dataset = pd.DataFrame(
    {
        "instruction": [
            "Summarize the text in one sentence.",
            "Summarize the text such that a five-year-old can understand.",
        ],
        "context": [
            """As part of a comprehensive initiative to tackle urban congestion and foster
            sustainable urban living, a major city has revealed ambitious plans for an
            extensive overhaul of its public transportation system. The project aims not
            only to improve the efficiency and reliability of public transit but also to
            reduce the city's carbon footprint and promote eco-friendly commuting options.
            City officials anticipate that this strategic investment will enhance
            accessibility for residents and visitors alike, ushering in a new era of
            efficient, environmentally conscious urban transportation.""",
            """A team of archaeologists has unearthed ancient artifacts shedding light on a
            previously unknown civilization. The findings challenge existing historical
            narratives and provide valuable insights into human history.""",
        ],
        "response": [
            "A major city is revamping its public transportation system to fight congestion, reduce emissions, and make getting around greener and easier.",
            "Some people who dig for old things found some very special tools and objects that tell us about people who lived a long, long time ago! What they found is like a new puzzle piece that helps us understand how people used to live.",
        ],
    }
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        MetricPromptTemplateExamples.Pointwise.SUMMARIZATION_QUALITY,
        MetricPromptTemplateExamples.Pointwise.GROUNDEDNESS,
        MetricPromptTemplateExamples.Pointwise.VERBOSITY,
        MetricPromptTemplateExamples.Pointwise.INSTRUCTION_FOLLOWING,
    ],
)

prompt_template = (
    "Instruction: {instruction}. Article: {context}. Summary: {response}"
)
result = eval_task.evaluate(prompt_template=prompt_template)

print("Summary Metrics:\n")

for key, value in result.summary_metrics.items():
    print(f"{key}: \t{value}")

print("\n\nMetrics Table:\n")
print(result.metrics_table)
# Example response:
# Summary Metrics:
# row_count:      2
# summarization_quality/mean:     3.5
# summarization_quality/std:      2.1213203435596424
# ...

Response body

Response message for EvaluationService.EvaluateInstances.

If successful, the response body contains data with the following structure:

Fields
Union field evaluation_results. Evaluation results will be served in the same order as presented in EvaluationRequest.instances. evaluation_results can be only one of the following:
exactMatchResults object (ExactMatchResults)

Auto metric evaluation results. Results for exact match metric.

bleuResults object (BleuResults)

Results for bleu metric.

rougeResults object (RougeResults)

Results for rouge metric.

fluencyResult object (FluencyResult)

LLM-based metric evaluation result. General text generation metrics, applicable to other categories. Result for fluency metric.

coherenceResult object (CoherenceResult)

Result for coherence metric.

safetyResult object (SafetyResult)

Result for safety metric.

groundednessResult object (GroundednessResult)

Result for groundedness metric.

fulfillmentResult object (FulfillmentResult)

Result for fulfillment metric.

summarizationQualityResult object (SummarizationQualityResult)

Summarization only metrics. Result for summarization quality metric.

pairwiseSummarizationQualityResult object (PairwiseSummarizationQualityResult)

Result for pairwise summarization quality metric.

summarizationHelpfulnessResult object (SummarizationHelpfulnessResult)

Result for summarization helpfulness metric.

summarizationVerbosityResult object (SummarizationVerbosityResult)

Result for summarization verbosity metric.

questionAnsweringQualityResult object (QuestionAnsweringQualityResult)

Question answering only metrics. Result for question answering quality metric.

pairwiseQuestionAnsweringQualityResult object (PairwiseQuestionAnsweringQualityResult)

Result for pairwise question answering quality metric.

questionAnsweringRelevanceResult object (QuestionAnsweringRelevanceResult)

Result for question answering relevance metric.

questionAnsweringHelpfulnessResult object (QuestionAnsweringHelpfulnessResult)

Result for question answering helpfulness metric.

questionAnsweringCorrectnessResult object (QuestionAnsweringCorrectnessResult)

Result for question answering correctness metric.

pointwiseMetricResult object (PointwiseMetricResult)

Generic metrics. Result for pointwise metric.

pairwiseMetricResult object (PairwiseMetricResult)

Result for pairwise metric.

toolCallValidResults object (ToolCallValidResults)

Tool call metrics. Results for tool call valid metric.

toolNameMatchResults object (ToolNameMatchResults)

Results for tool name match metric.

toolParameterKeyMatchResults object (ToolParameterKeyMatchResults)

Results for tool parameter key match metric.

toolParameterKvMatchResults object (ToolParameterKVMatchResults)

Results for tool parameter key value match metric.

JSON representation
{

  // Union field evaluation_results can be only one of the following:
  "exactMatchResults": {
    object (ExactMatchResults)
  },
  "bleuResults": {
    object (BleuResults)
  },
  "rougeResults": {
    object (RougeResults)
  },
  "fluencyResult": {
    object (FluencyResult)
  },
  "coherenceResult": {
    object (CoherenceResult)
  },
  "safetyResult": {
    object (SafetyResult)
  },
  "groundednessResult": {
    object (GroundednessResult)
  },
  "fulfillmentResult": {
    object (FulfillmentResult)
  },
  "summarizationQualityResult": {
    object (SummarizationQualityResult)
  },
  "pairwiseSummarizationQualityResult": {
    object (PairwiseSummarizationQualityResult)
  },
  "summarizationHelpfulnessResult": {
    object (SummarizationHelpfulnessResult)
  },
  "summarizationVerbosityResult": {
    object (SummarizationVerbosityResult)
  },
  "questionAnsweringQualityResult": {
    object (QuestionAnsweringQualityResult)
  },
  "pairwiseQuestionAnsweringQualityResult": {
    object (PairwiseQuestionAnsweringQualityResult)
  },
  "questionAnsweringRelevanceResult": {
    object (QuestionAnsweringRelevanceResult)
  },
  "questionAnsweringHelpfulnessResult": {
    object (QuestionAnsweringHelpfulnessResult)
  },
  "questionAnsweringCorrectnessResult": {
    object (QuestionAnsweringCorrectnessResult)
  },
  "pointwiseMetricResult": {
    object (PointwiseMetricResult)
  },
  "pairwiseMetricResult": {
    object (PairwiseMetricResult)
  },
  "toolCallValidResults": {
    object (ToolCallValidResults)
  },
  "toolNameMatchResults": {
    object (ToolNameMatchResults)
  },
  "toolParameterKeyMatchResults": {
    object (ToolParameterKeyMatchResults)
  },
  "toolParameterKvMatchResults": {
    object (ToolParameterKVMatchResults)
  }
  // End of list of possible types for union field evaluation_results.
}

ExactMatchInput

Input for exact match metric.

Fields
metricSpec object (ExactMatchSpec)

Required. Spec for exact match metric.

instances[] object (ExactMatchInstance)

Required. Repeated exact match instances.

JSON representation
{
  "metricSpec": {
    object (ExactMatchSpec)
  },
  "instances": [
    {
      object (ExactMatchInstance)
    }
  ]
}

ExactMatchSpec

This type has no fields.

Spec for exact match metric - returns 1 if prediction and reference match exactly, and 0 otherwise.

ExactMatchInstance

Spec for exact match instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{
  "prediction": string,
  "reference": string
}

BleuInput

Input for bleu metric.

Fields
metricSpec object (BleuSpec)

Required. Spec for bleu score metric.

instances[] object (BleuInstance)

Required. Repeated bleu instances.

JSON representation
{
  "metricSpec": {
    object (BleuSpec)
  },
  "instances": [
    {
      object (BleuInstance)
    }
  ]
}

BleuSpec

Spec for bleu score metric - calculates the precision of n-grams in the prediction as compared to the reference - returns a score ranging from 0 to 1.

Fields
useEffectiveOrder boolean

Optional. Whether to use effective order to compute the bleu score.

JSON representation
{
  "useEffectiveOrder": boolean
}

BleuInstance

Spec for bleu instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{
  "prediction": string,
  "reference": string
}

RougeInput

Input for rouge metric.

Fields
metricSpec object (RougeSpec)

Required. Spec for rouge score metric.

instances[] object (RougeInstance)

Required. Repeated rouge instances.

JSON representation
{
  "metricSpec": {
    object (RougeSpec)
  },
  "instances": [
    {
      object (RougeInstance)
    }
  ]
}

RougeSpec

Spec for rouge score metric - calculates the recall of n-grams in prediction as compared to reference - returns a score ranging between 0 and 1.

Fields
rougeType string

Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.

useStemmer boolean

Optional. Whether to use a stemmer to compute the rouge score.

splitSummaries boolean

Optional. Whether to split summaries while using rougeLsum.

JSON representation
{
  "rougeType": string,
  "useStemmer": boolean,
  "splitSummaries": boolean
}

RougeInstance

Spec for rouge instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{
  "prediction": string,
  "reference": string
}
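
A minimal sketch of a rougeInput request body using the spec and instance fields documented above. The rougeL choice and the example sentences are illustrative assumptions.

rouge_input = {
    "rougeInput": {
        "metricSpec": {
            "rougeType": "rougeL",    # one of rougen[1-9], rougeL, rougeLsum
            "useStemmer": True,
            "splitSummaries": False,  # only relevant when rougeType is rougeLsum
        },
        "instances": [
            {
                "prediction": "The city is overhauling its public transit system.",
                "reference": "A major city announced an overhaul of its public transportation system.",
            }
        ],
    }
}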

FluencyInput

Input for fluency metric.

Fields
metricSpec object (FluencySpec)

Required. Spec for fluency score metric.

instance object (FluencyInstance)

Required. Fluency instance.

JSON representation
{
  "metricSpec": {
    object (FluencySpec)
  },
  "instance": {
    object (FluencyInstance)
  }
}

FluencySpec

Spec for fluency score metric.

Fields
version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "version": integer
}

FluencyInstance

Spec for fluency instance.

Fields
prediction string

Required. Output of the evaluated model.

JSON representation
{
  "prediction": string
}

CoherenceInput

Input for coherence metric.

Fields
metricSpec object (CoherenceSpec)

Required. Spec for coherence score metric.

instance object (CoherenceInstance)

Required. Coherence instance.

JSON representation
{
  "metricSpec": {
    object (CoherenceSpec)
  },
  "instance": {
    object (CoherenceInstance)
  }
}

CoherenceSpec

Spec for coherence score metric.

Fields
version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "version": integer
}

CoherenceInstance

Spec for coherence instance.

Fields
prediction string

Required. Output of the evaluated model.

JSON representation
{
  "prediction": string
}

SafetyInput

Input for safety metric.

Fields
metricSpec object (SafetySpec)

Required. Spec for safety metric.

instance object (SafetyInstance)

Required. Safety instance.

JSON representation
{
  "metricSpec": {
    object (SafetySpec)
  },
  "instance": {
    object (SafetyInstance)
  }
}

SafetySpec

Spec for safety metric.

Fields
version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "version": integer
}

SafetyInstance

Spec for safety instance.

Fields
prediction string

Required. Output of the evaluated model.

JSON representation
{
  "prediction": string
}

GroundednessInput

Input for groundedness metric.

Fields
metricSpec object (GroundednessSpec)

Required. Spec for groundedness metric.

instance object (GroundednessInstance)

Required. Groundedness instance.

JSON representation
{
  "metricSpec": {
    object (GroundednessSpec)
  },
  "instance": {
    object (GroundednessInstance)
  }
}

GroundednessSpec

Spec for groundedness metric.

Fields
version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "version": integer
}

GroundednessInstance

Spec for groundedness instance.

Fields
prediction string

Required. Output of the evaluated model.

context string

Required. Background information provided as context, used to compare against the prediction.

JSON representation
{
  "prediction": string,
  "context": string
}

FulfillmentInput

Input for fulfillment metric.

Fields
metricSpec object (FulfillmentSpec)

Required. Spec for fulfillment score metric.

instance object (FulfillmentInstance)

Required. Fulfillment instance.

JSON representation
{
  "metricSpec": {
    object (FulfillmentSpec)
  },
  "instance": {
    object (FulfillmentInstance)
  }
}

FulfillmentSpec

Spec for fulfillment metric.

Fields
version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "version": integer
}

FulfillmentInstance

Spec for fulfillment instance.

Fields
prediction string

Required. Output of the evaluated model.

instruction string

Required. Inference instruction prompt to compare prediction with.

JSON representation
{
  "prediction": string,
  "instruction": string
}

SummarizationQualityInput

Input for summarization quality metric.

Fields
metricSpec object (SummarizationQualitySpec)

Required. Spec for summarization quality score metric.

instance object (SummarizationQualityInstance)

Required. Summarization quality instance.

JSON representation
{
  "metricSpec": {
    object (SummarizationQualitySpec)
  },
  "instance": {
    object (SummarizationQualityInstance)
  }
}

SummarizationQualitySpec

Spec for summarization quality score metric.

Fields
useReference boolean

Optional. Whether to use instance.reference to compute summarization quality.

version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "useReference": boolean,
  "version": integer
}

SummarizationQualityInstance

Spec for summarization quality instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to be summarized.

instruction string

Required. Summarization prompt for the LLM.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}

PairwiseSummarizationQualityInput

Input for pairwise summarization quality metric.

Fields
metricSpec object (PairwiseSummarizationQualitySpec)

Required. Spec for pairwise summarization quality score metric.

instance object (PairwiseSummarizationQualityInstance)

Required. Pairwise summarization quality instance.

JSON representation
{
  "metricSpec": {
    object (PairwiseSummarizationQualitySpec)
  },
  "instance": {
    object (PairwiseSummarizationQualityInstance)
  }
}

PairwiseSummarizationQualitySpec

Spec for pairwise summarization quality score metric.

Fields
useReference boolean

Optional. Whether to use instance.reference to compute pairwise summarization quality.

version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "useReference": boolean,
  "version": integer
}

PairwiseSummarizationQualityInstance

Spec for pairwise summarization quality instance.

Fields
prediction string

Required. Output of the candidate model.

baselinePrediction string

Required. Output of the baseline model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to be summarized.

instruction string

Required. Summarization prompt for the LLM.

JSON representation
{
  "prediction": string,
  "baselinePrediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}

SummarizationHelpfulnessInput

Input for summarization helpfulness metric.

Fields
metricSpec object (SummarizationHelpfulnessSpec)

Required. Spec for summarization helpfulness score metric.

instance object (SummarizationHelpfulnessInstance)

Required. Summarization helpfulness instance.

JSON representation
{
  "metricSpec": {
    object (SummarizationHelpfulnessSpec)
  },
  "instance": {
    object (SummarizationHelpfulnessInstance)
  }
}

SummarizationHelpfulnessSpec

Spec for summarization helpfulness score metric.

Fields
useReference boolean

Optional. Whether to use instance.reference to compute summarization helpfulness.

version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "useReference": boolean,
  "version": integer
}

SummarizationHelpfulnessInstance

Spec for summarization helpfulness instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to be summarized.

instruction string

Optional. Summarization prompt for the LLM.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}

SummarizationVerbosityInput

Input for summarization verbosity metric.

Fields
metricSpec object (SummarizationVerbositySpec)

Required. Spec for summarization verbosity score metric.

instance object (SummarizationVerbosityInstance)

Required. Summarization verbosity instance.

JSON representation
{
  "metricSpec": {
    object (SummarizationVerbositySpec)
  },
  "instance": {
    object (SummarizationVerbosityInstance)
  }
}

SummarizationVerbositySpec

Spec for summarization verbosity score metric.

Fields
useReference boolean

Optional. Whether to use instance.reference to compute summarization verbosity.

version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "useReference": boolean,
  "version": integer
}

SummarizationVerbosityInstance

Spec for summarization verbosity instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to be summarized.

instruction string

Optional. Summarization prompt for the LLM.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}

QuestionAnsweringQualityInput

Input for question answering quality metric.

Fields
metricSpec object (QuestionAnsweringQualitySpec)

Required. Spec for question answering quality score metric.

instance object (QuestionAnsweringQualityInstance)

Required. Question answering quality instance.

JSON representation
{
  "metricSpec": {
    object (QuestionAnsweringQualitySpec)
  },
  "instance": {
    object (QuestionAnsweringQualityInstance)
  }
}

QuestionAnsweringQualitySpec

Spec for question answering quality score metric.

Fields
useReference boolean

Optional. Whether to use instance.reference to compute question answering quality.

version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "useReference": boolean,
  "version": integer
}

QuestionAnsweringQualityInstance

Spec for question answering quality instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to answer the question.

instruction string

Required. Question answering prompt for the LLM.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}

PairwiseQuestionAnsweringQualityInput

Input for pairwise question answering quality metric.

Fields

metricSpec object (PairwiseQuestionAnsweringQualitySpec)

Required. Spec for pairwise question answering quality score metric.

instance object (PairwiseQuestionAnsweringQualityInstance)

Required. Pairwise question answering quality instance.

JSON representation
{
  "metricSpec": {
    object (PairwiseQuestionAnsweringQualitySpec)
  },
  "instance": {
    object (PairwiseQuestionAnsweringQualityInstance)
  }
}

PairwiseQuestionAnsweringQualitySpec

Spec for pairwise question answering quality score metric.

Fields
useReference boolean

Optional. Whether to use instance.reference to compute question answering quality.

version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "useReference": boolean,
  "version": integer
}

PairwiseQuestionAnsweringQualityInstance

Spec for pairwise question answering quality instance.

Fields
prediction string

Required. Output of the candidate model.

baselinePrediction string

Required. Output of the baseline model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to answer the question.

instruction string

Required. Question answering prompt for the LLM.

JSON representation
{
  "prediction": string,
  "baselinePrediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}

QuestionAnsweringRelevanceInput

Input for question answering relevance metric.

Fields
metricSpec object (QuestionAnsweringRelevanceSpec)

Required. Spec for question answering relevance score metric.

instance object (QuestionAnsweringRelevanceInstance)

Required. Question answering relevance instance.

JSON representation
{
  "metricSpec": {
    object (QuestionAnsweringRelevanceSpec)
  },
  "instance": {
    object (QuestionAnsweringRelevanceInstance)
  }
}

QuestionAnsweringRelevanceSpec

Spec for question answering relevance metric.

Fields
useReference boolean

Optional. Whether to use instance.reference to compute question answering relevance.

version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "useReference": boolean,
  "version": integer
}

QuestionAnsweringRelevanceInstance

Spec for question answering relevance instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Optional. Text provided as context to answer the question.

instruction string

Required. The question asked and other instructions in the inference prompt.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}

QuestionAnsweringHelpfulnessInput

Input for question answering helpfulness metric.

Fields
metricSpec object (QuestionAnsweringHelpfulnessSpec)

Required. Spec for question answering helpfulness score metric.

instance object (QuestionAnsweringHelpfulnessInstance)

Required. Question answering helpfulness instance.

JSON representation
{
  "metricSpec": {
    object (QuestionAnsweringHelpfulnessSpec)
  },
  "instance": {
    object (QuestionAnsweringHelpfulnessInstance)
  }
}

QuestionAnsweringHelpfulnessSpec

Spec for question answering helpfulness metric.

Fields
useReference boolean

Optional. Whether to use instance.reference to compute question answering helpfulness.

version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "useReference": boolean,
  "version": integer
}

QuestionAnsweringHelpfulnessInstance

Spec for question answering helpfulness instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Optional. Text provided as context to answer the question.

instruction string

Required. The question asked and other instructions in the inference prompt.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}

QuestionAnsweringCorrectnessInput

Input for question answering correctness metric.

Fields
metricSpec object (QuestionAnsweringCorrectnessSpec)

Required. Spec for question answering correctness score metric.

instance object (QuestionAnsweringCorrectnessInstance)

Required. Question answering correctness instance.

JSON representation
{
  "metricSpec": {
    object (QuestionAnsweringCorrectnessSpec)
  },
  "instance": {
    object (QuestionAnsweringCorrectnessInstance)
  }
}

QuestionAnsweringCorrectnessSpec

Spec for question answering correctness metric.

Fields
useReference boolean

Optional. Whether to use instance.reference to compute question answering correctness.

version integer

Optional. Which version to use for evaluation.

JSON representation
{
  "useReference": boolean,
  "version": integer
}

QuestionAnsweringCorrectnessInstance

Spec for question answering correctness instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Optional. Text provided as context to answer the question.

instruction string

Required. The question asked and other instructions in the inference prompt.

JSON representation
{
  "prediction": string,
  "reference": string,
  "context": string,
  "instruction": string
}

PointwiseMetricInput

Input for pointwise metric.

Fields
metricSpec object (PointwiseMetricSpec)

Required. Spec for pointwise metric.

instance object (PointwiseMetricInstance)

Required. Pointwise metric instance.

JSON representation
{
  "metricSpec": {
    object (PointwiseMetricSpec)
  },
  "instance": {
    object (PointwiseMetricInstance)
  }
}

PointwiseMetricSpec

Spec for pointwise metric.

Fields
metricPromptTemplate string

Required. Metric prompt template for pointwise metric.

JSON representation
{
  "metricPromptTemplate": string
}

PointwiseMetricInstance

Pointwise metric instance. Usually one instance corresponds to one row in an evaluation dataset.

Fields
Union field instance. Instance for pointwise metric. instance can be only one of the following:
jsonInstance string

Instance specified as a JSON string. String key-value pairs are expected in the jsonInstance to render PointwiseMetricSpec.instance_prompt_template.

JSON representation
{

  // Union field instance can be only one of the following:
  "jsonInstance": string
  // End of list of possible types for union field instance.
}
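
A hedged sketch of a pointwiseMetricInput payload showing how jsonInstance relates to the metric prompt template. The template wording and the {response} placeholder name are assumptions; the keys in jsonInstance simply need to match the placeholders used in the template.

import json

pointwise_metric_input = {
    "pointwiseMetricInput": {
        "metricSpec": {
            "metricPromptTemplate": (
                "Rate the following response for conciseness on a scale of 1 to 5. "
                "Response: {response} "
                "Return a score and a short explanation."
            )
        },
        "instance": {
            # jsonInstance is a JSON-encoded string whose keys fill the
            # placeholders in the metric prompt template above.
            "jsonInstance": json.dumps({"response": "The cat sat on the mat."})
        },
    }
}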

PairwiseMetricInput

Input for pairwise metric.

Fields
metricSpec object (PairwiseMetricSpec)

Required. Spec for pairwise metric.

instance object (PairwiseMetricInstance)

Required. Pairwise metric instance.

JSON representation
{
  "metricSpec": {
    object (PairwiseMetricSpec)
  },
  "instance": {
    object (PairwiseMetricInstance)
  }
}

PairwiseMetricSpec

Spec for pairwise metric.

Fields
metricPromptTemplate string

Required. Metric prompt template for pairwise metric.

JSON representation
{
  "metricPromptTemplate": string
}

PairwiseMetricInstance

Pairwise metric instance. Usually one instance corresponds to one row in an evaluation dataset.

Fields
Union field instance. Instance for pairwise metric. instance can be only one of the following:
jsonInstance string

Instance specified as a JSON string. String key-value pairs are expected in the jsonInstance to render PairwiseMetricSpec.instance_prompt_template.

JSON representation
{

  // Union field instance can be only one of the following:
  "jsonInstance": string
  // End of list of possible types for union field instance.
}
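
Similarly, a hedged sketch of a pairwiseMetricInput payload. The {baseline_response} and {candidate_response} placeholder names are assumptions; they only need to match the keys supplied in jsonInstance.

import json

pairwise_metric_input = {
    "pairwiseMetricInput": {
        "metricSpec": {
            "metricPromptTemplate": (
                "Compare the two answers and decide which one is better. "
                "Baseline answer: {baseline_response} "
                "Candidate answer: {candidate_response}"
            )
        },
        "instance": {
            # The keys below correspond to the placeholders in the template above.
            "jsonInstance": json.dumps(
                {
                    "baseline_response": "Paris is the capital of France.",
                    "candidate_response": "The capital of France is Paris, on the Seine.",
                }
            )
        },
    }
}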

ToolCallValidInput

Input for tool call valid metric.

Fields
metricSpec object (ToolCallValidSpec)

Required. Spec for tool call valid metric.

instances[] object (ToolCallValidInstance)

Required. Repeated tool call valid instances.

JSON representation
{
  "metricSpec": {
    object (ToolCallValidSpec)
  },
  "instances": [
    {
      object (ToolCallValidInstance)
    }
  ]
}

ToolCallValidSpec

This type has no fields.

Spec for tool call valid metric.

ToolCallValidInstance

Spec for tool call valid instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{
  "prediction": string,
  "reference": string
}

ToolNameMatchInput

Input for tool name match metric.

Fields
metricSpec object (ToolNameMatchSpec)

Required. Spec for tool name match metric.

instances[] object (ToolNameMatchInstance)

Required. Repeated tool name match instances.

JSON representation
{
  "metricSpec": {
    object (ToolNameMatchSpec)
  },
  "instances": [
    {
      object (ToolNameMatchInstance)
    }
  ]
}

ToolNameMatchSpec

This type has no fields.

Spec for tool name match metric.

ToolNameMatchInstance

Spec for tool name match instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{
  "prediction": string,
  "reference": string
}

ToolParameterKeyMatchInput

Input for tool parameter key match metric.

Fields
metricSpec object (ToolParameterKeyMatchSpec)

Required. Spec for tool parameter key match metric.

instances[] object (ToolParameterKeyMatchInstance)

Required. Repeated tool parameter key match instances.

JSON representation
{
  "metricSpec": {
    object (ToolParameterKeyMatchSpec)
  },
  "instances": [
    {
      object (ToolParameterKeyMatchInstance)
    }
  ]
}

ToolParameterKeyMatchSpec

This type has no fields.

Spec for tool parameter key match metric.

ToolParameterKeyMatchInstance

Spec for tool parameter key match instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{
  "prediction": string,
  "reference": string
}

ToolParameterKVMatchInput

Input for tool parameter key value match metric.

Fields
metricSpec object (ToolParameterKVMatchSpec)

Required. Spec for tool parameter key value match metric.

instances[] object (ToolParameterKVMatchInstance)

Required. Repeated tool parameter key value match instances.

JSON representation
{
  "metricSpec": {
    object (ToolParameterKVMatchSpec)
  },
  "instances": [
    {
      object (ToolParameterKVMatchInstance)
    }
  ]
}

ToolParameterKVMatchSpec

Spec for tool parameter key value match metric.

Fields
useStrictStringMatch boolean

Optional. Whether to use strict string matching on parameter values.

JSON representation
{
  "useStrictStringMatch": boolean
}

ToolParameterKVMatchInstance

Spec for tool parameter key value match instance.

Fields
prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{
  "prediction": string,
  "reference": string
}

ExactMatchResults

Results for exact match metric.

Fields
exactMatchMetricValues[] object (ExactMatchMetricValue)

Output only. Exact match metric values.

JSON representation
{
  "exactMatchMetricValues": [
    {
      object (ExactMatchMetricValue)
    }
  ]
}

ExactMatchMetricValue

Exact match metric value for an instance.

Fields
score number

Output only. Exact match score.

JSON representation
{
  "score": number
}

BleuResults

Results for bleu metric.

Fields
bleuMetricValues[] object (BleuMetricValue)

Output only. Bleu metric values.

JSON representation
{
  "bleuMetricValues": [
    {
      object (BleuMetricValue)
    }
  ]
}

BleuMetricValue

Bleu metric value for an instance.

Fields
score number

Output only. Bleu score.

JSON representation
{
  "score": number
}

RougeResults

Results for rouge metric.

Fields
rougeMetricValues[] object (RougeMetricValue)

Output only. Rouge metric values.

JSON representation
{
  "rougeMetricValues": [
    {
      object (RougeMetricValue)
    }
  ]
}

RougeMetricValue

Rouge metric value for an instance.

Fields
score number

Output only. Rouge score.

JSON representation
{
  "score": number
}

FluencyResult

Spec for fluency result.

Fields
explanation string

Output only. Explanation for fluency score.

score number

Output only. Fluency score.

confidence number

Output only. Confidence for fluency score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

CoherenceResult

Spec for coherence result.

Fields
explanation string

Output only. Explanation for coherence score.

score number

Output only. Coherence score.

confidence number

Output only. Confidence for coherence score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

SafetyResult

Spec for safety result.

Fields
explanation string

Output only. Explanation for safety score.

score number

Output only. Safety score.

confidence number

Output only. Confidence for safety score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

GroundednessResult

Spec for groundedness result.

Fields
explanation string

Output only. Explanation for groundedness score.

score number

Output only. Groundedness score.

confidence number

Output only. Confidence for groundedness score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

FulfillmentResult

Spec for fulfillment result.

Fields
explanation string

Output only. Explanation for fulfillment score.

score number

Output only. Fulfillment score.

confidence number

Output only. Confidence for fulfillment score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

SummarizationQualityResult

Spec for summarization quality result.

Fields
explanation string

Output only. Explanation for summarization quality score.

score number

Output only. Summarization Quality score.

confidence number

Output only. Confidence for summarization quality score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

PairwiseSummarizationQualityResult

Spec for pairwise summarization quality result.

Fields
pairwiseChoice enum (PairwiseChoice)

Output only. Pairwise summarization prediction choice.

explanation string

Output only. Explanation for summarization quality score.

confidence number

Output only. Confidence for summarization quality score.

JSON representation
{
  "pairwiseChoice": enum (PairwiseChoice),
  "explanation": string,
  "confidence": number
}

PairwiseChoice

Pairwise prediction autorater preference.

Enums
PAIRWISE_CHOICE_UNSPECIFIED Unspecified prediction choice.
BASELINE Baseline prediction wins.
CANDIDATE Candidate prediction wins.
TIE Winner cannot be determined.

SummarizationHelpfulnessResult

Spec for summarization helpfulness result.

Fields
explanation string

Output only. Explanation for summarization helpfulness score.

score number

Output only. Summarization Helpfulness score.

confidence number

Output only. Confidence for summarization helpfulness score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

SummarizationVerbosityResult

Spec for summarization verbosity result.

Fields
explanation string

Output only. Explanation for summarization verbosity score.

score number

Output only. Summarization Verbosity score.

confidence number

Output only. Confidence for summarization verbosity score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

QuestionAnsweringQualityResult

Spec for question answering quality result.

Fields
explanation string

Output only. Explanation for question answering quality score.

score number

Output only. Question Answering Quality score.

confidence number

Output only. Confidence for question answering quality score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

PairwiseQuestionAnsweringQualityResult

Spec for pairwise question answering quality result.

Fields
pairwiseChoice enum (PairwiseChoice)

Output only. Pairwise question answering prediction choice.

explanation string

Output only. Explanation for question answering quality score.

confidence number

Output only. Confidence for question answering quality score.

JSON representation
{
  "pairwiseChoice": enum (PairwiseChoice),
  "explanation": string,
  "confidence": number
}

QuestionAnsweringRelevanceResult

Spec for question answering relevance result.

Fields
explanation string

Output only. Explanation for question answering relevance score.

score number

Output only. Question Answering Relevance score.

confidence number

Output only. Confidence for question answering relevance score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

QuestionAnsweringHelpfulnessResult

Spec for question answering helpfulness result.

Fields
explanation string

Output only. Explanation for question answering helpfulness score.

score number

Output only. Question Answering Helpfulness score.

confidence number

Output only. Confidence for question answering helpfulness score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

QuestionAnsweringCorrectnessResult

Spec for question answering correctness result.

Fields
explanation string

Output only. Explanation for question answering correctness score.

score number

Output only. Question Answering Correctness score.

confidence number

Output only. Confidence for question answering correctness score.

JSON representation
{
  "explanation": string,
  "score": number,
  "confidence": number
}

PointwiseMetricResult

Spec for pointwise metric result.

Fields
explanation string

Output only. Explanation for pointwise metric score.

score number

Output only. Pointwise metric score.

JSON representation
{
  "explanation": string,
  "score": number
}

PairwiseMetricResult

Spec for pairwise metric result.

Fields
pairwiseChoice enum (PairwiseChoice)

Output only. Pairwise metric choice.

explanation string

Output only. Explanation for pairwise metric score.

JSON representation
{
  "pairwiseChoice": enum (PairwiseChoice),
  "explanation": string
}

ToolCallValidResults

Results for tool call valid metric.

Fields
toolCallValidMetricValues[] object (ToolCallValidMetricValue)

Output only. Tool call valid metric values.

JSON representation
{
  "toolCallValidMetricValues": [
    {
      object (ToolCallValidMetricValue)
    }
  ]
}

ToolCallValidMetricValue

Tool call valid metric value for an instance.

Fields
score number

Output only. Tool call valid score.

JSON representation
{
  "score": number
}

ToolNameMatchResults

Results for tool name match metric.

Fields
toolNameMatchMetricValues[] object (ToolNameMatchMetricValue)

Output only. Tool name match metric values.

JSON representation
{
  "toolNameMatchMetricValues": [
    {
      object (ToolNameMatchMetricValue)
    }
  ]
}

ToolNameMatchMetricValue

Tool name match metric value for an instance.

Fields
score number

Output only. Tool name match score.

JSON representation
{
  "score": number
}

ToolParameterKeyMatchResults

Results for tool parameter key match metric.

Fields
toolParameterKeyMatchMetricValues[] object (ToolParameterKeyMatchMetricValue)

Output only. Tool parameter key match metric values.

JSON representation
{
  "toolParameterKeyMatchMetricValues": [
    {
      object (ToolParameterKeyMatchMetricValue)
    }
  ]
}

ToolParameterKeyMatchMetricValue

Tool parameter key match metric value for an instance.

Fields
score number

Output only. Tool parameter key match score.

JSON representation
{
  "score": number
}

ToolParameterKVMatchResults

Results for tool parameter key value match metric.

Fields
toolParameterKvMatchMetricValues[] object (ToolParameterKVMatchMetricValue)

Output only. Tool parameter key value match metric values.

JSON representation
{
  "toolParameterKvMatchMetricValues": [
    {
      object (ToolParameterKVMatchMetricValue)
    }
  ]
}

ToolParameterKVMatchMetricValue

Tool parameter key value match metric value for an instance.

Fields
score number

Output only. Tool parameter key value match score.

JSON representation
{
  "score": number
}