Method: projects.locations.evaluateInstances

Evaluates instances based on a given metric.

Endpoint

POST https://{service-endpoint}/v1beta1/{location}:evaluateInstances

Where {service-endpoint} is one of the supported service endpoints.

Path parameters

location string

Required. The resource name of the Location to evaluate the instances. Format: projects/{project}/locations/{location}
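
To show how the location path parameter slots into the endpoint, the sketch below assembles the request URL. The helper name build_evaluate_instances_url is hypothetical, and the default regional host of the form {location}-aiplatform.googleapis.com is an assumption; pass whichever supported service endpoint applies to your deployment.

```python
from typing import Optional


def build_evaluate_instances_url(
    project: str, location: str, service_endpoint: Optional[str] = None
) -> str:
    """Assemble the evaluateInstances URL for a project and location."""
    if not project or not location:
        raise ValueError("project and location are both required")
    # Assumption: a regional host of the form "<location>-aiplatform.googleapis.com".
    host = service_endpoint or f"{location}-aiplatform.googleapis.com"
    # The path parameter has the format projects/{project}/locations/{location}.
    resource = f"projects/{project}/locations/{location}"
    return f"https://{host}/v1beta1/{resource}:evaluateInstances"


url = build_evaluate_instances_url("my-project", "us-central1")
print(url)
```

The project ID and location shown here are placeholders.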

Request body

The request body contains data with the following structure:

Fields

metric_inputs Union type

Instances and specs for evaluation. metric_inputs can be only one of the following:

exactMatchInput object (ExactMatchInput)

Auto metric instances. Instances and metric spec for exact match metric.

bleuInput object (BleuInput)

Instances and metric spec for bleu metric.

rougeInput object (RougeInput)

Instances and metric spec for rouge metric.

fluencyInput object (FluencyInput)

LLM-based metric instance. General text generation metrics, applicable to other categories. Input for fluency metric.

coherenceInput object (CoherenceInput)

Input for coherence metric.

safetyInput object (SafetyInput)

Input for safety metric.

groundednessInput object (GroundednessInput)

Input for groundedness metric.

fulfillmentInput object (FulfillmentInput)

Input for fulfillment metric.

summarizationQualityInput object (SummarizationQualityInput)

Input for summarization quality metric.

pairwiseSummarizationQualityInput object (PairwiseSummarizationQualityInput)

Input for pairwise summarization quality metric.

summarizationHelpfulnessInput object (SummarizationHelpfulnessInput)

Input for summarization helpfulness metric.

summarizationVerbosityInput object (SummarizationVerbosityInput)

Input for summarization verbosity metric.

questionAnsweringQualityInput object (QuestionAnsweringQualityInput)

Input for question answering quality metric.

pairwiseQuestionAnsweringQualityInput object (PairwiseQuestionAnsweringQualityInput)

Input for pairwise question answering quality metric.

questionAnsweringRelevanceInput object (QuestionAnsweringRelevanceInput)

Input for question answering relevance metric.

questionAnsweringHelpfulnessInput object (QuestionAnsweringHelpfulnessInput)

Input for question answering helpfulness metric.

questionAnsweringCorrectnessInput object (QuestionAnsweringCorrectnessInput)

Input for question answering correctness metric.

pointwiseMetricInput object (PointwiseMetricInput)

Input for pointwise metric.

pairwiseMetricInput object (PairwiseMetricInput)

Input for pairwise metric.

toolCallValidInput object (ToolCallValidInput)

Tool call metric instances. Input for tool call valid metric.

toolNameMatchInput object (ToolNameMatchInput)

Input for tool name match metric.

toolParameterKeyMatchInput object (ToolParameterKeyMatchInput)

Input for tool parameter key match metric.

toolParameterKvMatchInput object (ToolParameterKVMatchInput)

Input for tool parameter key value match metric.

cometInput object (CometInput)

Translation metrics. Input for Comet metric.

metricxInput object (MetricxInput)

Input for Metricx metric.

trajectoryExactMatchInput object (TrajectoryExactMatchInput)

Input for trajectory exact match metric.

trajectoryInOrderMatchInput object (TrajectoryInOrderMatchInput)

Input for trajectory in order match metric.

trajectoryAnyOrderMatchInput object (TrajectoryAnyOrderMatchInput)

Input for trajectory match any order metric.

trajectoryPrecisionInput object (TrajectoryPrecisionInput)

Input for trajectory precision metric.

trajectoryRecallInput object (TrajectoryRecallInput)

Input for trajectory recall metric.

trajectorySingleToolUseInput object (TrajectorySingleToolUseInput)

Input for trajectory single tool use metric.
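
Because metric_inputs is a union, a request body carries exactly one of the fields listed above. The following is a minimal client-side sketch of that constraint; make_request_body and KNOWN_METRIC_INPUTS are hypothetical names, and the set deliberately lists only a handful of the fields for brevity.

```python
from typing import Any, Dict

# A few of the metric_inputs field names from the list above (not the full set).
KNOWN_METRIC_INPUTS = {
    "exactMatchInput",
    "bleuInput",
    "rougeInput",
    "fluencyInput",
    "pointwiseMetricInput",
}


def make_request_body(**metric_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a request body holding exactly one metric_inputs member."""
    if len(metric_inputs) != 1:
        raise ValueError("metric_inputs is a union: set exactly one field")
    name, value = next(iter(metric_inputs.items()))
    if name not in KNOWN_METRIC_INPUTS:
        raise ValueError(f"unrecognized metric input: {name}")
    return {name: value}


body = make_request_body(
    exactMatchInput={
        "metricSpec": {},
        "instances": [{"prediction": "Paris", "reference": "Paris"}],
    }
)
print(body)
```

Checking the union locally is a design choice that surfaces mistakes before a request is sent; the service remains the authority on what it accepts.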

Example request

Python

import pandas as pd

import vertexai
from vertexai.preview.evaluation import EvalTask, MetricPromptTemplateExamples

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

eval_dataset = pd.DataFrame(
    {
        "instruction": [
            "Summarize the text in one sentence.",
            "Summarize the text such that a five-year-old can understand.",
        ],
        "context": [
            """As part of a comprehensive initiative to tackle urban congestion and foster
            sustainable urban living, a major city has revealed ambitious plans for an
            extensive overhaul of its public transportation system. The project aims not
            only to improve the efficiency and reliability of public transit but also to
            reduce the city's carbon footprint and promote eco-friendly commuting options.
            City officials anticipate that this strategic investment will enhance
            accessibility for residents and visitors alike, ushering in a new era of
            efficient, environmentally conscious urban transportation.""",
            """A team of archaeologists has unearthed ancient artifacts shedding light on a
            previously unknown civilization. The findings challenge existing historical
            narratives and provide valuable insights into human history.""",
        ],
        "response": [
            "A major city is revamping its public transportation system to fight congestion, reduce emissions, and make getting around greener and easier.",
            "Some people who dig for old things found some very special tools and objects that tell us about people who lived a long, long time ago! What they found is like a new puzzle piece that helps us understand how people used to live.",
        ],
    }
)

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        MetricPromptTemplateExamples.Pointwise.SUMMARIZATION_QUALITY,
        MetricPromptTemplateExamples.Pointwise.GROUNDEDNESS,
        MetricPromptTemplateExamples.Pointwise.VERBOSITY,
        MetricPromptTemplateExamples.Pointwise.INSTRUCTION_FOLLOWING,
    ],
)

prompt_template = (
    "Instruction: {instruction}. Article: {context}. Summary: {response}"
)
result = eval_task.evaluate(prompt_template=prompt_template)

print("Summary Metrics:\n")

for key, value in result.summary_metrics.items():
    print(f"{key}: \t{value}")

print("\n\nMetrics Table:\n")
print(result.metrics_table)
# Example response:
# Summary Metrics:
# row_count:      2
# summarization_quality/mean:     3.5
# summarization_quality/std:      2.1213203435596424
# ...

Response body

Response message for EvaluationService.EvaluateInstances.

If successful, the response body contains data with the following structure:

Fields

evaluation_results Union type

Evaluation results will be served in the same order as presented in EvaluationRequest.instances. evaluation_results can be only one of the following:

exactMatchResults object (ExactMatchResults)

Auto metric evaluation results. Results for exact match metric.

bleuResults object (BleuResults)

Results for bleu metric.

rougeResults object (RougeResults)

Results for rouge metric.

fluencyResult object (FluencyResult)

LLM-based metric evaluation result. General text generation metrics, applicable to other categories. Result for fluency metric.

coherenceResult object (CoherenceResult)

Result for coherence metric.

safetyResult object (SafetyResult)

Result for safety metric.

groundednessResult object (GroundednessResult)

Result for groundedness metric.

fulfillmentResult object (FulfillmentResult)

Result for fulfillment metric.

summarizationQualityResult object (SummarizationQualityResult)

Summarization only metrics. Result for summarization quality metric.

pairwiseSummarizationQualityResult object (PairwiseSummarizationQualityResult)

Result for pairwise summarization quality metric.

summarizationHelpfulnessResult object (SummarizationHelpfulnessResult)

Result for summarization helpfulness metric.

summarizationVerbosityResult object (SummarizationVerbosityResult)

Result for summarization verbosity metric.

questionAnsweringQualityResult object (QuestionAnsweringQualityResult)

Question answering only metrics. Result for question answering quality metric.

pairwiseQuestionAnsweringQualityResult object (PairwiseQuestionAnsweringQualityResult)

Result for pairwise question answering quality metric.

questionAnsweringRelevanceResult object (QuestionAnsweringRelevanceResult)

Result for question answering relevance metric.

questionAnsweringHelpfulnessResult object (QuestionAnsweringHelpfulnessResult)

Result for question answering helpfulness metric.

questionAnsweringCorrectnessResult object (QuestionAnsweringCorrectnessResult)

Result for question answering correctness metric.

pointwiseMetricResult object (PointwiseMetricResult)

Generic metrics. Result for pointwise metric.

pairwiseMetricResult object (PairwiseMetricResult)

Result for pairwise metric.

toolCallValidResults object (ToolCallValidResults)

Tool call metrics. Results for tool call valid metric.

toolNameMatchResults object (ToolNameMatchResults)

Results for tool name match metric.

toolParameterKeyMatchResults object (ToolParameterKeyMatchResults)

Results for tool parameter key match metric.

toolParameterKvMatchResults object (ToolParameterKVMatchResults)

Results for tool parameter key value match metric.

cometResult object (CometResult)

Translation metrics. Result for Comet metric.

metricxResult object (MetricxResult)

Result for Metricx metric.

trajectoryExactMatchResults object (TrajectoryExactMatchResults)

Results for trajectory exact match metric.

trajectoryInOrderMatchResults object (TrajectoryInOrderMatchResults)

Results for trajectory in order match metric.

trajectoryAnyOrderMatchResults object (TrajectoryAnyOrderMatchResults)

Results for trajectory any order match metric.

trajectoryPrecisionResults object (TrajectoryPrecisionResults)

Results for trajectory precision metric.

trajectoryRecallResults object (TrajectoryRecallResults)

Results for trajectory recall metric.

trajectorySingleToolUseResults object (TrajectorySingleToolUseResults)

Results for trajectory single tool use metric.

JSON representation

{

  // evaluation_results
  "exactMatchResults": {
    object (ExactMatchResults)
  },
  "bleuResults": {
    object (BleuResults)
  },
  "rougeResults": {
    object (RougeResults)
  },
  "fluencyResult": {
    object (FluencyResult)
  },
  "coherenceResult": {
    object (CoherenceResult)
  },
  "safetyResult": {
    object (SafetyResult)
  },
  "groundednessResult": {
    object (GroundednessResult)
  },
  "fulfillmentResult": {
    object (FulfillmentResult)
  },
  "summarizationQualityResult": {
    object (SummarizationQualityResult)
  },
  "pairwiseSummarizationQualityResult": {
    object (PairwiseSummarizationQualityResult)
  },
  "summarizationHelpfulnessResult": {
    object (SummarizationHelpfulnessResult)
  },
  "summarizationVerbosityResult": {
    object (SummarizationVerbosityResult)
  },
  "questionAnsweringQualityResult": {
    object (QuestionAnsweringQualityResult)
  },
  "pairwiseQuestionAnsweringQualityResult": {
    object (PairwiseQuestionAnsweringQualityResult)
  },
  "questionAnsweringRelevanceResult": {
    object (QuestionAnsweringRelevanceResult)
  },
  "questionAnsweringHelpfulnessResult": {
    object (QuestionAnsweringHelpfulnessResult)
  },
  "questionAnsweringCorrectnessResult": {
    object (QuestionAnsweringCorrectnessResult)
  },
  "pointwiseMetricResult": {
    object (PointwiseMetricResult)
  },
  "pairwiseMetricResult": {
    object (PairwiseMetricResult)
  },
  "toolCallValidResults": {
    object (ToolCallValidResults)
  },
  "toolNameMatchResults": {
    object (ToolNameMatchResults)
  },
  "toolParameterKeyMatchResults": {
    object (ToolParameterKeyMatchResults)
  },
  "toolParameterKvMatchResults": {
    object (ToolParameterKVMatchResults)
  },
  "cometResult": {
    object (CometResult)
  },
  "metricxResult": {
    object (MetricxResult)
  },
  "trajectoryExactMatchResults": {
    object (TrajectoryExactMatchResults)
  },
  "trajectoryInOrderMatchResults": {
    object (TrajectoryInOrderMatchResults)
  },
  "trajectoryAnyOrderMatchResults": {
    object (TrajectoryAnyOrderMatchResults)
  },
  "trajectoryPrecisionResults": {
    object (TrajectoryPrecisionResults)
  },
  "trajectoryRecallResults": {
    object (TrajectoryRecallResults)
  },
  "trajectorySingleToolUseResults": {
    object (TrajectorySingleToolUseResults)
  }
  // Union type
}
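
Since evaluation_results is a union, a client usually needs to discover which single field a response carries. The sketch below does that by field-name suffix. The helper unpack_evaluation_result is hypothetical, and the sample response uses an empty placeholder object because the inner structure of each result type is not described in this section.

```python
from typing import Any, Dict, Tuple


def unpack_evaluation_result(response: Dict[str, Any]) -> Tuple[str, Any]:
    """Return the (field name, value) of the single result set in a response."""
    # Every evaluation_results field listed above ends in "Result" or "Results".
    present = [key for key in response if key.endswith(("Result", "Results"))]
    if len(present) != 1:
        raise ValueError(f"expected exactly one result field, found {present}")
    return present[0], response[present[0]]


# Hypothetical response; the empty object stands in for a BleuResults payload.
sample_response = {"bleuResults": {}}
field_name, payload = unpack_evaluation_result(sample_response)
print(field_name)
```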

ExactMatchInput

Input for exact match metric.

Fields

metricSpec object (ExactMatchSpec)

Required. Spec for exact match metric.

instances[] object (ExactMatchInstance)

Required. Repeated exact match instances.

JSON representation
{ "metricSpec": { object (`ExactMatchSpec`) }, "instances": [ { object (`ExactMatchInstance`) } ] }

ExactMatchSpec

This type has no fields.

Spec for exact match metric - returns 1 if prediction and reference match exactly, otherwise 0.

ExactMatchInstance

Spec for exact match instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{ "prediction": string, "reference": string }
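
The scoring rule described for ExactMatchSpec can be illustrated locally. This is only a sketch of the stated rule, assuming strict string equality with no case or whitespace normalization; it is not the service's implementation, and exact_match_scores is a hypothetical helper.

```python
from typing import Dict, List


def exact_match_scores(instances: List[Dict[str, str]]) -> List[int]:
    """Score each instance 1 when prediction equals reference, else 0."""
    # Assumption: strict equality, with no normalization applied.
    return [int(item["prediction"] == item["reference"]) for item in instances]


instances = [
    {"prediction": "Paris", "reference": "Paris"},
    {"prediction": "paris", "reference": "Paris"},
]
scores = exact_match_scores(instances)
print(scores)  # [1, 0]
```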

BleuInput

Input for bleu metric.

Fields

metricSpec object (BleuSpec)

Required. Spec for bleu score metric.

instances[] object (BleuInstance)

Required. Repeated bleu instances.

JSON representation
{ "metricSpec": { object (`BleuSpec`) }, "instances": [ { object (`BleuInstance`) } ] }

BleuSpec

Spec for bleu score metric - calculates the precision of n-grams in the prediction as compared to reference - returns a score ranging between 0 and 1.

Fields

useEffectiveOrder boolean

Optional. Whether to use effective order when computing the bleu score.

JSON representation
{ "useEffectiveOrder": boolean }
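
To make "precision of n-grams in the prediction as compared to reference" concrete, here is a small sketch of clipped n-gram precision for a single n. It assumes whitespace tokenization and shows only this one component; a full BLEU score also combines several n-gram orders and applies a brevity penalty, so the numbers below are not what the service would return. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def clipped_ngram_precision(prediction: str, reference: str, n: int = 1) -> float:
    """Fraction of prediction n-grams that also occur in the reference."""
    pred_counts = Counter(ngrams(prediction.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(pred_counts.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's count by how often it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in pred_counts.items())
    return matched / total


unigram = clipped_ngram_precision("the the the", "the cat", n=1)
bigram = clipped_ngram_precision("the cat sat", "the cat sat on the mat", n=2)
print(unigram, bigram)
```

Clipping is what keeps a prediction that repeats one matching word from scoring perfectly: only one of the three "the" tokens in the first call counts.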

BleuInstance

Spec for bleu instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{ "prediction": string, "reference": string }

RougeInput

Input for rouge metric.

Fields

metricSpec object (RougeSpec)

Required. Spec for rouge score metric.

instances[] object (RougeInstance)

Required. Repeated rouge instances.

JSON representation
{ "metricSpec": { object (`RougeSpec`) }, "instances": [ { object (`RougeInstance`) } ] }

RougeSpec

Spec for rouge score metric - calculates the recall of n-grams in prediction as compared to reference - returns a score ranging between 0 and 1.

Fields

rougeType string

Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.

useStemmer boolean

Optional. Whether to use stemmer to compute rouge score.

splitSummaries boolean

Optional. Whether to split summaries while using rougeLsum.

JSON representation
{ "rougeType": string, "useStemmer": boolean, "splitSummaries": boolean }
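
The "recall of n-grams in prediction as compared to reference" can be sketched the same way. The example below computes n-gram recall for a single n under the assumption of whitespace tokenization and no stemming; it does not cover rougeL or rougeLsum and is not the service's implementation. rouge_n_recall is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_recall(prediction: str, reference: str, n: int = 1) -> float:
    """Fraction of reference n-grams that also occur in the prediction."""
    pred_counts = Counter(ngrams(prediction.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Count each reference n-gram at most as often as the prediction contains it.
    overlap = sum(min(count, pred_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


recall = rouge_n_recall("the cat", "the cat sat", n=1)
print(recall)
```

Note the denominator: recall divides by the number of reference n-grams, whereas the precision used for bleu divides by the number of prediction n-grams.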

RougeInstance

Spec for rouge instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{ "prediction": string, "reference": string }

FluencyInput

Input for fluency metric.

Fields

metricSpec object (FluencySpec)

Required. Spec for fluency score metric.

instance object (FluencyInstance)

Required. Fluency instance.

JSON representation
{ "metricSpec": { object (`FluencySpec`) }, "instance": { object (`FluencyInstance`) } }

FluencySpec

Spec for fluency score metric.

Fields

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "version": integer }

FluencyInstance

Spec for fluency instance.

Fields

prediction string

Required. Output of the evaluated model.

JSON representation
{ "prediction": string }
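
Unlike the exact match, bleu, and rouge inputs, which take a repeated instances[] field, FluencyInput takes a single instance. A minimal sketch of building that payload follows; fluency_input is a hypothetical helper, and leaving version out when it is not supplied is a choice made here, not documented behavior.

```python
from typing import Any, Dict, Optional


def fluency_input(prediction: str, version: Optional[int] = None) -> Dict[str, Any]:
    """Build a request body containing a fluencyInput."""
    spec: Dict[str, Any] = {}
    if version is not None:
        # version is optional, so it is only sent when explicitly chosen.
        spec["version"] = version
    return {
        "fluencyInput": {
            "metricSpec": spec,
            # Note the singular "instance": one prediction per request.
            "instance": {"prediction": prediction},
        }
    }


fluency_body = fluency_input("The weather is lovely today.")
print(fluency_body)
```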

CoherenceInput

Input for coherence metric.

Fields

metricSpec object (CoherenceSpec)

Required. Spec for coherence score metric.

instance object (CoherenceInstance)

Required. Coherence instance.

JSON representation
{ "metricSpec": { object (`CoherenceSpec`) }, "instance": { object (`CoherenceInstance`) } }

CoherenceSpec

Spec for coherence score metric.

Fields

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "version": integer }

CoherenceInstance

Spec for coherence instance.

Fields

prediction string

Required. Output of the evaluated model.

JSON representation
{ "prediction": string }

SafetyInput

Input for safety metric.

Fields

metricSpec object (SafetySpec)

Required. Spec for safety metric.

instance object (SafetyInstance)

Required. Safety instance.

JSON representation
{ "metricSpec": { object (`SafetySpec`) }, "instance": { object (`SafetyInstance`) } }

SafetySpec

Spec for safety metric.

Fields

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "version": integer }

SafetyInstance

Spec for safety instance.

Fields

prediction string

Required. Output of the evaluated model.

JSON representation
{ "prediction": string }

GroundednessInput

Input for groundedness metric.

Fields

metricSpec object (GroundednessSpec)

Required. Spec for groundedness metric.

instance object (GroundednessInstance)

Required. Groundedness instance.

JSON representation
{ "metricSpec": { object (`GroundednessSpec`) }, "instance": { object (`GroundednessInstance`) } }

GroundednessSpec

Spec for groundedness metric.

Fields

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "version": integer }

GroundednessInstance

Spec for groundedness instance.

Fields

prediction string

Required. Output of the evaluated model.

context string

Required. Background information provided in context used to compare against the prediction.

JSON representation
{ "prediction": string, "context": string }

FulfillmentInput

Input for fulfillment metric.

Fields

metricSpec object (FulfillmentSpec)

Required. Spec for fulfillment score metric.

instance object (FulfillmentInstance)

Required. Fulfillment instance.

JSON representation
{ "metricSpec": { object (`FulfillmentSpec`) }, "instance": { object (`FulfillmentInstance`) } }

FulfillmentSpec

Spec for fulfillment metric.

Fields

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "version": integer }

FulfillmentInstance

Spec for fulfillment instance.

Fields

prediction string

Required. Output of the evaluated model.

instruction string

Required. Inference instruction prompt to compare prediction with.

JSON representation
{ "prediction": string, "instruction": string }

SummarizationQualityInput

Input for summarization quality metric.

Fields

metricSpec object (SummarizationQualitySpec)

Required. Spec for summarization quality score metric.

instance object (SummarizationQualityInstance)

Required. Summarization quality instance.

JSON representation
{ "metricSpec": { object (`SummarizationQualitySpec`) }, "instance": { object (`SummarizationQualityInstance`) } }

SummarizationQualitySpec

Spec for summarization quality score metric.

Fields

useReference boolean

Optional. Whether to use instance.reference to compute summarization quality.

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "useReference": boolean, "version": integer }

SummarizationQualityInstance

Spec for summarization quality instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to be summarized.

instruction string

Required. Summarization prompt for LLM.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }
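
As an illustration of how the optional reference field relates to useReference, the sketch below builds a summarizationQualityInput. Setting useReference to true whenever a reference is supplied is an assumption made for this example, not documented behavior, and summarization_quality_input is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def summarization_quality_input(
    prediction: str,
    context: str,
    instruction: str,
    reference: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body containing a summarizationQualityInput."""
    # prediction, context, and instruction are the required instance fields.
    instance: Dict[str, Any] = {
        "prediction": prediction,
        "context": context,
        "instruction": instruction,
    }
    spec: Dict[str, Any] = {}
    if reference is not None:
        instance["reference"] = reference
        # Assumption: opt in to the reference whenever one is provided.
        spec["useReference"] = True
    return {"summarizationQualityInput": {"metricSpec": spec, "instance": instance}}


summary_body = summarization_quality_input(
    prediction="A city is overhauling its public transit.",
    context="A major city has revealed plans to overhaul its public transportation system.",
    instruction="Summarize the text in one sentence.",
)
print(summary_body)
```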

PairwiseSummarizationQualityInput

Input for pairwise summarization quality metric.

Fields

metricSpec object (PairwiseSummarizationQualitySpec)

Required. Spec for pairwise summarization quality score metric.

instance object (PairwiseSummarizationQualityInstance)

Required. Pairwise summarization quality instance.

JSON representation
{ "metricSpec": { object (`PairwiseSummarizationQualitySpec`) }, "instance": { object (`PairwiseSummarizationQualityInstance`) } }

PairwiseSummarizationQualitySpec

Spec for pairwise summarization quality score metric.

Fields

useReference boolean

Optional. Whether to use instance.reference to compute pairwise summarization quality.

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "useReference": boolean, "version": integer }

PairwiseSummarizationQualityInstance

Spec for pairwise summarization quality instance.

Fields

prediction string

Required. Output of the candidate model.

baselinePrediction string

Required. Output of the baseline model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to be summarized.

instruction string

Required. Summarization prompt for LLM.

JSON representation
{ "prediction": string, "baselinePrediction": string, "reference": string, "context": string, "instruction": string }
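
A pairwise instance carries two model outputs: prediction from the candidate model and baselinePrediction from the baseline model. One way to keep the two from being mixed up is a small typed container, sketched below. The dataclass and its to_json_dict method are hypothetical conveniences; only the field names come from this reference.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class PairwiseSummarizationQualityInstance:
    """Field names mirror the JSON representation above."""

    prediction: str  # output of the candidate model
    baselinePrediction: str  # output of the baseline model
    context: str
    instruction: str
    reference: Optional[str] = None  # optional ground truth

    def to_json_dict(self) -> Dict[str, Any]:
        # Drop unset optional fields so they are not sent as null.
        return {key: value for key, value in asdict(self).items() if value is not None}


pairwise_instance = PairwiseSummarizationQualityInstance(
    prediction="Candidate summary.",
    baselinePrediction="Baseline summary.",
    context="Text to be summarized.",
    instruction="Summarize the text in one sentence.",
)
pairwise_body = {
    "pairwiseSummarizationQualityInput": {
        "metricSpec": {},
        "instance": pairwise_instance.to_json_dict(),
    }
}
print(pairwise_body)
```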

SummarizationHelpfulnessInput

Input for summarization helpfulness metric.

Fields

metricSpec object (SummarizationHelpfulnessSpec)

Required. Spec for summarization helpfulness score metric.

instance object (SummarizationHelpfulnessInstance)

Required. Summarization helpfulness instance.

JSON representation
{ "metricSpec": { object (`SummarizationHelpfulnessSpec`) }, "instance": { object (`SummarizationHelpfulnessInstance`) } }

SummarizationHelpfulnessSpec

Spec for summarization helpfulness score metric.

Fields

useReference boolean

Optional. Whether to use instance.reference to compute summarization helpfulness.

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "useReference": boolean, "version": integer }

SummarizationHelpfulnessInstance

Spec for summarization helpfulness instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to be summarized.

instruction string

Optional. Summarization prompt for LLM.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

SummarizationVerbosityInput

Input for summarization verbosity metric.

Fields

metricSpec object (SummarizationVerbositySpec)

Required. Spec for summarization verbosity score metric.

instance object (SummarizationVerbosityInstance)

Required. Summarization verbosity instance.

JSON representation
{ "metricSpec": { object (`SummarizationVerbositySpec`) }, "instance": { object (`SummarizationVerbosityInstance`) } }

SummarizationVerbositySpec

Spec for summarization verbosity score metric.

Fields

useReference boolean

Optional. Whether to use instance.reference to compute summarization verbosity.

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "useReference": boolean, "version": integer }

SummarizationVerbosityInstance

Spec for summarization verbosity instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to be summarized.

instruction string

Optional. Summarization prompt for LLM.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

QuestionAnsweringQualityInput

Input for question answering quality metric.

Fields

metricSpec object (QuestionAnsweringQualitySpec)

Required. Spec for question answering quality score metric.

instance object (QuestionAnsweringQualityInstance)

Required. Question answering quality instance.

JSON representation
{ "metricSpec": { object (`QuestionAnsweringQualitySpec`) }, "instance": { object (`QuestionAnsweringQualityInstance`) } }

QuestionAnsweringQualitySpec

Spec for question answering quality score metric.

Fields

useReference boolean

Optional. Whether to use instance.reference to compute question answering quality.

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "useReference": boolean, "version": integer }

QuestionAnsweringQualityInstance

Spec for question answering quality instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to answer the question.

instruction string

Required. Question Answering prompt for LLM.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

PairwiseQuestionAnsweringQualityInput

Input for pairwise question answering quality metric.

Fields

metricSpec object (PairwiseQuestionAnsweringQualitySpec)

Required. Spec for pairwise question answering quality score metric.

instance object (PairwiseQuestionAnsweringQualityInstance)

Required. Pairwise question answering quality instance.

JSON representation
{ "metricSpec": { object (`PairwiseQuestionAnsweringQualitySpec`) }, "instance": { object (`PairwiseQuestionAnsweringQualityInstance`) } }

PairwiseQuestionAnsweringQualitySpec

Spec for pairwise question answering quality score metric.

Fields

useReference boolean

Optional. Whether to use instance.reference to compute pairwise question answering quality.

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "useReference": boolean, "version": integer }

PairwiseQuestionAnsweringQualityInstance

Spec for pairwise question answering quality instance.

Fields

prediction string

Required. Output of the candidate model.

baselinePrediction string

Required. Output of the baseline model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Required. Text to answer the question.

instruction string

Required. Question Answering prompt for LLM.

JSON representation
{ "prediction": string, "baselinePrediction": string, "reference": string, "context": string, "instruction": string }

QuestionAnsweringRelevanceInput

Input for question answering relevance metric.

Fields

metricSpec object (QuestionAnsweringRelevanceSpec)

Required. Spec for question answering relevance score metric.

instance object (QuestionAnsweringRelevanceInstance)

Required. Question answering relevance instance.

JSON representation
{ "metricSpec": { object (`QuestionAnsweringRelevanceSpec`) }, "instance": { object (`QuestionAnsweringRelevanceInstance`) } }

QuestionAnsweringRelevanceSpec

Spec for question answering relevance metric.

Fields

useReference boolean

Optional. Whether to use instance.reference to compute question answering relevance.

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "useReference": boolean, "version": integer }

QuestionAnsweringRelevanceInstance

Spec for question answering relevance instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Optional. Text provided as context to answer the question.

instruction string

Required. The question asked and other instruction in the inference prompt.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

QuestionAnsweringHelpfulnessInput

Input for question answering helpfulness metric.

Fields

metricSpec object (QuestionAnsweringHelpfulnessSpec)

Required. Spec for question answering helpfulness score metric.

instance object (QuestionAnsweringHelpfulnessInstance)

Required. Question answering helpfulness instance.

JSON representation
{ "metricSpec": { object (`QuestionAnsweringHelpfulnessSpec`) }, "instance": { object (`QuestionAnsweringHelpfulnessInstance`) } }

QuestionAnsweringHelpfulnessSpec

Spec for question answering helpfulness metric.

Fields

useReference boolean

Optional. Whether to use instance.reference to compute question answering helpfulness.

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "useReference": boolean, "version": integer }

QuestionAnsweringHelpfulnessInstance

Spec for question answering helpfulness instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Optional. Text provided as context to answer the question.

instruction string

Required. The question asked and other instruction in the inference prompt.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }

QuestionAnsweringCorrectnessInput

Input for question answering correctness metric.

Fields

metricSpec object (QuestionAnsweringCorrectnessSpec)

Required. Spec for question answering correctness score metric.

instance object (QuestionAnsweringCorrectnessInstance)

Required. Question answering correctness instance.

JSON representation
{ "metricSpec": { object (`QuestionAnsweringCorrectnessSpec`) }, "instance": { object (`QuestionAnsweringCorrectnessInstance`) } }

QuestionAnsweringCorrectnessSpec

Spec for question answering correctness metric.

Fields

useReference boolean

Optional. Whether to use instance.reference to compute question answering correctness.

version integer

Optional. Which version to use for evaluation.

JSON representation
{ "useReference": boolean, "version": integer }

QuestionAnsweringCorrectnessInstance

Spec for question answering correctness instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

context string

Optional. Text provided as context to answer the question.

instruction string

Required. The question asked, plus any other instructions in the inference prompt.

JSON representation
{ "prediction": string, "reference": string, "context": string, "instruction": string }
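As a minimal sketch, a QuestionAnsweringCorrectnessInput can be assembled as a plain dictionary before it is placed in the request body. The helper name `build_qa_correctness_input` and the sample strings are hypothetical; only the field names come from the reference above.

```python
import json


def build_qa_correctness_input(prediction, instruction, reference=None,
                               context=None, use_reference=False):
    """Assemble a QuestionAnsweringCorrectnessInput dict (hypothetical helper)."""
    if not prediction or not instruction:
        # prediction and instruction are documented as Required.
        raise ValueError("prediction and instruction are required")
    instance = {"prediction": prediction, "instruction": instruction}
    # Optional fields are included only when a value is supplied.
    if reference is not None:
        instance["reference"] = reference
    if context is not None:
        instance["context"] = context
    return {"metricSpec": {"useReference": use_reference}, "instance": instance}


qa_input = build_qa_correctness_input(
    prediction="Paris",
    instruction="What is the capital of France?",
    reference="Paris",
    use_reference=True,
)
print(json.dumps(qa_input, indent=2))
```

QuestionAnsweringHelpfulnessInput uses the same field layout, so the same shape applies there.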

PointwiseMetricInput

Input for pointwise metric.

Fields

metricSpec object (PointwiseMetricSpec)

Required. Spec for pointwise metric.

instance object (PointwiseMetricInstance)

Required. Pointwise metric instance.

JSON representation
{ "metricSpec": { object (`PointwiseMetricSpec`) }, "instance": { object (`PointwiseMetricInstance`) } }

PointwiseMetricSpec

Spec for pointwise metric.

Fields

metricPromptTemplate string

Required. Metric prompt template for pointwise metric.

JSON representation
{ "metricPromptTemplate": string }

PointwiseMetricInstance

Pointwise metric instance. Usually one instance corresponds to one row in an evaluation dataset.

Fields

instance Union type

Instance for pointwise metric. instance can be only one of the following:

jsonInstance string

Instance specified as a JSON string. The jsonInstance is expected to contain string key-value pairs, which are used to render PointwiseMetricSpec.metricPromptTemplate.

JSON representation
{
  // instance
  "jsonInstance": string
  // Union type
}
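Because jsonInstance is a string rather than a nested object, a dataset row has to be serialized before it is sent. The sketch below shows one way to do that. The template text, the `{placeholder}` syntax, and the row keys are assumptions made for illustration, not values defined by this reference.

```python
import json

# Hypothetical template; the {placeholder} syntax is an assumption.
metric_prompt_template = (
    "Rate the fluency of the response from 1 to 5.\n"
    "Prompt: {prompt}\n"
    "Response: {response}"
)

# One evaluation dataset row, expressed as string key-value pairs.
row = {"prompt": "Summarize the article.", "response": "The article covers X."}

pointwise_input = {
    "metricSpec": {"metricPromptTemplate": metric_prompt_template},
    # jsonInstance is a string, so the row is serialized instead of nested.
    "instance": {"jsonInstance": json.dumps(row)},
}

print(pointwise_input["instance"]["jsonInstance"])
```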

PairwiseMetricInput

Input for pairwise metric.

Fields

metricSpec object (PairwiseMetricSpec)

Required. Spec for pairwise metric.

instance object (PairwiseMetricInstance)

Required. Pairwise metric instance.

JSON representation
{ "metricSpec": { object (`PairwiseMetricSpec`) }, "instance": { object (`PairwiseMetricInstance`) } }

PairwiseMetricSpec

Spec for pairwise metric.

Fields

metricPromptTemplate string

Required. Metric prompt template for pairwise metric.

JSON representation
{ "metricPromptTemplate": string }

PairwiseMetricInstance

Pairwise metric instance. Usually one instance corresponds to one row in an evaluation dataset.

Fields

instance Union type

Instance for pairwise metric. instance can be only one of the following:

jsonInstance string

Instance specified as a JSON string. The jsonInstance is expected to contain string key-value pairs, which are used to render PairwiseMetricSpec.metricPromptTemplate.

JSON representation
{
  // instance
  "jsonInstance": string
  // Union type
}
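A row that omits a key the template needs is an easy mistake to make, so it can help to check a pairwise instance locally before sending it. The following is a sketch only: it assumes templates use Python-style `{name}` placeholders, and the template text, key names, and `missing_keys` helper are hypothetical.

```python
import json
from string import Formatter


def missing_keys(template, row):
    """Return placeholder names in `template` that `row` does not provide."""
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names - set(row))


# Hypothetical pairwise template and dataset row.
template = (
    "Compare the two responses to the prompt.\n"
    "Prompt: {prompt}\n"
    "Baseline: {baseline_response}\n"
    "Candidate: {response}"
)
row = {
    "prompt": "Explain recursion.",
    "baseline_response": "Recursion is when a function calls itself.",
    "response": "A recursive function solves a problem by calling itself on smaller inputs.",
}

if not missing_keys(template, row):
    pairwise_input = {
        "metricSpec": {"metricPromptTemplate": template},
        "instance": {"jsonInstance": json.dumps(row)},
    }
    print(sorted(json.loads(pairwise_input["instance"]["jsonInstance"])))
```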

ToolCallValidInput

Input for tool call valid metric.

Fields

metricSpec object (ToolCallValidSpec)

Required. Spec for tool call valid metric.

instances[] object (ToolCallValidInstance)

Required. Repeated tool call valid instances.

JSON representation
{ "metricSpec": { object (`ToolCallValidSpec`) }, "instances": [ { object (`ToolCallValidInstance`) } ] }

ToolCallValidSpec

This type has no fields.

Spec for tool call valid metric.

ToolCallValidInstance

Spec for tool call valid instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{ "prediction": string, "reference": string }

ToolNameMatchInput

Input for tool name match metric.

Fields

metricSpec object (ToolNameMatchSpec)

Required. Spec for tool name match metric.

instances[] object (ToolNameMatchInstance)

Required. Repeated tool name match instances.

JSON representation
{ "metricSpec": { object (`ToolNameMatchSpec`) }, "instances": [ { object (`ToolNameMatchInstance`) } ] }

ToolNameMatchSpec

This type has no fields.

Spec for tool name match metric.

ToolNameMatchInstance

Spec for tool name match instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{ "prediction": string, "reference": string }

ToolParameterKeyMatchInput

Input for tool parameter key match metric.

Fields

metricSpec object (ToolParameterKeyMatchSpec)

Required. Spec for tool parameter key match metric.

instances[] object (ToolParameterKeyMatchInstance)

Required. Repeated tool parameter key match instances.

JSON representation
{ "metricSpec": { object (`ToolParameterKeyMatchSpec`) }, "instances": [ { object (`ToolParameterKeyMatchInstance`) } ] }

ToolParameterKeyMatchSpec

This type has no fields.

Spec for tool parameter key match metric.

ToolParameterKeyMatchInstance

Spec for tool parameter key match instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{ "prediction": string, "reference": string }

ToolParameterKVMatchInput

Input for tool parameter key value match metric.

Fields

metricSpec object (ToolParameterKVMatchSpec)

Required. Spec for tool parameter key value match metric.

instances[] object (ToolParameterKVMatchInstance)

Required. Repeated tool parameter key value match instances.

JSON representation
{ "metricSpec": { object (`ToolParameterKVMatchSpec`) }, "instances": [ { object (`ToolParameterKVMatchInstance`) } ] }

ToolParameterKVMatchSpec

Spec for tool parameter key value match metric.

Fields

useStrictStringMatch boolean

Optional. Whether to use STRICT string match on parameter values.

JSON representation
{ "useStrictStringMatch": boolean }

ToolParameterKVMatchInstance

Spec for tool parameter key value match instance.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Required. Ground truth used to compare against the prediction.

JSON representation
{ "prediction": string, "reference": string }
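The tool call metrics all take repeated instances whose prediction and reference are strings. The sketch below builds a ToolParameterKVMatchInput with strict string matching enabled. The layout of the serialized strings (a JSON object with `content` and `tool_calls` keys) is an assumption that this reference does not specify, and `tool_call_string` is a hypothetical helper.

```python
import json


def tool_call_string(name, arguments):
    """Serialize one tool call to a string (payload layout is an assumption)."""
    payload = {"content": "", "tool_calls": [{"name": name, "arguments": arguments}]}
    return json.dumps(payload)


kv_match_input = {
    # Optional flag documented on ToolParameterKVMatchSpec.
    "metricSpec": {"useStrictStringMatch": True},
    "instances": [
        {
            "prediction": tool_call_string("book_tickets", {"movie": "Dune", "seats": "2"}),
            "reference": tool_call_string("book_tickets", {"movie": "Dune", "seats": "2"}),
        }
    ],
}

print(json.dumps(kv_match_input, indent=2))
```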

CometInput

Input for Comet metric.

Fields

metricSpec object (CometSpec)

Required. Spec for Comet metric.

instance object (CometInstance)

Required. Comet instance.

JSON representation
{ "metricSpec": { object (`CometSpec`) }, "instance": { object (`CometInstance`) } }

CometSpec

Spec for Comet metric.

Fields

sourceLanguage string

Optional. Source language in BCP-47 format.

targetLanguage string

Optional. Target language in BCP-47 format. Covers both prediction and reference.

version enum (CometVersion)

Required. Which version to use for evaluation.

JSON representation
{ "sourceLanguage": string, "targetLanguage": string, "version": enum (`CometVersion`) }

CometVersion

Comet version options.

Enums
`COMET_VERSION_UNSPECIFIED`	Comet version unspecified.
`COMET_22_SRC_REF`	Comet 22 for translation + source + reference (source-reference-combined).

CometInstance

Spec for Comet instance. The fields used for evaluation depend on the Comet version.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

source string

Optional. Source text in original language.

JSON representation
{ "prediction": string, "reference": string, "source": string }

MetricxInput

Input for MetricX metric.

Fields

metricSpec object (MetricxSpec)

Required. Spec for MetricX metric.

instance object (MetricxInstance)

Required. MetricX instance.

JSON representation
{ "metricSpec": { object (`MetricxSpec`) }, "instance": { object (`MetricxInstance`) } }

MetricxSpec

Spec for MetricX metric.

Fields

sourceLanguage string

Optional. Source language in BCP-47 format.

targetLanguage string

Optional. Target language in BCP-47 format. Covers both prediction and reference.

version enum (MetricxVersion)

Required. Which version to use for evaluation.

JSON representation
{ "sourceLanguage": string, "targetLanguage": string, "version": enum (`MetricxVersion`) }

MetricxVersion

MetricX version options.

Enums
`METRICX_VERSION_UNSPECIFIED`	MetricX version unspecified.
`METRICX_24_REF`	MetricX 2024 (2.6) for translation + reference (reference-based).
`METRICX_24_SRC`	MetricX 2024 (2.6) for translation + source (QE).
`METRICX_24_SRC_REF`	MetricX 2024 (2.6) for translation + source + reference (source-reference-combined).

MetricxInstance

Spec for MetricX instance. The fields used for evaluation depend on the MetricX version.

Fields

prediction string

Required. Output of the evaluated model.

reference string

Optional. Ground truth used to compare against the prediction.

source string

Optional. Source text in original language.

JSON representation
{ "prediction": string, "reference": string, "source": string }
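Since the instance fields that matter depend on the chosen version, a client can check them before sending a request. The mapping below is inferred from the MetricxVersion enum descriptions. It is a client-side sketch and does not describe how the service itself validates input. `build_metricx_input` is a hypothetical helper.

```python
# Fields each version appears to rely on, inferred from the enum descriptions.
FIELDS_BY_VERSION = {
    "METRICX_24_REF": {"reference"},
    "METRICX_24_SRC": {"source"},
    "METRICX_24_SRC_REF": {"source", "reference"},
}


def build_metricx_input(version, prediction, reference=None, source=None):
    """Assemble a MetricxInput dict and check the fields the version uses."""
    if version not in FIELDS_BY_VERSION:
        raise ValueError(f"unsupported version: {version}")
    instance = {"prediction": prediction}
    if reference is not None:
        instance["reference"] = reference
    if source is not None:
        instance["source"] = source
    missing = FIELDS_BY_VERSION[version] - set(instance)
    if missing:
        raise ValueError(f"{version} expects: {sorted(missing)}")
    return {"metricSpec": {"version": version}, "instance": instance}


metricx_input = build_metricx_input(
    "METRICX_24_SRC_REF",
    prediction="Hola mundo",
    reference="Hola, mundo",
    source="Hello, world",
)
# Language codes are optional and use BCP-47 format.
metricx_input["metricSpec"].update(sourceLanguage="en", targetLanguage="es")
print(metricx_input)
```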

TrajectoryExactMatchInput

Instances and metric spec for TrajectoryExactMatch metric.

Fields

metricSpec object (TrajectoryExactMatchSpec)

Required. Spec for TrajectoryExactMatch metric.

instances[] object (TrajectoryExactMatchInstance)

Required. Repeated TrajectoryExactMatch instances.

JSON representation
{ "metricSpec": { object (`TrajectoryExactMatchSpec`) }, "instances": [ { object (`TrajectoryExactMatchInstance`) } ] }

TrajectoryExactMatchSpec

This type has no fields.

Spec for TrajectoryExactMatch metric - returns 1 if tool calls in the reference trajectory exactly match the predicted trajectory, else 0.

TrajectoryExactMatchInstance

Spec for TrajectoryExactMatch instance.

Fields

predictedTrajectory object (Trajectory)

Required. Spec for predicted tool call trajectory.

referenceTrajectory object (Trajectory)

Required. Spec for reference tool call trajectory.

JSON representation
{ "predictedTrajectory": { object (`Trajectory`) }, "referenceTrajectory": { object (`Trajectory`) } }
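To make the scoring rule concrete, here is a local approximation of the exact match check. It compares tool names and raw toolInput strings position by position. How the service actually compares tool inputs is not stated in this reference, so treat the comparison as an assumption. The function names are hypothetical.

```python
def calls(trajectory):
    """Reduce a Trajectory dict to a list of (toolName, toolInput) pairs."""
    return [(c["toolName"], c.get("toolInput")) for c in trajectory["toolCalls"]]


def trajectory_exact_match(predicted, reference):
    """Local approximation: 1.0 only when both call sequences are identical."""
    return 1.0 if calls(predicted) == calls(reference) else 0.0


reference = {
    "toolCalls": [
        {"toolName": "search"},
        {"toolName": "book", "toolInput": '{"id": 7}'},
    ]
}
same = {"toolCalls": list(reference["toolCalls"])}
reordered = {"toolCalls": list(reversed(reference["toolCalls"]))}

print(trajectory_exact_match(same, reference))       # 1.0
print(trajectory_exact_match(reordered, reference))  # 0.0
```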

Trajectory

Spec for trajectory.

Fields

toolCalls[] object (ToolCall)

Required. Tool calls in the trajectory.

JSON representation
{ "toolCalls": [ { object (`ToolCall`) } ] }

ToolCall

Spec for tool call.

Fields

toolName string

Required. Name of the tool that was called.

toolInput string

Optional. Input passed to the tool, as a string.

JSON representation
{ "toolName": string, "toolInput": string }
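The Trajectory and ToolCall types nest as shown in this short sketch, which builds a trajectory and wraps it in a TrajectoryExactMatchInput. Encoding toolInput as a JSON string is an assumption (the reference only says it is a string), and the tool names and the `tool_call` helper are hypothetical.

```python
import json


def tool_call(name, args=None):
    """Build a ToolCall dict; toolInput is optional and must be a string."""
    call = {"toolName": name}
    if args is not None:
        # Assumption: arguments are carried as a JSON-encoded string.
        call["toolInput"] = json.dumps(args, sort_keys=True)
    return call


trajectory = {
    "toolCalls": [
        tool_call("search_flights", {"to": "SFO", "from": "JFK"}),
        tool_call("book_flight"),
    ]
}

exact_match_input = {
    "metricSpec": {},  # TrajectoryExactMatchSpec has no fields.
    "instances": [
        {"predictedTrajectory": trajectory, "referenceTrajectory": trajectory}
    ],
}

print(json.dumps(exact_match_input, indent=2))
```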

TrajectoryInOrderMatchInput

Instances and metric spec for TrajectoryInOrderMatch metric.

Fields

metricSpec object (TrajectoryInOrderMatchSpec)

Required. Spec for TrajectoryInOrderMatch metric.

instances[] object (TrajectoryInOrderMatchInstance)

Required. Repeated TrajectoryInOrderMatch instances.

JSON representation
{ "metricSpec": { object (`TrajectoryInOrderMatchSpec`) }, "instances": [ { object (`TrajectoryInOrderMatchInstance`) } ] }

TrajectoryInOrderMatchSpec

This type has no fields.

Spec for TrajectoryInOrderMatch metric - returns 1 if tool calls in the reference trajectory appear in the predicted trajectory in the same order, else 0.

TrajectoryInOrderMatchInstance

Spec for TrajectoryInOrderMatch instance.

Fields

predictedTrajectory object (Trajectory)

Required. Spec for predicted tool call trajectory.

referenceTrajectory object (Trajectory)

Required. Spec for reference tool call trajectory.

JSON representation
{ "predictedTrajectory": { object (`Trajectory`) }, "referenceTrajectory": { object (`Trajectory`) } }
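The in-order rule amounts to asking whether the reference calls form a subsequence of the predicted calls, with extra predicted calls allowed in between. A local sketch of that reading follows. It is an interpretation of the description above, not the service implementation, and the helper names are hypothetical.

```python
def calls(trajectory):
    """Reduce a Trajectory dict to a list of (toolName, toolInput) pairs."""
    return [(c["toolName"], c.get("toolInput")) for c in trajectory["toolCalls"]]


def trajectory_in_order_match(predicted, reference):
    """Local approximation: 1.0 if reference calls are a subsequence of predicted."""
    remaining = iter(calls(predicted))
    # `call in remaining` advances the iterator, so matches must occur in order.
    return 1.0 if all(call in remaining for call in calls(reference)) else 0.0


def traj(*names):
    """Build a Trajectory dict from tool names (inputs omitted for brevity)."""
    return {"toolCalls": [{"toolName": n} for n in names]}


print(trajectory_in_order_match(traj("a", "x", "b"), traj("a", "b")))  # 1.0
print(trajectory_in_order_match(traj("b", "a"), traj("a", "b")))       # 0.0
```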

TrajectoryAnyOrderMatchInput

Instances and metric spec for TrajectoryAnyOrderMatch metric.

Fields

metricSpec object (TrajectoryAnyOrderMatchSpec)

Required. Spec for TrajectoryAnyOrderMatch metric.

instances[] object (TrajectoryAnyOrderMatchInstance)

Required. Repeated TrajectoryAnyOrderMatch instances.

JSON representation
{ "metricSpec": { object (`TrajectoryAnyOrderMatchSpec`) }, "instances": [ { object (`TrajectoryAnyOrderMatchInstance`) } ] }

TrajectoryAnyOrderMatchSpec

This type has no fields.

Spec for TrajectoryAnyOrderMatch metric - returns 1 if all tool calls in the reference trajectory appear in the predicted trajectory in any order, else 0.

TrajectoryAnyOrderMatchInstance

Spec for TrajectoryAnyOrderMatch instance.

Fields

predictedTrajectory object (Trajectory)

Required. Spec for predicted tool call trajectory.

referenceTrajectory object (Trajectory)

Required. Spec for reference tool call trajectory.

JSON representation
{ "predictedTrajectory": { object (`Trajectory`) }, "referenceTrajectory": { object (`Trajectory`) } }
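Dropping the ordering requirement turns the check into a containment test. The sketch below uses multisets, so a call repeated in the reference must be repeated in the prediction as well. That handling of duplicates is an assumption, since the reference does not say how repeats are counted. The helper names are hypothetical.

```python
from collections import Counter


def calls(trajectory):
    """Reduce a Trajectory dict to a list of (toolName, toolInput) pairs."""
    return [(c["toolName"], c.get("toolInput")) for c in trajectory["toolCalls"]]


def trajectory_any_order_match(predicted, reference):
    """Local approximation: 1.0 if every reference call appears in predicted."""
    # Counter subtraction keeps only reference calls that predicted lacks.
    unmatched = Counter(calls(reference)) - Counter(calls(predicted))
    return 1.0 if not unmatched else 0.0


def traj(*names):
    """Build a Trajectory dict from tool names (inputs omitted for brevity)."""
    return {"toolCalls": [{"toolName": n} for n in names]}


print(trajectory_any_order_match(traj("b", "a", "c"), traj("a", "b")))  # 1.0
print(trajectory_any_order_match(traj("a"), traj("a", "b")))            # 0.0
```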

TrajectoryPrecisionInput

Instances and metric spec for TrajectoryPrecision metric.

Fields

metricSpec object (TrajectoryPrecisionSpec)

Required. Spec for TrajectoryPrecision metric.

instances[] object (TrajectoryPrecisionInstance)

Required. Repeated TrajectoryPrecision instances.

JSON representation
{ "metricSpec": { object (`TrajectoryPrecisionSpec`) }, "instances": [ { object (`TrajectoryPrecisionInstance`) } ] }

TrajectoryPrecisionSpec

This type has no fields.

Spec for TrajectoryPrecision metric - returns a float score based on average precision of individual tool calls.

TrajectoryPrecisionInstance

Spec for TrajectoryPrecision instance.

Fields

predictedTrajectory object (Trajectory)

Required. Spec for predicted tool call trajectory.

referenceTrajectory object (Trajectory)

Required. Spec for reference tool call trajectory.

JSON representation
{ "predictedTrajectory": { object (`Trajectory`) }, "referenceTrajectory": { object (`Trajectory`) } }
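One plausible reading of the precision score is the fraction of predicted tool calls that also occur in the reference trajectory. The sketch below implements that reading. The exact formula used by the service, and the score returned for an empty prediction, are assumptions. The helper names are hypothetical.

```python
def calls(trajectory):
    """Reduce a Trajectory dict to a list of (toolName, toolInput) pairs."""
    return [(c["toolName"], c.get("toolInput")) for c in trajectory["toolCalls"]]


def trajectory_precision(predicted, reference):
    """Local approximation: share of predicted calls found in the reference."""
    predicted_calls = calls(predicted)
    if not predicted_calls:
        return 0.0  # Assumption: an empty prediction scores 0.
    reference_calls = set(calls(reference))
    hits = sum(1 for call in predicted_calls if call in reference_calls)
    return hits / len(predicted_calls)


def traj(*names):
    """Build a Trajectory dict from tool names (inputs omitted for brevity)."""
    return {"toolCalls": [{"toolName": n} for n in names]}


# Two of the four predicted calls are in the reference.
print(trajectory_precision(traj("a", "b", "x", "y"), traj("a", "b")))  # 0.5
```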

TrajectoryRecallInput

Instances and metric spec for TrajectoryRecall metric.

Fields

metricSpec object (TrajectoryRecallSpec)

Required. Spec for TrajectoryRecall metric.

instances[] object (TrajectoryRecallInstance)

Required. Repeated TrajectoryRecall instances.

JSON representation
{ "metricSpec": { object (`TrajectoryRecallSpec`) }, "instances": [ { object (`TrajectoryRecallInstance`) } ] }

TrajectoryRecallSpec

This type has no fields.

Spec for TrajectoryRecall metric - returns a float score based on average recall of individual tool calls.

TrajectoryRecallInstance

Spec for TrajectoryRecall instance.

Fields

predictedTrajectory object (Trajectory)

Required. Spec for predicted tool call trajectory.

referenceTrajectory object (Trajectory)

Required. Spec for reference tool call trajectory.

JSON representation
{ "predictedTrajectory": { object (`Trajectory`) }, "referenceTrajectory": { object (`Trajectory`) } }
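Recall can be read as the mirror image of precision: the fraction of reference tool calls that the predicted trajectory covers. As with precision, the formula below is an interpretation of the description, and the score for an empty reference is an assumption. The helper names are hypothetical.

```python
def calls(trajectory):
    """Reduce a Trajectory dict to a list of (toolName, toolInput) pairs."""
    return [(c["toolName"], c.get("toolInput")) for c in trajectory["toolCalls"]]


def trajectory_recall(predicted, reference):
    """Local approximation: share of reference calls found in the prediction."""
    reference_calls = calls(reference)
    if not reference_calls:
        return 0.0  # Assumption: an empty reference scores 0.
    predicted_calls = set(calls(predicted))
    hits = sum(1 for call in reference_calls if call in predicted_calls)
    return hits / len(reference_calls)


def traj(*names):
    """Build a Trajectory dict from tool names (inputs omitted for brevity)."""
    return {"toolCalls": [{"toolName": n} for n in names]}


# The prediction covers two of the four reference calls.
print(trajectory_recall(traj("a", "b"), traj("a", "b", "c", "d")))  # 0.5
```

A prediction with many extra calls can still reach a recall of 1.0, which is why this score is usually read together with precision.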

TrajectorySingleToolUseInput

Instances and metric spec for TrajectorySingleToolUse metric.

Fields

metricSpec object (TrajectorySingleToolUseSpec)

Required. Spec for TrajectorySingleToolUse metric.

instances[] object (TrajectorySingleToolUseInstance)

Required. Repeated TrajectorySingleToolUse instances.

JSON representation
{ "metricSpec": { object (`TrajectorySingleToolUseSpec`) }, "instances": [ { object (`TrajectorySingleToolUseInstance`) } ] }

TrajectorySingleToolUseSpec

Spec for TrajectorySingleToolUse metric - returns 1 if the tool named in toolName is present in the predicted trajectory, else 0.

Fields

toolName string

Required. Name of the tool to look for in the predicted trajectory.

JSON representation
{ "toolName": string }

TrajectorySingleToolUseInstance

Spec for TrajectorySingleToolUse instance.

Fields

predictedTrajectory object (Trajectory)

Required. Spec for predicted tool call trajectory.

JSON representation
{ "predictedTrajectory": { object (`Trajectory`) } }
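Unlike the other trajectory metrics, this one carries its target in the spec and needs no reference trajectory. The sketch below builds such an input and applies a local version of the check. The tool names are hypothetical, and the local function is an approximation that matches on tool name only.

```python
def single_tool_use(predicted, tool_name):
    """Local approximation: 1.0 if `tool_name` occurs in the predicted calls."""
    found = any(c["toolName"] == tool_name for c in predicted["toolCalls"])
    return 1.0 if found else 0.0


single_tool_use_input = {
    "metricSpec": {"toolName": "book_flight"},
    "instances": [
        {"predictedTrajectory": {"toolCalls": [{"toolName": "search_flights"},
                                               {"toolName": "book_flight"}]}},
        {"predictedTrajectory": {"toolCalls": [{"toolName": "search_flights"}]}},
    ],
}

target = single_tool_use_input["metricSpec"]["toolName"]
scores = [
    single_tool_use(instance["predictedTrajectory"], target)
    for instance in single_tool_use_input["instances"]
]
print(scores)  # [1.0, 0.0]
```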

ExactMatchResults

Results for exact match metric.

Fields

exactMatchMetricValues[] object (ExactMatchMetricValue)

Output only. Exact match metric values.

JSON representation
{ "exactMatchMetricValues": [ { object (`ExactMatchMetricValue`) } ] }

ExactMatchMetricValue

Exact match metric value for an instance.

Fields

score number

Output only. Exact match score.

JSON representation
{ "score": number }

BleuResults

Results for bleu metric.

Fields

bleuMetricValues[] object (BleuMetricValue)

Output only. Bleu metric values.

JSON representation
{ "bleuMetricValues": [ { object (`BleuMetricValue`) } ] }

BleuMetricValue

Bleu metric value for an instance.

Fields

score number

Output only. Bleu score.

JSON representation
{ "score": number }

RougeResults

Results for rouge metric.

Fields

rougeMetricValues[] object (RougeMetricValue)

Output only. Rouge metric values.

JSON representation
{ "rougeMetricValues": [ { object (`RougeMetricValue`) } ] }

RougeMetricValue

Rouge metric value for an instance.

Fields

score number

Output only. Rouge score.

JSON representation
{ "score": number }

FluencyResult

Spec for fluency result.

Fields

explanation string

Output only. Explanation for fluency score.

score number

Output only. Fluency score.

confidence number

Output only. Confidence for fluency score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }
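The LLM-based result types share the explanation, score, and confidence layout, so one small parser can cover them. The sketch below is an illustration under assumptions: the sample payload and the 0.5 confidence threshold are hypothetical, and treating an absent numeric field as 0.0 is a guess about how zero values are serialized.

```python
from dataclasses import dataclass


@dataclass
class ScoredResult:
    explanation: str
    score: float
    confidence: float

    @classmethod
    def from_json(cls, payload):
        """Parse a result object such as FluencyResult from a decoded dict."""
        return cls(
            explanation=payload.get("explanation", ""),
            # Assumption: an absent numeric field means 0.0.
            score=float(payload.get("score", 0.0)),
            confidence=float(payload.get("confidence", 0.0)),
        )


# Hypothetical response fragment.
fluency = ScoredResult.from_json(
    {"explanation": "Reads naturally.", "score": 4, "confidence": 0.8}
)

# Hypothetical threshold for flagging results that need manual review.
needs_review = fluency.confidence < 0.5
print(fluency.score, needs_review)
```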

CoherenceResult

Spec for coherence result.

Fields

explanation string

Output only. Explanation for coherence score.

score number

Output only. Coherence score.

confidence number

Output only. Confidence for coherence score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

SafetyResult

Spec for safety result.

Fields

explanation string

Output only. Explanation for safety score.

score number

Output only. Safety score.

confidence number

Output only. Confidence for safety score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

GroundednessResult

Spec for groundedness result.

Fields

explanation string

Output only. Explanation for groundedness score.

score number

Output only. Groundedness score.

confidence number

Output only. Confidence for groundedness score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

FulfillmentResult

Spec for fulfillment result.

Fields

explanation string

Output only. Explanation for fulfillment score.

score number

Output only. Fulfillment score.

confidence number

Output only. Confidence for fulfillment score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

SummarizationQualityResult

Spec for summarization quality result.

Fields

explanation string

Output only. Explanation for summarization quality score.

score number

Output only. Summarization Quality score.

confidence number

Output only. Confidence for summarization quality score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

PairwiseSummarizationQualityResult

Spec for pairwise summarization quality result.

Fields

pairwiseChoice enum (PairwiseChoice)

Output only. Pairwise summarization prediction choice.

explanation string

Output only. Explanation for summarization quality score.

confidence number

Output only. Confidence for summarization quality score.

JSON representation
{ "pairwiseChoice": enum (`PairwiseChoice`), "explanation": string, "confidence": number }

PairwiseChoice

Pairwise prediction autorater preference.

Enums
`PAIRWISE_CHOICE_UNSPECIFIED`	Unspecified prediction choice.
`BASELINE`	Baseline prediction wins.
`CANDIDATE`	Candidate prediction wins.
`TIE`	Winner cannot be determined.
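Pairwise results are usually aggregated over many instances. The sketch below tallies the enum values and computes a candidate win rate over decided comparisons. The sample results are hypothetical, and excluding ties from the denominator is a design choice made for this example, not something the reference prescribes.

```python
from collections import Counter

# Hypothetical pairwise results collected from several requests.
results = [
    {"pairwiseChoice": "CANDIDATE"},
    {"pairwiseChoice": "BASELINE"},
    {"pairwiseChoice": "CANDIDATE"},
    {"pairwiseChoice": "TIE"},
]

tally = Counter(
    r.get("pairwiseChoice", "PAIRWISE_CHOICE_UNSPECIFIED") for r in results
)

# Ties are left out of the denominator in this sketch.
decided = tally["CANDIDATE"] + tally["BASELINE"]
candidate_win_rate = tally["CANDIDATE"] / decided if decided else 0.0

print(dict(tally))
print(round(candidate_win_rate, 3))
```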

SummarizationHelpfulnessResult

Spec for summarization helpfulness result.

Fields

explanation string

Output only. Explanation for summarization helpfulness score.

score number

Output only. Summarization Helpfulness score.

confidence number

Output only. Confidence for summarization helpfulness score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

SummarizationVerbosityResult

Spec for summarization verbosity result.

Fields

explanation string

Output only. Explanation for summarization verbosity score.

score number

Output only. Summarization Verbosity score.

confidence number

Output only. Confidence for summarization verbosity score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

QuestionAnsweringQualityResult

Spec for question answering quality result.

Fields

explanation string

Output only. Explanation for question answering quality score.

score number

Output only. Question Answering Quality score.

confidence number

Output only. Confidence for question answering quality score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

PairwiseQuestionAnsweringQualityResult

Spec for pairwise question answering quality result.

Fields

pairwiseChoice enum (PairwiseChoice)

Output only. Pairwise question answering prediction choice.

explanation string

Output only. Explanation for question answering quality score.

confidence number

Output only. Confidence for question answering quality score.

JSON representation
{ "pairwiseChoice": enum (`PairwiseChoice`), "explanation": string, "confidence": number }

QuestionAnsweringRelevanceResult

Spec for question answering relevance result.

Fields

explanation string

Output only. Explanation for question answering relevance score.

score number

Output only. Question Answering Relevance score.

confidence number

Output only. Confidence for question answering relevance score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

QuestionAnsweringHelpfulnessResult

Spec for question answering helpfulness result.

Fields

explanation string

Output only. Explanation for question answering helpfulness score.

score number

Output only. Question Answering Helpfulness score.

confidence number

Output only. Confidence for question answering helpfulness score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

QuestionAnsweringCorrectnessResult

Spec for question answering correctness result.

Fields

explanation string

Output only. Explanation for question answering correctness score.

score number

Output only. Question Answering Correctness score.

confidence number

Output only. Confidence for question answering correctness score.

JSON representation
{ "explanation": string, "score": number, "confidence": number }

PointwiseMetricResult

Spec for pointwise metric result.

Fields

explanation string

Output only. Explanation for pointwise metric score.

score number

Output only. Pointwise metric score.

JSON representation
{ "explanation": string, "score": number }

PairwiseMetricResult

Spec for pairwise metric result.

Fields

pairwiseChoice enum (PairwiseChoice)

Output only. Pairwise metric choice.

explanation string

Output only. Explanation for pairwise metric score.

JSON representation
{ "pairwiseChoice": enum (`PairwiseChoice`), "explanation": string }

ToolCallValidResults

Results for tool call valid metric.

Fields

toolCallValidMetricValues[] object (ToolCallValidMetricValue)

Output only. Tool call valid metric values.

JSON representation
{ "toolCallValidMetricValues": [ { object (`ToolCallValidMetricValue`) } ] }

ToolCallValidMetricValue

Tool call valid metric value for an instance.

Fields

score number

Output only. Tool call valid score.

JSON representation
{ "score": number }

ToolNameMatchResults

Results for tool name match metric.

Fields

toolNameMatchMetricValues[] object (ToolNameMatchMetricValue)

Output only. Tool name match metric values.

JSON representation
{ "toolNameMatchMetricValues": [ { object (`ToolNameMatchMetricValue`) } ] }

ToolNameMatchMetricValue

Tool name match metric value for an instance.

Fields

score number

Output only. Tool name match score.

JSON representation
{ "score": number }

ToolParameterKeyMatchResults

Results for tool parameter key match metric.

Fields

toolParameterKeyMatchMetricValues[] object (ToolParameterKeyMatchMetricValue)

Output only. Tool parameter key match metric values.

JSON representation
{ "toolParameterKeyMatchMetricValues": [ { object (`ToolParameterKeyMatchMetricValue`) } ] }

ToolParameterKeyMatchMetricValue

Tool parameter key match metric value for an instance.

Fields

score number

Output only. Tool parameter key match score.

JSON representation
{ "score": number }

ToolParameterKVMatchResults

Results for tool parameter key value match metric.

Fields

toolParameterKvMatchMetricValues[] object (ToolParameterKVMatchMetricValue)

Output only. Tool parameter key value match metric values.

JSON representation
{ "toolParameterKvMatchMetricValues": [ { object (`ToolParameterKVMatchMetricValue`) } ] }

ToolParameterKVMatchMetricValue

Tool parameter key value match metric value for an instance.

Fields

score number

Output only. Tool parameter key value match score.

JSON representation
{ "score": number }

CometResult

Spec for Comet result - calculates the Comet score for the given instance using the version specified in the spec.

Fields

score number

Output only. Comet score. Range depends on version.

JSON representation
{ "score": number }

MetricxResult

Spec for MetricX result - calculates the MetricX score for the given instance using the version specified in the spec.

Fields

score number

Output only. MetricX score. Range depends on version.

JSON representation
{ "score": number }

TrajectoryExactMatchResults

Results for TrajectoryExactMatch metric.

Fields

trajectoryExactMatchMetricValues[] object (TrajectoryExactMatchMetricValue)

Output only. TrajectoryExactMatch metric values.

JSON representation
{ "trajectoryExactMatchMetricValues": [ { object (`TrajectoryExactMatchMetricValue`) } ] }

TrajectoryExactMatchMetricValue

TrajectoryExactMatch metric value for an instance.

Fields

score number

Output only. TrajectoryExactMatch score.

JSON representation
{ "score": number }

TrajectoryInOrderMatchResults

Results for TrajectoryInOrderMatch metric.

Fields

trajectoryInOrderMatchMetricValues[] object (TrajectoryInOrderMatchMetricValue)

Output only. TrajectoryInOrderMatch metric values.

JSON representation
{ "trajectoryInOrderMatchMetricValues": [ { object (`TrajectoryInOrderMatchMetricValue`) } ] }

TrajectoryInOrderMatchMetricValue

TrajectoryInOrderMatch metric value for an instance.

Fields

score number

Output only. TrajectoryInOrderMatch score.

JSON representation
{ "score": number }

TrajectoryAnyOrderMatchResults

Results for TrajectoryAnyOrderMatch metric.

Fields

trajectoryAnyOrderMatchMetricValues[] object (TrajectoryAnyOrderMatchMetricValue)

Output only. TrajectoryAnyOrderMatch metric values.

JSON representation
{ "trajectoryAnyOrderMatchMetricValues": [ { object (`TrajectoryAnyOrderMatchMetricValue`) } ] }

TrajectoryAnyOrderMatchMetricValue

TrajectoryAnyOrderMatch metric value for an instance.

Fields

score number

Output only. TrajectoryAnyOrderMatch score.

JSON representation
{ "score": number }

TrajectoryPrecisionResults

Results for TrajectoryPrecision metric.

Fields

trajectoryPrecisionMetricValues[] object (TrajectoryPrecisionMetricValue)

Output only. TrajectoryPrecision metric values.

JSON representation
{ "trajectoryPrecisionMetricValues": [ { object (`TrajectoryPrecisionMetricValue`) } ] }

TrajectoryPrecisionMetricValue

TrajectoryPrecision metric value for an instance.

Fields

score number

Output only. TrajectoryPrecision score.

JSON representation
{ "score": number }

TrajectoryRecallResults

Results for TrajectoryRecall metric.

Fields

trajectoryRecallMetricValues[] object (TrajectoryRecallMetricValue)

Output only. TrajectoryRecall metric values.

JSON representation
{ "trajectoryRecallMetricValues": [ { object (`TrajectoryRecallMetricValue`) } ] }

TrajectoryRecallMetricValue

TrajectoryRecall metric value for an instance.

Fields

score number

Output only. TrajectoryRecall score.

JSON representation
{ "score": number }

TrajectorySingleToolUseResults

Results for TrajectorySingleToolUse metric.

Fields

trajectorySingleToolUseMetricValues[] object (TrajectorySingleToolUseMetricValue)

Output only. TrajectorySingleToolUse metric values.

JSON representation
{ "trajectorySingleToolUseMetricValues": [ { object (`TrajectorySingleToolUseMetricValue`) } ] }

TrajectorySingleToolUseMetricValue

TrajectorySingleToolUse metric value for an instance.

Fields

score number

Output only. TrajectorySingleToolUse score.

JSON representation
{ "score": number }
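The computation-based `...Results` types all wrap a single `...MetricValues` list of objects with a score, so the per-instance scores can be pulled out generically. The sketch below relies on that naming pattern. The sample payload is hypothetical, and treating a missing score as 0.0 is an assumption about how zero values are serialized.

```python
def extract_scores(results):
    """Return per-instance scores from a `...Results` object (decoded dict)."""
    for key, values in results.items():
        # Every Results type above names its list field `...MetricValues`.
        if key.endswith("MetricValues"):
            # Assumption: a value without a score field means 0.0.
            return [value.get("score", 0.0) for value in values]
    return []


# Hypothetical response fragment for three instances.
sample = {
    "trajectoryExactMatchMetricValues": [{"score": 1.0}, {"score": 0.0}, {}]
}

scores = extract_scores(sample)
print(scores)                     # [1.0, 0.0, 0.0]
print(sum(scores) / len(scores))  # mean score across instances
```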