Metric
The metric used for running evaluations.
aggregationMetrics[]
enum (AggregationMetric)
Optional. The aggregation metrics to use.
metric_spec
Union type
metric_spec can be only one of the following:
predefinedMetricSpec
object (PredefinedMetricSpec)
The spec for a pre-defined metric.
llmBasedMetricSpec
object (LLMBasedMetricSpec)
Spec for an LLM based metric.
pointwiseMetricSpec
object (PointwiseMetricSpec)
Spec for pointwise metric.
pairwiseMetricSpec
object (PairwiseMetricSpec)
Spec for pairwise metric.
exactMatchSpec
object (ExactMatchSpec)
Spec for exact match metric.
bleuSpec
object (BleuSpec)
Spec for bleu metric.
rougeSpec
object (RougeSpec)
Spec for rouge metric.
JSON representation

    {
      "aggregationMetrics": [
        enum (AggregationMetric)
      ],

      // metric_spec
      "predefinedMetricSpec": { object (PredefinedMetricSpec) },
      "llmBasedMetricSpec": { object (LLMBasedMetricSpec) },
      "pointwiseMetricSpec": { object (PointwiseMetricSpec) },
      "pairwiseMetricSpec": { object (PairwiseMetricSpec) },
      "exactMatchSpec": { object (ExactMatchSpec) },
      "bleuSpec": { object (BleuSpec) },
      "rougeSpec": { object (RougeSpec) }
      // Union type
    }
PredefinedMetricSpec
The spec for a pre-defined metric.
metricSpecName
string
Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
metricSpecParameters
object (Struct format)
Optional. The parameters needed to run the pre-defined metric.
JSON representation

    {
      "metricSpecName": string,
      "metricSpecParameters": { object }
    }
LLMBasedMetricSpec
Specification for an LLM based metric.
rubrics_source
Union type
rubrics_source can be only one of the following:
rubricGroupKey
string
Use a pre-defined group of rubrics associated with the input. Refers to a key in the rubricGroups map of EvaluationInstance.
rubricGenerationSpec
object (RubricGenerationSpec)
Dynamically generate rubrics using this specification.
predefinedRubricGenerationSpec
object (PredefinedMetricSpec)
Dynamically generate rubrics using a predefined spec.
metricPromptTemplate
string
Required. Template for the prompt sent to the judge model.
systemInstruction
string
Optional. System instructions for the judge model.
judgeAutoraterConfig
object (AutoraterConfig)
Optional. Configuration for the judge LLM (Autorater).
additionalConfig
object (Struct format)
Optional. Additional configuration for the metric.
JSON representation

    {
      // rubrics_source
      "rubricGroupKey": string,
      "rubricGenerationSpec": { object (RubricGenerationSpec) },
      "predefinedRubricGenerationSpec": { object (PredefinedMetricSpec) },
      // Union type
      "metricPromptTemplate": string,
      "systemInstruction": string,
      "judgeAutoraterConfig": { object (AutoraterConfig) },
      "additionalConfig": { object }
    }
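A sketch of an LLM-based metric that reads rubrics from a pre-defined group; the group key, template text, and {response} placeholder are illustrative values, not documented ones:

    {
      "rubricGroupKey": "quality_rubrics",
      "metricPromptTemplate": "Evaluate the response against each rubric: {response}",
      "systemInstruction": "You are a strict but fair judge."
    }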
RubricGenerationSpec
Specification for how rubrics should be generated.
promptTemplate
string
Template for the prompt used to generate rubrics. The details should be updated based on the most-recent recipe requirements.
rubricContentType
enum (RubricContentType)
The type of rubric content to be generated.
rubricTypeOntology[]
string
Optional. A pre-defined list of allowed types for generated rubrics. If this field is provided, it implies include_rubric_type should be true, and the generated rubric types should be chosen from this ontology.
modelConfig
object (AutoraterConfig)
Configuration for the model used in rubric generation. Configs including sampling count and base model can be specified here. Flipping is not supported for rubric generation.
JSON representation

    {
      "promptTemplate": string,
      "rubricContentType": enum (RubricContentType),
      "rubricTypeOntology": [ string ],
      "modelConfig": { object (AutoraterConfig) }
    }
RubricContentType
Specifies the type of rubric content to generate.
Enums | Description
---|---
RUBRIC_CONTENT_TYPE_UNSPECIFIED | The content type to generate is not specified.
PROPERTY | Generate rubrics based on properties.
NL_QUESTION_ANSWER | Generate rubrics in an NL question answer format.
PYTHON_CODE_ASSERTION | Generate rubrics in a unit test format.
PointwiseMetricSpec
Spec for pointwise metric.
customOutputFormatConfig
object (CustomOutputFormatConfig)
Optional. CustomOutputFormatConfig allows customization of metric output. By default, metrics return a score and explanation. When this config is set, the default output is replaced with either the raw output string or a parsed output based on a user-defined schema. If a custom format is chosen, the score and explanation fields in the corresponding metric result will be empty.
metricPromptTemplate
string
Required. Metric prompt template for pointwise metric.
systemInstruction
string
Optional. System instructions for pointwise metric.
JSON representation

    {
      "customOutputFormatConfig": { object (CustomOutputFormatConfig) },
      "metricPromptTemplate": string,
      "systemInstruction": string
    }
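A minimal pointwise spec; the template text and {response} placeholder are illustrative:

    {
      "metricPromptTemplate": "Rate the fluency of this response on a 1-5 scale: {response}",
      "systemInstruction": "You are an expert evaluator."
    }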
CustomOutputFormatConfig
Spec for custom output format configuration.
custom_output_format_config
Union type
custom_output_format_config can be only one of the following:
returnRawOutput
boolean
Optional. Whether to return raw output.
JSON representation

    {
      // custom_output_format_config
      "returnRawOutput": boolean
      // Union type
    }
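To receive the judge model's raw output text instead of parsed score and explanation fields:

    {
      "returnRawOutput": true
    }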
PairwiseMetricSpec
Spec for pairwise metric.
candidateResponseFieldName
string
Optional. The field name of the candidate response.
baselineResponseFieldName
string
Optional. The field name of the baseline response.
customOutputFormatConfig
object (CustomOutputFormatConfig)
Optional. CustomOutputFormatConfig allows customization of metric output. When this config is set, the default output is replaced with the raw output string. If a custom format is chosen, the pairwiseChoice and explanation fields in the corresponding metric result will be empty.
metricPromptTemplate
string
Required. Metric prompt template for pairwise metric.
systemInstruction
string
Optional. System instructions for pairwise metric.
JSON representation

    {
      "candidateResponseFieldName": string,
      "baselineResponseFieldName": string,
      "customOutputFormatConfig": { object (CustomOutputFormatConfig) },
      "metricPromptTemplate": string,
      "systemInstruction": string
    }
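A sketch of a pairwise comparison; the field names and template placeholders are illustrative and must match the fields present in your evaluation instances:

    {
      "candidateResponseFieldName": "candidate_response",
      "baselineResponseFieldName": "baseline_response",
      "metricPromptTemplate": "Which response follows the instructions better: {candidate_response} or {baseline_response}?"
    }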
ExactMatchSpec
Spec for exact match metric - returns 1 if prediction and reference exactly match, otherwise 0.
This type has no fields.
BleuSpec
Spec for bleu score metric - calculates the precision of n-grams in the prediction as compared to the reference - returns a score ranging between 0 and 1.
useEffectiveOrder
boolean
Optional. Whether to use effective order to compute the BLEU score.
JSON representation

    {
      "useEffectiveOrder": boolean
    }
RougeSpec
Spec for rouge score metric - calculates the recall of n-grams in the prediction as compared to the reference - returns a score ranging between 0 and 1.
rougeType
string
Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
useStemmer
boolean
Optional. Whether to use a stemmer to compute the rouge score.
splitSummaries
boolean
Optional. Whether to split summaries while using rougeLsum.
JSON representation

    {
      "rougeType": string,
      "useStemmer": boolean,
      "splitSummaries": boolean
    }