REST Resource: projects.locations.generators.evaluations

Resource: GeneratorEvaluation

Represents the evaluation result of a generator.

JSON representation
{
  "name": string,
  "displayName": string,
  "generatorEvaluationConfig": {
    object (GeneratorEvaluationConfig)
  },
  "createTime": string,
  "completeTime": string,
  "initialGenerator": {
    object (Generator)
  },
  "evaluationStatus": {
    object (EvaluationStatus)
  },

  // Union field metrics can be only one of the following:
  "summarizationMetrics": {
    object (SummarizationEvaluationMetrics)
  }
  // End of list of possible types for union field metrics.
}
Fields
name

string

Output only. Identifier. The resource name of the evaluation. Format: projects/<Project ID>/locations/<Location ID>/generators/<Generator ID>/evaluations/<Evaluation ID>

displayName

string

Optional. The display name of the generator evaluation. At most 64 bytes long.

generatorEvaluationConfig

object (GeneratorEvaluationConfig)

Required. The configuration of the evaluation task.

createTime

string (Timestamp format)

Output only. Creation time of this generator evaluation.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

completeTime

string (Timestamp format)

Output only. Completion time of this generator evaluation.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

initialGenerator

object (Generator)

Required. The initial generator that was used when creating this evaluation. This is a copy of the generator read from storage when creating the evaluation.

evaluationStatus

object (EvaluationStatus)

Output only. The result status of the evaluation pipeline. Indicates whether the evaluation is still in progress, has completed, or has failed. On failure it carries the error and a user-actionable message.

Union field metrics. Metrics details. metrics can be only one of the following:
summarizationMetrics

object (SummarizationEvaluationMetrics)

Output only. Only available when the summarization generator is provided.
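As an illustration of the name format and the displayName limit described above, the following minimal Python sketch splits an evaluation resource name into its parts and checks the display name length. The helper names (`parse_evaluation_name`, `check_display_name`) and the sample IDs are hypothetical, and the sketch assumes the 64-byte limit is measured on the UTF-8 encoding.

```python
import re

# Pattern follows the documented format:
# projects/<Project ID>/locations/<Location ID>/generators/<Generator ID>/evaluations/<Evaluation ID>
_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/generators/(?P<generator>[^/]+)/evaluations/(?P<evaluation>[^/]+)$"
)


def parse_evaluation_name(name: str) -> dict:
    """Split an evaluation resource name into its ID components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a generator evaluation name: {name!r}")
    return match.groupdict()


def check_display_name(display_name: str) -> str:
    """Reject display names longer than 64 bytes."""
    # Assumption: the limit applies to the UTF-8 encoded form.
    if len(display_name.encode("utf-8")) > 64:
        raise ValueError("displayName must be at most 64 bytes")
    return display_name


# Hypothetical IDs, for illustration only.
parts = parse_evaluation_name(
    "projects/my-project/locations/global/generators/gen-1/evaluations/eval-1"
)
print(parts["generator"], parts["evaluation"])
```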

GeneratorEvaluationConfig

Generator evaluation input config.

JSON representation
{
  "inputDataConfig": {
    object (InputDataConfig)
  },
  "outputGcsBucketPath": string,

  // Union field evaluation_feature_config can be only one of the following:
  "summarizationConfig": {
    object (SummarizationConfig)
  }
  // End of list of possible types for union field evaluation_feature_config.
}
Fields
inputDataConfig

object (InputDataConfig)

Required. The config/source of input data.

outputGcsBucketPath

string

Required. The output Cloud Storage bucket path in which to store evaluation files, e.g. the per_summary_accuracy_score report. The customer provides this path and can see the files stored in it, so no internal data should be stored there.

Union field evaluation_feature_config. Feature used for evaluation. evaluation_feature_config can be only one of the following:
summarizationConfig

object (SummarizationConfig)

Evaluation configs for summarization generator.
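A GeneratorEvaluationConfig body can be assembled as a plain dictionary. The sketch below is one possible way to do so in Python; the function name `build_evaluation_config` is hypothetical, and the bucket path value is a placeholder because the exact path syntax is not stated in this reference.

```python
def build_evaluation_config(
    input_data_config: dict,
    output_gcs_bucket_path: str,
    summarization_config: dict,
) -> dict:
    """Assemble a GeneratorEvaluationConfig JSON body."""
    # Both fields are documented as Required.
    if not input_data_config:
        raise ValueError("inputDataConfig is required")
    if not output_gcs_bucket_path:
        raise ValueError("outputGcsBucketPath is required")
    return {
        "inputDataConfig": input_data_config,
        "outputGcsBucketPath": output_gcs_bucket_path,
        # Only member of the evaluation_feature_config union listed here.
        "summarizationConfig": summarization_config,
    }


config = build_evaluation_config(
    input_data_config={"sampleSize": 50},
    output_gcs_bucket_path="my-bucket/eval-output",  # placeholder value
    summarization_config={"enableAccuracyEvaluation": True},
)
print(sorted(config))
```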

InputDataConfig

Input data config details.

JSON representation
{
  "inputDataSourceType": enum (InputDataSourceType),
  "startTime": string,
  "endTime": string,
  "sampleSize": integer,
  "isSummaryGenerationAllowed": boolean,
  "summaryGenerationOption": enum (SummaryGenerationOption),

  // Union field source_specific_config can be only one of the following:
  "agentAssistInputDataConfig": {
    object (AgentAssistInputDataConfig)
  },
  "datasetInputDataConfig": {
    object (DatasetInputDataConfig)
  }
  // End of list of possible types for union field source_specific_config.
}
Fields
inputDataSourceType
(deprecated)

enum (InputDataSourceType)

Required. The source type of input data.

startTime
(deprecated)

string (Timestamp format)

Optional. The start timestamp to fetch conversation data.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

endTime
(deprecated)

string (Timestamp format)

Optional. The end timestamp to fetch conversation data.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

sampleSize

integer

Optional. Desired number of conversation-summary pairs to be evaluated.

isSummaryGenerationAllowed
(deprecated)

boolean

Optional. Whether the summary generation is allowed when the pre-existing qualified summaries are insufficient to cover the sample size.

summaryGenerationOption

enum (SummaryGenerationOption)

Optional. Option to control whether summaries are generated during evaluation.

Union field source_specific_config. The source specific config for the input data. source_specific_config can be only one of the following:
agentAssistInputDataConfig

object (AgentAssistInputDataConfig)

The distinctive configs for Agent Assist conversations as the conversation source.

datasetInputDataConfig

object (DatasetInputDataConfig)

The distinctive configs for dataset as the conversation source.
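Because source_specific_config is a union, at most one of its members may be set in a single InputDataConfig. A client-side check along these lines may catch mistakes before a request is sent; the function name `source_specific_member` is hypothetical and the check is only a sketch, not the service's own validation.

```python
from typing import Optional

# Members of the source_specific_config union, as listed above.
SOURCE_SPECIFIC_FIELDS = ("agentAssistInputDataConfig", "datasetInputDataConfig")


def source_specific_member(input_data_config: dict) -> Optional[str]:
    """Return the union member that is set, or None if no member is set."""
    present = [f for f in SOURCE_SPECIFIC_FIELDS if f in input_data_config]
    if len(present) > 1:
        raise ValueError(f"only one of {SOURCE_SPECIFIC_FIELDS} may be set, got {present}")
    return present[0] if present else None


# Hypothetical dataset name, for illustration only.
input_data_config = {
    "sampleSize": 100,
    "summaryGenerationOption": "GENERATE_IF_MISSING",
    "datasetInputDataConfig": {
        "dataset": "projects/my-project/locations/global/datasets/ds-1"
    },
}
print(source_specific_member(input_data_config))
```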

InputDataSourceType

Enumeration of input data source type.

Enums
INPUT_DATA_SOURCE_TYPE_UNSPECIFIED Unspecified InputDataSourceType. Should not be used.
AGENT_ASSIST_CONVERSATIONS Fetch data from Agent Assist storage. If this source type is chosen, inputDataConfig.startTime and inputDataConfig.endTime must be provided.
INSIGHTS_CONVERSATIONS Fetch data from Insights storage. If this source type is chosen, inputDataConfig.startTime and inputDataConfig.endTime must be provided.

SummaryGenerationOption

Summary generation options.

Enums
SUMMARY_GENERATION_OPTION_UNSPECIFIED Default option. Will not be used.
ALWAYS_GENERATE Always generate summaries for all conversations.
GENERATE_IF_MISSING Generate only missing summaries.
DO_NOT_GENERATE Do not generate new summaries. Only use existing summaries found.
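The following sketch restates the enum values above as a Python enum and shows how a client might reason about whether a summary would be generated for a given conversation. The helper `will_generate` is hypothetical and reflects only the descriptions in this table, not confirmed service behavior.

```python
from enum import Enum


class SummaryGenerationOption(str, Enum):
    SUMMARY_GENERATION_OPTION_UNSPECIFIED = "SUMMARY_GENERATION_OPTION_UNSPECIFIED"
    ALWAYS_GENERATE = "ALWAYS_GENERATE"
    GENERATE_IF_MISSING = "GENERATE_IF_MISSING"
    DO_NOT_GENERATE = "DO_NOT_GENERATE"


def will_generate(option: SummaryGenerationOption, has_existing_summary: bool) -> bool:
    """Return True if a new summary would be generated under this option."""
    if option is SummaryGenerationOption.ALWAYS_GENERATE:
        return True
    if option is SummaryGenerationOption.GENERATE_IF_MISSING:
        return not has_existing_summary
    if option is SummaryGenerationOption.DO_NOT_GENERATE:
        return False
    # The unspecified value is documented as not to be used.
    raise ValueError("summaryGenerationOption must be specified")


option = SummaryGenerationOption("GENERATE_IF_MISSING")
print(option.value, will_generate(option, has_existing_summary=True))
```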

AgentAssistInputDataConfig

The distinctive configs for Agent Assist conversations as the conversation source.

JSON representation
{
  "startTime": string,
  "endTime": string
}
Fields
startTime

string (Timestamp format)

Required. The start of the time range for conversations to be evaluated. Only conversations created at or after this timestamp will be sampled.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

endTime

string (Timestamp format)

Required. The end of the time range for conversations to be evaluated. Only conversations ended at or before this timestamp will be sampled.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
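To make the timestamp format concrete, the sketch below formats timezone-aware datetimes as Z-normalized RFC 3339 strings and parses the example values quoted above. The helper names are hypothetical. Two simplifications are assumed: fractional digits beyond microseconds are truncated (Python's datetime does not store nanoseconds), and the start-before-end check is a client-side convenience rather than a documented rule.

```python
import re
from datetime import datetime, timedelta, timezone

_TS_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})$"
)


def to_rfc3339(dt: datetime) -> str:
    """Format an aware datetime as a Z-normalized string with 0 fractional digits."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp, truncating nanoseconds to microseconds."""
    m = _TS_RE.match(value)
    if m is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    micro = int((m.group(7) or "").ljust(9, "0")[:6])
    offset = m.group(8)
    if offset == "Z":
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)


def build_agent_assist_config(start: datetime, end: datetime) -> dict:
    """Build an AgentAssistInputDataConfig body from two aware datetimes."""
    if start > end:  # assumption: an inverted range is never intended
        raise ValueError("startTime must not be after endTime")
    return {"startTime": to_rfc3339(start), "endTime": to_rfc3339(end)}


start = parse_rfc3339("2014-10-02T15:01:23+05:30")
end = parse_rfc3339("2014-10-02T15:01:23.045123456Z")
print(build_agent_assist_config(start, end))
```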

DatasetInputDataConfig

The distinctive configs for dataset as the conversation source.

JSON representation
{
  "dataset": string
}
Fields
dataset

string

Required. The identifier of the dataset to be evaluated. Format: projects/<ProjectId>/locations/<LocationID>/datasets/<DatasetID>.
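A small helper can keep the dataset identifier in the documented format. This is only a sketch: the function name `dataset_name` and the sample IDs are hypothetical, and rejecting empty IDs or IDs containing a slash is an assumption about what counts as a well-formed segment.

```python
def dataset_name(project_id: str, location_id: str, dataset_id: str) -> str:
    """Build projects/<ProjectId>/locations/<LocationID>/datasets/<DatasetID>."""
    for label, value in (
        ("project", project_id),
        ("location", location_id),
        ("dataset", dataset_id),
    ):
        # Assumption: each ID is a single non-empty path segment.
        if not value or "/" in value:
            raise ValueError(f"invalid {label} ID: {value!r}")
    return f"projects/{project_id}/locations/{location_id}/datasets/{dataset_id}"


dataset_config = {"dataset": dataset_name("my-project", "global", "ds-1")}
print(dataset_config["dataset"])
```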

SummarizationConfig

Evaluation configs for summarization generator.

JSON representation
{
  "enableAccuracyEvaluation": boolean,
  "accuracyEvaluationVersion": string,
  "enableCompletenessEvaluation": boolean,
  "completenessEvaluationVersion": string
}
Fields
enableAccuracyEvaluation

boolean

Optional. Enable accuracy evaluation.

accuracyEvaluationVersion

string

Optional. Version for summarization accuracy. This will determine the prompt and model used at backend.

enableCompletenessEvaluation

boolean

Optional. Enable completeness evaluation.

completenessEvaluationVersion

string

Optional. Version for summarization completeness. This will determine the prompt and model used at backend.
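Since every SummarizationConfig field is optional, a builder can leave the version fields out unless the caller supplies them. The sketch below shows one way to do that; `build_summarization_config` is a hypothetical name, and no version strings are shown because this reference does not list valid values.

```python
from typing import Optional


def build_summarization_config(
    accuracy: bool = False,
    completeness: bool = False,
    accuracy_version: Optional[str] = None,
    completeness_version: Optional[str] = None,
) -> dict:
    """Assemble a SummarizationConfig body, omitting unset version fields."""
    config = {
        "enableAccuracyEvaluation": accuracy,
        "enableCompletenessEvaluation": completeness,
    }
    if accuracy_version is not None:
        config["accuracyEvaluationVersion"] = accuracy_version
    if completeness_version is not None:
        config["completenessEvaluationVersion"] = completeness_version
    return config


summarization_config = build_summarization_config(accuracy=True)
print(summarization_config)
```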

SummarizationEvaluationMetrics

Evaluation metrics for summarization generator.

JSON representation
{
  "summarizationEvaluationResults": [
    {
      object (SummarizationEvaluationResult)
    }
  ],
  "summarizationEvaluationMergedResultsUri": string,
  "overallMetrics": [
    {
      object (OverallScoresByMetric)
    }
  ],
  "overallSectionTokens": [
    {
      object (SectionToken)
    }
  ],
  "conversationDetails": [
    {
      object (ConversationDetail)
    }
  ]
}
Fields
summarizationEvaluationResults[]

object (SummarizationEvaluationResult)

Output only. A list of evaluation results per conversation (and summary), metric, and section.

summarizationEvaluationMergedResultsUri

string

Output only. URI in the user's bucket of the CSV file that holds the merged evaluation scores and aggregated scores.

overallMetrics[]

object (OverallScoresByMetric)

Output only. A list of aggregated (average) scores per metric section.

overallSectionTokens[]

object (SectionToken)

Output only. Overall token count per section. This is the aggregated (summed) input token count of the summaries across all conversations that are selected for summarization evaluation.

conversationDetails[]

object (ConversationDetail)

Output only. List of conversation details.

SummarizationEvaluationResult

Evaluation result per conversation (and summary), metric, and section.

JSON representation
{
  "sessionId": string,
  "metric": string,
  "section": string,
  "score": number,
  "sectionSummary": string,
  "decompositions": [
    {
      object (Decomposition)
    }
  ],
  "evaluationResults": [
    {
      object (EvaluationResult)
    }
  ]
}
Fields
sessionId
(deprecated)

string

Output only. Conversation session ID.

metric

string

Output only. Metric name, e.g. accuracy, completeness, adherence, etc.

section

string

Output only. Section/task name, e.g. action, situation, etc.

score

number

Output only. Score calculated from decompositions.

sectionSummary

string

Output only. Summary of this section.

decompositions[]
(deprecated)

object (Decomposition)

Output only. List of decomposition details.

evaluationResults[]

object (EvaluationResult)

Output only. List of evaluation results.
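Because each SummarizationEvaluationResult is keyed by conversation, metric, and section, a consumer may want to regroup the flat list. The sketch below averages scores per (metric, section) pair. The function name and sample values are hypothetical, and the unweighted mean is an illustrative choice rather than the service's documented aggregation.

```python
from collections import defaultdict


def mean_score_by_metric_and_section(results: list) -> dict:
    """Average the `score` field for each (metric, section) pair."""
    buckets = defaultdict(list)
    for result in results:
        if "score" in result:  # skip entries without a score
            buckets[(result["metric"], result["section"])].append(result["score"])
    return {key: sum(scores) / len(scores) for key, scores in buckets.items()}


# Hypothetical sample data, shaped like summarizationEvaluationResults[].
results = [
    {"metric": "accuracy", "section": "situation", "score": 1.0},
    {"metric": "accuracy", "section": "situation", "score": 0.5},
    {"metric": "completeness", "section": "action", "score": 0.25},
]
means = mean_score_by_metric_and_section(results)
print(means)
```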

Decomposition

Decomposition details.

JSON representation
{

  // Union field decomposition can be only one of the following:
  "accuracyDecomposition": {
    object (AccuracyDecomposition)
  },
  "adherenceDecomposition": {
    object (AdherenceDecomposition)
  }
  // End of list of possible types for union field decomposition.
}
Fields
Union field decomposition. One of decomposition details. decomposition can be only one of the following:
accuracyDecomposition

object (AccuracyDecomposition)

Only available for accuracy metric.

adherenceDecomposition

object (AdherenceDecomposition)

Only available for adherence metric.

AccuracyDecomposition

Decomposition details for accuracy.

JSON representation
{
  "point": string,
  "accuracyReasoning": string,
  "isAccurate": boolean
}
Fields
point

string

Output only. The breakdown point of the summary.

accuracyReasoning

string

Output only. The accuracy reasoning of the breakdown point.

isAccurate

boolean

Output only. Whether the breakdown point is accurate or not.

AdherenceDecomposition

Decomposition details for adherence.

JSON representation
{
  "point": string,
  "adherenceReasoning": string,
  "isAdherent": boolean
}
Fields
point

string

Output only. The breakdown point of the given instructions.

adherenceReasoning

string

Output only. The adherence reasoning of the breakdown point.

isAdherent

boolean

Output only. Whether the breakdown point is adherent or not.

EvaluationResult

Evaluation result that contains one of accuracy, adherence or completeness evaluation result.

JSON representation
{

  // Union field result can be only one of the following:
  "accuracyDecomposition": {
    object (AccuracyDecomposition)
  },
  "adherenceRubric": {
    object (AdherenceRubric)
  },
  "completenessRubric": {
    object (CompletenessRubric)
  }
  // End of list of possible types for union field result.
}
Fields
Union field result. One of evaluation result details. result can be only one of the following:
accuracyDecomposition

object (AccuracyDecomposition)

Only available for accuracy metric.

adherenceRubric

object (AdherenceRubric)

Only available for adherence metric.

completenessRubric

object (CompletenessRubric)

Only available for completeness metric.
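Each EvaluationResult carries exactly one member of the result union, and each member exposes a different boolean verdict field. The sketch below maps a result to its member name and verdict. The function name `result_verdict` is hypothetical, and treating a missing boolean as false is an assumption based on the common JSON convention of omitting default values.

```python
# Union member -> name of the boolean verdict field inside it.
RESULT_FIELDS = {
    "accuracyDecomposition": "isAccurate",
    "adherenceRubric": "isAddressed",
    "completenessRubric": "isAddressed",
}


def result_verdict(evaluation_result: dict) -> tuple:
    """Return (union member name, verdict) for one EvaluationResult."""
    present = [f for f in RESULT_FIELDS if f in evaluation_result]
    if len(present) != 1:
        raise ValueError(f"expected exactly one result member, got {present}")
    member = present[0]
    # Assumption: an absent boolean means false.
    verdict = bool(evaluation_result[member].get(RESULT_FIELDS[member], False))
    return member, verdict


# Hypothetical sample data.
sample = {"completenessRubric": {"question": "Is the issue stated?", "isAddressed": True}}
print(result_verdict(sample))
```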

AdherenceRubric

Rubric result of the adherence evaluation. A rubric is used to determine if the summary adheres to all aspects of the given instructions.

JSON representation
{
  "question": string,
  "reasoning": string,
  "isAddressed": boolean
}
Fields
question

string

Output only. The question, generated from the instruction, that is used to evaluate the summary.

reasoning

string

Output only. The reasoning for whether the rubric question is addressed or not.

isAddressed

boolean

Output only. A boolean that indicates whether the rubric question is addressed or not.

CompletenessRubric

Rubric details of the completeness evaluation result.

JSON representation
{
  "question": string,
  "isAddressed": boolean
}
Fields
question

string

Output only. The question, generated from the instruction, that is used to evaluate the summary.

isAddressed

boolean

Output only. A boolean that indicates whether the rubric question is addressed or not.

OverallScoresByMetric

Overall performance per metric. This is the aggregated score for each metric across all conversations that are selected for summarization evaluation.

JSON representation
{
  "metric": string
}
Fields
metric

string

Output only. Metric name. e.g. accuracy, adherence, completeness.

SectionToken

A pair of section name and input token count of the input summary section.

JSON representation
{
  "section": string,
  "tokenCount": string
}
Fields
section

string

Output only. The name of the summary instruction.

tokenCount

string (int64 format)

Output only. Token count.
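Note that tokenCount is an int64 serialized as a JSON string, so it needs converting before any arithmetic. A minimal sketch, with a hypothetical helper name and made-up sample values:

```python
def total_tokens_by_section(section_tokens: list) -> dict:
    """Sum tokenCount per section, converting the int64 strings to int."""
    totals = {}
    for token in section_tokens:
        section = token["section"]
        totals[section] = totals.get(section, 0) + int(token["tokenCount"])
    return totals


# Hypothetical sample data, shaped like a list of SectionToken objects.
section_tokens = [
    {"section": "situation", "tokenCount": "120"},
    {"section": "action", "tokenCount": "80"},
    {"section": "situation", "tokenCount": "30"},
]
totals = total_tokens_by_section(section_tokens)
print(totals)
```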

ConversationDetail

Aggregated evaluation result on conversation level. This contains evaluation results of all the metrics and sections.

JSON representation
{
  "messageEntries": [
    {
      object (MessageEntry)
    }
  ],
  "summarySections": [
    {
      object (SummarySection)
    }
  ],
  "metricDetails": [
    {
      object (MetricDetail)
    }
  ],
  "sectionTokens": [
    {
      object (SectionToken)
    }
  ]
}
Fields
messageEntries[]

object (MessageEntry)

Output only. Conversation transcript that is used for summarization evaluation as a reference.

summarySections[]

object (SummarySection)

Output only. Summary sections that are used for summarization evaluation as a reference.

metricDetails[]

object (MetricDetail)

Output only. List of metric details.

sectionTokens[]

object (SectionToken)

Output only. Conversation-level token count per section. This is the aggregated (summed) input token count of the summary across all metrics for a single conversation.

MetricDetail

Aggregated result on metric level. This contains the evaluation results of all the sections.

JSON representation
{
  "metric": string,
  "sectionDetails": [
    {
      object (SectionDetail)
    }
  ],
  "score": number
}
Fields
metric

string

Output only. Metric name, e.g. accuracy, adherence, completeness.

sectionDetails[]

object (SectionDetail)

Output only. List of section details.

score

number

Output only. Aggregated (average) score on this metric across all sections.

SectionDetail

Section level result.

JSON representation
{
  "section": string,
  "sectionSummary": string,
  "evaluationResults": [
    {
      object (EvaluationResult)
    }
  ],
  "score": number
}
Fields
section

string

Output only. The name of the summary instruction.

sectionSummary

string

Output only. Summary for this section.

evaluationResults[]

object (EvaluationResult)

Output only. List of evaluation results. The list contains only one kind of evaluation result.

score

number

Output only. Aggregated (average) score on this section across all evaluation results, which are either decompositions or rubrics.
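The nesting ConversationDetail, MetricDetail, SectionDetail can be flattened into rows for reporting. The sketch below walks one ConversationDetail and emits (metric, section, score) tuples. The function name and sample values are hypothetical, and absent fields are returned as None.

```python
def flatten_conversation_detail(detail: dict) -> list:
    """Return one (metric, section, score) row per SectionDetail."""
    rows = []
    for metric_detail in detail.get("metricDetails", []):
        for section_detail in metric_detail.get("sectionDetails", []):
            rows.append(
                (
                    metric_detail.get("metric"),
                    section_detail.get("section"),
                    section_detail.get("score"),
                )
            )
    return rows


# Hypothetical sample data, shaped like one ConversationDetail.
detail = {
    "metricDetails": [
        {
            "metric": "accuracy",
            "score": 0.75,
            "sectionDetails": [
                {"section": "situation", "score": 1.0},
                {"section": "action", "score": 0.5},
            ],
        }
    ]
}
rows = flatten_conversation_detail(detail)
print(rows)
```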

EvaluationStatus

A common evaluation pipeline status.

JSON representation
{
  "pipelineStatus": {
    object (Status)
  },
  "done": boolean
}
Fields
pipelineStatus

object (Status)

Output only. The error result of the evaluation if the evaluation pipeline fails.

done

boolean

Output only. If the value is false, it means the evaluation is still in progress. If true, the operation is completed, and either error or response is available.
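A client can combine the two fields to classify an evaluation as in progress, failed, or succeeded. The sketch below assumes that the Status object has the usual `code` and `message` fields with code 0 meaning success; that shape is not spelled out in this reference, and the function name and return labels are hypothetical.

```python
def describe_status(evaluation_status: dict) -> str:
    """Classify an EvaluationStatus as IN_PROGRESS, FAILED, or SUCCEEDED."""
    if not evaluation_status.get("done", False):
        return "IN_PROGRESS"
    # Assumption: pipelineStatus has integer `code` (0 = OK) and string `message`.
    status = evaluation_status.get("pipelineStatus") or {}
    if status.get("code", 0) != 0:
        return "FAILED: " + status.get("message", "")
    return "SUCCEEDED"


print(describe_status({"done": False}))
print(describe_status({"done": True, "pipelineStatus": {"code": 3, "message": "bad input"}}))
```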

Methods

create

Creates an evaluation of a generator.

delete

Deletes an evaluation of a generator.

get

Gets an evaluation of a generator.

list

Lists evaluations of a generator.