- Resource: GeneratorEvaluation
- GeneratorEvaluationConfig
- InputDataConfig
- InputDataSourceType
- SummaryGenerationOption
- AgentAssistInputDataConfig
- DatasetInputDataConfig
- SummarizationConfig
- SummarizationEvaluationMetrics
- SummarizationEvaluationResult
- Decomposition
- AccuracyDecomposition
- AdherenceDecomposition
- EvaluationResult
- AdherenceRubric
- CompletenessRubric
- OverallScoresByMetric
- SectionToken
- ConversationDetail
- MetricDetail
- SectionDetail
- EvaluationStatus
- Methods
Resource: GeneratorEvaluation
Represents the evaluation result of a generator.

JSON representation:

{
  "name": string,
  "displayName": string,
  "generatorEvaluationConfig": { object (GeneratorEvaluationConfig) },
  "createTime": string,
  "completeTime": string,
  "initialGenerator": { object (Generator) },
  "evaluationStatus": { object (EvaluationStatus) },

  // Union field metrics can be only one of the following:
  "summarizationMetrics": { object (SummarizationEvaluationMetrics) }
  // End of list of possible types for union field metrics.
}

Fields:

- name: Output only. Identifier. The resource name of the evaluation. Format:
- displayName: Optional. The display name of the generator evaluation. At most 64 bytes long.
- generatorEvaluationConfig: Required. The configuration of the evaluation task.
- createTime: Output only. Creation time of this generator evaluation. Uses RFC 3339, where generated output is always Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z", "2014-10-02T15:01:23+05:30".
- completeTime: Output only. Completion time of this generator evaluation. Uses RFC 3339, where generated output is always Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z", "2014-10-02T15:01:23+05:30".
- initialGenerator: Required. The initial generator that was used when creating this evaluation. This is a copy of the generator read from storage when the evaluation was created.
- evaluationStatus: Output only. The result status of the evaluation pipeline: whether the evaluation is still in progress, has completed, or has failed with an error and a user-actionable message.

Union field metrics. Metrics details. metrics can be only one of the following:

- summarizationMetrics: Output only. Only available when a summarization generator is provided.
GeneratorEvaluationConfig
Generator evaluation input config.

JSON representation:

{
  "inputDataConfig": { object (InputDataConfig) },
  "outputGcsBucketPath": string,

  // Union field evaluation_feature_config can be only one of the following:
  "summarizationConfig": { object (SummarizationConfig) }
  // End of list of possible types for union field evaluation_feature_config.
}

Fields:

- inputDataConfig: Required. The config/source of the input data.
- outputGcsBucketPath: Required. The output Cloud Storage bucket path for storing evaluation files, e.g. the per-summary accuracy score report. This path is provided by the customer, and the files stored in it are visible to the customer; no internal data should be stored in this path.

Union field evaluation_feature_config. The feature used for evaluation. evaluation_feature_config can be only one of the following:

- summarizationConfig: Evaluation configs for the summarization generator.
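Putting the fields above together, a request body for a summarization evaluation could be assembled as follows. This is a minimal sketch using the field names documented on this page; the bucket path, timestamps, sample size, and helper name are hypothetical examples, not values from this reference.

```python
# Sketch of a GeneratorEvaluationConfig request body. All concrete
# values (bucket, timestamps, sample size) are placeholder examples.
def build_summarization_eval_config(bucket_path, start_time, end_time):
    return {
        "inputDataConfig": {
            "inputDataSourceType": "AGENT_ASSIST_CONVERSATIONS",
            "sampleSize": 100,
            "summaryGenerationOption": "GENERATE_IF_MISSING",
            # Union field source_specific_config:
            "agentAssistInputDataConfig": {
                "startTime": start_time,
                "endTime": end_time,
            },
        },
        "outputGcsBucketPath": bucket_path,
        # Union field evaluation_feature_config:
        "summarizationConfig": {
            "enableAccuracyEvaluation": True,
            "enableCompletenessEvaluation": True,
        },
    }

config = build_summarization_eval_config(
    "gs://my-eval-bucket/results",  # hypothetical bucket
    "2024-01-01T00:00:00Z",
    "2024-01-31T23:59:59Z",
)
```

Note that exactly one member of each union field (source_specific_config, evaluation_feature_config) may be set.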
InputDataConfig
Input data config details.

JSON representation:

{
  "inputDataSourceType": enum (InputDataSourceType),
  "startTime": string,
  "endTime": string,
  "sampleSize": integer,
  "isSummaryGenerationAllowed": boolean,
  "summaryGenerationOption": enum (SummaryGenerationOption),

  // Union field source_specific_config can be only one of the following:
  "agentAssistInputDataConfig": { object (AgentAssistInputDataConfig) },
  "datasetInputDataConfig": { object (DatasetInputDataConfig) }
  // End of list of possible types for union field source_specific_config.
}

Fields:

- inputDataSourceType: Required. The source type of the input data.
- startTime: Optional. The start timestamp for fetching conversation data. Uses RFC 3339, where generated output is always Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z", "2014-10-02T15:01:23+05:30".
- endTime: Optional. The end timestamp for fetching conversation data. Uses RFC 3339, where generated output is always Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z", "2014-10-02T15:01:23+05:30".
- sampleSize: Optional. Desired number of conversation-summary pairs to be evaluated.
- isSummaryGenerationAllowed: Optional. Whether summary generation is allowed when the pre-existing qualified summaries are insufficient to cover the sample size.
- summaryGenerationOption: Optional. Option to control whether summaries are generated during evaluation.

Union field source_specific_config. The source-specific config for the input data. source_specific_config can be only one of the following:

- agentAssistInputDataConfig: The distinctive configs for Agent Assist conversations as the conversation source.
- datasetInputDataConfig: The distinctive configs for a dataset as the conversation source.
InputDataSourceType
Enumeration of input data source types.

Enums:

- INPUT_DATA_SOURCE_TYPE_UNSPECIFIED: Unspecified InputDataSourceType. Should not be used.
- AGENT_ASSIST_CONVERSATIONS: Fetch data from Agent Assist storage. If this source type is chosen, inputDataConfig.start_time and inputDataConfig.end_time must be provided.
- INSIGHTS_CONVERSATIONS: Fetch data from Insights storage. If this source type is chosen, inputDataConfig.start_time and inputDataConfig.end_time must be provided.
SummaryGenerationOption
Summary generation options.

Enums:

- SUMMARY_GENERATION_OPTION_UNSPECIFIED: Default option. Should not be used.
- ALWAYS_GENERATE: Always generate summaries for all conversations.
- GENERATE_IF_MISSING: Generate only missing summaries.
- DO_NOT_GENERATE: Do not generate new summaries; only use existing summaries found.
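The semantics of the three usable options can be made concrete with a small client-side sketch. This mirrors the enum descriptions above; it is an illustration, not the actual backend logic.

```python
# Illustrative view of how each SummaryGenerationOption treats a
# conversation that may or may not already have a summary.
def needs_generation(option, has_existing_summary):
    if option == "ALWAYS_GENERATE":
        # Generate a fresh summary regardless of existing ones.
        return True
    if option == "GENERATE_IF_MISSING":
        # Generate only when no summary exists yet.
        return not has_existing_summary
    if option == "DO_NOT_GENERATE":
        # Never generate; rely solely on existing summaries.
        return False
    raise ValueError(f"unsupported option: {option}")
```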
AgentAssistInputDataConfig
The distinctive configs for Agent Assist conversations as the conversation source.

JSON representation:

{
  "startTime": string,
  "endTime": string
}

Fields:

- startTime: Required. The start of the time range for conversations to be evaluated. Only conversations created at or after this timestamp will be sampled. Uses RFC 3339, where generated output is always Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z", "2014-10-02T15:01:23+05:30".
- endTime: Required. The end of the time range for conversations to be evaluated. Only conversations ended at or before this timestamp will be sampled. Uses RFC 3339, where generated output is always Z-normalized and uses 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z", "2014-10-02T15:01:23+05:30".
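The timestamp fields above accept Z-normalized RFC 3339 strings. A small sketch of producing one in Python (the specific date is an arbitrary example):

```python
from datetime import datetime, timezone

# Produce a Z-normalized RFC 3339 timestamp suitable for startTime/endTime.
def to_rfc3339(dt):
    # Normalize to UTC, then replace the "+00:00" suffix with "Z".
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")

start = to_rfc3339(datetime(2024, 1, 1, tzinfo=timezone.utc))
# start == "2024-01-01T00:00:00Z"
```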
DatasetInputDataConfig
The distinctive configs for a dataset as the conversation source.

JSON representation:

{
  "dataset": string
}

Fields:

- dataset: Required. The identifier of the dataset to be evaluated. Format:
SummarizationConfig
Evaluation configs for the summarization generator.

JSON representation:

{
  "enableAccuracyEvaluation": boolean,
  "accuracyEvaluationVersion": string,
  "enableCompletenessEvaluation": boolean,
  "completenessEvaluationVersion": string
}

Fields:

- enableAccuracyEvaluation: Optional. Enables accuracy evaluation.
- accuracyEvaluationVersion: Optional. Version for summarization accuracy. This determines the prompt and model used by the backend.
- enableCompletenessEvaluation: Optional. Enables completeness evaluation.
- completenessEvaluationVersion: Optional. Version for summarization completeness. This determines the prompt and model used by the backend.
SummarizationEvaluationMetrics
Evaluation metrics for the summarization generator.

JSON representation:

{
  "summarizationEvaluationResults": [ { object (SummarizationEvaluationResult) } ],
  "summarizationEvaluationMergedResultsUri": string,
  "overallMetrics": [ { object (OverallScoresByMetric) } ],
  "overallSectionTokens": [ { object (SectionToken) } ],
  "conversationDetails": [ { object (ConversationDetail) } ]
}

Fields:

- summarizationEvaluationResults[]: Output only. A list of evaluation results per conversation (and summary), metric, and section.
- summarizationEvaluationMergedResultsUri: Output only. User bucket URI for the merged evaluation score and aggregation score CSV.
- overallMetrics[]: Output only. A list of aggregated (average) scores per metric and section.
- overallSectionTokens[]: Output only. Overall tokens per section. This is an aggregated (sum) result of the input tokens of the summary across all conversations selected for summarization evaluation.
- conversationDetails[]: Output only. List of conversation details.
SummarizationEvaluationResult
Evaluation result per conversation (and summary), metric, and section.

JSON representation:

{
  "sessionId": string,
  "metric": string,
  "section": string,
  "score": number,
  "sectionSummary": string,
  "decompositions": [ { object (Decomposition) } ],
  "evaluationResults": [ { object (EvaluationResult) } ]
}

Fields:

- sessionId: Output only. Conversation session ID.
- metric: Output only. Metric name, e.g. accuracy, completeness, adherence.
- section: Output only. Section/task name, e.g. action, situation.
- score: Output only. Score calculated from the decompositions.
- sectionSummary: Output only. Summary of this section.
- decompositions[]: Output only. List of decomposition details.
- evaluationResults[]: Output only. List of evaluation results.
Decomposition
Decomposition details.

JSON representation:

{
  // Union field decomposition can be only one of the following:
  "accuracyDecomposition": { object (AccuracyDecomposition) },
  "adherenceDecomposition": { object (AdherenceDecomposition) }
  // End of list of possible types for union field decomposition.
}

Fields:

Union field decomposition. One of the decomposition details. decomposition can be only one of the following:

- accuracyDecomposition: Only available for the accuracy metric.
- adherenceDecomposition: Only available for the adherence metric.
AccuracyDecomposition
Decomposition details for accuracy.

JSON representation:

{
  "point": string,
  "accuracyReasoning": string,
  "isAccurate": boolean
}

Fields:

- point: Output only. The breakdown point of the summary.
- accuracyReasoning: Output only. The accuracy reasoning for the breakdown point.
- isAccurate: Output only. Whether the breakdown point is accurate.
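The reference states that a SummarizationEvaluationResult score is "calculated from decompositions" without giving the formula. One natural candidate, shown here purely as an assumption, is the fraction of breakdown points judged accurate:

```python
# Hypothetical aggregation: the exact backend formula for deriving a
# score from decompositions is not specified in this reference; the
# fraction of accurate breakdown points is one plausible reading.
def accuracy_score(decompositions):
    if not decompositions:
        return None  # no points to judge
    accurate = sum(1 for d in decompositions if d["isAccurate"])
    return accurate / len(decompositions)

points = [
    {"point": "Customer asked about a refund.", "isAccurate": True},
    {"point": "Agent promised a callback.", "isAccurate": True},
    {"point": "Order number was #123.", "isAccurate": False},
]
# accuracy_score(points) == 2/3
```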
AdherenceDecomposition
Decomposition details for adherence.

JSON representation:

{
  "point": string,
  "adherenceReasoning": string,
  "isAdherent": boolean
}

Fields:

- point: Output only. The breakdown point of the given instructions.
- adherenceReasoning: Output only. The adherence reasoning for the breakdown point.
- isAdherent: Output only. Whether the breakdown point is adherent.
EvaluationResult
Evaluation result that contains one of the accuracy, adherence, or completeness evaluation results.

JSON representation:

{
  // Union field result can be only one of the following:
  "accuracyDecomposition": { object (AccuracyDecomposition) },
  "adherenceRubric": { object (AdherenceRubric) },
  "completenessRubric": { object (CompletenessRubric) }
  // End of list of possible types for union field result.
}

Fields:

Union field result. One of the evaluation result details. result can be only one of the following:

- accuracyDecomposition: Only available for the accuracy metric.
- adherenceRubric: Only available for the adherence metric.
- completenessRubric: Only available for the completeness metric.
AdherenceRubric
Rubric result of the adherence evaluation. A rubric is used to determine whether the summary adheres to all aspects of the given instructions.

JSON representation:

{
  "question": string,
  "reasoning": string,
  "isAddressed": boolean
}

Fields:

- question: Output only. The question generated from the instruction, used to evaluate the summary.
- reasoning: Output only. The reasoning for whether the rubric question is addressed.
- isAddressed: Output only. Whether the rubric question is addressed.
CompletenessRubric
Rubric details of the completeness evaluation result.

JSON representation:

{
  "question": string,
  "isAddressed": boolean
}

Fields:

- question: Output only. The question generated from the instruction, used to evaluate the summary.
- isAddressed: Output only. Whether the rubric question is addressed.
OverallScoresByMetric
Overall performance per metric. This is the aggregated score for each metric across all conversations selected for summarization evaluation.

JSON representation:

{
  "metric": string
}

Fields:

- metric: Output only. Metric name, e.g. accuracy, adherence, completeness.
SectionToken
A pair of a section name and the input token count of the input summary section.

JSON representation:

{
  "section": string,
  "tokenCount": string
}

Fields:

- section: Output only. The name of the summary instruction.
- tokenCount: Output only. Token count.
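Note that tokenCount is typed as a string in the JSON representation (64-bit integer values are conventionally serialized as JSON strings), so a client must parse it before aggregating. A sketch of summing token counts per section, with made-up example data:

```python
from collections import defaultdict

# tokenCount arrives as a string in the JSON representation, so parse
# it with int() before summing per section.
def total_tokens_by_section(section_tokens):
    totals = defaultdict(int)
    for entry in section_tokens:
        totals[entry["section"]] += int(entry["tokenCount"])
    return dict(totals)

tokens = [
    {"section": "situation", "tokenCount": "120"},
    {"section": "action", "tokenCount": "80"},
    {"section": "situation", "tokenCount": "95"},
]
# total_tokens_by_section(tokens) == {"situation": 215, "action": 80}
```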
ConversationDetail
Aggregated evaluation result at the conversation level. This contains the evaluation results for all metrics and sections.

JSON representation:

{
  "messageEntries": [ { object (MessageEntry) } ],
  "summarySections": [ { object (SummarySection) } ],
  "metricDetails": [ { object (MetricDetail) } ],
  "sectionTokens": [ { object (SectionToken) } ]
}

Fields:

- messageEntries[]: Output only. The conversation transcript used for summarization evaluation, as a reference.
- summarySections[]: Output only. The summary sections used for summarization evaluation, as a reference.
- metricDetails[]: Output only. List of metric details.
- sectionTokens[]: Output only. Conversation-level token count per section. This is an aggregated (sum) result of the input tokens of the summary across all metrics for a single conversation.
MetricDetail
Aggregated result at the metric level. This contains the evaluation results for all sections.

JSON representation:

{
  "metric": string,
  "sectionDetails": [ { object (SectionDetail) } ],
  "score": number
}

Fields:

- metric: Output only. Metric name, e.g. accuracy, adherence, completeness.
- sectionDetails[]: Output only. List of section details.
- score: Output only. Aggregated (average) score for this metric across all sections.
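The metric-level score is documented as an aggregated (average) score across all sections; a direct sketch of that aggregation (the example section names and scores are invented):

```python
# Average the per-section scores into the metric-level score, as
# described above. Sections without a score are skipped.
def metric_score(section_details):
    scores = [s["score"] for s in section_details if s.get("score") is not None]
    return sum(scores) / len(scores) if scores else None

sections = [
    {"section": "situation", "score": 1.0},
    {"section": "action", "score": 0.5},
]
# metric_score(sections) == 0.75
```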
SectionDetail
Section-level result.

JSON representation:

{
  "section": string,
  "sectionSummary": string,
  "evaluationResults": [ { object (EvaluationResult) } ],
  "score": number
}

Fields:

- section: Output only. The name of the summary instruction.
- sectionSummary: Output only. Summary for this section.
- evaluationResults[]: Output only. List of evaluation results. The list contains only one kind of evaluation result.
- score: Output only. Aggregated (average) score for this section across all evaluation results, either decompositions or rubrics.
EvaluationStatus
A common evaluation pipeline status.

JSON representation:

{
  "pipelineStatus": { object (Status) },
  "done": boolean
}

Fields:

- pipelineStatus: Output only. The error result of the evaluation in case of a failure in the evaluation pipeline.
- done: Output only. If the value is false, the evaluation is still in progress; if true, the evaluation has completed.
Methods:

- create: Creates an evaluation of a generator.
- delete: Deletes an evaluation of a generator.
- get: Gets an evaluation of a generator.
- list: Lists evaluations of a generator.