REST Resource: projects.conversationModels.evaluations

Resource: ConversationModelEvaluation

Represents evaluation result of a conversation model.

JSON representation

{
  "name": string,
  "displayName": string,
  "evaluationConfig": {
    object (EvaluationConfig)
  },
  "createTime": string,
  "rawHumanEvalTemplateCsv": string,

  // Union field metrics can be only one of the following:
  "smartReplyMetrics": {
    object (SmartReplyMetrics)
  }
  // End of list of possible types for union field metrics.
}

Fields
`name`	`string` The resource name of the evaluation. Format: `projects/<Project ID>/conversationModels/<Conversation Model ID>/evaluations/<Evaluation ID>`
`displayName`	`string` Optional. The display name of the model evaluation. At most 64 bytes long.
`evaluationConfig`	`object (EvaluationConfig)` Optional. The configuration of the evaluation task.
`createTime`	`string (Timestamp format)` Output only. Creation time of this model. A timestamp in RFC3339 UTC "Zulu" format, with nanosecond resolution and up to nine fractional digits. Examples: `"2014-10-02T15:01:23Z"` and `"2014-10-02T15:01:23.045123456Z"`.
`rawHumanEvalTemplateCsv`	`string` Output only. Human eval template in CSV format. It takes real-world conversations provided through the input dataset and generates example suggestions for the customer to verify the quality of the model. For Smart Reply, the generated CSV file contains the columns Context, (Suggestions,Q1,Q2)*3, Actual reply. Context contains at most the 10 latest messages in the conversation prior to the current suggestion. Q1: "Would you send it as the next message of agent?" Evaluated based on whether the suggestion is appropriate to be sent by the agent in the current context. Q2: "Does the suggestion move the conversation closer to resolution?" Evaluated based on whether the suggestion provides solutions, answers the customer's question, or collects information from the customer to resolve the customer's issue. The Actual reply column contains the actual agent reply sent in the context.
Union field `metrics`. Metrics details. `metrics` can be only one of the following:
`smartReplyMetrics`	`object (SmartReplyMetrics)` Output only. Only available when model is for smart reply.
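As an illustration of the `name` and `createTime` formats described above, the following sketch splits a resource name into its IDs and parses the timestamp. The helper names are hypothetical, the assumption that IDs contain no `/` is ours, and the nanosecond fraction is truncated to microseconds because Python's `datetime` stores no finer resolution.

```python
import re
from datetime import datetime, timezone

# Pattern derived from the documented format. Assumption: IDs contain no "/".
_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/conversationModels/(?P<model>[^/]+)"
    r"/evaluations/(?P<evaluation>[^/]+)$"
)


def parse_evaluation_name(name: str) -> dict:
    """Split an evaluation resource name into project, model, and evaluation IDs."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an evaluation resource name: {name!r}")
    return match.groupdict()


def parse_create_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC "Zulu" timestamp with up to nine fractional digits.

    Digits beyond microsecond precision are dropped.
    """
    if not value.endswith("Z"):
        raise ValueError("expected a UTC timestamp ending in 'Z'")
    body = value[:-1]
    if "." in body:
        seconds, fraction = body.split(".", 1)
        micros = int(fraction[:6].ljust(6, "0"))
    else:
        seconds, micros = body, 0
    parsed = datetime.strptime(seconds, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)
```

Both helpers raise `ValueError` on input that does not match the documented formats, so a malformed response fails loudly rather than silently.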

EvaluationConfig

The configuration for model evaluation.

JSON representation

{
  "datasets": [
    {
      object (InputDataset)
    }
  ],

  // Union field model_specific_config can be only one of the following:
  "smartReplyConfig": {
    object (SmartReplyConfig)
  },
  "smartComposeConfig": {
    object (SmartComposeConfig)
  }
  // End of list of possible types for union field model_specific_config.
}

Fields
`datasets[]`	`object (InputDataset)` Required. Datasets used for evaluation.
Union field `model_specific_config`. Specific configurations for different models in order to do evaluation. `model_specific_config` can be only one of the following:
`smartReplyConfig`	`object (SmartReplyConfig)` Configuration for smart reply model evaluation.
`smartComposeConfig`	`object (SmartComposeConfig)` Configuration for smart compose model evaluation.
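The union constraint on `model_specific_config` can be enforced client-side before a request is sent. The sketch below is one way to do that; `build_evaluation_config` is a hypothetical helper, and each `InputDataset` is treated as an opaque dictionary because its fields are not described on this page.

```python
def build_evaluation_config(datasets, smart_reply_config=None, smart_compose_config=None):
    """Assemble an EvaluationConfig dictionary in the documented JSON shape.

    Hypothetical helper: it only checks the constraints stated on this page,
    namely that `datasets` is required and that at most one member of the
    `model_specific_config` union is set.
    """
    if not datasets:
        raise ValueError("datasets is required")
    if smart_reply_config is not None and smart_compose_config is not None:
        raise ValueError(
            "model_specific_config is a union: set smartReplyConfig or "
            "smartComposeConfig, not both"
        )

    config = {"datasets": list(datasets)}
    for key, value in (
        ("smartReplyConfig", smart_reply_config),
        ("smartComposeConfig", smart_compose_config),
    ):
        if value is None:
            continue
        # maxResultCount is documented as required in both config types.
        if "maxResultCount" not in value:
            raise ValueError(f"{key}.maxResultCount is required")
        config[key] = dict(value)
    return config
```

Passing both configurations raises `ValueError`; passing neither is allowed here, since the page does not say that a member of the union must be set.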

SmartReplyConfig

Smart reply specific configuration for evaluation job.

JSON representation
{
  "allowlistDocument": string,
  "maxResultCount": integer
}

Fields
`allowlistDocument`	`string` The allowlist document resource name. Format: `projects/<Project ID>/knowledgeBases/<Knowledge Base ID>/documents/<Document ID>`. Only used for smart reply model.
`maxResultCount`	`integer` Required. The model to be evaluated can return multiple results for each query, each with a confidence score. These results are sorted in descending order of score, and only the first `maxResultCount` results are kept as the final results to evaluate.
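The effect of `maxResultCount` can be pictured with a short sketch. It assumes each result is a `(text, score)` pair, which is an illustrative representation and not one defined by the API, and `keep_top_results` is a hypothetical name.

```python
def keep_top_results(results, max_result_count):
    """Sort results by descending confidence score and keep the first few.

    `results` is a list of (text, score) pairs. This shape is an assumption
    made for illustration only.
    """
    if max_result_count < 0:
        raise ValueError("max_result_count must be non-negative")
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    return ranked[:max_result_count]
```

For example, with three candidate replies scored 0.2, 0.9, and 0.5 and a `max_result_count` of 2, only the replies scored 0.9 and 0.5 would be evaluated.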

SmartComposeConfig

Smart compose specific configuration for evaluation job.

JSON representation
{
  "allowlistDocument": string,
  "maxResultCount": integer
}

Fields
`allowlistDocument`	`string` The allowlist document resource name. Format: `projects/<Project ID>/knowledgeBases/<Knowledge Base ID>/documents/<Document ID>`. Only used for smart compose model.
`maxResultCount`	`integer` Required. The model to be evaluated can return multiple results for each query, each with a confidence score. These results are sorted in descending order of score, and only the first `maxResultCount` results are kept as the final results to evaluate.

SmartReplyMetrics

The evaluation metrics for smart reply model.

JSON representation
{
  "allowlistCoverage": number,
  "topNMetrics": [
    {
      object (TopNMetrics)
    }
  ],
  "conversationCount": string
}

Fields
`allowlistCoverage`	`number` Fraction of target participant messages in the evaluation dataset for which similar messages have appeared at least once in the allowlist. Value is in the range [0, 1].
`topNMetrics[]`	`object (TopNMetrics)` Metrics of top n smart replies, sorted by `TopNMetrics.n`.
`conversationCount`	`string (int64 format)` Total number of conversations used to generate this metric.
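Because `conversationCount` is an int64 encoded as a JSON string, clients usually convert it before doing arithmetic. The sketch below normalizes a decoded `SmartReplyMetrics` object; `normalize_smart_reply_metrics` is a hypothetical helper, and treating absent fields as zero or empty is an assumption about how default values are serialized.

```python
def normalize_smart_reply_metrics(metrics: dict) -> dict:
    """Convert a decoded SmartReplyMetrics JSON object into plain Python values.

    Assumption: fields missing from the JSON carry their default value
    (0, empty list, "0").
    """
    coverage = float(metrics.get("allowlistCoverage", 0.0))
    if not 0.0 <= coverage <= 1.0:
        raise ValueError(f"allowlistCoverage out of range [0, 1]: {coverage}")

    # The page states the list is sorted by n; re-sorting is a cheap safeguard.
    top_n = sorted(metrics.get("topNMetrics", []), key=lambda m: m["n"])

    return {
        "allowlist_coverage": coverage,
        "top_n_metrics": top_n,
        # int64 values arrive as JSON strings.
        "conversation_count": int(metrics.get("conversationCount", "0")),
    }
```

The returned dictionary uses snake_case keys only to distinguish it from the wire format; nothing in the API requires that naming.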

TopNMetrics

Evaluation metrics when retrieving n smart replies with the model.

JSON representation
{
  "n": integer,
  "recall": number
}

Fields
`n`	`integer` Number of retrieved smart replies. For example, when `n` is 3, this evaluation contains metrics for when Dialogflow retrieves 3 smart replies with the model.
`recall`	`number` Defined as `number of queries whose top n smart replies have at least one similar (token match similarity above the defined threshold) reply as the real reply` divided by `number of queries with at least one smart reply`. Value ranges from 0.0 to 1.0 inclusive.
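The `recall` definition above can be sketched in a few lines. The token match similarity function and its threshold are not specified on this page, so the example substitutes Jaccard overlap of lowercased whitespace tokens and a threshold of 0.5; both are assumptions for illustration, and the function names are hypothetical.

```python
def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercased whitespace tokens (a stand-in measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def recall_at_n(queries, n, threshold=0.5):
    """Compute recall over (ranked_suggestions, actual_reply) pairs.

    Numerator: queries whose top n suggestions include at least one reply
    with similarity above `threshold` to the actual reply.
    Denominator: queries that have at least one suggestion.
    """
    with_suggestions = [(s, actual) for s, actual in queries if s]
    if not with_suggestions:
        return 0.0
    hits = sum(
        1
        for suggestions, actual in with_suggestions
        if any(token_similarity(reply, actual) > threshold for reply in suggestions[:n])
    )
    return hits / len(with_suggestions)
```

Queries with no suggestions are excluded from the denominator, matching the definition, so a model that rarely responds is not penalized by this metric alone.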

Methods
`get`	Gets an evaluation of conversation model.
`list`	Lists evaluations of a conversation model.
