Model monitoring metrics

Generative AI on Vertex AI provides a prebuilt model observability dashboard that shows the behavior, health, and performance of fully managed models. Fully managed models, also known as Model as a Service (MaaS), are models that Google provides and hosts, such as Google's Gemini models and partner models with managed endpoints. The dashboard doesn't include metrics from self-hosted models.

Generative AI on Vertex AI automatically collects and reports activity from MaaS models to help you troubleshoot latency issues and monitor capacity.

Figure: Model observability dashboard example in the Google Cloud console

Why use the model observability dashboard

The model observability dashboard helps you understand the performance and usage of your models. As an application developer, you can use the dashboard for the following tasks:

  • Monitor user interaction: View trends in model usage, such as requests per second and invocation latencies, to understand how users interact with your models.
  • Estimate costs: Use model usage metrics to approximate the costs associated with running each model.
  • Troubleshoot issues: Diagnose problems by monitoring API error rates, first token latencies, and token throughput to verify that models are responding reliably and efficiently.
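As a rough illustration of the cost-estimation task, the following sketch multiplies token counts read from the dashboard by per-token prices. The `TokenPrice` class, the `estimate_cost` function, and the prices used in the comments are hypothetical and are not part of any Vertex AI API. Actual prices vary by model and should be taken from the official pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price per one million tokens, in US dollars."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Approximate the cost of a usage window from its token counts.

    This is only an approximation: it ignores discounts, cached tokens,
    and any other billing rules that a real invoice might apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


# Example with made-up prices: 2M input tokens and 0.5M output tokens.
example_price = TokenPrice(input_per_million=0.50, output_per_million=1.50)
example_cost = estimate_cost(2_000_000, 500_000, example_price)
```

With these made-up prices, `example_cost` works out to 1.75 US dollars (1.00 for input tokens plus 0.75 for output tokens).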

Available monitoring metrics

The model observability dashboard displays a subset of the metrics that Cloud Monitoring collects. Key metrics include the following:

  • Model requests per second (QPS)
  • Token throughput
  • First token latencies
  • API error rates

To see all available metrics and their descriptions, see the "aiplatform" section on the Google Cloud metrics page.
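To make the key metrics listed above concrete, the following sketch shows how rate-style values can be derived from raw counts over a time window. The function names and the input values are hypothetical. The dashboard computes its own aggregations, which might use different windows or alignment, so treat this as an illustration of the arithmetic only.

```python
def requests_per_second(request_count: int, window_seconds: float) -> float:
    """Average request rate (QPS) over a window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return request_count / window_seconds


def token_throughput(token_count: int, window_seconds: float) -> float:
    """Average number of tokens processed per second over a window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return token_count / window_seconds


def error_rate(error_count: int, request_count: int) -> float:
    """Fraction of requests that returned an error (0.0 when idle)."""
    if request_count == 0:
        return 0.0
    return error_count / request_count


# Example: 600 requests, 3 errors, and 90,000 tokens in a 60-second window.
qps = requests_per_second(600, 60)        # 10 requests per second
throughput = token_throughput(90_000, 60) # 1,500 tokens per second
errors = error_rate(3, 600)               # 0.5% of requests failed
```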

Limitations

Vertex AI captures dashboard metrics only for API calls to a model's endpoint. The dashboard doesn't include metrics from usage in the Google Cloud console, such as requests made in Vertex AI Studio.

View the dashboard

  1. In the Vertex AI section of the Google Cloud console, go to the Dashboard page.

  2. In the Model observability section, click Show all metrics to view the model observability dashboard in the Google Cloud Observability console.

  3. To view metrics for a specific model or in a particular location, set one or more filters at the top of the dashboard page.
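The filtering in step 3 has a rough equivalent in Cloud Monitoring filter expressions, which combine a metric type with resource label conditions. The following sketch only builds such a filter string and doesn't call any API. The `build_metric_filter` helper is hypothetical, and the metric type and the `location` label in the example are assumptions that should be checked against the "aiplatform" section of the Google Cloud metrics page before use.

```python
from typing import Mapping, Optional


def build_metric_filter(
    metric_type: str,
    resource_labels: Optional[Mapping[str, str]] = None,
) -> str:
    """Build a Cloud Monitoring filter string for one metric type.

    Label conditions are joined with AND and sorted by label name so that
    the output is deterministic.
    """
    def quote(value: str) -> str:
        # Keep the sketch simple: reject values that would need escaping.
        if '"' in value:
            raise ValueError("values must not contain double quotes")
        return f'"{value}"'

    clauses = [f"metric.type = {quote(metric_type)}"]
    for name, value in sorted((resource_labels or {}).items()):
        clauses.append(f"resource.labels.{name} = {quote(value)}")
    return " AND ".join(clauses)


# Assumed metric type and label name; verify both on the metrics page.
example_filter = build_metric_filter(
    "aiplatform.googleapis.com/publisher/online_serving/model_invocation_count",
    {"location": "us-central1"},
)
```

A string built this way could then be passed as the filter when listing time series through the Cloud Monitoring API or a client library.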

Additional resources