Monitoring model performance and resource usage

After you deploy a model to an endpoint, you can monitor the endpoint to understand your model's performance and resource usage. You can track metrics such as traffic patterns, error rates, latency, and resource utilization to ensure that your model responds to requests consistently and predictably. For example, you might redeploy your model on a different machine type to reduce cost, and then monitor the model to check whether the change adversely affected its performance.

AI Platform (Unified) exports metrics to Cloud Monitoring, where you can create dashboards or configure alerts based on the metrics. For example, you can receive an alert if your model's prediction latency exceeds a threshold. In Cloud Monitoring, the monitored resource type for AI Platform endpoints is aiplatform.googleapis.com/Endpoint.
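
For example, the following sketch uses the Cloud Monitoring Python client to pull the last hour of time series for the endpoint resource type named above. The metric type in the filter (aiplatform.googleapis.com/prediction/online/prediction_count) and the endpoint_id resource label are assumptions for illustration; check the Cloud Monitoring metrics list for the exact names.

```python
# Minimal sketch: query prediction-count time series for the last hour.
# The metric type below is an assumption; verify it in the metrics list.
import time

from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"  # hypothetical placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 3600},  # last hour
    }
)

results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'resource.type = "aiplatform.googleapis.com/Endpoint" AND '
            'metric.type = "aiplatform.googleapis.com/prediction/online/prediction_count"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

# Each series corresponds to one endpoint/model combination.
for series in results:
    print(series.resource.labels["endpoint_id"], len(series.points))
```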

AI Platform (Unified) also displays some model monitoring metrics directly in the Cloud Console. The following sections describe these metrics.

Performance metrics

Performance metrics provide insight into your model's traffic patterns, error rates, and latency. You can view the following performance metrics in the Cloud Console.

  • Predictions per second: The number of predictions per second across both online and batch predictions. If a request contains more than one instance, each instance is counted in this chart.
  • Prediction error percentage: The rate of errors that your model produces. A high error rate might indicate a problem with the model or with the requests sent to it. View the response codes chart to determine which errors are occurring.
  • Total latency duration: The total time that a request spends in the service; that is, the model latency plus the overhead latency. To get notified when this metric climbs, you can alert on it, as shown in the sketch after this list.
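
The following is a minimal sketch of a latency alerting policy using the Cloud Monitoring Python client. It assumes a distribution metric named aiplatform.googleapis.com/prediction/online/prediction_latencies reported in milliseconds; verify the metric type and unit in the metrics list, and attach notification channels to the policy if you want to be notified.

```python
# Minimal sketch: alert when p99 prediction latency stays high.
# Metric type and unit (milliseconds) are assumptions; verify them.
from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"  # hypothetical placeholder

client = monitoring_v3.AlertPolicyServiceClient()

condition = monitoring_v3.AlertPolicy.Condition(
    display_name="p99 prediction latency above 500 ms",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter=(
            'resource.type = "aiplatform.googleapis.com/Endpoint" AND '
            'metric.type = "aiplatform.googleapis.com/prediction/online/prediction_latencies"'
        ),
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=500,  # assuming the metric's unit is milliseconds
        duration={"seconds": 300},  # must stay above threshold for 5 minutes
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period={"seconds": 300},
                # Distribution metric, so align on a percentile.
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_99,
            )
        ],
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="High prediction latency",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
)

created = client.create_alert_policy(
    name=f"projects/{PROJECT_ID}", alert_policy=policy
)
print(created.name)
```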

Viewing performance charts

  1. Go to the AI Platform Endpoints page in the Cloud Console.

  2. Click the name of an endpoint to view its metrics.

    You can select different chart intervals to see metric values over a particular time period, such as 1 hour, 12 hours, or 14 days.

    If you have multiple models deployed to the endpoint, you can select or deselect models to show or hide their metrics. When you select multiple models, the console combines metrics that provide only one value per model, such as prediction error percentage, into a single chart. For metrics that can have multiple values per model, such as response codes, the console provides a separate chart for each model.
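
The console is the quickest way to view these charts. If you also want to query the same metrics programmatically, as in the earlier Cloud Monitoring sketch, you need the endpoint IDs. The following sketch lists them with the AI Platform Python SDK (the google-cloud-aiplatform package); the project ID and region are placeholders.

```python
# Minimal sketch: list endpoints in a project and region.
# Project ID and region below are hypothetical placeholders.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# endpoint.name is the numeric endpoint ID that appears in the
# endpoint_id resource label used in Cloud Monitoring filters.
for endpoint in aiplatform.Endpoint.list():
    print(endpoint.name, endpoint.display_name)
```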