Understanding your model's performance is an important part of managing machine learning models. Monitoring your model's traffic patterns, error rates, latency, and resource utilization helps you spot problems with your models and choose a machine type that optimizes latency and cost.
AI Platform Prediction exports these metrics to Cloud Monitoring, where you can also configure alerts based on them. For example, you can receive an alert if model prediction latency gets too high.
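As a concrete illustration, a latency-style alert can be expressed as an alert-policy body for the Cloud Monitoring API (v3 REST shape). The sketch below is a minimal, hypothetical example built around the accelerator duty-cycle metric named later in this page; the field names follow the `AlertPolicy` resource, but treat the exact threshold, duration, and aggregation choices as assumptions to adapt to your own model.

```python
import json


def duty_cycle_alert_policy(threshold: float = 0.8) -> dict:
    """Sketch of a Cloud Monitoring AlertPolicy body (v3 REST shape)
    that fires when accelerator duty cycle stays above `threshold`
    for five minutes. Values here are illustrative, not prescriptive."""
    return {
        "displayName": "High accelerator duty cycle",
        "combiner": "OR",
        "conditions": [{
            "displayName": "duty_cycle above threshold",
            "conditionThreshold": {
                # Metric type as exported by AI Platform Prediction.
                "filter": (
                    'metric.type = '
                    '"ml.googleapis.com/prediction/online/accelerator/duty_cycle"'
                ),
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,
                "duration": "300s",
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_MEAN",
                }],
            },
        }],
    }


print(json.dumps(duty_cycle_alert_policy(), indent=2))
```

You would send a body like this to the `projects.alertPolicies.create` method (or build the equivalent policy in the Cloud Monitoring console).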
Each AI Platform Prediction metric type includes "prediction" in its name; for example, ml.googleapis.com/prediction/online/replicas or ml.googleapis.com/prediction/online/accelerator/duty_cycle.
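Because the metric types share this prefix, you can discover all of them with a single `metricDescriptors.list` call. The helper below is a small sketch that builds the filter string in Cloud Monitoring's filter syntax; the prefix is taken from the metric names above.

```python
def prediction_metric_filter(prefix: str = "ml.googleapis.com/prediction/") -> str:
    """Filter string for the Monitoring API's metricDescriptors.list
    method that matches every AI Platform Prediction metric type."""
    return f'metric.type = starts_with("{prefix}")'


print(prediction_metric_filter())
# → metric.type = starts_with("ml.googleapis.com/prediction/")
```

Pass this string as the `filter` parameter of the list call to enumerate the available prediction metrics for your project.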
Monitoring performance metrics
You can find information about the traffic patterns, errors, and latency of your model in the Google Cloud console. The following charts are available on the Version Details page, on the Performance tab:
- Predictions: The number of predictions per second, across both online and batch prediction. If a request contains more than one instance, each instance is counted in this chart.
- Errors: The rate of errors that your model is producing. A high error rate usually indicates a problem with the model or with the requests sent to it. Check the response codes to determine which errors are occurring.
- Model latency and Total latency: The latency of your model. Total latency is the total time a request spends in the service; model latency is the time spent performing computation.
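The relationship between the two latency metrics follows directly from their definitions: the difference between total latency and model latency is the time a request spends in the service outside model computation (for example, request handling and queueing). The sketch below makes that arithmetic explicit using hypothetical sample values.

```python
def service_overhead_ms(total_ms: list[float], model_ms: list[float]) -> list[float]:
    """Per-request time spent in the service outside model computation:
    total latency minus model latency, for paired samples."""
    return [round(t - m, 3) for t, m in zip(total_ms, model_ms)]


# Hypothetical latency samples, in milliseconds.
print(service_overhead_ms([120.0, 95.5], [100.0, 90.5]))  # → [20.0, 5.0]
```

If this overhead grows while model latency stays flat, the bottleneck is likely in the serving path rather than in the model itself.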
To view the performance charts, follow these steps:
1. Go to the AI Platform Prediction Models page in the Google Cloud console.
2. Click the name of your model in the list to go to the Model Details page.
3. Click the name of your version in the list to go to the Version Details page.
4. If it's not already selected, click the Performance tab.
5. Scroll to view each of the charts.
Monitoring resource consumption
Resource utilization charts for your model versions that use Compute Engine (N1) machine types are available in the Google Cloud console. The following charts are available on the Version Details page, on the Resource usage tab:
- Replica: The number of replicas for your version. If you are using manual scaling, this chart shows the number of nodes that you chose when you deployed or last updated the version. If you have enabled auto-scaling, the chart shows how the version's replica count changes over time in response to changes in traffic.
- CPU usage, Memory usage, Accelerator average duty cycle, and Accelerator memory usage: The version's CPU, GPU, and memory utilization, per replica.
- Network bytes sent and Network bytes received: The version's network usage, measured in bytes per second.
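The same data behind these charts can be fetched programmatically with a `timeSeries.list` call scoped to one model version. The sketch below builds such a filter for the replicas metric named earlier on this page; note that the resource label names (`model_id`, `version_id`) are assumptions here, so check the monitored-resource reference for the exact labels before using this against your project.

```python
def replica_count_filter(model_id: str, version_id: str) -> str:
    """Sketch of a Monitoring API timeSeries.list filter for the
    replica-count metric of a single model version. The resource
    label names below are assumptions; verify them against the
    monitored-resource reference."""
    return (
        'metric.type = "ml.googleapis.com/prediction/online/replicas"'
        f' AND resource.labels.model_id = "{model_id}"'
        f' AND resource.labels.version_id = "{version_id}"'
    )


print(replica_count_filter("my_model", "v1"))
```

Fetching this series alongside CPU or duty-cycle utilization lets you check whether auto-scaling is keeping pace with traffic.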
To view the resource utilization charts, follow these steps:
1. Go to the AI Platform Prediction Models page in the Google Cloud console.
2. Click the name of your model in the list to go to the Model Details page.
3. Click the name of your version in the list to go to the Version Details page.
4. Click the Resource usage tab.
5. Scroll to view each of the charts.
What's next
- Troubleshoot problems with your model version.
- Select a machine type to decrease latency or costs.