Understanding your model's performance is an important part of managing machine learning models. Monitoring your model's traffic patterns, error rates, latency, and resource utilization helps you spot problems with your models and choose a machine type that optimizes latency and cost.
AI Platform Prediction exports these metrics to Cloud Monitoring, where you can also configure alerts based on them. For example, you can receive an alert if model prediction latency gets too high.
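As a concrete illustration, a latency-style alert can be expressed as an alert-policy body for the Cloud Monitoring API (v3 REST shape). The sketch below is a minimal, hypothetical example built around the accelerator duty-cycle metric named later in this page; the field names follow the `AlertPolicy` resource, but treat the exact threshold, duration, and aggregation choices as assumptions to adapt to your own model.

```python
import json


def duty_cycle_alert_policy(threshold: float = 0.8) -> dict:
    """Sketch of a Cloud Monitoring AlertPolicy body (v3 REST shape)
    that fires when accelerator duty cycle stays above `threshold`
    for five minutes. Values here are illustrative, not prescriptive."""
    return {
        "displayName": "High accelerator duty cycle",
        "combiner": "OR",
        "conditions": [{
            "displayName": "duty_cycle above threshold",
            "conditionThreshold": {
                # Metric type as exported by AI Platform Prediction.
                "filter": (
                    'metric.type = '
                    '"ml.googleapis.com/prediction/online/accelerator/duty_cycle"'
                ),
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,
                "duration": "300s",
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_MEAN",
                }],
            },
        }],
    }


print(json.dumps(duty_cycle_alert_policy(), indent=2))
```

You would send a body like this to the `projects.alertPolicies.create` method (or build the equivalent policy in the Cloud Monitoring console).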
Each AI Platform Prediction metric type includes "prediction" in its name; for example, ml.googleapis.com/prediction/online/replicas or ml.googleapis.com/prediction/online/accelerator/duty_cycle.
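Because the metric types share this prefix, you can discover all of them with a single `metricDescriptors.list` call. The helper below is a small sketch that builds the filter string in Cloud Monitoring's filter syntax; the prefix is taken from the metric names above.

```python
def prediction_metric_filter(prefix: str = "ml.googleapis.com/prediction/") -> str:
    """Filter string for the Monitoring API's metricDescriptors.list
    method that matches every AI Platform Prediction metric type."""
    return f'metric.type = starts_with("{prefix}")'


print(prediction_metric_filter())
# → metric.type = starts_with("ml.googleapis.com/prediction/")
```

Pass this string as the `filter` parameter of the list call to enumerate the available prediction metrics for your project.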
Monitoring performance metrics
You can find information about the traffic patterns, errors, and latency of your model in the Google Cloud console. The following charts are available on the Version Details page, on the Performance tab:
- Predictions: The number of predictions per second, across both online and batch prediction. If a request contains more than one instance, each instance is counted in this chart.
- Errors: The rate of errors that your model is producing. A high error rate usually indicates a problem with the model or with the requests sent to it. Check the response codes to determine which errors are occurring.
- Model latency and Total latency: The latency of your model. Total latency is the total time a request spends in the service; model latency is the time spent performing computation.
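The relationship between the two latency metrics follows directly from their definitions: the difference between total latency and model latency is the time a request spends in the service outside model computation (for example, request handling and queueing). The sketch below makes that arithmetic explicit using hypothetical sample values.

```python
def service_overhead_ms(total_ms: list[float], model_ms: list[float]) -> list[float]:
    """Per-request time spent in the service outside model computation:
    total latency minus model latency, for paired samples."""
    return [round(t - m, 3) for t, m in zip(total_ms, model_ms)]


# Hypothetical latency samples, in milliseconds.
print(service_overhead_ms([120.0, 95.5], [100.0, 90.5]))  # → [20.0, 5.0]
```

If this overhead grows while model latency stays flat, the bottleneck is likely in the serving path rather than in the model itself.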
To view the performance charts, follow these steps:
1. Go to the AI Platform Prediction Models page in the Google Cloud console.
2. Click the name of your model in the list to go to the Model Details page.
3. Click the name of your version in the list to go to the Version Details page.
4. If it's not already selected, click the Performance tab.
5. Scroll to view each of the charts.
Monitoring resource consumption
Resource utilization charts for your model versions that use Compute Engine (N1) machine types are available in the Google Cloud console. The following charts are available on the Version Details page, on the Resource usage tab:
- Replica: The number of replicas for your version. If you are using manual scaling, this chart shows the number of nodes that you chose when you deployed or last updated the version. If you have enabled auto-scaling, the chart shows how the version's replica count changes over time in response to changes in traffic.
- CPU usage, Memory usage, Accelerator average duty cycle, and Accelerator memory usage: The version's CPU, GPU, and memory utilization, per replica.
- Network bytes sent and Network bytes received: The version's network usage, measured in bytes per second.
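The same data behind these charts can be fetched programmatically with a `timeSeries.list` call scoped to one model version. The sketch below builds such a filter for the replicas metric named earlier on this page; note that the resource label names (`model_id`, `version_id`) are assumptions here, so check the monitored-resource reference for the exact labels before using this against your project.

```python
def replica_count_filter(model_id: str, version_id: str) -> str:
    """Sketch of a Monitoring API timeSeries.list filter for the
    replica-count metric of a single model version. The resource
    label names below are assumptions; verify them against the
    monitored-resource reference."""
    return (
        'metric.type = "ml.googleapis.com/prediction/online/replicas"'
        f' AND resource.labels.model_id = "{model_id}"'
        f' AND resource.labels.version_id = "{version_id}"'
    )


print(replica_count_filter("my_model", "v1"))
```

Fetching this series alongside CPU or duty-cycle utilization lets you check whether auto-scaling is keeping pace with traffic.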
To view the resource utilization charts, follow these steps:
1. Go to the AI Platform Prediction Models page in the Google Cloud console.
2. Click the name of your model in the list to go to the Model Details page.
3. Click the name of your version in the list to go to the Version Details page.
4. Click the Resource usage tab.
5. Scroll to view each of the charts.
What's next
- Troubleshoot problems with your model version.
- Select a machine type to decrease latency or costs.