When Data Labeling Service runs an evaluation job, it produces a set of evaluation metrics that vary depending on the specifics of your machine learning model. This guide describes the different types of evaluation metrics and how you can view them.
Before you begin
Before you begin, create an evaluation job and wait for it to run for the first time. By default, your evaluation job runs daily at 10:00 AM UTC.
When the job runs, it first sends data to human reviewers for ground truth labeling (if you have enabled this option). Then it calculates evaluation metrics. Because human labeling takes time, you might need to wait more than a day to see your first evaluation metrics if your job samples a large amount of data.
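If you want to confirm programmatically that your evaluation job exists and is scheduled, you can query it through the Data Labeling API. The following is a minimal sketch that assumes the google-cloud-datalabeling Python client library; the project and job IDs are placeholders.

```python
# Sketch: check the state of an existing evaluation job before looking for
# metrics. Assumes the google-cloud-datalabeling client library is installed;
# the project and job IDs below are placeholders.
from google.cloud import datalabeling_v1beta1 as datalabeling

client = datalabeling.DataLabelingServiceClient()

# Evaluation job names have the form
# projects/{project_id}/evaluationJobs/{evaluation_job_id}.
job_name = "projects/your-project-id/evaluationJobs/your-job-id"

# Older client versions may take the name as a positional argument instead.
job = client.get_evaluation_job(request={"name": job_name})
print("State:", job.state)        # for example, RUNNING once the job is scheduled
print("Schedule:", job.schedule)  # cron-style schedule; daily by default
```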
Compare mean average precision across models
In AI Platform Prediction, multiple model versions can be grouped together in a model resource. All model versions within a model should perform the same task, but each version can be trained differently.
If you have multiple model versions in a single model and have created an evaluation job for each one, you can view a chart comparing the mean average precision of the model versions over time:
Open the AI Platform models page in the Google Cloud console.
Click on the name of the model containing the model versions that you want to compare.
Click on the Evaluation tab.
The chart on this page compares the mean average precision of each model version over time. You can change the time interval of the chart.
If any of the evaluation jobs for these model versions had an error during a recent run, the error also appears on this page.
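For reference, mean average precision is commonly computed as the unweighted mean of the per-label average precision values. The following sketch illustrates that definition with scikit-learn on hypothetical scores; it is not the service's internal implementation, just a way to see what the chart is plotting.

```python
# Sketch: deriving mean average precision (mAP) from per-label average
# precision, using scikit-learn on hypothetical one-hot ground truth and
# model confidence scores for three labels.
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])
y_scores = np.array([[0.80, 0.15, 0.05],
                     [0.30, 0.60, 0.10],
                     [0.20, 0.20, 0.60],
                     [0.55, 0.30, 0.15]])

# Average precision per label, then the unweighted mean across labels.
per_label_ap = [average_precision_score(y_true[:, i], y_scores[:, i])
                for i in range(y_true.shape[1])]
mean_ap = float(np.mean(per_label_ap))
print("Per-label AP:", per_label_ap)
print("Mean average precision:", mean_ap)
```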
View metrics for a specific model version
For more detailed evaluation metrics, view a single model version:
Open the AI Platform models page in the Google Cloud console.
Click on the name of the model containing the model version that you are interested in.
Click on the name of the model version that you are interested in.
Click on the Evaluation tab.
Similar to the comparison view discussed in the preceding section, this page has a chart of mean average precision over time. It also displays any errors from your model version's recent evaluation job runs.
Enter a date in the Enter date field to view metrics from an individual evaluation job run. You can also click All labels and select a specific label from the drop-down list to filter the metrics further. The following sections describe the metrics you can view for individual evaluation job runs.
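You can also retrieve the metrics for each evaluation job run through the Data Labeling API rather than the console. The following is a minimal sketch that assumes the google-cloud-datalabeling Python client; the project ID is a placeholder, and you can narrow the results with a filter (see the searchEvaluations API reference for the supported filter syntax).

```python
# Sketch: fetch the stored metrics for evaluation job runs through the
# Data Labeling API instead of the console. The project ID below is a
# placeholder.
from google.cloud import datalabeling_v1beta1 as datalabeling

client = datalabeling.DataLabelingServiceClient()
parent = "projects/your-project-id"

# Each Evaluation corresponds to one evaluation job run and carries the
# computed metrics (precision-recall curve, confusion matrix, and so on).
response = client.search_evaluations(request={"parent": parent})
for evaluation in response:
    print(evaluation.name)
    print(evaluation.evaluation_job_run_time)
    print(evaluation.evaluation_metrics)
```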
Precision-recall curve
Precision-recall curves show how your machine learning model's precision and recall would change if you adjusted its classification threshold.
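As a concrete illustration, the following sketch uses scikit-learn on hypothetical scores for a single label to show how precision and recall move as the threshold changes; the numbers are made up for the example.

```python
# Sketch: how a precision-recall curve relates precision and recall to the
# classification threshold, using scikit-learn on hypothetical scores.
from sklearn.metrics import precision_recall_curve

# Hypothetical binary ground truth and model confidence scores for one label.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
for p, r, t in zip(precision, recall, thresholds):
    # Raising the threshold generally trades recall for precision.
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```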
Confusion matrix
Confusion matrixes show all pairs of ground truth labels and predicted labels, so you can see patterns of how your machine learning model confuses certain labels with others.
Confusion matrixes are only generated for model versions that perform classification.
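The following sketch builds a small confusion matrix with scikit-learn from hypothetical labels, just to show how the rows (ground truth) and columns (predictions) are read.

```python
# Sketch: a confusion matrix pairs ground truth labels with predicted labels,
# built here with scikit-learn from hypothetical classification results.
from sklearn.metrics import confusion_matrix

ground_truth = ["cat", "dog", "cat", "bird", "dog", "cat", "bird"]
predictions  = ["cat", "dog", "dog", "bird", "dog", "cat", "cat"]

labels = ["bird", "cat", "dog"]
matrix = confusion_matrix(ground_truth, predictions, labels=labels)

# Rows are ground truth labels, columns are predicted labels; off-diagonal
# cells show which labels the model confuses with which.
print(labels)
print(matrix)
```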
Side-by-side comparison
If your model version performs image classification or text classification, you can view a side-by-side comparison of your machine learning model's predicted labels and the ground truth labels for each prediction input.
If your model version performs image object detection, you can view a side-by-side comparison of your machine learning model's predicted bounding boxes and the ground truth bounding boxes. Hover over the bounding boxes to see the associated labels.
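A common way to quantify how closely a predicted bounding box matches a ground truth box is intersection over union (IoU). The following plain-Python sketch shows that calculation; the (x_min, y_min, x_max, y_max) box format is an assumption for the example.

```python
# Sketch: intersection over union (IoU), a common score for how well a
# predicted bounding box matches a ground truth box. Boxes are assumed to be
# (x_min, y_min, x_max, y_max) tuples for this illustration.
def iou(box_a, box_b):
    # Coordinates of the intersection rectangle.
    x_min = max(box_a[0], box_b[0])
    y_min = max(box_a[1], box_b[1])
    x_max = min(box_a[2], box_b[2])
    y_max = min(box_a[3], box_b[3])

    inter = max(0, x_max - x_min) * max(0, y_max - y_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A predicted box that mostly overlaps the ground truth box.
print(iou((10, 10, 50, 50), (12, 12, 48, 52)))  # roughly 0.8
```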
What's next
Learn how to update, pause, or delete an evaluation job.