In this post we will review metrics for measuring the results of a classification algorithm.
For this purpose, let's assume we have run a classification algorithm that classifies damaged parts by examining a picture of each part.
We ran the algorithm over 1,000 parts, and it detected 170 damaged parts. This information by itself provides zero visibility into the algorithm's performance. To analyze the result, we need to start with a confusion matrix.
| | Predicted: Valid | Predicted: Damaged |
|---|---|---|
| Actual: Valid | 800 | 40 |
| Actual: Damaged | 30 | 130 |
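As an illustration, a matrix like the one above can be tallied from per-part results with a few lines of Python. This is a minimal sketch; the label values and the sample lists are hypothetical:

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Tally (actual, predicted) label pairs into confusion-matrix counts."""
    return Counter(zip(actual, predicted))

# Hypothetical labels for a handful of parts
actual    = ["valid", "valid", "damaged", "damaged", "valid"]
predicted = ["valid", "damaged", "damaged", "valid", "valid"]

counts = confusion_counts(actual, predicted)
print(counts[("valid", "valid")])      # actual valid, predicted valid
print(counts[("valid", "damaged")])    # actual valid, predicted damaged
print(counts[("damaged", "valid")])    # actual damaged, predicted valid
print(counts[("damaged", "damaged")])  # actual damaged, predicted damaged
```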
The confusion matrix inspects two aspects of each part:
- What is the actual part status: Valid or Damaged?
- What is the algorithm's classification: Valid or Damaged?
To speak the same language regardless of the data domain, we use the terms True/False and Positive/Negative.
- True - means the algorithm's classification is correct
- False - means the algorithm's classification is wrong
- Positive - means the algorithm marked the part as damaged
- Negative - means the algorithm marked the part as valid
Notice that Positive/Negative can be defined the other way around, depending on how the algorithm's purpose is defined. In this case we defined the purpose as "to classify damaged parts", hence we treat Positive as classifying a part as damaged.
We can rewrite the confusion matrix using these terms:
| | Predicted: Negative | Predicted: Positive |
|---|---|---|
| Actual: Negative | True Negative = 800 | False Positive = 40 |
| Actual: Positive | False Negative = 30 | True Positive = 130 |
And so, the terms are:
- True Negative - correct detection as a valid part
- True Positive - correct detection as a damaged part
- False Negative - wrong detection of a damaged part as a valid part
- False Positive - wrong detection of a valid part as a damaged part
To estimate the performance, we can use several more metrics that are based on the confusion matrix.
Precision = TP / (TP+FP)
In other words, precision is the fraction of parts classified as damaged that are actually damaged.
Recall = TP / (TP+FN)
In other words, recall is the fraction of actually damaged parts that the algorithm detected as damaged.
And finally, we can use a metric that combines Precision (P) and Recall (R).
F1 Score = 2PR/(P+R)
The F1 Score ranges between 0 and 1, where 1 means a perfect classifier.
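These three formulas translate directly into code. A quick sketch (the function name is mine, not a standard API):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall and F1 Score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# A perfect classifier (no false positives, no false negatives) scores F1 = 1.0
print(precision_recall_f1(tp=10, fp=0, fn=0))  # (1.0, 1.0, 1.0)
```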
Let's examine our classifier metrics:
Precision = 130 / (130+40) = 0.765
Recall = 130 / (130+30) = 0.812
F1 Score = 0.788
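As a cross-check, the same numbers can be reproduced with scikit-learn (assuming it is installed). The label arrays below are simply rebuilt from the counts in the confusion matrix, with 1 standing for Damaged (Positive) and 0 for Valid (Negative):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Rebuild the per-part labels from the confusion matrix counts:
# 800 TN, 40 FP, 30 FN, 130 TP
y_true = [0] * 800 + [0] * 40 + [1] * 30 + [1] * 130
y_pred = [0] * 800 + [1] * 40 + [0] * 30 + [1] * 130

print(round(precision_score(y_true, y_pred), 3))  # 0.765
print(round(recall_score(y_true, y_pred), 3))     # 0.812
print(round(f1_score(y_true, y_pred), 3))         # 0.788
```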