Monday, May 23, 2022

Classification Algorithm Performance Metrics

In this post we will review metrics for measuring the results of a classification algorithm.

For this purpose, let's assume we ran a classification algorithm that classifies damaged parts by examining a picture of each part.

We ran the algorithm over 1,000 parts, and the algorithm detected 170 damaged parts. This information by itself provides zero visibility into the performance of the algorithm. To analyze the result, we need to start with a confusion matrix.



                     Predicted: Valid    Predicted: Damaged
Actual: Valid              800                   40
Actual: Damaged             30                  130
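
If you want to build such a matrix yourself, a library like scikit-learn can do the counting. Below is a minimal sketch, assuming scikit-learn is installed; the label lists are tiny made-up examples, not the real data from the example above.

# A minimal sketch of building a confusion matrix with scikit-learn.
# The label lists below are tiny made-up examples, not the real data.
from sklearn.metrics import confusion_matrix

y_actual    = ["valid", "valid", "damaged", "damaged", "valid", "damaged"]
y_predicted = ["valid", "damaged", "damaged", "valid", "valid", "damaged"]

# Rows are the actual labels, columns are the predicted labels,
# ordered by the labels argument.
cm = confusion_matrix(y_actual, y_predicted, labels=["valid", "damaged"])
print(cm)
# [[2 1]   <- actual valid:   2 predicted valid, 1 predicted damaged
#  [1 2]]  <- actual damaged: 1 predicted valid, 2 predicted damaged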


The confusion matrix inspects two aspects of each part:

  • What is the actual part status: Valid or Damaged
  • What is the algorithm classification: Valid or Damaged

To speak the same language regardless of the data domain, we use the terms True/False and Positive/Negative. 
  • True - means the algorithm classification is correct
  • False - means the algorithm classification is wrong
  • Positive - means the algorithm marked the part as damaged
  • Negative - means the algorithm marked the part as valid

Notice that Positive/Negative can be defined the other way around, depending on how the algorithm's purpose is defined. In this case we have defined the purpose as "to classify damaged parts", hence we treat Positive as classifying a part as damaged.

We can rewrite the confusion matrix using these terms:


                     Predicted: Negative        Predicted: Positive
Actual: Negative     True Negative = 800        False Positive = 40
Actual: Positive     False Negative = 30        True Positive = 130


And so, the terms are:

  • True Negative - correct detection as a valid part
  • True Positive - correct detection as a damaged part
  • False Negative - wrong detection of a damaged part as a valid part
  • False Positive - wrong detection of a valid part as a damaged part
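
As a quick sanity check, the four terms can also be read out of the matrix programmatically. This sketch hard-codes the counts from the table above and assumes NumPy is available; with the negative class (valid) listed first, ravel() walks the matrix row by row and yields TN, FP, FN, TP.

import numpy as np

# The confusion matrix from the example above:
# rows = actual (valid, damaged), columns = predicted (valid, damaged).
cm = np.array([[800,  40],
               [ 30, 130]])

# With the negative class first, ravel() returns TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
# TN=800, FP=40, FN=30, TP=130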


To estimate the performance, we can use several more metrics that are based on the confusion matrix.



Precision = TP / (TP+FP)

In other words, precision measures how often the algorithm is correct when it marks a part as damaged.



Recall = TP / (TP+FN)

In other words, recall measures how many of the actually damaged parts the algorithm manages to detect.



And last, we can use a metric that combines Precision (P) and Recall (R).


F1 Score = 2PR/(P+R)

The F1 Score ranges between 0-1, where 1 means a perfect classifier.



Let's examine our classifier metrics:

Precision = 130 / (130+40) = 0.765

Recall = 130 / (130+30) = 0.812

F1 Score = 0.788
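
The same arithmetic can be reproduced in a few lines of Python; this is just a sketch that plugs the counts from the confusion matrix into the formulas above.

# Reproduce the metrics from the confusion matrix counts above.
tp, fp, fn = 130, 40, 30

precision = tp / (tp + fp)                                  # 130 / 170
recall    = tp / (tp + fn)                                  # 130 / 160
f1_score  = 2 * precision * recall / (precision + recall)

print(f"Precision = {precision:.3f}")   # 0.765
print(f"Recall    = {recall:.3f}")      # 0.812
print(f"F1 Score  = {f1_score:.3f}")    # 0.788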

