In this post we will review metrics for measuring the results of a classification algorithm.
For this purpose, let's assume we have run a classification algorithm that classifies damaged parts by examining a picture of each part.
We ran the algorithm over 1,000 parts, and it detected 170 damaged parts. This information by itself provides zero visibility into the algorithm's performance. To analyze the result, we need to start with a confusion matrix.
| | Predicted: Valid | Predicted: Damaged |
|---|---|---|
| Actual: Valid | 800 | 40 |
| Actual: Damaged | 30 | 130 |
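As an illustration, a matrix like the one above can be tallied from per-part results with a few lines of Python. This is a minimal sketch; the label values and the sample lists are hypothetical:

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Tally (actual, predicted) label pairs into confusion-matrix counts."""
    return Counter(zip(actual, predicted))

# Hypothetical labels for a handful of parts
actual    = ["valid", "valid", "damaged", "damaged", "valid"]
predicted = ["valid", "damaged", "damaged", "valid", "valid"]

counts = confusion_counts(actual, predicted)
print(counts[("valid", "valid")])      # actual valid, predicted valid
print(counts[("valid", "damaged")])    # actual valid, predicted damaged
print(counts[("damaged", "valid")])    # actual damaged, predicted valid
print(counts[("damaged", "damaged")])  # actual damaged, predicted damaged
```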
The confusion matrix inspects two aspects of each part:
- What is the actual part status: Valid or Damaged?
- What is the algorithm's classification: Valid or Damaged?
To speak the same language regardless of the data domain, we use the terms True/False and Positive/Negative.
- True - means the algorithm's classification is correct
- False - means the algorithm's classification is wrong
- Positive - means the algorithm marked the part as damaged
- Negative - means the algorithm marked the part as valid
Notice that Positive/Negative can be defined the other way around, depending on how the algorithm's purpose is defined. In this case we defined the purpose as "to classify damaged parts", hence we treat Positive as classifying a part as damaged.
We can rewrite the confusion matrix using these terms:
| | Predicted: Negative | Predicted: Positive |
|---|---|---|
| Actual: Negative | True Negative = 800 | False Positive = 40 |
| Actual: Positive | False Negative = 30 | True Positive = 130 |
And so, the terms are:
- True Negative - correct detection as a valid part
- True Positive - correct detection as a damaged part
- False Negative - wrong detection of a damaged part as a valid part
- False Positive - wrong detection of a valid part as a damaged part
To estimate the performance, we can use several more metrics that are based on the confusion matrix.
Precision = TP / (TP+FP)
In other words, precision is the fraction of parts classified as damaged that are actually damaged.
Recall = TP / (TP+FN)
In other words, recall is the fraction of actually damaged parts that the algorithm detected as damaged.
And finally, we can use a metric that combines Precision (P) and Recall (R).
F1 Score = 2PR/(P+R)
The F1 Score ranges between 0 and 1, where 1 means a perfect classifier.
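These three formulas translate directly into code. A quick sketch (the function name is mine, not a standard API):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall and F1 Score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# A perfect classifier (no false positives, no false negatives) scores F1 = 1.0
print(precision_recall_f1(tp=10, fp=0, fn=0))  # (1.0, 1.0, 1.0)
```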
Let's examine our classifier metrics:
Precision = 130 / (130+40) = 0.765
Recall = 130 / (130+30) = 0.812
F1 Score = 0.788
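As a cross-check, the same numbers can be reproduced with scikit-learn (assuming it is installed). The label arrays below are simply rebuilt from the counts in the confusion matrix, with 1 standing for Damaged (Positive) and 0 for Valid (Negative):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Rebuild the per-part labels from the confusion matrix counts:
# 800 TN, 40 FP, 30 FN, 130 TP
y_true = [0] * 800 + [0] * 40 + [1] * 30 + [1] * 130
y_pred = [0] * 800 + [1] * 40 + [0] * 30 + [1] * 130

print(round(precision_score(y_true, y_pred), 3))  # 0.765
print(round(recall_score(y_true, y_pred), 3))     # 0.812
print(round(f1_score(y_true, y_pred), 3))         # 0.788
```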