Model Evaluation Techniques
This notebook only covers commonly used evaluation metrics for classification. The list is not exhaustive; you are encouraged to look at the other metrics that are available.
References:
(1) Scikit-Learn: https://scikit-learn.org/stable/modules/model_evaluation.html
(2) https://github.com/maykulkarni/Machine-Learning-Notebooks
import numpy as np
import matplotlib.pyplot as plt
import random
# Set the seed.
random.seed(0)
np.random.seed(0)
# Make your plot outputs appear and be stored within the notebook.
%matplotlib inline
1. Classification Metrics
1.1 Accuracy Score
# Import the function.
from sklearn.metrics import accuracy_score
"""
Assume y_true and y_pred to be the following.
"""
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
# Compute the accuracy score. Essentially it means that 50% of the test samples have been classified correctly.
accuracy_score(y_true, y_pred)
0.5
# If normalize=False, the number of correctly classified samples is returned instead of the fraction.
accuracy_score(y_true, y_pred, normalize=False)
2
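Under the hood, accuracy is simply the fraction of positions where the prediction matches the true label. A quick sketch with NumPy, using the same y_true and y_pred as above, to confirm this:
# Fraction of matching positions; agrees with accuracy_score above (0.5).
np.mean(np.array(y_true) == np.array(y_pred))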
1.2 Confusion Matrix
# Import the function.
from sklearn.metrics import confusion_matrix
"""
Assumption of y_true and y_pred.
"""
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
"""
To understand a confusion matrix, you'll need to understand the terms : true-positive, true-negative,
false-negative and false-positive. For more information on them, refer the following link :
Link : https://en.wikipedia.org/wiki/Confusion_matrix
"""
confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])
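In scikit-learn, row i of the matrix counts samples whose true class is i, and column j counts predictions of class j. For a binary problem, the four entries are exactly the true-negative, false-positive, false-negative and true-positive counts mentioned above; a small sketch with made-up binary labels:
# Binary example with made-up labels: flatten the 2x2 matrix into (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
(tn, fp, fn, tp)  # -> (0, 2, 1, 1)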
The confusion matrix is usually plotted as a heatmap, since that is much easier to read at a glance. We will not cover plotting in any depth here, but a minimal sketch is shown below for reference; you are free to explore it further.
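A minimal plotting sketch, assuming scikit-learn 1.0 or newer (which provides ConfusionMatrixDisplay):
# Plot the confusion matrix as a heatmap (assumes scikit-learn >= 1.0).
from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
plt.show()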
1.3 Classification Report
The classification_report function builds a text report showing the main classification metrics.
# Import the function.
from sklearn.metrics import classification_report
# Dummy Dataset (Assumptions)
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]
target_names = ['class 0', 'class 1', 'class 2']
# Think about why we used print() here. Why did we not need it anywhere above?
print (classification_report(y_true, y_pred, target_names=target_names))
              precision    recall  f1-score   support

     class 0       0.67      1.00      0.80         2
     class 1       0.00      0.00      0.00         1
     class 2       1.00      0.50      0.67         2

    accuracy                           0.60         5
   macro avg       0.56      0.50      0.49         5
weighted avg       0.67      0.60      0.59         5
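Side note: if you need these numbers programmatically rather than as text, recent versions of scikit-learn let classification_report return a nested dictionary via output_dict=True; a small sketch:
# Returns a nested dict instead of a string (assumes a recent scikit-learn version).
report = classification_report(y_true, y_pred, target_names=target_names, output_dict=True)
report['class 1']['recall']  # -> 0.0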
1.4 Precision, Recall and F1 Score
These three metrics are generally reported together, because computing the F1 score requires the values of precision and recall.
$ \text{Precision} = \frac{\text{True Positive}}{\text{True Positive } + \text{ False Positive}}
= \frac{\text{True Positive}}{\text{Total Predicted Positive}} $
$ \text{Recall} = \frac{\text{True Positive}}{\text{True Positive } + \text{ False Negative}}
= \frac{\text{True Positive}}{\text{Total Actual Positive}}$
$ \text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $
1.4.1 Precision
print (y_true)
print (y_pred)
[0, 1, 2, 2, 0]
[0, 0, 2, 1, 0]
Precision is the ability of the classifier not to label as positive a sample that is negative. The best value is 1 and the worst value is 0.
# Import the function.
from sklearn.metrics import precision_score
# Computing the precision score.
precision_score(y_true, y_pred, average="weighted")
0.6666666666666666
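To see where this number comes from, you can request the per-class precision with average=None and weight each class by its support (2, 1 and 2 samples for classes 0, 1 and 2); a small sketch of how the 'weighted' average is formed:
# Per-class precision: roughly [0.67, 0.0, 1.0] for classes 0, 1 and 2.
per_class = precision_score(y_true, y_pred, average=None)
support = np.array([2, 1, 2])  # number of true samples per class
# Support-weighted average; reproduces the 0.67 above.
np.sum(per_class * support) / support.sum()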
1.4.2 Recall
Recall is the ability of the classifier to find all the positive samples. The best value is 1 and the worst value is 0.
# Import the function.
from sklearn.metrics import recall_score
# Computing the recall score.
recall_score(y_true, y_pred, average="weighted")
0.6
1.4.3 F1 Score
The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and its worst at 0. The relative contributions of precision and recall to the F1 score are equal; the formula is given at the start of Section 1.4.
# Import the function.
from sklearn.metrics import f1_score
# Computing the f1 score.
f1_score(y_true, y_pred, average="weighted")
0.5866666666666667
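As a sanity check on the formula from Section 1.4, you can recompute the F1 score of class 0 from its per-class precision (0.67) and recall (1.00); a small sketch:
# Per-class precision and recall; index 0 corresponds to class 0.
p = precision_score(y_true, y_pred, average=None)[0]
r = recall_score(y_true, y_pred, average=None)[0]
# Harmonic mean of precision and recall; matches f1_score(y_true, y_pred, average=None)[0] == 0.8.
2 * p * r / (p + r)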
# Why do we need F1 score when we already have accuracy?
y_true = [1,1,1,1,1,0]
y_pred = [1,1,1,1,1,1]
# Accuracy is not a very good metric to use when the data is highly unbalanced or the class distribution is skewed.
np.sum(np.array(y_true) == np.array(y_pred))/len(y_true)*100
83.33333333333334
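Here the classifier predicts class 1 for every sample and still reaches about 83% accuracy, yet it never identifies the single class-0 sample. Looking at the per-class F1 scores makes that failure visible; a short sketch:
# Per-class F1: the minority class (0) scores 0.0 even though accuracy is ~83%.
# (scikit-learn may warn that precision for class 0 is ill-defined, since it is never predicted.)
f1_score(y_true, y_pred, average=None)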
Optional
Another commonly used classification metric is ROC-AUC. You can read more about it here: https://scikit-learn.org/stable/modules/model_evaluation.html. A minimal sketch of computing it follows.
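As a pointer, here is a minimal sketch using scikit-learn's roc_auc_score on made-up binary labels and predicted scores; note that ROC-AUC is computed from scores (e.g. predicted probabilities) rather than hard class labels:
from sklearn.metrics import roc_auc_score

# Made-up binary labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]
roc_auc_score(y_true, y_scores)  # -> 0.75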