Model Evaluation Techniques

This notebook covers only the most commonly used evaluation metrics for regression and classification. The list is not exhaustive; you are encouraged to explore the other metrics that are available.

References:
(1) Scikit-Learn : https://scikit-learn.org/stable/modules/model_evaluation.html
(2) https://github.com/maykulkarni/Machine-Learning-Notebooks

Useful Resources :
https://scikit-learn.org/stable/modules/model_evaluation.html#mean-absolute-error

In [1]:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

1. Regression Metrics

1.1 $R^{2}$ Score (Coefficient of Determination)

$ R^{2} = 1 - \frac{SS_{res}}{SS_{tot}}$

where,

$ SS_{res} = \sum_{i}^{n}(y_{i} - \hat{y}_{i})^{2}$

$ SS_{tot} = \sum_{i}^{n}(y_{i} - y_{avg})^{2}$

Possible Values:
$$
R^{2}=\begin{cases}
>0, & \text{better than average predictor}\\
0, & \text{exactly the same as average predictor}\\
<0, & \text{worse than average predictor}
\end{cases}
$$

$R^{2} \in (-\infty, 1]$

More information on the math behind the $R^{2}$ score can be found here: https://nbviewer.jupyter.org/github/maykulkarni/Machine-Learning-Notebooks/blob/master/05.%20Model%20Evaluation/R%20Squared.ipynb

In [2]:

# Most of the metrics implemented in scikit-learn live in the sklearn.metrics module.
from sklearn.metrics import r2_score

Case 1: $R^{2} > 0$

In [3]:

"""
    Arbitrarily define y_true and y_pred.
    Assume that the following are the ground truth (y_true) and the model predictions (y_pred).
"""
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

In [4]:

plt.plot(y_true)
plt.plot(y_pred)
plt.title('Plot of Ground Truth vs Predictions')
plt.legend(['y_true', 'y_pred'])
plt.xlabel('X')
plt.ylabel('Y')

Out[4]:

Text(0, 0.5, 'Y')


In [5]:

# Signature: r2_score(y_true, y_pred)
# A score of 1 means the predictions match the ground truth exactly. In practice you rarely reach 1;
# the closer the score is to 1, the better the model fits the data.
r2_score(y_true, y_pred)

Out[5]:

0.9486081370449679
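The score above can be reproduced directly from the definition of $R^{2}$. The following is a minimal sketch that uses only NumPy; the helper name `r2_from_definition` and the variable names are made up for illustration and are not part of scikit-learn.

```python
import numpy as np

def r2_from_definition(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot (illustrative helper, not a scikit-learn API)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# Same Case 1 data as above, under different names so the notebook variables are untouched.
case1_true = [3, -0.5, 2, 7]
case1_pred = [2.5, 0.0, 2, 8]
r2_case1 = r2_from_definition(case1_true, case1_pred)
print(r2_case1)
```

Here $SS_{res} = 1.5$ and $SS_{tot} = 29.1875$, so the result should agree with the `r2_score` output above up to floating-point rounding. The sketch does not handle the degenerate case where all values of `y_true` are equal, which makes $SS_{tot} = 0$.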

Case 2: $R^{2} = 0$

In [20]:

plt.plot(y_true)

# Notice how we're predicting a constant value, equal to the mean of the observed ground truth data.
plt.plot([np.mean(y_true)]*len(y_true))

plt.title('Plot of Ground Truth vs Average of the Ground Truth.')
plt.legend(['y_true', 'y_pred'])
plt.xlabel('X')
plt.ylabel('Y')

Out[20]:

Text(0, 0.5, 'Y')

In [7]:

# An R2 score of 0 corresponds to a horizontal predictor that is just the mean of all the observed values!
r2_score(y_true, [np.mean(y_true)]*len(y_true))

Out[7]:

0.0

Case 3: $R^{2} < 0$

In [8]:

# Change the value of y_pred, and let us see what happens to the R2 score.
y_pred = [1.0, 1.0, 1.0, 1.0]

In [9]:

plt.plot(y_true)
plt.plot(y_pred)
plt.legend(['y_true', 'y_pred'])
plt.xlabel('X')
plt.ylabel('Y')

Out[9]:

Text(0, 0.5, 'Y')

In [10]:

# Signature: r2_score(y_true, y_pred)
# The constant prediction 1.0 fits the data worse than the mean of y_true does, so SS_res > SS_tot
# and the score is negative.
r2_score(y_true, y_pred)

Out[10]:

-0.4817987152034262
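Cases 2 and 3 both use a constant prediction, so they can be compared side by side. The sketch below uses only NumPy, and `constant_predictor_r2` is an illustrative name rather than a library function. It computes $SS_{res}$ for a constant guess $c$: predicting $c = y_{avg}$ gives $R^{2} = 0$, while any other constant has a larger $SS_{res}$ and therefore a negative score.

```python
import numpy as np

y_obs = np.array([3, -0.5, 2, 7], dtype=float)   # same ground truth as above
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)     # total sum of squares, 29.1875

def constant_predictor_r2(c):
    """R^2 of a model that predicts the constant c for every sample (illustrative helper)."""
    ss_res = np.sum((y_obs - c) ** 2)
    return 1.0 - ss_res / ss_tot

r2_mean = constant_predictor_r2(y_obs.mean())  # Case 2: predict the mean everywhere
r2_one = constant_predictor_r2(1.0)            # Case 3: predict 1.0 everywhere
print(r2_mean, r2_one)
```

The mean is the constant that minimizes the sum of squared residuals, which is why it serves as the baseline that $R^{2}$ measures against.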

1.2 Mean Absolute Error (MAE)

$ \text{MAE} = \frac{1}{n} \sum_{i}^{n}|y_{i} - \hat{y}_{i}|$

In [11]:

# Import the function.
from sklearn.metrics import mean_absolute_error

In [12]:

# Signature of the function is : mean_absolute_error(y_true, y_pred)
mean_absolute_error(y_true, y_pred)

Out[12]:

2.625

In [13]:

# Same function written from first principles!
np.mean( np.abs( np.array(y_true) - np.array(y_pred) ) )

Out[13]:

2.625
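As a worked check of this value: the cells above still use the Case 3 predictions, so y_true = [3, -0.5, 2, 7] and y_pred = [1.0, 1.0, 1.0, 1.0], and the absolute errors are 2, 1.5, 1 and 6.

$$
\text{MAE} = \frac{|3 - 1| + |-0.5 - 1| + |2 - 1| + |7 - 1|}{4} = \frac{2 + 1.5 + 1 + 6}{4} = \frac{10.5}{4} = 2.625
$$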

For more information on MAE, refer to the following link: https://scikit-learn.org/stable/modules/model_evaluation.html#mean-absolute-error

1.3 Mean Squared Error (MSE)

$ \text{MSE} = \frac{1}{n} \sum_{i}^{n}|y_{i} - \hat{y}_{i}|^{2} $

In [14]:

# Import the function.
from sklearn.metrics import mean_squared_error

In [15]:

# Signature of the function is : mean_squared_error(y_true, y_pred)
mean_squared_error(y_true, y_pred)

Out[15]:

10.8125

In [16]:

# Same function written from first principles!
np.mean( np.square( np.array(y_true) - np.array(y_pred) ) )

Out[16]:

10.8125

For theory on the use of mean_squared_error, refer to: https://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-error

1.4 Root Mean Squared Error (RMSE)

$ \text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i}^{n}|y_{i} - \hat{y}_{i}|^{2} } $

In [17]:

# Import the function.
from sklearn.metrics import mean_squared_error

# Import numpy (will be using some of its functionality)
import numpy as np

In [18]:

# RMSE is the square root of mean_squared_error(y_true, y_pred).
np.sqrt(mean_squared_error(y_true, y_pred))

Out[18]:

3.2882366094914763

In [19]:

# Same function written from first principles!
np.sqrt( np.mean( np.square( np.array(y_true) - np.array(y_pred) ) ) )

Out[19]:

3.2882366094914763
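To see how the three error metrics relate on the same data, the sketch below breaks each one down into per-sample contributions. It uses NumPy only and assumes the Case 3 values from the cells above; all variable names are illustrative.

```python
import numpy as np

y_obs = np.array([3, -0.5, 2, 7], dtype=float)  # ground truth used throughout
y_hat = np.array([1.0, 1.0, 1.0, 1.0])          # Case 3 predictions

abs_err = np.abs(y_obs - y_hat)  # per-sample absolute errors: 2, 1.5, 1, 6
sq_err = abs_err ** 2            # per-sample squared errors: 4, 2.25, 1, 36

mae = abs_err.mean()
mse = sq_err.mean()
rmse = np.sqrt(mse)

# Fraction of each total that comes from the single largest error (the last sample).
mae_share = abs_err.max() / abs_err.sum()
mse_share = sq_err.max() / sq_err.sum()

print(f"MAE={mae}, MSE={mse}, RMSE={rmse:.4f}")
print(f"largest-error share: MAE {mae_share:.0%}, MSE {mse_share:.0%}")
```

On this data the largest residual (6) accounts for about 57% of the MAE total but about 83% of the MSE total, because squaring weights large errors more heavily. RMSE is expressed in the same units as `y`, unlike MSE, and it is never smaller than MAE.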
