Machine Learning Algorithms Using Scikit-Learn

Ref: Documentation for all the functions used can be found at https://scikit-learn.org/stable/

This notebook introduces the scikit-learn library, which implements many widely used machine learning
algorithms. It contains the following section:
(1) Regression

Each section has its own data preparation step. For regression, we use the Boston housing prices dataset.

1. Regression Data Preparation

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.datasets import load_boston

# Set the random seeds for reproducibility.
random.seed(0)
np.random.seed(0)

In [2]:

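# Load the Boston housing prices dataset.
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older release.
boston = load_boston()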

In [3]:

# 'boston' is a Bunch object (similar to a dictionary). The important attributes are 'data' and 'target'.
# By convention, X always denotes the features, and y always denotes the label / target.
X, y = boston['data'], boston['target']

In [4]:

# It is always good practice to check the shapes of the features and the target(s).
print (X.shape, y.shape)

(506, 13) (506,)

In [5]:

# Split the data into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

print ("Train Set : ", X_train.shape, y_train.shape)
print ("Test Set : ", X_test.shape, y_test.shape)

Train Set :  (404, 13) (404,)
Test Set :  (102, 13) (102,)
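
The shapes printed above (404 and 102 rows) follow from how the test fraction is rounded. Here is a minimal plain-Python sketch of the idea, assuming the test count is rounded up and the remaining rows go to training. The shuffling uses Python's `random` module purely for illustration and is not scikit-learn's internal procedure.

```python
import math
import random

n_samples = 506      # number of rows in the dataset above
test_size = 0.20     # same fraction as in the cell above

# Round the test count up; everything else goes to the training set.
n_test = math.ceil(n_samples * test_size)
n_train = n_samples - n_test

# Shuffle the row indices with a fixed seed, then cut them into two disjoint groups.
indices = list(range(n_samples))
random.Random(0).shuffle(indices)
test_idx = indices[:n_test]
train_idx = indices[n_test:]

print("Train rows:", n_train, "Test rows:", n_test)
```

Every row lands in exactly one of the two groups, so the test set stays unseen during training.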

In [6]:

# It is good practice to standardize the features (zero mean, unit variance).

# Create a standard scaler object.
scaler = StandardScaler()

# Fit the scaler on the training data only, then transform the training data.
X_train = scaler.fit_transform(X_train)

# Transform the test data with the mean and variance learned from the training data.
X_test = scaler.transform(X_test)
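
To make the fit / transform distinction concrete, here is a small plain-Python sketch of the same computation on one feature. The toy numbers and the `standardize` helper are hypothetical. The sketch uses the population standard deviation, which is what StandardScaler uses.

```python
import statistics

# Hypothetical toy data: one feature, four training rows and two test rows.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# "fit": learn the mean and population standard deviation from the training rows only.
train_mean = statistics.mean(train)
train_std = statistics.pstdev(train)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]

# "transform": both sets are scaled with the training statistics.
train_scaled = standardize(train, train_mean, train_std)
test_scaled = standardize(test, train_mean, train_std)

print(train_scaled)
print(test_scaled)
```

The scaled training values have zero mean and unit variance. The scaled test values generally do not, and that is expected: the test set is measured on the scale learned from the training data.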

2. Regression-Based Algorithms

2.1 Linear Regression

In [7]:

# Import the linear regression model.
from sklearn.linear_model import LinearRegression

In [8]:

# Create an object.
linReg = LinearRegression()

# Print all the parameters of the model.
print (linReg)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [9]:

# Fit the linear regressor to the training data.
linReg.fit(X_train, y_train)

Out[9]:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [10]:

# Now that the model is trained, you can test it on the test data.
modelPreds = linReg.predict(X_test)

In [11]:

# This prints the coefficient of each feature in the data.
linReg.coef_

Out[11]:

array([-0.97082019,  1.05714873,  0.03831099,  0.59450642, -1.8551476 ,
        2.57321942, -0.08761547, -2.88094259,  2.11224542, -1.87533131,
       -2.29276735,  0.71817947, -3.59245482])

In [12]:

"""
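    To see how good the model is, we use a variety of metrics. The R2 score (coefficient of determination) is a
    commonly used metric for regression use cases. You'll come across it in module 2, and it'll be explained in class.
    The best possible score is 1. A model that always predicts the mean of the target scores 0, and a model that
    does worse than that gets a negative score. The closer the score is to 1, the better the model is.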
"""
print (r2_score(y_test, modelPreds))

0.5892223849182512
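
As a worked example of the metric, the sketch below computes R2 by hand from its definition, 1 - SS_res / SS_tot, on hypothetical toy targets. The helper name `r2` and the numbers are made up for illustration. It shows the three reference points: a perfect model, a mean-only baseline, and a model that is worse than the baseline.

```python
def r2(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]                 # hypothetical targets

perfect = r2(y_true, [1.0, 2.0, 3.0, 4.0])    # exact predictions
baseline = r2(y_true, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
worse = r2(y_true, [4.0, 3.0, 2.0, 1.0])      # predictions in reverse order

print(perfect, baseline, worse)
```

The last case gives a negative value, so a score below 0 means the model is doing worse than simply predicting the mean.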

2.2 Decision Tree Regression

In [13]:

# Import the model.
from sklearn.tree import DecisionTreeRegressor

In [14]:

# Create an object.
decTreeReg = DecisionTreeRegressor()

# Print all the parameters of the model.
print (decTreeReg)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
                      max_leaf_nodes=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      presort=False, random_state=None, splitter='best')

In [15]:

# Fit the model to the training data.
decTreeReg.fit(X_train, y_train)

Out[15]:

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
                      max_leaf_nodes=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      presort=False, random_state=None, splitter='best')

In [16]:

# Now that the model is trained, you can test it on the test data.
modelPreds = decTreeReg.predict(X_test)

In [17]:

# Understand the performance of the model.
print (r2_score(y_test, modelPreds))

0.5863275009269561
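
A regression tree is built from repeated splits, each chosen to reduce the squared error of the targets on either side. The sketch below shows a single such split on one feature in plain Python. The helper names and toy data are hypothetical, and this is a simplified illustration of the idea behind the squared-error criterion, not scikit-learn's implementation.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, cost) of the split with the lowest total squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best

# Hypothetical toy data with two obvious groups.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]

threshold, cost = best_split(x, y)
print(threshold, cost)
```

With the defaults printed above (max_depth=None, min_samples_leaf=1), the tree is allowed to keep splitting until each leaf is very small, which tends to fit the training data closely. Limiting the depth is one common way to control that.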

2.3 Ridge Regression

In [18]:

# Import the model.
from sklearn.linear_model import Ridge

In [19]:

# Create an object.
ridgeReg = Ridge()

# Print all the parameters of the model.
print (ridgeReg)

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

In [20]:

# Fit the model to the training data.
ridgeReg.fit(X_train, y_train)

Out[20]:

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

In [21]:

# Now that the model is trained, you can test it on the test data.
modelPreds = ridgeReg.predict(X_test)

In [22]:

# Understand the performance of the model.
print (r2_score(y_test, modelPreds))

0.5881400471345534
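
To see what the alpha parameter does, consider the simplest case: one feature and no intercept. Ridge minimizes the squared error plus alpha times the squared coefficient, and in this case the solution has the closed form w = sum(x*y) / (sum(x*x) + alpha). The sketch below evaluates that formula in plain Python. The helper name and toy data are hypothetical.

```python
def ridge_slope(x, y, alpha):
    """Closed-form ridge coefficient for one feature, no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)

# Hypothetical centered toy data generated by y = 2 * x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]

# alpha = 0 recovers ordinary least squares; larger alpha shrinks the coefficient.
slopes = {alpha: ridge_slope(x, y, alpha) for alpha in (0.0, 1.0, 10.0, 90.0)}

for alpha, slope in slopes.items():
    print(alpha, slope)
```

The coefficient shrinks toward zero as alpha grows, but it never becomes exactly zero. With alpha=1.0 the shrinkage here is small, which is consistent with Ridge scoring close to plain linear regression above.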

2.4 Lasso Regression

In [23]:

# Import the model.
from sklearn.linear_model import Lasso

In [24]:

# Create an object.
lassoReg = Lasso()

# Print all the parameters of the model.
print (lassoReg)

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)

In [25]:

# Fit the model to the training data.
lassoReg.fit(X_train, y_train)

Out[25]:

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)

In [26]:

# Now that the model is trained, you can test it on the test data.
modelPreds = lassoReg.predict(X_test)

In [27]:

# Understand the performance of the model.
print (r2_score(y_test, modelPreds))

0.5069663003862215
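
Lasso penalizes the absolute value of the coefficients instead of their squares. For one feature and no intercept, minimizing (1 / (2 * n)) * squared error + alpha * |w| gives a "soft-thresholded" solution, sketched below in plain Python. The helper names and toy data are hypothetical, and the one-feature setting is a simplification for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it is within the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_slope(x, y, alpha):
    """Lasso coefficient for one feature, no intercept."""
    n = len(x)
    rho = sum(a * b for a, b in zip(x, y)) / n   # average of x * y
    z = sum(a * a for a in x) / n                # average of x * x
    return soft_threshold(rho, alpha) / z

# Hypothetical centered toy data generated by y = 2 * x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]

for alpha in (0.0, 1.0, 4.0, 5.0):
    print(alpha, lasso_slope(x, y, alpha))
```

Unlike ridge, a large enough alpha sets the coefficient to exactly zero. With the default alpha=1.0, some coefficients of the model above may have been shrunk strongly or zeroed out, which is one possible reason for its lower score. Inspecting `lassoReg.coef_` would confirm this.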

2.5 Support Vector Regression

In [28]:

# Import the model.
from sklearn.svm import SVR

In [29]:

# Create an object.
svr = SVR()

# Print all the parameters of the model.
print (svr)

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
    gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
    tol=0.001, verbose=False)

In [30]:

# Fit the model to the training data.
svr.fit(X_train, y_train)

Out[30]:

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
    gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
    tol=0.001, verbose=False)

In [31]:

# Now that the model is trained, you can test it on the test data.
modelPreds = svr.predict(X_test)

In [32]:

# Understand the performance of the model.
print (r2_score(y_test, modelPreds))

0.4957469419124395
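
The epsilon parameter printed above (epsilon=0.1) defines a tube around the prediction inside which errors are not penalized. The sketch below computes this epsilon-insensitive loss in plain Python on hypothetical toy values. It illustrates only the loss term, not the full SVR optimization.

```python
def epsilon_insensitive(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

# Hypothetical targets and predictions.
y_true = [3.0, 3.0, 3.0]
y_pred = [3.05, 3.5, 1.0]

losses = epsilon_insensitive(y_true, y_pred)
print(losses)
```

The first prediction is off by 0.05, which is inside the tube, so it contributes no loss. The other two are penalized only for the part of the error that exceeds epsilon.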
