Part 1: Logistic Regression 1.1 Introduction to Logistic Regression Logistic regression is a probabilistic model for binary classification. Unlike linear regression, which predicts continuous values, logistic regression outputs probabilities between 0 and 1. The core idea is to apply the sigmoid (logistic) function to a linear score: \[ \sigma(z) = \frac{1}{1 + e^{-z}} \] where […]
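As a minimal sketch of the idea in this excerpt, the sigmoid can be applied to a linear score in a few lines of Python. The excerpt is cut off before it defines the score, so the form `z = w · x + b` and the names `sigmoid`, `predict_proba`, `weights`, and `bias` are assumptions for illustration, not taken from the post.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^{z} / (1 + e^{z}).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability of the positive class for one example x.

    Assumes (hypothetically) a linear score z = w . x + b.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


if __name__ == "__main__":
    # Illustrative values only: z = 1.0*3.0 + (-2.0)*1.0 + 0.5 = 1.5
    print(predict_proba([1.0, -2.0], 0.5, [3.0, 1.0]))
```

Whatever the score, the output stays between 0 and 1, which is what lets it be read as a probability.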
Search and Optimization in Machine Learning
This blog post provides a detailed explanation of optimization techniques in machine learning. It is intended for college students and explains concepts step-by-step with mathematical notations, examples, and illustrative diagrams. 1. Why Optimization Matters in Machine Learning In machine learning, training a model involves finding the best possible model from a dataset. This process is […]
Logistic Regression and Support Vector Machines
This blog post provides a detailed explanation of the key concepts behind Logistic Regression and Support Vector Machines (SVMs). These are fundamental algorithms in machine learning for classification tasks. 1. Logistic Regression Logistic Regression is a probabilistic classification model used primarily for binary classification problems. Unlike linear regression, which predicts continuous values, logistic regression outputs the […]
Nearest Neighbor Methods
This blog post provides a detailed explanation of the Nearest Neighbor algorithms, specifically focusing on k-Nearest Neighbors (KNN), as covered in predictive modeling for data mining. Introduction to Nearest Neighbor: Instance-Based Learning The Nearest Neighbor method is a discriminative classification algorithm that is non-parametric and instance-based. Unlike parametric models (e.g., linear regression) that learn explicit […]
Predictive Modeling in Data Mining: Concepts, Mathematics, and Practical Implications
This post explains predictive modeling as a data-mining workflow composed of four essential elements: (i) task specification, (ii) knowledge representation, (iii) learning (scoring + search), and (iv) prediction/evaluation. Contents 1. Introduction 2. The Four Components of a Predictive Modeling Algorithm 3. Task Specification 4. Knowledge Representation (Model Families) 5. Learning: Model Space, Scoring Functions, and […]
Naive Bayes Classifiers
This blog post provides a detailed explanation of Naive Bayes Classifiers (NBC), a fundamental probabilistic classification algorithm in data mining and machine learning. We will explore the concepts step by step, including mathematical foundations, assumptions, learning processes, and practical considerations. Introduction to Naive Bayes Classifiers The Naive Bayes Classifier is a probabilistic model used for […]
Exploratory Data Analysis (EDA)
What is Exploratory Data Analysis? Exploratory Data Analysis (EDA) is an approach to analyzing data when you do not yet have a clear hypothesis or modeling goal. Instead of jumping directly into modeling, EDA focuses on understanding the structure, patterns, and anomalies in the data. EDA aims to: Maximize insight into the dataset Uncover underlying […]
Linear Algebra for Data: Vectors, Matrices, Eigenvalues, SVD, and Distance Measures
1) Vectors: The Fundamental Data Object 1.1 What is a vector? A vector is a 1D array of numbers. You can think of it as: A list of features for one data point (e.g., height, weight, age). A point in space (2D, 3D, or higher dimensions). An arrow with direction and length (geometric view). Notation: […]
Foundations of Data Mining
The Data Mining Process Overview The full data mining process includes several stages: Data Selection: Choosing relevant data sources. Data Preprocessing: Cleaning, transforming, and preparing data (handling missing values, outliers, etc.). Data Mining: Applying algorithms to extract patterns/models (the focus of most courses). Interpretation/Evaluation: Analyzing results and validating them. While the full process is important, […]
Foundations of Probability and Statistics for Data Mining
1. High-Level Overview – Probability and Statistics In the real world, we rarely have complete information. Data is noisy, measurements contain errors, and future events are uncertain. Probability theory provides a rigorous mathematical framework for: Quantifying uncertainty in a principled way Making optimal decisions when outcomes are uncertain Building models that generalize beyond observed […]
TensorFlow Overview
TF1.x vs TF2.0 Pioneering library for building deep learning models, launched November 2015. It's free, open source, originally developed by Google. Other libraries: PyTorch – from FB, October 2016 TensorFlow 2.0 Major new version, September 2019 Dynamic computation graphs Not backward compatible with TF1 Closer to PyTorch TF1.x vs TF2.0 vs PyTorch TF1.x PyTorch Computation […]
Machine Learning – Adversarial Sample Detection
Adversarial Examples Inputs generated by adversaries to fool neural networks. Two types: Semantic based perturbations Restricted area to manipulate pixels Modify a specific area of the image Simulate real world scenarios Pervasive Perturbations Full access to pixel alteration Modify the whole image Different distance metrics Defense and Detections Adversarial detections – determine whether input […]
Machine Learning – Inference Attacks
How does a model inversion attack work? The attacker first trains a separate ML model, known as an inversion model, based on the output of the target model. The goal is to predict the input data (the original dataset used to train the target model). The attacker can then exploit information based on the input. Types of MIA attacks: Query based attacks: […]
Machine Learning – Adversarial Attacks
Below are various papers reviewed regarding security vulnerabilities and adversarial attacks against machine learning. 6thSense Intrusion Detection System (IDS) for smart devices This paper presents 6thSense, a novel intrusion detection system (IDS) designed to defend against sensor-based threats in smart devices, particularly Android smartphones. The framework uses context-aware models and machine learning techniques to detect […]
Machine Learning – Black Box Attacks and Transferability
Adversary Knowledge White-box = the adversary has complete knowledge of the targeted model, including its parameter values, architecture, training method, and in some cases its training data. Black-box = the adversary has no knowledge of the ML model except input-output samples of training data or input-output pairs obtained by using the target model as an oracle […]
Model Evaluation – Regression
Model Evaluation Techniques This notebook will only deal with commonly used evaluation metrics for regression and classification. This list is not exhaustive; you are encouraged to look at the other metrics that can be used. References: (1) Scikit-Learn : https://scikit-learn.org/stable/modules/model_evaluation.html (2) https://github.com/maykulkarni/Machine-Learning-Notebooks Useful Resources : https://scikit-learn.org/stable/modules/model_evaluation.html https://scikit-learn.org/stable/modules/model_evaluation.html#mean-absolute-error In [1]: import numpy as np import matplotlib.pyplot as […]
Model Evaluation – Classification
Model Evaluation Techniques This notebook will only deal with commonly used evaluation metrics for classification. This list is not exhaustive; you are encouraged to look at the other metrics that can be used. References: (1) Scikit-Learn : https://scikit-learn.org/stable/modules/model_evaluation.html (2) https://github.com/maykulkarni/Machine-Learning-Notebooks Useful Resources : https://scikit-learn.org/stable/modules/model_evaluation.html https://scikit-learn.org/stable/modules/model_evaluation.html#mean-absolute-error In [1]: import numpy as np import matplotlib.pyplot as plt import […]
Machine Learning – Regression Algorithms
Machine learning Algorithms using Scikit-Learn Ref : All the documentation for the functions used can be found at https://scikit-learn.org/stable/ This notebook aims to introduce you to the scikit-learn library that contains a lot of popularly used Machine Learning algorithms. This notebook contains the following section: (1) Regression Each section has a data preparation section […]
Machine Learning Algorithms Scikit-Learn
Machine learning Algorithms using Scikit-Learn Ref : All the documentation for the functions used can be found at https://scikit-learn.org/stable/ This notebook aims to introduce you to the scikit-learn library that contains a lot of popularly used Machine Learning algorithms. This notebook contains the following section: (1) Classification For the classification component, we use the […]
AWS SageMaker Overview
Amazon SageMaker is a fully managed machine learning service. With SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so […]
Machine Learning Overview
Machine Learning = building a model from example inputs to make data-driven predictions, as opposed to following strictly static program instructions. Traditional programming contains explicit logic that the machine must execute. Machine learning does not rely on the same hand-written logic (if statements, loops, case statements, etc.). Instead, it is based on data and a given algorithm. With that […]