Machine Learning Exercises using ML models/algorithms

Assignment 2: Implementation of the Naive Bayes classifier from scratch

A custom Python implementation of the Naive Bayes algorithm, featuring Laplace smoothing and hyperparameter analysis.

This repository contains a manual implementation of the Naive Bayes classifier. It is designed to handle categorical datasets using discrete probability distributions and includes a workflow that averages accuracy over multiple random splits.

Features

Custom Implementation: The Training & Prediction sections are built entirely with NumPy and Pandas, without relying on pre-built classifier libraries.
Laplace Smoothing: Implements smoothing (controlled by parameter $L$) to handle the zero-frequency problem in categorical data.
Evaluation: The iter_NBC function runs the model 100 times with different random train/test splits to calculate an average accuracy.
Hyperparameter Tuning: Includes an experiment (L_effect) to analyze how different smoothing values ($0.01, 1, 10, 100...$) impact model performance.
Scikit-Learn Comparison: Contains a wrapper to compare the custom model against sklearn's GaussianNB.
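
As a rough illustration of the Laplace smoothing described above, here is a minimal sketch of a categorical Naive Bayes model in plain Python. It is not the repository's code: the names `train_nb`, `cond_prob`, and `predict_nb` and the toy weather data are hypothetical, and the sketch assumes $L > 0$ so that no probability is ever exactly zero. Each conditional probability is estimated as (count + $L$) / (class count + $L$ × number of distinct values of that feature).

```python
import math
from collections import Counter


def train_nb(rows, labels, L=1.0):
    """Count feature values per class; L is the Laplace smoothing strength."""
    n_features = len(rows[0])
    class_counts = Counter(labels)
    # Distinct values seen per feature; used in the smoothing denominator.
    values = [set(row[j] for row in rows) for j in range(n_features)]
    # counts[c][j][v] = number of class-c rows whose feature j equals v.
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
    return {"L": L, "class_counts": class_counts, "values": values,
            "counts": counts, "n": len(labels)}


def cond_prob(model, c, j, v):
    """Smoothed estimate of P(feature j = v | class c)."""
    L = model["L"]
    num = model["counts"][c][j][v] + L
    den = model["class_counts"][c] + L * len(model["values"][j])
    return num / den


def predict_nb(model, row):
    """Return the class with the highest log-posterior for one row."""
    best, best_score = None, -math.inf
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])  # log prior
        score += sum(math.log(cond_prob(model, c, j, v))
                     for j, v in enumerate(row))
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data (hypothetical): "sunny" never occurs with class "yes".
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels, L=1.0)
print(cond_prob(model, "yes", 0, "sunny"))  # smoothed, so not zero
print(predict_nb(model, ("sunny", "hot")))
```

Without smoothing, the unseen pair ("sunny", "yes") would get probability zero and rule out the class regardless of the other features. Larger values of $L$ pull every estimate toward the uniform distribution, which is the effect the L_effect experiment examines.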

Prerequisites

To run this script, you will need Python 3.x and the following data science libraries:

pip install numpy pandas scikit-learn

Assignment 3

This script is a comparative benchmark of three different classification algorithms (Naive Bayes, Logistic Regression, and a Trivial Baseline) across three different types of datasets (Categorical, Continuous, and Mixed). It specifically analyzes how training set size affects model performance.

This project evaluates classification algorithms on Categorical, Continuous, and Mixed datasets. It implements a rigorous testing pipeline that analyzes how model accuracy changes as the size of the available training data increases (from 50% to 100%).

Features

Handles three distinct data types:
- Categorical: Uses CategoricalNB and One-Hot Encoding.
- Continuous: Uses GaussianNB.
- Mixed: Uses a custom MixedNB implementation for hybrid data.

Algorithm Comparison:
1. Naive Bayes: The primary generative model.
2. Logistic Regression: The primary discriminative baseline.
3. Trivial Classifier: A majority-class baseline to establish the minimum acceptable performance.
Learning Curve Analysis: Trains the models on $K\%$ of the training data ($K \in \{50, 60, \ldots, 100\}$) to visualize how training set size affects performance.
Statistical Accuracy Testing: Every experiment is repeated 100 times with random splits, and the average accuracy is reported so that results do not depend on a single split.
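
The learning-curve procedure can be sketched roughly as follows. This is an illustrative outline, not the script itself: `learning_curve` and `majority_fit_predict` are hypothetical names, the 25% test fraction is an assumption, and only the trivial majority-class baseline is implemented. Any classifier wrapped as a function `(X_train, y_train, X_test) -> predictions` could be passed in its place.

```python
import random
from collections import Counter


def majority_fit_predict(X_train, y_train, X_test):
    """Trivial baseline: always predict the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * len(X_test)


def learning_curve(fit_predict, X, y, percents=(50, 60, 70, 80, 90, 100),
                   n_runs=100, test_frac=0.25, seed=0):
    """Average test accuracy when training on K% of each random training split."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    totals = {k: 0.0 for k in percents}
    for _ in range(n_runs):
        # Fresh random train/test split for every run.
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        X_test = [X[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        for k in percents:
            # Keep the first K% of the shuffled training indices.
            keep = train_idx[:max(1, len(train_idx) * k // 100)]
            preds = fit_predict([X[i] for i in keep],
                                [y[i] for i in keep], X_test)
            correct = sum(p == t for p, t in zip(preds, y_test))
            totals[k] += correct / len(y_test)
    return {k: total / n_runs for k, total in totals.items()}


# Toy data (hypothetical): 30 samples of class "a", 10 of class "b".
X = [[i] for i in range(40)]
y = ["a"] * 30 + ["b"] * 10
curve = learning_curve(majority_fit_predict, X, y, n_runs=20)
for k, acc in curve.items():
    print(f"K={k:3d}%  mean accuracy={acc:.3f}")
```

Evaluating every value of $K$ on the same test split within a run keeps the comparison across training sizes fair, since only the amount of training data changes.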

Prerequisites

The script requires Python 3.x and the following libraries:

pip install numpy pandas matplotlib scikit-learn

Note: This script relies on a local module named mixed_naive_bayes. Ensure mixed_naive_bayes.py is present in your root directory.

Script: question_3.py

This script demonstrates the impact of L1 Regularization (Lasso) on Logistic Regression weights. It shows visually how increasing the regularization strength ($\lambda$) drives model coefficients toward zero, effectively performing feature selection.

The script trains four different Logistic Regression models on the same binary classification task (Iris Setosa vs. Others):

No Regularization (approximated with $C = 10^{10}$): The baseline weights.
Lasso ($\lambda = 0.5$): Mild regularization.
Lasso ($\lambda = 10$): Strong regularization.
Lasso ($\lambda = 100$): Very strong regularization.

It then plots the magnitude of the learned weights ($w_0, w_1, w_2, w_3$) for each model.

L1 Regularization (Lasso)

Lasso adds a penalty term to the loss function equal to $\lambda$ times the sum of the absolute values of the coefficients:

$$\text{Loss} = \text{Likelihood Error} + \lambda \sum_{j=1}^{p} |w_j|$$
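
To make the shrinkage effect concrete, here is a small NumPy sketch that minimizes this loss with proximal gradient descent (soft-thresholding). This is a swapped-in technique for illustration, not the solver used by question_3.py, which presumably relies on scikit-learn's Logistic Regression (where C is the inverse of the regularization strength, so a very large C approximates no penalty). The function names and the synthetic data are hypothetical; "Likelihood Error" is taken here to be the summed log-loss, and the intercept is left unpenalized.

```python
import numpy as np


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink toward zero, clip at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_l1_logistic(X, y, lam, n_iter=2000):
    """Minimize summed log-loss + lam * ||w||_1 by proximal gradient descent.

    y must contain 0/1 labels. The intercept b is not penalized.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size from the Lipschitz bound of the log-loss gradient:
    # 0.25 * (largest singular value of [X, 1]) squared.
    Xb = np.hstack([X, np.ones((n, 1))])
    step = 1.0 / (0.25 * np.linalg.norm(Xb, 2) ** 2)
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (p_hat - y)
        grad_b = np.sum(p_hat - y)
        # Gradient step on the smooth part, then shrink the weights.
        w = soft_threshold(w - step * grad_w, step * lam)
        b -= step * grad_b
    return w, b


# Synthetic data (hypothetical): only the first of four features is informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)

weights = {lam: fit_l1_logistic(X, y, lam)[0] for lam in (0.0, 0.5, 10.0, 100.0)}
for lam, w in weights.items():
    print(f"lambda={lam:>5}: |w| = {np.round(np.abs(w), 3)}")
```

The soft-thresholding step is what sets small weights to exactly zero rather than merely making them small, which is why the L1 penalty acts as a feature selector.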

Assignment 4

This script implements a Random Forest Classifier from scratch (using Bagging and Feature Randomness) and compares it against Scikit-Learn's implementation.

This project implements a Random Forest classifier using Bagging (Bootstrap Aggregating) logic wrapped around standard Decision Trees. It investigates the stability and accuracy of ensemble methods by visualizing how the Forest outperforms individual Decision Trees.

Features

Custom Implementation: Implements TrainRF and PredictRF functions that handle:
- Bootstrapping: Randomly sampling data with replacement to create diverse training sets.
- Feature Subsetting: Restricting each tree to $\sqrt{N}$ features to decorrelate the trees.
- Majority Voting: Aggregating predictions from all trees to determine the final class.
Hyperparameter Analysis: Compares performance with different min_samples_leaf values (1 vs. 10) to observe the trade-off between overfitting and generalization.
Statistical Analysis: Calculates the exact probability that a single random tree could outperform the entire forest.
Benchmarking: Includes a direct comparison against sklearn.ensemble.RandomForestClassifier to validate the custom implementation's accuracy.
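
The three steps listed above (bootstrapping, feature subsetting, majority voting) can be sketched as follows. This is a simplified illustration rather than the repository's TrainRF and PredictRF: the names `train_rf`, `predict_rf`, `majority`, and `Stump` are hypothetical, and a one-split decision stump stands in for the standard Decision Trees so the example needs only NumPy. Any object with `fit(X, y)` and `predict(X)` could be supplied through `make_tree` instead.

```python
import numpy as np


def majority(labels):
    """Most frequent label; ties go to the smallest label."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]


class Stump:
    """Stand-in base learner: a one-split decision tree (illustration only)."""

    def fit(self, X, y):
        best_err = np.inf
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left = X[:, j] <= t
                # Predict the majority label on each side of the split.
                l_lab = majority(y[left])
                r_lab = majority(y[~left]) if (~left).any() else l_lab
                err = np.sum(np.where(left, l_lab, r_lab) != y)
                if err < best_err:
                    best_err, self.rule = err, (j, t, l_lab, r_lab)
        return self

    def predict(self, X):
        j, t, l_lab, r_lab = self.rule
        return np.where(X[:, j] <= t, l_lab, r_lab)


def train_rf(X, y, n_trees=25, make_tree=Stump, seed=0):
    """Fit n_trees learners, each on a bootstrap sample and a feature subset."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = max(1, int(np.sqrt(p)))  # sqrt(N) features per tree
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)            # sample with replacement
        cols = rng.choice(p, size=k, replace=False)  # feature subset for this tree
        tree = make_tree().fit(X[rows][:, cols], y[rows])
        forest.append((tree, cols))
    return forest


def predict_rf(forest, X):
    """Majority vote over the predictions of all trees."""
    votes = np.stack([tree.predict(X[:, cols]) for tree, cols in forest])
    return np.array([majority(votes[:, i]) for i in range(X.shape[0])])


# Synthetic data (hypothetical): the label depends on the first two features.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
forest = train_rf(X, y)
preds = predict_rf(forest, X)
print("training accuracy:", np.mean(preds == y))
```

Each tree stores the column indices it was trained on, so prediction can present every tree with the same feature subset it saw during training.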

Prerequisites

You will need Python 3 and the following libraries:

pip install numpy pandas matplotlib scikit-learn
