Data-Classification-Using-AI

Python Scikit-Learn Pandas NumPy Matplotlib Seaborn


A beginner-friendly machine learning classification project that predicts Iris flower species using a Decision Tree Classifier.

This project demonstrates a complete ML workflow including data loading, exploratory data analysis (EDA), visualization, model training, evaluation, prediction, and performance analysis.


Project Overview

The project classifies Iris flowers end to end, from loading the raw data through to analyzing the trained model's performance.

The system:

  • Loads and explores data
  • Generates visual analysis
  • Trains a classification model
  • Evaluates prediction performance
  • Produces graphs and insights

Key Highlights

  • Complete machine learning workflow
  • Beginner-friendly implementation
  • Uses Decision Tree Classification
  • Multiple visualizations included
  • Performance evaluation and reporting
  • Feature importance analysis

Dataset Information

| Property | Details |
|----------|---------|
| Dataset | Iris Dataset |
| Source | Scikit-learn |
| Total Records | 150 |
| Features | Sepal Length, Sepal Width, Petal Length, Petal Width |
| Classes | Setosa, Versicolor, Virginica |
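As a minimal sketch of the loading step, the dataset described above can be pulled from scikit-learn into a pandas DataFrame. The variable names (`iris`, `df`) and the extra `species` column are illustrative assumptions, not necessarily what `classification.py` does.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects (requires scikit-learn 0.23 or newer)
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus an integer "target" column

# Add a readable species name next to the numeric target (illustrative)
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

print(df.shape)
print(df["species"].value_counts())
```

Each of the three classes contributes 50 of the 150 records, so the dataset is balanced.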

Classification Model

Algorithm Used

Decision Tree Classifier

A supervised machine learning algorithm used to classify flower species based on feature values.

Model Configuration

from sklearn.tree import DecisionTreeClassifier

DecisionTreeClassifier(max_depth=3)

The tree depth is capped at 3 to limit model complexity and reduce the risk of overfitting.
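To see what the depth limit does in practice, the sketch below fits one tree with `max_depth=3` and one without a limit, then compares their size. The `random_state=0` seed and the names `shallow` and `full` are assumptions added for reproducibility and illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Depth-limited tree, matching the configuration in this README
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Unrestricted tree, shown only for comparison (not part of the project)
full = DecisionTreeClassifier(random_state=0).fit(X, y)

print("max_depth=3 :", shallow.get_depth(), "levels,", shallow.get_n_leaves(), "leaves")
print("unrestricted:", full.get_depth(), "levels,", full.get_n_leaves(), "leaves")
```

A shallower tree has fewer leaves, so it has less room to memorize individual training samples.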


Project Workflow

Load Dataset
↓
Data Exploration
↓
Data Visualization
↓
Train-Test Split
↓
Model Training
↓
Prediction
↓
Evaluation
↓
Performance Analysis
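The split, training, and prediction steps of this workflow might look like the sketch below. The split ratio, stratification, and seed are assumptions: the README does not state them, although the reported 95.83% training accuracy would be consistent with 115 of 120 training samples, i.e. an 80/20 split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# test_size, stratify, and random_state are assumed values, not taken from the project
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train the depth-limited tree on the training portion only
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Predict species for the held-out samples
y_pred = model.predict(X_test)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"Training accuracy: {train_acc:.2%}")
print(f"Testing accuracy:  {test_acc:.2%}")
```

The exact scores depend on the seed and split, so they may differ from the figures reported in this README.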

Model Performance

| Metric | Score |
|--------|-------|
| Training Accuracy | 95.83% |
| Testing Accuracy | 100.00% |

Additional evaluation includes:

  • Classification Report
  • Confusion Matrix
  • Feature Importance
  • Overfitting Check
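The four evaluation items listed above can be produced with standard scikit-learn calls. This is a sketch under assumed split settings (80/20, stratified, seed 42), and the overfitting check here is simply the gap between training and testing accuracy, which may not be how the project computes it.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# Split settings are assumptions, not taken from the project
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)
model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Classification report: precision, recall, and F1 per species
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Confusion matrix: rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Feature importance: impurity-based scores that sum to 1
importances = dict(zip(iris.feature_names, model.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {value:.3f}")

# Overfitting check: a large positive gap suggests the tree memorized the training set
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
gap = train_acc - test_acc
print(f"train={train_acc:.4f} test={test_acc:.4f} gap={gap:+.4f}")
```

With only 30 test samples under this assumed split, each misclassification moves test accuracy by more than three percentage points, so a test score above the training score is not unusual on this dataset.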

Generated Visualizations

| File | Purpose |
|------|---------|
| histogram.png | Feature distribution |
| boxplot.png | Feature spread analysis |
| heatmap.png | Correlation analysis |
| pairplot.png | Feature relationship analysis |
| scatterplot.png | Petal comparison |

All graphs are stored inside the graphs/ folder.


Project Structure

Data-Classification-Using-AI/
│
├── classification.py
├── README.md
├── requirements.txt
│
└── graphs/
    ├── histogram.png
    ├── boxplot.png
    ├── heatmap.png
    ├── pairplot.png
    └── scatterplot.png

Installation Guide

Clone repository:

git clone https://github.com/anmol396/Data-Classification-Using-AI.git

Move to project folder:

cd Data-Classification-Using-AI

Install dependencies:

pip install -r requirements.txt

Run Project

python classification.py

Technologies Used

| Category | Technology |
|----------|------------|
| Language | Python |
| Data Processing | Pandas, NumPy |
| Machine Learning | Scikit-learn |
| Visualization | Matplotlib, Seaborn |

Learning Outcomes

After completing this project, you should be able to:

  • Understand classification workflow
  • Build Decision Tree models
  • Perform exploratory data analysis
  • Evaluate model performance
  • Generate visual insights
  • Interpret feature importance

Future Improvements

  • Add Random Forest comparison
  • Hyperparameter tuning
  • Interactive dashboard
  • Model deployment
  • Support additional datasets
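As a rough starting point for the first two items, the sketch below compares the current tree with a Random Forest using 5-fold cross-validation, then runs a small grid search over tree hyperparameters. None of this is part of the project yet; the parameter grid, fold count, and seeds are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Random Forest comparison: mean accuracy over 5 cross-validation folds
candidates = {
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv_scores = {
    name: cross_val_score(estimator, X, y, cv=5).mean()
    for name, estimator in candidates.items()
}
for name, score in cv_scores.items():
    print(f"{name:<14} {score:.4f}")

# Hyperparameter tuning: exhaustive search over a small, assumed grid
param_grid = {"max_depth": [2, 3, 4, 5, None], "min_samples_leaf": [1, 2, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.4f}")
```

Cross-validation gives a steadier estimate than a single train-test split on a dataset this small.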

Developed as part of Project 2 – Data Classification Using AI