This project explores a convolutional neural network (CNN)-based approach for Bird Species Classification. The model's performance is analyzed across various optimization techniques, providing insights into its learning behavior and interpretability through Class Activation Maps (CAM).
- Model Architecture:
  - A robust CNN architecture featuring residual connections, Batch Normalization, and Adaptive Average Pooling for efficient representation learning.
- Training Insights:
  - Performance metrics, including training and validation loss/accuracy trends across epochs, are visualized for clear understanding.
- Optimization Techniques:
  - The impact of methods such as data augmentation, dropout, Compound Scaling Laws, and StepLR scheduling on model accuracy is explored and tabulated.
- Class Activation Maps (CAM):
  - Visualization of the regions the model focuses on for classification decisions, providing interpretability and identifying areas of misclassification.
| Layer | Configuration | Output Shape |
|---|---|---|
| Input | RGB image, 300x300 | (3, 300, 300) |
| Conv2D + BatchNorm + ReLU | 7x7, stride 2, padding 3 | (64, 150, 150) |
| MaxPool2D | 3x3, stride 2 | (64, 75, 75) |
| Residual Blocks | Several Conv2D layers with skip connections and strides | Intermediate (128 to 512 channels) |
| AdaptiveAvgPool2D | Adaptive Average Pooling | (512, 1, 1) |
| Fully Connected Layer | Linear (512 → 10) | (10) |
The table above summarizes the architecture of the `birdClassifier` CNN, which is designed for scalability and feature extraction. Output shapes are listed channel-first as (channels, height, width).
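The spatial sizes in the table can be checked with the standard convolution output-size formula. The sketch below is only a sanity check, not code from `bird.py`; the helper name `conv_out` is hypothetical, and the max-pool padding of 1 is an assumption (the table does not state it, but it is what makes 150 map to 75).

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer, using floor rounding."""
    return (size + 2 * padding - kernel) // stride + 1

# Stem convolution from the table: 7x7 kernel, stride 2, padding 3.
stem = conv_out(300, kernel=7, stride=2, padding=3)

# Assumption: the 3x3, stride-2 max-pool uses padding=1 (not stated in the table).
pooled = conv_out(stem, kernel=3, stride=2, padding=1)

print(stem, pooled)
```

With no padding on the pooling layer the same formula would give 74 instead of 75, which is why the padding assumption is called out.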
Plots of training and validation loss and accuracy across epochs show how the model converges and generalizes:
- Train/validation split: 80% train, 20% validation.
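An 80/20 split like the one described can be sketched with a seeded shuffle of sample indices. This is a minimal illustration, not necessarily how `bird.py` splits the data; the function name `split_indices`, the seed, and the dataset size of 1000 are all hypothetical.

```python
import random

def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Shuffle indices reproducibly and return (train_indices, val_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]

# Hypothetical dataset size, used only for illustration.
train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))
```

Fixing the seed keeps the validation set identical across runs, so accuracy figures from different optimization techniques remain comparable.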
| Technique | Validation Accuracy | Notes |
|---|---|---|
| No Optimization | 60.13% | Baseline model |
| Compound Scaling Laws | 81.78% | Improved scaling |
| Data Augmentation | 89.98% | Enhanced generalization with augmented data |
| Dropout | 86.50% | Slower learning but robust performance |
| StepLR (all combined) | 90.72% | Best accuracy achieved |
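For readers unfamiliar with StepLR, the schedule multiplies the learning rate by a fixed factor every fixed number of epochs. The sketch below reproduces that rule in plain Python, mirroring the closed form used by `torch.optim.lr_scheduler.StepLR`. The base learning rate, `step_size`, and `gamma` values are illustrative assumptions; the values actually used in this project are not stated.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate at a given epoch under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

# Assumed hyperparameters, chosen only to illustrate the decay pattern.
schedule = [step_lr(0.01, epoch, step_size=10, gamma=0.1) for epoch in range(30)]

# The rate stays constant within each 10-epoch window, then drops by 10x.
print(schedule[0], schedule[10], schedule[20])
```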
CAM visualizations provide insights into the model's decision-making:
- Heatmaps reveal the image regions most influential in classification.
- Correct classifications emphasize relevant textures and objects in the images.
- Misclassified examples show scattered, poorly focused attention, which leads to incorrect predictions and indicates potential areas for further optimization.
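The README does not say which CAM variant is used. Assuming the classic formulation, which fits a network ending in global average pooling followed by a single linear layer, the heatmap for a class is the sum of the final feature maps weighted by that class's linear-layer weights. The sketch below shows this on toy data with plain Python lists; the function name and the toy values are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W), min-max normalized to [0, 1]."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [
        [sum(w * fmap[y][x] for w, fmap in zip(class_weights, feature_maps))
         for x in range(width)]
        for y in range(height)
    ]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(value - lo) / span for value in row] for row in cam]

# Toy example: two 2x2 feature maps and the weights of one class.
features = [
    [[0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [2.0, 0.0]],
]
weights = [1.0, 0.5]
heatmap = class_activation_map(features, weights)
print(heatmap)
```

In practice the resulting map is upsampled to the input resolution and overlaid on the image; that step is omitted here.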
Clone the repository:

    git clone <repository-url>
    cd <repository-directory>

Running the Code

    # Running code for training. Save the model in the same directory with name "bird.pth"
    python bird.py path_to_dataset train bird.pth

    # Running code for inference
    python bird.py path_to_dataset test bird.pth
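The commands above pass three positional arguments: the dataset path, the mode (`train` or `test`), and the model file. A script could parse them roughly as follows. This is a hypothetical sketch using the standard `argparse` module, not the actual argument handling in `bird.py`; the argument names are assumptions.

```python
import argparse

def parse_args(argv=None):
    """Parse the three positional arguments shown in the usage commands."""
    parser = argparse.ArgumentParser(description="Train or test the bird classifier.")
    parser.add_argument("dataset_path", help="path to the dataset directory")
    parser.add_argument("mode", choices=["train", "test"], help="train or run inference")
    parser.add_argument("model_path", help="where to save or load the model weights")
    return parser.parse_args(argv)

# Equivalent to: python bird.py path_to_dataset train bird.pth
args = parse_args(["path_to_dataset", "train", "bird.pth"])
print(args.mode, args.model_path)
```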
- Karan Deo Burnwal
This work was completed as part of COL333 Assignment 3.1 at IIT Delhi, exploring advanced deep learning techniques under academic supervision.



