Skip to content

andreleo02/fine-grained-image-classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

425 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Fine-Grained Image Classification

Computer Vision Project

This project aims to conduct an exploratory analysis of fine-grained image classification, a complex task in computer vision requiring deep learning models to discern subtle differences between highly similar images. The performance of two convolutional neural network (CNN) architectures and two transformer-based architectures has been compared across three fine-grained datasets. The experiments were conducted using pre-trained models that had been fine-tuned on these datasets. The focus was on evaluating the accuracy and loss of the models. Our findings highlight the significance of model architecture and training strategies in attaining high performance in fine-grained visual tasks.


Fine grained classification vs general image classification: fine grained classification aims to distinguish between very similar object (red box), while general image classification usually aims to distinguish distinct objects. From Aoxue Li et al.(2017), "Zero-Shot Fine-Grained Classification by Deep Feature Learning with Semantics".

Project structure

┌─ requirements.txt
├─ src/
│  ├─ models/
│  │  ├─ EfficientNetV2/
│  │  │  ├─ config.yml
│  │  │  ├─ EfficientNet.png
│  │  │  ├─ main.py
│  │  │  └─ README.md
│  │  ├─ ResNet34/
        ...
│  │  ├─ SwinTransformer/
        ...
│  │  ├─ ViT-16/
        ...
│  ├─ data_utils.py
│  ├─ training_utils.py
│  └─ utils.py

  • requirements.txt: list of dependencies required to run the project.

In the src folder you will find:

  • models: containing subdirectories for different models used for our experiments; each model has its README.md with a brief description of its srchitecture, a main.py containing the script to run the model and a config.yml with all the paramters of the model.

  • data_utils.py: utility functions for data handling and preprocessing.

  • training_utils.py: utility functions for model training processes.

  • utils.py: general utility functions used across the project.

Models

To conduct our experiments on fine-grained image classification, we have selected four models, two belonging to the family of convolutional neural networks and two belonging to the family of transformers, for comparative purposes:

If something is missing in this guide, please feel free to open an issue on this repo.

Experiments

To conduct this analysis on fine-grained visual classification, we evaluated the performance of our models on four very popular datasets in the field of computer vision, specifically chosen for fine-grained tasks like the present one.

CUB-200-2011 is an extended version of CUB-200. The extended version roughly doubles the number of images per category and adds new part localization annotations. All images are annotated with bounding boxes, part locations, and at- tribute labels. Images and annotations were filtered by multiple users of Mechanical Turk.


Flowers 102 dataset: The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories.


FGVC-Aircraft dataset: Aircraft, and in particular airplanes, are alternative to objects typically considered for fine-grained categorization such as birds and pets.

Note: Mammalia dataset is not publicly available but was used in the context of the competition of the Introduction to Machine Learning Course and provided by the University of Trento. This dataset contains 100 different classes of mammals. For more detailed information please consult our paper.

Results

During our experiments we trained and validated each model on each dataset and compared their performances. The results of our experiments demonstrated that EfficientNet consistently exhibited the highest accuracy and lowest loss across the different datasets. However, SwinT also exhibited promising results, indicating the potential of transformers for image classification. Both of these models exhibited an optimal balance between complexity and efficiency. SwinT also exhibited the best performance in comparison to ViT16. In contrast, ResNet, despite being a deep and effective architecture, exhibited poorer results compared to EfficientNet.

- VALIDATION ACCURACY

Model CUB Flowers Aircrafts Mammalia
ResNet34 97.77 94.22 66.69 50.58
EfficientNetV2 99.91 95.80 76.49 66.11
ViT16 98.36 88.01 44.99 59.96
Swin-T 98.18 94.12 71.39 66.50

- VALIDATION LOSS

Model CUB Flowers Aircrafts Mammalia
ResNet34 0.10 0.28 1.09 2.11
EfficientNetV2 0.015 0.18 0.78 1.42
ViT16 0.13 0.46 2.23 1.70
Swin-T 0.24 0.21 1.12 1.98

More details about the results of our experiments (including information about the training phase) for each model can be found in our paper.

Note: for the mammalia dataset, in the context of the competition of the Introduction to Machine Learning Course (University of Trento, y. 2024), we also trained ResNet50 and SwinB. More datails about these specific runs can be found in the paper in the section "Competition".

Repository guide

  1. Clone the repository

    git clone https://github.com/andreleo02/deep-dream-team.git
  2. Install the requirements:

    pip install -r requirements.txt
  3. Install the CUDA package (if using GPU):

    Follow instructions.

  4. Track results with Weights & Biases (wandb):

    • Create a profile on wandb.
    • On the first run with wandb config flag set to True, you'll be asked to insert an API KEY. Generate it from the Settings section of your wandb account.
  5. Run experiments:

    To replicate experiments on these models and datasets iside a model folder, use the following command:

    python main.py --config ./config.yml --run_name <run_name>

    Feel free to play with the parameters in the config.yml and have fun!


How to use different models

Follow these steps:

  1. Choose one of the pre-trained models available in PyTorch.

  2. In the models directory, create a new folder for your selected model.

  3. Inside the newly created folder, add the following files:

    • config.yml
    • main.py
    • README.md
  4. Specify the run parameters in the config.yml file.

Here's a sample directory structure:

models/
├── YourModelName/
│   ├── config.yml
│   ├── main.py
│   └── README.md

How to use different datasets

Custom datasets

The datasets can be manually downloaded and added to the src/data folder. This folder is however ignored by git and so it will only exists in the local environment. To keep the process of training the models as smooth as possible, some functions to download libraries directly from the code are defined in the utils.py file. Datasets can be downloaded from web (.zip and .tgz).

Tip

To enable the download of a custom dataset, in the data section of the config.yml file the field custom must be set to True and the url of the dataset must be specified in the download_url field. Specify also the dataset_name field with the name of the compressed download folder.

Torchvision datasets

To choose a dataset from torchvision, set the custom field to False. The dataset function must be specified inside the main.py file of the model (see SwinTransformer model).


Authors

About

Project competition for the Introduction to Machine Learning course (2023/2024)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages