Kaggle competition: bee species image classification across 47 classes with a severely imbalanced dataset.
- Task: Multi-class image classification (47 bee species)
- Train set: 7,982 images (4–718 samples per class)
- Test set: 12,878 unlabelled images
- Challenge: Severe class imbalance + fine-grained visual similarity between species
Beeeesss/
├── src/
│ ├── data_processing.py # Dataset, transforms, augmentation
│ ├── trainer.py # Model-agnostic training framework
│ ├── optuna_search.py # Hyperparameter optimization (Optuna)
│ ├── analyze_models.py # F1, confusion matrix, Grad-CAM analysis
│ ├── create_submission_file.py # Generate Kaggle submission CSV
│ ├── infer.py # Lightweight inference script
│ ├── explore_data.ipynb # EDA: class distribution, imbalance analysis
│ ├── resnet50.ipynb # ResNet50 baseline
│ ├── kaggle_train.ipynb # Full training pipeline for Kaggle
│ ├── kaggle_optuna.ipynb # Hyperparameter search on Kaggle
│ ├── augmentation_visualization.ipynb
│ ├── bee_foreground_swap_viz.ipynb
│ └── build_balanced_dataset.ipynb
├── data/ # Original dataset (gitignored)
├── data_balanced/ # Augmented balanced dataset (v1)
├── data_balanced_v2/ # Augmented balanced dataset (v2)
├── models/ # Saved checkpoints (gitignored)
├── submission_folder/ # Kaggle submission CSVs
├── config/
│ └── default.yaml # Centralized hyperparameter config
└── pyproject.toml
- data/train.csv: columns id (e.g. train/Andrena bicolor/abc.jpg) and label (int, 0-based)
- data/test.csv: column id only
- data/class-mapping.txt: one species name per line, index = class label
- Images: data/train/{species}/{hash}.jpg
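As a rough illustration of how these files fit together, the sketch below parses a train.csv-style table and a class-mapping file using only the standard library. The sample rows, the second species name, and the helper names `load_class_names` and `load_labels` are made up for the example; it also assumes the CSV carries an `id,label` header row. The repository's own loader in data_processing.py may work differently.

```python
import csv
import io


def load_class_names(mapping_file):
    """Return species names; the line index is the class label."""
    return [line.strip() for line in mapping_file if line.strip()]


def load_labels(train_csv_file, class_names):
    """Return (image_path, label, species) tuples from a train.csv-style file."""
    rows = []
    for row in csv.DictReader(train_csv_file):
        label = int(row["label"])  # 0-based class index
        rows.append((row["id"], label, class_names[label]))
    return rows


# Made-up sample data standing in for data/class-mapping.txt and data/train.csv.
mapping_txt = io.StringIO("Andrena bicolor\nBombus terrestris\n")
train_csv = io.StringIO(
    "id,label\n"
    "train/Andrena bicolor/abc.jpg,0\n"
    "train/Bombus terrestris/def.jpg,1\n"
)

class_names = load_class_names(mapping_txt)
samples = load_labels(train_csv, class_names)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on the paths listed above.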
- Weighted sampling (WeightedRandomSampler) to oversample rare classes during training
- Focal Loss (configurable gamma) to focus training on hard/rare examples
- Balanced datasets: data_balanced/ and data_balanced_v2/ created via augmentation (see build_balanced_dataset.ipynb)
- SAM-based foreground swap (BeeForegroundSwap): extracts bee foreground using Segment Anything Model and composites onto new backgrounds for minority classes
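The first two techniques in this list can be illustrated without any framework. The sketch below is a plain-Python approximation, not the code in trainer.py: `inverse_frequency_weights` and `focal_loss` are hypothetical helper names, the weighting rule (1 / class count) is one common choice and is assumed here, and the focal loss omits any per-class alpha term.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weights proportional to 1 / (number of samples in the class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)**gamma * log(p_t)."""
    m = max(logits)  # subtract the max for a numerically stable log-softmax
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    log_pt = logits[target] - log_sum
    pt = math.exp(log_pt)
    return -((1.0 - pt) ** gamma) * log_pt


labels = [0, 0, 0, 0, 1]  # toy labels: class 1 is rare
weights = inverse_frequency_weights(labels)

easy = focal_loss([4.0, 0.0], target=0)  # confident and correct
hard = focal_loss([0.0, 4.0], target=0)  # confident and wrong
```

With these weights every class carries the same total sampling mass, so the rare class is drawn about as often as the common one. Weights of this shape can be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`. The focal term `(1 - p_t)**gamma` shrinks the loss of easy examples toward zero, and with `gamma=0` the function reduces to ordinary cross-entropy.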
Using torchvision.transforms.v2 with:
- SquarePad → resize (no distortion)
- Random horizontal/vertical flips, rotation, color jitter, perspective
- MixUp and CutMix batch-level augmentation (configurable via config/default.yaml)
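To make the batch-level part concrete, here is a minimal MixUp sketch on plain Python lists standing in for image tensors. It is an illustration of the general technique rather than the project's implementation; the function name `mixup` and the default `alpha=0.2` are assumptions.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two samples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


rng = random.Random(0)  # seeded generator so the example is repeatable
x, y, lam = mixup(
    [1.0, 1.0], [1.0, 0.0],  # sample A: "pixels" and one-hot label for class 0
    [0.0, 0.0], [0.0, 1.0],  # sample B: "pixels" and one-hot label for class 1
    alpha=0.2,
    rng=rng,
)
```

The mixed label stays a valid probability vector, which is why MixUp is usually paired with a loss that accepts soft targets. In torchvision the same idea is available as `torchvision.transforms.v2.MixUp(alpha=..., num_classes=...)`, applied to whole batches after collation.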
| Model | Dataset | Notes |
|---|---|---|
| ResNet50 | Original | Baseline |
| ResNet50 | Balanced | With Optuna-tuned hyperparams |
| EfficientNet-B3 | Balanced | Via timm |
| ViT | Balanced | Vision Transformer |
- Differential learning rates: lower LR for backbone, higher for classification head
- Optimizers: AdamW, Adam, SGD
- Schedulers: Cosine Annealing, Step LR, warmup
- EMA (Exponential Moving Average) of weights
- Checkpoint save/load with best_model.pth tracking
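The interaction between warmup, cosine annealing, and the two learning rates can be sketched as a pure function of the epoch. This is an assumed formulation (linear warmup followed by a single cosine decay), and `lr_at_epoch` and the 3-epoch warmup are hypothetical; the base rates and epoch count are the ones from config/default.yaml.

```python
import math


def lr_at_epoch(epoch, total_epochs, base_lr, warmup_epochs=0, min_lr=0.0):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if warmup_epochs > 0 and epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


EPOCHS = 30
# One (backbone_lr, head_lr) pair per epoch: same shape, different base rate.
schedule = [
    (
        lr_at_epoch(e, EPOCHS, 1.0e-4, warmup_epochs=3),
        lr_at_epoch(e, EPOCHS, 1.0e-3, warmup_epochs=3),
    )
    for e in range(EPOCHS)
]
```

Because both groups follow the same curve, the head stays at ten times the backbone rate throughout training. In PyTorch the two rates would typically be expressed as two optimizer parameter groups, each with its own `lr`.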
Optuna tunes 5 parameters: lr_backbone, lr_head, weight_decay, augmentation alpha, focal gamma.
- 15 trials x 15 epochs, MedianPruner for early stopping
- Results stored in SQLite (resumable)
# Install dependencies with uv
uv sync
# Or with pip
pip install -e .

Requires Python >= 3.11.
Edit config/default.yaml then run:
uv run python src/trainer.py

Or use the Kaggle notebooks (kaggle_train.ipynb, kaggle_optuna.ipynb) for GPU training on Kaggle.

# Run the Optuna hyperparameter search
uv run python src/optuna_search.py
# Generate the Kaggle submission CSV
uv run python src/create_submission_file.py
# Analyze trained models
uv run python src/analyze_models.py

The analysis script produces per-class F1 scores, confusion matrices, and Grad-CAM visualizations in fig/analysis/.
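For readers unfamiliar with the per-class metric, the following sketch shows how per-class and macro-averaged F1 can be computed from label lists. It is a standalone illustration with toy data and a hypothetical helper name, not the code in analyze_models.py.

```python
def per_class_f1(y_true, y_pred, num_classes):
    """Return one F1 score per class (0.0 for a class that never appears)."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Toy predictions over three classes.
y_true = [0, 0, 0, 1, 2]
y_pred = [0, 0, 1, 1, 2]

f1 = per_class_f1(y_true, y_pred, num_classes=3)
macro_f1 = sum(f1) / len(f1)  # every class counts equally, rare or not
```

Macro averaging weights all 47 species equally, so it exposes weak performance on rare classes that plain accuracy would hide. scikit-learn offers the same computation via `sklearn.metrics.f1_score(y_true, y_pred, average=None)` for per-class values and `average="macro"` for the mean.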
All hyperparameters are in config/default.yaml:
data:
  num_classes: 47
  image_size_train: 160
  image_size_test: 224
  weighted_sampler: true
training:
  epochs: 30
  batch_size: 32
optimizer:
  lr_backbone: 1.0e-4
  lr_head: 1.0e-3
  weight_decay: 1.0e-4
loss:
  type: focal
  gamma: 2.0
  class_weights: true

Key packages (managed with uv):
- torch >= 2.10, torchvision >= 0.25
- timm >= 1.0.25 (EfficientNet, ViT architectures)
- transformers >= 4.40 (SAM for foreground extraction)
- optuna >= 4.7 (hyperparameter search)
- scikit-learn, pandas, opencv-python, matplotlib, seaborn