This project explores using Convolutional Neural Networks (CNNs) to address image classification tasks from real-world applications in computational pathology and computer vision. The project is divided into two main tasks:
- Training and Analyzing CNN Models on Pathology Data:
  - Train CNNs on a Colorectal Cancer Tissue Dataset.
  - Analyze and visualize feature representations using t-SNE.
  - Quantitatively analyze the model's performance.
- Knowledge Transfer and Cross-Dataset Analysis:
  - Transfer learned feature representations to other datasets (Prostate Cancer and Animal Faces).
  - Compare the extracted features against those from a CNN encoder pre-trained on ImageNet.
  - Use a classical machine learning algorithm (Random Forest) to classify the extracted features on the Prostate Cancer and Animal Faces datasets.
The main objectives are:
- To study how CNNs generalize across datasets.
- To conduct a detailed analysis of model performance using t-SNE visualizations.
- To classify extracted features using classical machine learning methods.
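The last objective, classifying extracted features with a Random Forest, can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy are installed, and it uses randomly generated vectors as a stand-in for real encoder outputs; the function name `classify_features`, the split ratio, and the forest size are illustrative choices, not the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def classify_features(features, labels, seed=0):
    """Fit a Random Forest on feature vectors and return held-out accuracy."""
    # Stratified split so every class appears in both train and test sets.
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(x_train, y_train)
    return accuracy_score(y_test, clf.predict(x_test))


# Placeholder data (assumption): 3 classes, 60 samples each, 16-dim features.
# In the project these would be the vectors produced by the CNN encoder.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 60)
features = rng.normal(size=(180, 16)) + 4.0 * labels[:, None]

accuracy = classify_features(features, labels)
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, `features` would be replaced by the encoder outputs for the Prostate Cancer or Animal Faces images and `labels` by their class indices.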
To set up the environment, use:
pip install -r requirements.txt
The datasets used in this project are:
- Colorectal Cancer Tissue Dataset (used for training CNNs).
- Prostate Cancer Dataset (used for cross-dataset analysis).
- Animal Faces Dataset (used for cross-dataset analysis).
The datasets are already included if you clone the GitHub repository. To download them separately, use the following sources:
- Colorectal Cancer Tissue Dataset: Download Link
- Prostate Cancer Dataset: Download Link
- Animal Faces Dataset: Download Link
Once downloaded, organize the datasets into the following directory structure:
- Data/Colorectal Cancer
- Data/Prostate Cancer
- Data/Animal Faces
- Data/Sample Test Dataset (Prostate Cancer)/Prostate Cancer
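Before opening the notebooks, it can help to confirm that the layout above is in place. The following is a small sketch using only the standard library; the helper name `missing_dataset_dirs` is hypothetical, and it assumes the script is run from the repository root so that `Data` resolves correctly.

```python
from pathlib import Path

# Directory names copied from the list above, relative to the Data/ folder.
EXPECTED_DIRS = [
    "Colorectal Cancer",
    "Prostate Cancer",
    "Animal Faces",
    "Sample Test Dataset (Prostate Cancer)/Prostate Cancer",
]


def missing_dataset_dirs(data_root="Data"):
    """Return the expected dataset directories that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:")
        for name in missing:
            print(f"  Data/{name}")
    else:
        print("All dataset directories are in place.")
```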
To train and validate your CNN model on the Colorectal Cancer Dataset, follow these steps:
- Open the task_1.ipynb notebook in Jupyter Notebook or JupyterLab.
- Ensure the Colorectal Cancer Dataset (the sample dataset with just 100 samples) is placed in the correct directory: Data/Sample Test Dataset (Prostate Cancer)/Prostate Cancer/.
- Execute all the cells in the notebook to:
  - Preprocess the data.
  - Train the CNN model.
  - Validate the model performance on the validation set.
- Once training is complete, the trained model is automatically saved as resnet50_colorectal.pth in the task_1/ directory.
- t-SNE visualizations of the model's feature representations are saved in the Figures/ directory.
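For readers unfamiliar with the t-SNE step, here is a rough sketch of how feature vectors can be projected to two dimensions with scikit-learn. It is not the notebook's code: the function name `embed_features`, the perplexity value, and the random placeholder features are assumptions made for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_features(features, perplexity=30.0, seed=0):
    """Project high-dimensional feature vectors to 2-D with t-SNE."""
    # perplexity must be smaller than the number of samples.
    tsne = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    )
    return tsne.fit_transform(features)


# Placeholder features (assumption): 120 samples of 16 dimensions each,
# standing in for the CNN's penultimate-layer activations.
rng = np.random.default_rng(0)
features = rng.normal(size=(120, 16))

embedding = embed_features(features)
print(embedding.shape)
```

Each row of `embedding` is the 2-D position of one image; plotting the rows as a scatter plot colored by class label gives the kind of figure described above.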
To evaluate the pre-trained model on the provided test dataset, follow these steps:
- Extract the test dataset from the project ZIP file.
- Open the task_1.ipynb notebook in Jupyter Notebook or JupyterLab.
- Locate the section titled "Evaluate Pre-Trained Model" in the notebook.
- Ensure the pre-trained model file, resnet50_colorectal.pth, is present in the task_1/ directory.
- Modify the test_dir variable in the notebook to point to the test dataset directory (Data/Colorectal Cancer/Test/).
- Run the evaluation cells to compute:
  - Performance metrics (e.g., accuracy, precision, recall, F1-score).
  - Visualizations of feature representations on the test dataset.
The evaluation results, including metrics and visualizations, are displayed in the notebook and saved to the Classification_Reports/ and Figures/ directories.
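To make the reported metrics concrete, the sketch below computes accuracy and macro-averaged precision, recall, and F1-score in plain Python. It illustrates the definitions only; the notebook may compute them differently (for example with a library routine), and the function name `classification_metrics` and the labels in the usage example are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels, for illustration only.
metrics = classification_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(metrics)
```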