This repository contains the core technical specifications, implementation frameworks, and performance evaluations for the Image Colorizer application. The system leverages Deep Learning techniques and the OpenCV library to automatically transform gray-scale (black and white) photos and videos into high-quality colorized versions by predicting semantic colors and tones based on visual features.
- Anik Sur – Department of Computer Science & Engineering, Institute of Engineering & Management, Kolkata, India (2019-23)
- Abhirup Dutta – Department of Computer Science & Engineering, Institute of Engineering & Management, Kolkata, India (2019-23)
Image colorization is the process of taking a gray-scale input image and producing a colorized output image that represents the semantic colors and tones of the input. In this paper, we use Deep Learning techniques, specifically a Convolutional Neural Network (CNN), to colorize black and white images. In addition, the OpenCV library is used to colorize both black and white images and videos. The main use of our application is colorizing black and white images or videos captured before cameras could record color, as well as black and white sketches. With more training data, the application could produce more accurate results for arbitrary user inputs.
Index Terms: Colorization, Vision for Graphics, CNNs, Self-supervised learning.
The deployment code requires a working environment with the following frameworks and libraries:
Core Network Framework: Caffe
Python Scientific Computing Libraries:
- numpy (numerical array processing)
- pyplot / matplotlib (data and image visualization)
- skimage (image processing algorithms)
- scipy (advanced scientific computing)
- OpenCV (real-time computer vision execution)
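The sketch below shows one way numpy and OpenCV can fit together at inference time. It is modelled on OpenCV's own colorization sample, which loads a Caffe colorization network through the `cv2.dnn` module, so Caffe itself is not needed just to run the model. The prototxt name, the `pts_in_hull.npy` cluster-center file, the layer names `class8_ab` / `conv8_313_rh` and the constant 2.606 all come from that sample and are assumptions about this repository; adjust them to the files actually placed in /models.

```python
import numpy as np

# Hypothetical file names modelled on OpenCV's colorization sample; the
# repository itself only states that a .caffemodel file belongs in /models.
PROTOTXT = "models/colorization_deploy_v2.prototxt"
CAFFEMODEL = "models/colorization_release_v2.caffemodel"
HULL_POINTS = "models/pts_in_hull.npy"  # 313 ab cluster centers (assumed file)
NET_SIZE = 224  # input resolution used by the OpenCV sample


def center_lightness(l_channel: np.ndarray) -> np.ndarray:
    """Mean-center an L channel (range 0..100) by subtracting 50."""
    return l_channel - 50.0


def load_colorizer(prototxt=PROTOTXT, caffemodel=CAFFEMODEL, hull_points=HULL_POINTS):
    """Load the Caffe network through OpenCV's DNN module."""
    import cv2  # imported lazily so center_lightness() works without OpenCV

    net = cv2.dnn.readNetFromCaffe(prototxt, caffemodel)
    # The quantized ab bin centers become the weights of the class8_ab layer.
    pts = np.load(hull_points).transpose().reshape(2, 313, 1, 1).astype(np.float32)
    net.getLayer(net.getLayerId("class8_ab")).blobs = [pts]
    # Rescaling constant copied from OpenCV's colorization sample.
    net.getLayer(net.getLayerId("conv8_313_rh")).blobs = [np.full((1, 313), 2.606, np.float32)]
    return net


def colorize_frame(net, frame_bgr: np.ndarray) -> np.ndarray:
    """Colorize one 8-bit BGR frame (gray media is read as 3 equal channels)."""
    import cv2

    scaled = frame_bgr.astype(np.float32) / 255.0
    l_full = cv2.cvtColor(scaled, cv2.COLOR_BGR2LAB)[..., 0]  # full-resolution L
    small = cv2.resize(scaled, (NET_SIZE, NET_SIZE))
    l_small = center_lightness(cv2.cvtColor(small, cv2.COLOR_BGR2LAB)[..., 0])

    net.setInput(cv2.dnn.blobFromImage(l_small))
    ab = net.forward()[0].transpose((1, 2, 0))  # predicted ab at low resolution
    ab = cv2.resize(ab, (frame_bgr.shape[1], frame_bgr.shape[0]))

    # Recombine the original L with the predicted ab and convert back to BGR.
    lab = np.concatenate((l_full[..., np.newaxis], ab), axis=2)
    bgr = np.clip(cv2.cvtColor(lab, cv2.COLOR_LAB2BGR), 0.0, 1.0)
    return (bgr * 255).astype(np.uint8)


def colorize_video(net, src: str, dst: str) -> None:
    """Colorize a video frame by frame and write the result as MP4."""
    import cv2

    cap = cv2.VideoCapture(src)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unknown
    writer = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out = colorize_frame(net, frame)
        if writer is None:
            h, w = out.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*"mp4v")
            writer = cv2.VideoWriter(dst, fourcc, fps, (w, h))
        writer.write(out)
    cap.release()
    if writer is not None:
        writer.release()
```

Typical use would be `net = load_colorizer()` followed by `cv2.imwrite("out.jpg", colorize_frame(net, cv2.imread("in.jpg")))`, where both image paths are placeholders. The network is loaded once and reused for every frame, which matters for videos.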
Unlike conventional methods in the literature that encode photos in the RGB model (an additive color model), this project uses the CIELAB ("Lab") color space.
L Channel (Lightness): Closely matches human perception of lightness and is the only input passed to the neural network.
A & B Channels: Represent the green-red and blue-yellow color components. The model is trained specifically to predict these two channels.
Advantage: Lab is designed to approximate human visual perception, so it is far more perceptually uniform than RGB.
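In practice `skimage.color.rgb2lab` or `cv2.cvtColor` performs the RGB-to-Lab step; the NumPy sketch below spells out the standard sRGB (D65 white point) formulas instead, so the split into the L input and the ab target is explicit. It assumes RGB values already scaled to [0, 1]; `rgb_to_lab` and `split_lab` are illustrative names, not part of this repository.

```python
import numpy as np

# Linear sRGB -> XYZ matrix and D65 reference white (standard sRGB values).
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def _f(t):
    """CIELAB companding: cube root above a small threshold, linear below it."""
    delta = 6.0 / 29.0
    return np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)


def rgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (shape ..., 3) to CIELAB (L, a, b)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB gamma curve to get linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Convert to XYZ and normalize by the reference white.
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    fx, fy, fz = _f(xyz[..., 0]), _f(xyz[..., 1]), _f(xyz[..., 2])
    L = 116.0 * fy - 16.0   # lightness, 0..100
    a = 500.0 * (fx - fy)   # green (-) to red (+)
    b = 200.0 * (fy - fz)   # blue (-) to yellow (+)
    return np.stack([L, a, b], axis=-1)


def split_lab(lab):
    """Split a Lab image into the network input (L) and the target (ab)."""
    return lab[..., 0], lab[..., 1:]
```

Pure white maps to roughly (100, 0, 0), black to (0, 0, 0), and any gray to a = b = 0, which is why a gray-scale photo carries all of its information in L and the model only has to supply a and b.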
Model Framework: The network uses a pre-trained CNN to map a gray-scale input directly to a distribution over quantized color values.
Layer Structural Design: Each convolution block chains 2 or 3 convolution and ReLU layers, followed by a BatchNorm layer. The base architecture contains no explicit pooling layers; all changes in spatial resolution are handled by downsampling or upsampling between convolution blocks.
Dataset Scale: The network is trained on an extensive dataset of over 1.3 million photos from ImageNet.
Target Mapping: Each color training photo is converted to CIELAB; the L channel serves as the input feature, while the A and B channels serve as classification labels.
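Treating ab as classification labels requires mapping continuous ab values to bins, and turning the predicted distribution back into one color per pixel. The sketch below assumes the setup described by Zhang et al. [6]: a bin width of 10 in ab space and decoding with an "annealed mean" at temperature 0.38. Zhang et al. keep only the 313 in-gamut bins; this simplified version keeps the full 22 × 22 grid over the assumed range [-110, 110]. All function names are hypothetical.

```python
import numpy as np

GRID = 10.0         # assumed bin width in ab units
AB_MIN = -110.0     # assumed lower edge of the ab range
BINS_PER_AXIS = 22  # (110 - (-110)) / 10


def ab_to_class(ab):
    """Map ab values (shape ..., 2) to a flat bin index on a 22 x 22 grid."""
    idx = np.floor((np.asarray(ab, dtype=np.float64) - AB_MIN) / GRID).astype(int)
    idx = np.clip(idx, 0, BINS_PER_AXIS - 1)  # out-of-range values go to edge bins
    return idx[..., 0] * BINS_PER_AXIS + idx[..., 1]


def bin_centers():
    """Return the (a, b) center of every bin, shape (484, 2), in class order."""
    centers_1d = AB_MIN + GRID * (np.arange(BINS_PER_AXIS) + 0.5)
    a, b = np.meshgrid(centers_1d, centers_1d, indexing="ij")
    return np.stack([a.ravel(), b.ravel()], axis=-1)


def annealed_mean(probs, centers, temperature=0.38):
    """Decode a distribution over bins (shape ..., K) into one ab value.

    temperature=1 gives the plain mean; values near 0 approach the mode.
    """
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ centers
```

The temperature trades off the two failure modes: the plain mean of a multimodal distribution tends toward desaturated colors, while the mode can produce patchy, inconsistent colors across neighboring pixels.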
Although the base architecture uses no pooling layers, pooling layers were added to the network pipeline in a set of experiments to study their effect on feature extraction:
Average Pooling: Divides each feature map into patches and outputs the mean activation of each patch. Because strong activations are averaged together with weaker neighbors, isolated high responses are diluted, and when most activations in a region are low the output has reduced contrast. It achieved a testing accuracy of 0.9842.
Max Pooling: Slides a 2D window over each feature map and keeps only the highest activation within each region, discarding less dominant features. Only the strongest responses are passed on to subsequent layers. It achieved a testing accuracy of 0.9831.
| Pooling Strategy Configuration | Testing Accuracy |
|---|---|
| Average Pooling Layer | 0.9842 |
| Max Pooling Layer | 0.9831 |
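The difference between the two pooling strategies can be seen on a small feature map. The NumPy sketch below assumes non-overlapping 2 × 2 windows with stride 2; the window size and stride used in the experiments are not stated, and `pool2d` is an illustrative helper.

```python
import numpy as np


def pool2d(x, size=2, mode="max"):
    """Non-overlapping size x size pooling over a 2D feature map."""
    h, w = x.shape
    h, w = h - h % size, w - w % size  # drop edge rows/cols that do not fill a window
    # Reshape so axes 1 and 3 index positions inside each window.
    windows = x[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "avg":
        return windows.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")


feature_map = np.array([
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 8.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
max_pooled = pool2d(feature_map, mode="max")  # [[4, 8], [0, 1]]
avg_pooled = pool2d(feature_map, mode="avg")  # [[2.5, 2], [0, 1]]
```

In the top-right window the single strong activation 8 survives max pooling unchanged but is diluted to 2 by average pooling, which is the contrast loss described above.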
For comparison, the following legacy architectures and their reported results on other image recognition benchmarks illustrate the performance limits of earlier approaches:
LeNet-5 Model [6]: Reaches a testing accuracy of only 66% on the CIFAR-10 dataset. Since human-level accuracy on CIFAR-10 is about 94%, it lacks sufficient recognition capability for color prediction.
DanNet Model [7]: Recognized as the first pure deep convolutional neural network to win computer vision contests (2011); it laid the structural foundations for AlexNet (2012), Highway Net (2015), and ResNet.
AlexNet Model [8]: Has 60 million parameters spread across 5 convolutional and 3 fully connected layers, and reported historic top results of 78.1% and 60.9% on early ImageNet subsets.
The application produced the following results:
- Achieved a 32% improvement in accuracy over legacy baseline models in reproducing the authentic, ground-truth colors of an image.
- Significantly reduced visible desaturation artifacts compared to older approaches.
- Demonstrated that colorization works as a robust task for self-supervised feature learning, with the network acting as an efficient cross-channel encoder.
This work was supported in part by the Grant-in-Aid Projects of the faculty members, Institute of Engineering & Management, Kolkata, West Bengal, India.
[1] Z. Cheng, Q. Yang, and B. Sheng, "Deep colorization," in Proceedings of the IEEE International Conference on Computer Vision, pp. 415-423, 2015.

[3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, pp. 2278-2324, 1998.

[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.

[5] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Computation, vol. 22, pp. 3207-3220, 2010.

[6] R. Zhang, P. Isola, and A. A. Efros, "Colorful image colorization," in Computer Vision – ECCV 2016, 2016.

(Delete the placeholder .txt file in /models and replace it with the .caffemodel file downloaded from https://drive.google.com/file/d/1N5CxEKOS5jec10I16oWUuxJYnLaKywNh/view?usp=sharing)