Skip to content

utki007/comic-face-generator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

67 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Face2Comic: Learning Comic Stylization of Faces using Paired GANs

1. Project Name & Overview

a) Project Name

Face2Comic: Learning Comic Stylization of Faces using Paired GANs

b) Overview

Face2Comic learns a supervised mapping (real → comic) that preserves identity, pose, and key facial features. The pipeline includes preprocessing, hyperparameter tuning, final training, and quantitative + qualitative evaluation on a held-out test set.

The project also evaluates real world generalization and includes a lightweight web app for browser based comic generation from uploaded images.

i. Model

  • Generator: U-Net encoder-decoder with skip connections for preserving spatial structure
  • Discriminator: PatchGAN operating on image patches to enforce local realism

ii. Losses

  • Adversarial (BCE): encourages realism
  • L1 (pixel): enforces reconstruction
  • Final generator objective: G_loss = BCE + lambda_L1 x L1

iii. Training & Tuning

  • Preprocessing resizes to 256×256, creates paired 80/10/10 splits, normalizes to [-1, 1], and runs integrity checks.
  • Conduct Hyperparemeter Tuning over 27 configuration
  • Applied paired augmentations on the fly for training dataset
  • Final training up to 300 epochs. Logs
  • Added linear learning-rate decay starting at epoch 150.
  • Used label smoothing with real labels set to 0.9 for more stable GAN training.

2. Installation & run instructions.

Model

Frontend

From the repository root (runs on epoch 263):

pip install -r requirements-app.txt
npm install
npm run dev

Inference.py is being used to run my frontend model end-to-end.

3. Results of the Model

4. Extra Criteria Pursued

a) Hyperparameter Tuning

Performed hyperparameter grid search for a subset of dataset (2000 train, 500 validation) over learning rate, batch size, and lambda_L1 across 27 configurations (10 epochs each). The best configuration used LR = 5e-4, Batch Size = 32, and lambda_L1 = 100, achieving a validation L1 loss of 0.2262. Grid Search Samples

b) Gallery GUI / Web Application

The React GUI lets users upload an image, auto crops and resizes it to 256×256, and sends it to a backend for instant comic generation with download, copy, and regenerate options. It also includes an optional enhancement toggle for improved preprocessing and postprocessing effects or a raw output mode.

5. Difficulties Faced & Solutions

A key challenge was poor generalization to real world images due to the synthetic training distribution. This was improved using stronger augmentations (noise, color jitter, blur), which significantly increased robustness. Training stability was further improved with label smoothing, linear learning rate decay, and longer training, resulting in more consistent and realistic outputs.

About

Conditional GAN for translating real human faces into comic-style images using paired datasets, with a Node.js inference interface.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages