Unofficial PyTorch implementation starter for DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation (CVPR 2025).
If this repo saves you reading or reproduction time, please star it and follow @StaryMoon. I am building honest open reproduction starters for recent CVPR papers.
This repository is an independent, unofficial, work-in-progress starter.
- Paper: DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
- Venue: CVPR 2025
- Reproduction status: benchmarks have not been reproduced yet
- Relationship to authors: this repo is not official and is not affiliated with the paper authors.
This v0.1.0 starter implements a compact, readable scaffold inspired by the paper:
- separate visual and language token pathways
- cross-modal fusion block
- understanding and generation heads
- toy contrastive/generation loss
- smoke-test script
The goal is to make the high-level idea easy to inspect, fork, and improve.
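The components listed above can be illustrated with a minimal, self-contained sketch. It is not the code in `diffsensei_unofficial` and not the paper's architecture (which bridges a multimodal LLM with a diffusion model). `ToyFusionStarter`, `toy_loss`, and every hyperparameter are hypothetical names and values chosen for illustration. Assumptions: visual tokens come from non-overlapping image patches, text tokens attend to visual tokens through cross-attention for fusion, and the toy loss is token-level cross-entropy plus a symmetric InfoNCE contrastive term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyFusionStarter(nn.Module):
    """Hypothetical sketch only: not the repo's UnofficialStarter, not the paper's model."""

    def __init__(self, vocab_size=32, dim=64, patch=16, num_heads=4):
        super().__init__()
        # Visual pathway: each non-overlapping patch becomes one token.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Language pathway: token ids -> embeddings.
        self.text_embed = nn.Embedding(vocab_size, dim)
        # Cross-modal fusion: text tokens (queries) attend to visual tokens.
        self.fusion = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Generation head: per-token logits over the toy vocabulary.
        self.gen_head = nn.Linear(dim, vocab_size)
        # Understanding head: projects pooled features into a shared embedding space.
        self.img_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)

    def forward(self, image, text_ids):
        vis = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, n_patches, dim)
        txt = self.text_embed(text_ids)  # (B, T, dim)
        fused, _ = self.fusion(query=txt, key=vis, value=vis)
        fused = self.norm(txt + fused)  # residual connection around the fusion block
        logits = self.gen_head(fused)  # (B, T, vocab_size)
        img_emb = F.normalize(self.img_proj(vis.mean(dim=1)), dim=-1)
        txt_emb = F.normalize(self.txt_proj(fused.mean(dim=1)), dim=-1)
        return logits, img_emb, txt_emb


def toy_loss(logits, targets, img_emb, txt_emb, temperature=0.07):
    # Generation term: token-level cross-entropy against toy targets.
    gen = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    # Contrastive term (symmetric InfoNCE): matching image/text pairs lie on the diagonal.
    sim = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(sim.size(0), device=sim.device)
    con = 0.5 * (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels))
    return gen + con


torch.manual_seed(0)
model = ToyFusionStarter()
image = torch.rand(2, 3, 64, 64)
text_ids = torch.randint(0, 32, (2, 8))
logits, img_emb, txt_emb = model(image, text_ids)
# For illustration only, the input ids double as generation targets.
loss = toy_loss(logits, text_ids, img_emb, txt_emb)
print("loss:", loss.item())
print("logits:", logits.shape)  # torch.Size([2, 8, 32])
```

The batch size, sequence length, and vocabulary size (2, 8, 32) were picked to mirror the smoke-test output shown in this README; the real starter's dimensions and internals may differ.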
This starter does not yet include:
- large language model backbone
- image diffusion backend
- large-scale instruction tuning
- official evaluation protocol
```bash
git clone https://github.com/StaryMoon/DiffSensei-Unofficial.git
cd DiffSensei-Unofficial
pip install -r requirements.txt
python scripts/smoke_test.py
```

Expected output includes:

```text
loss: ...
logits: torch.Size([2, 8, 32])
```
```python
import torch
from diffsensei_unofficial import UnofficialStarter

image = torch.rand(2, 3, 64, 64)  # batch of 2 random RGB images at 64x64
model = UnofficialStarter(kind="vlm")
out = model(image)
```

Roadmap:
- Replace toy modules with a closer implementation of the paper.
- Add dataset loader and config files.
- Add metric scripts and visualization.
- Reproduce a small benchmark or ablation table.
- Add pretrained weights once experiments are stable.
Topics: cvpr-2025, manga-generation, diffusion, mllm, pytorch, unofficial-implementation
Please cite the original paper if you use the method. This repo is only an unofficial starter and does not replace the paper.
MIT License. The original paper and official materials remain owned by their respective authors / publishers.