an izutsumi identification project
i do love izutsumi
The goal of this project is to answer the age-old question: "Is Izutsumi in this image?"
I am using a two-stage pipeline: detection (locating candidates with bounding boxes) and identification (classifying each detected box).
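The two stages compose roughly like this. This is a minimal sketch: `detect`, `crop`, and `identify` are hypothetical callables standing in for whatever detector and classifier are actually used, and both thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


@dataclass
class Detection:
    box: Box
    score: float  # detector confidence in [0, 1]


def is_izutsumi_in_image(
    image: Any,
    detect: Callable[[Any], List[Detection]],   # hypothetical stage 1
    crop: Callable[[Any, Box], Any],            # hypothetical crop helper
    identify: Callable[[Any], float],           # hypothetical stage 2, returns P(Izutsumi)
    det_threshold: float = 0.5,
    id_threshold: float = 0.5,
) -> bool:
    for det in detect(image):
        if det.score < det_threshold:
            continue  # skip weak detections before running the classifier
        if identify(crop(image, det.box)) >= id_threshold:
            return True  # one confident crop answers the yes/no question
    return False
```

Returning on the first confident crop is enough for the yes/no question; collecting every positive box instead would also localize her.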
Own dataset: I used Label Studio to manually annotate bounding boxes on Dungeon Meshi manga pages. The labels are chosen so that the same dataset can be used for both stages.
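One way to turn a single set of annotations into targets for both stages is sketched below. It assumes Label Studio's JSON export, where each rectangle result stores `x`, `y`, `width`, and `height` as percentages of `original_width` and `original_height`. Rotation is ignored (unrotated boxes assumed), and the label name `"izutsumi"` and the class-agnostic detection target are illustrative assumptions, not necessarily the labels used in this project.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def ls_rect_to_xyxy(result: Dict) -> Box:
    """Convert one Label Studio rectangle result (percent units) to pixel corners."""
    w, h = result["original_width"], result["original_height"]
    v = result["value"]
    xmin = v["x"] * w / 100.0
    ymin = v["y"] * h / 100.0
    return xmin, ymin, xmin + v["width"] * w / 100.0, ymin + v["height"] * h / 100.0


def split_targets(results: List[Dict], target: str = "izutsumi") -> Tuple[List[Box], List[Tuple[Box, bool]]]:
    det_boxes: List[Box] = []
    cls_samples: List[Tuple[Box, bool]] = []
    for r in results:
        if r.get("type") != "rectanglelabels":
            continue  # ignore non-box results
        box = ls_rect_to_xyxy(r)
        label = r["value"]["rectanglelabels"][0]
        det_boxes.append(box)                                # stage 1: every annotated box
        cls_samples.append((box, label.lower() == target))  # stage 2: is this crop the target?
    return det_boxes, cls_samples
```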
Each volume of manga corresponds to a single XML file. The XML files are UTF-8 encoded and structured as follows:
* **`book`**: Title of the work
  * **`characters`**: List of characters appearing in the work
    * **`character`**
    * **`character`**
    * ...
  * **`pages`**: List of pages, stored as double-page spreads
    * **`page`**
    * **`page`**
    * ...

`character` gives the correspondence between each character's name and its ID. `page` gives an overview of the page (page number, image size) and information about the objects on it. There are four types of objects; each has an object ID, a rectangular area (`xmin`, `xmax`, `ymin`, `ymax`), and additional information specific to the object type. All IDs are 8-digit hexadecimal numbers and are unique throughout this dataset.
* **`face`**: Face of a character. Character ID is attached.
* **`body`**: Body of a character. Character ID is attached.
* **`text`**: Typed text and some handwritten text. Text content is recorded.
* **`frame`**: Frame of a panel. No additional information.
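The layout above can be read with the standard library alone. This is a sketch that assumes the element and attribute names (`title`, `id`, `name`, `index`, `width`, `height`, `character`) match the description; it validates the 8-digit hexadecimal IDs and attaches only the per-type extra information listed for each object type.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

OBJECT_TYPES = ("face", "body", "text", "frame")
HEX_ID = re.compile(r"[0-9a-fA-F]{8}")  # all IDs are 8-digit hexadecimal


@dataclass
class MangaObject:
    kind: str
    obj_id: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    character: Optional[str] = None  # set for face/body
    text: Optional[str] = None       # set for text


@dataclass
class Page:
    index: int
    width: int
    height: int
    objects: List[MangaObject] = field(default_factory=list)


def _parse_object(el: ET.Element) -> MangaObject:
    obj_id = el.get("id", "")
    if not HEX_ID.fullmatch(obj_id):
        raise ValueError(f"unexpected ID format: {obj_id!r}")
    coords = (int(el.get(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
    obj = MangaObject(el.tag, obj_id, *coords)
    if el.tag in ("face", "body"):
        obj.character = el.get("character")
    elif el.tag == "text":
        obj.text = el.text
    return obj  # frame: no additional information


def read_book(xml_bytes: bytes) -> Tuple[Optional[str], Dict[str, str], List[Page]]:
    root = ET.fromstring(xml_bytes)
    # character ID -> character name
    characters = {c.get("id"): c.get("name") for c in root.iter("character")}
    pages = []
    for p in root.iter("page"):
        page = Page(int(p.get("index")), int(p.get("width")), int(p.get("height")))
        page.objects = [_parse_object(el) for el in p if el.tag in OBJECT_TYPES]
        pages.append(page)
    return root.get("title"), characters, pages
```

Passing bytes rather than a decoded string lets the parser honor the file's own UTF-8 declaration.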
I used manga109api to parse and normalize this dataset.
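What "normalize" involves is not spelled out here; one plausible reading is scaling boxes to page-relative coordinates. The sketch below assumes each page comes back as a dict (for example from manga109api's annotation loader) with `@`-prefixed attribute keys such as `@width` and `@xmin`; that shape is an assumption, so the helper also accepts plain keys. It does not call manga109api itself.

```python
from typing import Any, Dict, List, Tuple


def _num(d: Dict[str, Any], key: str) -> float:
    # Assumed xmltodict-style "@key" attributes; fall back to a plain "key".
    return float(d[f"@{key}"] if f"@{key}" in d else d[key])


def normalize_page(page: Dict[str, Any], kinds: Tuple[str, ...] = ("face", "body")) -> List[Dict[str, Any]]:
    """Return boxes of the given kinds scaled to [0, 1] by page width/height."""
    width, height = _num(page, "width"), _num(page, "height")
    out = []
    for kind in kinds:
        objs = page.get(kind, [])
        if isinstance(objs, dict):  # defensive: a lone element may arrive as a dict
            objs = [objs]
        for obj in objs:
            out.append({
                "kind": kind,
                "box": (_num(obj, "xmin") / width, _num(obj, "ymin") / height,
                        _num(obj, "xmax") / width, _num(obj, "ymax") / height),
                "character": obj.get("@character", obj.get("character")),
            })
    return out
```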
We have three datasets:
* High-quality, manually annotated head detection data with images sampled from gelbooru and fancaps. All heads with human characteristics (including front, side, and back views) are marked, while non-human heads without significant humanoid features are not counted as heads.
* A high-quality, manually annotated head detection dataset containing images from bangumi and a large number of complex multi-person illustrations. It can significantly improve training quality over the v1.0 dataset alone and is recommended for use together with it.
* A high-quality third-party dataset that can be used for training directly. Although its name includes "face", what it actually annotates are heads.
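Because the third-party set calls its class "face" while annotating heads, combining the sources means mapping every head-like category to a single class. This sketch assumes all sources are available as COCO-style dicts (`images`, `annotations`, `categories`); the alias set and the choice of category id 1 are assumptions. Image and annotation IDs are renumbered so they stay unique after merging.

```python
from typing import Dict, Iterable

HEAD_ALIASES = {"head", "face"}  # assumption: "face" in the third-party set means head


def merge_as_single_head_class(datasets: Iterable[Dict]) -> Dict:
    """Merge COCO-style dicts into one dataset with a single "head" category (id 1)."""
    merged = {"images": [], "annotations": [], "categories": [{"id": 1, "name": "head"}]}
    next_img, next_ann = 1, 1
    for ds in datasets:
        head_ids = {c["id"] for c in ds["categories"] if c["name"].lower() in HEAD_ALIASES}
        img_map = {}
        for img in ds["images"]:
            img_map[img["id"]] = next_img  # renumber to avoid collisions across sources
            merged["images"].append({**img, "id": next_img})
            next_img += 1
        for ann in ds["annotations"]:
            if ann["category_id"] not in head_ids:
                continue  # drop anything that is not a head
            merged["annotations"].append(
                {**ann, "id": next_ann, "image_id": img_map[ann["image_id"]], "category_id": 1}
            )
            next_ann += 1
    return merged
```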
nyuuzyou/AnimeHeadsv3 (COCO, augmented)
The augmented version of the dataset has the following preprocessing parameters:
* Resize: Fit within 640x640

The following augmentation parameters were applied to the dataset:
* Outputs per training example: 3
* Flip: Horizontal
* Saturation: Between -40% and +40%
* Blur: Up to 4px
* Noise: Up to 4% of pixels
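The geometric parts of this preprocessing act on COCO boxes (`[x, y, width, height]` in pixels) as sketched below. The sketch assumes "fit within" means a uniform, aspect-preserving scale of `min(640 / w, 640 / h)`, which also upscales small images; whether the original export upscales is not stated. Saturation, blur, and noise change pixels only, so boxes are unaffected. If "Outputs per training example: 3" means three outputs per source image, the training split ends up about three times its original size.

```python
from typing import List, Tuple


def fit_within(w: int, h: int, max_side: int = 640) -> Tuple[int, int, float]:
    """Uniform scale so (w, h) fits inside max_side x max_side, keeping aspect ratio."""
    s = min(max_side / w, max_side / h)
    return round(w * s), round(h * s), s


def scale_coco_bbox(bbox: List[float], s: float) -> List[float]:
    # A uniform resize scales x, y, width, and height by the same factor.
    return [v * s for v in bbox]


def hflip_coco_bbox(bbox: List[float], img_w: float) -> List[float]:
    # Mirroring horizontally moves the left edge to img_w - (x + width).
    x, y, w, h = bbox
    return [img_w - x - w, y, w, h]
```

For example, a 1280x960 page becomes 640x480 with `s = 0.5`, so a box `[100, 50, 200, 100]` becomes `[50.0, 25.0, 100.0, 50.0]`, and flipping it on the 640-pixel-wide image gives `[490.0, 25.0, 100.0, 50.0]`.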