Model Description

This is a fine-tuned version of the PaddleOCR v5 Server Detection Model. It was trained on a dataset of manga speech bubble crops to improve detection of:

Speech bubble lines: Standard dialogue detection.
Vertical text lines: Improved bounding boxes for Japanese vertical writing (tategaki).
Text lines outside bubbles: Narration boxes and floating text.
Text lines with furigana: Far fewer separate bounding regions are created for furigana.

This model outputs bounding boxes (polygons) for text regions. It does not perform text recognition; you will need a separate recognition model for that.
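As a rough illustration of how the detected polygons might be prepared for a separate recognition model, here is a minimal sketch. It assumes each polygon arrives as a list of (x, y) points, which may differ from the actual output format of your inference pipeline. The helper names (`polygon_to_box`, `is_vertical`, `reading_order`) are hypothetical and are not part of PaddleOCR.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def polygon_to_box(polygon: Sequence[Point]) -> Box:
    """Return the axis-aligned box that encloses a detected polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    # Round outwards so the crop never cuts into the text region.
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


def is_vertical(box: Box) -> bool:
    """Treat a box that is taller than it is wide as a vertical text line."""
    left, top, right, bottom = box
    return (bottom - top) > (right - left)


def reading_order(boxes: List[Box]) -> List[Box]:
    """Order vertical lines right-to-left, breaking ties top-to-bottom.

    Assumption: all boxes belong to one speech bubble with tategaki text.
    """
    return sorted(boxes, key=lambda b: (-b[2], b[1]))


# Two made-up polygons standing in for detector output (assumed format).
polygons = [
    [(10.0, 5.0), (30.0, 5.0), (30.0, 120.0), (10.0, 120.0)],
    [(40.0, 8.0), (60.0, 8.0), (60.0, 90.0), (40.0, 90.0)],
]
boxes = [polygon_to_box(p) for p in polygons]
ordered = reading_order(boxes)
```

Each box in `ordered` can then be used to crop the page image before passing the crop to whichever recognition model you choose.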

This model is still under development and may improve with a better dataset or tuned hyperparameters.

Training Data

The dataset consisted largely of synthetic data because few real samples were available:

~400 randomly sampled speech bubble crops from Manga109-s
~200k synthetic images

Acknowledgments

This project made use of:

Manga109-s dataset
CC-100 dataset
MangaOCR synthetic data generation (code was edited for speedups, bounding box additions, and improved representation of manga)


Base model: PaddlePaddle/PP-OCRv5_server_det
