Model Description

This is a fine-tuned version of the PaddleOCR v5 Server Detection Model. It has been trained on a dataset of manga speech bubble crops to improve detection for:

  • Speech Bubble Lines: Standard dialogue detection.
  • Vertical Text Lines: Improved bounding boxes for Japanese vertical writing (tategaki).
  • Text Lines Outside Bubbles: Narration boxes and floating text.
  • Text Lines With Furigana: Far fewer separate bounding regions are created for furigana.

This model outputs bounding boxes (polygons) for text regions. It does not perform text recognition; you will need a separate recognition model for that.
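Since the output is geometry only, a typical next step is to turn each polygon into a crop rectangle and put the lines in reading order before handing them to a recognizer. The following is a minimal sketch of that post-processing, not part of this model or of PaddleOCR. It assumes each detected region arrives as a list of (x, y) points, and the helper names (`polygon_to_box`, `is_vertical`, `sort_vertical_lines`) are hypothetical. Check the actual output format of your inference pipeline before relying on it.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def polygon_to_box(polygon: Sequence[Point]) -> Box:
    """Return the axis-aligned rectangle enclosing a polygon (assumed list of (x, y) points)."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def is_vertical(box: Box) -> bool:
    """Heuristic: treat a box that is taller than it is wide as a vertical text line."""
    x_min, y_min, x_max, y_max = box
    return (y_max - y_min) > (x_max - x_min)


def sort_vertical_lines(boxes: Sequence[Box]) -> List[Box]:
    """Order boxes right-to-left, then top-to-bottom, as vertical Japanese text is read."""
    return sorted(boxes, key=lambda b: (-b[2], b[1]))


# Two hypothetical detections inside one speech bubble.
polygons = [
    [(10, 10), (30, 10), (30, 100), (10, 100)],  # left column
    [(50, 12), (70, 12), (70, 90), (50, 90)],    # right column
]
boxes = [polygon_to_box(p) for p in polygons]
ordered = sort_vertical_lines(boxes)
```

Here `ordered` places the right-hand column first. Each box can then be cropped from the page image and passed to whichever recognition model you pair with this detector. The aspect-ratio test in `is_vertical` is only a rough heuristic and will misclassify very short lines.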

This model is still a work in progress; a better dataset or tuned hyperparameters may improve it further.

Training Data

The dataset consisted largely of synthetic data due to the limited real samples available.

  • ~400 randomly sampled speech bubble crops from Manga109-s
  • ~200k synthetic images

Acknowledgments

This project makes use of:

  • Manga109-s dataset
  • CC-100 dataset
  • MangaOCR synthetic data generation (code was edited for speedups, bounding box additions, and improved representation of manga)