SongPanda 2.0

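SongPanda is a vision-language model for understanding Chinese woodblock-printed ancient books, obtained by full-parameter fine-tuning of PaddleOCR-VL-1.5. Unlike general-purpose OCR models, it recognizes the main text and at the same time:

🎯 automatically removes banxin (版心, the center column of a leaf) fields that are not part of the main text;
🎯 marks double-line small-character interlinear notes with <footnote> tags;
🎯 marks headnotes (眉批) with <head> tags, and uses \n for a break at the end of a column and \f for a break between half-leaves.

Given a scan of a Qing-dynasty woodblock edition that has a banxin and double-line interlinear notes, a general-purpose vision LLM (such as doubao) misreads the volume information in the banxin as main text. SongPanda removes the banxin, recognizes the main text, and uses <footnote> tags to restore the reading order of the double-line notes.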
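The markup described above can be split back into half-leaves, columns, and annotations with a few lines of post-processing. The following is a minimal sketch, not part of SongPanda itself. It assumes that the model emits real form-feed and newline characters, that both tags are always closed (`<footnote>…</footnote>`, `<head>…</head>`), and that a note never spans a column break. `HalfPage`, `parse_output`, and the sample string are hypothetical names and data made up for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed tag shapes: always closed, never nested.
FOOTNOTE_RE = re.compile(r"<footnote>(.*?)</footnote>", re.DOTALL)
HEAD_RE = re.compile(r"<head>(.*?)</head>", re.DOTALL)


@dataclass
class HalfPage:
    """One half-leaf: its columns plus the annotations found on it."""
    columns: list = field(default_factory=list)    # columns, footnote tags kept inline
    headnotes: list = field(default_factory=list)  # text of <head> blocks
    footnotes: list = field(default_factory=list)  # text of <footnote> blocks


def parse_output(text):
    """Split raw model output into half-leaves (\\f) and columns (\\n)."""
    pages = []
    for raw in text.split("\f"):
        headnotes = HEAD_RE.findall(raw)
        body = HEAD_RE.sub("", raw)  # headnotes are not part of the main text
        footnotes = FOOTNOTE_RE.findall(body)
        columns = [col for col in body.split("\n") if col.strip()]
        pages.append(HalfPage(columns, headnotes, footnotes))
    return pages


# Made-up sample string in the assumed output format.
sample = (
    "<head>眉批一</head>天地玄黄<footnote>注甲</footnote>\n宇宙洪荒"
    "\f日月盈昃\n辰宿列张<footnote>注乙</footnote>"
)
pages = parse_output(sample)
for number, page in enumerate(pages, start=1):
    print(number, page.columns, page.headnotes, page.footnotes)
```

Footnote tags are left inline in `columns` so the reading order of the notes relative to the main text is preserved; the separate `footnotes` list is only a convenience index.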

🚀 Quick Start

from PIL import Image
from paddleformers.transformers import AutoModelForCausalLM, AutoProcessor

MODEL = "ningzhuo/SongPanda2.0"

# Load the processor and the model weights in bfloat16.
processor = AutoProcessor.from_pretrained(MODEL, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL, dtype="bfloat16", trust_remote_code=True)
model.eval()

image = Image.open("your_ancient_book.jpg").convert("RGB")
# Prompt (kept in Chinese): "Run OCR on this ancient-book image: remove the
# banxin, recognize the main text, and mark small-character interlinear notes
# with <footnote></footnote>."
prompt = "请对这张古籍图像进行 OCR，删去版心、识别正文，小字夹注用 <footnote></footnote> 标出。"

# A single user turn containing the image followed by the instruction.
messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": prompt},
]}]
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pd")
# Greedy decoding; keep only the tokens generated after the prompt.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])

A complete example is available in the project repository at demo/demo.py.
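Once the decoded string is available, it often needs to be turned into continuous reading text. The sketch below is one possible way to do that and is not taken from demo/demo.py. It assumes the decoded output contains real `\n` and `\f` characters and closed `<footnote>` and `<head>` tags. `to_reading_text` and its `keep_notes` parameter are hypothetical, and the sample string is made up.

```python
import re


def to_reading_text(raw, keep_notes=True):
    """Flatten tagged OCR output into plain reading text.

    Headnotes are dropped, interlinear notes are either wrapped in
    full-width parentheses or removed, columns are joined, and each
    half-leaf ends up on its own line.
    """
    text = re.sub(r"<head>.*?</head>", "", raw, flags=re.DOTALL)
    if keep_notes:
        text = re.sub(r"<footnote>(.*?)</footnote>", r"（\1）", text, flags=re.DOTALL)
    else:
        text = re.sub(r"<footnote>.*?</footnote>", "", text, flags=re.DOTALL)
    # Column breaks carry no meaning in running text; half-leaf breaks become newlines.
    return text.replace("\n", "").replace("\f", "\n")


# Made-up sample in the assumed output format.
raw = "<head>批</head>子曰<footnote>孔子也</footnote>學而\n時習之\f不亦說乎"
with_notes = to_reading_text(raw)
without_notes = to_reading_text(raw, keep_notes=False)
print(with_notes)
print(without_notes)
```

Wrapping notes in parentheses is only one convention; a downstream pipeline might prefer to keep the tags and convert them to its own annotation format instead.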

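🙏 Acknowledgments

PaddleOCR-VL: base model and training framework;
vRain (兀雨书屋): ancient-book synthesis tool;
全球汉籍影像开放集成系统 (an open integrated image system for Chinese ancient books worldwide): source of the real book images in SongPanda-Bench;
Huang Yongnian (黄永年), 《古籍版本学》, and Chen Zhenghong (陈正宏), 《东亚汉籍版本学初探》: the edition-studies classification framework used for the benchmark.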

License

This model is released under the Apache-2.0 license.

Model size: 0.9B params (Safetensors)
Tensor type: BF16

Model tree for ningzhuo/SongPanda2.0

Base model: baidu/ERNIE-4.5-0.3B-Paddle
Finetuned: PaddlePaddle/PaddleOCR-VL-1.5
Finetuned: this model