
πŸ’» HomePage   |   πŸ€— GitHub   |   πŸ€– Demo

LogicsDocBench results


OmniDocBench-v1.5 results

Updates

  • [2026/02/13] πŸš€ We release the Logics-Parsing-v2 model.
  • [2025/09/25] πŸš€ We release the Logics-Parsing model.

Introduction

Logics-Parsing-v2 is an advanced evolution of the previously released Logics-Parsing (v1). It inherits all the core capabilities of the v1 model while handling complex documents more robustly. It also extends support to Parsing-2.0 scenarios, enabling structured parsing of music sheets, flowcharts, and code/pseudocode blocks.

LogicsDocBench Overview

Key Features

  • Effortless End-to-End Processing

    • End-to-end recognition and parsing for various kinds of document elements within a single model.
    • Handles complex-layout and text-dense documents such as newspapers and magazines with exceptional precision and ease.
  • Advanced Content Recognition

    • Smaller in size yet stronger in performance, delivering more accurate and structured parsing of tables and scientific formulas.
    • Introducing Parsing-2.0: natively supports parsing of diverse structured content, including flowcharts, music sheets and pseudocode blocks.
  • Rich, Structured HTML Output

    • Transforms documents into concise HTML -- capturing not just content, but also element types, spatial layouts, and semantic hierarchy.
    • More scientific and intuitive formats for structured elements -- such as Mermaid for flowcharts and ABC notation for musical scores.
  • State-of-the-Art Performance

    • SOTA across the board: Logics-Parsing-v2 sets top records on both our in-house benchmark (overall score: 82.16) and the renowned public benchmark OmniDocBench-v1.5 (overall score: 93.23).
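The card does not include a concrete output sample, but the HTML output described above can be pictured roughly as follows. This is an illustrative sketch only: the tag names, attributes, and class names here are hypothetical and do not reflect the model's actual output schema.

```html
<!-- Hypothetical sketch of a parsed page; tags and attributes are illustrative. -->
<div class="page">
  <h1 data-bbox="80,40,520,90">Sample Title</h1>
  <p data-bbox="80,120,520,300">Body text recognized from the page...</p>
  <!-- A flowchart region emitted as Mermaid, per the feature list above. -->
  <pre class="mermaid" data-bbox="80,320,520,560">
flowchart TD
  A[Start] --> B{Decision}
  B -->|yes| C[Finish]
  </pre>
</div>
```

The key idea is that each element carries its type (tag), its spatial position (bounding box), and a structured notation (Mermaid, ABC) where plain text would lose information.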

Benchmark

Comparisons on LogicsDocBench

We introduce LogicsDocBench, a new comprehensive evaluation benchmark comprising 900 carefully selected PDF pages, covering both traditional Parsing-1.0 document tasks and the newly introduced Parsing-2.0 scenarios. This benchmark is designed to better assess models’ capabilities in parsing complex and diverse real-world documents. The dataset is organized into three core document subsets:

  • STEM Documents (218 pages):

    Focuses on high-difficulty academic and educational content, spanning over ten domains including physics, mathematics, engineering, and interdisciplinary sciences. This subset evaluates deep understanding of mathematical formulas, technical terminology, and structured knowledge representation.

  • Complex Layouts (459 pages):

    Includes challenging real-world layouts such as multi-column text, cross-page tables, vertical writing, and mixed text-image arrangements. This subset comprehensively evaluates a model’s layout analysis abilities.

  • Parsing-2.0 Content (223 pages):

    Targets modern digital and semi-structured content that poses significant challenges for traditional OCR systems, including:

    • Chemical molecular formulas
    • Music sheets
    • Code and pseudo-code blocks
    • Flowcharts and mind maps

For Parsing-1.0 tasks, we adopt the same evaluation protocols as OmniDocBench-v1.5 to ensure fairness and consistency across benchmarks. For Parsing-2.0, we report fine-grained results using edit distance for each subcategory, and compute an overall score as follows:

$$\small \text{Overall} = \frac{\text{Parsing1.0}^{\text{Overall}} \times 3 + (1-\text{Chemistry}^{\text{Edit}})\times 100 + (1-\text{Code}^{\text{Edit}})\times 100 + (1-\text{Chart}^{\text{Edit}})\times 100 + (1-\text{Music}^{\text{Edit}})\times 100}{7}$$
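The score combination above can be sketched as a small helper. The function and argument names are illustrative, not taken from the released evaluation code; edit distances are assumed to lie in [0, 1] and the Parsing-1.0 overall score in [0, 100].

```python
def overall_score(parsing1_overall: float,
                  chemistry_edit: float,
                  code_edit: float,
                  chart_edit: float,
                  music_edit: float) -> float:
    """Combine the Parsing-1.0 overall score (0-100, weighted x3) with the
    four Parsing-2.0 edit distances (each in [0, 1], converted to 0-100
    accuracies), averaged over 7 weight units."""
    return (parsing1_overall * 3
            + (1 - chemistry_edit) * 100
            + (1 - code_edit) * 100
            + (1 - chart_edit) * 100
            + (1 - music_edit) * 100) / 7
```

For instance, a perfect run (Parsing-1.0 score 100, all edit distances 0) yields an overall score of 100, since all seven weight units contribute their maximum.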

A comprehensive evaluation of document parsing on LogicsDocBench is presented below:

The histogram below provides a more intuitive visualization of the advantages of our Logics-Parsing-v2 model in both Parsing-1.0 and 2.0 scenarios.


Comparisons on OmniDocBench-v1.5

We also provide the experimental results of our newly proposed Logics-Parsing-v2 model on the widely recognized open-source benchmark OmniDocBench-v1.5. As shown in the table below, Logics-Parsing-v2 achieves the highest score among all compared approaches, demonstrating its effectiveness and superiority.

* The model results in the table are sourced from the official OmniDocBench website.

Quick Start

1. Installation

conda create -n logics-parsing-v2 python=3.10
conda activate logics-parsing-v2

pip install -r requirements.txt

2. Download Model Weights

# Download our model from ModelScope.
pip install modelscope
python download_model_v2.py -t modelscope

# Download our model from Hugging Face.
pip install huggingface_hub
python download_model_v2.py -t huggingface

3. Inference

python3 inference_v2.py --image_path PATH_TO_INPUT_IMG --output_path PATH_TO_OUTPUT --model_path PATH_TO_MODEL

Showcases

Acknowledgments

We would like to acknowledge the following open-source projects that provided inspiration and reference for this work:

Safetensors Β· 4B params Β· BF16