
legal-reference-extraction-base-de

A fine-tuned EuroBERT-210m encoder for German legal citation extraction: detecting references to laws (e.g. § 823 BGB, Art. 14 GG) and court decisions (e.g. 1 BvR 123/89) in German legal text via BIO token classification.

This model is the default transformer backbone for the refex library.

Task

Token classification over 5 BIO labels:

| id | label | meaning |
|---:|------------|----------------------------------|
| 0 | O | Outside any citation |
| 1 | B-LAW_REF | Beginning of a law-citation span |
| 2 | I-LAW_REF | Inside a law-citation span |
| 3 | B-CASE_REF | Beginning of a case-citation span |
| 4 | I-CASE_REF | Inside a case-citation span |

Output spans can be consumed directly or routed through refex's TransformerExtractor, which assembles them into typed LawCitation / CaseCitation objects with span, book, number, court, file_number, date fields.
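To illustrate how the BIO labels become spans, here is a minimal decoder sketch (pure Python, independent of the model and of refex's actual implementation):

```python
def bio_to_spans(labels):
    """Turn a per-token BIO label sequence into (start_idx, end_idx, type) spans."""
    spans = []
    start, kind = None, None
    for i, lab in enumerate(labels):
        if lab.startswith("B-"):
            if start is not None:            # close a span that was still open
                spans.append((start, i, kind))
            start, kind = i, lab[2:]
        elif lab.startswith("I-") and start is not None and lab[2:] == kind:
            continue                          # span continues
        else:
            if start is not None:
                spans.append((start, i, kind))
            start, kind = None, None
    if start is not None:                     # span runs to the end of the sequence
        spans.append((start, len(labels), kind))
    return spans

tokens = ["Gemäß", "§", "823", "Abs.", "1", "BGB", "haftet", "der", "Schädiger", "."]
labels = ["O", "B-LAW_REF", "I-LAW_REF", "I-LAW_REF", "I-LAW_REF", "I-LAW_REF",
          "O", "O", "O", "O"]
print(bio_to_spans(labels))  # [(1, 6, 'LAW_REF')]
```

The decoded token span covers "§ 823 Abs. 1 BGB", which refex would then parse into a typed LawCitation.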

Evaluation

Evaluated on an unreleased benchmark of 1,009 German court decisions (test split, frozen once and never re-tuned against). Numbers are reported using the benchmark's span-level F1 metric: exact requires character-perfect (start, end) agreement with the gold annotation; overlap requires any character-level intersection.

| Engine | Span F1 (exact) | Span F1 (overlap) | Law F1 (overlap) | Case F1 (overlap) | Throughput (docs/s) | Median ms/doc |
|---|---|---|---|---|---|---|
| regex baseline (CPU) | 0.737 | 0.860 | 0.872 | 0.828 | 455.9 | 1.1 |
| regex + CRF (CPU) | 0.740 | 0.878 | 0.891 | 0.846 | 106.4 | 6.4 |
| this model (MPS) | 0.533 | 0.909 | 0.932 | 0.855 | 1.5 | 467.4 |
| regex + this model (MPS) | 0.743 | 0.889 | 0.905 | 0.849 | 1.5 | 467.3 |

How to read the two span-F1 columns

  • Span exact is character-perfect. A pure transformer works at whitespace-word granularity, so its span boundaries rarely match an annotator's character-level trimming (trailing punctuation, enclosing parens, etc.), which is why exact F1 is lower than overlap.
  • Span overlap is the right metric for "did we locate a citation in the right place", which is what matters when a downstream step re-parses the span into structured fields.
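The two matching criteria reduce to a couple of comparisons on character offsets (a sketch with hypothetical spans, not the benchmark's actual scorer):

```python
def exact_match(pred, gold):
    """Character-perfect agreement: identical (start, end)."""
    return pred == gold

def overlap_match(pred, gold):
    """Any character-level intersection between the two half-open intervals."""
    return pred[0] < gold[1] and gold[0] < pred[1]

gold = (6, 22)   # e.g. "§ 823 Abs. 1 BGB" with the annotator's exact trimming
pred = (6, 23)   # model span keeps one trailing character
print(exact_match(pred, gold), overlap_match(pred, gold))  # False True
```

A one-character boundary slip therefore costs the exact metric a full false positive and false negative while leaving the overlap metric untouched.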

Headline: on overlap F1 this model beats the regex baseline by +4.9 pp and the regex + CRF ensemble by +3.1 pp, driven primarily by recall on law citations (Law overlap F1 +6.0 pp over regex). Ensembling regex + this model gives the best exact F1 (0.743) — the right choice when you need precise character boundaries as well as recall.

Training-time validation metrics

During training, span-level seqeval F1 on a held-out validation split reached 0.8743 (precision 0.839, recall 0.913) after three epochs, with losses converging smoothly (eval_loss 0.0344 → 0.0232). Note that this seqeval number is measured on the transformer's own tokenisation; the unreleased-benchmark numbers above are computed after re-aligning predictions to character spans, the representation used for engine comparison.
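As a sanity check, the reported F1 is the harmonic mean of the precision/recall pair; recomputing it from the rounded values recovers the same figure to within rounding:

```python
# F1 as the harmonic mean of precision and recall.
p, r = 0.839, 0.913
f1 = 2 * p * r / (p + r)
print(round(f1, 4))  # 0.8744 (the card's 0.8743 comes from unrounded p and r)
```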

Inference speed

All numbers above are measured on Apple Silicon MPS, single document, batch size 1. On CUDA with batching, expect 100–500× higher throughput (a base-sized EuroBERT comfortably reaches 1,000+ docs/s on a single modern GPU). For CPU-only deployments where sub-10 ms latency per document matters, stick with refex's regex or regex + CRF engines; this model is recall-first, not latency-first.

Usage

Via the refex library (recommended)

pip install legal-reference-extraction[transformers]

from refex.engines.transformer import TransformerExtractor

extractor = TransformerExtractor(
    model="openlegaldata/legal-reference-extraction-base-de",
    device="mps",  # or "cuda" / "cpu"
)

citations, relations = extractor.extract(
    "Gemäß § 823 Abs. 1 BGB haftet der Schädiger. "
    "Vgl. auch BVerfG, Urt. v. 12.03.1990 – 1 BvR 123/89."
)

for c in citations:
    print(c.type, c.span.text, "->", getattr(c, "book", None) or getattr(c, "court", None))

Via transformers directly

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained(
    "openlegaldata/legal-reference-extraction-base-de",
    trust_remote_code=True,
)
model = AutoModelForTokenClassification.from_pretrained(
    "openlegaldata/legal-reference-extraction-base-de",
    trust_remote_code=True,
)

ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "Nach § 242 BGB sowie Art. 2 GG ist dies zu beachten."
for span in ner(text):
    print(span)

trust_remote_code=True is required because EuroBERT ships custom modeling code (modeling_eurobert.py, configuration_eurobert.py).
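With aggregation_strategy="simple", each aggregated entity carries character offsets that map straight back to the input text. The sketch below shows that post-processing step on a hypothetical pipeline output (the dict fields follow the standard transformers token-classification schema; the scores are illustrative):

```python
text = "Nach § 242 BGB sowie Art. 2 GG ist dies zu beachten."

# Hypothetical aggregated output, shaped like ner(text) above.
spans = [
    {"entity_group": "LAW_REF", "score": 0.99, "word": "§ 242 BGB", "start": 5, "end": 14},
    {"entity_group": "LAW_REF", "score": 0.98, "word": "Art. 2 GG", "start": 21, "end": 30},
]

for s in spans:
    # The (start, end) offsets index directly into the original string.
    assert text[s["start"]:s["end"]] == s["word"]
    print(s["entity_group"], "->", text[s["start"]:s["end"]])
```

Keeping the character offsets (rather than the re-tokenised "word" field) is what lets a downstream parser or the refex ensemble trim and re-parse the span.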

Intended use

  • German legal document processing: extracting law (§ / Art.) and case-number references as structured spans for downstream linking, indexing, or redaction.
  • Drop-in backbone for the refex library's TransformerExtractor.
  • Research into German legal NLP.

Out of scope

  • Languages other than German. The base model is multilingual but fine-tuning targets German legal prose specifically; performance on other languages is not evaluated.
  • Non-legal domains.
  • Production where low-latency CPU inference matters (see the speed table above) — use the regex engine.
  • Any commercial use (see license).

Limitations

  • Span boundaries are emitted at the transformer's token granularity, not the character level. Exact-match F1 is correspondingly lower than overlap F1; if you need character-perfect boundaries, post-process with the regex engine (see the ensemble row above).
  • Confidence is a single label-argmax per token; no calibration has been performed.
  • Model inherits any biases from the EuroBERT-210m pre-training corpus.
  • Rare law codes, obscure court formats, and historically unusual citation formats will be under-represented relative to the high-frequency patterns in modern German court decisions.
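Where character-perfect boundaries matter and the full regex ensemble is overkill, a lightweight boundary clean-up along these lines can recover most of the trimming cases mentioned above (a sketch, not part of refex):

```python
def trim_span(text, start, end):
    """Shrink a predicted character span past leading/trailing punctuation,
    spaces, and enclosing brackets."""
    while start < end and text[start] in "([ ":
        start += 1
    while end > start and text[end - 1] in ".,;:)] ":
        end -= 1
    return start, end

text = "Haftung (§ 823 BGB)."
# Model span sloppily covers the parenthesis and final period.
print(trim_span(text, text.index("("), len(text)))  # (9, 18) -> "§ 823 BGB"
```

This is a heuristic: it would also strip a period that legitimately ends an abbreviation at a span boundary, so a production trimmer should whitelist tokens like "Abs." before cutting.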

License

CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International).

You may redistribute and adapt this model for non-commercial use provided you give appropriate attribution. For commercial licensing inquiries, contact Open Legal Data.

The underlying EuroBERT-210m base model is distributed under Apache-2.0; this fine-tuned checkpoint is a derivative work and the CC BY-NC 4.0 terms here apply to the fine-tuned weights and model card.

Citation

@inproceedings{10.1145/3383583.3398616,
author = {Ostendorff, Malte and Blume, Till and Ostendorff, Saskia},
title = {Towards an Open Platform for Legal Information},
year = {2020},
isbn = {9781450375856},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3383583.3398616},
doi = {10.1145/3383583.3398616},
booktitle = {Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020},
pages = {385–388},
numpages = {4},
keywords = {open data, open source, legal information system, legal data},
location = {Virtual Event, China},
series = {JCDL '20}
}

Contact

Issues and feedback: https://github.com/openlegaldata/legal-reference-extraction/issues
