SegFormer-B0 Fine-Tuned on CMP Facade Dataset

Custom semantic segmentation model for facade parsing: wall, window, door, and balcony detection on rectified building facades.

Model Details

  • Architecture: SegFormer-B0 (NVIDIA, ADE20K-pretrained)
  • Parameters: ~3.7M
  • Task: Semantic Segmentation
  • Input Size: 512×512
  • Classes: 6 unified facade classes

Class Mapping

ID  Class                Description
0   background           Sky, ground, non-facade regions
1   facade_wall          Main wall surface + moldings, cornices, pillars, sills, deco
2   window               Windows + blinds
3   door                 Doors + shopfronts
4   balcony              Balconies
5   vegetation_occluder  Vegetation (trained as background since CMP lacks this class)
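
A common downstream use of this mapping is measuring how much of a facade each class covers. The sketch below is a hypothetical helper (the function name `class_fractions` is not part of the model); it assumes `pred_seg` is an (H, W) integer mask with the 6 class IDs above, as produced in the Usage section.

```python
import numpy as np

# Class names in ID order (0..5), matching the table above.
CLASS_NAMES = [
    "background", "facade_wall", "window",
    "door", "balcony", "vegetation_occluder",
]

def class_fractions(pred_seg: np.ndarray) -> dict:
    """Return the share of the mask covered by each class ID."""
    counts = np.bincount(pred_seg.ravel(), minlength=len(CLASS_NAMES))
    total = pred_seg.size
    return {name: counts[i] / total for i, name in enumerate(CLASS_NAMES)}

# Toy 2x2 mask: two wall pixels, one window, one background
mask = np.array([[1, 1], [2, 0]])
fractions = class_fractions(mask)
# fractions["facade_wall"] == 0.5, fractions["window"] == 0.25
```

For example, the window-to-wall ratio (a standard facade metric) falls directly out of these fractions.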

Training

  • Dataset: CMP Facade Database — 378 train, 114 test rectified facade images
  • Original Classes: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background)
  • Mapping: 12 CMP classes → 6 unified classes (see mapping above)
  • Epochs: ~53 (best at epoch 38, mean IoU 0.4856)
  • Optimizer: AdamW, lr=6e-5
  • Batch Size: 4 per device (effective batch = 8 with grad accumulation)
  • Hardware: Tesla T4 GPU
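
The 12 → 6 remapping can be applied efficiently with a lookup table indexed by the original CMP class ID. The sketch below illustrates the technique; note the index order assigned to the 12 CMP classes is an assumption for illustration and should be checked against the actual CMP annotation IDs before use.

```python
import numpy as np

# CMP class name -> unified class ID (per the mapping in this card).
CMP_TO_UNIFIED = {
    "background": 0, "facade": 1, "molding": 1, "cornice": 1, "pillar": 1,
    "sill": 1, "deco": 1, "window": 2, "blind": 2, "door": 3, "shop": 3,
    "balcony": 4,
}

# ASSUMPTION: this ordering of CMP IDs 0..11 is illustrative only.
cmp_order = ["background", "facade", "molding", "cornice", "pillar", "window",
             "door", "sill", "blind", "balcony", "shop", "deco"]
lut = np.array([CMP_TO_UNIFIED[name] for name in cmp_order], dtype=np.uint8)

def remap(cmp_mask: np.ndarray) -> np.ndarray:
    """Map a mask of CMP class IDs (0..11) to the 6 unified IDs."""
    return lut[cmp_mask]

# Toy example: facade(1), window(5), balcony(9) -> wall(1), window(2), balcony(4)
out = remap(np.array([1, 5, 9]))  # -> [1 2 4]
```

Fancy indexing with a lookup table remaps an entire label image in one vectorized step, which matters when preprocessing hundreds of masks.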

Best Validation Metrics

Metric           Value
Mean IoU         0.4856
Facade Wall IoU  0.867
Window IoU       0.410
Door IoU         0.460
Balcony IoU      0.230
Background IoU   0.467
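
To reproduce per-class IoU numbers like these on your own validation masks, the standard definition (per-class intersection over union, averaged over classes present) can be sketched as follows; the helper name `per_class_iou` is illustrative, not part of the model.

```python
import numpy as np

NUM_CLASSES = 6

def per_class_iou(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """IoU per class; NaN for classes absent from both pred and ground truth."""
    ious = np.full(NUM_CLASSES, np.nan)
    for c in range(NUM_CLASSES):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = inter / union
    return ious

# Toy example
pred = np.array([[1, 1], [2, 0]])
gt   = np.array([[1, 2], [2, 0]])
ious = per_class_iou(pred, gt)       # [1.0, 0.5, 0.5, nan, nan, nan]
mean_iou = np.nanmean(ious)          # average only over classes present
```

Ignoring absent classes via `np.nanmean` matches the common evaluation convention for small datasets where some classes never appear in a given image.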

Usage

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch
import torch.nn.functional as F

# Load model and processor
processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp")
model.eval()

# Load image
image = Image.open("facade.jpg").convert("RGB")

# Inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_classes, H/4, W/4)

# Upsample logits to the original image size (PIL .size is (width, height))
upsampled = F.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy()
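
To inspect the result visually, the class-ID mask can be mapped to colors with a palette lookup. The palette below is an arbitrary choice for illustration, not one shipped with the model.

```python
import numpy as np

# Hypothetical RGB palette, one row per class ID (0..5); colors are arbitrary.
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [180, 120, 80],   # 1 facade_wall
    [0, 120, 255],    # 2 window
    [255, 0, 0],      # 3 door
    [0, 200, 0],      # 4 balcony
    [0, 255, 150],    # 5 vegetation_occluder
], dtype=np.uint8)

def colorize(pred_seg: np.ndarray) -> np.ndarray:
    """Map an (H, W) class-ID mask to an (H, W, 3) RGB image."""
    return PALETTE[pred_seg]

rgb = colorize(np.array([[1, 2], [0, 4]]))
# rgb.shape == (2, 2, 3); rgb[0, 1] is the window color [0, 120, 255]
```

The resulting array can be wrapped with `PIL.Image.fromarray(rgb)` and alpha-blended over the input photo for a quick overlay.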

Intended Use

  • Primary: Second-pass segmentation of rectified facades (after homography rectification)
  • Secondary: First-pass facade detection on raw street photos (accuracy is expected to be lower, since training used only rectified images)


Pipeline Role

This model is designed for use in a 2-pass facade segmentation pipeline:

  1. Pass 1: Segment raw street photo → find facade wall region
  2. Rectify facade via homography
  3. Pass 2: Re-run this model on rectified crop → parse windows, doors, balconies cleanly
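
Step 2 is typically done with `cv2.findHomography` and `cv2.warpPerspective`; the underlying math, estimating a homography from the four facade corners found in Pass 1, can be sketched with a minimal DLT solve in NumPy. The corner coordinates below are made up for illustration.

```python
import numpy as np

def homography_from_corners(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Direct Linear Transform: solve H (3x3, up to scale) so dst ~ H @ src
    for 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the null vector of A, i.e. the last right-singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Hypothetical skewed facade corners -> 512x512 rectified square
src = np.array([[10.0, 20.0], [400.0, 40.0], [420.0, 500.0], [5.0, 480.0]])
dst = np.array([[0.0, 0.0], [512.0, 0.0], [512.0, 512.0], [0.0, 512.0]])
H = homography_from_corners(src, dst)

# Check: the first source corner maps to (0, 0) after dehomogenization
p = H @ np.array([10.0, 20.0, 1.0])
p = p[:2] / p[2]
```

In the actual pipeline, `H` would be passed to `cv2.warpPerspective` to produce the rectified crop that Pass 2 segments.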

Limitations

  • Trained only on rectified facade images from CMP. Performance on perspective-distorted street photos will be degraded.
  • No vegetation data in the training set; the vegetation_occluder class is effectively predicted as background.
  • Small dataset (378 images) — performance ceiling is moderate.

Citation

Please cite this model if you use it:

@misc{corbetta_segformer_facade_cmp_2026,
  author       = {Marco Corbetta},
  title        = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}}
}

CMP Dataset:

@INPROCEEDINGS{Tylecek13,
  author = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
  title = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
  booktitle = {Proc. GCPR},
  year = {2013},
}

SegFormer:

@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}