SegFormer-B0 Fine-Tuned on CMP Facade Dataset

Custom semantic segmentation model for facade parsing: wall, window, door, and balcony detection on rectified building facades.

Model Details

  • Architecture: SegFormer-B0 (NVIDIA, ADE20K-pretrained)
  • Parameters: ~3.7M
  • Task: Semantic Segmentation
  • Input Size: 512×512
  • Classes: 6 unified facade classes

Class Mapping

ID  Class                Description
0   background           Sky, ground, non-facade regions
1   facade_wall          Main wall surface + moldings, cornices, pillars, sills, deco
2   window               Windows + blinds
3   door                 Doors + shopfronts
4   balcony              Balconies
5   vegetation_occluder  Vegetation (trained as background since CMP lacks this class)
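
A common downstream use of this mapping is measuring how much of a facade each class covers. The sketch below is a hypothetical helper (the function name `class_fractions` is not part of the model); it assumes `pred_seg` is an (H, W) integer mask with the 6 class IDs above, as produced in the Usage section.

```python
import numpy as np

# Class names in ID order (0..5), matching the table above.
CLASS_NAMES = [
    "background", "facade_wall", "window",
    "door", "balcony", "vegetation_occluder",
]

def class_fractions(pred_seg: np.ndarray) -> dict:
    """Return the share of the mask covered by each class ID."""
    counts = np.bincount(pred_seg.ravel(), minlength=len(CLASS_NAMES))
    total = pred_seg.size
    return {name: counts[i] / total for i, name in enumerate(CLASS_NAMES)}

# Toy 2x2 mask: two wall pixels, one window, one background
mask = np.array([[1, 1], [2, 0]])
fractions = class_fractions(mask)
# fractions["facade_wall"] == 0.5, fractions["window"] == 0.25
```

For example, the window-to-wall ratio (a standard facade metric) falls directly out of these fractions.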

Training

  • Dataset: CMP Facade Database — 378 train, 114 test rectified facade images
  • Original Classes: 12 (facade, molding, cornice, pillar, window, door, sill, blind, balcony, shop, deco, background)
  • Mapping: 12 CMP classes → 6 unified classes (see mapping above)
  • Epochs: ~53 (best at epoch 38, mean IoU 0.4856)
  • Optimizer: AdamW, lr=6e-5
  • Batch Size: 4 per device (effective batch = 8 with grad accumulation)
  • Hardware: Tesla T4 GPU
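
The 12 → 6 remapping can be applied efficiently with a lookup table indexed by the original CMP class ID. The sketch below illustrates the technique; note the index order assigned to the 12 CMP classes is an assumption for illustration and should be checked against the actual CMP annotation IDs before use.

```python
import numpy as np

# CMP class name -> unified class ID (per the mapping in this card).
CMP_TO_UNIFIED = {
    "background": 0, "facade": 1, "molding": 1, "cornice": 1, "pillar": 1,
    "sill": 1, "deco": 1, "window": 2, "blind": 2, "door": 3, "shop": 3,
    "balcony": 4,
}

# ASSUMPTION: this ordering of CMP IDs 0..11 is illustrative only.
cmp_order = ["background", "facade", "molding", "cornice", "pillar", "window",
             "door", "sill", "blind", "balcony", "shop", "deco"]
lut = np.array([CMP_TO_UNIFIED[name] for name in cmp_order], dtype=np.uint8)

def remap(cmp_mask: np.ndarray) -> np.ndarray:
    """Map a mask of CMP class IDs (0..11) to the 6 unified IDs."""
    return lut[cmp_mask]

# Toy example: facade(1), window(5), balcony(9) -> wall(1), window(2), balcony(4)
out = remap(np.array([1, 5, 9]))  # -> [1 2 4]
```

Fancy indexing with a lookup table remaps an entire label image in one vectorized step, which matters when preprocessing hundreds of masks.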

Best Validation Metrics

Metric           Value
Mean IoU         0.4856
Facade Wall IoU  0.867
Window IoU       0.410
Door IoU         0.460
Balcony IoU      0.230
Background IoU   0.467
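
To reproduce per-class IoU numbers like these on your own validation masks, the standard definition (per-class intersection over union, averaged over classes present) can be sketched as follows; the helper name `per_class_iou` is illustrative, not part of the model.

```python
import numpy as np

NUM_CLASSES = 6

def per_class_iou(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """IoU per class; NaN for classes absent from both pred and ground truth."""
    ious = np.full(NUM_CLASSES, np.nan)
    for c in range(NUM_CLASSES):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = inter / union
    return ious

# Toy example
pred = np.array([[1, 1], [2, 0]])
gt   = np.array([[1, 2], [2, 0]])
ious = per_class_iou(pred, gt)       # [1.0, 0.5, 0.5, nan, nan, nan]
mean_iou = np.nanmean(ious)          # average only over classes present
```

Ignoring absent classes via `np.nanmean` matches the common evaluation convention for small datasets where some classes never appear in a given image.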

Usage

from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch
import torch.nn.functional as F

# Load model and processor
processor = SegformerImageProcessor.from_pretrained("Marco333/segformer-b0-facade-cmp")
model = SegformerForSemanticSegmentation.from_pretrained("Marco333/segformer-b0-facade-cmp")
model.eval()

# Load image
image = Image.open("facade.jpg").convert("RGB")

# Inference
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_classes, H/4, W/4)

# Upsample logits to the original image size (PIL .size is (width, height))
upsampled = F.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
pred_seg = upsampled.argmax(dim=1)[0].cpu().numpy()
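
To inspect the result visually, the class-ID mask can be mapped to colors with a palette lookup. The palette below is an arbitrary choice for illustration, not one shipped with the model.

```python
import numpy as np

# Hypothetical RGB palette, one row per class ID (0..5); colors are arbitrary.
PALETTE = np.array([
    [0, 0, 0],        # 0 background
    [180, 120, 80],   # 1 facade_wall
    [0, 120, 255],    # 2 window
    [255, 0, 0],      # 3 door
    [0, 200, 0],      # 4 balcony
    [0, 255, 150],    # 5 vegetation_occluder
], dtype=np.uint8)

def colorize(pred_seg: np.ndarray) -> np.ndarray:
    """Map an (H, W) class-ID mask to an (H, W, 3) RGB image."""
    return PALETTE[pred_seg]

rgb = colorize(np.array([[1, 2], [0, 4]]))
# rgb.shape == (2, 2, 3); rgb[0, 1] is the window color [0, 120, 255]
```

The resulting array can be wrapped with `PIL.Image.fromarray(rgb)` and alpha-blended over the input photo for a quick overlay.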

Intended Use

  • Primary: Second-pass segmentation of rectified facades (after homography rectification)
  • Secondary: First-pass facade detection on raw street photos (accuracy is expected to be lower, since training used only rectified images)


Pipeline Role

This model is designed for use in a 2-pass facade segmentation pipeline:

  1. Pass 1: Segment raw street photo → find facade wall region
  2. Rectify facade via homography
  3. Pass 2: Re-run this model on rectified crop → parse windows, doors, balconies cleanly
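
Step 2 is typically done with `cv2.findHomography` and `cv2.warpPerspective`; the underlying math, estimating a homography from the four facade corners found in Pass 1, can be sketched with a minimal DLT solve in NumPy. The corner coordinates below are made up for illustration.

```python
import numpy as np

def homography_from_corners(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Direct Linear Transform: solve H (3x3, up to scale) so dst ~ H @ src
    for 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the null vector of A, i.e. the last right-singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Hypothetical skewed facade corners -> 512x512 rectified square
src = np.array([[10.0, 20.0], [400.0, 40.0], [420.0, 500.0], [5.0, 480.0]])
dst = np.array([[0.0, 0.0], [512.0, 0.0], [512.0, 512.0], [0.0, 512.0]])
H = homography_from_corners(src, dst)

# Check: the first source corner maps to (0, 0) after dehomogenization
p = H @ np.array([10.0, 20.0, 1.0])
p = p[:2] / p[2]
```

In the actual pipeline, `H` would be passed to `cv2.warpPerspective` to produce the rectified crop that Pass 2 segments.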

Limitations

  • Trained only on rectified facade images from CMP. Performance on perspective-distorted street photos will be degraded.
  • No vegetation data in the training set; the vegetation_occluder class is effectively predicted as background.
  • Small dataset (378 images) — performance ceiling is moderate.

Citation

Please cite this model if you use it:

@misc{corbetta_segformer_facade_cmp_2026,
  author       = {Marco Corbetta},
  title        = {segformer-b0-facade-cmp: SegFormer-B0 fine-tuned on CMP Facade},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Marco333/segformer-b0-facade-cmp}}
}

CMP Dataset:

@INPROCEEDINGS{Tylecek13,
  author = {Radim Tyle{\v c}ek and Radim {\v S}{\' a}ra},
  title = {Spatial Pattern Templates for Recognition of Objects with Regular Structure},
  booktitle = {Proc. GCPR},
  year = {2013},
}

SegFormer:

@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}