Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching
Abstract
We propose UniDFlow, a unified discrete flow-matching framework for multimodal understanding, generation, and editing. It decouples understanding and generation through task-specific low-rank adapters, avoiding objective interference and representation entanglement, while a novel reference-based multimodal preference alignment optimizes relative outcomes under identical conditioning, improving faithfulness and controllability without large-scale retraining. UniDFlow achieves state-of-the-art performance across eight benchmarks and exhibits strong zero-shot generalization to tasks including inpainting, in-context image generation, reference-based editing, and compositional generation, despite no explicit task-specific training.
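The page does not spell out how the adapter decoupling or the reference-based alignment is implemented, so the following is only a minimal sketch of the two ideas named in the abstract, assuming LoRA-style adapters over a frozen backbone and a DPO-style pairwise objective measured against a frozen reference model. The names `TaskLoRA`, `reference_preference_loss`, and the `beta` temperature are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskLoRA(nn.Module):
    """Low-rank adapter wrapped around a frozen linear layer.

    Keeping one adapter per task (e.g. understanding vs. generation) lets
    task-specific updates live outside the shared backbone weights.
    """

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)        # shared backbone stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)     # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))


def reference_preference_loss(logp_chosen: torch.Tensor,
                              logp_rejected: torch.Tensor,
                              ref_logp_chosen: torch.Tensor,
                              ref_logp_rejected: torch.Tensor,
                              beta: float = 0.1) -> torch.Tensor:
    """Pairwise preference loss over two outputs sampled under identical conditioning.

    Log-likelihood margins are taken relative to a frozen reference model,
    so only the relative outcome of the candidate pair is optimized.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()
```

In this sketch only the adapter parameters receive gradients, so switching between understanding and generation amounts to swapping which `TaskLoRA` modules are active, while the alignment step fine-tunes those adapters rather than retraining the backbone.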
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Unified Thinker: A General Reasoning Modular Core for Image Generation (2026)
- Loom: Diffusion-Transformer for Interleaved Generation (2025)
- Towards Generalized Multi-Image Editing for Unified Multimodal Models (2026)
- UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing (2026)
- Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing (2026)
- SIGMA: Selective-Interleaved Generation with Multi-Attribute Tokens (2026)
- Unified Text-Image Generation with Weakness-Targeted Post-Training (2026)