Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos Paper • 2605.18233 • Published 5 days ago • 87
RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO Paper • 2605.15190 • Published 9 days ago • 13
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 9 days ago • 80
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation Paper • 2605.15141 • Published 9 days ago • 91
LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs Paper • 2605.17260 • Published 6 days ago • 24
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published 5 days ago • 108
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 16 days ago • 195
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published 14 days ago • 78
SenseNova-U1 Collection SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 8 items • Updated 8 days ago • 67
HP-Edit: A Human-Preference Post-Training Framework for Image Editing Paper • 2604.19406 • Published Apr 21 • 7
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 240
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges Paper • 2604.13602 • Published Apr 15 • 32
Seeing Fast and Slow: Learning the Flow of Time in Videos Paper • 2604.21931 • Published about 1 month ago • 19
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 26 days ago • 71
FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing Paper • 2604.22586 • Published 29 days ago • 16