Hallucinations Undermine Trust; Metacognition is a Way Forward Paper • 2605.01428 • Published 4 days ago
Map2World: Segment Map Conditioned Text to 3D World Generation Paper • 2605.00781 • Published 5 days ago
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding Paper • 2604.26779 • Published 7 days ago
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments Paper • 2604.26067 • Published 8 days ago
ClawGym: A Scalable Framework for Building Effective Claw Agents Paper • 2604.26904 • Published 7 days ago
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons Paper • 2604.28130 • Published 6 days ago
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence Paper • 2604.24954 • Published 9 days ago
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning Paper • 2604.24300 • Published 9 days ago
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 9 days ago
Learning to Identify Out-of-Distribution Objects for 3D LiDAR Anomaly Segmentation Paper • 2604.23604 • Published 10 days ago
Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data Paper • 2604.24479 • Published 9 days ago
SketchVLM: Vision language models can annotate images to explain thoughts and guide users Paper • 2604.22875 • Published 13 days ago
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 9 days ago