Min-Hung Chen

cmhungsteve

https://minhungchen.netlify.app/

AI & ML interests

Multimodal AI, Transfer Learning, Unsupervised Learning, Video Understanding, Vision Transformer, Computer Vision, Deep Learning

Recent Activity

upvoted a paper 8 days ago

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Paper • 2606.18216 • Published 9 days ago • 60

upvoted a collection 12 days ago

Cosmos3

Collection • Omnimodal World Models for Physical AI • 15 items • Updated 13 days ago • 131

authored a paper 13 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 14 days ago • 106

upvoted a paper 13 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 14 days ago • 106

submitted a paper to Daily Papers 13 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 14 days ago • 106

authored 3 papers 20 days ago

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Paper • 2605.19846 • Published May 20 • 3

DVSM: Decoder-only View Synthesis Model Done Right

Paper • 2605.29891 • Published 28 days ago • 2

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Paper • 2606.06361 • Published 21 days ago • 16

upvoted 3 papers 20 days ago

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Paper • 2605.19846 • Published May 20 • 3

DVSM: Decoder-only View Synthesis Model Done Right

Paper • 2605.29891 • Published 28 days ago • 2

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Paper • 2606.06361 • Published 21 days ago • 16

New activity in nvidia/4D-RGPT-8B 23 days ago

fix links

#1 opened 23 days ago by cmhungsteve

liked a model 23 days ago

nvidia/4D-RGPT-8B

Video-Text-to-Text • Updated 23 days ago • 252 • 15

upvoted a paper 26 days ago

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

Paper • 2605.30161 • Published 28 days ago • 60

upvoted a paper 28 days ago

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

Paper • 2605.28774 • Published 29 days ago • 93

upvoted an article about 1 month ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Article • nvidia • May 18 • 21

liked a dataset about 1 month ago

nvidia/PhysicalAI-VANTAGE-Bench

Viewer • Updated 21 days ago • 6.47k • 5.09k • 14

liked a model about 2 months ago

nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

Any-to-Any • 33B • Updated May 8 • 642k • 359

New activity in MINT-SJTU/RoboFAC-dataset about 2 months ago

License for RoboFAC?

#6 opened about 2 months ago by cmhungsteve

New activity in nvidia/R4D-Bench 2 months ago

selected as CVPR'26 Highlight

#6 opened 2 months ago by cmhungsteve
