Video understanding
Wolf: Captioning Everything with a World Summarization Framework
Paper • 2407.18908 • Published • 32
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Paper • 2407.19985 • Published • 37
TPDiff: Temporal Pyramid Video Diffusion Model
Paper • 2503.09566 • Published • 45
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Paper • 2506.07464 • Published • 14
Video models are zero-shot learners and reasoners
Paper • 2509.20328 • Published • 99
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Paper • 2510.05034 • Published • 48
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
Paper • 2510.20579 • Published • 55
Video Reasoning without Training
Paper • 2510.17045 • Published • 7
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
Paper • 2510.23473 • Published • 84
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
Paper • 2510.26802 • Published • 33
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
Paper • 2511.15065 • Published • 75
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
Paper • 2511.16668 • Published • 54
In-Video Instructions: Visual Signals as Generative Control
Paper • 2511.19401 • Published • 31
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
Paper • 2512.01342 • Published • 16
ViDiC: Video Difference Captioning
Paper • 2512.03405 • Published • 27
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
Paper • 2512.04678 • Published • 40
Evaluating Gemini Robotics Policies in a Veo World Simulator
Paper • 2512.10675 • Published • 17
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Paper • 2512.13874 • Published • 16
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
Paper • 2512.15702 • Published • 14
Kling-Omni Technical Report
Paper • 2512.16776 • Published • 164
SemanticGen: Video Generation in Semantic Space
Paper • 2512.20619 • Published • 89
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Paper • 2512.20618 • Published • 53
Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
Paper • 2512.21004 • Published • 12