- A Survey on Vision-Language-Action Models: An Action Tokenization Perspective (arXiv:2507.01925)
- DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge (arXiv:2507.04447)
- A Survey on Vision-Language-Action Models for Autonomous Driving (arXiv:2506.24044)
- EmbRACE-3K: Embodied Reasoning and Action in Complex Environments (arXiv:2507.10548)
- Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos (arXiv:2507.15597)
- ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning (arXiv:2507.16815)
- villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models (arXiv:2507.23682)
- InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation (arXiv:2507.17520)
- MolmoAct: Action Reasoning Models that can Reason in Space (arXiv:2508.07917)
- Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies (arXiv:2508.20072)
- Mechanistic Interpretability for Steering Vision-Language-Action Models (arXiv:2509.00328)
- F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions (arXiv:2509.06951)
- VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model (arXiv:2509.09372)
- SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning (arXiv:2509.09674)
- FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies (arXiv:2509.04996)
- A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning (arXiv:2509.15937)
- RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training (arXiv:2510.06710)
- VLA-0: Building State-of-the-Art VLAs with Zero Modification (arXiv:2510.13054)
- Expertise Need Not Monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning (arXiv:2510.14300)
- InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy (arXiv:2510.13778)
- GigaBrain-0: A World Model-Powered Vision-Language-Action Model (arXiv:2510.19430)
- 10 Open Challenges Steering the Future of Vision-Language-Action Models (arXiv:2511.05936)
- NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards (arXiv:2511.14659)
- Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future (arXiv:2512.16760)