Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models
Abstract
Vision-language-action models face significant security vulnerabilities from adversarial 3D textures that can be physically deployed to attack robotic manipulation tasks effectively.
Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, as they are naturally attached to manipulated objects and are easier to deploy in physical environments. Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable path from the VLA objective function back to object appearance, making end-to-end optimization difficult. To address this, we introduce Foreground-Background Decoupling (FBD), which enables differentiable texture optimization through dual-renderer alignment while preserving the original simulation environment. To further ensure that the attack remains effective across long horizons and diverse viewpoints in the physical world, we propose Trajectory-Aware Adversarial Optimization (TAAO), which prioritizes behaviorally critical frames and stabilizes optimization with a vertex-based parameterization. Built on these designs, we present Tex3D, the first framework for end-to-end optimization of 3D adversarial textures directly within the VLA simulation environment. Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance across multiple manipulation tasks, achieving task failure rates of up to 96.7%. Our empirical results expose critical vulnerabilities of VLA systems to physically grounded 3D adversarial attacks and highlight the need for robustness-aware training.
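The abstract does not give TAAO's exact formulation, but its core idea of prioritizing behaviorally critical frames can be sketched as a weighted per-frame adversarial loss. The sketch below is a minimal illustration, not the paper's method: the criticality score (action-change magnitude), the softmax weighting, and all names are assumptions.

```python
import numpy as np

def taao_weighted_loss(frame_losses, action_deltas, temperature=1.0):
    """Weight per-frame adversarial losses by a softmax over a
    behavioral-criticality score (here: the norm of the policy's
    action change per frame), so that frames where the policy acts
    decisively dominate the attack objective.

    frame_losses : (T,) per-frame adversarial losses
    action_deltas: (T, D) per-frame action changes
    Illustrative only; the paper's actual criterion may differ.
    """
    scores = np.linalg.norm(action_deltas, axis=1) / temperature
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    return float(np.sum(weights * frame_losses))

# Usage: 3 frames; frame 2 has the largest action change and the
# largest loss, so it dominates the weighted objective.
losses = np.array([0.5, 1.0, 2.0])
deltas = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
print(taao_weighted_loss(losses, deltas))
```

Under this weighting, the aggregate loss is pulled toward the behaviorally critical frame (here above the uniform mean of about 1.17), which is the effect the trajectory-aware prioritization aims for.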
Community
This paper presents Tex3D, a novel framework for attacking vision-language-action (VLA) models via adversarial 3D textures applied to manipulated objects. To enable end-to-end optimization in non-differentiable simulation environments, the authors introduce Foreground-Background Decoupling (FBD) and Trajectory-Aware Adversarial Optimization (TAAO). Experiments in both simulation and real-robot settings show that Tex3D significantly degrades VLA performance, revealing an important and physically realistic vulnerability of VLA systems.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Beyond the Patch: Exploring Vulnerabilities of Visuomotor Policies via Viewpoint-Consistent 3D Adversarial Object (2026)
- TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches (2026)
- R-PGA: Robust Physical Adversarial Camouflage Generation via Relightable 3D Gaussian Splatting (2026)
- Comparative Analysis of Patch Attack on VLM-Based Autonomous Driving Architectures (2026)
- Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey (2026)
- When Surfaces Lie: Exploiting Wrinkle-Induced Attention Shift to Attack Vision-Language Models (2026)
- Benchmarking Adversarial Robustness and Adversarial Training Strategies for Object Detection (2026)
Get this paper in your agent:
hf papers read 2604.01618
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash