A2 Pretrained Policy

Pretrained ViLGP3D policy for 6-DOF grasp and place tasks in tabletop manipulation.

Model Description

The model uses CLIP-based cross-attention to select grasp and place poses from candidates generated by GraspNet and PlaceNet.
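The selection step can be pictured as a single cross-attention read-out: the language embedding acts as the query, each candidate pose contributes a key, and the largest attention weight picks the pose. The sketch below is a toy, pure-Python illustration under that assumption. The name `select_candidate`, the made-up 3-dimensional features, and the single-head scaled dot-product form are hypothetical and are not taken from the ViLGP3D implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_candidate(query, candidate_keys):
    """Score each candidate key against the query; return (best_index, weights).

    query: list[float], a stand-in for a language-goal embedding.
    candidate_keys: list[list[float]], one stand-in feature vector per candidate pose.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in candidate_keys]
    weights = softmax(scores)
    best_index = max(range(len(weights)), key=weights.__getitem__)
    return best_index, weights


# Toy example: three candidate poses with made-up 3-d features.
query = [1.0, 0.0, 1.0]
candidates = [
    [0.1, 0.9, 0.0],
    [0.8, 0.1, 0.9],
    [0.4, 0.4, 0.4],
]
best, weights = select_candidate(query, candidates)
```

In the real policy the query and keys would come from CLIP and point-cloud features rather than hand-written vectors; only the argmax-over-attention idea carries over.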

Files

  • sl_checkpoint_199.pth: Trained policy weights (ViLGP3D fusion network)
  • checkpoint-rs.tar: GraspNet checkpoint for grasp candidate generation
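If the two files are needed outside of `A2Policy.from_pretrained`, they can probably be fetched directly with `huggingface_hub`. This is a minimal sketch, assuming both files sit at the root of the `dgrachev/a2_pretrained` repository (the repo id used in the Usage section); `download_checkpoints` is a hypothetical helper name, not part of any package.

```python
REPO_ID = "dgrachev/a2_pretrained"  # repo id taken from the Usage section
CHECKPOINT_FILES = ("sl_checkpoint_199.pth", "checkpoint-rs.tar")


def download_checkpoints(repo_id=REPO_ID, filenames=CHECKPOINT_FILES):
    """Download each file into the local Hugging Face cache.

    Returns a dict mapping filename -> local path.
    Assumes the files are stored at the root of the repository.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {name: hf_hub_download(repo_id=repo_id, filename=name) for name in filenames}
```

Calling `download_checkpoints()` requires network access and the `huggingface_hub` package; repeated calls reuse the local cache.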

Usage

With lerobot_policy_a2

from lerobot_policy_a2 import A2Policy

# Load the pretrained policy from the Hugging Face Hub
policy = A2Policy.from_pretrained("dgrachev/a2_pretrained")

# rgb_image, depth_image, and point_cloud are not defined in this snippet;
# the caller must supply them (here keyed by the "front" camera).
action, info = policy.predict_grasp(
    color_images={"front": rgb_image},
    depth_images={"front": depth_image},
    point_cloud=point_cloud,
    lang_goal="grasp a round object",
)

With LeRobot A2 Environment

# Data collection
A2_DISABLE_EGL=true uv run python -m lerobot.envs.a2_collect \
    --policy a2 \
    --hf_repo dgrachev/a2_pretrained \
    --task grasp \
    --num_episodes 100

# Benchmark evaluation
A2_DISABLE_EGL=true uv run python -m lerobot.envs.a2_benchmark \
    --task grasp \
    --policy a2 \
    --hf_repo dgrachev/a2_pretrained
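For scripted runs, the same two commands can be assembled from Python. The sketch below only reuses the module names, flags, and environment variable shown above; `build_a2_command` and `run_a2_command` are hypothetical helpers, and the sketch assumes the order of the flags does not matter to the CLI.

```python
import os
import subprocess


def build_a2_command(module, task, hf_repo="dgrachev/a2_pretrained", num_episodes=None):
    """Return the argv list for one of the two commands shown above."""
    argv = ["uv", "run", "python", "-m", module,
            "--task", task, "--policy", "a2", "--hf_repo", hf_repo]
    if num_episodes is not None:
        argv += ["--num_episodes", str(num_episodes)]
    return argv


def run_a2_command(argv):
    """Run the command with A2_DISABLE_EGL=true added to the current environment."""
    env = dict(os.environ, A2_DISABLE_EGL="true")
    return subprocess.run(argv, env=env, check=True)


# Building the commands has no side effects; nothing runs until run_a2_command is called.
collect_cmd = build_a2_command("lerobot.envs.a2_collect", "grasp", num_episodes=100)
benchmark_cmd = build_a2_command("lerobot.envs.a2_benchmark", "grasp")
```

Passing `collect_cmd` or `benchmark_cmd` to `run_a2_command` launches the job and raises `subprocess.CalledProcessError` on a non-zero exit code.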

Training Details

  • Architecture: ViLGP3D with CLIP ViT-B/32 backbone
  • Hidden dim: 768
  • Attention heads: 8
  • Position encoding: Rotary Position Encoding (RoPE)
  • Training data: Tabletop manipulation demonstrations
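To make the numbers above concrete: if the 768 hidden channels are split evenly across the 8 heads (the usual multi-head layout, assumed here rather than confirmed by the source), each head works on 96 channels. The sketch below also shows the standard one-dimensional form of rotary position encoding in pure Python. How the policy actually applies RoPE (for example, to token indices or to 3D point coordinates) is not stated, so `rope` is an illustrative function, not the model's code.

```python
import math

HIDDEN_DIM = 768  # from the list above
NUM_HEADS = 8     # from the list above
HEAD_DIM = HIDDEN_DIM // NUM_HEADS  # 96 channels per head, assuming an even split


def rope(vec, position, base=10000.0):
    """Rotate consecutive channel pairs of `vec` by position-dependent angles.

    The pair starting at even index i is rotated by position * base ** (-i / len(vec)).
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        angle = position * base ** (-i / dim)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_a - y * sin_a, x * sin_a + y * cos_a])
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# After rotation, the query-key score depends only on the position offset
# (an offset of 3 in both cases below), not on the absolute positions.
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 13), rope(k, 10))
```

The two scores agree up to floating-point error, which is the property that makes rotary encodings relative rather than absolute.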

Related Resources

Citation

@misc{a2_policy,
  author = {Denis Grachev},
  title = {A2 Policy: CLIP-based 6-DOF Grasp and Place Policy},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/dgrachev/a2_pretrained}
}