MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description
Paper: arXiv:2410.11404
MoChat is a Multimodal Large Language Model (MLLM) for human motion understanding with precise spatio-temporal grounding. Unlike conventional motion analysis systems, MoChat integrates:
We provide the following trained models for download: