Official models for "MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description"

Overview

MoChat is a Multimodal Large Language Model (MLLM) for human motion understanding with precise spatio-temporal grounding. Unlike conventional motion analysis systems, MoChat integrates:

- Motion Understanding: Performs fundamental motion comprehension and summarization.
- Spatial Limb Grounding: Locates the body parts involved in a described movement.
- Temporal Action Grounding: Identifies the time boundaries that correspond to a specific motion description.
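A common way to score a temporal grounding prediction against an annotated span is temporal intersection-over-union (tIoU). The following is a minimal, generic sketch of that metric, not MoChat's evaluation code. It assumes each span is a `(start, end)` pair in seconds; the function name `temporal_iou` is illustrative.

```python
from typing import Tuple

# Assumed span format: (start, end) in seconds, with start <= end.
Interval = Tuple[float, float]


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Temporal intersection-over-union between a predicted and a reference span."""
    p_start, p_end = pred
    g_start, g_end = gt
    if p_start > p_end or g_start > g_end:
        raise ValueError("interval start must not exceed its end")
    # Overlap length; clamped to zero when the spans are disjoint.
    inter = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    # Union = sum of both lengths minus the shared overlap.
    union = (p_end - p_start) + (g_end - g_start) - inter
    # Two zero-length spans have no measurable union; treat as no match.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Predicted 1.0-3.0 s vs. reference 2.0-4.0 s: overlap 1 s, union 3 s.
    print(round(temporal_iou((1.0, 3.0), (2.0, 4.0)), 3))  # 0.333
```

Grounding results are often reported as the fraction of predictions whose tIoU exceeds a threshold such as 0.5.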

Models

We provide the following trained models for download:

- Joints-Grouped Skeleton Encoder for representing motion sequences.
- Two variants of the motion comprehension model:
  - MoChat: Base model.
  - MoChat-R: Extended model with a regression head.
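To illustrate what "joints-grouped" input can look like, the sketch below splits a skeleton sequence of shape (T frames, J joints, 3 coordinates) into per-body-part sequences. This is a hypothetical illustration, not the released encoder: the 22-joint SMPL-style ordering (as used by HumanML3D), the five groups in `JOINT_GROUPS`, and the helper `group_joints` are all assumptions, and MoChat's actual joint layout and grouping may differ.

```python
from typing import Dict, List, Sequence

# ASSUMPTION: 22-joint SMPL-style order (HumanML3D). Hypothetical grouping,
# not necessarily the one used by MoChat's Joints-Grouped Skeleton Encoder.
JOINT_GROUPS: Dict[str, List[int]] = {
    "torso": [0, 3, 6, 9, 12, 15],   # pelvis, spine1-3, neck, head
    "left_arm": [13, 16, 18, 20],    # collar, shoulder, elbow, wrist
    "right_arm": [14, 17, 19, 21],
    "left_leg": [1, 4, 7, 10],       # hip, knee, ankle, foot
    "right_leg": [2, 5, 8, 11],
}

Frame = Sequence[Sequence[float]]  # J joints, each an (x, y, z) triple


def group_joints(motion: Sequence[Frame]) -> Dict[str, List[List[float]]]:
    """Split a (T, J, 3) motion into one (T, len(group) * 3) sequence per group.

    Each frame's selected joints are flattened into a single vector, so a
    separate (hypothetical) per-group encoder could process each body part.
    """
    grouped: Dict[str, List[List[float]]] = {}
    for name, idxs in JOINT_GROUPS.items():
        grouped[name] = [
            [coord for j in idxs for coord in frame[j]] for frame in motion
        ]
    return grouped


if __name__ == "__main__":
    T, J = 4, 22
    # Dummy motion: joint j in frame t sits at (t, j, 0).
    motion = [[[float(t), float(j), 0.0] for j in range(J)] for t in range(T)]
    for name, seq in group_joints(motion).items():
        print(name, len(seq), len(seq[0]))  # group name, frames, features per frame
```

Keeping body parts in separate streams is one plausible way to support spatial limb grounding, since each group's features stay attributable to a named limb.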

Resources

Codebase: GitHub
Paper: arXiv (2410.11404)


Paper for CSUBioGroup/MoChat: "MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description", arXiv 2410.11404, published Oct 15, 2024.