haetsal-lee's Collections
• Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections (arXiv:2411.14796)
• LLaVAction: evaluating and training multi-modal large language models for action recognition (arXiv:2503.18712)
• FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition (arXiv:2402.03241)
• Leveraging Temporal Contextualization for Video Action Recognition (arXiv:2404.09490)
• Collaboratively Self-supervised Video Representation Learning for Action Recognition (arXiv:2401.07584)
• TASAR: Transfer-based Attack on Skeletal Action Recognition (arXiv:2409.02483)
• SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition (arXiv:2403.09508)
• Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos (arXiv:2402.08875)
• ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition (arXiv:2401.11654)
• Referring Atomic Video Action Recognition (arXiv:2407.01872)
• SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders (arXiv:2407.13460)
• Gate-Shift-Pose: Enhancing Action Recognition in Sports with Skeleton Information (arXiv:2503.04470)
• Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition (arXiv:2411.18941)
• CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition (arXiv:2410.07153)
• SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation (arXiv:2504.11749)
• EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition (arXiv:2408.05421)
• SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization (arXiv:2501.01245)
• SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition (arXiv:2410.16746)
• Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition (arXiv:2412.14719)
• AR-Net: Adaptive Frame Resolution for Efficient Action Recognition (arXiv:2007.15796)
• Exploring Ordinal Bias in Action Recognition for Instructional Videos (arXiv:2504.06580)
• MotionLLM: Understanding Human Behaviors from Human Motions and Videos (arXiv:2405.20340)
• ST-LLM: Large Language Models Are Effective Temporal Learners (arXiv:2404.00308)
• MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding (arXiv:2404.05726)
• M-LLM Based Video Frame Selection for Efficient Video Understanding (arXiv:2502.19680)
• MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations (arXiv:2410.13790)
• LongVLM: Efficient Long Video Understanding via Large Language Models (arXiv:2404.03384)
• An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM (arXiv:2403.18406)
• VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? (arXiv:2411.10979)
• arXiv:2503.20680
• GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding (arXiv:2406.09781)
• SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models (arXiv:2407.15841)
• LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation (arXiv:2411.04997)
• Latent Action Pretraining from Videos (arXiv:2410.11758)
• VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval (arXiv:2412.01558)
• Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs (arXiv:2504.00072)
• Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers (arXiv:2412.00142)
• LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs (arXiv:2402.13546)
• Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis (arXiv:2405.21075)
• HoliTom: Holistic Token Merging for Fast Video Large Language Models (arXiv:2505.21334)
• AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks (arXiv:2402.15351)
• Breaking the Encoder Barrier for Seamless Video-Language Understanding (arXiv:2503.18422)
• Item-Language Model for Conversational Recommendation (arXiv:2406.02844)
• OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding (arXiv:2406.19389)
• Prompt Learning for Action Recognition (arXiv:2305.12437)
• HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models (arXiv:2502.20811)
• Visual Perception by Large Language Model's Weights (arXiv:2405.20339)
• MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs (arXiv:2506.01674)
• VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding (arXiv:2406.09418)
• Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement (arXiv:2410.14259)
• One to rule them all: natural language to bind communication, perception and action (arXiv:2411.15033)
• TempCompass: Do Video LLMs Really Understand Videos? (arXiv:2403.00476)
• Token-Efficient Long Video Understanding for Multimodal LLMs (arXiv:2503.04130)
• ViDAS: Vision-based Danger Assessment and Scoring (arXiv:2410.00477)
• Dense Connector for MLLMs (arXiv:2405.13800)
• TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations (arXiv:2409.03206)
• Policy Improvement using Language Feedback Models (arXiv:2402.07876)
• Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection (arXiv:2304.04688)
• LLM4VG: Large Language Models Evaluation for Video Grounding (arXiv:2312.14206)