video llm - a plmsmile Collection

plmsmile 's Collections

vision foundation modesl

image-video llm

video generation

mllm applications

video llm

updated Aug 13, 2024

video llm works

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25, 2024 • 37
VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11, 2024 • 29
VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15, 2024 • 37
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14, 2024 • 15
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22, 2024 • 28
VidLA: Video-Language Alignment at Scale

Paper • 2403.14870 • Published Mar 21, 2024 • 15
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Paper • 2404.01258 • Published Apr 1, 2024 • 12
Streaming Dense Video Captioning

Paper • 2404.01297 • Published Apr 1, 2024 • 13
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Paper • 2404.03413 • Published Apr 4, 2024 • 27
Koala: Key frame-conditioned long video-LLM

Paper • 2404.04346 • Published Apr 5, 2024 • 7
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published May 14, 2024 • 15
iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31, 2024 • 26
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 74
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Paper • 2406.16338 • Published Jun 24, 2024 • 26
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

Paper • 2406.08085 • Published Jun 12, 2024 • 17