Collections
Collections including paper arxiv:2508.10104
-
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 140 -
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search
Paper • 2508.02091 • Published • 13 -
DINOv3
Paper • 2508.10104 • Published • 305 -
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 97
-
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper • 2506.05176 • Published • 81 -
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
Paper • 2506.04207 • Published • 48 -
MiMo-VL Technical Report
Paper • 2506.03569 • Published • 80 -
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Paper • 2506.03147 • Published • 58
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 15.5k • 1.43k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 96 • 17 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
Paper • 2502.15425 • Published • 9 -
EgoLife: Towards Egocentric Life Assistant
Paper • 2503.03803 • Published • 46 -
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 86
-
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report
Paper • 2508.01059 • Published • 34 -
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper • 2508.01191 • Published • 240 -
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper • 2508.05629 • Published • 191 -
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 211
-
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
Paper • 2506.23918 • Published • 90 -
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Paper • 2504.16030 • Published • 37 -
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Paper • 2505.24867 • Published • 82 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 254
-
End-to-End Vision Tokenizer Tuning
Paper • 2505.10562 • Published • 22 -
Global and Local Entailment Learning for Natural World Imagery
Paper • 2506.21476 • Published • 1 -
DINOv3
Paper • 2508.10104 • Published • 305 -
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
Paper • 2509.01363 • Published • 61
-
ReZero: Enhancing LLM search ability by trying one-more-time
Paper • 2504.11001 • Published • 16 -
FonTS: Text Rendering with Typography and Style Controls
Paper • 2412.00136 • Published • 1 -
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 98 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 163
-
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 258 -
A Survey of Context Engineering for Large Language Models
Paper • 2507.13334 • Published • 263 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447 -
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 302