LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models • Paper • arXiv:2405.18377 • Published May 28, 2024
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer • Paper • arXiv:2503.02495 • Published Mar 4, 2025
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B • Paper • arXiv:2511.06221 • Published Nov 9, 2025
What is MoE 2.0? Update Your Knowledge about Mixture-of-experts • Article • Published Apr 27, 2025
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation • Paper • arXiv:2510.00515 • Published Oct 1, 2025