Distilling 100B+ Models 40x Faster with TRL • Running • 13 • TRL distillation for 100B+ teachers, 40x faster
Liquid Claude Collection Liquid Claude is a small series of LiquidAI/LFM2.5-1.2B-Thinking models that have been fine-tuned on Claude chats/data. • 5 items • Updated about 19 hours ago • 2
FlameF0X/LFM2.5-1.2B-Distilled-Claude-4.6 Text Generation • 1B • Updated about 20 hours ago • 307 • 1
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7, 2024 • 51
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity Paper • 2501.16295 • Published Jan 27, 2025 • 9