Effective Distillation to Hybrid xLSTM Architectures Paper • 2603.15590 • Published 10 days ago • 32
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Paper • 2410.07170 • Published Oct 9, 2024 • 16