Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 501
KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs Paper • 2604.13226 • Published 28 days ago • 10
Distilling 100B+ Models 40x Faster with TRL • TRL distillation for 100B+ teachers, 40x faster • 80
SmolLM3: smol, multilingual, long-context reasoner Article • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 773