SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving • Paper • 2604.19157 • Published Apr 21
A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work • Paper • 2604.18555 • Published Apr 20
Polynomial-Time Optimal Group Selection via the Double-Commutator Eigenvalue Problem • Paper • 2605.00834 • Published 18 days ago
Approximating Uniform Random Rotations by Two-Block Structured Hadamard Rotations in High Dimensions • Paper • 2604.23418 • Published Apr 25
The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval • Paper • 2502.11276 • Published Feb 16, 2025
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World • Paper • 2603.19223 • Published Mar 19
Revisiting RaBitQ and TurboQuant: A Symmetric Comparison of Methods, Theory, and Experiments • Paper • 2604.19528 • Published 26 days ago
PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective • Paper • 2505.21799 • Published Feb 5
Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers • Paper • 2605.07297 • Published 18 days ago
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers • Paper • 2605.06169 • Published 19 days ago
Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion • Paper • 2605.07013 • Published 19 days ago
Granite 4.1 Language Models • Collection • Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 6 items • Updated 27 days ago