docs: add RTX 4090 benchmark + GPU arch list for SageAttention build

#22
by gkalstn0 - opened
Motif Technologies org

Summary

  • RTX 4090 GGUF + SageAttention benchmark table added to docs/gguf-sageattention.md
  • ~3.16x speedup with SageAttention on RTX 4090 (vs 1.59x on H200)
  • All GGUF variants fit in the RTX 4090's 24 GB of VRAM
  • GPU arch list (SM 8.0/8.6/8.9/9.0/10.0) added to SageAttention build instructions
  • README news entry added
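
For readers unsure which entry in the arch list applies to their card, here is a minimal sketch that maps the SM versions named above to their GPU generations. The helper names (`LISTED_ARCHS`, `describe_arch`, `arch_list_string`) are hypothetical and not part of this PR or of SageAttention. The semicolon-separated string follows the format that PyTorch's extension builder accepts through `TORCH_CUDA_ARCH_LIST`; whether the SageAttention build in the docs reads that variable is an assumption, so check the build instructions themselves.

```python
# Hypothetical helper: SM versions from the PR summary, mapped to GPU generations.
LISTED_ARCHS = {
    (8, 0): "Ampere (e.g. A100)",
    (8, 6): "Ampere (e.g. RTX 30 series)",
    (8, 9): "Ada Lovelace (e.g. RTX 4090)",
    (9, 0): "Hopper (e.g. H100, H200)",
    (10, 0): "Blackwell (e.g. B200)",
}


def describe_arch(major, minor):
    """Return a readable label for a compute capability, or note that it is unlisted."""
    name = LISTED_ARCHS.get((major, minor))
    if name is None:
        return f"SM {major}.{minor}: not in the listed architectures"
    return f"SM {major}.{minor}: {name}"


def arch_list_string(archs=LISTED_ARCHS):
    """Join the listed SM versions in ascending order, separated by semicolons."""
    return ";".join(f"{major}.{minor}" for major, minor in sorted(archs))


if __name__ == "__main__":
    try:
        import torch  # optional: only used to query the local GPU

        if torch.cuda.is_available():
            # get_device_capability() returns a (major, minor) tuple
            print(describe_arch(*torch.cuda.get_device_capability()))
    except ImportError:
        # Without PyTorch installed, fall back to the RTX 4090 entry as an example
        print(describe_arch(8, 9))
    print(arch_list_string())
```

Running the script on a machine with PyTorch and a CUDA device prints the label for the local GPU, followed by the arch string.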
gkalstn0 changed pull request status to open
gkalstn0 changed pull request status to merged
