@mitkox on Hugging Face: "I just stress-tested the Beast: MiniMax-M2.1 on Z8 Fury G5. 2101 tokens/sec.…"

mitkox

posted an update 3 days ago

I just stress-tested the Beast: MiniMax-M2.1 on Z8 Fury G5.
2101 tokens/sec. FORTY concurrent clients. That's 609 t/s out, 1492 t/s in. The model outputs fire faster than I can type, but feeds on data like a black hole on cheat day.
But wait, there's more! Threw it into Claude Code torture testing with 60+ tools, 8 agents (7 sub-agents because apparently one wasn't enough chaos). It didn't even flinch. Extremely fast, scary good at coding. The kind of performance that makes you wonder if the model's been secretly reading Stack Overflow in its spare time lol
3 months ago, these numbers lived in my "maybe in 2030" dreams. Today it's running on my desk AND heats my home office during the winter!

mike-ravkine

3 days ago

🔥 Got any pics of this rig? Would love to see how it's managing thermals.

BreathingAir

2 days ago

Honestly, it looks very cool and stable, almost like it's boring?

MastertheWeb

1 day ago

Google says 'Mitko Vasilev, a CTO who runs the model on this configuration, reported impressive stress test results: the setup achieved 2101 tokens/second across forty concurrent clients. The user emphasizes the capability of the Z8 Fury G5 to handle "enterprise-grade AI throughput" on a local desktop, positioning it as a powerful, cost-effective alternative to cloud GPU service'. Seems Google needs to update its results from here.

In this post