GLM-5.2 on AMD MI355X reaches 2626 tokens per second per node, according to benchmarks posted in a Hacker News thread that earned 40 points and 13 comments.
The reported result works out to more than twice the inference throughput per dollar of comparable NVIDIA Blackwell configurations.
Model: GLM-5.2 | Speed: 2626 tok/s/node | Hardware: AMD MI355X | Cost: less than half that of Blackwell (reported)
What It Is
GLM-5.2 is the latest large language model from the GLM series. The reported run uses AMD's MI355X accelerator to serve the model at scale.
The benchmark measures sustained generation speed under production-like batch sizes rather than single-request latency.
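To make the distinction between sustained throughput and single-request latency concrete, here is a minimal Python sketch. The `RequestTrace` record and both helper functions are hypothetical illustrations and are not taken from the benchmark's tooling. The sketch assumes each request logs a start time, an end time, and a count of generated tokens.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RequestTrace:
    """One completed generation request (hypothetical record format)."""
    start_s: float         # wall-clock time the request was submitted
    end_s: float           # wall-clock time the last token was returned
    generated_tokens: int  # output tokens produced for this request


def aggregate_throughput(traces: Sequence[RequestTrace]) -> float:
    """Tokens generated across all requests per second of wall-clock time."""
    if not traces:
        raise ValueError("need at least one trace")
    window_s = max(t.end_s for t in traces) - min(t.start_s for t in traces)
    if window_s <= 0:
        raise ValueError("measurement window must be positive")
    return sum(t.generated_tokens for t in traces) / window_s


def mean_request_latency(traces: Sequence[RequestTrace]) -> float:
    """Average end-to-end time of a single request, in seconds."""
    if not traces:
        raise ValueError("need at least one trace")
    return sum(t.end_s - t.start_s for t in traces) / len(traces)


if __name__ == "__main__":
    # Made-up numbers: four concurrent requests, 500 tokens each, 10 s each.
    batch = [RequestTrace(0.0, 10.0, 500) for _ in range(4)]
    print(aggregate_throughput(batch))  # 200.0 tok/s for the node
    print(mean_request_latency(batch))  # 10.0 s per request
```

In this toy batch each request individually sees 50 tok/s, while the node as a whole delivers 200 tok/s. A per-node figure such as 2626 tok/s is the second kind of number, so it says little about how fast any one user's response arrives.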
Benchmarks and Cost Data
The single-node figure of 2626 tok/s comes from direct measurement on MI355X silicon. Cost calculations factor in both hardware acquisition and power draw.
Early comments on the Hacker News thread note that the 2x cost advantage holds when comparing list prices and typical data-center power rates.
| Metric | AMD MI355X | Blackwell (reference) |
|---|---|---|
| Throughput (tok/s per node) | 2626 | ~1200 |
| Relative cost per token | 1.0x | 2.1x+ |
| HN discussion score | 40 points | — |
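A cost comparison of this kind can be sketched as a small calculation that combines amortized hardware cost with power cost and divides by tokens served. The function below is an illustrative assumption about how such a figure might be derived, not the method used in the benchmark. Every parameter name is hypothetical, and the prices in the example are placeholders rather than real list prices.

```python
def cost_per_million_tokens(
    hardware_cost_usd: float,
    amortization_years: float,
    power_kw: float,
    electricity_usd_per_kwh: float,
    tokens_per_second: float,
    utilization: float = 1.0,
) -> float:
    """Estimate USD per one million generated tokens for a single node.

    Hardware cost is spread evenly over the amortization period; power
    cost is node draw times the electricity rate. Cooling, networking,
    staffing, and software costs are deliberately ignored here.
    """
    if tokens_per_second <= 0 or not 0 < utilization <= 1:
        raise ValueError("throughput must be positive and utilization in (0, 1]")
    hours = amortization_years * 365 * 24
    hourly_hardware_usd = hardware_cost_usd / hours
    hourly_power_usd = power_kw * electricity_usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (hourly_hardware_usd + hourly_power_usd) / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Placeholder inputs only: substitute real node prices and power draw.
    example = cost_per_million_tokens(
        hardware_cost_usd=87_600,      # made-up node price
        amortization_years=1,          # made-up amortization period
        power_kw=10,                   # made-up node power draw
        electricity_usd_per_kwh=0.10,  # made-up electricity rate
        tokens_per_second=1000,        # made-up throughput
    )
    print(f"{example:.2f} USD per million tokens")
```

Running the same function once per platform, with that platform's measured throughput and actual pricing, gives the two numbers whose ratio corresponds to the relative cost row in the table.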
How to Try It
Teams can replicate the setup by provisioning MI355X instances through AMD's cloud partners or by running on-premises MI355X clusters. The wafer.ai blog post linked in the thread contains the exact software stack and batch-size configuration used for the 2626 tok/s result.
Pros and Cons
- Pros: Highest reported tokens-per-dollar on current AMD silicon; single-node throughput exceeds prior MI300X numbers by a wide margin.
- Cons: Requires AMD-specific drivers and ROCm stack; software ecosystem for GLM-5.2 remains narrower than CUDA equivalents.
Alternatives and Comparisons
NVIDIA H100 and B200 clusters remain the default choice for most GLM deployments today. The AMD result narrows the performance gap but still needs validation across longer context lengths and multi-node scaling.
Who Should Use This
Organizations already running AMD hardware or seeking to diversify away from NVIDIA supply constraints will find the numbers relevant. Teams locked into CUDA-only tooling or requiring maximum ecosystem support should continue with Blackwell-class GPUs.
Bottom Line
The 2626 tok/s figure on MI355X indicates that AMD can now deliver competitive inference economics for GLM-5.2 at production scale.
Continued software maturation will determine whether the cost advantage translates into broader adoption.