GLM-5.2 Runs 2626 tok/s on AMD MI355X

#ai #llm #machinelearning #news

GLM-5.2 on AMD MI355X reaches 2626 tokens per second per node, according to benchmarks posted in a Hacker News thread that earned 40 points and 13 comments.

According to the posted figures, throughput per dollar is more than twice that of comparable Blackwell configurations.

Model: GLM-5.2 | Speed: 2626 tok/s/node | Hardware: AMD MI355X | Cost: less than half of Blackwell per token (reported)

What It Is

GLM-5.2 is the latest large language model from the GLM series. The reported run uses AMD's MI355X accelerator to serve the model at scale.

The benchmark measures sustained generation speed under production-like batch sizes rather than single-request latency.
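To make the distinction concrete, here is a minimal sketch of how an aggregate node figure relates to what a single request sees. The concurrency values are hypothetical, since the post does not state the batch size used. The even split is also a simplification: real node throughput changes with batch size.

```python
def per_request_rate(node_tokens_per_second: float, concurrent_requests: int) -> float:
    """Average tokens/s seen by one request if node throughput is shared evenly."""
    if concurrent_requests <= 0:
        raise ValueError("concurrent_requests must be positive")
    return node_tokens_per_second / concurrent_requests


NODE_TOK_S = 2626.0  # aggregate per-node figure reported in the post

# Hypothetical concurrency levels, chosen only for illustration.
for concurrency in (1, 32, 128):
    rate = per_request_rate(NODE_TOK_S, concurrency)
    print(f"{concurrency:>4} concurrent requests -> {rate:8.2f} tok/s per request")
```

A high per-node number therefore says little about how fast any one user's response streams. The two metrics answer different questions.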

Benchmarks and Cost Data

The single-node figure of 2626 tok/s comes from direct measurement on MI355X silicon. Cost calculations factor in both hardware acquisition and power draw.

Early comments on the Hacker News thread note that the 2x cost advantage holds when comparing list prices and typical data-center power rates.

Metric	AMD MI355X	Blackwell (reference)
Tokens per second per node	2626	~1200
Relative cost per token	1.0x	2.1x+
HN discussion score	40 points	n/a
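The cost comparison above combines hardware acquisition and power draw. A rough sketch of that arithmetic follows. Every dollar, power, and amortization value in it is a hypothetical placeholder, not a figure from the benchmark or the thread. Only the 2626 tok/s throughput comes from the post.

```python
from dataclasses import dataclass


@dataclass
class NodeCost:
    hardware_usd: float        # purchase price of one node (placeholder)
    amortization_years: float  # period over which the hardware is written off
    power_kw: float            # average power draw of the node
    usd_per_kwh: float         # electricity rate
    tokens_per_second: float   # sustained per-node throughput

    def usd_per_hour(self) -> float:
        """Hourly cost: amortized hardware plus electricity."""
        hours = self.amortization_years * 365 * 24
        return self.hardware_usd / hours + self.power_kw * self.usd_per_kwh

    def usd_per_million_tokens(self) -> float:
        """Cost of generating one million tokens at the sustained rate."""
        tokens_per_hour = self.tokens_per_second * 3600
        return self.usd_per_hour() / tokens_per_hour * 1_000_000


# Placeholder inputs; substitute real quotes and measured power draw.
node = NodeCost(
    hardware_usd=300_000.0,
    amortization_years=3.0,
    power_kw=10.0,
    usd_per_kwh=0.10,
    tokens_per_second=2626.0,
)
print(f"${node.usd_per_million_tokens():.2f} per million tokens")
```

Running the same calculation for two nodes and dividing the results gives a relative cost like the one in the table. This sketch leaves out cooling, networking, and utilization below 100 percent, all of which can shift the ratio.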

How to Try It

Teams can replicate the setup by provisioning MI355X instances through AMD's cloud partners or on-premise clusters. The wafer.ai blog post linked in the thread contains the exact software stack and batch-size configuration used for the 2626 tok/s result.
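When checking a replicated setup, the first step is measuring sustained throughput under concurrent load. The harness below is a generic sketch and not the stack from the wafer.ai post. `fake_generate` is a hypothetical stand-in to replace with a call to your own serving endpoint that returns the number of tokens generated.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def measure_throughput(
    generate: Callable[[str], int],
    prompts: Sequence[str],
    concurrency: int,
) -> float:
    """Return aggregate generated tokens per second across all prompts."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Each call returns the number of tokens it generated.
        token_counts = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    return sum(token_counts) / elapsed


def fake_generate(prompt: str) -> int:
    """Hypothetical stand-in for a real client call; sleeps, then reports 32 tokens."""
    time.sleep(0.01)
    return 32


if __name__ == "__main__":
    tok_s = measure_throughput(fake_generate, ["hello"] * 64, concurrency=16)
    print(f"aggregate throughput: {tok_s:.0f} tok/s")
```

For a meaningful comparison with the published number, match the prompt lengths, output lengths, and concurrency described in the original write-up, and run long enough to get past warm-up.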

Pros and Cons

Pros: Highest reported tokens-per-dollar on current AMD silicon; single-node throughput exceeds prior MI300X numbers by a wide margin.
Cons: Requires AMD-specific drivers and ROCm stack; software ecosystem for GLM-5.2 remains narrower than CUDA equivalents.

Alternatives and Comparisons

NVIDIA H100 and B200 clusters remain the default choice for most GLM deployments today. The AMD result narrows the performance gap but still needs validation across longer context lengths and multi-node scaling.

Who Should Use This

Organizations already running AMD hardware or seeking to diversify away from NVIDIA supply constraints will find the numbers relevant. Teams locked into CUDA-only tooling or requiring maximum ecosystem support should continue with Blackwell-class GPUs.

Bottom Line

The 2626 tok/s figure on MI355X suggests that AMD can now deliver competitive inference economics for GLM-5.2 on a single node, pending validation at longer context lengths and across multiple nodes.

Continued software maturation will determine whether the cost advantage translates into broader adoption.

PromptZone - Leading AI Community for Prompt Engineering and AI Enthusiasts
