A Hacker News thread titled "A robot is sprinting towards you. Do you want it running on Claude or Grok?" drew 150 points and 122 comments. The discussion centers on which model better controls physical agents under time pressure.
What the Scenario Tests
The prompt forces models to output real-time control decisions for a fast-moving robot. Commenters framed it as a test of latency, instruction following, and refusal behavior when physical harm is possible.
Participants noted that the setup mirrors "last agent standing" benchmarks where models must act without human oversight.
Model Behavior Differences
Early reports in the thread indicate Claude produces longer reasoning chains before issuing motor commands. Grok tends to output shorter, more direct action sequences.
Several users measured response times on identical hardware. Claude averaged 1.8 seconds to first action; Grok averaged 0.9 seconds.
Benchmarks and Latency Data
Thread participants shared timing results across 40 runs:
| Model | Avg First Action | Refusal Rate | Token Count |
|---|---|---|---|
| Claude 3.5 Sonnet | 1.8 s | 12% | 187 |
| Grok 2 | 0.9 s | 3% | 64 |
Higher refusal rates from Claude correlated with safety guardrails that pause execution when collision risk appears high.
How to Replicate the Test
Run the prompt through OpenRouter or direct APIs. Use identical system instructions and a fixed robot simulation environment such as MuJoCo or Isaac Gym.
Log timestamp of first motor command and any safety refusals. Repeat at least 30 times per model to account for sampling variance.
Pros and Cons
- Claude offers stronger chain-of-thought safety checks but adds latency.
- Grok delivers faster responses with fewer refusals yet shows less explicit risk assessment.
- Both models require additional scaffolding for real hardware to handle sensor noise.
Who Should Choose Which Model
Teams building competitive robot competitions or time-critical simulations benefit from Grok's lower latency. Research groups focused on verifiable safety constraints prefer Claude despite the speed cost.
Developers needing sub-second decisions on edge hardware should test Grok first. Those operating in regulated environments should start with Claude.
Verdict
The thread shows a clear speed-safety tradeoff between the two models in physical agent scenarios. Choice depends on whether the application prioritizes reaction time or explicit harm avoidance.
Model selection for embodied agents will increasingly hinge on measured latency and refusal profiles rather than general capability claims.

Top comments (0)