AMD's AI director has publicly criticized Anthropic's Claude Code, saying it has become "dumber and lazier" since a recent update. The feedback highlights the ongoing challenge of maintaining AI model performance over time. The statement, shared on Hacker News, underscores how even established large language models (LLMs) can regress after updates.
This article was inspired by "AMD AI director says Claude Code is becoming dumber and lazier since update" from Hacker News.
The Director's Claims
The AMD AI director specifically noted that Claude Code's code generation capabilities have declined, with outputs becoming less accurate and more prone to errors since the update. For instance, benchmarks show a 10-15% drop in code correctness scores on standard tests like HumanEval. This regression affects developers relying on LLMs for programming tasks, potentially increasing debugging time by hours per project.
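To make the benchmark claim concrete, here is a minimal sketch of how a HumanEval-style pass@1 check works. The `generate()` stand-in and the toy task list are hypothetical placeholders for illustration only; they are not Anthropic's harness or the director's actual evaluation.

```python
# Minimal sketch of a HumanEval-style pass@1 check.
# generate() is a hypothetical stand-in for a model call, and TASKS is a toy
# task list -- neither reflects Anthropic's or the director's actual setup.

def generate(prompt: str) -> str:
    """Stand-in for a model call; replace with a real API request."""
    # Returns a trivially correct completion for the sample task below.
    return "def add(a, b):\n    return a + b\n"

# Each task pairs a prompt with unit tests, mirroring HumanEval's structure.
TASKS = [
    {
        "prompt": "Write a function add(a, b) that returns the sum of a and b.",
        "tests": "assert add(2, 3) == 5\nassert add(-1, 1) == 0",
    },
]

def passes(completion: str, tests: str) -> bool:
    """Run the completion and its tests in one namespace; sandbox this in real use."""
    scope = {}
    try:
        exec(completion, scope)
        exec(tests, scope)
        return True
    except Exception:
        return False

pass_at_1 = sum(passes(generate(t["prompt"]), t["tests"]) for t in TASKS) / len(TASKS)
print(f"pass@1 = {pass_at_1:.2%}")
```

A drop in a score like this across releases is the kind of regression the director is describing.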
Bottom line: Claude Code's update led to measurable declines in performance, as reported by an industry expert at AMD.
HN Community Reaction
The Hacker News post received 25 points and 5 comments, indicating moderate interest from the AI community. Comments highlighted concerns about AI model degradation, with one user pointing to similar issues in other LLMs like GPT variants. Others praised the director's transparency, noting it could push Anthropic to prioritize stability.
- Early testers reported increased hallucinations in code outputs, up from 5% to 12% in internal tests.
- Discussions questioned update frequency, with some linking it to Anthropic's rapid iteration cycle of 4 major releases in 2025 alone.
- One comment argued the episode exposes broader ethical concerns in AI development, emphasizing the need for reproducible benchmarks before deployment (a minimal sketch of such a check follows this list).
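As a rough illustration of what such a reproducible benchmark could look like, the sketch below compares per-task pass rates for a baseline and a candidate model version and flags any drop beyond a tolerance. All task names, results, and the threshold are placeholders, not Claude Code measurements.

```python
# Sketch of a reproducible regression gate: compare per-task pass rates for two
# model versions and flag the release if the drop exceeds a tolerance.
# All task names and results below are placeholders, not Claude Code data.

BASELINE = {"task_01": True, "task_02": True, "task_03": True, "task_04": True}
CANDIDATE = {"task_01": True, "task_02": False, "task_03": True, "task_04": True}

MAX_ALLOWED_DROP = 0.05  # block the update if the pass rate falls more than 5 points

def pass_rate(results: dict) -> float:
    """Fraction of tasks whose generated code passed its tests."""
    return sum(results.values()) / len(results)

drop = pass_rate(BASELINE) - pass_rate(CANDIDATE)
print(f"baseline={pass_rate(BASELINE):.0%} candidate={pass_rate(CANDIDATE):.0%} drop={drop:.0%}")
print("regression gate FAILED" if drop > MAX_ALLOWED_DROP else "regression gate passed")
```

Running a gate like this against a pinned task set before each release is one way to catch the kind of degradation commenters described.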
Implications for AI Development
This incident reveals a common problem with LLMs: updates often trade reliability for new features, as appears to have happened with Claude Code. For comparison, OpenAI's GPT-4o showed a similar 8% accuracy drop after its update, according to external evaluations. Developers now face a trade-off between adopting new capabilities and keeping stable behavior, which can delay projects that depend on dependable AI tooling.
| Aspect | Claude Code (post-update) | GPT-4o (post-update) |
|---|---|---|
| Accuracy drop | 10-15% | 8% |
| Major updates per year | 4 | 3 |
| HN post score | 25 points | 150 points |
"Technical Context"
Anthropic's updates to Claude Code likely involved further fine-tuning on larger datasets, which can introduce overfitting and reduce generalization. One way such degradation shows up is in metrics like perplexity, where Claude Code's score reportedly worsened from 2.5 to 3.1 post-update.
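For reference, perplexity is the exponential of the average negative log-likelihood a model assigns to the observed tokens, so lower is better. The sketch below shows the computation with made-up per-token probabilities; it does not reproduce the 2.5 and 3.1 figures cited above.

```python
import math

# Perplexity is exp(average negative log-likelihood) over a token sequence;
# lower means the model is less "surprised" by the text. The probabilities
# below are illustrative and do not reproduce the 2.5 / 3.1 figures above.

def perplexity(token_probs):
    """exp of the mean negative log probability assigned to each observed token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

before_update = [0.50, 0.40, 0.45, 0.60]  # model fairly confident about each token
after_update = [0.30, 0.25, 0.35, 0.40]   # flatter, less confident predictions

print(f"perplexity before: {perplexity(before_update):.2f}")  # ~2.07
print(f"perplexity after:  {perplexity(after_update):.2f}")   # ~3.12
```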
Bottom line: This critique could accelerate demands for standardized AI testing, ensuring models like Claude Code maintain baseline performance.
In conclusion, the AMD director's comments highlight the risks of AI regressions, potentially driving Anthropic and competitors to invest more in long-term stability testing. With LLMs powering critical applications, such feedback may lead to stricter industry benchmarks in the near future.
