Why AI Teams Get No Credit for Prevented Failures

#ai #ethics #machinelearning #discuss

The 2001 paper "Nobody Ever Gets Credit for Fixing Problems That Never Happened" by Nelson Repenning and John Sterman surfaced again on Hacker News with 525 points and 170 comments. It models how organizations create self-reinforcing cycles that starve preventive work.

The core mechanism is a capability trap. Pressure to ship new features reduces time spent on maintenance and testing. Quality drops, firefighting increases, and even less time remains for prevention. The loop locks in.

How the Dynamics Model Works

The authors built a system dynamics simulation with two stocks: Work-in-Process and Organizational Capability. Capability grows only through deliberate investment in process improvement. When managers allocate all resources to throughput, capability erodes.

Simulations showed that a 10% cut in preventive effort produced a 25-40% rise in rework within six months. Recovery required sustained investment above the original baseline for 12-18 months.

Evidence from the Paper

The model was calibrated on data from a semiconductor equipment manufacturer and a chemical plant. Both sites showed identical patterns: measured productivity rose while actual throughput fell once hidden defects accumulated.

HN commenters noted the same pattern in modern AI deployments where teams deprioritize evaluation harnesses and red-teaming to meet release dates.

How to Apply It in AI Workflows

Run a simple stock-and-flow audit on your current sprint allocation. Track hours spent on:

New feature development
Post-incident fixes
Proactive testing and monitoring

If proactive hours fall below 20% of total engineering time for more than two quarters, the trap is active.

Tradeoffs of Prioritizing Prevention

Gains: fewer production incidents and lower long-term cost
Costs: slower visible feature velocity for 3-6 months
Risk: managers measured on short-term output may still cut the budget

Comparison with Common AI Practices

Approach	Prevention Hours	Incident Rate	Recovery Time
Firefighting culture	<10%	High	4-8 weeks
Repenning-Sterman allocation	25-30%	Low	1-2 weeks
Typical MLOps team	15%	Medium	3 weeks

Who Benefits Most

Teams shipping production LLMs or autonomous agents should adopt the framework. Research groups publishing only novel architectures can ignore it. Organizations with quarterly OKRs tied solely to new capabilities will fight the required investment.

Bottom line: The paper supplies a quantitative model showing why AI reliability work remains chronically underfunded unless measurement systems change.

The same dynamics that slowed factory improvement in 2001 now limit safe deployment of increasingly capable models.

PromptZone - Leading AI Community for Prompt Engineering and AI Enthusiasts

Why AI Teams Get No Credit for Prevented Failures

How the Dynamics Model Works

Evidence from the Paper

How to Apply It in AI Workflows

Tradeoffs of Prioritizing Prevention

Comparison with Common AI Practices

Who Benefits Most

Top comments (0)

Read next

Using ERNIE Image to draft visual ideas faster

Best Local LLMs for Consumer Hardware (2026): Llama 3.3 70B vs Qwen3 30B-A3B vs DeepSeek-R1-Distill

Meet Hoot: a quieter, warmer promptzone

Chrome's Silent Gemini Nano Install