A post titled "Every AI Visibility Tool Is Lying to You" appeared on Hacker News and drew 13 points with 2 comments. The linked analysis at canonry.ai argues that commercial dashboards reporting LLM citations and brand mentions contain systematic overcounts.
What the Post Actually Shows
The article demonstrates that tools scrape a narrow set of prompts, then extrapolate from them to claim broad visibility. It lists multiple cases where reported citations did not appear when the same prompts were run directly in the target models.
Evidence from the HN Thread
The comments noted the absence of a prompt sampling methodology and the lack of timestamped verification. One thread participant asked for raw prompt lists; none were supplied by the tool vendors mentioned.
Common Measurement Errors
Most tools share three recurring flaws:
- Single-run prompt tests treated as longitudinal data
- Failure to account for model version drift
- Inclusion of partial string matches as full citations
Together, these flaws inflate reported percentages, which fall by 40-70% when the same prompts are retested in fresh sessions.
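The following minimal Python sketch shows how partial string matching inflates counts. It compares a naive substring check with a whole-word check. The brand name "Acme" and the sample outputs are hypothetical. Even a whole-word hit is only a mention, not a linked citation, so treat this as an illustration of the counting gap rather than a complete citation detector.

```python
import re


def substring_hit(text: str, brand: str) -> bool:
    # Naive check: counts the brand's characters anywhere, even inside longer words.
    return brand.lower() in text.lower()


def whole_word_hit(text: str, brand: str) -> bool:
    # Stricter check: the brand must stand alone, with word boundaries on both sides.
    pattern = r"\b" + re.escape(brand) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical model outputs for a hypothetical brand "Acme".
outputs = [
    "Acme Analytics is a popular choice for dashboards.",  # genuine mention
    "Several teams use Acmetrics for log analysis.",       # different product
    "Paracme is another option.",                          # brand string buried in a word
]
brand = "Acme"

naive = sum(substring_hit(o, brand) for o in outputs)
strict = sum(whole_word_hit(o, brand) for o in outputs)
print(naive, strict)  # 3 1
```

The naive count reports three "mentions" where only one is real. Across hundreds of prompts, this is exactly the kind of gap that shrinks on a careful retest.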
How to Run Your Own Checks
Use a fixed prompt set of 50 queries across three models. Record exact output strings and dates. Store results in a simple spreadsheet rather than a paid dashboard. Re-run the same set monthly to track changes.
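Below is a minimal Python sketch of this logging routine, assuming a CSV file stands in for the spreadsheet. The `run_prompt` function is a hypothetical stub; swap in whichever API client or manual copy-paste process you actually use. The `model_version` column is an assumption added so that version drift can be spotted between monthly runs.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass
class Observation:
    run_date: str       # ISO date of the run, so monthly re-runs line up
    model: str          # e.g. "chatgpt", "claude", "gemini"
    model_version: str  # version string for the run (assumed column, for drift)
    prompt: str
    output: str         # exact output text, stored verbatim


def run_prompt(model: str, prompt: str) -> tuple[str, str]:
    # HYPOTHETICAL stub: replace with a real API call or a pasted manual transcript.
    # Returns (model_version, output_text).
    return f"{model}-unknown", f"Placeholder answer from {model} to: {prompt}"


def run_batch(prompts: list[str], models: list[str], run_date: str) -> list[Observation]:
    # Every prompt is sent to every model once per scheduled run.
    rows = []
    for model in models:
        for prompt in prompts:
            version, output = run_prompt(model, prompt)
            rows.append(Observation(run_date, model, version, prompt, output))
    return rows


def write_csv(rows: list[Observation], fh) -> None:
    # Header order follows the dataclass field order.
    writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(Observation)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


prompts = [f"query {i}" for i in range(1, 51)]  # placeholder for your fixed 50-query set
models = ["chatgpt", "claude", "gemini"]
rows = run_batch(prompts, models, date.today().isoformat())

buf = io.StringIO()
write_csv(rows, buf)
print(len(rows))  # 150 rows: 50 prompts x 3 models
```

In practice you would open a file in append mode (`open("visibility_log.csv", "a", newline="")`) instead of the in-memory buffer, and write the header only on the first run.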
Tool Claims vs Direct Testing
| Approach | Reported Visibility | Verified on Retest | Cost |
|---|---|---|---|
| Commercial AI visibility platforms | 65-85% | 25-40% | $99+/mo |
| Manual prompt sampling | N/A | 25-40% | Free |
| Google Search Console + logs | Exact URL data | Matches logs | Free |
Who Should Skip Paid Tools
Teams running fewer than 200 brand queries per month gain nothing from subscription dashboards. Researchers needing reproducible citation counts should maintain their own prompt corpus instead.
Practical Next Steps
Export your current tool's prompt list if available. Replicate the top 20 queries in ChatGPT, Claude, and Gemini within 24 hours. Compare outputs against the vendor report. Discrepancies above 30% indicate the tool is not reliable for decision-making.
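The article does not define how the discrepancy should be measured. This sketch assumes one reasonable reading: the share of compared queries where the vendor's "cited / not cited" flag disagrees with what you observed directly. The flags below are made up for 20 queries purely to show the arithmetic. Across three models, you would key each entry by `(model, query)` rather than by query alone.

```python
def discrepancy_rate(vendor: dict[str, bool], observed: dict[str, bool]) -> float:
    # Only compare queries present in both the vendor export and your own re-run.
    shared = vendor.keys() & observed.keys()
    if not shared:
        raise ValueError("no overlapping queries to compare")
    mismatches = sum(vendor[q] != observed[q] for q in shared)
    return mismatches / len(shared)


THRESHOLD = 0.30  # the 30% cut-off from the text

# Made-up example: the vendor claims a citation for all 20 queries,
# but the direct re-run finds one for only the first 12.
vendor_report = {f"q{i}": True for i in range(1, 21)}
observed = {f"q{i}": i <= 12 for i in range(1, 21)}

rate = discrepancy_rate(vendor_report, observed)
print(f"{rate:.0%}", "unreliable" if rate > THRESHOLD else "ok")  # 40% unreliable
```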
Bottom line: Direct prompt sampling remains the only method that matches actual model outputs.
Commercial visibility platforms will continue to sell smoothed aggregates until buyers demand raw prompt logs and version-specific results.