HubSpot launched their free AEO (answer engine optimization) tool last week. It sits on millions of AI answers and gives marketers a dashboard view of brand mentions across engines. It's a great tool. We're not building that.
We've manually read every AI engine response across 4 in-depth SaaS audits: Fireflies.ai, Linear, Otter.ai, and Notta.ai. Each audit ran 8-10 prompts against 6 engines, which adds up to 192-240 prompt-engine pairs read by a human, not aggregated by a dashboard.
Three patterns showed up across all four audits that no dashboard surfaces by design. Here they are, with the data.
Pattern 1: The evidence gap is universal
All 4 audits found the same gap: no named customer testimonials with specific outcomes. Every product had logos. None had a /customers page with named companies and measurable results.
Why this matters: AI engines look for verifiable proof. Logo walls don't read as proof — they read as decoration. When the engine has to choose between Fireflies and Otter for a recommendation, the one with 'MIT, IBM, Zoom use Otter' wins because it's a quotable fact, not a graphic.
What each of the 4 audits found:
- Fireflies.ai: '1 million+ companies' aggregate stat, no named case studies on the homepage
- Linear: logo wall present, zero named outcomes anywhere on public pages
- Otter.ai: actually has the strongest evidence — 'MIT, IBM, Zoom' is cited by 4/6 engines verbatim
- Notta.ai: no /customers page at all; trust narrative dominated by 1.4-2.2/5 Trustpilot scores
A dashboard tracking 'mention rate' will show all four products being mentioned. It won't tell you that 3 out of 4 are missing the structural piece that makes the mentions actually convert.
Pattern 2: Engines disagree more than any single number reveals
When you read every engine response, you see something dashboards smooth over: the engines often categorize a product completely differently from each other.
Notta.ai is the clearest case. Across 6 engines:
- 0/6 engines categorized Notta as an 'AI meeting assistant'
- 6/6 engines categorized Notta as a 'multilingual transcription tool'
- When users ask 'best AI meeting tool', Notta appears in 1/3 queries (always with niche caveats)
- When users ask about Notta by name, 6/6 engines respond with detailed product info
A dashboard mention rate of 76% (Notta's overall across 48 prompt-engine pairs) tells you nothing about this. The product is visible — but in the wrong category. The 'best AI meeting tool' query is roughly 10x more common than 'best multilingual transcription tool'. Notta is invisible in the high-traffic query and dominant in the niche one.
This kind of category mismatch is the single highest-impact fix in three of our four audits. It's structurally invisible to any dashboard that aggregates 'mentioned' as a binary signal.
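The gap between an aggregate mention rate and a per-category view can be sketched in a few lines of Python. The records below are illustrative placeholders, not our audit data, and the query-type labels ('category', 'branded', 'niche') are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical prompt-engine pairs as (query_type, mentioned).
# Placeholder values for illustration only, not audit data.
pairs = [
    ("category", False), ("category", False), ("category", True),
    ("branded", True), ("branded", True), ("branded", True),
    ("niche", True), ("niche", True),
]


def mention_rate(rows):
    """Share of rows in which the product was mentioned."""
    rows = list(rows)
    return sum(1 for _, mentioned in rows if mentioned) / len(rows)


def rate_by_query_type(rows):
    """Group the same rows by query type, then compute each group's rate."""
    groups = defaultdict(list)
    for query_type, mentioned in rows:
        groups[query_type].append((query_type, mentioned))
    return {query_type: mention_rate(group) for query_type, group in groups.items()}


overall = mention_rate(pairs)
by_type = rate_by_query_type(pairs)

print(f"overall: {overall:.0%}")
for query_type, rate in by_type.items():
    print(f"{query_type}: {rate:.0%}")
```

With these placeholder rows the overall rate looks healthy while the 'category' group lags well behind. That is the shape of the Notta finding: a single binary 'mentioned' number hides which kind of query produced the mention.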
Pattern 3: One engine flags what every other engine misses
AI engines have different training data and different editorial tendencies. When you read all 6, sometimes one of them surfaces a concern that the other 5 ignore — and that one concern can rewrite the whole narrative.
Notta example: DeepSeek was the only engine to mention SOC 2 Type II + ISO 27001 certifications (positive signal). DeepSeek was also the only engine to flag a data-privacy concern: 'Notta may use private conversations to train AI models unless users opt out.'
5 engines saw 'multilingual transcription tool with 98.86% accuracy.' One engine saw 'enterprise-ready security AND a data privacy red flag in the same product.' If you're an enterprise buyer reading the DeepSeek summary, that single engine just killed the deal — even though 5 other engines ignored both the certs and the concern.
A dashboard averaging across engines erases this. A human reading every response sees the contradiction and writes it into the fix list as a /security page priority.
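One way to keep lone-engine findings from being averaged away is to look for claims that exactly one engine raised. The sketch below assumes each response has already been read and hand-tagged with short claim labels. Apart from DeepSeek, the engine names are hypothetical placeholders, and so are the tag labels.

```python
from collections import defaultdict

# Hand-tagged claims per engine. "deepseek" mirrors the Notta example above;
# the other engine names and all tag labels are hypothetical placeholders.
claims_by_engine = {
    "deepseek": {"multilingual_transcription", "soc2_iso27001", "training_data_opt_out"},
    "engine_b": {"multilingual_transcription"},
    "engine_c": {"multilingual_transcription"},
    "engine_d": {"multilingual_transcription"},
    "engine_e": {"multilingual_transcription"},
    "engine_f": {"multilingual_transcription"},
}


def single_engine_claims(claims):
    """Return {claim: engine} for every claim raised by exactly one engine."""
    engines_per_claim = defaultdict(set)
    for engine, tags in claims.items():
        for tag in tags:
            engines_per_claim[tag].add(engine)
    return {
        tag: next(iter(engines))
        for tag, engines in engines_per_claim.items()
        if len(engines) == 1
    }


outliers = single_engine_claims(claims_by_engine)
for tag in sorted(outliers):
    print(f"{tag}: only {outliers[tag]}")
```

The tagging is still manual. The code only makes sure that a claim seen once gets listed for review instead of disappearing into an average.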
What HubSpot AEO will do well
Asia Frost gave a great talk at HubSpot Grow last week. The data from that talk is solid: blog posts and listicles drive 62.1% of citations; question-based H2s correlate most strongly with citations; freshness matters more than backlinks; and LinkedIn, Reddit, and YouTube together account for 55% of citations on product comparison queries. We use this data ourselves.
