HubSpot launched its free AEO tool last week. It sits on top of millions of AI answers and gives marketers a dashboard view of brand mentions across engines. It's a great tool. We're not building that.
We've manually read every AI engine response across 4 deep SaaS audits — Fireflies.ai, Linear, Otter.ai, and Notta.ai. 6 engines, 8-10 prompts per audit, 192-240 prompt-engine pairs read by a human, not aggregated by a dashboard.
Three patterns showed up across all four audits that no dashboard surfaces by design. Here they are, with the data.
Pattern 1: The evidence gap is universal
All 4 audits found the same gap: no named customer testimonials with specific outcomes. Every product had logos. None had a /customers page with named companies and measurable results.
Why this matters: AI engines look for verifiable proof. Logo walls don't read as proof — they read as decoration. When the engine has to choose between Fireflies and Otter for a recommendation, the one with 'MIT, IBM, Zoom use Otter' wins because it's a quotable fact, not a graphic.
What 4/4 audits found:
- Fireflies.ai: '1 million+ companies' aggregate stat, no named case studies on the homepage
- Linear: logo wall present, zero named outcomes anywhere on public pages
- Otter.ai: actually has the strongest evidence — 'MIT, IBM, Zoom' is cited by 4/6 engines verbatim
- Notta.ai: no /customers page at all; trust narrative dominated by 1.4-2.2/5 Trustpilot scores
A dashboard tracking 'mention rate' will show all four products being mentioned. It won't tell you that 3 out of 4 are missing the structural piece that makes the mentions actually convert.
Pattern 2: Engines disagree more than any single number reveals
When you read every engine response, you see something dashboards smooth over: the engines often categorize a product completely differently from each other.
Notta.ai is the clearest case. Across 6 engines:
- 0/6 engines categorized Notta as an 'AI meeting assistant'
- 6/6 engines categorized Notta as a 'multilingual transcription tool'
- When users ask 'best AI meeting tool', Notta appears in 1/3 queries (always with niche caveats)
- When users ask Notta by name, 6/6 engines respond with detailed product info
A dashboard mention rate of 76% (Notta's overall across 48 prompt-engine pairs) tells you nothing about this. The product is visible — but in the wrong category. The 'best AI meeting tool' query is roughly 10x more common than 'best multilingual transcription tool'. Notta is invisible in the high-traffic query and dominant in the niche one.
This kind of category mismatch was the single highest-impact fix in three of our four audits. It's structurally invisible to any dashboard that treats 'mentioned' as a binary signal.
Pattern 3: One engine flags what every other engine misses
AI engines have different training data and different editorial tendencies. When you read all 6, sometimes one of them surfaces a concern that the other 5 ignore — and that one concern can rewrite the whole narrative.
Notta example: DeepSeek was the only engine to mention SOC 2 Type II + ISO 27001 certifications (positive signal). DeepSeek was also the only engine to flag a data-privacy concern: 'Notta may use private conversations to train AI models unless users opt out.'
5 engines saw 'multilingual transcription tool with 98.86% accuracy.' One engine saw 'enterprise-ready security AND a data privacy red flag in the same product.' If you're an enterprise buyer reading the DeepSeek summary, that single engine just killed the deal — even though 5 other engines ignored both the certs and the concern.
A dashboard averaging across engines erases this. A human reading every response sees the contradiction and writes it into the fix list as a /security page priority.
What HubSpot AEO will do well
Asia Frost gave a great talk at HubSpot Grow last week. HubSpot's data is solid: blog posts and listicles drive 62.1% of citations, question-based H2s correlate strongest with citations, freshness matters more than backlinks, and LinkedIn + Reddit + YouTube account for 55% of citations on product comparison queries. We use this data ourselves.
HubSpot AEO is going to be the right tool for marketing teams that need:
- Ongoing visibility tracking across many engines (week-over-week trends)
- Brand-level dashboard reporting to senior leadership
- Aggregate mention rates and citation volumes
- Integration with the rest of HubSpot (CRM, Marketing Hub, etc.)
If you're a marketing team at a $5-50M ARR company already using HubSpot, install their AEO tool. It's free and it's good.
When the manual audit is the right job
We're not building a dashboard. We're a human-run service that does a one-shot deep diagnostic for the cases where the dashboard alone isn't enough.
The audit fits when you need:
- Verbatim engine quotes (so you can show your team exactly what AI says about you, not a percentage)
- Cross-engine pattern reading (the 3 patterns above only show up when one person reads all 6 engines)
- Page-level prioritized fix list (not 'your visibility dropped 8%' — 'rewrite homepage H1 from X to Y, add JSON-LD with applicationCategory: Z')
- Engine-specific narrative drift detection (DeepSeek-style outliers that alter the buyer story)
- A PDF you can send a co-founder, an investor, or a content team — not a dashboard login they'll never check
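To make the 'add JSON-LD with applicationCategory' item concrete, here is a minimal sketch of what such a block might look like, using Notta as the example from the audit. The property names are standard schema.org vocabulary; the specific name, category, and description values are illustrative assumptions, not a prescription from the audits:

```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Notta",
  "applicationCategory": "BusinessApplication",
  "applicationSubCategory": "AI meeting assistant",
  "operatingSystem": "Web, iOS, Android",
  "description": "AI meeting assistant with multilingual transcription."
}
```

Embedded in a `<script type="application/ld+json">` tag in the page head. The applicationSubCategory line is the one doing the positioning work: it's the machine-readable counterpart of the homepage H1 rewrite, telling engines which category the product wants to compete in.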
That's a $79 one-shot. It's not a $99/mo subscription. It's not a $5,000/mo enterprise tier. It's a person reading every response and writing a fix list, then handing it to you.
And if you want the fixes shipped instead of just listed, our $499 Fix Pack ships the top 3 in 5 working days — homepage rewrite, JSON-LD, FAQ schema, persona pages — with a before/after re-audit included.
Honest disclosure
We've published 4 deep teardowns. Not 40, not 400. Each one took 2-4 hours of human reading across 6 engines and 8-10 prompts. The patterns above are real but the sample is small. We're publishing this in case the patterns generalize for other indie SaaS founders staring at HubSpot AEO and trying to decide whether the dashboard alone is enough.
If you want to read the audits in full:
- Fireflies.ai teardown: comparison gap, evidence gap, docs coverage gap
- Linear teardown: comparison gap, evidence gap, structured-data gap
- Otter.ai teardown: strongest evidence narrative of the 4 (named customers cited by 4/6 engines)
- Notta.ai teardown: full 5-gap diagnosis including the entity positioning + DeepSeek privacy patterns above
All are public at /case-studies. No commercial relationship with any audited company.
Want this for your SaaS?
Run the same 6-engine audit on your product. PDF in your inbox in 5 minutes. $79 launch / $129 regular. 30-day refund, no questions asked.
If you'd rather have the fixes shipped than listed, the AI Fix Pack ($499) covers the top 3 highest-impact recommendations in 5 working days, with a before/after re-audit so you see the score move.
Both are one-time purchases. No subscription. No SDR follow-ups. No CRM-required dashboard. Just a human reading every AI engine answer about your product and shipping a fix plan you can act on this week.