AI, Reality, and the Lost Art of Testing
Why founders, teams, and even AI itself keep tripping over the same blind spots — and how to fix them.
Why this conversation happened in the first place
The call wasn’t a pitch. It was a deliberate break from the hyper-polished, transactional vibe of LinkedIn. The goal: meet the humans behind the profiles, swap unfiltered realities, and maybe leave each other with a sharper way to think. No sale at the end, just connection and mutual upskilling.
And yes — this also doubles as the ultimate stealth sales hack if you ever wanted it to be. No one expects the conversation, no one feels pitched, yet you leave with insights, rapport, and a real map of the other person’s world.
The recurring AI problem no one admits to
Most founders integrating AI into SaaS products don’t understand what they’re building. They don’t know how LLMs work, how to prompt effectively, or even how to measure if their features are successful.
Two recurring patterns:
- Shallow prompts — Users type two or three words into a tool like Lovable and expect magic. Rich, specific prompts are rare.
- First-answer syndrome — The first AI-generated response is blindly trusted as “the right one” without iteration or experimentation.
When that happens, the output is generic because the input is generic. With model capability roughly at parity across products, the quality gap comes down to how you use the AI, not what it can theoretically do.
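To make the contrast concrete, here is an illustrative pair of prompts for the same task. The wording is ours, not an example from the conversation:

```text
Shallow: "landing page for my app"

Rich: "Build a landing page for a B2B invoicing tool aimed at freelance
designers. Plain, confident tone. Sections: a hero with a single CTA
('Start free trial'), three benefit cards, one testimonial, and a
two-tier pricing table. Mobile-first, no stock-photo imagery."
```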
Messaging and marketing are being flattened by AI
Ask around in SaaS and you’ll hear the same thing: marketing copy, blogs, and even frontend designs are starting to look cloned. AI-generated frameworks dominate. Everyone ships the same tone, the same layouts, the same CTA styles.
What gets lost:
- Original positioning
- Fresh messaging tests
- Brand-specific nuance
This is a problem for adoption and retention. If you look and sound like everyone else, you’ve erased the only free differentiator you had.
The deeper operational rot: no testing discipline
Founders and teams often don’t run experiments at all. Or if they do, they skip the most valuable step — tying operational tests to actual business metrics like cash flow, retention, or acquisition cost.
Even worse, internal politics distort results. Many companies showcase only their wins in team reviews. Losses are buried. The team ends up living in a “la-la land” where the data is technically real but selectively presented.
Better practice (a minimal logging sketch follows this list):
- Run 3 experiments a week, every week → 36 in 3 months.
- Capture both quantitative and qualitative data.
- Log failures openly.
- Review with the same rigor you apply to wins.
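Here is what such a log can look like in code. The structure and field names are our illustration, not something prescribed in the conversation; the point is that every experiment names a business metric up front and losses get recorded with the same fields as wins:

```python
# Illustrative experiment log: every test must name a business metric,
# and losses stay in the log next to the wins.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Experiment:
    hypothesis: str
    business_metric: str            # e.g. "retention", "CAC", "cash flow"
    baseline: float
    result: float | None = None     # filled in at the weekly review
    qualitative_notes: str = ""     # what users said, not just what they did
    started: date = field(default_factory=date.today)

    def outcome(self, higher_is_better: bool = True) -> str:
        # Direction matters: for churn or CAC, lower is the win.
        if self.result is None:
            return "running"
        improved = (self.result > self.baseline) == higher_is_better
        return "win" if improved else "loss"

# Three per week, every week: the log fills up whether or not the news is good.
log: list[Experiment] = []
```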
Among the companies building on Lovable, the ones that got this right iterated faster than the market. But in most places — especially B2B SaaS — the default is “throw it at the wall and move on.”
Measuring vs. interpreting data
Even when teams have the numbers, they often read them in the most convenient way possible. Take churn: in AI products it often runs 50–70%. In traditional SaaS, that’s catastrophic. In AI, it’s seen as “normal.”
Both are true — but only if you understand the product type, customer behavior, and what the churn actually means in your context. Without that depth, companies end up making wrong calls off technically “correct” data.
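To see why context matters, compound the numbers. The sketch below treats the rates as monthly, which is our assumption for illustration (the conversation didn’t specify a period):

```python
# Back-of-envelope: what a monthly churn rate does to a cohort over a year.
# 5% is a common traditional-SaaS reference point; 50% is the low end of
# the AI range quoted above.
for monthly_churn in (0.05, 0.50):
    retained = (1 - monthly_churn) ** 12
    print(f"{monthly_churn:.0%}/month churn -> {retained:.2%} of the cohort left after a year")
# 5%/month -> ~54% remain; 50%/month -> ~0.02% remain
```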
AI as a critic, not just a generator
One of the best uses of AI discussed: have it critique other AI outputs (a minimal sketch follows this list).
- Send your draft to ChatGPT, Grok, and Claude.
- Compare their disagreements.
- Force them to challenge each other’s logic.
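A minimal fan-out sketch using the OpenAI and Anthropic Python SDKs; Grok could be slotted in the same way through xAI’s API. The model names and the critique prompt are illustrative assumptions:

```python
# Illustrative sketch: send the same draft to two models and collect
# critiques, then compare where they disagree. Requires OPENAI_API_KEY
# and ANTHROPIC_API_KEY in the environment; model names are examples.
from openai import OpenAI
import anthropic

CRITIQUE = (
    "Critique this draft harshly. List factual errors, weak logic, and "
    "generic phrasing. Do not soften anything.\n\n"
)

def collect_critiques(draft: str) -> dict[str, str]:
    gpt = OpenAI().chat.completions.create(
        model="gpt-4o",  # example model name
        messages=[{"role": "user", "content": CRITIQUE + draft}],
    ).choices[0].message.content

    claude = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-20250514",  # example model name
        max_tokens=1024,
        messages=[{"role": "user", "content": CRITIQUE + draft}],
    ).content[0].text

    return {"gpt": gpt, "claude": claude}

# Next step: feed each model the other's critique and ask it to defend or
# concede every point. The disagreements are where the signal is.
```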
Same for content ideas: speak your thoughts for 20 minutes, then have AI break down and attack everything you said. Brutal honesty uncovers blind spots far better than a “supportive” assistant.
Pro tip: add custom instructions to your AI (sample wording after the list) to:
- Admit when it doesn’t know something.
- Adopt a role (designer, engineer, etc.) and aim to produce “tears of joy” quality.
- Be brutally honest, even mean.
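One possible phrasing of such an instruction; the wording is ours, not a quote from the conversation:

```text
You are a senior product designer. If you don't know something or are
guessing, say so explicitly. Aim for "tears of joy" quality: if my idea
is weak, say so bluntly and explain why. Never flatter me.
```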
LLM quirks that will cost you if you don’t know them
- Memory limits — Most LLMs lose context after 3–5 back-and-forths. Hit “try again” too many times and the model hallucinates its way into nowhere (see the trimming sketch after this list).
- Consistency gaps — Even with a detailed “story bible,” long-form creative work drifts in tone and plot over time.
- Error loops — When an LLM can’t solve something, it retries into nonsense instead of admitting defeat.
- Creative sameness — AI-generated images and assets often have a visible “AI soul” — recognizable patterns, compositions, and visual tropes. True novelty is still rare.
- Web search bias — When connected to the internet, LLMs often pull from popular but outdated sources (e.g., listing MailChimp as a top marketing tool). Search-first tools like Perplexity handle freshness better.
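For the memory problem specifically, one common workaround is to trim the running conversation yourself instead of letting it silently overflow. A rough sketch, using character counts as a crude stand-in for tokens:

```python
# Keep the system message plus the most recent turns that fit under a
# budget, so long chats degrade predictably instead of drifting into
# hallucination.
def trim_history(messages: list[dict], max_chars: int = 12_000) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], 0
    for msg in reversed(rest):              # walk newest-first
        used += len(msg["content"])
        if used > max_chars:
            break
        kept.append(msg)
    return system + list(reversed(kept))    # restore chronological order
```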
The human learning gap
Users rarely know how the tools they use actually work. Lovable’s analytics showed 80% of signups didn’t know what to build and simply clicked a suggested prompt.
Education matters — not in the abstract, but right inside the product.
- Teach prompt-writing in context.
- Show the “why” behind features.
- Give real-world examples tied to user goals.
What AI actually improved in daily life
The biggest shift isn’t a single feature — it’s interaction style.
For the first time, we can speak to machines in natural language instead of clicking through menus. Full-sentence queries replace staccato keyword searches.
That’s not just UX candy. It’s a mental shift from “how do I use this tool?” to “how do I describe exactly what I want?” — a skill that will decide who gets real leverage from AI in the years ahead.
Core takeaways from the conversation
- Don’t trust the first AI output. Iterate and force critiques.
- Tie every experiment to a business metric, not vanity data.
- Show your failures internally — they’re where the real learning happens.
- Teach users to prompt well, or your product will look worse than it is.
- Customize your AI’s personality and honesty level to match your goals.
- Remember: AI’s memory, creativity, and reasoning still have limits. Work with them, not against them.