
How AI Can Trash Your Company
"I've seen things you people wouldn't believe." - Roy Batty, Blade Runner
It's no secret I use AI more than most people. I've taught hundreds how to leverage the power of LLMs and other types of AI. I've consulted with companies trying to scale and find efficiencies since I got out of university.
Using a GPT like ChatGPT, Claude, Gemini, or Perplexity gives you power that was unthinkable three years ago.
But power corrupts. And GPTs are no exception. And we're all in a big hurry.
How many times have you used a GPT and then later regretted the output?
Did you ever send an email to a customer that still ended with
"Let me know if you want me to turn this into a video script"?
It's no different from "Hello [NAME]".
I've seen it. I've done it. I'm sure you have too.
The real danger isn't in sending a one-off email or even a broadcast.
It's when you start using it to make important decisions.
And it happens to everyone - sometimes in really bad ways.
Everyone Loves a Car Crash
Deloitte's $290,000 Hallucination Disaster
Deloitte Australia delivered a 237-page report to the government filled with fabricated citations, invented quotes, and references to non-existent research. They used GPT-4 to generate parts of the report without proper disclosure or robust fact-checking. The scandal exposed what happens when a top-tier consultancy treats an LLM like a trusted researcher instead of an unreliable intern.
Law Firms and Fake Case Citations
Several law firms have been sanctioned for submitting briefs with AI-invented court cases and bogus judicial opinions. In Mata v. Avianca, attorneys relied on ChatGPT as a “super search engine,” even asking it to confirm the fake cases were real. Judges fined the lawyers, demanded written apologies, and made clear that AI doesn’t absolve humans from doing real legal research.
Samsung's Confidential Data Leak
Samsung employees pasted proprietary code and even full meeting transcripts into ChatGPT, unintentionally leaking sensitive information outside the company. Because those inputs are stored on external servers, the data became difficult or impossible to fully retrieve or delete. Samsung responded by banning ChatGPT internally and tightening policies, but the incident became a textbook example of how easily AI tools can cause data exfiltration.
IBM Watson for Oncology: Unsafe Cancer Treatment Recommendations
IBM spent billions building Watson for Oncology, promising AI-powered personalized cancer treatment. In practice, it generated unsafe and incorrect recommendations, sometimes suggesting drugs explicitly contraindicated for the patient’s condition. Major hospitals scrapped the system after spending tens of millions, proving that unvalidated AI in medicine can be dangerous.
UnitedHealth's 90% Error Rate AI Denying Healthcare
UnitedHealthcare used an AI system to override doctors and deny coverage for elderly patients in rehabilitation. A lawsuit alleges the tool has an error rate around 90%, meaning most denials are reversed on appeal—but not before patients suffer financial and health consequences. The AI effectively became a cost-cutting engine dressed up as clinical judgment.
Air Canada's Chatbot Makes Up Bereavement Policy
Air Canada’s chatbot invented a bereavement refund policy that didn’t exist and gave it to a grieving passenger as official guidance. When the passenger tried to claim the refund, the airline argued the chatbot was a separate “entity” responsible for its own statements, which the tribunal rejected outright. The ruling held Air Canada fully accountable for its AI’s hallucinations and turned into a viral PR nightmare.
CNET's Plagiarism and Error-Filled AI Articles
CNET secretly published dozens of AI-written financial articles that were later found to contain factual errors, bad math, and plagiarized passages. The brand took a major credibility hit as readers and staff discovered the extent of the mistakes. The experiment showed how quickly trust evaporates when publishers replace journalists with lightly edited AI drafts.
Google's $100 Billion Market Value Wipeout
In a high-profile demo, Google’s Bard confidently gave a wrong answer about the James Webb Space Telescope, triggering a $100 billion drop in Alphabet’s market cap. The error undermined trust in Google at the exact moment it was trying to prove its AI leadership. Later, Gemini’s problematic image outputs sparked another backlash and huge value loss, reinforcing how fragile trust is in this space.
McDonald's AI Drive-Thru Debacle
McDonald’s tested AI-powered drive-thrus that went viral for hilariously wrong orders: hundreds of unwanted nuggets, bizarre add-ons, and constant misinterpretations. TikTok filled with videos of frustrated customers, making the pilot a meme instead of a milestone. The company ultimately shut down the system, showing that “good enough” AI accuracy is nowhere near good enough in public, real-time customer interactions.
Coca-Cola, Willy Wonka, and AI Marketing Disasters
Coca-Cola’s heavily AI-generated holiday campaign was slammed as soulless and lazy, with critics accusing the brand of replacing artists with algorithms. The infamous “Willy Wonka Experience” in Glasgow used AI-generated images to sell a fantasy that turned out to be a sad warehouse with no magic and no candy. Both became viral case studies in what happens when AI promises outpace real-world delivery.
AI Content's Virality Problem: The Sameness Epidemic
Research shows people trust content less when they suspect it’s AI-generated, and they’re less likely to buy from brands associated with that content. Generative models tend to produce “averages of averages,” leading to content that feels bland, repetitive, and emotionally flat. The result: AI content floods the internet, but genuine human voices and unique perspectives are what actually go viral.
Why These Failures Happened: The Root Causes
LLMs Are Probabilistic, Not Truth-Seeking
LLMs don’t “know” facts or check reality. These tools just predict the next likely word based on patterns in their training data.
That means they can produce confident, precise-sounding nonsense without any internal alarm bell. Hallucinations aren’t rare glitches; they’re an inherent part of how this technology works.
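To make that concrete, here is a toy sketch in Python of what "predict the next likely word" actually means. The word table and probabilities are made up for illustration and look nothing like a production model, but the core point holds: the output is chosen because it is statistically plausible, never because anyone checked it.

```python
import random

# Toy illustration only (not a real LLM): the "model" is just a table of
# which words tend to follow a given context, with no notion of truth.
next_word_probs = {
    "Our refund policy covers": [
        ("cancellations", 0.5),
        ("bereavement", 0.3),   # plausible-sounding, possibly invented
        ("nothing", 0.2),
    ],
}

def pick_next(context: str) -> str:
    words, weights = zip(*next_word_probs[context])
    # The word is sampled because it is *likely*, not because it is *true*.
    return random.choices(words, weights=weights)[0]

print(pick_next("Our refund policy covers"))
```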
Training Data Limitations and Temporal Misalignment
Models are trained on incomplete, historical data that quickly becomes outdated, especially in fast-moving fields like law, finance, and medicine.
They may reference rules, prices, or guidelines that are no longer valid. Anytime accuracy depends on real-time or proprietary information, raw LLM outputs are especially risky.
Some models also lean heavily on particular sources, such as Reddit, which brings those sources' biases and blind spots into the output.
Over-Reliance Without Human Oversight
Many AI implementations fail because organizations treat LLMs as plug-and-play solutions rather than tools that need guardrails, review, and domain experts in the loop.
To “save time,” teams skip the slow bits—verification, QA, and escalation paths—exactly where safety and quality actually live.
The end result is faster production of wrong answers at scale.
Financial Pressure and Misaligned Incentives
Companies feel intense pressure to cut costs, move faster, and “do something with AI” to please investors or executives.
That pressure encourages premature deployment, underfunded governance, and using AI where it’s misaligned with user safety or ethics (like denying care). When success is measured only in speed or savings, accuracy and trust inevitably suffer.
Don't Get Sued.
AI failures are not theoretical.
These issues show up as fines, lawsuits, and regulatory scrutiny.
Companies have seen tens to hundreds of billions wiped from their value over a single AI mistake, alongside lasting damage to reputation and trust.
Internally, teams lose time fixing broken AI outputs, while competitors who deploy AI responsibly start to look more reliable and attractive.
OK, so what do we do about it?
Validate. Treat every AI output as a draft that must be checked, especially in legal, financial, medical, or customer-facing contexts. Build in explicit review steps and give named humans final responsibility (a rough sketch of what that review gate can look like follows this list).
Understand the limitations. LLMs are great at drafts, ideas, and language, but terrible as sole sources of truth, real-time calculators, or compliance engines. If accuracy or recency really matters, they should assist experts, not replace them. Think of them as really fast interns or fifth graders: they get things done quickly but don't really understand context.
Keep humans in the loop. The safest and most effective AI systems pair models with domain experts who handle edge cases and make final decisions. Your process design should assume the AI will be wrong sometimes and catch that before it hits customers.
Be transparent about AI use. Don’t hide AI behind fake bylines or pretend a chatbot is a separate “entity.” Clear disclosure builds trust and also forces you to hold your own systems to a higher standard.
Start small and test in the real world. Pilot with limited scope, real users, and clear success criteria before scaling up. Containing early mistakes is far cheaper than cleaning up a full-scale rollout gone wrong.
Apply the “too good to be true” rule. If an LLM returns flawless citations, perfect numbers, or suspiciously convenient answers, assume it might be hallucinating and verify thoroughly. Healthy skepticism is a core safety feature, not a blocker to progress.
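Here is a minimal, purely illustrative sketch of the review gate mentioned above. The Draft class and field names are hypothetical, not from any particular tool; the point is simply that anything AI-generated stays blocked until a named human signs off.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    text: str
    source: str                        # "ai" or "human"
    reviewed_by: Optional[str] = None  # named human who signed off, if any

def ready_to_send(draft: Draft) -> bool:
    # AI output is only a draft: block anything AI-generated that no
    # named human has reviewed before it reaches a customer.
    if draft.source == "ai" and draft.reviewed_by is None:
        return False
    return True

email = Draft(text="Our bereavement policy allows...", source="ai")
print(ready_to_send(email))   # False: blocked until reviewed
email.reviewed_by = "J. Smith"
print(ready_to_send(email))   # True: a named human is accountable
```

The gate itself is trivial; the hard part is the organizational commitment to never bypass it when you're in a hurry.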
