LLMs Must Hallucinate to Function

Nov 16, 2025

Imagine that you made a technology that could not really be tested for correctness, that you couldn't really explain how it worked, that you definitely couldn't reliably predict what it would do, and that using it could cause very serious problems, like people dying by suicide.

How seriously do you think people would take you? Do you think they'd line up to pour hundreds of billions of dollars into your product?

Here's the bare, simple truth about LLMs, or what most people know as "Generative AI," or perhaps just by a product name like ChatGPT, Grok, or Claude: LLMs must hallucinate to function.

Two years ago, when ChatGPT was making a big splash and heads of companies were tripping over themselves to deploy "AI initiatives" and loudly proclaim they were reducing headcount or not hiring anymore because "AI" could do the job, this is not what people were leading with.

But more recently, news organizations have started to catch up. Here are a couple of recent, notable articles:

"Hallucination is fundamental to how transformer-based language models work. In fact, it’s their greatest asset: this is the method by which language models find links between sometimes disparate concepts. But hallucination can become a curse when language models are applied in domains where the truth matters." (From "Your AI strategy needs mathematical logic".)

... "domains where the truth matters." That's a big hmm for me. 🤔
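The point in that quote can be made concrete with a toy sketch. An LLM only ever samples its next token from a probability distribution, so a fluent-but-false continuation is just another high-probability path; there is no separate "truth mode" to fall back on. The vocabulary and probabilities below are invented for illustration, not taken from any real model:

```python
import random

# Toy next-token distribution after the prompt "The capital of Australia is".
# These probabilities are made up for illustration; a real model produces
# them with a softmax over logits for a vocabulary of ~100k tokens.
next_token_probs = {
    "Canberra": 0.55,   # correct
    "Sydney": 0.30,     # fluent but false -- a "hallucination"
    "Melbourne": 0.10,  # also fluent but false
    "banana": 0.05,     # incoherent, rarely sampled
}

def sample(probs, rng):
    """Draw one token; the same mechanism yields truths and hallucinations."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [sample(next_token_probs, rng) for _ in range(1000)]
# Roughly 40% of completions name the wrong city, by design of the mechanism.
print(draws.count("Sydney") / len(draws))
```

Nothing in the sampling step distinguishes "Canberra" from "Sydney"; both are ordinary draws from the distribution. That is the sense in which hallucination is not a bug bolted onto generation, but the generation mechanism itself.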

According to a recent article in the Wall Street Journal, Yann LeCun, formerly head of Meta's FAIR (Fundamental AI Research) team, "has been telling anyone who asks that he thinks large language models, or LLMs, are a dead end in the pursuit of computers that can truly outthink humans." (From "He’s Been Right About AI for 40 Years. Now He Thinks Everyone Is Wrong.")

But, you know, technology isn't perfect and sometimes things are rough around the edges, so maybe we shouldn't be too hasty in judging this new technology.

That might be a fair point. I'm not sure. Maybe if we knew how big of a problem this might be, we could assess our stance more carefully.

You may have heard of IBM. They're a pretty big company and have done a thing or two with computational technologies over the years. For example, their computer Deep Blue was the first to defeat a reigning world chess champion in a match under tournament conditions.

On the IBM website, you can find something called the AI Risk Atlas. To make it easy, I've gathered all of its entries into a single enumerated list to give a sense of the span of risks they've catalogued in deploying LLMs. You might enjoy browsing:

  1. Toxic output
  2. Unexplainable and untraceable actions
  3. Data poisoning
  4. Unreliable source attribution
  5. Mitigation and maintenance
  6. AI agent compliance
  7. Function calling hallucination
  8. Harmful output
  9. Indirect instructions attack
  10. Confidential information in data
  11. Lack of model transparency
  12. Exploit trust mismatch
  13. Unrepresentative data
  14. AI agents' impact on human agency
  15. Personal information in prompt
  16. Sharing IP/PI/confidential information with user
  17. Temporal gap
  18. Nonconsensual use
  19. Decision bias
  20. Lack of testing diversity
  21. Exposing personal information
  22. Confidential data in prompt
  23. Data privacy rights alignment
  24. Discriminatory actions
  25. IP information in prompt
  26. Prompt leaking
  27. Hallucination
  28. Legal accountability
  29. Social hacking attack
  30. Non-disclosure
  31. Lack of training data transparency
  32. Model usage rights restrictions
  33. Reproducibility
  34. Specialized tokens attack
  35. Prompt injection attack
  36. Incomplete advice
  37. Lack of system transparency
  38. Data usage restrictions
  39. Impact on cultural diversity
  40. Impact on education: plagiarism
  41. Personal information in data
  42. Direct instructions attack
  43. Improper usage
  44. Exclusion
  45. Extraction attack
  46. Impact on jobs
  47. Jailbreaking
  48. Data acquisition restrictions
  49. Sharing IP/PI/confidential information with tools
  50. Prompt priming
  51. Reidentification
  52. Overfitting
  53. Attribute inference attack
  54. Poor model accuracy
  55. Data transfer restrictions
  56. Generated content ownership and IP
  57. Encoded interactions attack
  58. Impact on human dignity
  59. Output bias
  60. Dangerous use
  61. Lack of AI agent transparency
  62. Unexplainable output
  63. Human exploitation
  64. AI agents' impact on jobs
  65. Improper data curation
  66. Over- or under-reliance on AI agents
  67. Attack on AI agents’ external resources
  68. Revealing confidential information
  69. Lack of domain expertise
  70. Spreading disinformation
  71. Data bias
  72. Uncertain data provenance
  73. Unrepresentative risk testing
  74. Redundant actions
  75. Data usage rights restrictions
  76. Unauthorized use
  77. Misaligned actions
  78. AI agents' impact on environment
  79. Harmful code generation
  80. Data contamination
  81. Incomplete usage definition
  82. Copyright infringement
  83. Context overload attack
  84. Lack of data transparency
  85. Impact on affected communities
  86. Improper retraining
  87. Spreading toxicity
  88. Inaccessible training data
  89. Impact on education: bypassing learning
  90. Introduce data bias
  91. Accountability of AI agent actions
  92. Incomplete AI agent evaluation
  93. Untraceable attribution
  94. Evasion attack
  95. Impact on the environment
  96. Over- or under-reliance
  97. Incorrect risk testing
  98. Membership inference attack

I don't know about you, but if someone showed me a list like this about a piece of technology, I'd be concerned about how much oversight it's getting, and inclined to investigate it a bit further.

At the same time, you might also wonder if we're just stuck with LLMs or whether there might be something else we could try. Turns out, there are quite a few things.