The smartest man at Microsoft was fond of saying, “Bad data is worse than no data.” He retired from Microsoft years ago, but his point’s no less valid.
My teams have been experimenting with various AI tools this year, looking for both the advantages and the dangers, and the AI hallucination is certainly one of the most significant dangers. Significant enough, in fact, that technology leaders should work to change the term itself.
In short, an AI hallucination is false or inaccurate output. According to Bard itself, one expanded definition of “AI hallucination” is “…a confident response by an AI that is not supported by the data the LLM was trained on.”
For the uninitiated, Google Bard is a chatbot built on a large language model (LLM). It was trained on massive datasets of both code and text, allowing you to query it in natural language and interact with it conversationally. It's my favorite generative AI tool for research and answering questions, primarily because it's constantly updated with new data.
So, my favorite hallucination to date? In June I asked ChatGPT, “What is VetPowerment by Vetoquinol?” The answer:
VetPowerment by Vetoquinol is a telemedicine platform that offers pet owners access to highly qualified veterinarians from the comfort of their own home. This innovative service provides personalized and convenient veterinary care for pets, without the hassle of traveling to a clinic. VetPowerment uses advanced technology to accurately diagnose and treat pets, allowing for effective and efficient care. With VetPowerment, pet owners can receive expert advice, prescriptions, and treatment plans for their furry friends, all through a secure and user-friendly platform.
That sounds great. State-of-the-art telemedicine for pets! I like that VetPowerment is described as such an impressive tool, and I'm also reassured about how secure and easy to use the system is. As one of the developers of VetPowerment, I should be quite happy with the positive messaging here.
Except that the description is 100% wrong. VetPowerment has nothing to do with telemedicine. ChatGPT scored a massive zero in factual accuracy, but it sure sounds confident in the answer.
The problem is that AI tools often aren’t smart enough to say, “I don’t know.”
Back to that subject of language: by assigning a cutesy name to a problem, we tend to diminish its perceived severity. Instagram, Facebook, Imgur, and Reddit are full of people enjoying pictures of "chonky" cats and congratulating their owners on how cute their dangerously obese pets are. The widespread acceptance of the cute language gives those owners a psychological back door to slip through, letting them ignore the fact that they're likely leading their pets toward health problems or an early death.
Call AI hallucinations what they really are. "Fabrications" is a good word. "Dangerous BS" or "lies" are pretty good, too. In addition to dropping the cute language, users should make a habit of fact-checking their favorite AI tools and using whatever rating buttons are available to signal satisfaction or dissatisfaction with the results.
And that need for double-checking certainly takes some of the allure out of AI assistance. If you’re completely unfamiliar with a topic, it’s possible the AI “help” will cause you more work than it saves.
My recommendation is to maintain healthy skepticism, but don't be so pessimistic that you avoid AI tools altogether. My experience so far has, on balance, been decidedly positive. And for an excellent example of good results, when you leave this blog, jump over to Google Bard and ask, "What is VetPowerment by Vetoquinol?" Same question, different tool, and the responses I've seen have been quite accurate.
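If you want to make that same-question, different-tool habit routine, a tiny harness can help. The sketch below is just that, a sketch: cross_check and the two stub chatbots are hypothetical names I'm using for illustration, the string-similarity score is a crude stand-in for real fact-checking, and you'd swap the stubs for whatever chatbot APIs or wrappers you actually use. Disagreement between the answers doesn't tell you which one is wrong; it just flags a question worth checking by hand.

```python
# A minimal sketch of the "same question, different tool" cross-check.
# cross_check and the stub chatbots are hypothetical; swap in real calls.
from difflib import SequenceMatcher


def cross_check(question, model_a, model_b, similarity_threshold=0.6):
    """Ask two models the same question and flag answers that diverge.

    A low similarity score doesn't prove either answer is wrong; it just
    marks the question as one worth fact-checking by hand.
    """
    answer_a = model_a(question)
    answer_b = model_b(question)
    similarity = SequenceMatcher(None, answer_a.lower(), answer_b.lower()).ratio()
    return {
        "question": question,
        "answers": (answer_a, answer_b),
        "similarity": round(similarity, 2),
        "needs_human_review": similarity < similarity_threshold,
    }


if __name__ == "__main__":
    # Stub "chatbots" so the sketch runs on its own -- replace with real API calls.
    def chatbot_one(q):
        return "Product X is a telemedicine platform for pets."

    def chatbot_two(q):
        return "I don't have reliable information about Product X."

    result = cross_check("What is Product X?", chatbot_one, chatbot_two)
    print(result["similarity"], "needs human review:", result["needs_human_review"])
```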