LLMs Are Not Hallucinating. But You Are.

In 1976, MIT AI Lab computer scientist Drew McDermott published his infamous paper, Artificial Intelligence meets Natural Stupidity1. It aired his concerns about the field of AI at the time, critiquing the field’s cavalier use of language to ascribe human properties to computer programs, creating deeply flawed understandings and expectations of AI. Among his criticisms was the reckless use of mnemonics in programs such as local network search, which attributed human cognitive faculties, goals, intentions, and desires to these programs, fostering delusions even inside the field about the elementary nature of computer programs.

Dissatisfied with the loose talk and undisciplined progression of the discipline, he noted, “As a field, artificial intelligence has always been on the border of respectability, and therefore on the border of crackpottery. Many critics have urged that we are over the border.”

Summarizing his motivations for the sardonic paper, he appealed to the reader, “If we can't criticize ourselves, someone else will save us the trouble.”

Today, history repeats. 

Large Language Models

The emergence of transformer models, and more specifically large language models (LLMs), has reignited visions of achieving human-level intelligence, or AGI. It is no accident that a species that possesses an innate biological endowment for language – a system of thought that can also be used for communication – mistakes chatbots and language generators for incipient intelligence. We cannot help it.

Human natural language is the only example of its kind on the planet. No other species possesses this generative cognitive capacity. The system allows us to construct unbounded complex associations, ideas, relations, and concepts as a means of thought. It is a lonely experience for the species. So anytime an object exhibits language output, never mind the means, the object is surmised to be anthropomorphically intelligent – and if not yet, it could get there if only it were spoken to a little more, trained in our ways a little more, and introduced to the human world a little more.

The ELIZA effect, named after the 1966 chatbot, is “the susceptibility of people to read far more understanding than is warranted into strings of symbols—especially words—strung together by computers.” More generally, it describes any situation wherein “based solely on a system’s output, users perceive computer systems as having intrinsic qualities and abilities which the software controlling the output cannot possibly achieve, or assume that outputs reflect a greater causality than they actually do.”2

LLMs, trained on large text datasets (the final outputs of human language), identify patterns in the text and regurgitate textual output through token generation, based on the distribution and sequential association of tokens in the dataset. This should not be confused with possessing the language faculty itself. The model is not replicating the faculty; it is replicating, with variation, some feature distribution in the training data – a form of lossy, compressed representation of the corpus in the network.
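To make that mechanism concrete, here is a deliberately crude sketch – a bigram lookup table rather than a transformer network, with an invented two-sentence corpus, so every name and number in it is illustrative rather than a description of any production system – of what generating text purely from the distribution and sequential association of tokens in a corpus amounts to:

import random
from collections import defaultdict, Counter

# Toy illustration only: record which token follows which in a tiny corpus,
# then emit text by sampling from those counts. The "model" captures a lossy
# statistical echo of its training text; no language faculty is involved.

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count, for each token, the tokens observed to follow it.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def generate(start, length=8):
    """Emit tokens by repeatedly sampling from the observed follow-counts."""
    token, output = start, [start]
    for _ in range(length):
        candidates = following.get(token)
        if not candidates:
            break
        tokens, counts = zip(*candidates.items())
        token = random.choices(tokens, weights=counts)[0]
        output.append(token)
    return " ".join(output)

print(generate("the"))  # e.g. "the cat sat on the rug . the dog"

A real LLM replaces the lookup table with a learned neural network over subword tokens, but its relationship to the training corpus is of the same kind: a compressed, variation-prone statistical representation, not a language faculty.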

You wouldn’t know it listening to the present discourse around LLM sentience, however. Today, LLMs don’t just generate erroneous token associations due to under-representation or incorrectness in their training dataset distribution. No, they “hallucinate.”

As the Federal Trade Commission recently quipped, “Your therapy bots aren’t licensed psychologists, your AI girlfriends are neither girls nor friends, your grief-bots have no soul, and your AI copilots are not gods.”3

Artificial General Intelligence

The search for Artificial General Intelligence (AGI) is both simple and difficult. 

For a program to automate a factory, it must implement the various functions and processes of the factory. In other words, the program is a theory of the factory, expressed in a computer language. When one searches for AGI, one is searching for a program. Not a bipedal robot, not a guy breakdancing in a robot jumpsuit, but a program. This is where the simplicity ends. Just like the factory program, this program must implement processes for human cognition, which requires a theory of human intelligence. This is where the difficulties begin. 

The difficulty is that we don’t have one. The gravitational pull of industry overpowers research today, and it has warped AGI into a computer science problem rather than a research problem in cognitive science, neuroscience, and linguistics.

Trained LLMs do not embody a theory of the human language capacity, for a simple reason. Consider a training dataset. Randomize the words, garbling every sentence beyond recognition and coherence. Train the model on this new dataset. It will output an utterly meaningless string of words in response to a prompt, having trained just as successfully as it would on a meaningful dataset.
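The same toy sketch can stand in for this thought experiment. Below, an identical (and again purely illustrative) bigram training procedure is run once on an ordinary word sequence and once on a shuffled copy of it; nothing in the procedure notices, or could notice, that one has sentence structure and the other does not:

import random
from collections import defaultdict, Counter

def train(tokens):
    """The entire 'training' step: record which token follows which."""
    model = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        model[current][nxt] += 1
    return model

def generate(model, start, length=8):
    """Sample a continuation from the recorded follow-counts."""
    token, output = start, [start]
    for _ in range(length):
        if token not in model:
            break
        tokens, counts = zip(*model[token].items())
        token = random.choices(tokens, weights=counts)[0]
        output.append(token)
    return " ".join(output)

english = "the cat sat on the mat . the dog sat on the rug .".split()
garbled = english[:]
random.shuffle(garbled)  # destroy any sentence structure

for name, data in [("english", english), ("garbled", garbled)]:
    model = train(data)  # identical procedure either way
    print(name, "->", generate(model, data[0]))

Both runs complete equally happily; the fitted statistics simply echo whatever sequence they were handed, possible language or not.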

Meanwhile, the innate human language faculty disallows such structures as meaningless – an impossible language4, in violation of Universal Grammar5. A theory that purports to explain a phenomenon cannot merely describe what is observed; it must also disallow what is not observed. Newton’s theory of gravitation allows for elliptical orbits, and correctly disallows square orbits. LLMs allow for English and Spanish, but also for gibberish.

McDermott noted in 1976, “It is hard to say where they have gone wronger, in underestimating language or overestimating computer programs.”

Thinking Machines

In his 1950 paper Computing Machinery and Intelligence6, the father of modern computer science, Alan Turing, proposed the Imitation Game – the famous and often misunderstood Turing Test. Meant to illuminate a general direction for research and development of machines with increasing capabilities, the paper began by asking, “Can machines think?”

Turing gave a ten-word response to the question: “I believe it to be too meaningless to deserve discussion.” 

Noam Chomsky summarizes Turing’s response by asking, “Do submarines swim?” If one wants to call it swimming, fine, it’s swimming. That is Turing’s response to the question of whether machines can think. If one wants to say that a vending machine microcontroller is thinking about a choice of drink before calculating the actuation instructions for its motors, fine, it thinks. 

Any discussion about thinking machines beyond that would require formalizations, as Turing attempted.

Conclusion

Transformer models are an exciting development that showcases some very interesting and clever mathematics and software engineering. They can be potent tools for specific automation tasks, typically called Narrow AI, or more simply, algorithms. Their targeted use within larger automated systems, or for particular applications and use cases, is a promising avenue for research and development today. Muddying the waters, unfortunately, are unsubstantiated claims about imminent or emergent human-level intelligence that hysterically inflate the underlying scope and possibilities of the technology. Restoring discipline in discourse, and reason in the enterprise, is the need of the hour. Let’s not hallucinate.