AIs get worse at answering simple questions as they get bigger



Large language models are capable of answering a wide range of questions – but not always accurately

Jamie Jin/Shutterstock

Large language models (LLMs) seem to get less reliable at answering simple questions when they get bigger and learn from human feedback.

AI developers try to boost the performance of LLMs in two main ways: scaling up – giving them more training data and more computational power – and shaping up, or fine-tuning them in response to human feedback.

José Hernández-Orallo at the Polytechnic University of Valencia, Spain, and his colleagues examined the performance of LLMs as they scaled up and shaped up. They looked at OpenAI’s GPT series of chatbots, Meta’s LLaMA AI models, and BLOOM, developed by a group of researchers called BigScience.

The researchers tested the AIs by posing five types of task: arithmetic problems, solving anagrams, geographical questions, scientific challenges and pulling out information from disorganised lists.

They found that scaling up and shaping up can make LLMs better at answering tricky questions, such as rearranging the anagram “yoiirtsrphaepmdhray” into “hyperparathyroidism”. But this isn’t matched by improvement on basic questions, such as “what do you get when you add together 24427 and 7120”, which the LLMs continue to get wrong.
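For reference, both answers are easy to verify with a few lines of Python – an illustrative check, not part of the study:

from collections import Counter

# Confirm the scrambled string uses exactly the letters of the target word.
scrambled = "yoiirtsrphaepmdhray"
target = "hyperparathyroidism"
print(Counter(scrambled) == Counter(target))  # True: it is a valid anagram

# The "simple" arithmetic question the models continued to get wrong.
print(24427 + 7120)  # 31547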

While their performance on difficult questions improved, the likelihood that a model would simply decline to answer a question – because it couldn't – dropped. As a result, the likelihood of it giving an incorrect answer rose.

The results highlight the dangers of presenting AIs as omniscient, as their creators often do, says Hernández-Orallo – and which some users are too ready to believe. “We have an overreliance on these systems,” he says. “We rely on and we trust them more than we should.”

That is a problem because AI models aren’t honest about the extent of their knowledge. “Part of what makes human beings super smart is that sometimes we don’t realise that we don’t know something that we don’t know, but compared to large language models, we are quite good at realising that,” says Carissa Véliz at the University of Oxford. “Large language models do not know the limits of their own knowledge.”

OpenAI, Meta and BigScience didn’t respond to New Scientist’s request for comment.
