A Bit More on the AI Inaccuracies

I had heard a bit about this in discussions of inaccurate information from AI engines like ChatGPT. But hearing it directly from TPM Readers AT (a research librarian) and JC brought it into sharper relief for me. Inaccurate information can mean a lot of things. You can say Thomas Jefferson died in 1846 when in fact he died in 1826. Other inaccuracies are more complex and pernicious.

In my earlier post, I noted that when I asked ChatGPT for data on the trajectory of COVID case mortality rates, it came pretty close to my ideal result: a series of studies from reputable journals which gave me a start on pulling this information together. I noted that it was a bit annoying that it didn’t include the links. But I’m told that’s part of the guardrails since the whole application is still in a testing phase.

I assume that’s true. But the lack of links probably obscured something pretty important.

As AT and JC explained to me, many of these studies turn out not to exist. They’re simply made up, or rather they are the kinds of studies and articles that could exist but don’t. I might have published an article in the American Historical Review in 1998. It fits with various details of my academic background, the fact that I entered a PhD program in American history in 1992. But I didn’t. I wrote one in 1995 in a different academic journal. AT notes that she’s had a number of cases where academics come into the library looking for help locating one of the articles they’ve been referred to by ChatGPT. But the articles don’t exist.

Weird!

If this were a person writing an article or a school paper, this would be textbook evidence of willful deception. Anyone can get the dates of a president’s life wrong, either through error or sloppiness. But you can’t accidentally come up with the details (journal name, date, volume, page numbers, author) of an academic article. You can only make it up. The very facticity of the detail makes it fabrication rather than error.

So what’s going on here?

I understand very little about how AI works. So I’ll keep this all at a very high level of generality. But at a basic level, the technology isn’t great at distinguishing between things that are real and things that are not real. One of the things people do is ask ChatGPT to write a poem in the style of this or that poet. Or write a short essay on Thomas Jefferson, the kind a student might turn in as a homework assignment. In that case it’s literally making things up based on putting together facts and connections and statistical patterns it’s absorbed by consuming vast amounts of information on the web. The whole magic of it is that it’s not just finding a few-paragraph essay somewhere on the web and showing it to you. It’s creating something new, something that didn’t exist. But clearly the technology isn’t always able to distinguish between creating something in a certain style or format as a person might write it (but didn’t) and producing claims that might have been true but are in fact not true.

There are whole fields of human intellection devoted to how these things differ and how we understand those differences: formal logic, cognitive science and neuroscience, machine learning, linguistics, computer science and maybe 20 other fields. It’s complicated. So maybe we shouldn’t be so hard on ChatGPT. But these distinctions are foundational. So foundational that they’re hard to spot at first.

TPM Reader JL, who has some distant academic background in the field, told me that for all the amazing, stunning leaps forward in the technology, “to a large degree it’s really just a clever trick. ChatGPT is ultimately just stringing chunks of text together based on statistical observations without any understanding of what it’s talking about. It’s amazing how far that gets you but the idea that the errors it makes are just kinks to be worked out seems like wishful thinking.”
