A lot of people are talking about "hallucinations", and how often LLMs are right or wrong today. I wish you'd stop.
LLMs can't be right or wrong. They don't hallucinate, because they don't experience any reality whatsoever. There's no understanding, or intent, or knowledge, or learning, or truth in there. The best mental model to understand what it's doing is something like mad-libs. It's a mad-lib solver. If the end result is factual, that's a coincidence, and it happened by accident.
And marketing bullshit like "RAG" or "reasoning" or whatever else doesn't change this. All that does is tailor the dictionary it's allowed to use to solve this mad-lib.
In other words, a stopped clock is always wrong.
It doesn't matter that it will coincidentally display the correct time twice per day. That coincidence doesn't have any connection to a clock's function as a tool. It's not doing the thing you think clocks do. It's not right about the time any more than a photo of a clock is. Or a description of a clock in a book. A stopped clock isn't a timekeeping device. It's something else that's merely shaped like a timekeeping device.
LLMs aren't thinking. They aren't learning, understanding, or remembering. They're doing something else, that is merely shaped like thinking. They are a stopped clock, and they are always wrong.
@jenniferplusplus I'm only suggesting you use the word "bullshit".
@autolycos bullshit is marginally better, but it still implies mental faculties that LLMs are nowhere near capable of. Bullshitting requires intention, and a theory of mind. But LLMs aren't that. They're calculators that do vector minimization on word-shaped numbers.
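Something like this toy sketch is the shape of operation I mean (made-up two-dimensional "embeddings" and a hypothetical next_word() picker; nothing resembling a real model's weights):

import math

# Hand-picked "word-shaped numbers"; a real model learns millions of these.
embeddings = {
    "paris":  (0.9, 0.8),
    "london": (0.8, 0.9),
    "cheese": (0.1, 0.2),
}

def next_word(context_vector):
    # Fill the blank with whichever word's vector is nearest to the context:
    # pure distance minimization, no notion of France, capitals, or truth.
    cx, cy = context_vector
    def distance(word):
        vx, vy = embeddings[word]
        return math.hypot(vx - cx, vy - cy)
    return min(embeddings, key=distance)

# "The capital of France is ___", encoded as some context vector.
print(next_word((0.85, 0.75)))  # happens to print "paris"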
@jenniferplusplus I will not argue with that, but I can't go to my peers and use the statistical explanation because my math is too limited
@jenniferplusplus I think this is the relevant part that makes me still say AI output is bullshit
"bullshitters or people who are bullshitting are distinct, as they are not focused on the truth. Persons who communicate bullshit are not interested in whether what they say is true or false, only in its suitability for their purpose[17]"
The AI's purpose is output, with no focus on truth.
@jenniferplusplus the only novelty of it is that they have beheaded the "garbage in" portion. It is only garbage out, because there is no validation, with maximal objectification labeled as objectivity
@jenniferplusplus and I think there is a healthy space to debate if mental faculties are required
@jenniferplusplus @autolycos I quite like it, in one sense, but that is because I am watching Poker Face, where Charlie (a human lie detector) will respond "Bullshit" when someone lies to her, and it is near enough an automated response.
Almost as automated as the LLM bullshit that is spouted.
@jenniferplusplus @autolycos true, regardless of their probabilistic nature and the fact that there is no thinking involved, they can have a lot of added value and do things no other tool can do. My personal example is NotebookLM, which I sometimes use for a quick overview of large documents. Flawless? Surely not, but useful for sure. It even helps me check its outcomes if I want
I am heavily reminded of https://anathem.fandom.com/wiki/Bulshytt again, as a term for the claims about LLMs by their proponents.
@jenniferplusplus Point well made. The "photo of a clock" really clicked in my head.
A photo of a clock is not telling the right time twice a day, as it is never telling the current time at all.
@jenniferplusplus "They are a stopped clock, and they are always wrong." feels like maybe it goes a bit far.
They can come up with "right" answers, or at least useful answers, so always "wrong" feels off. They never know the right answer, though; knowing is not their function.
They never know, but they aren't always wrong.
@aeischeid but as you say, being right is not its function. So you can't rely upon it to be right. If you need correctness, you have to verify it against something that does perform the function of providing correct information. But then why not just use the correct information tool? Why involve the information-shaped mad-lib solver at all?
Or, why consult the broken clock, and then your watch, to see if the broken clock is right?
@jenniferplusplus I certainly don't mean to be in the position of defending or justifying superfluous LLM usage...
Similar to a web search, people would say things like "google was wrong" or "google was right" when what they meant maybe was, "it pointed me in the right direction" or "wrong direction".
Its function wasn't to be 'right', it was to return results relevant to the query. Expecting it to be 'correct' would be user error in a sense when it was a wayfinding tool.
@jenniferplusplus Which is to say, yeah, LLMs are probably fairly useless if you are aiming for correctness.
But an answer doesn't always need to be correct to be useful.
@aeischeid @jenniferplusplus There’s a massive difference though: traditional search engines were never marketed as being ‘right’. Their function is very clear: to search the internet and return the sites that most closely match your criteria. The results are very clearly links to other sites. As a user it is very clear that the information on those sites is up to you to interpret.
LLMs present a totally different interface. They essentially do the same thing, except the UI absolutely suggests that the information returned is ‘right’, and the hype and marketing around LLMs only reinforces this. The term ‘hallucination’ has been cleverly chosen to refer to the times when the LLM is ‘wrong’, but as has been mentioned above, everything they return is a hallucination, and as a user you are being lulled into thinking those times when it is clearly wrong are outliers.
@alexanderdyas @jenniferplusplus Agree! The marketing, hype, the language used to discuss, and even the UI/UX - all deeply problematic. For sure.
Still, to over-correct and say nobody should ever use these tools because they are always 'wrong' also feels like a reactive misstep
@aeischeid @alexanderdyas
If you have a framework for identifying wrong-but-useful responses, I'm all ears.
I don't. Other people don't. And neither do the tools themselves. It's up to the user to know when their problem could benefit from that kind of data. And they have to do it in the face of a hype machine backed by the entire US venture capital ecosystem and multiple sovereign wealth funds.
So, I don't actually think it's an overcorrection. It's a stopped clock, and the time it tells you is always wrong. Stopped clocks do have some uses, but it's not the obvious one.
@jenniferplusplus @alexanderdyas I wish such a framework did exist, I don't know if it does, I am not going to be the one coming up with it.
But even if it did, I can only wish it would even matter to most people, who use it simply because it exists and does *seem* useful.
The Luddites weren't wrong in so many ways, but also we can't just smash these new machines, and there is probably something to be learned from the ways they failed and became mostly a pejorative
@jenniferplusplus @alexanderdyas From a linguistics angle, "they never know so they are always wrong" simply doesn't follow. Categorically they are neither right nor wrong; they aren't telling truth at all, as you illustrated so cleverly with your picture-of-a-clock analogy.
The picture of the clock doesn't have the wrong time. What would that mean, other than that you're using the wrong framework to speak about what it is?
@aeischeid @alexanderdyas Right. As far as I can tell, no one has that good framework. So we're all using the bad framework. The responses from LLMs are shaped like information, knowledge, comprehension, whatever. People reflexively use it as though it were actually that. My goal here is to break that reflex.
@jenniferplusplus @alexanderdyas A worthy goal, and some good steps towards it!
Hopefully my comments are complementary to that end. That was the intent at least.
@jenniferplusplus @alexanderdyas Not at all a complete framework for evaluating wrong-but-useful, but potentially relevant toward that -- two parts of this post have stuck with me and been at least an interesting lens on this topic
https://www.scotthyoung.com/blog/2025/05/13/22-thoughts-on-using-ai-to-learn-better/
1) the distinction of "Necessary and Unnecessary Difficulty" -- he applies it to learning, but it's more broadly applicable, e.g. decisions where accountability is involved
2) Not a Good Teacher, but Maybe a Great Tutor
@aeischeid I actually kind of hate this. A big list of suggestions to use AI for learning, and only once coming anywhere close to addressing the fact that factual correctness is not a feature of the responses. Neither is accurate reproduction, but he instead pretends it is. The one nod to the question of suitability is what he calls "verifiable prompts", which is not even remotely a thing.
@jenniferplusplus Fair, I don't love the post on the whole, hesitated to share, still, the 2 parts I tried to highlight have been useful to me.
Point 11, as you said, is highly suspect. Just use web search to find primary sources - if only web search hadn't been systematically ruined over time...
Point 12, however, feels valid: some responses are purely internally verifiable because they weren't seeking anything factual (suggestions for mnemonics, etc.). Many use LLMs in this fashion.
@aeischeid Even that, in the context of learning, is something I have to be highly skeptical of. Analogies seem internally self verifiable, but you already know what analogies are. If you didn't know in advance what a reasonable or useful value would be, how would you evaluate it?
Most people don’t understand what LLMs do. It’s as if they’re watching an animation of Donald Duck and believe him to be sentient.
So, if I don’t know that the clock is broken, I might be convinced it is telling the correct time.
If being right is not its function, then being wrong is also not its function. It’s neither.
Finally, even if the above is true, LLMs can be trained to generate correct statements a lot of the time. Hence the hype.
For watches and clocks, it makes no sense at all. It would take more effort to look at a photograph of a clock (broken or not) and compare it to your watch than to look at the watch. Arriving at the solution and verifying it are the same steps with the same effort.
On the other hoof, in situations where uncovering a feasible solution through logical methods is significantly more effort than verifying that a proposed solution is correct, it can make sense to use automated guessing machines to propose possible answers for manual verification. For an example that predates modern LLMs: automated fuzzing of inputs.
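A minimal sketch of that guess-then-verify shape (a made-up verify() predicate and propose() guesser, not any real fuzzer's API): the proposer has no idea what it's doing, and all the reliability lives in the check.

import random
import string

def verify(candidate):
    # The cheap-to-check property. A stand-in here: the character codes
    # happen to sum to a multiple of 100. Finding such a string by reasoning
    # is more work than checking one.
    return sum(map(ord, candidate)) % 100 == 0

def propose():
    # The "guessing machine": no understanding of the goal, just output.
    return "".join(random.choices(string.ascii_lowercase, k=12))

def search(max_tries=100_000):
    for _ in range(max_tries):
        candidate = propose()
        if verify(candidate):  # nothing is trusted until it passes verification
            return candidate
    return None

print(search())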
@jenniferplusplus
"Because it's nicely-formed, I can appropriate the form and present it to other humans as my intellectual output. They can't call me out, because they can't prove it's autogenerated randomness. Since I am neither a sharp thinker nor a good writer, I don't even loose reputation."
My frustrated take at the receiving end of this practice.
@aeischeid @jenniferplusplus They're never right for the right reason.
This is usually the place in the conversation where I tap the sign about the tradition of verification and validation (V&V) in computational engineering (such as fluid and structural mechanics), in which reliability is important and the community found it essential to demonstrate that models were "right for the right reasons" because anything else was dangerously unreliable. Today's "AI" fails all engineering standards of fitness for purpose (at least for all "valuable" purposes), yet is recklessly being forced upon society.
@jedbrown @aeischeid @jenniferplusplus I did not know about V&V as a formalism (I am really just a code jockey and my CS degree was a long time ago) and that feels somehow like one of those things that seems like it should be obviously the right way to do things and yet clearly wasn't!
@aeischeid @jenniferplusplus It means that, looking at a photo of a clock in a room, a visitor cannot accept it as the time at that moment, but looking at a stopped clock they may treat it as the time. So the clock is not showing the right time, but the visitor can interpret it as right.
@aeischeid @jenniferplusplus Same here. The description of these as a "photo of a clock". Wow, really fantastic analogy.
@jenniferplusplus Difference between a stopped wind up clock and LLMs is that people can take a mechanical device apart and figure out what’s wrong.
@jenniferplusplus They're a broken clock that stopped at twenteen fiftytwelve.
@jenniferplusplus Yep. And what is incredibly insidious is the use of I statements and other cues to make it seem more like it does have those abilities
And if you ask the engineers why they would be doing that, they will make the claim it is for "readability", which is a weird response given that longer responses are less readable.
But we know it's so they can oversell its capabilities, inflate stock, and make people feel more precarious than they already feel.
If you need a distraction, you will be amused by Lewis Carroll's logical-but-hilarious treatment of stopped clocks:
https://etc.usf.edu/lit2go/112/poems-puzzles-and-stories-of-lewis-carroll/4953/the-two-clocks/
Needed saying. Always wrong.
@jenniferplusplus AI companies: we changed the time base on our clock to base 8, so now it's right THREE times per day, an increase in accuracy of FIFTY PERCENT!
Me: that is SO not how any of this works
@jenniferplusplus
And what LLMs are actually getting better at is looking like they are thinking and responding intelligently. And that is increasingly dangerous because more and more humans are failing the reverse Turing test when confronted with them. My brother is convinced he has a personal relationship with ChatGPT, at least on a par with the assistant in “Her”.