this post was submitted on 26 Mar 2025
Technology
The quote was originally about news and journalists.
Another realization might be that the humans whose output ChatGPT was trained on were probably already 40% wrong about everything. But let's not think about that either. AI Bad!
This is a salient point that's well worth discussing. We should not be training large language models on any supposedly factual information that people put out. It's super easy to call out a bad research study and have it retracted. But you can't just explain to an AI that the study was wrong; you have to completely retrain it every time. Exacerbating this issue is the way that people tend to view large language models as somehow objective describers of reality, because they're synthetic and emotionless. In truth, an AI holds exactly the same biases as the people who put together the data it was trained on.
I'll bite. Let's think:
- there are three humans who are 98% right about what they say, and where they know they might be wrong, they indicate it
- now there is an llm (fuck capitalization, I hate the way they are shoved everywhere that much) trained on their output
- now the llm is asked about the topic and computes the answer string
- by definition, that answer string can contain all the probably-wrong things without the proper indicators ("might", "under such and such circumstances", etc.)
If you want to say that a 40% wrong llm means 40% wrong sources, prove me wrong.
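To put rough numbers on the thought experiment above, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the function names, the 70/30 split between confident and hedged claims, and the 50% accuracy of hedged claims. None of it is measured from any real model or source. It only shows that dropping the hedges can make the output look far more wrong than the sources were.

```python
def source_error_rate(confident_share, confident_accuracy):
    """Share of source claims that are wrong with no warning attached.

    Hedged claims are not counted as errors here, because the source
    flagged them as uncertain.
    """
    return confident_share * (1.0 - confident_accuracy)


def unhedged_error_rate(confident_share, confident_accuracy, hedged_accuracy):
    """Share of claims that are flatly wrong once every hedge is stripped.

    Models the worst case from the comment: the answer string repeats
    hedged claims without "might" or "under such circumstances".
    """
    hedged_share = 1.0 - confident_share
    return (confident_share * (1.0 - confident_accuracy)
            + hedged_share * (1.0 - hedged_accuracy))


# Hypothetical inputs, chosen only for illustration.
CONFIDENT_SHARE = 0.70     # assumed fraction of claims stated without a hedge
CONFIDENT_ACCURACY = 0.98  # the "98% right" figure from the comment
HEDGED_ACCURACY = 0.50     # assumed: hedged claims are a coin flip

if __name__ == "__main__":
    source = source_error_rate(CONFIDENT_SHARE, CONFIDENT_ACCURACY)
    model = unhedged_error_rate(CONFIDENT_SHARE, CONFIDENT_ACCURACY,
                                HEDGED_ACCURACY)
    print(f"sources, unflagged errors: {source:.1%}")
    print(f"output with hedges dropped: {model:.1%}")
```

With these made-up inputs the sources are wrong without warning about 1.4% of the time, while the hedge-free output is wrong about 16.4% of the time. The gap comes entirely from the lost indicators, not from worse sources.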