this post was submitted on 22 Feb 2024
Technology
Since half or more of reddit is now bots and shills, I don't imagine the training data is going to be great. That's fine, Gemini already sucks, so it'll be hard to make it worse.
The data being generated now, sure, but there are still the years of actually useful data there.
Then add on the remaining half of comments that are from sensible users and it's a decent, and still fairly unique, dataset.
There are many, many, many things posted as fact over the years on reddit that are not only untrue, but dangerous or even deadly in the case of some of the most idiotic advice given. I wish good luck telling them all apart to the poor 3rd world contractors the big commercial AI companies ~~exploit~~use to "train" their stochastic parrots.
That was one of my favorite shitposting formats. I would type a whole paragraph with technical details and real knowledge. Only the people who actually knew what I was talking about would realize it's a shitpost.
Yep, and a lot of reddit is thinly veiled shitposts, bots, and uncredited karma-whoring reposts of stolen content (the commercial AI companies should feel right at home here). Some of them are to anger the self-righteous redditors who come to PC police anyone who dares speak against the far left zeitgeist. But most importantly, so, so many of them are just for the lols.
The scariest part of those drawn-out, apparently accurate but actually nonsense posts/comments is how many of them end up near the top, with massive numbers of votes from those who think "well, that sounds reasonable" but know nothing of the subject itself.
Semi-related: I really loved the shitposts where the guy would tell an elaborate story, and end it with his dad beating the shit out of him with jumper cables. Now that's quality reddit content.