this post was submitted on 17 Feb 2024
1089 points (98.7% liked)
Called this a while back; this is why Reddit has such a high valuation.
Poisoning your data won't do anything but give them more data. Do you seriously think Reddit's servers don't track every edit you make to your posts? You'd literally just be providing training data of original human text vs. poisoned text. They'd still have your original post, and they keep a copy of every edit you make to it.
Whoever buys Reddit will have sole access to one of the larger (though I don't think the largest) pools of text training data on the internet, with fully licensed usage of it. I expect someone like Google, FB, MS, OpenAI, etc. would pay big $$$ for that.
"But can't people already scrape it?"
Well yes, but it's legally dubious at best in some places.
Scraping data off Reddit only gets you the current versions of posts (which means you can pick up poisoned data and can't see deleted content), and it is extremely slow. If you own the servers, you have first-class access to all posts in a database, including the originals and the diffs from every time someone edited a post, plus all the deleted posts too.
Think about wanting to train an AI to detect posts that need flagging for moderation: if you scrape Reddit data, you can't find the deleted posts that got moderated...
But if you have the raw original data, you 100% would have a list of every post that got deleted by mods, and even the mod message explaining why it was deleted.
You can surely see the value of such data, which only the owners of Reddit are currently privy to...
As for the editing part: sure, they can track your edit history. At scale, though, most edits are made to correct things. Determining whether an edit was meant to poison the text would likely require manual review and flagging. There's no way they're going to sift through all of the edits on individual accounts to determine this, so it's still worthwhile to do.
They could sidestep the issue a bit, though, by simply comparing the changes between edits: huge changes could just be discarded, while minor ones are kept.
You could easily make a minor change that negates every single other fact.
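To make the last two replies concrete, here is a minimal sketch of the "discard huge changes, keep minor ones" idea using Python's standard `difflib`. Nothing in the thread says Reddit or any buyer actually does this; the helper names (`edit_similarity`, `keep_edit`), the word-level comparison, and the 0.8 threshold are all assumptions chosen for illustration. It also shows the objection: a one-word negation looks even "smaller" than an innocent punctuation fix.

```python
from difflib import SequenceMatcher


def edit_similarity(original: str, edited: str) -> float:
    """Word-level similarity between two versions of a post, from 0.0 to 1.0."""
    return SequenceMatcher(None, original.split(), edited.split()).ratio()


def keep_edit(original: str, edited: str, threshold: float = 0.8) -> bool:
    """Hypothetical filter: keep an edit only if it stays close to the original.

    The 0.8 threshold is an arbitrary assumption, not a known production value.
    """
    return edit_similarity(original, edited) >= threshold


original = "The update is safe to install and it fixes the login bug."
typo_fix = "The update is safe to install, and it fixes the login bug."
negation = "The update is not safe to install and it fixes the login bug."
rewrite = "Purple elephants juggle seventeen umbrellas under a silent moon."

for label, edited in [("typo fix", typo_fix), ("negation", negation), ("rewrite", rewrite)]:
    score = edit_similarity(original, edited)
    print(f"{label:9s} similarity={score:.2f} kept={keep_edit(original, edited)}")
```

With these sample strings, the wholesale rewrite is discarded, but the negated version passes the filter and scores higher than the harmless comma fix. A purely size-based comparison cannot tell a correction from a meaning-reversing edit, which is the point of the last reply.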