this post was submitted on 17 Feb 2024
1089 points (98.7% liked)

Technology

[–] pixxelkick@lemmy.world 75 points 9 months ago* (last edited 9 months ago) (25 children)
  1. Called this a while back: this is why Reddit has such a high valuation.

  2. Poisoning your data won't do anything but give them more data. Do you seriously think Reddit's servers don't track every edit you make to your posts? You'd literally just be providing training data of original human text vs. poisoned text. They'd still have your original post, and they have a copy from every time you edited it.

  3. Whoever buys Reddit will have sole access to one of the larger (though I don't think the largest) pools of text training data on the internet, with fully licensed usage of it. I expect someone like Google, FB, MS, OpenAI, etc. would pay big $$$ for that.

"But can't people already scrape it?"

  1. Well yes, but it's legally dubious at best in some places.

  2. Scraping data off Reddit only gets you the current versions of posts (which means you can get poisoned data and can't see deleted content), and it's extremely slow. If you own the server, you have first-class access to all posts in a database, including the originals and the diffs from every time someone edited a post, and all the deleted posts too.
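
To make the scraper vs. owner difference concrete, here is a minimal sketch. It assumes the server keeps every revision of a post instead of overwriting it; the `Post` record and the helper names are hypothetical and are not Reddit's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Post:
    """Hypothetical post record (an assumption, not Reddit's real schema)."""
    post_id: int
    revisions: List[str] = field(default_factory=list)  # oldest first
    deleted: bool = False

    def edit(self, new_text: str) -> None:
        # An edit appends a new revision; the old text is kept.
        self.revisions.append(new_text)


def scraper_view(posts: List[Post]) -> Dict[int, str]:
    """What an outside scraper can see: latest text of non-deleted posts."""
    return {p.post_id: p.revisions[-1] for p in posts if not p.deleted}


def edit_pairs(posts: List[Post]) -> Dict[int, Tuple[str, str]]:
    """What the owner can build: (original, latest) pairs for edited posts."""
    return {
        p.post_id: (p.revisions[0], p.revisions[-1])
        for p in posts
        if len(p.revisions) > 1
    }


def deleted_texts(posts: List[Post]) -> Dict[int, str]:
    """Deleted posts are still in the owner's database, original text included."""
    return {p.post_id: p.revisions[0] for p in posts if p.deleted}
```

Under those assumptions, overwriting a post with junk only adds a second revision: the scraper sees the junk, while the owner gets a ready-made original vs. poisoned pair.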

Think about it: say you wanted to train an AI to detect posts that need flagging for moderation. If you scrape Reddit data, you can't find the deleted posts that got moderated...

But if you have the raw original data, you'd 100% have a list of every post that got deleted by mods, and even the mod message explaining why it was deleted.
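
A rough sketch of how that could turn into labeled training data, assuming the owner can join post text against a mod removal log. The function name and the shape of both inputs are hypothetical, for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# (text, label, mod_reason): label is 1 if mods removed the post, else 0.
Example = Tuple[str, int, Optional[str]]


def build_moderation_dataset(
    posts: Dict[int, str],     # post_id -> original text (assumed available to the owner)
    removals: Dict[int, str],  # post_id -> mod removal message (assumed log)
) -> List[Example]:
    """Label every post by whether it appears in the removal log."""
    dataset = []
    for post_id, text in posts.items():
        reason = removals.get(post_id)
        label = 1 if reason is not None else 0
        dataset.append((text, label, reason))
    return dataset
```

A scraper can only collect posts that are still up, so it has almost no positive examples for this kind of classifier and no removal reasons at all.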

You can surely see the value of such data, which only Reddit's owners are currently privy to...

[–] DAMunzy@lemmy.dbzer0.com 19 points 9 months ago (2 children)

Poison it by randomly posting copyrighted material from big corps like Disney?

[–] RGB3x3@lemmy.world 10 points 9 months ago

Bee Movie script. Millions of times.

[–] Isoprenoid@programming.dev 9 points 9 months ago

Once again the day is saved by piracy.🏴‍☠️
