this post was submitted on 17 Feb 2024

[–] prex@aussie.zone 60 points 9 months ago (13 children)

I assume AI is training off the content here for free.

[–] OmanMkII@aussie.zone 14 points 9 months ago (4 children)

I was curious whether a robots.txt equivalent exists for AI training data, and there were some solid points here:

If I go to your writing, I read it & learn from it. Your writing influences my future writing. We've been okay with this as long as it's not a blatant forgery.

If a computer goes to your writing, it reads it & learns from it. Your writing influences its future writing. It seems we are not okay with this, even if it isn't blatant forgery.

[AI at the moment is] different because the company is re-using your material to create a product they are going to sell. I'm not sure if I believe that is so different than a human employee doing the same thing.

https://news.ycombinator.com/item?id=34324208

I still think we should have the ability to opt out like we do with search engines and web crawlers, but if the algorithm works ideally and learns without recycling content, is it truly any different from a factory of workers pumping out clones of popular series on Amazon? I honestly don't know the answer to that.

[–] Appoxo@lemmy.dbzer0.com 6 points 9 months ago (1 children)

Afaik the OpenAI bot may choose to ignore it? At least that's what another user claimed it does.

[–] JohnEdwa@sopuli.xyz 12 points 9 months ago

Robots.txt has always been ignored by some bots. It's entirely voluntary: just a guideline originally meant to prevent excessive bandwidth usage by search indexing bots.

The Archive.org bot, for example, has completely ignored it since 2017.
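
For anyone wondering what the opt-out looks like in practice, here is a rough sketch using Python's standard `urllib.robotparser`. It assumes the AI crawler identifies itself as `GPTBot` (the user-agent token OpenAI documents for its crawler). The robots.txt contents, the `is_allowed` helper, `SomeSearchBot`, and the example.com URLs are all made up for illustration. Note that this check only happens if the bot bothers to run it, which is the whole point about it being voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask one AI crawler to stay out of the whole site,
# while every other bot is only kept away from /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler runs on itself before fetching.
    Nothing on the server side enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeSearchBot"):
        for url in ("https://example.com/post/1", "https://example.com/private/x"):
            print(agent, url, is_allowed(agent, url))
```

A bot that skips the `is_allowed` call can still fetch every page, so robots.txt expresses a request rather than an access control.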
