this post was submitted on 20 Feb 2025
137 points (96.0% liked)

Technology

[–] Voroxpete@sh.itjust.works 36 points 1 day ago (7 children)

For the record, this matters because distributing a copyrighted work carries a much higher penalty than simply copying it for yourself. If Meta seeded those books, they could be on the hook for a staggeringly large amount in damages. It's on the order of hundreds or even thousands of dollars per download, and that's across all the thousands of different books Meta grabbed.

[–] 01189998819991197253@infosec.pub 4 points 1 day ago (2 children)

Would distribution in the form of an AI not constitute a different form of seeding? I think it should.

[–] FauxLiving@lemmy.world 0 points 23 hours ago (1 children)

No, you can't find any copyrighted text inside the model's weights.

[–] patatahooligan@lemmy.world 4 points 14 hours ago

It's much more complicated than this. Given that models have been shown to spit out verbatim copies of some training material, it can be argued that the weights do in fact encode the material, just in some obfuscated way. Additionally, it can be argued that the output of the model is a derivative work of the original regardless of whether the original can be "found inside" the model weights, just by the nature of the process. As of now, there is no precedent that I know of on whether this constitutes redistribution of copyrighted material.
