this post was submitted on 09 Mar 2026
184 points (98.4% liked)

Technology

In order to help train its AI models, Meta (and others) have been using pirated versions of copyrighted books, without the consent of authors or publishers. The company behind Facebook and Instagram faces an ongoing class-action lawsuit brought by authors including Richard Kadrey, Sarah Silverman, and Christopher Golden, and one in which it has already scored a major (and surprising) victory: The Californian court concluded last year that using pirated books to train its Llama LLM did qualify as fair use.

You'd think this case would be as open-and-shut as it gets, but never underestimate an army of high-priced lawyers. Meta has now come up with the striking defense that uploading pirated books to strangers via BitTorrent qualifies as fair use. It further goes on to claim that this is double good, because it has helped establish the United States' leading position in the AI field.

Meta further argues that every author involved in the class-action has admitted they are unaware of any Llama LLM output that directly reproduces content from their books. It says if the authors cannot provide evidence of such infringing output or damage to sales, then this lawsuit is not about protecting their books but arguing against the training process itself (which the court has ruled is fair use).

Judge Vince Chhabria now has to decide whether to allow this defense, a decision that will have consequences for not only this but many other AI lawsuits involving things like shadow libraries. The BitTorrent uploading and distribution claims are the last element of this particular lawsuit, which has been rumbling on for three years now, to be settled.

top 39 comments
[–] Kolanaki@pawb.social 8 points 46 minutes ago (1 children)

So I can use pirated media to train my AI (Actual Intelligence), right?

[–] tehn00bi@lemmy.world 1 points 40 minutes ago

Should make all journal publications fair use.

[–] melfie@lemy.lol 3 points 32 minutes ago* (last edited 31 minutes ago)

Looking forward to Jellyfin getting an LLM to train locally on movie preferences so everyone’s library is fair use. Wait, is this why LLMs are being shoehorned into everything? 🤔

By this logic I should be able to copy-paste Moby Dick and change all instances of the name to Mopy Dick, and now its output no longer matches the input. I'm about to be the next Stefani King.

[–] SnotFlickerman@lemmy.blahaj.zone 84 points 3 hours ago* (last edited 3 hours ago) (2 children)
  1. Shorter and more reasonable copyright lengths would make this a moot point because then there would be sufficient literature in the public domain to pull from.

  2. These kinds of charges are what put the Pirate Bay admins in prison and caused Aaron Swartz to kill himself because of the threat of a lifetime in prison. The claim was that they did this either with the goal of profit or with actual profit, and that this was a serious crime. Neither TPB nor Swartz at that point in time had ever moved as much data as Meta has for these claims, nor did they ever have the profit or possibility of profit Meta aims to make from their AI offerings.

  3. Now Meta is claiming they've profited so hard you can't possibly hold them accountable.

It will be the biggest "fuck you" in history to anyone ever hit with civil charges for piracy in the early 2000s, let alone the TPB admins and Swartz, if they let this go. Which means they probably will because in America, apparently if you crime hard enough and big enough they stop putting you in prison and start patting you on the back and calling it good business sense.

[–] Airfried@piefed.social 3 points 33 minutes ago (1 children)

in America, apparently if you crime hard enough and big enough they stop putting you in prison and start patting you on the back and calling it good business sense.

There's a story about Alexander the Great capturing a pirate and scolding him for raiding villages along the coastline. Alexander asked if the pirate felt ashamed and wanted to beg for forgiveness. However, the pirate had something else to say. He said that Alexander was doing the same thing, but infinitely worse. The only difference was that Alexander called himself king and plundered entire lands while the pirate only raided small villages. The pirate reminded Alexander of the many lives he had destroyed in his conquest. So the pirate's only crime was not to be the biggest baddie in the hood, so to speak.

Alexander replied by stating that the title of king forces his hand and that he couldn't just stop what he was doing. The pirate on the other hand was just an individual who could easily change course. And so Alexander set the pirate free, stating that he himself will start changing his own ways right there and then if the pirate makes a fresh start first.

I don't know if there is any truth to this, but it's a fable often used to explain how legitimacy changes the perception people have of wrongdoing and heroism on a fundamental level. Alexander's reply sounds like an excuse and I think that's on purpose. The pirate outwitted him in the end by stating a basic truth.

[–] SnotFlickerman@lemmy.blahaj.zone 1 points 25 minutes ago

https://www.youtube.com/watch?v=UQBWGo7pef8

This is where I first remember hearing this tale, in this old Schoolhouse Rock parody that was in protest of the War in Iraq.

[–] Luminous5481@anarchist.nexus 6 points 1 hour ago

Yup, that's what I'm doing with all those audiobooks I torrented. Helping the US maintain the lead in AI 😂

[–] daggermoon@piefed.world 7 points 2 hours ago

Is it fair use if I do it?

[–] Goodlucksil@lemmy.dbzer0.com 32 points 3 hours ago (2 children)

Classic "the end justifies the means" (bad) defense. If ISPs can send letters for torrenting, and Facebook torrented a lot, Facebook deserves a fair punishment.

[–] GameOverFlow@lemmy.zip 20 points 3 hours ago

Not deserves, needs.

[–] Willoughby@piefed.world 2 points 1 hour ago

truck full of letters backs up to Meta's headquarters

"there, that's more appropriate."

[–] HaunchesTV@feddit.uk 0 points 15 minutes ago

Just spitballing...

If you were to train a model on just one book, as long as you don't prompt it to create an exact copy (maybe just some indiscernible differences) then presumably that's fair use.

Then, since we know AI generated work can't be copyrighted, does that essentially create a copyright-free version of the text which can be freely distributed?

[–] _Nico198X_@piefed.europe.pub 1 points 1 hour ago

There are no rules. Everything is made up to their convenience.

[–] umbrella@lemmy.ml 4 points 2 hours ago (1 children)

sure. thanks meta, anna's archive will help me with my reading list, thanks.

[–] rc__buggy@sh.itjust.works 2 points 1 hour ago

We can train our NI (Natural Intelligence) models.

As long as they cannot copyright what they generate from using the pirated materials

[–] ArbitraryValue@sh.itjust.works 7 points 2 hours ago (1 children)

We're going to end up in a situation where whatever is necessary to train AI is permitted, and the main question is whether that will be through (re)interpretation of existing law or the passage of a new law.

[–] ctrl_alt_esc@lemmy.ml 4 points 2 hours ago (1 children)

Good thing I have a local model running that's constantly learning, for precisely this reason

[–] panda_abyss@lemmy.ca 3 points 2 hours ago (1 children)

I’m still collecting media before I can start the training process.

[–] XLE@piefed.social 1 points 15 minutes ago

If anything, this is proof you should be next in line for a large venture capital infusion!

[–] ryathal@sh.itjust.works 4 points 2 hours ago (2 children)

Arguing that training models isn't fair use is going to be a massive uphill battle; it's basically reading the book but with a computer. It's not actually a big deal to people, unless you hold the copyright to a ton of works and want to get a percentage of all the AI income these companies have made.

Torrenting the books is almost certainly copyright infringement, but that has a relatively low payout compared to the money these companies are getting for their models. The training being fair use means that rights holders can't try to take any money from the model's use. The statutory limits for infringement, even at per-work levels, aren't significant compared to the legal cost of proving it happened.

[–] FatCrab@slrpnk.net 1 points 11 minutes ago

Anthropic pirating books for their training corpus resulted in the biggest copyright settlement in history--well over a billion. That is still being quibbled over, I believe, but they settled because they were likely to pay out more if the case went forward. So I'm not really sure where you're coming from that infringement via torrenting does not result in monstrously large liability.

[–] OfCourseNot@fedia.io 3 points 1 hour ago (1 children)

There's an argument to be made that it is, in fact, not 'reading'. The training of the model could be considered a lossy compression of the data. And streaming movies in a lossy compression format is not fair use, is it?

[–] Fatal@piefed.social 2 points 1 hour ago

It's not the storage of the information that matters as much as the presentation. Google's search index stores a huge amount of copyrighted material, even losslessly. But they only present small snippets at a time, which is not considered copyright infringement. The question really is whether or not the information being presented by the models is in a format which is considered copyright infringement. So far, courts have not found that it is.

[–] Grimy@lemmy.world 3 points 3 hours ago* (last edited 3 hours ago)

They didn't say seeding is fair use, just inherently part of torrenting. Good thing Sarah Silverman has PC Gamer there to pander for her.