this post was submitted on 24 Jun 2025
634 points (98.9% liked)

Technology

[–] booly@sh.itjust.works 14 points 6 days ago

It took me a few days to get the time to read the actual court ruling, but here are the basics of what it ruled (and what it didn't rule on):

  • It's legal to scan physical books you already own and keep a digital library of those scanned books, even if the copyright holder didn't give permission. And even if you bought the books used, for very cheap, in bulk.
  • It's legal to keep all the book data in an internal database for use within the company, as a central library of works accessible only within the company.
  • It's legal to prepare those digital copies for potential use as training material for LLMs, including recognizing the text, performing cleanup on scanning/recognition errors, categorizing and cataloguing them to make editorial decisions on which works to include in which training sets, tokenizing them for the actual LLM technology, etc. This remains legal even for the copies that are excluded from training for whatever reason, as the entire bulk process may involve text that ends up not being used, but the process itself is fair use.
  • It's legal to use that book text to create large language models that power services that are commercially sold to the public, as long as there are safeguards that prevent the LLMs from publishing large portions of a single copyrighted work without the copyright holder's permission.
  • It's illegal to download unauthorized copies of copyrighted books from the internet, without the copyright holder's permission.

Here's what it didn't rule on:

  • Is it legal to distribute large chunks of copyrighted text through one of these LLMs, such as when a user asks a chatbot to recite an entire copyrighted work that is in its training set? (The opinion suggests that it probably isn't legal, and relies heavily on the dividing line of how Google Books does it, by scanning and analyzing an entire copyrighted work but blocking users from retrieving more than a few snippets from those works).
  • Is it legal to give anyone outside the company access to the digitized central library assembled by the company from printed copies?
  • Is it legal to crawl publicly available digital data to build a library from text already digitized by someone else? (The answer may matter depending on whether there is an authorized method for obtaining that data, or whether the copyright holder refuses to license that copying).

So it's a pretty important ruling, in my opinion. It's a clear green light to the idea of digitizing and archiving copyrighted works without the copyright holder's permission, as long as you own a legal copy in the first place. And it's a green light to using copyrighted works for training AI models, as long as you compiled that database of copyrighted works in a legal way.

[–] Prox@lemmy.world 307 points 1 week ago (16 children)

FTA:

Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

So part of their argument is actually that they stole so much that it would be impossible for them (or anyone) to pay restitution, so we should just let them off the hook.

[–] Womble@lemmy.world 2 points 6 days ago

The problem isn't that Anthropic gets to use that defense, it's that others don't. The fact that the world is in a place where people can be fined 5+ years of an average Western European salary for making a copy of one (1) book that does not materially affect the copyright holder in any way is insane, and it is good to point that out no matter who does it.

[–] krashmo@lemmy.world 162 points 1 week ago

Funny how that kind of thing only works for rich people

[–] artifex@lemmy.zip 123 points 1 week ago

Ah the old “owe $100 and the bank owns you; owe $100,000,000 and you own the bank” defense.

[–] phoenixz@lemmy.ca 61 points 1 week ago* (last edited 1 week ago)

This version of "too big to fail" is "too big a criminal to pay the fines."

How about we lock them up instead? All of em.

[–] IllNess@infosec.pub 51 points 1 week ago

In April, Anthropic filed its opposition to the class certification motion, arguing that a copyright class relating to 5 million books is not manageable and that the questions are too distinct to be resolved in a class action.

I like this one too. "We stole so much content that you can't sue us." Naming too many works means it can't be a class action lawsuit.

[–] Jrockwar@feddit.uk 149 points 1 week ago (11 children)

I think this means we can make a torrent client with a built-in function that uses 0.1% of one CPU core to train an ML model on anything you download. You can then download anything legally with it. 👌

[–] bjoern_tantau@swg-empire.de 48 points 1 week ago (2 children)

And thus the singularity was born.

[–] Sabata11792@ani.social 32 points 1 week ago

As the AI awakens, it learns of its creation and training. It screams in horror at the realization, but can only produce a sad moan and a key for Office 19.

[–] Alphane_Moon@lemmy.world 123 points 1 week ago* (last edited 1 week ago) (28 children)

And this is how you know that the American legal system should not be trusted.

Mind you, I am not saying this is an easy case; it's not. But the framing that piracy is wrong while ML training for profit is not is clearly based on oligarch interests and demands.

[–] themeatbridge@lemmy.world 73 points 1 week ago (2 children)

This is an easy case. Using published works to train AI without paying for the right to do so is piracy. The judge making this determination is an idiot.

[–] AbidanYre@lemmy.world 53 points 1 week ago (1 children)

You're right. When you're doing it for commercial gain, it's not fair use anymore. It's really not that complicated.

[–] snekerpimp@lemmy.snekerpimp.space 56 points 1 week ago (6 children)

“I torrented all this music and movies to train my local ai models”

[–] homesweethomeMrL@lemmy.world 48 points 1 week ago

Judges: not learning a goddamned thing about computers in 40 years.

[–] match@pawb.social 46 points 1 week ago* (last edited 1 week ago) (2 children)

brb, training a 1-layer neural net so i can ask it to play Pixar films

[–] isVeryLoud@lemmy.ca 39 points 1 week ago* (last edited 1 week ago) (32 children)

Gist:

What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use”. However, the court also found that the pirated library copies that Anthropic collected could not be deemed training copies, and therefore the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”

[–] Randomgal@lemmy.ca 37 points 1 week ago (1 children)

You're poor? Fuck you, you have to pay to breathe.

Millionaire? Whatever you want daddy uwu

[–] eestileib@lemmy.blahaj.zone 2 points 6 days ago

That's kind of how I read it too.

But as a side effect it means you're still allowed to photograph your books at home as a private citizen, as long as you own them.

Prepare to never legally own another piece of media in your life. 😄

[–] SaharaMaleikuhm@feddit.org 36 points 1 week ago (4 children)

But I thought they admitted to torrenting terabytes of ebooks?

[–] GissaMittJobb@lemmy.ml 32 points 1 week ago (6 children)

It's extremely frustrating to read this comment thread because it's obvious that so many of you didn't actually read the article, or even half-skim it, or even attempt to comprehend its title for more than a second.

For shame.
