I'm not pirating. I'm building my model.
Technology
To anyone reading this comment without having read the article: this ruling doesn't mean it's okay to pirate books to build a model. Anthropic will still have to stand trial for that:
But he rejected Anthropic's request to dismiss the case, ruling the firm would have to stand trial over its use of pirated copies to build its library of material.
I also read through the judgement, and I think it's better for Anthropic than you describe. He distinguishes three issues:
A) Use any written material they get their hands on to train the model (and the resulting model doesn't just reproduce the works).
B) Buy a single copy of a print book, scan it, and retain the digital copy for a company library (for all sorts of future purposes).
C) Pirate a book and retain that copy for a company library (for all sorts of future purposes).
A and B were found to be fair use on summary judgement, meaning the judge thinks they are clear-cut in Anthropic's favor. C will go to trial.
C could still bankrupt the company depending on how trial goes. They pirated a lot of books.
As a civil matter, the publishing houses are more likely to collect the full amount if Anthropic stays in business (and does well). So it might be bad, but I'm really skeptical about bankruptcy (and I haven't heard anyone seriously floating it).
Depending on the type of bankruptcy, the business can still operate; all its profits would just go towards paying off its debts.
It might not be that bad. Most of the "damage" (as publishers see it) comes from distribution, not the download itself. Depending on how they acquired the books, it might not be much of a problem.

Anakin: “Judge backs AI firm over use of copyrighted books”
Padme: “But they’ll be held accountable when they reproduce parts of those works or compete with the work they were trained on, right?”
Anakin: “…”
Padme: “Right?”
IMO the focus should have always been on the potential for AI to produce copyright-violating output, not on the method of training.
If you try to sell "The New Adventures of Doctor Strange, Jonathan Strange and Magic Man", existing copyright laws are sufficient and will stop it. Really, training should be regulated by the same laws as reading. If they can get the material through legitimate means, it should be fine, but pulling data that is not freely accessible should be theft, as it is already.
I have a freely accessible document under a CC license that states it is not to be used commercially. This is commercial use. Your policy would still allow that document to be used, though, since it is accessible. This kind of policy discourages me from sharing my works, because others profit from my efforts and my works are more likely to be attributed to a corporate beast I want nothing to do with than to me.
I'm all for copyright reform and simpler copyright law, but these companies need to be held to standard copyright rules, not made-up modifications. I'm convinced a perfectly decent LLM could be built without violating copyrights.
I'd also be OK sharing works with a not-for-profit, open-source LLM, and I think others might be as well.
That "freely" there really does a lot of hard work.
It means what it means; "freely" pulls its own weight. I didn't say "readily" accessible. Torrents could be viewed as "readily" accessible, but they couldn't be viewed as "freely" accessible, because at the very least you bear the guilt of theft. Library books are "freely" accessible, and if the training somehow involved checking out books and returning them digitally, it should be fine. If it is free to read into neurons, it is free to read into neural systems. If payment for reading is expected, then it isn't free.
Civil cases of copyright infringement are not theft, no matter what the MPAA has trained you to believe.
But they are copyright infringement, which costs more than theft.
as it is already
Copies of copyrighted works cannot be regarded as "stolen property" for the purposes of a prosecution under the National Stolen Property Act of 1934.
https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(1985)
The plaintiffs made that argument, and the judge shoots it down pretty hard: competition isn't what copyright protects against. He makes an analogy with teachers teaching children to write fiction: they are using existing fantasy works to create MANY more competitors in the fiction market. Could an author use copyright to challenge that use?
Would love to hear your thoughts on the ruling itself (it's linked by Reuters).
I hate AI with a fire that keeps me warm at night. That is all.