Technology



334

Microsoft, OpenAI sued for copyright infringement by nonfiction book authors in class action claim (www.cnbc.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world


The new copyright infringement lawsuit against Microsoft and OpenAI comes a week after The New York Times filed a similar complaint in New York.


[–] bassomitron@lemmy.world 23 points 2 years ago* (last edited 2 years ago) (50 children)

I'm not a huge fan of Microsoft or even OpenAI by any means, but all these lawsuits just seem so... lazy and greedy?

It isn't like ChatGPT is just spewing out the entirety of their works in a single chat. In that context, I fail to see how ChatGPT or any other LLM returning snippets of said work is any different from Google returning them in a search summary.

Should OpenAI and other LLM creators use ethically sourced data in the future? Absolutely. They should've been doing so all along. But to me, these rich chumps like George R. R. Martin complaining that they felt their data was stolen without their knowledge and profited off of just feels a little ironic.

Welcome to the rest of the 6+ billion people on the Internet who've been spied on, data mined, and profited off of by large corps for the last two decades. Where's my god damn check? Maybe regulators should've put tougher laws and regulations in place long ago to protect all of us against this sort of shit, not just businesses and wealthy folk able to afford launching civil suits on shaky grounds. It's not like deep learning models are anything new.

Edit:

Already seeing people come in to defend these suits. I just see it like this: AI is a tool, much like a computer or a pencil are tools. You can use a computer to copyright infringe all day, just like a pencil can. To me, an AI is only going to be plagiarizing or infringing if you tell it to. How often does AI plagiarize without a user purposefully trying to get it to do so? That's a genuine question.

Regardless, the cat's out of the bag. Multiple LLMs are already out in the wild and more variations are made each week, and there's no way in hell they're all going to be reined in. I'd rather AI not exist, personally, as I don't see protections coming for normal workers over the next decade or two against further evolutions of the technology. But, regardless, good luck to these companies fighting the new Pirate Bay-esque legal wars for the next couple of decades.

[–] patatahooligan@lemmy.world 18 points 2 years ago (12 children)

"Already seeing people come in to defend these suits. I just see it like this: AI is a tool, much like a computer or a pencil are tools. You can use a computer to copyright infringe all day, just like a pencil can. To me, an AI is only going to be plagiarizing or infringing if you tell it to. How often does AI plagiarize without a user purposefully trying to get it to do so? That’s a genuine question."

You are misrepresenting the issue. The issue here is not whether a tool just happens to be usable for copyright infringement in the hands of a malicious entity. The issue here is whether LLM outputs are just derivative works of their training data. This is something you cannot compare to tools like pencils and PCs, which are much more general purpose and which are not built on stolen copyrighted works. Notice also how AI companies bring up "fair use" in their arguments. This means that they are not arguing that they are not using copyrighted works without permission, nor that the output of the LLM does not contain any copyrighted part of its training data (they can't do that because you can't trace the flow of data through an LLM), but rather that their use of the works is novel enough to be an exception. And that is a really shaky argument when their services are actually not novel at all. In fact they are designing services that are as close as possible to the services provided by the original work creators.

[+] bassomitron@lemmy.world -7 points 2 years ago* (last edited 2 years ago) (11 children)

"In fact they are designing services that are as close as possible to the services provided by the original work creators."

I disagree, and if I'm misrepresenting the issue, then I feel like you are equally. LLMs can do far more than simply write stories. They can write stories, but that is just one capability among many. Can it write stories in the style of GRRM? I suppose, but honestly doesn't GRRM also borrow a lot of inspiration from other authors? Any writer claiming to be so unique that they aren't borrowing from other writers is full of shit.

I'm not a lawyer or legal expert, I'm just giving a layman's opinion on a topic. I hope Sam Altman and his merry band get nailed to the wall, I really do. It's going to be a clusterfuck of endless legal battles for the foreseeable future, especially now that OpenAI isn't even pretending to be nonprofit anymore.

[–] wewbull@feddit.uk 13 points 2 years ago (1 children)

This story is about a non-fiction work.

What is the purpose of a non-fiction work? It's to give the reader further knowledge on a subject.

Why does an LLM manufacturer train their model on a non-fiction work? To be able to act as a substitute source of the knowledge.

End result is that:

- the original is made redundant.
- the original author is no longer credited.

So, not only have they stolen their work, they've stolen their income and reputation.

[+] bassomitron@lemmy.world -10 points 2 years ago* (last edited 2 years ago) (1 children)

If you're using an LLM as any form of authoritative source (and literally every LLM specifically warns NOT to do that), then you're going to have a bad time. No one is using them to learn in any serious capacity. Ideally, the AI should absolutely be citing its sources, and if someone is able to figure out how to do that reliably, they'll be made quite rich, I'd imagine. In my opinion, the fiction writers have a stronger case than non-fiction (I believe the fiction writers' class action against OpenAI in September is still ongoing).

[–] Stoneykins@mander.xyz 12 points 2 years ago (1 children)

For someone who claimed to not be a fan of OpenAI, you sure do know all the fan arguments against regulation for AI.
