this post was submitted on 09 Jan 2024

529 points (98.2% liked)


‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says (www.theguardian.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world

320 comments

‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says::Pressure grows on artificial intelligence firms over the content used to train their products


[–] flop_leash_973@lemmy.world 76 points 2 years ago* (last edited 2 years ago) (2 children)

If it ends up being OK for a company like OpenAI to commit copyright infringement to train their AI models, it should be OK for John/Jane Doe to pirate software for private use.

But that would never happen. Almost like the whole of copyright has been perverted into a scam.

[–] badbytes@lemmy.world 7 points 2 years ago (1 children)

You wouldn't steal a car, would you?

[–] flop_leash_973@lemmy.world 28 points 2 years ago (1 children)

[–] Honytawk@lemmy.zip 3 points 2 years ago

It is funny how Hollywood was droning that sentence into our heads, and now they are downloading actors themselves. Oh the irony.

[+] tinwhiskers@lemmy.world -6 points 2 years ago (3 children)

Using copyrighted material is not the same thing as copyright infringement. You need to (re)publish it for it to become an infringement, and OpenAI is not publishing the material made with their tool; the users of it are. There may be some grey areas for the law to clarify, but as yet, they have not clearly infringed anything, any more than a human reading copyrighted material and making a derivative work.

[–] hperrin@lemmy.world 12 points 2 years ago (1 children)

It comes from OpenAI and is given to OpenAI’s users, so they are publishing it.

[–] linearchaos@lemmy.world 3 points 2 years ago (1 children)

It's being mishmashed with a billion other documents just like it to make a derivative work. It's not like OpenAI is giving you a copy of Hitchhiker's Guide to the Galaxy.

[–] hperrin@lemmy.world 1 points 2 years ago (2 children)

New York Times was able to have it return a complete NYT article, verbatim. That’s not derivative.

[–] Fraubush@lemm.ee 4 points 2 years ago (1 children)

I thought the same thing until I read another perspective on it from Mike Masnick. From what he writes, it seems pretty clear they manipulated ChatGPT with some very specific prompts that someone who doesn't already pay NYT for access would not be able to write. For example, feeding it 3 verbatim paragraphs from an article and asking it to generate the rest. If you understand how these LLMs work, it's really not surprising that you can indeed force it to do things like that, but it's an extreme case, and I'm with Masnick and the user you're responding to on this one myself.

I also watched most of today's subcommittee hearing on AI and journalism. A lot of the arguments are that this will destroy local journalism. Look, strong local journalism is some of the most important work that is dying right now. But the grave was dug by these large media companies and hedge funds that bought up and gutted those local news orgs, and not many people outside of the industry batted an eye while that was happening. This is a bit of a tangent, but I don't exactly trust the giant hedge funds who gutted these local news journalists over the past decade to all of a sudden care at all about how important they are.

Sorry for the tangent, but here's the article I mentioned that's more on topic - http://mediagazer.com/231228/p11#a231228p11

[–] hperrin@lemmy.world 0 points 2 years ago

So they gave it the 3 paragraphs that are available publicly, said continue, and it spat out the rest of the article that’s behind a paywall. That sure sounds like copyright infringement.

[–] linearchaos@lemmy.world -1 points 2 years ago

And that's not the intent of the service, it's a bug and they'll fix it.

[–] Syntha@sh.itjust.works 2 points 2 years ago

Insane how this comment is downvoted when, as far as I'm aware, it's literally just the legal reality at this point in time.

[–] A_Very_Big_Fan@lemmy.world 2 points 2 years ago (1 children)

any more than a human reading copyrighted material and making a derivative work.

It seems obvious to me that it's not doing anything different than a human does when we absorb information and make our own works. I don't understand why practically nobody understands this.

I'm surprised to have even found one person that agrees with me

[–] BURN@lemmy.world 1 points 2 years ago (1 children)

Because it’s objectively not true. Humans and ML models fundamentally process information differently and cannot be compared. A model doesn’t “read a book” or “absorb information”

[–] A_Very_Big_Fan@lemmy.world 1 points 2 years ago* (last edited 2 years ago)

I didn't say they processed information the same, I said generative AI isn't doing anything that humans don't already do. If I make a drawing of Gordon Freeman or Courage the Cowardly Dog, or even a drawing of Gordon Freeman in the style of Courage the Cowardly Dog, I'm not infringing on the copyright of Valve or John Dilworth. (Unless I monetize it, but even then there's fair-use...)

Or if I read a statistic or some kind of piece of information in an article and spoke about it online, I'm not infringing the copyright of the author. Or if I listen to hundreds of hours of a podcast and then do a really good impression of one of the hosts online, I'm not infringing on that person's copyright or stealing their voice.

Neither me making that drawing, nor relaying that information, nor doing that impression are copyright infringement. Me uploading a copy of Courage or Half-Life to the internet would be, or copying that article, or uploading the hypothetical podcast on my own account somewhere. Generative AI doesn't publish anything, and even if it did I think there would be a strong case for fair-use for the same reasons humans would have a strong case for fair-use for publishing their derivative works.