this post was submitted on 06 Sep 2024
1721 points (90.1% liked)

Technology

59589 readers
2891 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
 

Those claiming AI training on copyrighted works is "theft" misunderstand key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves. When AI systems ingest copyrighted works, they're extracting general patterns and concepts - the "Bob Dylan-ness" or "Hemingway-ness" - not copying specific text or images.

This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in "vector space". When generating new content, the AI isn't recreating copyrighted works, but producing new expressions inspired by the concepts it's learned.

This is fundamentally different from copying a book or song. It's more like the long-standing artistic tradition of being influenced by others' work. The law has always recognized that ideas themselves can't be owned - only particular expressions of them.

Moreover, there's precedent for this kind of use being considered "transformative" and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was ruled legal despite protests from authors and publishers. AI training is arguably even more transformative.

While it's understandable that creators feel uneasy about this new technology, labeling it "theft" is both legally and technically inaccurate. We may need new ways to support and compensate creators in the AI age, but that doesn't make the current use of copyrighted works for AI training illegal or unethical.

For those interested, this argument is nicely laid out by Damien Riehl in FLOSS Weekly episode 744. https://twit.tv/shows/floss-weekly/episodes/744

you are viewing a single comment's thread
view the rest of the comments
[โ€“] Capricorn_Geriatric@lemmy.world 15 points 2 months ago (5 children)

Those claiming AI training on copyrighted works is "theft" misunderstand key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves.

Sure.

When AI systems ingest copyrighted works, they're extracting general patterns and concepts - the "Bob Dylan-ness" or "Hemingway-ness" - not copying specific text or images.

Not really. Sure, they take input and garble it up and it is "transformative" - but so is a human watching a TV series on a pirate site, for example. Hell, it's eduactional is treated as a copyright violation.

This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages.

Perhaps. (Not an AI expert). But, as the law currently stands, only living and breathing persons can be educated, so the "educational" fair use protection doesn't stand.

The AI discards the original text, keeping only abstract representations in "vector space". When generating new content, the AI isn't recreating copyrighted works, but producing new expressions inspired by the concepts it's learned.

It does and it doesn't discard the original. It isn't impossible to recreate the original (since all the data it gobbled up gets stored somewhere in some shape or form and can be truthfully recreated, at least judging by a few comments bellow and news reports). So AI can and does recreate (duplicate or distribute, perhaps) copyrighted works.

Besides, for a copyright violation, "substantial similarity" is needed, not one-for-one reproduction.

This is fundamentally different from copying a book or song.

Again, not really.

It's more like the long-standing artistic tradition of being influenced by others' work.

Sure. Except when it isn't and the AI pumps out the original or something close enoigh to it.

The law has always recognized that ideas themselves can't be owned - only particular expressions of them.

I'd be careful with the "always" part. There was a famous case involving Katy Perry where a single chord was sued over as copyright infringement. The case was thrown out on appeal, but I do not doubt that some pretty wild cases have been upheld as copyright violations (see "patent troll").

Moreover, there's precedent for this kind of use being considered "transformative" and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was ruled legal despite protests from authors and publishers. AI training is arguably even more transformative.

The problem is that Google books only lets you search some phrase and have it pop up as beibg from source xy. It doesn't have the capability of reproducing it (other than maybe the page it was on perhaps) - well, it does have the capability since it's in the index somewhere, but there are checks in place to make sure it doesn't happen, which seem to be yet unachieved in AI.

While it's understandable that creators feel uneasy about this new technology, labeling it "theft" is both legally and technically inaccurate.

Yes. Just as labeling piracy as theft is.

We may need new ways to support and compensate creators in the AI age, but that doesn't make the current use of copyrighted works for AI training illegal or

Yes, new legislation will made to either let "Big AI" do as it pleases, or prevent it from doing so. Or, as usual, it'll be somewhere inbetween and vary from jurisdiction to jurisdiction.

However,

that doesn't make the current use of copyrighted works for AI training illegal or unethical.

this doesn't really stand. Sure, morals are debatable and while I'd say it is more unethical as private piracy (so no distribution) since distribution and disemination is involved, you do not seem to feel the same.

However, the law is clear. Private piracy (as in recording a song off of radio, a TV broadcast, screen recording a Netflix movie, etc. are all legal. As is digitizing books and lending the digital (as long as you have a physical copy that isn't lended out as the same time representing the legal "original"). I think breaking DRM also isn't illegal (but someone please correct me if I'm wrong).

The problems arises when the pirated content is copied and distributed in an uncontrolled manner, which AI seems to be capable of, making the AI owner as liable of piracy if the AI reproduced not even the same, but "substantially similar" output, just as much as hosts of "classic" pirated content distributed on the Web.

Obligatory IANAL and as far as the law goes, I focused on US law since the default country on here is the US. Similar or different laws are on the books in other places, although most are in fact substantially similar. Also, what the legislators cone up with will definately vary from place to place, even more so than copyright law since copyright law is partially harmonised (see Berne convention).

[โ€“] Michal@programming.dev 1 points 2 months ago

I'd be careful with the "always" part. There was a famous case involving Katy Perry where a single chord was sued over as copyright infringement. The case was thrown out on appeal, but I do not doubt that some pretty wild cases have been upheld as copyright violations (see "patent troll").

Are you really trying to argue against a point by providing evidence supporting it?

load more comments (4 replies)