this post was submitted on 19 Feb 2025
84 points (92.0% liked)

Technology

63082 readers
3856 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] tal@lemmy.today 8 points 2 days ago* (last edited 2 days ago) (4 children)

So, I agree with the EFF that we should not introduce some kind of new legal right to prohibit training on something just because it's copyrighted. There's nothing that keeps a human from training themselves on content, so an AI shouldn't be prohibited from doing so either.

However.

It is possible for a human to make a work that infringes existing copyright by producing a derivative work. Not every work inspired by something else will meet the legal bar for being derivative, but some will. And just as a human can do that, so too can AIs.

I have no problem with, say, an AI being able to emulate a style. But it's possible for AIs today to produce works that do meet the bar for being derivative works. As things stand, I believe that'd make the user of the AI liable. And yet there's not really a good way for them to avoid that. That's a legit point of complaint, I think, because it leads to people inadvertently making derivative works.

The existing generative AI systems don't have a very good way of trying to hint to a user of the model whether a work is derivative.

However, I'd think that what we could do is operate something like a federal registry of images. For published, copyrighted works, we already have mandatory deposit with the Library of Congress.

If something akin to Tineye were funded by the government, it would be possible to maintain an archive of registered, copyrighted work. It would then be practical for someone who had just generated an image to check whether there was a pre-existing image.

I don't know whether Tineye works like this, but for it to work, we'd probably need a way to recognize an image under a bunch of transformations: scale, rotation, color, etc. I don't know what Tineye does today, but I'd assume some kind of feature recognition -- maybe it does line detection, vectorizes the result, breaks the image up into a bunch of chunks, performs some operation to canonicalize each chunk's rotation based on its content, and then performs some kind of fuzzy hash on the lines.
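A minimal sketch of the fuzzy-hash idea, using a simple average hash over an 8x8 grayscale grid instead of the line-detection pipeline described above (the 8x8 size and function names are illustrative assumptions, not how Tineye actually works):

```python
def average_hash(pixels):
    # pixels: 8x8 grid of grayscale values 0-255. A real system would
    # first downscale and grayscale the full image; this sketch assumes
    # that step has already happened.
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # One bit per pixel: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits  # 64-bit fingerprint

def hamming(a, b):
    # Number of differing bits; small distance = visually similar.
    return bin(a ^ b).count("1")
```

Small perturbations (brightness shifts, mild noise) leave most bits unchanged, so two versions of the same image hash close together, while unrelated images land far apart. A real system would compare hashes with a Hamming-distance threshold.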

Then one could place an expectation that anyone distributing an LLM-generated work first feed it into such a system; if a work was distributed without that verification and turns out to be derivative of a registered work, the presumption would be that the infringement was intentional (which IIRC entitles a rights holder to treble damages under US law). We don't have a mathematical model today that determines whether one work is "derivative" of another, but we could approximate one, or at least produce a warning.
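The verification step could look something like a nearest-hash lookup against the registry. This is a toy sketch; the registry format, the threshold value, and all the names here are assumptions for illustration:

```python
def hamming(a, b):
    # Bit-level distance between two 64-bit perceptual hashes.
    return bin(a ^ b).count("1")

def check_registry(candidate_hash, registry, threshold=10):
    """Return registered works whose perceptual hash is within
    `threshold` bits of the candidate -- i.e., possible derivatives.

    registry: dict mapping work id -> 64-bit perceptual hash.
    Results are sorted closest-match first.
    """
    matches = [(work_id, hamming(candidate_hash, h))
               for work_id, h in registry.items()
               if hamming(candidate_hash, h) <= threshold]
    return sorted(matches, key=lambda m: m[1])
```

An empty result would let the user distribute with some confidence; a non-empty one would be the warning that the generated image may be derivative of a registered work.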

I think that that's practical for most cases, for both holders of copyrighted images and LLM users. It permits people to use LLMs to generate images for non-distributed use. It doesn't create a legal minefield for an LLM user. It places no restrictions on model creators. It's doable using something like existing technology. And it permits a viewer of a generated image to verify that the image is not derivative.

[–] Pyro@pawb.social 4 points 2 days ago

It is a minefield. It can generate an almost exact copy of some things if it's overtrained on an image, or if the stars align just right.

On a different note, LLM means Large Language Model; it's not the image generator.
