[–] Ice@lemmy.dbzer0.com 5 points 4 hours ago

In my humble opinion, the most important aspect here is that it shouldn't be possible to copyright AI-generated works. The models are trained on the collective body of human intellectual output, the entire public domain, if you will, and in turn whatever they produce should be public domain, available to everyone.

Certainly an AI company may charge for usage, distribution, and generation of content to fund its endeavours, but that is about the limit of it as I see it.

[–] tal@lemmy.today 8 points 1 day ago* (last edited 1 day ago) (2 children)

So, I agree with the EFF that we should not introduce some kind of new legal right to prohibit training on something just because it's copyrighted. Nothing keeps a human from training themselves on copyrighted content, so an AI shouldn't be prohibited from doing so either.

However.

It is possible for a human to infringe existing copyright by producing a derivative work. Not every work inspired by something else meets the legal bar for being derivative, but some do. And just as a human can do that, so can an AI.

I have no problem with, say, an AI being able to emulate a style. But it's possible for AIs today to produce works that do meet the bar for being derivative works. As things stand, I believe that would make the user of the AI liable. And yet there's not really a good way for them to avoid it. That's a legitimate complaint, I think, because it leads to people unwittingly producing derivative works.

Existing generative AI systems don't have a good way of hinting to the user whether a generated work is derivative.

However, I think what we could do is operate something like a federal registry of images. For published, copyrighted works, we already have mandatory deposit with the Library of Congress.

If something akin to Tineye were funded by the government, it would be possible to maintain an archive of registered, copyrighted works. It would then be practical for someone who had just generated an image to check whether it resembles a pre-existing, registered image.

For this to work, we'd probably need a way to recognize an image under a bunch of transformations: scale, rotation, color, and so on. I don't know what Tineye does today, but I'd assume some kind of feature recognition -- maybe line detection, vectorizing the result, breaking the image up into a bunch of chunks, canonicalizing each chunk's rotation based on its content, and then computing some kind of fuzzy hash on the lines.
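As a rough illustration of that kind of fuzzy fingerprint, here's a minimal sketch in Python using a difference hash (dHash), which survives rescaling and mild color shifts, though not rotation, which a real system would have to canonicalize first. It assumes Pillow is available; the registry contents and the threshold are invented purely for illustration, not taken from any real deposit system:

```python
from PIL import Image

HASH_SIZE = 8  # yields a 64-bit fingerprint

def dhash(path: str) -> int:
    # Downscale to (HASH_SIZE+1) x HASH_SIZE grayscale, then record whether
    # each pixel is brighter than its right-hand neighbor, packed into bits.
    img = Image.open(path).convert("L").resize((HASH_SIZE + 1, HASH_SIZE))
    px = list(img.getdata())
    bits = 0
    for row in range(HASH_SIZE):
        for col in range(HASH_SIZE):
            left = px[row * (HASH_SIZE + 1) + col]
            right = px[row * (HASH_SIZE + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")

# Hypothetical registry of fingerprints for deposited works.
registry = {"deposited_work_001": 0x3A5C90F1B2D47E68}

def check(path: str, threshold: int = 10) -> list[str]:
    # Flag registered works whose fingerprint is within the threshold --
    # an approximation-and-warning step, not a legal determination.
    h = dhash(path)
    return [name for name, fp in registry.items() if hamming(h, fp) <= threshold]
```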

Then one could set an expectation that anyone distributing an LLM-generated work first feed it into such a system. If a work is distributed without that check and turns out to be derivative of a registered work, the presumption would be that the infringement was intentional (which, IIRC, entitles a rights holder to treble damages under US law). We don't have a mathematical model today that determines whether one work is "derivative" of another, but we could build one that at least gives an approximation and a warning.

I think that's practical in most cases, for both holders of copyrighted images and LLM users. It permits people to use LLMs to generate images for non-distributed use. It doesn't create a legal minefield for an LLM user. It places no restrictions on model creators. It's doable with something like existing technology. And it permits a viewer of a generated image to verify that the image is not derivative.

[–] artificialfish@programming.dev 1 points 9 hours ago (1 children)

You could always just do a reverse search on the open dataset to see whether it's an exact copy (or over a threshold).

You MIGHT even be able to do that while masking the data using hashing.

[–] tal@lemmy.today 1 points 9 hours ago* (last edited 9 hours ago) (1 children)

You could always just do a reverse search on the open dataset to see whether it's an exact copy (or over a threshold).

True, but an "exact copy" almost certainly isn't what gets produced -- and a work can be derivative without being an exact copy of the original; it may just look a lot like part of the original. You'd want a pretty good chance of catching derivative works, not just verbatim copies.

And that would mean that anyone who trains a model would need to provide access to their training corpus, which is gonna be huge -- the models, large as they are, are a tiny fraction of the size of the training set -- and I'm sure some people training models aren't gonna want to provide all of their training corpus.

[–] artificialfish@programming.dev 1 points 7 hours ago

MinHash might be able to produce a similarity metric without requiring exact matches and without revealing the training data.
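For flavor, here's a minimal MinHash sketch in Python over character shingles. The shingle size, number of hash functions, and SHA-1-based hash family are all arbitrary choices for illustration; the point is that only the signatures would ever need to be shared, not the underlying text:

```python
import hashlib

NUM_HASHES = 128

def shingles(text: str, k: int = 5) -> set[str]:
    # Overlapping k-character substrings of the input.
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(items: set[str]) -> list[int]:
    # For each of NUM_HASHES salted hash functions, keep the minimum
    # hash value over all items; only this signature need be shared.
    sig = []
    for seed in range(NUM_HASHES):
        sig.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in items
        ))
    return sig

def estimated_jaccard(a: list[int], b: list[int]) -> float:
    # The fraction of matching signature slots approximates Jaccard similarity.
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Near-duplicates score close to 1.0; unrelated texts score near 0.0.
sig1 = minhash_signature(shingles("the quick brown fox jumps over the lazy dog"))
sig2 = minhash_signature(shingles("the quick brown fox jumped over the lazy dog"))
print(f"estimated similarity: {estimated_jaccard(sig1, sig2):.2f}")
```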

[–] Pyro@pawb.social 4 points 1 day ago

It is a minefield. It can generate an almost exact copy of some things if it's overtrained on an image, or if the stars align just right.

On a different note, LLM means Large Language Model, not the image generator.

[–] MudMan@fedia.io 1 points 1 day ago

Some frequently repeated false premises, particularly on what AI is and does, but mostly correct conclusions on the effects of regulating it through copyright expansion, IMO.