this post was submitted on 26 Jan 2024
430 points (83.1% liked)

We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.::Artists and researchers are exposing copyrighted material hidden within A.I. tools, raising fresh legal questions.

[–] Jilanico@lemmy.world 2 points 9 months ago (8 children)

So stable diffusion, midjourney, etc., all have massive databases with every picture on the Internet stored in them? I know the AI models are trained on lots of images, but are the images actually stored? I'm skeptical, but I'm no expert.

[–] QubaXR@lemmy.world 0 points 9 months ago (7 children)

These models were trained on datasets that, without compensating the authors, used their work as training material. It's not every picture on the net, but a lot of it comes from scraping websites, portfolios and social networks wholesale.

A similar situation happens with large language models. Recently Meta admitted to using pirated books (the Books3 dataset, to be precise) to train their LLM without any plans to compensate the authors, or even so much as pay for a single copy of each book used.

[–] archomrade@midwest.social -2 points 9 months ago* (last edited 9 months ago) (2 children)

These models were trained on datasets that, without compensating the authors, used their work as training material.

Couple things:

  • this doesn't explain OP's question about how the information is stored. In fact OP is right that the images and source material are NOT stored in a database within the model. It basically just stores metadata about the source material as a whole in order to construct new material from text descriptions

  • the use of copyrighted works in the training isn't necessarily infringing if the model is found to be a fair use, and there is a very strong fair use argument here.
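
On the first point, a rough back-of-envelope check shows why the model can't be a database of its training images. This is only a sketch: all three figures below are round-number assumptions picked for illustration, not measurements of any particular model or dataset.

```python
# Back-of-envelope: how many bytes of model are there per training image?
# All three constants are ASSUMED round numbers for illustration only.
ASSUMED_MODEL_BYTES = 4 * 10**9       # ~4 GB model checkpoint (assumption)
ASSUMED_TRAINING_IMAGES = 2 * 10**9   # ~2 billion training images (assumption)
ASSUMED_JPEG_BYTES = 100 * 10**3      # ~100 KB per compressed image (assumption)


def bytes_per_image(model_bytes: int, image_count: int) -> float:
    """Average storage budget the model could spend on each training image."""
    return model_bytes / image_count


budget = bytes_per_image(ASSUMED_MODEL_BYTES, ASSUMED_TRAINING_IMAGES)
ratio = ASSUMED_JPEG_BYTES / budget

print(f"Average budget per training image: {budget:.1f} bytes")
print(f"A typical compressed image is about {ratio:,.0f}x larger than that")
```

Under those assumptions the average budget is a couple of bytes per image, far too little to hold a copy of each one. It's only an average, though, so it doesn't rule out the model memorizing a small number of heavily duplicated images.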

[–] QubaXR@lemmy.world 3 points 9 months ago (1 children)

"metadata" is such a pretty word. How about "recipe" instead? It stores all information necessary to reproduce work verbatim or grab any aspect of it.

The legal issue of copyright is a tricky one, especially in the US, where copyright is often weaponized by corporations. The gist of it is: the training model itself was an academic endeavor and therefore falls under fair use. Companies like StabilityAI or OpenAI then used these datasets and monetized products built on them, which in my understanding skirts a legal gray zone.

If these private for-profit companies simply took the same data and built their own, identical dataset, they would be liable to pay the authors for use of their work in a commercial product. They get around it by using the existing model, originally created for research and not commercial use.

Lemmy is full of open source and FOSS enthusiasts; I'm sure someone can explain it better than I can.

All in all I don't argue about the legality of AI, but as a professional creative I highlight ethical (plagiarism) risks that are beginning to arise in the majority of models. We all know Joker, Marvel superheroes, popular Disney and WB cartoon characters, and can spot when "our" generations cross the line of copying someone else's work. But how many of us are familiar with Polish album cover art, Brazilian posters, Chinese film superheroes or Turkish logos? How sure can we be that the work "we" produced using AI is truly original and not a perfect copy of someone else's work? Does our ignorance excuse this second-hand plagiarism? Or should the companies releasing AI models stop adding features and fix that broken foundation first?

[–] archomrade@midwest.social 0 points 9 months ago

“metadata” is such a pretty word. How about “recipe” instead?

Well isn't recipe another one of those pretty words? 'Metadata' is specific to other precedents that deal with computer programs that gather data about works (see Authors Guild, Inc. v. HathiTrust and Authors Guild v. Google), but you're welcome to challenge the verbiage if you don't like it. Regardless, what we're discussing is objectively something that describes copyrighted works, not copies or a copy of the works themselves. A computer program that is very good at analyzing text or pixel data is still only analyzing data. It is itself a novel, non-expressive factual representation of other expressive works, and because of this it cannot be considered infringement on its own.

It stores all information necessary to reproduce work verbatim or grab any aspect of it.

This isn't really true, at least not for the majority of works analyzed by the model, but granted. If a person uses a tool to copy the work of another person, it is the person who is doing the copying, not the tool. I think it is far more reasonable to hold an individual who uses an AI model to infringe on a copyright responsible. If someone chooses to author a work with the use of a tool that does the work for them (in part or in whole), it is more than reasonable to expect that individual to check the work that is being produced.

All in all I don’t argue about the legality of AI, but as a professional creative I highlight ethical (plagiarism) risks that are beginning to arise in the majority of models.

As a professional creative myself, I think this is a load of horseshit. We always hold individual authors responsible for the work that they publish, and it should be no different here. That some choose to be lazy and careless is more of a reflection on them.

How sure can we be that the work “we” produced using AI is truly original and not a perfect copy of someone else’s work?

If you have the words to describe a desired image or text response that makes the model produce a 'perfect copy of someone else's work', then you have the words to search for that work, too.

Or should the companies releasing AI models stop adding features and fix that broken foundation first?

How about we stop expanding the scope of an already broken copyright law and fix that broken foundation first?
