this post was submitted on 26 Jan 2024
430 points (83.1% liked)
Technology
Because this proves that the "AI", at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.
Likely because the "AI" was trained on this image at some point. This has repercussions with regard to copyright law. It means the training set contains copyrighted data, and the use of said training set could be argued to be piracy.
Legal discussions on how to talk about generative AI are only happening now that people can experiment with the technology. But it's not like our laws have changed: copyright infringement is copyright infringement. If the training data is obviously infringing, then the model must be retrained on more appropriate data.
But where is the infringement?
This NYT article includes the same several copyrighted images and they surely haven't paid any license. It's obviously fair use in both cases and NYT's claim that "it might not be fair use" is just ridiculous.
Worse, the NYT also includes exact copies of the images, while the AI ones are just very close to the original. That's like the difference between uploading a video of yourself playing a Taylor Swift cover and actually uploading one of Taylor Swift's own music videos to YouTube.
Even worse, the NYT intentionally distributed the copyrighted images, while Midjourney did so unintentionally and specifically states that it's a breach of their terms of service. Your account might be banned if you're caught using these prompts.
Do the training weights contain the data? Are the servers copying said data on a mass scale, in a way that the original copyright holders don't want or can't control?
Their response will be: we don't know, we can't understand what it's doing.
What the fuck is this kind of response? It's just a fucking neural network running on GPUs with convolutional kernels. For fuck's sake, turn on your damn brain.
Generative AI is actually one of the easier subjects to comprehend here. It's just calculus: use derivatives to backpropagate the error and adjust the weights in a way that minimizes it. Lather, rinse, repeat for a billion iterations on a mass of GPUs (i.e. 20 TFLOPS compute systems) for several weeks.
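For anyone who wants to see the "derivatives minimize error" loop in code, here's a rough sketch in plain Python. It's a toy, not how any real image model is trained: it fits a single weight and bias to four made-up points, and the names (`train`, `lr`, `steps`) and the data are my own assumptions for illustration. Real backpropagation applies the same chain-rule idea through many layers and billions of weights.

```python
def train(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Forward pass: prediction error for every training point.
        errs = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Derivatives of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step each parameter against its gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train(xs, ys)
print(w, b)
```

The loop recovers values very close to w = 2 and b = 1. Note that nothing in `w` and `b` is a literal copy of the four points, yet together they reproduce them, which is roughly what the "do the weights contain the data" argument is about.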
Come on, this stuff is well understood in comp sci by now. It was already understood 20 years ago when I learned it, and today, now that AI is all hype, more and more people are understanding the basics.
Understanding the math behind it doesn't immediately mean understanding the decision process during forward propagation. Of course you can mathematically follow it, but you're quickly gonna lose the overview with that many weights. There's a reason XAI (explainable AI) is an entire subfield of machine learning.
Ummm... it's lossily compressed data from the training set.
Is it a perfect copy? No. But copyright law covers "derivative works", so whatever, the law remains clear on this situation.
Bro, who even knows calculus anymore? We have calculators for a reason 🤷‍♀️