Not just that, but to sell a product that by its very nature threatens the livelihoods of the same people whose labor and creativity are being used without permission.
This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages.
Like fuck it is. An LLM "learns" by memorization and by breaking training data down into its component tokens, then calculating the weights between those tokens. This allows it to produce output that resembles (but may or may not perfectly replicate) its training dataset, but it produces no actual understanding or meaning--in other words, there's no actual intelligence, just really, really fancy fuzzy math.
Meanwhile, a human learns by memorizing training data, but also by parsing the underlying meaning, breaking it down into the underlying concepts, and then applying and testing those concepts and mastering them through practice and repetition. Where an LLM would learn "2+2 = 4" by ingesting tens or hundreds of thousands of instances of the string "2+2 = 4" and calculating a strong relationship between the tokens "2+2," "=," and "4," a human child would learn 2+2 = 4 by being given two apple slices, putting them down next to another pair of apple slices, and counting the total number of apple slices to see that they now have 4 slices. (And then being given a treat of delicious apple slices.)
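To make that concrete, here's a toy sketch of what "calculating weights between tokens" amounts to. This is Python and a massive oversimplification--real LLMs learn billions of parameters over long contexts rather than counting token pairs--but the flavor is the same: statistics over tokens, not arithmetic.

```python
from collections import Counter, defaultdict

# Toy stand-in for "weights between tokens": count how often each token
# follows each other token in the training data, then emit whichever
# continuation was most common. Nothing here knows what addition is.
training_data = "2+2 = 4 . 2+2 = 4 . 2+2 = 4".split()

weights = defaultdict(Counter)
for prev, nxt in zip(training_data, training_data[1:]):
    weights[prev][nxt] += 1  # strengthen the prev -> nxt association

def predict_next(token: str) -> str:
    # Return the token that most often followed `token` during "training."
    return weights[token].most_common(1)[0][0]

print(predict_next("2+2"))  # '=' -- learned from co-occurrence, not math
print(predict_next("="))    # '4' -- the model never counted anything
```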
Similarly, a human learns to draw by starting with basic shapes, then moving on to anatomy, light and shadow, shading, and color theory, all the while applying each new concept to their work and developing the muscle memory to more easily draw the lines and shapes they combine to form a whole picture. A human may learn from other people's drawings during the process, but at most they might study a few thousand images. Meanwhile, a generative model learns to "draw" by ingesting millions of images--without obtaining the permission of the person or organization that created those images--then breaking those images down into their component tokens and calculating weights between those tokens. There's about as much similarity between how a generative model "learns" and how a human learns as there is between my cat and my refrigerator.
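And for the curious, here's roughly what "breaking an image down into component tokens" looks like, sketched as the patch-based tokenization that ViT-style vision models use. Actual pipelines vary a lot by architecture, so treat this as an illustration, not gospel:

```python
import numpy as np

# Chop an image into fixed-size patches and flatten each patch into a
# vector of numbers. Patch tokens like these are what ViT-style models
# actually ingest: arrays of numbers, not "a rabbit in profile."
image = np.random.rand(256, 256, 3)  # stand-in for one scraped image
patch = 16

patches = image.reshape(256 // patch, patch, 256 // patch, patch, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)

print(patches.shape)  # (256, 768): 256 "tokens" of 768 numbers each
```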
And YET FUCKING AGAIN, here's the fucking Google Books argument. To repeat: Google Books used a minimal portion of the copyrighted works, and was not building a service to compete with book publishers. Generative AI is using the ENTIRE COPYRIGHTED WORK for its training set, and is building a service TO DIRECTLY COMPETE WITH THE ORGANIZATIONS WHOSE WORKS THEY ARE USING. The two have zero fucking relevance to one another as far as claims of fair use are concerned. I am sick and fucking tired of hearing about Google Books.
EDIT: I want to make another point: I've commissioned artists for work multiple times, featuring characters that I designed myself. And pretty much every time I have, the art they make for me comes with explicit license terms: for example, they grant me a license to post it in my own art gallery, and they grant me permission to use portions of the art for non-commercial purposes (e.g. cropping a portion out to use as a profile pic or avatar). But they all explicitly forbid me from using the commissioned work for commercial purposes--in other words, I cannot slap the art I commissioned on a T-shirt and sell it at a convention, or make a mug out of it. If I did, that artist would be well within their rights to sue the crap out of me, and artists charge several times as much to grant a license for commercial use.
In other words, there is already well-established precedent that even if something is publicly available on the Internet and free to download, there are acceptable and unacceptable use cases, and it's broadly accepted that using other people's work for commercial purposes without compensating them is not permitted, even if I directly paid someone to create that work myself.
And to the argument itself: Just because AI is better at learning from existing works--faster, more complete, better memory--doesn't mean that it's fundamentally different than humans learning from artwork. Almost EVERY artist arguing for this is stealing themselves, since they learned from and were inspired by existing works.
Tell me you're not an artist without telling me you're not an artist
Fucking Christ I am so sick of people referencing the Google books lawsuit in any discussion about AI
The plaintiffs lost that case because the judge ruled that Google Books copied only a minimal portion of each book and was not competing against the publishers, and thus the copying qualified as fair use.
AI training does not fall under this umbrella, because it's using the entirety of the copyrighted work, and the purpose of this infringement is to build a direct competitor to the people and companies whose works were infringed. You may as well talk about OJ Simpson's criminal trial; it's about as relevant.
That feels like it's rather beside the point, innit? You've got AI companies showing off AI art and saying "look at what this model can do," you've got entire communities on Lemmy and Reddit dedicated to posting AI art, and they're all going "look at what I made with this AI, I'm so good at prompt engineering" as though they did all the work, while the millions of hours spent actually creating the art used to train the model get no mention at all, much less any compensation or permission for those works to be used in the training. Sure does seem like people are passing AI art off as their own, even if they're not claiming copyright.
What evidence is there that gen AI hasn't peaked? They've already scraped most of the public Internet to get what we have right now--what else is there to feed it? The AI companies are also running out of time: VCs are only willing to throw money at them for so long, and given that the rate of expenditure on AI so far outpaces pretty much every other major project in human history, they're going to want a return on investment sooner rather than later. If the AI companies were making significant progress on a model that could do the things you're describing, they would be talking it up to buy time and funding from VCs. Instead, we're getting vague platitudes about "AGI" and meaningless AI sentience charts.
I actually had some thoughts about this and posted this in a similar thread:
First, that artist will only learn from a handful of artists instead of every artist's entire body of work all at the same time. They will also eventually develop their own unique style and voice--the art they make will reflect their own views in some fashion, instead of being a poor facsimile of someone else's work.
Second, mimicking the style of other artists is a generally poor way of learning how to draw. Just leaping straight into mimicry doesn't really teach you any of the fundamentals like perspective, color theory, shading, anatomy, etc. Mimicking an artist that draws lots of side profiles of animals in neutral lighting might teach you how to draw a side profile of a rabbit, but you'll be fucked the instant you try to draw that same rabbit from the front, or if you want to draw a rabbit at sunset. There's a reason why artists do so many drawings of random shit like cones casting a shadow, or a mannequin doll doing a ballet pose, and it ain't because they find the subject interesting.
Third, an artist spends anywhere from dozens to hundreds of hours practicing. Even if someone sets out expressly to mimic someone else's style and teaches themselves the fundamentals along the way, it's still months or years of hard work and practice, and a constant cycle of self-improvement, critique, and study. This applies to every artist, regardless of how naturally talented or gifted they are.
Fourth, there's a sort of natural bottleneck in how much art that artist can produce. The quality of a given piece of art scales roughly linearly with the time the artist spends on it, and even artists that specialize in speed painting can only produce maybe a dozen pieces of art a day, and that kind of pace is simply not sustainable for any length of time. So even in the least charitable scenario, where a hypothetical person explicitly sets out to mimic a popular artist's style in order to leech off their success, it's extremely difficult for the mimic to produce enough output to truly threaten their victim's livelihood. In comparison, an AI can churn out dozens or hundreds of images in a day, easily drowning out the artist's output.
And one last, very important point: artists who trace other people's artwork and upload the traced art as their own are almost universally reviled in the art community. Getting caught tracing art is an almost guaranteed way to get yourself blacklisted from every art community and banned from every major art website I know of, especially if you're claiming it's your own original work. The only way it's even mildly acceptable is if the tracer explicitly says "this is traced artwork for practice, here's a link to the original piece, the artist gave full permission for me to post this." Every other creative community--writing, music--takes a similarly dim view of plagiarism, though it's much harder to prove outright than with art. Given all this, why should the art community treat someone differently just because they laundered their plagiarism with some vector multiplication?
You literally haven't, except maybe by sticking your fingers in your ears and going "NUH UH"
but go on king
Here's the point, since you clearly missed it:
If Brave gets even a moderate market share, Google will continue to mess them around like this as they really don't like people not seeing their adverts.
Ultimately it's software, so the Brave devs can do pretty much whatever they want, limited only by the available time and money. Google's influence extends to making that either easier or harder, in much the same way as they influence the Android ecosystem.
Brave may not be particularly affected by this change, but that's beside the point. If Brave starts becoming a viable threat to Google, Google can easily start making changes to Chromium that target Brave and break the changes they make, just like they targeted uBlock Origin and broke it with Manifest V3. Brave might be able to work around these changes, but doing so costs time and developer labor (i.e. money) that would have been spent elsewhere, and if Google makes things hard enough on Brave, they could be forced to abandon the project.
Or anywhere relatively rural. I just got home from a long weekend in rural Minnesota/Wisconsin, and there's literally no viable way to run public transit out there that wouldn't either be so restrictive as to be useless, or lose so much money that it would be first on the chopping block for service cuts (and therefore become useless). I'm talking "town of 600 residents, most people live on unincorporated county land on a farmstead, and the only grocery store in a 50-mile radius is a Dollar General" rural. Asking these folks to give up cars is an insane prospect.
People always assume that generative AI (and technology in general) will continue improving at the same pace it always has. They assume that there are no limits on the number of parameters, that there's always more useful data to train on, and that physical limits like electricity infrastructure and compute resources don't exist. In five years, generative AI will have roughly the same capabilities it has today, barring a massive breakthrough that results in a wholesale pivot away from LLMs. (More likely, in five years it'll be regarded the way cryptocurrency is today, because once the hype dies down and the VC money runs out, the AI companies will have to jack up prices to a level where it's economically unviable to use in most commercial environments.)
Wrong. The infringement is in obtaining the data and presenting it to the AI model during the training process. It makes no difference that the original work is not retained in the model's weights afterwards.
Yes, because copyright law is intended to benefit human creativity.
Wrong. Search engines retain a minimal amount of the indexed website's data, and the purpose of a search engine is to drive traffic to the website, benefiting both the engine and the website (increased visibility, the opportunity to show ads and make money). Banning the use of copyrighted content for AI training--which uses the entire copyrighted work, and whose purpose is to replace the organizations whose work is being used--would have no effect on search engines.
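For contrast, this is the gist of what a search index actually retains--a minimal inverted-index sketch (real engines add ranking, snippets, and caching, but the principle holds): terms pointing back to URLs, so traffic flows to the original site instead of replacing it.

```python
from collections import defaultdict

# Minimal inverted index: map each term to the set of pages containing it.
# The index exists to send users back to the source, not to serve the
# source's content in its place. (URLs and text are hypothetical.)
pages = {
    "https://example.com/rabbits": "how to draw a rabbit in profile",
    "https://example.com/sunsets": "how to paint a sunset",
}

index = defaultdict(set)
for url, text in pages.items():
    for term in text.split():
        index[term].add(url)

print(index["rabbit"])  # {'https://example.com/rabbits'}
```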