And even if local small-scale models turn out to be optimal, that wouldn't stop big business from using them. I'm not sure what "it" is being referred to with "I hope it collapses."
FaceDeer
Conversely, there are way too many people who think that humans are magic and that it's impossible for AI to ever do <insert whatever is currently being debated here>.
I've long believed that there's a smooth spectrum between not-intelligent and human-intelligent. It's not a binary yes/no sort of thing. There are basic inert rocks at one end and humans at the other, and everything else gets scattered at various points in between. So I think it's fine to discuss where exactly on that scale LLMs fall, and accept the possibility that they're moving in our direction.
I actually think public perception is not going to be that big a deal one way or the other. A lot of decisions about AI applications will be made by businessmen in boardrooms, and people will be presented with the results without necessarily even knowing that it's AI.
Those recent failures only come across as cracks for people who see AI as magic in the first place. What they're really cracks in is people's misperceptions about what AI can do.
Recent AI advances are still amazing and world-changing. People have been spoiled by science fiction, though, and are disappointed that it's not the person-in-a-robot-body kind of AI that they imagined they were being promised. Turns out we don't need to jump straight to that level to still get dramatic changes to society and the economy out of it.
I get strong "everything is amazing and nobody is happy" vibes from this sort of thing.
And some of those hosts can decide to serve up their content to AI trainers. Some of those hosts can be run by AI trainers, specifically to gather data for training. If one were to try to prevent that, one would be attacking the open nature of the fediverse.
There have been many people raging about their content being used to train AIs without permission or compensation. I'm speaking to those people, not the "fediverse collectively". As you suggest, the fediverse can't say anything collectively.
It's the "make some people non-white" kludge that's the specific problem being discussed here.
The training data skewing white is a different problem, but IMO not as big of one. The solution is simple, as I've discovered over many months of using local image generators. Let the user specify what exactly they want.
It's not the training data that's the problem here.
The term "AI" has a much broader meaning and use than the sci-fi "thinking machine" that people are interpreting it as. The term has been in use by scientists for many decades already, and these generative image programs and LLMs definitely fit within it.
You are likely thinking of AGI, or artificial general intelligence. We don't have those yet, but these things aren't intended to be AGI so that's to be expected.
And even if it was Google, these companies aren't magic. Once there's a proof of concept out there that something like this can be done, other companies will dump resources into catching up with it. Cue the famous "we have no moat" memo.
Yup. There are dumps of Reddit's entire archive of comments and posts available via torrent. I suspect the only reason Reddit's getting paid for that stuff right now is that it's a legal ass-covering that's comparatively cheap. Anyone who's a little daring could use it to train an LLM, and if they prep the data well enough it'd be hard to even notice.
Well, I hope my answer clarifies it. You can't prevent LLMs from being trained on your public posts.
There was an interesting paper published just recently titled "Generative Models: What do they know? Do they know things? Let's find out!" (a lot of fun names and titles in the AI field these days :) ). It does a lot of work in actually analyzing what AI image generators "know" about what they're depicting. They seem to have an awareness of three-dimensional space, of light and shadow and reflectivity, lots of things you wouldn't necessarily expect from something trained just on 2-D images tagged with a few short descriptive sentences. This article from a few months ago also delved into this. It showed that when you ask a generative AI to create a picture of a physical object, the first thing the AI does is come up with the three-dimensional shape of the scene before it starts figuring out what it looks like. Quite interesting stuff.