this post was submitted on 23 Feb 2024
Technology
They got the training data from Reddit; what did they expect?
It's not the training data that's the problem here.
Yeah, it is. The training data skews white, so they added a "make some people non-white" kludge. It wouldn't be needed if there were actually racial diversity in the training data.
It's the "make some people non-white" kludge that's the specific problem being discussed here.
The training data skewing white is a different problem, but IMO not as big of one. The solution is simple, as I've discovered over many months of using local image generators: let the user specify exactly what they want.
I don’t even see the problem with that. If Western corps make an AI based overwhelmingly on Western (i.e. majority-white) datasets, they get an AI that skews white in all things.
If they want more well-rounded data, they would need to buy it from China and India, and probably other parts of Asia too. But I don’t think those countries are willing to give those datasets away, because they are aware of their actual value and/or are more interested in creating their own AI with them (which would then, of course, skew Chinese, for example).