this post was submitted on 29 Aug 2024
53 points (70.5% liked)

Technology

[–] lvxferre@mander.xyz 25 points 2 months ago* (last edited 2 months ago) (2 children)

I'm not from the USA, not black, and not a native English speaker, but thanks to my background in linguistics I can give you guys some further info.

AAE (African American English), in a nutshell, is a group of English varieties used by some speakers in the USA and Canada. In a lot of aspects they resemble geographical varieties, like the ones you'd see in plenty of other languages, but there's a key difference: they aren't used by people "of a certain region", but rather by people "of a certain race" (black people).

AAE is mostly, but not entirely, spoken (hence the term AAVE; the "V" stands for "vernacular"). It also affects the way those people use the written language, so you often see AAE features in written English, such as:

  • Negative concord - for example, "I don't want to hear nothing about this shit, man."
  • Habitual be - for example, "They be talking about this every day."
  • bits of non-standard spelling, due to phonetic differences
  • expressions and vocab used primarily by black people

What the article is saying is that LLMs are biased against those features. It's a rather strong bias, and it wasn't observed for the geographical variety used as a reference (Appalachian English). In other words: the LLM has been fed racist babble, and now it's regurgitating it.

[–] yamanii@lemmy.world 4 points 2 months ago (1 children)

I see, so that's very different from most countries, I imagine? People usually speak their own local dialect; here, a northeasterner would informally speak a completely different Portuguese from someone from the south, regardless of race.

[–] lvxferre@mander.xyz 4 points 2 months ago

Yup, it's atypical even in the rest of the Americas. I think the nearest equivalent in Portuguese would be the quilombola dialects, but even then it's way off, because those dialects are still geographically associated with their respective quilombos, not just with race.

[–] givesomefucks@lemmy.world 3 points 2 months ago (1 children)

Since they’re vernacular you’ll mostly hear them being spoken, they aren’t really written

AAVE is commonly "written" now because most writing is texts and social media comments. So even if they luck out and learn "proper" English, people are still going to type on their phones the same way they talk.

Even for white kids, most Gen Z slang is just taken from AAVE. When older people complain about not being able to read zoomer slang in texts or comments, it's because that slang is heavily influenced by AAVE.

There's been bleed-over for centuries, but with the Internet and social media it's merging faster, which is common for dialects of people that interact frequently.

[–] lvxferre@mander.xyz 9 points 2 months ago (1 children)

Warning: I've edited the comment that you're replying to. I'm saying this for the sake of transparency, as you're clearly quoting the earlier version.

The key here is that AAVE is not written, but AAE is. That "V" is for vernacular, it excludes written English by definition.

Now, I'm not sure if those white kids are using AAE or simply borrowing things from AAE into their written English. I just don't have data on that.

There’s been bleed over for centuries, but with the Internet and social media it’s merging faster, which is common for dialects of people that interact frequently

Varieties merging or splitting is rarely the result of just more contact between people; it's all about identity. If things are happening as you described them, it's simply that those white kids stopped seeing black people as "the others" and started seeing them as "part of the same group as us".

[–] givesomefucks@lemmy.world -2 points 2 months ago (1 children)

That “V” is for vernacular, it excludes written English by definition.

Yeah. But most people "write" online like they speak...

https://commonwealthtimes.org/2021/02/18/aave-is-not-your-internet-slang-it-is-black-culture/

If people followed rules about language, yeah, vernacular would just be spoken speech. But that's not how it works. The rules are made to reflect what people are doing. The rules don't control what people do.

So yes, while the word vernacular commonly meant only spoken words, there ain't nothing stopping nobody from typing like they speak.

And people been doing it for a long time

[–] lvxferre@mander.xyz 12 points 2 months ago (2 children)

Yeah. But most people “write” online like they speak…

That's a common misconception.

While your written and spoken varieties do interact a fair bit, no, people don't "write like they speak". Not even online.

And that is not simply an "ackshyually". A lot of AAVE features simply don't transpose into writing: prosody, non-rhoticity, /ɪ/-breaking, /äɪ/-monophthongisation... At most you can consciously approximate them in writing, but the features themselves won't be there.

If people followed rules about language, yeah, vernacular would just be spoken speech. But that’s not how it works. The rules are made to reflect what people are doing.

This isn't about people following or not following "rules"; it's about nomenclature. That's exactly why "AAE" and "AAVE" are needed as separate terms.

[–] treefrog@lemm.ee -2 points 2 months ago (1 children)

More and more people are using speech-to-text. And it does show how differently people speak from how they write (apparently I never say the "be" in "because", for example).

But it also means that LLMs aren't only being fed text, but also speech converted into text.

[–] lvxferre@mander.xyz 4 points 2 months ago

For me it's like "holy fuck... do I eat so fucking many vowels???" It got to the point that I eventually gave up using speech-to-text in Portuguese on my cell phone; I go straight for Italian, because at least then it gets me right.

But it also means that LLMs aren’t only being fed text, but also speech converted into text.

That might be part of the issue causing the bias shown in the article.