this post was submitted on 23 Feb 2026
710 points (97.3% liked)

A screenshot of this question was making the rounds last week. But this article covers testing against all the well-known models out there.

Also includes outtakes on the 'reasoning' models.

[–] Slashme@lemmy.world 69 points 2 weeks ago (21 children)

The most common pushback on the car wash test: "Humans would fail this too."

Fair point. We didn't have data either way. So we partnered with Rapidata to find out. They ran the exact same question with the same forced choice between "drive" and "walk," no additional context, past 10,000 real people through their human feedback platform.

71.5% said drive.

So people do better than most AI models. Yay. But seriously, almost 3 in 10 people get this wrong‽‽

[–] masterofn001@lemmy.ca 1 points 2 weeks ago* (last edited 2 weeks ago) (14 children)

Without reading the article, the title just says wash the car.

I could go for a walk and wash my car in my driveway.

Reading the article... That is exactly the question asked. It is a very ambiguous question.

*I do understand the intent of the question, but it could be phrased more clearly.

[–] elucubra@sopuli.xyz 4 points 2 weeks ago (1 children)

It is not. It says what I want to do, and where.

[–] masterofn001@lemmy.ca 2 points 2 weeks ago* (last edited 2 weeks ago)

Understanding the intent of the question, and understanding why it could be interpreted differently, and understanding why it is a poorly phrased question:

There are 3 sentences.

I want to wash my car. No location or method is specified. No 'at the car wash'. No 'take my car to the car wash'. No 'take the car through the car wash'.

A car wash is this far. Is this an option? A question. A suggestion. A demand?

Should I walk or drive? To do what? Wash the car? Ok. If the car wash is an option, that seems very far. But walking there seems silly. Since no method or location for washing the car was mentioned I could wash my own car.

Do you see how this works?

Yes, you can infer what was implied, but the question itself offers no certainty that what you infer is what it is actually implying.
