this post was submitted on 15 Jun 2024

[–] NutWrench@lemmy.world 52 points 6 months ago (15 children)

> Each conversation lasted a total of five minutes. According to the paper, which was published in May, the participants judged GPT-4 to be human a shocking 54 percent of the time. Because of this, the researchers claim that the large language model has indeed passed the Turing test.

That's no better than flipping a coin and we have no idea what the questions were. This is clickbait.
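
Whether 54% is distinguishable from a coin flip depends on how many judgments were collected, which the quoted excerpt does not state. A minimal sketch of an exact two-sided binomial test, using purely hypothetical sample sizes (the `trials` values below are assumptions, not figures from the paper):

```python
from math import exp, lgamma, log

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes in n trials (log-space to avoid overflow)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))

def binom_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of outcomes no more likely than the observed one."""
    pmf = [binom_pmf(k, trials, p) for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance so the mirror-image outcome is not lost to rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

if __name__ == "__main__":
    # Hypothetical sample sizes; the real number of judgments is not given in the thread.
    for trials in (100, 500, 2000):
        successes = round(0.54 * trials)
        p_value = binom_two_sided_p(successes, trials)
        print(f"n={trials}: {successes} 'human' verdicts, p-value vs. coin flip = {p_value:.4f}")
```

With a small sample, 54% cannot be told apart from 50%; with a large enough one it can. So "no better than flipping a coin" is a claim about sample size as much as about the headline number.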

[–] SkyeStarfall@lemmy.blahaj.zone 2 points 6 months ago (7 children)

While I agree it's a relatively low percentage, not being sure and having people pick effectively randomly is still an interesting result.

The alternative would be for them to never say that GPT-4 is human, rather than saying so about 50% of the time.

[–] Hackworth@lemmy.world 7 points 6 months ago (6 children)

Participants only said other humans were human 67% of the time.

[–] SkyeStarfall@lemmy.blahaj.zone 5 points 6 months ago (1 children)

Which makes the difference between the AIs and humans lower, likely increasing the significance of the result.
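
One way to put a number on that gap is a two-proportion z-test comparing the 54% rate for GPT-4 with the 67% rate for humans. This is a rough sketch only: the group sizes are hypothetical, since the thread gives percentages but not counts, and the function name is made up for illustration.

```python
from math import erfc, sqrt

def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two observed proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)          # pooled rate under "no difference"
    std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = erfc(abs(z) / sqrt(2))                   # two-sided normal tail probability
    return z, p_value

if __name__ == "__main__":
    # Hypothetical group sizes; the actual counts are not stated in the thread.
    for n in (100, 500):
        z, p_value = two_proportion_z(0.67, 0.54, n, n)
        print(f"n={n} per group: z = {z:.2f}, p-value = {p_value:.4f}")
```

A 13-point gap is borderline with around a hundred judgments per group and clear-cut with several hundred, so the human baseline matters only alongside the sample size.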

[–] Hackworth@lemmy.world 1 points 6 months ago (1 children)

Aye, I'd wager Claude would be closer to 58-60. And with the model probing Anthropic's publishing, we could get to like ~63% on average in the next couple years? Those last few % will be difficult for an indeterminate amount of time, I imagine. But who knows. We've already blown by a ton of "limitations" that I thought I might not live long enough to see.

[–] dustyData@lemmy.world 2 points 6 months ago (1 children)

The problem with that is that you can change the percentage of people who correctly identify other humans as humans simply by changing the way you set up the test. If you tell people they will, for certain, be talking to x number of bots, they will make their answers conform to that expectation and the correctness of their answers drops to 50%. Humans are really bad at determining whether a chat is with a human or a bot, and AI is no better either. These kinds of tests mean nothing.

[–] Hackworth@lemmy.world 1 points 6 months ago (1 children)

> Humans are really bad at determining whether a chat is with a human or a bot

Eliza is not indistinguishable from a human at 22%.

Passing the Turing test stood largely out of reach for 70 years precisely because humans are pretty good at spotting counterfeit humans.

This is a monumental achievement.

[–] dustyData@lemmy.world 0 points 6 months ago* (last edited 6 months ago) (1 children)

First, that is not how that statistic works; you are reading it entirely wrong.

Second, this test is intentionally designed to be misleading. Comparing ChatGPT to Eliza is the equivalent of me claiming that the Chevy Bolt is the fastest car to ever enter a highway by comparing it to a 1908 Ford Model T. It completely ignores a huge history of technological developments. There were equally successful chatbots before ChatGPT; they just weren't LLMs, and they were measured by other methods and systematic trials. The Turing test is not actually a scientific test of anything, so it isn't standardized in any way. Anyone is free to claim to have run a Turing test whenever and however they like, without much control. It is meaningless and proves nothing.
