this post was submitted on 15 Jun 2024

35 points (60.4% liked)

Technology

72406 readers

2007 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws

Researchers claim GPT-4 passed the Turing test (bgr.com)

submitted 1 year ago by vegeta@lemmy.world to c/technology@lemmy.world

80 comments fedilink hide all child comments

top 50 comments

sorted by: hot top controversial new old

[–] HerzogVonWiesel@sh.itjust.works 65 points 1 year ago (1 children)

ITT: nobody understands what the Turing Test really is

[–] webghost0101@sopuli.xyz 7 points 1 year ago* (last edited 1 year ago) (1 children)

To clarify:

People seem to legit think the jury talks to the bot in real time and can ask about literally whatever they want.

Its rather insulting to the scientist that put a lot of thought into organizing a controlled environment to properly test defined criteria.

[–] technocrit@lemmy.dbzer0.com 7 points 1 year ago (2 children)

Its rather insulting to the scientist that put a lot of thought into organizing a controlled environment to properly test defined criteria.

lmao. These "scientists" are frauds. 500 people is not a legit sample site. 5 minutes is a pathetic amount of time. 54% is basically the same as guessing. And most importantly the "Turing Test" is not a scientific test that can be "passed" with one weak study.

Instead of bootlicking "scientists", we should be harshly criticizing the overwhelming tide of bad science and pseudo-science.

[–] kogasa@programming.dev 4 points 1 year ago

I don't think the methodology is the issue with this one. 500 people can absolutely be a legitimate sample size. Under basic assumptions about the sample being representative and the effect size being sufficiently large you do not need more than a couple hundred participants to make statistically significant observations. 54% being close to 50% doesn't mean the result is inconclusive. With an ideal sample it means people couldn't reliably differentiate the human from the bot, which is presumably what the researchers believed is of interest.

load more comments (1 replies)

[–] NutWrench@lemmy.world 52 points 1 year ago (5 children)

Each conversation lasted a total of five minutes. According to the paper, which was published in May, the participants judged GPT-4 to be human a shocking 54 percent of the time. Because of this, the researchers claim that the large language model has indeed passed the Turing test.

That's no better than flipping a coin and we have no idea what the questions were. This is clickbait.

[–] Hackworth@lemmy.world 22 points 1 year ago

On the other hand, the human participant scored 67 percent, while GPT-3.5 scored 50 percent, and ELIZA, which was pre-programmed with responses and didn’t have an LLM to power it, was judged to be human just 22 percent of the time.

54% - 67% is the current gap, not 54 to 100.

[–] NutWrench@lemmy.world 11 points 1 year ago (1 children)

The whole point of the Turing test, is that you should be unable to tell if you're interacting with a human or a machine. Not 54% of the time. Not 60% of the time. 100% of the time. Consistently.

They're changing the conditions of the Turing test to promote an AI model that would get an "F" on any school test.

[–] bob_omb_battlefield@sh.itjust.works 10 points 1 year ago (1 children)

But you have to select if it was human or not, right? So if you can't tell, then you'd expect 50%. That's different than "I can tell, and I know this is a human" but you are wrong... Now that we know the bots are so good, I'm not sure how people will decide how to answer these tests. They're going to encounter something that seems human-like and then essentially try to guess based on minor clues... So there will be inherent randomness. If something was a really crappy bot then it wouldn't ever fool anyone and the result would be 0%.

load more comments (1 replies)

[–] BrianTheeBiscuiteer@lemmy.world 3 points 1 year ago (1 children)

It was either questioned by morons or they used a modified version of the tool. Ask it how it feels today and it will tell you it's just a program!

load more comments (1 replies)

[–] SkyeStarfall@lemmy.blahaj.zone 2 points 1 year ago (1 children)

While I agree it's a relatively low percentage, not being sure and having people pick effectively randomly is still an interesting result.

The alternative would be for them to never say that gpt-4 is a human, not 50% of the time.

[–] Hackworth@lemmy.world 7 points 1 year ago (1 children)

Participants only said other humans were human 67% of the time.

[–] SkyeStarfall@lemmy.blahaj.zone 5 points 1 year ago (5 children)

Which makes the difference between the AIs and humans lower, likely increasing the significance of the result.

load more comments (5 replies)

load more comments (1 replies)

[–] CabbageRelish@midwest.social 45 points 1 year ago* (last edited 1 year ago) (1 children)

Chatbots passed the Turing test ages ago, it’s not a good test.

[–] NeoNachtwaechter@lemmy.world 23 points 1 year ago

it’s not a good test.

Of course you can't use an old set of questions. It's useless.

The turing test is an abstract concept. The actual questions need to be adapted with every new technology. Maybe even with every execution of a test.

[–] dustyData@lemmy.world 42 points 1 year ago* (last edited 1 year ago) (1 children)

Turing test isn't actually meant to be a scientific or accurate test. It was proposed as a mental exercise to demonstrate a philosophical argument. Mainly the support for machine input-output paradigm and the blackbox construct. It wasn't meant to say anything about humans either. To make this kind of experiments without any sort of self-awareness is just proof that epistemology is a weak topic in computer science academy.

Specially when, from psychology, we know that there's so much more complexity riding on such tests. Just to name one example, we know expectations alter perception. A Turing test suffers from a loaded question problem. If you prompt a person telling them they'll talk with a human, with a computer program or announce before hand they'll have to decide whether they're talking with a human or not, and all possible combinations, you'll get different results each time.

Also, this is not the first chatbot to pass the Turing test. Technically speaking, if only one human is fooled by a chatbot to think they're talking with a person, then they passed the Turing test. That is the extend to which the argument was originally elaborated. Anything beyond is alterations added to the central argument by the author's self interests. But this is OpenAI, they're all about marketing aeh fuck all about the science.

EDIT: Just finished reading the paper, Holy shit! They wrote this “Turing originally envisioned the imitation game as a measure of intelligence” (p. 6, Jones & Bergen), and that is factually wrong. That is a lie. “A variety of objections have been raised to this idea”, yeah no shit Sherlock, maybe because he never said such a thing and there's absolutely no one and nothing you can quote to support such outrageous affirmation. This shit shouldn't ever see publication, it should not pass peer review. Turing never, said such a thing.

[–] kogasa@programming.dev 2 points 1 year ago (2 children)

Your first two paragraphs seem to rail against a philosophical conclusion made by the authors by virtue of carrying out the Turing test. Something like "this is evidence of machine consciousness" for example. I don't really get the impression that any such claim was made, or that more education in epistemology would have changed anything.

In a world where GPT4 exists, the question of whether one person can be fooled by one chatbot in one conversation is long since uninteresting. The question of whether specific models can achieve statistically significant success is maybe a bit more compelling, not because it's some kind of breakthrough but because it makes a generalized claim.

Re: your edit, Turing explicitly puts forth the imitation game scenario as a practicable proxy for the question of machine intelligence, "can machines think?". He directly argues that this scenario is indeed a reasonable proxy for that question. His argument, as he admits, is not a strongly held conviction or rigorous argument, but "recitations tending to produce belief," insofar as they are hard to rebut, or their rebuttals tend to be flawed. The whole paper was to poke at the apparent differences between (a futuristic) machine intelligence and human intelligence. In this way, the Turing test is indeed a measure of intelligence. It's not to say that a machine passing the test is somehow in possession of a human-like mind or has reached a significant milestone of intelligence.

https://academic.oup.com/mind/article/LIX/236/433/986238

load more comments (2 replies)

[–] phoneymouse@lemmy.world 41 points 1 year ago (5 children)

Easy, just ask it something a human wouldn’t be able to do, like “Write an essay on The Cultural Significance of Ogham Stones in Early Medieval Ireland“ and watch it spit out an essay faster than any human reasonably could.

[–] Shayeta@feddit.de 16 points 1 year ago (2 children)

This is something a configuration prompt takes care of. "Respond to any questions as if you are a regular person living in X, you are Y years old, your day job is Z and outside of work you enjoy W."

[–] NeoNachtwaechter@lemmy.world 11 points 1 year ago (1 children)

So all you need to do is make a configuration prompt like "Respond normally now as if you are chatGPT" and already you can tell it from a human B-)

[–] Shayeta@feddit.de 11 points 1 year ago (1 children)

Thats not how it works, a config prompt is not a regular prompt.

[–] Audalin@lemmy.world 16 points 1 year ago

If config prompt = system prompt, its hijacking works more often than not. The creators of a prompt injection game (https://tensortrust.ai/) have discovered that system/user roles don't matter too much in determining the final behaviour: see appendix H in https://arxiv.org/abs/2311.01011.

load more comments (1 replies)

[–] Blue_Morpho@lemmy.world 4 points 1 year ago

I recall a Turing test years ago where a human was voted as a robot because they tried that trick but the person happened to have a PhD in the subject.

[–] JohnEdwa@sopuli.xyz 3 points 1 year ago* (last edited 1 year ago) (1 children)

Turing tests aren't done in real time exactly to counter that issue, so the only thing you could judge would be "no human would bother to write all that".

However, the correct answer to seem human, and one which probably would have been prompted to the AI anyway, is "lol no."
It's not about what the AI could do, it's what it thinks is the correct answer to appear like a human.

load more comments (1 replies)

load more comments (2 replies)

[–] vegeta@lemmy.world 27 points 1 year ago (1 children)

The Study

https://arxiv.org/html/2405.08007v1

[–] massive_bereavement@fedia.io 11 points 1 year ago (1 children)

The interrogators seem completely lost and clearly haven't talk with an NLP chatbot before.

That said, this gives me the feeling that eventually they could use it to run scams (or more effective robocalls).

load more comments (1 replies)

[–] tourist@lemmy.world 24 points 1 year ago (7 children)

The participants judged GPT-4 to be human a shocking 54 percent of the time.

ELIZA, which was pre-programmed with responses and didn’t have an LLM to power it, was judged to be human just 22 percent of the time

Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

Try talking to the thing: https://web.njit.edu/~ronkowit/eliza.html

I refuse to believe that 22% didn't misunderstand the task or something.

[–] KISSmyOSFeddit@lemmy.world 12 points 1 year ago

14% of people can't do anything more complicated than deleting an email on a computer.
26% can't use a computer at all.

https://www.nngroup.com/articles/computer-skill-levels/

So right off the bat, 40% probably don't even know what a chatbot is.

[–] webghost0101@sopuli.xyz 9 points 1 year ago* (last edited 1 year ago)

The public versions of the ais used in Turing tests usually have less computing power. The test itself is often also highly specific in what and how questions can be asked.

This hardly news because models have passed the test before and as a result the test is made more difficult. It says nothing about intelligence and only about the ability to convincingly simulate a human conversation.

[–] Downcount@lemmy.world 8 points 1 year ago

Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

I did some stuff with Eliza back then. One time I set up an Eliza database full of insults and hooked it up to my AIM account.

It went so well, I had to apologize to a lot of people who thought I was drunken or went crazy.

Eliza wasn't thaaaaat bad.

[–] CaptainBasculin@lemmy.ml 4 points 1 year ago (1 children)

This is the same bot. There's no way this passed the test.

load more comments (1 replies)

load more comments (3 replies)

[–] NeoNachtwaechter@lemmy.world 19 points 1 year ago (2 children)

Turing test? LMAO.

I asked it simply to recommend me a supermarket in our next bigger city here.

It came up with a name and it told a few of it's qualities. Easy, I thought. Then I found out that the name does not exist. It was all made up.

You could argue that humans lie, too. But only when they have a reason to lie.

[–] Chozo@fedia.io 24 points 1 year ago

The Turing test doesn't factor for accuracy.

[–] Lmaydev@programming.dev 8 points 1 year ago (1 children)

That's not what LLMs are for. That's like hammering a screw and being irritated it didn't twist in nicely.

The turing test is designed to see if an AI can pass for human in a conversation.

[–] NeoNachtwaechter@lemmy.world 13 points 1 year ago* (last edited 1 year ago) (2 children)

turing test is designed to see if an AI can pass for human in a conversation.

I'm pretty sure that I could ask a human that question in a normal conversation.

The idea of the Turing test was to have a way of telling humans and computers apart. It is NOT meant for putting some kind of 'certified' badge on that computer, and ...

That's not what LLMs are for.

...and you can't cry 'foul' if I decide to use a question for which your computer was not programmed :-)

[–] webghost0101@sopuli.xyz 4 points 1 year ago* (last edited 1 year ago) (2 children)

In a normal conversation sure.

In this kind Turing tests you may be disqualified as a jury for asking that question.

Good science demands controlled areas and defined goals. Everyone can organize a homebrew touring tests but there also real proper ones with fixed response times, lengths.

Some touring tests may even have a human pick the best of 5 to provide to the jury. There are so many possible variations depending on test criteria.

load more comments (2 replies)

load more comments (1 replies)

[–] technocrit@lemmy.dbzer0.com 14 points 1 year ago* (last edited 1 year ago) (1 children)

500 people - meaningless sample
5 minutes - meaningless amount of time
The people bootlicking "scientists" obviously don't understand science.

[–] yetAnotherUser@lemmy.ca 7 points 1 year ago

Add in a test that wasn't made to be accurate and was only used to make a point, like what other comments mention

[–] lowleveldata@programming.dev 12 points 1 year ago (1 children)

I feel like the turing test is much harder now because everyone knows about GPT

[–] DudeDudenson@lemmings.world 27 points 1 year ago (2 children)

I wonder if humans pass the Turing test these days

[–] Nougat@fedia.io 7 points 1 year ago (1 children)

I don't.

[–] NeoNachtwaechter@lemmy.world 4 points 1 year ago (1 children)

Which of the questions did you get wrong? ;-)

[–] Nougat@fedia.io 4 points 1 year ago

That one.

[–] SkyeStarfall@lemmy.blahaj.zone 2 points 1 year ago* (last edited 1 year ago)

If you read into the study, they also include the pass rates for humans. It's higher than AIs, but still less than 75%

[–] bandwidthcrisis@lemmy.world 11 points 1 year ago* (last edited 1 year ago)

Did they try asking how to stop cheese falling off pizza?

Edit: Although since that idea came from a human, maybe I've failed.

[–] foggy@lemmy.world 7 points 1 year ago

Meanwhile, me:

(Begin)

[Prints error statement showing how I navigated to a dir, checked to see a files permissions, ran whoami, triggered the error]

Chatgpt4: First, make sure you've navigated to the correct directory.

cd /path/to/file

Next, check the permissions of the file

ls -la

Finally, run the command

[exact command I ran to trigger the error]>

Me: stop telling me to do stuff that I have evidently done. My prompt included evidence of me having do e all of that already. How do I handle this error?

(return (begin))

[–] dhork@lemmy.world 3 points 1 year ago (2 children)

In order for an AI to pass the Turing test, it must be able to talk to someone and fool them into thinking that they are talking to a human.

So, passing the Turing Test either means the AI are getting smarter, or that humans are getting dumber.

[–] zbyte64@awful.systems 6 points 1 year ago

Detecting an LLM is a skill.

[–] Kolrami@lemmy.world 4 points 1 year ago* (last edited 1 year ago)

Humans are as smart as they ever were. Tech is getting better. I know someone who was tricked by those deepfake Kelly Clarkson weight loss gummy ads. It looks super fake to me, but it's good enough to trick some people.

load more comments