
ChatGPT Is Still a Bullshit Machine (gizmodo.com)

submitted 2 months ago by chobeat@lemmy.ml to c/technology@lemmy.world


[–] Pechente@feddit.org 19 points 2 months ago (1 children)

Yeah, right? I tried it yesterday to build a simple form for me. I told it to look at the structure of other forms for reference, which it did, and somehow it used NONE of the UI components and helpers from the other forms. It was bafflingly bad.

[–] errer@lemmy.world 19 points 2 months ago (2 children)

Despite the “official” coding score for GPT5 being higher, Claude Sonnet still seems to blow it out of the water. That seems to suggest they are training to the test and the test must not be a very good test. Or they are lying.

[–] elvith@feddit.org 22 points 2 months ago (1 children)

They'd never be lying! Look at these beautiful graphs from their presentation of GPT5. They'd never!

Source: https://www.theverge.com/news/756444/openai-gpt-5-vibe-graphing-chart-crime

[–] errer@lemmy.world 12 points 2 months ago (1 children)

Wut…did GPT5 evaluate itself?

[–] elvith@feddit.org 18 points 2 months ago* (last edited 2 months ago)

Now that we have vibe coding and all programmers have been sacked, they're apparently trying out vibe presenting and vibe graphing. Management, watch out, you're obviously next!

[–] jj4211@lemmy.world 2 points 2 months ago

The problem with the "benchmarks" is Goodhart's Law: once a measure becomes a target, it ceases to be a good measure.

The AI companies' obsession with these tests causes them to maniacally train on them, making them better at those tests, but that doesn't necessarily map to actual real-world usefulness. Occasionally you'll see a guy who interviews well but is pretty useless on the job. LLMs are basically that all the time, but they are at least useful because they are cheap and fast enough to be worth it for super easy bits.