How Quickly Do Large Language Models Learn Unexpected Skills? (nautil.us)

submitted 2 years ago by dominiquec@lemmy.world to c/technology@lemmy.world

So-called "emergent" behavior in LLMs may not be the breakthrough that researchers think.

kromem@lemmy.world 6 points 2 years ago

I'm not sure why they describe it as "a new paper"; it came out in May 2023 (and so notably used only GPT-3, not GPT-4, which is where some of the biggest leaps to date have been documented).

For those interested in the debate, the rebuttal by Jason Wei (first author of both the original emergent abilities paper and the chain-of-thought (CoT) prompting paper) is worth reading: https://www.jasonwei.net/blog/common-arguments-regarding-emergent-abilities

In particular, I find his argument at the end compelling:

"Another popular example of emergence which also underscores qualitative changes in the model is chain-of-thought prompting, for which performance is worse than answering directly for small models, but much better than answering directly for large models. Intuitively, this is because small models can’t produce extended chains of reasoning and end up confusing themselves, while larger models can reason in a more-reliable fashion."
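A back-of-the-envelope way to see why extended chains punish weaker models: assume (a simplification that is not in the quote) that each step of a reasoning chain is correct independently with the same probability p, so a k-step chain is fully correct with probability p^k. The function name `chain_success` is hypothetical; this is a toy sketch, not a model of any real LLM.

```python
# Toy model (an assumption, not from the quoted post): every step of a
# reasoning chain is correct independently with the same probability p_step.
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps independent steps are correct."""
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability in [0, 1]")
    return p_step ** n_steps


if __name__ == "__main__":
    for p in (0.80, 0.90, 0.95, 0.99):
        # A 10-step chain only succeeds if every single step does.
        print(f"per-step {p:.2f} -> 10-step chain {chain_success(p, 10):.3f}")
    # per-step 0.80 -> 10-step chain 0.107
    # per-step 0.90 -> 10-step chain 0.349
    # per-step 0.95 -> 10-step chain 0.599
    # per-step 0.99 -> 10-step chain 0.904
```

Under this toy model, raising per-step reliability from 0.80 to 0.95 lifts 10-step success from roughly 11% to roughly 60%: a modest change per step becomes a large, nonlinear jump over the whole chain.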

If you follow how prompting research has evolved lately, there's a clear pattern: newer techniques rely on the increased inherent capabilities of the models.

Whether it's having the model use analogous problems to solve a new one (https://openreview.net/forum?id=AgDICX1h50) or having it determine the optimal strategy for a given problem itself (https://arxiv.org/abs/2402.03620), state-of-the-art models show double-digit performance gains from actions that less sophisticated models simply cannot perform.

The compounding effects of competence alone mean that progress here isn't going to be a linear trajectory.