The article discusses the mysterious nature of large language models and their remarkable capabilities, focusing on the challenges of understanding why they work. Researchers at OpenAI stumbled upon unexpected behavior while training language models, highlighting phenomena such as "grokking" and "double descent" that defy conventional statistical explanations. Despite rapid advancements, deep learning remains largely trial-and-error, lacking a comprehensive theoretical framework. The article emphasizes the importance of unraveling the mysteries behind these models, not only for improving AI technology but also for managing potential risks associated with their future development. Ultimately, understanding deep learning is portrayed as both a scientific puzzle and a critical endeavor for the advancement and safe implementation of artificial intelligence.

[–] Redacted@lemmy.world 5 points 8 months ago* (last edited 8 months ago) (2 children)

This article, along with others covering the topic, seems to foster an air of mystery about machine learning which I find quite off-putting.

> Known as generalization, this is one of the most fundamental ideas in machine learning—and its greatest puzzle. Models learn to do a task—spot faces, translate sentences, avoid pedestrians—by training with a specific set of examples. Yet they can generalize, learning to do that task with examples they have not seen before.

Sounds a lot like Category Theory to me, which is all about abstracting rules as far as possible to form associations between concepts. This would explain other phenomena discussed in the article.

> Like, why can they learn language? I think this is very mysterious.

Potentially because language structures can be encoded as categories. Any possible concept, including the whole of mathematics, can be encoded as relationships between objects in Category Theory. For more info see this excellent video.
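To make that a bit more concrete, here's a rough toy sketch in Python (the names and structure are entirely made up for illustration, not a claim about how LLMs actually represent anything): objects stand for concepts, and composable morphisms stand for the associations between them.

```python
# Toy sketch only: concepts as objects, associations as composable morphisms.
# Names are made up for illustration; this is not how an LLM represents anything.

class Morphism:
    """A named arrow (association) from one concept-object to another."""
    def __init__(self, name, source, target):
        self.name, self.source, self.target = name, source, target

    def then(self, other):
        """Compose self: A -> B with other: B -> C to get A -> C."""
        assert self.target == other.source, "arrows must be composable"
        return Morphism(f"{self.name} ; {other.name}", self.source, other.target)

# Objects (concepts) are just string labels in this toy.
is_an = Morphism("is_an", "dog", "animal")
can = Morphism("can", "animal", "move")

# Composition is what lets abstract rules chain into new associations.
dogs_move = is_an.then(can)
print(f"{dogs_move.name}: {dogs_move.source} -> {dogs_move.target}")
# prints: is_an ; can: dog -> move
```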

> He thinks there could be a hidden mathematical pattern in language that large language models somehow come to exploit: “Pure speculation but why not?”

Sound familiar?

> models could seemingly fail to learn a task and then all of a sudden just get it, as if a lightbulb had switched on.

Maybe there is a threshold probability of a posited association being correct, and after enough iterations the model flips it to "true".
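Purely to illustrate the kind of dynamic I'm speculating about (a made-up toy, not actual training dynamics): confidence in an association creeps up gradually, but the outward behaviour only switches on once it crosses a threshold, which from the outside looks sudden.

```python
import random

# Made-up toy, not real training dynamics: confidence in a posited association
# grows slowly each iteration, but the visible behaviour only "flips" once it
# crosses an arbitrary threshold -- which from the outside looks like a lightbulb.
random.seed(42)

THRESHOLD = 0.95
confidence = 0.0

for iteration in range(1, 1001):
    confidence += random.uniform(0.0, 0.01)  # small, noisy gain per iteration
    if confidence >= THRESHOLD:
        print(f"association flips to 'true' at iteration {iteration}")
        break
```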

I'd prefer articles to discuss the underlying workings, even speculatively like the above, rather than perpetuating the "it's magic, no one knows" narrative. Too many people (especially here on Lemmy, it has to be said) pick that up and run with it rather than thinking critically about the topic and formulating their own hypotheses.

[–] orclev@lemmy.world 5 points 8 months ago

Yeah, pretty much this. My understanding of the way LLMs function is that they operate on statistical associations of words, which would amount to categories in Category Theory. Basically, the training phase classifies words into categories based on the examples in the training input. Then, when you feed it a prompt, it just uses those categories to parse and "solve" your prompt. It's not "mysterious"; it's just opaque, because it's an incredibly complicated model. Exactly the sort of thing that people are really bad at working with, but which computers are really good at.
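As a drastically simplified toy of that idea (raw word-association counts standing in for what is really a huge learned model, so treat it as a sketch only):

```python
from collections import Counter, defaultdict

# Drastically simplified toy: "statistical associations of words" as bigram counts.
# A real LLM learns these associations in billions of weights, not a count table.
corpus = "the cat sat on the mat the cat ate the fish".split()

associations = defaultdict(Counter)
for word, next_word in zip(corpus, corpus[1:]):
    associations[word][next_word] += 1  # "training": tally what tends to follow what

def complete(prompt_word):
    """'Solve' a one-word prompt by returning its most strongly associated follower."""
    follower, _count = associations[prompt_word].most_common(1)[0]
    return follower

print(complete("the"))  # -> "cat", the word most often seen after "the" in training
```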
