Technology

83027 readers

3628 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

Nvidia CEO Jensen Huang says ‘I think we’ve achieved AGI’ (www.theverge.com)

submitted 1 day ago by return2ozma@lemmy.world to c/technology@lemmy.world

57 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] Technus@lemmy.zip 1 points 19 hours ago (1 children)

The size of the context window is fixed in the structure of the model. LLMs are still at their core artificial neural networks, so an analogy to biology might be helpful.

Think of the input layer of the model like the retinas in your eyes. Each token in the context window, after embedding (i.e. conversion to a series of numbers, because ofc it's just all math under the hood), is fed to a certain set of input neurons, just like the rods and cones in your retina capture light and convert it to electrical signals, which are passed to neurons in your optic nerve, which connect to neurons in your visual cortex, each layer along the way processing and analyzing the signal.

The number of tokens in the context window is directly proportional to the number of neurons in the input layer of the model. To make the context window bigger, you have to add more neurons to the input layer, but that quickly results in diminishing returns without adding more neurons to the inner layers to be able to process the extra information. Ultimately, you have to make the whole model larger, which means more parameters, which means more data to store and more processing power per prompt.

[–] scarabic@lemmy.world 1 points 18 hours ago

Oh… so it’s kind of like taking something that’s few-to-many and making it many-to-many, and the number of connections is what costs you.