vrighter

joined 2 years ago
[–] vrighter@discuss.tchncs.de 1 points 3 months ago* (last edited 3 months ago) (12 children)

the probabilities are also fixed after training. You seem to be conflating running the llm with different input with the model somehow adapting. The new context goes into the same fixed model. And yes, it can be reduced to fixed transition logic; you just need to have all possible token combinations in the table. This is obviously intractable due to space issues, so we came up with a lossy compression scheme for it. The table itself is learned once, then it's fixed. The training goes into generating a huge markov chain. Just because the table is learned from data doesn't change what it actually is.
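
to make the "learned once, then fixed" point concrete, here's a toy order-2 markov chain built from a tiny made-up corpus (the corpus and names are purely illustrative, nothing to do with any real model):

```
from collections import Counter, defaultdict

# Order-2 markov chain: for every 2-token context, count what follows,
# then normalise. This *is* the state transition table, learned once
# from data and fixed afterwards.
def build_table(tokens, order=2):
    counts = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        counts[context][tokens[i + order]] += 1
    return {ctx: {tok: c / sum(nxt.values()) for tok, c in nxt.items()}
            for ctx, nxt in counts.items()}

corpus = "the cat sat on the mat the cat ran off the mat".split()
table = build_table(corpus)
print(table[("the", "cat")])   # {'sat': 0.5, 'ran': 0.5}

# An llm can't store this table for a 50k-token vocabulary and a long
# context (far too many rows), so the weights act as a lossy, learned
# approximation of it instead.
```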

[–] vrighter@discuss.tchncs.de 1 points 3 months ago (14 children)

an llm works the same way! Once it's trained, none of what you said applies anymore. The same model can respond differently to the same inputs specifically because after the llm does its job, we sometimes intentionally don't pick the most likely token, but choose a different one instead. RANDOMLY. Set the temperature to 0 and it will always reply with the same answer. And llms also have a fixed-order state transition: just because you only typed one word doesn't mean that token isn't preceded by n-1 null tokens. The llm always receives the same number of tokens; it cannot work with an arbitrary number of tokens.
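
rough sketch of what the temperature knob does at sampling time (toy logits, not from any real model):

```
import math, random

def sample_token(logits, temperature):
    # logits: the fixed model's scores for each candidate next token
    # temperature == 0 -> always take the single most likely token
    # temperature  > 0 -> pick randomly from the softmax distribution
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    weights = [math.exp(l / temperature) for l in logits]
    return random.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.0, 0.1]           # same model output for the same context
print(sample_token(logits, 0))     # always 0
print(sample_token(logits, 1.0))   # usually 0, sometimes 1 or 2
```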

all relevant information "remains in the prompt" only until it slides out of the context window, just like any markov chain.
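
and a sketch of the fixed-size window I mean (the window length and pad token are made up for illustration):

```
CONTEXT_LEN = 8      # made-up window size, real models just use a bigger n
PAD = "<null>"

def build_window(tokens, n=CONTEXT_LEN, pad=PAD):
    # the model always sees exactly n tokens: left-pad short inputs,
    # and anything older than the last n tokens is simply gone
    tokens = tokens[-n:]
    return [pad] * (n - len(tokens)) + tokens

print(build_window(["hello"]))                     # 7 pads + "hello"
print(build_window([f"t{i}" for i in range(12)]))  # only the last 8 survive
```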

[–] vrighter@discuss.tchncs.de 1 points 3 months ago (16 children)

an llm also works on fixed transition probabilities. All the training is done during the generation of the weights, which are the compressed state transition table. After that, it's just a regular old markov chain. I don't know why you seem so fixated on getting different output when you provide different input (as I said, each token generated is a separate, independent invocation of the llm with a different input). That is true of most computer programs.

It's just an implementation detail. The markov chains we are used to have a very short context, due to the combinatorial explosion when generating the state transition table. With llms, we can use a much, much longer context. Put that context in, it runs through the completely immutable model, and out comes a probability distribution. Any calculations done while producing that probability distribution are then discarded, the chosen token is added to the context, and the program is run again with zero prior knowledge of any reasoning about the token it just generated. It's a separate execution with absolutely nothing shared between them, so there can't be any "adapting" going on.
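
here's roughly what that loop looks like, with a dummy stand-in for the frozen model (purely illustrative, not any real inference code):

```
import random

def frozen_model(context):
    # stand-in for the real thing: a pure function of its input,
    # no state survives between calls
    random.seed(" ".join(context))
    vocab = ["a", "b", "c", "d"]
    weights = [random.random() for _ in vocab]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(vocab, weights)}

def generate(prompt, steps, window=8):
    context = list(prompt)
    for _ in range(steps):
        dist = frozen_model(context[-window:])   # fresh, independent invocation
        token = max(dist, key=dist.get)          # greedy pick (temperature 0)
        context.append(token)                    # only the chosen token is kept;
                                                 # everything else is thrown away
    return context

print(generate(["the", "cat"], steps=5))
```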

[–] vrighter@discuss.tchncs.de 2 points 3 months ago* (last edited 3 months ago) (18 children)

their input is the context window. Markov chains also use their whole context window. Llms are a novel implementation that can work with much longer contexts, but as soon as something slides out of the window, it's forgotten, just like in any other markov chain. They don't adapt. You add their token to the context, slide the oldest one out, and then you have a different context, on which you run the same thing again. A normal markov chain will also give you a different output if you give it a different context. Their biggest weakness is that they don't and can't adapt. You are confusing the encoding of the context with the model itself. Just to see how static the model is, try setting temperature to 0 and giving it the same context, i.e. only try to predict one token with the exact same context each time. As soon as you try to predict a 2nd token, you've just changed the input and run the thing again. It's not adapting; you asked it something different, so it came up with a different answer.
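
quick illustration of that experiment, with a dummy pure function standing in for the frozen, temperature-0 model (hypothetical, just to show the point):

```
import hashlib

def frozen_model(context):
    # dummy stand-in: a pure function of its input, like a temperature-0 llm
    digest = hashlib.sha256(" ".join(context).encode()).hexdigest()
    return digest[:8]   # pretend this is the chosen next token

ctx = ["once", "upon", "a", "time"]
assert frozen_model(ctx) == frozen_model(ctx)   # same context -> same answer, always
ctx2 = ctx + [frozen_model(ctx)]                # append the token: that's a new context
print(frozen_model(ctx), frozen_model(ctx2))    # different input -> different output
```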

[–] vrighter@discuss.tchncs.de 2 points 3 months ago (20 children)

the previous input goes in. A completely static, prebuilt model processes it and comes up with a probability distribution.

There is no "unlike markov chains". They are markov chains. Ones with a long context (a markov chain also makes use of all the context provided to it, so I don't know what you're on about there). LLMs are just a (very) lossy compression scheme for the state transition table. Computed once, applied blindly to any context fed in.
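
a sketch of that equivalence: both things expose the exact same interface, context in, distribution out (the "learned" scoring below is a made-up toy, not a real forward pass):

```
import math

def table_chain(table):
    # classical markov chain: explicit, precomputed transition table
    return lambda context: table[tuple(context)]

def learned_chain(weights, vocab):
    # llm-style: the full table would be astronomically large, so a fixed
    # function of frozen weights computes an approximate row on demand
    def forward(context):
        scores = {tok: sum(weights.get((c, tok), 0.0) for c in context)
                  for tok in vocab}
        z = sum(math.exp(s) for s in scores.values())
        return {tok: math.exp(s) / z for tok, s in scores.items()}
    return forward

# either way the caller just feeds in a context and gets a distribution back
explicit = table_chain({("the", "cat"): {"sat": 0.7, "ran": 0.3}})
learned = learned_chain({("cat", "sat"): 1.0, ("cat", "ran"): 0.5}, vocab=["sat", "ran"])
print(explicit(["the", "cat"]))
print(learned(["the", "cat"]))
```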

[–] vrighter@discuss.tchncs.de 2 points 3 months ago* (last edited 3 months ago) (1 children)

where did i say it's less secure? I said it will be coded around, as in forked and the changes patched out/worked around. The point is that it's pointless to even try, because it won't work for those who do choose to use it, due to all the ones bypassing it.

[–] vrighter@discuss.tchncs.de 5 points 3 months ago (4 children)

if it's linux, it has to be open source. If it's open source, people will code around it immediately. How about not trying to shoehorn this useless crap in the first place?

[–] vrighter@discuss.tchncs.de 5 points 3 months ago

the summary (not necessarily ai generated) I read elsewhere is what got me to wikipedia in the first place.

[–] vrighter@discuss.tchncs.de 16 points 3 months ago (3 children)

it also would have very publicly been a huge failure. Tesla tended to ignore the science when he didn't like it. It could not have possibly worked.

[–] vrighter@discuss.tchncs.de 6 points 3 months ago (1 children)

eh, it's close enough. I only know up to 3.141592653589793238462643383279502884197169399375 by heart.

[–] vrighter@discuss.tchncs.de 7 points 3 months ago (8 children)
[–] vrighter@discuss.tchncs.de 4 points 3 months ago (1 children)

if only they made the thumbnails just a little bit bigger. My monitor still fits 9 of them, tsk tsk
