Yeah, that implies that the other network(s) can tell right from wrong. Which they can't. Because if they could, the problem wouldn't need solving.
vrighter
the problem isn't being pro ai. It's people pulling supposed ai capabilities out of their asses without having actually looked at a single line of code. This is obvious to anyone who has coded a neural network. Yes, even to openai themselves, but if they let you believe that, then the money stops flowing. You simply can't get an 8-ball to give the correct answer consistently. Because it's fundamentally random.
yes it is, and it doesn't work.
edit: to expand, if you're generating data, it's an estimation. The network will learn the same biases and make the same mistakes and assumptions you did when generating the data. Also, outliers won't be in the set (because you didn't know about them, so the network never sees any).
also, what you described has already been studied. Training an llm on its own output completely destroys it, it doesn't make it better.
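To make that concrete, here's a toy sketch. It is not an llm, just a stand-in: the "model" is a normal distribution, "training" means refitting its mean and spread to samples drawn from the previous generation, and every name and number in it is made up for illustration. The point is only to show how a model fed nothing but its own output can lose the spread of the original data.

```python
import random
import statistics


def self_training_spread(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the fitted standard deviation after each generation,
    starting with the "real data" value of 1.0.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: stands in for the real data
    history = [std]
    for _ in range(generations):
        # The current model generates the next training set.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # "Training" is just re-estimating the parameters from that set.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


history = self_training_spread()
print("spread at start:", history[0])
print("spread at end:  ", history[-1])
```

Each generation only ever sees a small sample of the previous one, so the rare values drop out first and the estimated spread drifts downward. A real llm is far more complicated than this, so treat it as an intuition pump, not a proof.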
The outputs of the nn are sampled using a random process. The probability distribution is decided by the llm; the loaded die comes after the llm. No, it's not solvable. Not with LLMs. Not now, not ever.
no need for that subjective stuff. The objective explanation is very simple. The output of the llm is sampled using a random process. A loaded die with probabilities according to the llm's output. It's as simple as that. There is literally a random element that is both not part of the llm itself, yet required for its output to be of any use whatsoever.
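A minimal sketch of the loaded die, assuming the usual softmax-then-sample setup. The tokens and scores are made-up numbers, and `softmax` and `sample_token` are hypothetical helper names, not anyone's actual API. The model's part ends at the scores; the draw happens outside it.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, rng, temperature=1.0):
    """Roll the loaded die: pick one token according to the probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Made-up example: pretend these are the model's scores for the next token.
tokens = ["Paris", "Lyon", "London"]
logits = [3.0, 1.0, 0.5]

rng = random.Random(0)
draws = [sample_token(tokens, logits, rng) for _ in range(1000)]
counts = {t: draws.count(t) for t in tokens}
print(counts)
```

The most likely token wins most of the time, but not every time. Run the same prompt often enough and the die will land on one of the other faces.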
why did it? because it's intrinsic to how it works. This is not a solvable problem.
I've met someone employed as a dev, who not only didn't know that the compiler generates an executable file, but actually spent a month trying to change the code, not noticing that 0 of their code changes were having any effect whatsoever (because they kept running an old build of mine)
it is?????????????????? Holy shit you made my day!
no, intel pays amd to use the 64-bit instruction set. amd pays intel to support the 32-bit instruction set.
it has the potential to revolutionize some optimization problems that are hard to solve classically. It's going to be practically useless for the average user.
here's that same conversation with a human:
"why is X?" "because Y!" "you're wrong" "then why the hell did you ask me if you already know the answer?"
What you're describing will train the network to get the wrong answer and then apologize better. It won't train it to get the right answer.