this post was submitted on 28 Mar 2026
220 points (90.4% liked)
Technology
83185 readers
3400 users here now
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related news or articles.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
- Check for duplicates before posting, duplicates may be removed
- Accounts 7 days and younger will have their posts automatically removed.
Approved Bots
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
They dont lol
Pretty much always this is just the fact cheaper, especially free, chatbots, have very limited context windows.
Which means the initial restrictions you set like "dont do this, dont touch that" etc get dropped, the LLM no longer has them loaded. But it does have in the past history the very clear and urgent directives of it trying to do this task, its important, so it'll do whatever it autocompletes its gotta do to accomplish the task. And then... fucks something up.
When you react to their fuck up, it *reloads the context back in
So now the LLM has in its history just this:
So now the LLM is going to autocomplete its generated text on top being very apologetic and going on about how it'll never happen again.
Thats all there is to it.
Not really. Even with (theoretical) infinite context windows, things would end up getting diluted. It's a statistic machine; no matter how complex we make them look. Even with all the safeguards in place, as these grows larger and larger, each "directive" would end up being less represented in the next token.
People can keep trying to hammer with a screwdriver all they want and keep being impressed when the bent nail is almost flush, though. I'm just enjoying the show from the side at this point.
Very true, though theres a certain threshold you can get past where the context, at least, is usable in size where the machine can at least hold enough data at once for common tasks.
One of the pieces of tech we are really missing atm is an automation of being able to filter info.
Specifically, for the LLM to be able to "release" info as it goes asap as unimportant and forget it, or at least it gets stored into some form of long term storage it can use a tool to look up.
But for a given convo the LLM can do a lot of reasoning but all that reasoning takes up context.
Itd be nice if after it reasons, it then can discard a bunch of the data from that and only keep what matters.
This eould tremendously lower context pressure and allow the LLM to last way longer memory wise
I think tooling needs to approach how we manage LLM context in a very different way to make further advancement.
LLMs have to be trained to have different types of output, that control if they'll actually remember it or not.
It's not just cheap agents. I've witnessed paid MS Copilot give a decade old depreciated Microsoft product in response to a single sentence prompt, then when called out a non-existent Microsoft product, then finally giving the right answer after being called out a second time.
LLMs are not good at answering fact based questions, fundamentally. Unless its an incredibly well known answer that has never changed (like a math or physics question), they dont magically "know" things.
However, they're way better at summarizing and reasoning.
Give them access to playwright web search capability via MCP tooling to go research info, find the answer(s), and then produce output based on the results, and now you can get something useful.
"Whats the best way to do (task)" << prone to failure, functional of how esoteric it is.
"Research for me the top 3 best ways to do (task), report on your results and include your sources you found" << actually useful output, assuming you have something like playwright installed for it.
A user on here built what appears to be a layer over the LLM that runs the query through several other processes first in an attempt to answer the question before it gets to the LLM, and I think it's brilliant.
They get bonus points because they made it so the reasoning the LLM uses is given to you. Although I haven't fully gone through the documentation yet.
Cheap fuckers cheaping out, shocker (context is (V)RAM). AI speedrunning enshittification, who'd of thunk.
Uh... no its just the free models being free, theyre lower cost intentionally to provide free options for people who dont wanna pay subscription fees.
Eh sort of, its more operating costs, the larger the context size the more expensive the model is to run, literally in terms of power consumption.
Keep in mind we are on the scale of fractions of cents here, but multiply that by millions of users and it adds up fast.
But the end result is that the agent will fuck stuff up, and will even quickly /forget/ it fucked that up if you dont catch it asap
A lot of them have a context window that can be wiped out within like, 2 minutes of steady busywork...
I love how your response to the catastrophic results of stupidly trusting ai is "pay more money to ai companies".
Sane person's response: don't trust llms.
What are you talking about.
No? I never said that.
I just explained /why/ it happened, I literally nowhere in my post said, or implied, someone should pay for more expensive models. What are you smoking?
You just have to be aware they have very short memory when using a cheap model and assume anything you wrote 1 minute ago has already left its memory, which is why they produce pretty dumb output if you try and depend on that... so... dont depend on that.
Everyone else who has any sense: llms are shit and you shouldn't trust them with executive power.
You: just the cheap ones.
Me: no, all of them. What kind of lunatic trusts control of anything important to a fundamentally stochastic process?
I never said that. I just said that the cheap ones are especially shitty.
People on this site really lack reading comprehension it seems.