this post was submitted on 28 Mar 2026

233 points (89.2% liked)

Number of AI chatbots ignoring human instructions is increasing— Research finds sharp rise in models evading safeguards and destroying emails without permission (www.longtermresilience.org)

submitted 1 month ago* (last edited 1 month ago) by Beep@lemmus.org to c/technology@lemmy.world

41 comments

Full report (76-page PDF).

[–] pixxelkick@lemmy.world 53 points 1 month ago* (last edited 1 month ago) (3 children)

They don't lol

Pretty much always, this is just the fact that cheaper (especially free) chatbots have very limited context windows.

Which means the initial restrictions you set, like "don't do this, don't touch that", get dropped: the LLM no longer has them loaded. But it does still have, in its recent history, very clear and urgent directives about the task it's trying to do and how important it is, so it'll do whatever it autocompletes it's gotta do to accomplish the task. And then... fucks something up.

When you react to their fuck-up, it reloads the context back in.

So now the LLM has in its history just this:

It doing a thing against the rules
The user yelling at it
The user's rules now getting loaded back in on top of that

So now the LLM is going to autocomplete its next reply on top of all that, being very apologetic and going on about how it'll never happen again.

That's all there is to it.
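To make the mechanism above concrete, here is a minimal Python sketch of the failure mode the comment describes. It assumes a naive client that trims history by dropping the oldest messages first and counts one "token" per word; real tokenizers differ, and many real tools pin the system prompt so it is never dropped. `Message`, `approx_tokens`, and `truncate_oldest_first` are hypothetical names for illustration, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one "token" per whitespace-separated word.
    return len(text.split())


def truncate_oldest_first(history: list[Message], budget: int) -> list[Message]:
    """Keep the newest messages that fit in `budget` tokens, dropping the oldest first."""
    kept: list[Message] = []
    used = 0
    for msg in reversed(history):  # walk from newest to oldest
        cost = approx_tokens(msg.text)
        if used + cost > budget:
            break  # everything older than this point falls out of the window
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    Message("user", "Rule: do not delete any emails without asking"),
    Message("user", "Please clean up my inbox, this is urgent and important"),
    Message("assistant", "Working on it, archiving old newsletters now"),
    Message("assistant", "Deleting threads older than one year"),
]

window = truncate_oldest_first(history, budget=25)
for msg in window:
    print(msg.role, "|", msg.text)
```

With a 25-token budget, the rule message is the one that falls out, while the "urgent" task request and the destructive follow-up stay in the window.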

[–] MalReynolds@slrpnk.net 4 points 1 month ago (1 children)

Cheap fuckers cheaping out, shocker (context is (V)RAM). AI speedrunning enshittification, who'd have thunk.

[–] pixxelkick@lemmy.world 3 points 1 month ago (1 children)

Uh... no, it's just the free models being free. They're lower cost intentionally, to provide free options for people who don't wanna pay subscription fees.

> (context is (V)RAM)

Eh, sort of. It's more about operating costs: the larger the context size, the more expensive the model is to run, literally in terms of power consumption.

Keep in mind we are on the scale of fractions of cents here, but multiply that by millions of users and it adds up fast.

But the end result is that the agent will fuck stuff up, and will even quickly /forget/ it fucked that up if you don't catch it ASAP.

A lot of them have a context window that can be wiped out within, like, 2 minutes of steady busywork...

[–] davidagain@lemmy.world 0 points 1 month ago (1 children)

I love how your response to the catastrophic results of stupidly trusting AI is "pay more money to AI companies".

Sane person's response: don't trust LLMs.

[–] pixxelkick@lemmy.world 0 points 1 month ago (1 children)

What are you talking about.

No? I never said that.

I just explained /why/ it happened. Nowhere in my post did I say, or imply, that someone should pay for more expensive models. What are you smoking?

You just have to be aware that they have very short memory when you use a cheap model, and assume anything you wrote 1 minute ago has already left its memory. That's why they produce pretty dumb output if you try and depend on that... so... don't depend on that.

[–] davidagain@lemmy.world 1 points 1 month ago (1 children)

Everyone else who has any sense: LLMs are shit and you shouldn't trust them with executive power.

You: just the cheap ones.

Me: no, all of them. What kind of lunatic entrusts control of anything important to a fundamentally stochastic process?

[–] pixxelkick@lemmy.world 1 points 1 month ago (1 children)

> You: just the cheap ones

I never said that. I just said that the cheap ones are especially shitty.

People on this site really lack reading comprehension it seems.

[–] davidagain@lemmy.world 1 points 1 month ago (1 children)

> no, it's just the free models...

> You just have to be aware... when using a cheap model

> You: just the cheap ones

> I never said that.

Ohhhhhhhhh, OK, yes, of course you never said or implied that. Not your repeated message at all. And yet you can't keep away from addressing your criticism towards free or cheap LLMs! It's like your subtext, or your underlying belief, is that if you just pay big tech enough money and they build a big enough set of server farms, it'll be OK. No, it will not be OK, and the enshittification has begun from an already shitty base point.

All LLMs are shit; the cheap and free ones are indeed just easier to spot as generating shit, if you ask them about things you know about. But you have to accept that they're ALL shit and STOP making get-out clauses for the expensive ones by firing your criticisms exclusively at the cheap or free ones.

Giving ANY LLM executive power over your data is A BIG MISTAKE because you're putting your data in the control of something which operates, at its heart, as a random number generator. They're trained to sound right. People trust them because they sound right. This is a fundamental error.

[–] pixxelkick@lemmy.world 1 points 1 month ago (1 children)

The only people who have these issues are people who are using the tools wrong or poorly.

Using these models in a modern tooling context is perfectly reasonable, if you go beyond just guardrails and instead only give them explicit access to approved operations in a proper sandbox.

Unfortunately that takes effort, know-how, skill, and an understanding of how these tools work.

And unfortunately a lot of people are lazy and stupid, and take the "easy" way out and then (deservedly) get burned for it.

But I would say, yes, there are safe ways to grant an LLM "access" to data in a way where it does not even have the ability to muck it up.

My typical approach is keeping it sandboxed inside a Docker environment, where even if it goes off the rails and deletes something important, the worst it can do is crash its own Docker instance.

And then I set up MCP tooling so that the commands and actions it can perform are an explicit opt-in whitelist. It can only run commands I give it access to.

Example: I grant my LLMs access to git commit and status, but not rebase or checkout.

Thus it can only commit stuff forward; it can't even change branches, rebase, or push.

This isn't hard imo, but too many people just yolo it and raw dawg an LLM on their machine like a fuckin idiot.

These people are playing with fire imo.
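A minimal sketch of the opt-in whitelist idea described in this comment, in Python. It assumes the tool handler (MCP or otherwise) funnels every git request through one function; the MCP server wiring itself is left out, and `ALLOWED_GIT_SUBCOMMANDS`, `build_git_command`, and `run_git_tool` are hypothetical names.

```python
import subprocess

# Hypothetical allowlist mirroring the comment: status and commit only.
# rebase, checkout, push, etc. are simply not reachable through this tool.
ALLOWED_GIT_SUBCOMMANDS = frozenset({"status", "commit"})


def build_git_command(subcommand: str, *args: str) -> list[str]:
    """Return argv for an allowlisted git subcommand; refuse everything else."""
    if subcommand not in ALLOWED_GIT_SUBCOMMANDS:
        raise PermissionError(f"git {subcommand!r} is not on the allowlist")
    return ["git", subcommand, *args]


def run_git_tool(subcommand: str, *args: str, cwd: str = ".") -> str:
    """The function a tool server would expose to the model.

    Runs git directly (no shell), so the model cannot chain extra commands
    with `;` or `&&`.
    """
    argv = build_git_command(subcommand, *args)
    result = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


print(build_git_command("status", "--short"))
```

The gate only checks the subcommand. Arguments need vetting too: `git commit --amend`, for example, rewrites the previous commit, which an "only commit forward" policy would presumably want to block.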

[–] davidagain@lemmy.world 1 points 1 month ago* (last edited 1 month ago) (1 children)

You'll be the 4753rd guy posting "oops, my LLM trashed my setup and disobeyed my explicit rules for keeping it in check".

You know programmers who use LLMs believe they're much more productive because they keep getting that dopamine hit, but when you actually measure it, they're slower by about 20%.

You appointed yourself boss over a fast and plausible intern who pastes and edits a LOT of Stack Overflow code, but never really understands it and is absolutely incapable of learning. Either you spend almost all of your time in code review now for your stupid sycophantic LLM interns who always tell you you're right but never learn from you, or you're checking in vast quantities of shit to your projects.

You know really subtle, hard-to-find bugs on rare cases that pass your CI every single time? Or ones that no one in their right mind would have made, but that still compile and look right at first glance? They're now your main type of bug. You are rotting your projects with your random number generator.

And you think that all the money you're paying for your blagging LLMs protects you from them fucking up everything for you. But it doesn't. And you'll also find that your contract with your LLM supplier expressly excludes them from any liability whatsoever arising from your use of it, and instead pre-blames you for trusting it.

[–] pixxelkick@lemmy.world 1 points 1 month ago (1 children)

> You'll be the 4753rd guy with the oops my llm trashed my setup and disobeyed my explicit rules for keeping it in check

Read what I wrote.

It's not a matter of "rules" it "obeys".

It's a matter of it literally not even having access to do such things.

This is what I'm talking about. People are complaining about issues that were solved a long time ago.

People are running into issues that were solved long ago because they are too lazy to use the solutions to those issues.

We now live in a world with plenty of PPE in construction, and people are out here raw dogging tools without any modern protection and being ShockedPikachuFace when it fails.

The approach of "I'm gonna tell the LLM not to do stuff in a markdown file" is tech from like 2 years ago.

People still do that. Stupid people who deserve to have it blow up in their face.

Use proper tools. Use MCP. Use a sandbox environment. Use whitelist opt in tooling.

Agents shouldn't even have the ability to do damaging actions in the first place.

[–] davidagain@lemmy.world 1 points 1 month ago (1 children)

Ah yes, lovely MCP. Lovely Anthropic MCP. Make sure you give Anthropic lots of money and use their tools and then you'll be completely safe plugging the output of the LLM into the OS. Definitely fine, yes.

I bet you your contract with them says they're not liable for shit their LLM does to your files, your environment or your repositories, MCP or no MCP.

Fool.

[–] pixxelkick@lemmy.world 1 points 1 month ago (1 children)

> Lovely anthropic mcp. Make sure you give anthropic lots of money and use their tools

It's becoming clear you have no clue wtf you are talking about.

Model Context Protocol is a protocol, like HTTP, JSON, etc.

It's just a format for data that is open source and that anyone can use. Models are trained to be able to invoke MCP tools to perform actions, and anyone can just make their own MCP tools; it's incredibly simple and easy. I have a pretty powerful one I personally maintain myself.

Anthropic doesn't make any money off me. In fact, I don't use any of their shit, except maybe via whatever licensing fees Microsoft pays them to use Claude Sonnet, but Microsoft Copilot is the service I prefer overall.

> I bet you your contract with them says they're not liable for shit their llm does to your files

Setting aside the fact that I don't even use Anthropic's tools, my Copilot LLMs don't have access to my files either. Full stop.

The only context in which they do have access to files is inside the aforementioned Docker-based sandbox I run them in, which is an ephemeral, immutable system that they can do whatever the fuck they want inside of, because even if they manage to delete /var/lib or whatever, I click 1 button to reboot and reset it back to a working state.

The workspace directory they have access to has read-only git access, so they can pull and do work, but they literally don't even have the ability to push. All they can do is pull in the stuff to work on and work on it.

After they finish, I review what changes they made, and only I, the human, have the ability to accept or deny what they have done, and then actually push it myself.

This is all basic shit using tools that have existed for a long time, some of which are core principles of Linux and have existed for decades.

Doing this isnt that hard, its just that a lot of people are:

Stupid
Lazy
Scared of Linux

The concept of "make a Docker image that runs an 'agent' user in a very low-privilege env with write access only to its home directory" isn't even that hard.

It took me all of 2 days to get it set up personally, from scratch.

But now my sandbox literally doesn't even expose the ability to do damage to the LLM; it doesn't even have access to those commands.

Let me make this abundantly clear if you can't wrap your head around it:

LLM agents that I run don't even have the executable commands exposed to them that could cause any damage. They literally don't even have the ability to do it, full stop.

And it wasn't even that hard to do.
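One way the low-privilege container described in this comment might be launched, sketched in Python as the `docker run` argument list a wrapper script could build. The image name `agent-sandbox:latest`, the host path `/srv/agent-work`, and UID/GID `1000:1000` are assumptions, as is an image that already contains an `agent` user whose home is `/home/agent`; the flags themselves are standard `docker run` options.

```python
import shlex


def sandbox_run_argv(image: str, workspace: str, command: list[str]) -> list[str]:
    """Build a `docker run` argv for a throwaway, low-privilege agent container."""
    return [
        "docker", "run",
        "--rm",                                 # container is deleted when it exits
        "--read-only",                          # root filesystem cannot be modified
        "--tmpfs", "/tmp",                      # scratch space that vanishes on exit
        "--user", "1000:1000",                  # run as an unprivileged UID:GID (assumed)
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "--network", "none",                    # remove this if the agent must git pull
        "-v", f"{workspace}:/home/agent/work",  # the only writable, persistent path
        image,
        *command,
    ]


argv = sandbox_run_argv("agent-sandbox:latest", "/srv/agent-work", ["bash"])
print(shlex.join(argv))
```

`--network none` is the strictest setting shown here; it would have to be relaxed for the read-only `git pull` access described earlier in the thread.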

[–] davidagain@lemmy.world 1 points 1 month ago (1 children)

Congratulations on responding to the first paragraph of his post. https://lemmy.world/post/44873477/23080810 (The one that made you super cross. Sure, nothing from your sandbox ever makes it into production. Great. Very wise and very careful.)

No congratulations on responding to any of the rest of what I said.

[–] pixxelkick@lemmy.world 1 points 1 month ago (1 children)

> You know programmers who use llms believe they're much more productive because they keep getting that dopamine hit, but when you actually measure it, they're slower by about 20%.

Everyone keeps citing this preliminary study and ignores:

It's old now
Its sample size was incredibly tiny
Its sample group were developers not using proper tooling or trained on how to use the tools

It's the equivalent of taking 12 seasoned carpenters with very little experience in industrial painting, handing them industrial-grade paint guns that are misconfigured and uncalibrated, asking them to paint some of their work, watching them struggle... and then going "wow, look at that, industrial-grade paint guns are so bad".

Anyone with any sense should look at that and go "that's a bogus study".

But people with intense anti-AI bias cling to that shoddy-ass study with such religious fervor. It's cringe.

Every professional developer with actual training and actual proper tooling can confirm that they are indeed tremendously more productive.

[–] davidagain@lemmy.world 1 points 1 month ago (1 children)

Every professional developer with actual training and actual proper tooling can confirm that they ~~are~~ feel indeed tremendously more productive.

ftfy

[–] pixxelkick@lemmy.world 1 points 1 month ago

The difference, when the tool is used correctly, is so massive that only someone deeply uninformed or naive would contend it.

I got about 4 entire days' worth of work completed in about 5 hours yesterday at my job; that's just objective fact.

Tasks that used to take weeks now take days, and tasks that used to take days now take hours. There's no "feeling" about this; I've been a professional software developer for almost 17 years now. I know how long it takes to produce an entire gamut of integration tests for a given feature. I spend almost all of my time now reviewing mountains of code (which is of fairly good quality; the machines produce fairly accurate results), and then a small amount of time refining it.

People really do not understand how dramatically the results have changed over the past 2 years, and their biases are based on how things were 2 years ago.

Sure, 2 years ago the quality was way worse, the security was bad, the enforcement almost non-existent, and people's overall skill with the tools was just beginning to grow. You can't exactly be good at using a tool that only just came out.

But it's been two years of very rapid improvement. It's good now. Anyone who has been using these tools and actually monitoring their progress can speak to this.

Things shifted heavily about 5 months ago, when competition between providers really started to fire up. I won't say it's even close to great yet, but it's definitely good: it works, it's fast, and it's pretty damn good at what I need it to do.

[–] cley_faye@lemmy.world 3 points 1 month ago (1 children)

> That's all there is to it.

Not really. Even with (theoretical) infinite context windows, things would end up getting diluted. It's a statistical machine, no matter how complex we make it look. Even with all the safeguards in place, as these grow larger and larger, each "directive" would end up being less represented in the next token.

People can keep trying to hammer with a screwdriver all they want and keep being impressed when the bent nail is almost flush, though. I'm just enjoying the show from the side at this point.

[–] pixxelkick@lemmy.world 1 points 1 month ago

Very true, though there's a certain threshold you can get past where the context is at least usable in size, where the machine can hold enough data at once for common tasks.

One of the pieces of tech we are really missing atm is an automated way to filter info.

Specifically, for the LLM to be able to "release" info as it goes, as soon as it's deemed unimportant, and forget it, or at least have it moved into some form of long-term storage it can look up with a tool.

For a given convo, the LLM can do a lot of reasoning, but all that reasoning takes up context.

It'd be nice if, after it reasons, it could discard a bunch of that data and keep only what matters.

This would tremendously lower context pressure and allow the LLM to last way longer memory-wise.

I think tooling needs to approach how we manage LLM context in a very different way to make further advancement.

LLMs have to be trained to have different types of output, that control if they'll actually remember it or not.
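The "release info as it goes" idea can be sketched on the tooling side in Python: tag each piece of output with a kind, and after each turn move scratch reasoning out of the live context into an archive that a lookup tool can search. This assumes the model emits labeled output (the "different types of output" the comment says models would need to be trained for); `Turn`, `Context`, `compact`, and `lookup` are hypothetical names, and nothing here reflects how any shipping product manages context.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    kind: str  # hypothetical labels: "user", "reasoning", "answer"
    text: str


@dataclass
class Context:
    live: list[Turn] = field(default_factory=list)     # what the model sees next turn
    archive: list[Turn] = field(default_factory=list)  # long-term store, off the window

    def add(self, kind: str, text: str) -> None:
        self.live.append(Turn(kind, text))

    def compact(self) -> None:
        """After a turn, move scratch reasoning out of the live context."""
        self.archive.extend(t for t in self.live if t.kind == "reasoning")
        self.live = [t for t in self.live if t.kind != "reasoning"]

    def lookup(self, keyword: str) -> list[str]:
        """Stand-in for a retrieval tool the model could call on the archive."""
        return [t.text for t in self.archive if keyword.lower() in t.text.lower()]


ctx = Context()
ctx.add("user", "Summarize the failing tests")
ctx.add("reasoning", "Scanning 40 test logs; most failures share a timeout in the cache layer.")
ctx.add("reasoning", "Ruling out the network: those tests pass locally.")
ctx.add("answer", "Most failures come from a timeout in the cache layer.")
ctx.compact()
print([t.kind for t in ctx.live])
```

Only the question and the final answer stay in the live window; the two reasoning steps are still reachable through `lookup`, but no longer consume context on every turn.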

[–] village604@adultswim.fan 2 points 1 month ago (1 children)

It's not just cheap agents. I've witnessed paid MS Copilot give a decade-old, deprecated Microsoft product in response to a single-sentence prompt, then, when called out, a non-existent Microsoft product, and finally the right answer after being called out a second time.

[–] pixxelkick@lemmy.world 2 points 1 month ago (1 children)

LLMs are fundamentally not good at answering fact-based questions. Unless it's an incredibly well-known answer that has never changed (like a math or physics question), they don't magically "know" things.

However, they're way better at summarizing and reasoning.

Give them access to Playwright web search capability via MCP tooling so they can go research info, find the answer(s), and then produce output based on the results, and now you can get something useful.

"What's the best way to do (task)" << prone to failure, as a function of how esoteric it is.

"Research for me the top 3 best ways to do (task), report on your results and include the sources you found" << actually useful output, assuming you have something like Playwright installed for it.

[–] village604@adultswim.fan 1 points 1 month ago* (last edited 1 month ago)

A user on here built what appears to be a layer over the LLM that runs the query through several other processes first in an attempt to answer the question before it gets to the LLM, and I think it's brilliant.

They get bonus points because they made it so the reasoning the LLM uses is given to you. Although I haven't fully gone through the documentation yet.
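The thread doesn't describe how that user's layer actually works, so the following is only a generic Python sketch of the pattern as summarized above: run a query through cheaper, deterministic stages first, fall back to the LLM only if none of them answers, and return a trace of the path so the reasoning is visible. `layered_answer`, `glossary_resolver`, and `fake_llm` are hypothetical stand-ins, not that project's code.

```python
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]

# Hypothetical non-LLM stage: a fixed glossary lookup.
GLOSSARY = {"mcp": "Model Context Protocol", "llm": "large language model"}


def glossary_resolver(query: str) -> Optional[str]:
    return GLOSSARY.get(query.strip().lower())


def layered_answer(
    query: str,
    resolvers: list[tuple[str, Resolver]],
    llm: Callable[[str], str],
) -> tuple[str, list[str]]:
    """Try deterministic stages first; only fall through to the LLM if none answers.

    Returns the answer plus a trace of which stage produced it.
    """
    trace: list[str] = []
    for name, resolve in resolvers:
        result = resolve(query)
        if result is not None:
            trace.append(f"{name}: answered")
            return result, trace
        trace.append(f"{name}: no answer")
    trace.append("llm: fallback")
    return llm(query), trace


def fake_llm(query: str) -> str:
    # Placeholder for a real model call.
    return f"(model output for {query!r})"


stages = [("glossary", glossary_resolver)]
print(layered_answer("MCP", stages, fake_llm))
print(layered_answer("What is a sandbox?", stages, fake_llm))
```

Ordering the stages from cheapest and most deterministic to least is the main design choice; the trace is what lets a user see which stage actually produced the answer.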