this post was submitted on 29 Aug 2025
318 points (99.7% liked)

Not The Onion

17833 readers
1585 users here now

Welcome

We're not The Onion! Not affiliated with them in any way! Not operated by them in any way! All the news here is real!

The Rules

Posts must be:

  1. Links to news stories from...
  2. ...credible sources, with...
  3. ...their original headlines, that...
  4. ...would make people who see the headline think, “That has got to be a story from The Onion, America’s Finest News Source.”

Please also avoid duplicates.

Comments and post content must abide by the server rules for Lemmy.world and generally abstain from trollish, bigoted, or otherwise disruptive behavior that makes this community less fun for everyone.

And that’s basically it!

founded 2 years ago
MODERATORS
 

"The surveillance, theft and death machine recommends more surveillance to balance out the death."

top 38 comments
[–] Tollana1234567@lemmy.today 4 points 5 hours ago

they have been using Palantir in law enforcement for a while, don't see people raging over Thiel.

[–] TankovayaDiviziya@lemmy.world 2 points 6 hours ago (1 children)

Will people stop calling the capital and fascist protectors law enforcement? Calling them that makes it sound like they are honourable when they're not.

[–] FluffMongo@lemmy.world 1 points 44 minutes ago

Only if you consider the laws to not be capital and fascist protection

[–] Tollana1234567@lemmy.today 1 points 5 hours ago

it would be funny if they got mad over their "AI GF/BF" getting reported to the police.

[–] chicken@lemmy.dbzer0.com 25 points 12 hours ago (1 children)

I kind of assumed it worked like this before anyway. Good reason to use local models.

[–] bytesonbike@discuss.online 8 points 12 hours ago (2 children)

Sadly, local models aren't there yet. I have tech nerds at my company spending $3-10k building their own systems, and they're still not getting the speeds and quality that these subscriptions have.

[–] tal@lemmy.today 4 points 10 hours ago* (last edited 10 hours ago)

$3-10k...not getting the speeds and quality

I mean, that's true. But the hardware that OpenAI is using costs more than that per pop.

The big factor in the room is that unless the tech nerds you mention are using the hardware for something that requires keeping it under constant load (which occasionally interacting with a chatbot isn't going to do), it's probably going to be cheaper to share the hardware with others, because that keeps the (quite expensive) hardware at a higher utilization rate.

I'm also willing to believe that there is some potential for technical improvement. I haven't been closely following the field, but one thing that I'll bet is likely technically possible (if people aren't banging on it already) is redesigning how LLMs work such that they don't need to be fully loaded into VRAM at any one time.

Right now, the major limiting factor is the amount of VRAM available on consumer hardware. Models get fully loaded onto a card. That makes for nice, predictable computation times on a query, but it's the equivalent of...oh, having video games limited by needing to load an entire world onto the GPU's memory. I would bet that there are very substantial inefficiencies there.

The largest consumer GPU you're going to get has something like 24GB of VRAM, and only some workloads can be split across multiple cards to pool their memory.

You can partially mitigate that with something like a 128GB Ryzen AI Max 395+ processor-based system. But you're still not going to be able to stuff the largest models into even that.

My guess is that it is probably possible to segment sets of neural-net edge weights into "chunks" that are unlikely to be important at the same time, keep the unimportant chunks unloaded, and skip running them. One would need a mechanism to identify when a chunk likely does become important, and swap chunks in and out. That will make query times less predictable, but also probably a lot more memory-efficient.

IIRC from my brief skim, some models do have specialized sub-networks, called "experts" in a "Mixture of Experts" (MoE) architecture. It might be possible to unload some of those, though one is going to need more logic to decide when to include and exclude them, and existing systems are probably not optimal for this:

kagis

Yeah, sounds like it:

https://arxiv.org/html/2502.05370v1

fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving

Despite the computational efficiency, MoE models exhibit substantial memory inefficiency during the serving phase. Though certain model parameters remain inactive during inference, they must still reside in GPU memory to allow for potential future activation. Expert offloading [54, 47, 16, 4] has emerged as a promising strategy to address this issue, which predicts inactive experts and transfers them to CPU memory while retaining only the necessary experts in GPU memory, reducing the overall model memory footprint.
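A toy sketch of that offloading idea (purely illustrative; the class, names, and LRU eviction policy here are my own invention, and a real server like the one in that paper predicts expert activations rather than just evicting the coldest one):

```python
class ExpertCache:
    """Toy model of expert offloading: at most `capacity` experts are
    'resident' (standing in for GPU memory); routing to a non-resident
    expert evicts the least recently used one (standing in for a
    GPU-to-CPU transfer)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = []  # expert ids "on GPU", coldest first

    def route(self, expert_id):
        if expert_id in self.resident:
            self.resident.remove(expert_id)  # refresh LRU position
        elif len(self.resident) >= self.capacity:
            self.resident.pop(0)             # "offload" coldest expert
        self.resident.append(expert_id)

cache = ExpertCache(capacity=2)
for expert in [0, 1, 0, 2, 1]:  # routing decisions, one per token
    cache.route(expert)
print(cache.resident)  # -> [2, 1]: only the hottest experts stay loaded
```

The same shape of logic applies to any "chunked" weight scheme, not just MoE experts; the hard part is predicting which chunk is needed before the forward pass stalls on it.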

[–] chicken@lemmy.dbzer0.com 4 points 10 hours ago (1 children)

Naturally the commercial systems are going to be strictly better, but the best models I can run on my 3090 have been good enough for me for a couple of years now, and they've massively improved over that time. Currently I mostly use qwen3-coder, which is really solid. It just feels so much nicer to use knowing it's private and not being datamined for who knows what.

[–] Techlos@lemmy.dbzer0.com 3 points 9 hours ago (1 children)

i'm running 2x 3060s (12GB cards; VRAM matters more than clock speed, usually), and if you have the patience to run them, the 30b qwen models are honestly pretty decent; if you have the ability and data to fine-tune or LoRA them to the task you're doing, they can sometimes exceed zero-shot performance from SOTA subscription models.

the real performance gains come from agentifying the language model. With access to wikipedia, arxiv, and a rolling embedding database of prior conversations, the quality of the output shoots way up.
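in toy form, that rolling-recall idea looks something like this (everything here, including the fake hash-based embed(), is a made-up stand-in for a real encoder and vector store):

```python
import hashlib
import numpy as np

# embed() is a toy stand-in for a real encoder like BERT: it just maps
# text deterministically to a random unit vector. All names are
# illustrative, not real library APIs.
def embed(text, dim=8):
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

history = ["talked about qwen fine-tuning", "asked how to search arxiv"]
db = np.stack([embed(t) for t in history])  # rolling embedding database

def recall(query, k=1):
    sims = db @ embed(query)            # cosine similarity (unit vectors)
    top = np.argsort(sims)[::-1][:k]    # indices of the k best matches
    return [history[i] for i in top]

# recalled entries would be prepended to the prompt as extra context
context = recall("how do i fine-tune qwen?")
```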

[–] chicken@lemmy.dbzer0.com 1 points 9 hours ago (1 children)

agentifying the language model

Any recommendations on setups for this?

[–] Techlos@lemmy.dbzer0.com 2 points 6 hours ago* (last edited 6 hours ago) (1 children)

this is a good guide on a standard approach to agentic tool use for LLMs

For conversation history:

this one is a bit more arbitrary, because it seems whenever someone finds a good history model they found a startup around it, so i had to make it up as i went along.

First, BERT is used to create an embedding for each input/reply pair, and a timestamp is attached to give it some temporal info.

A buffer keeps the last 10 or so input/reply pairs up to a token limit, and periodically summarises the short-term buffer into paragraph summaries that are then embedded; i used an overlap of 2 pairs between each summary. An additional buffer does the same thing with the summaries themselves, creating megasummaries. These get put into a hierarchical database where each summary links back to the entries that generated it. Depending on how patient i'm feeling, i can configure the queries to search the database using either the megasummary embeddings or the summary embeddings.
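a minimal sketch of that rolling buffer (summarise() is a dummy standing in for the LLM summarisation call, and the limits are illustrative, not the actual ones):

```python
# Dummy summariser: a real one would be an LLM call over the buffered pairs.
def summarise(pairs):
    return "summary of %d pairs" % len(pairs)

class RollingBuffer:
    def __init__(self, limit=4, overlap=2):
        self.limit, self.overlap = limit, overlap
        self.pairs, self.summaries = [], []

    def add(self, pair):
        self.pairs.append(pair)
        if len(self.pairs) > self.limit:
            # summarise the whole short-term buffer, then keep only the
            # last `overlap` pairs so consecutive summaries share context
            self.summaries.append(summarise(self.pairs))
            self.pairs = self.pairs[-self.overlap:]

buf = RollingBuffer()
for i in range(6):
    buf.add(("input %d" % i, "reply %d" % i))
# one summary has been folded out; the most recent pairs remain buffered
```

the same pattern stacked once more (a buffer of summaries that folds into megasummaries) gives the hierarchy.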

During conversation, the input as well as the last input/reply get fed into a different instruction prompt that says "Create three short sentences that accurately describe the topic, content, and if mentioned the date and time that are present in the current conversation". The sentences, along with the input + last input/reply, are embedded and used to find the most relevant items in the chat history based on a weighted metric of 0.9*(1 - cosine_distance(input+reply embedding, history embedding)) + 0.1*(input.norm(2) - reply.norm(2))^2 (the magnitude term worked well when i was doing unsupervised learning; completely arbitrary here). The matches get added to the system prompt with some instructions on how to treat the included text as prior conversation history with timestamps. I found two approaches that worked well.

For lightweight, predictable behaviour, take the average of the embeddings and randomly select N entries from the database, weighted by how closely they match, using softmax + temperature to control memory exploration.

For more varied and exploratory recall, use N randomly weighted averages of the embeddings and find the closest matches for each. It's a bit slower because you have way more database hits, but it tends to be much better at tying together relevant information from otherwise unrelated conversations. I prefer the first method for doing research or literature review; the second method is great when you're rubber-ducking an idea with the LLM.
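the lightweight variant might look like this (the similarity scores below are made-up stand-ins for the weighted metric described earlier):

```python
import numpy as np

def softmax_sample(sims, n, temperature, rng):
    """Sample n distinct entries, weighted by softmax(similarity / T);
    low temperature ~ greedy recall, high temperature ~ more exploration."""
    w = np.exp((sims - sims.max()) / temperature)  # shift for stability
    return rng.choice(len(sims), size=n, replace=False, p=w / w.sum())

rng = np.random.default_rng(0)
sims = np.array([0.9, 0.2, 0.85, 0.1])  # made-up similarity scores
picked = softmax_sample(sims, n=2, temperature=0.05, rng=rng)
# at a temperature this low, the two closest entries (0 and 2) dominate
```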

an N of 3~5 works pretty well without taking up too much of the context window. Including the most relevant summary as well as the input/reply pairs gives the best general behaviour; it strikes a good balance between broad recall and detail recall. The stratified summary approach also lets me prune input/reply entries if they don't get accessed much (whenever the db goes above a size limit, a script prunes a few dozen entries based on retrieval counts), while leaving the summary to retain some of the information.

It works better if you don't use the full context window that the model is capable of. Transformer models just aren't that great at needle-in-a-haystack problems, and i've found a context window of 8~10k is the sweet spot for the qwen models to pay attention both to the recalled conversations and to the current one.

A ¿fun? side effect of this method is that using different system prompts for LLM behaviour will affect how the LLM summarises and recalls information, and it's actually quite hard to keep the LLM from developing a "personality", for lack of a better word. My first implementation included the phrase "a vulcan-like attention to formal logic", and within a few days of testing, each conversation summary started with "Ship's log" and the model developed a habit of calling me captain, probably a side effect of summaries ending up in the system prompt. It was pretty funny, but not exactly what i was aiming for.

apologies if this was a bit rambling, just played a huge set last night and i'm on a molly comedown.

[–] chicken@lemmy.dbzer0.com 1 points 4 hours ago

Nah that's some great info, though sounds like a pretty big project to set it up that way. Would definitely want to hear about it if you ever decide to put your work on a git repo somewhere.

[–] Formfiller@lemmy.world 11 points 12 hours ago

If the user is a Nazi it probably auto sends their resume to ICE

[–] Anarki_@lemmy.blahaj.zone 25 points 15 hours ago* (last edited 15 hours ago)
[–] pleasejustdie@lemmy.world 35 points 17 hours ago* (last edited 17 hours ago) (6 children)

I don't actually have a problem with this. If people are stupid enough to admit to a crime or engage in criminal activity on a platform they don't control, that's on them. I see this as the next step in the evolution of people who commit a crime on YouTube for views and then get shocked-Pikachu'd when the police arrest them for it. They have no one to blame but themselves: they brought a third-party AI company into it, and that company did not consent to being an accomplice. And if there's any company out there with the resources to have AI scan conversations and flag them for the police with good accuracy, OpenAI would definitely be at the front of it.

[–] FaceDeer@fedia.io 37 points 15 hours ago (2 children)

You're fine with invasion of privacy as long as it only affects criminals.

Once privacy is broken, I think you'll find that a surprising number of people end up under that umbrella.

[–] msage@programming.dev 10 points 14 hours ago

Using the fucking GPT is the privacy invasion.

So yes, once the company has the logs and detects any criminal or dangerous activity, it should report it.

Stop using chatbots in the first place.

[–] PattyMcB@lemmy.world 3 points 15 hours ago (1 children)

Can we have it affect the oligarchs and authoritarian fascists, too?

[–] shplane@lemmy.world 1 points 2 hours ago

“How can I best exploit the working class?”

“Great question! Here are several emojis depicting propaganda and slave labor”

[–] Seleni@lemmy.world 13 points 13 hours ago

Ahh, the ol’ ‘nothing to hide’ defense.

Ever consider things that are labeled as ‘crimes’ can and will be anything the people in power want?

Just because, say, calling Republicans ‘shithead pedophiles’ on Lemmy isn’t illegal now doesn’t mean Cheeto Mussolini won’t make it illegal tomorrow.

[–] TommySoda@lemmy.world 23 points 16 hours ago* (last edited 16 hours ago) (1 children)

Well, you should have a problem with it, but not for the reasons you think. Any invasion of privacy is an issue when the people in control get to decide what is a reportable offense without explicitly telling you. I get it: you definitely shouldn't be admitting anything illegal or asking for illegal advice from a chatbot. You shouldn't be doing anything illegal in the first place. That's basically the same as googling how to make a bomb, and if you're that dumb, you'll get what's coming to you. The issue arises when you look at the bigger picture. If they have the ability to report anything they want to the police, what's stopping them from releasing anything they want to anyone they want at any time? And when it comes to those receiving the reported data, what proof do you have that these entities have your interests or safety, or anyone else's, in mind? What if they change the rules on what they should report, don't tell you, and then retroactively flag a bunch of your conversations with said LLM?

It's the same kind of situation we face with these AI cameras that track us and our vehicles literally everywhere we go. There have already been multiple cases where people in law enforcement used these tools to stalk people like ex-girlfriends. All of this puts a lot of trust in people none of us even know, expecting them to have the best of intentions. What would stop them from reporting that you asked ChatGPT about the current situation in Gaza?

[–] thatsnothowyoudoit@lemmy.ca 2 points 12 hours ago

Fair points.

One thing I think we all miss: what happens when an overzealous government makes something a crime retroactively? Say, um, disparaging two Cheetos in an ill-fitting suit masquerading as a world leader.

That’s part of why we should care about privacy and why we should care when data we expect to be private isn’t.

Most tech users are victims in a system they don’t understand. We might complain that they don’t want to understand but the truth is the providers don’t want them to understand - as it’s easier to sell them whatever crap they’re shilling.

[–] DrDystopia@lemy.lol 5 points 17 hours ago

Being criminally stupid when planning crimes is pretty stupid.

[–] peanuts4life@lemmy.blahaj.zone 4 points 16 hours ago

I kinda agree. While I do want these LLM companies to be more private in terms of data retention, I think it's naive to say that a company selling artificial intelligence to hundreds of millions of users should be totally indifferent in the face of LLM-induced psychosis and suicide. Especially when the technology only gets more hazardous as it becomes more capable.

[–] dickalan@lemmy.world 1 points 12 hours ago

Bro wants to comply ahead of time. lol You’re a weird little fool

[–] MashedHobbits@lemy.lol 31 points 17 hours ago (2 children)

There is no privacy if you don't self-host everything.

[–] DrDystopia@lemy.lol 17 points 16 hours ago (2 children)

On-site self-hosting, on hardware you own. Who knows what's going on behind the closed doors of data centers around the world.

And let's not get into industry standard ~~hardware backdoors~~ remote control systems.

[–] Truscape@lemmy.blahaj.zone 2 points 6 hours ago* (last edited 6 hours ago)

I was under the impression that self hosting always meant your own hardware on your own property. Buying a VPS was never self hosting in my eyes.

[–] Randomgal@lemmy.ca 4 points 9 hours ago

Pretty sure having someone else host it somewhere else is the opposite of self hosting anyways.

[–] MDCCCLV@lemmy.ca 1 points 10 hours ago

Ironically, you can use it without logging in, so the people hurt the most are the paid users who are voluntarily giving money to the company.

[–] Treczoks@lemmy.world 10 points 14 hours ago

As if it weren't stupid enough just to use it, some people are so totally stupid that they think what they put into a commercial, online thing would be private.

[–] omniman@anarchist.nexus 12 points 16 hours ago (1 children)

they will find out about my relation with uwu chatgpt mechahitler skibidi sigma wifu

[–] Glent@lemmy.ca 2 points 15 hours ago

This is why i keep my chat gpt under the sofa so when buckling up for safety my open ai stays extra crunk.

[–] salacious_coaster@infosec.pub 7 points 16 hours ago

Did they think there was patient-sycophantBot privilege or something?

[–] Pxtl@lemmy.ca 3 points 13 hours ago

As much as I hate the AI-gens, this is probably a good thing after that poor kid got talked into killing himself. I assume Google et al do similar already.

Now, if the cops respond to being called for a person in crisis by tasing somebody, that's a different problem.

[–] Proprietary_Blend@lemmy.world 2 points 14 hours ago

Stupid is, etc

[–] Smackyroon@lemmy.ml 2 points 16 hours ago (1 children)

What a snitch, grok is this real??

[–] PattyMcB@lemmy.world 4 points 15 hours ago

"Yes, Nazis are cool." -Grok, probably