this post was submitted on 04 Aug 2025
350 points (96.8% liked)

Technology

[–] brucethemoose@lemmy.world 18 points 2 months ago* (last edited 2 months ago) (1 children)

Open models are going to kick the stool out. Hopefully.

GLM 4.5 is already #2 on LMArena, above Grok and ChatGPT, and runnable on homelab rigs, yet has just 32B active parameters (which is mad). Extrapolate that a bit, and it’s just a race to the zero-cost bottom. None of this is sustainable.

[–] dubyakay@lemmy.ca 7 points 2 months ago (2 children)

I did not understand half of what you've written. But what do I need to get this running on my home PC?

[–] brucethemoose@lemmy.world 5 points 2 months ago* (last edited 2 months ago)

I am referencing this: https://z.ai/blog/glm-4.5

The full GLM? Basically a 3090 or 4090 and a budget EPYC CPU. Or maybe 2 GPUs on a threadripper system.

GLM Air? Now this would work on a 16GB+ VRAM desktop, just slap in 96GB+ (maybe 64GB?) of fast RAM. Or the recent Framework desktop, or any mini PC/laptop with the 128GB Ryzen 395 config, or a 128GB+ Mac.

You’d download the weights, quantize them yourself if needed, and run them in ik_llama.cpp (which should get support imminently).

https://github.com/ikawrakow/ik_llama.cpp/

But these are…not lightweight models. If you don’t want a homelab, there are better ones that will fit on more typical hardware configs.
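For a rough sense of why those RAM figures come up, here is a back-of-the-envelope sketch of the quantized weight size. The total parameter counts (355B for GLM 4.5, 106B for GLM 4.5 Air) and the ~4.5 bits per weight are assumptions that are not stated in this thread, and the estimate ignores the KV cache and runtime overhead, so treat the output as a floor, not a spec.

```python
def weight_footprint_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes).

    Billions of parameters times bits per weight, divided by 8 bits per byte.
    """
    return total_params_billions * bits_per_weight / 8


# Assumed total parameter counts in billions (not given in the thread).
ASSUMED_TOTAL_PARAMS = {"GLM 4.5": 355, "GLM 4.5 Air": 106}

# Assumed average bits per weight for a typical 4-bit-class quant.
ASSUMED_BITS_PER_WEIGHT = 4.5

for name, params in ASSUMED_TOTAL_PARAMS.items():
    size = weight_footprint_gb(params, ASSUMED_BITS_PER_WEIGHT)
    print(f"{name}: roughly {size:.0f} GB of weights before KV cache and overhead")
```

Under those assumptions, Air lands somewhere around what a 16GB GPU plus 64GB to 96GB of system RAM can hold, while the full model needs server-class memory, which is where the EPYC/Threadripper suggestion comes from.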

[–] tomkatt@lemmy.world 4 points 2 months ago* (last edited 2 months ago) (2 children)

You can probably just use ollama and import the model.

[–] brucethemoose@lemmy.world 5 points 2 months ago

It’s going to be slow as molasses on ollama. It needs a better runtime, and GLM 4.5 probably isn’t supported at this moment anyway.

[–] WorldsDumbestMan@lemmy.today 3 points 2 months ago (1 children)

I'm running Qwen 3B and it is seldom useful

[–] brucethemoose@lemmy.world 2 points 2 months ago* (last edited 2 months ago) (1 children)

It's too small.

IDK what your platform is, but have you tried Qwen3 A3B? Or SmallThinker 21B?

https://huggingface.co/PowerInfer/SmallThinker-21BA3B-Instruct

The speed should be somewhat similar.

[–] WorldsDumbestMan@lemmy.today 1 points 2 months ago (1 children)

Qwen3 8B, sorry. Idiot spelling. I use it to talk through problems when I have no internet or have maxed out on Claude. I can rarely trust it with anything reasoning-related; it's faster and easier to do most things myself.

[–] brucethemoose@lemmy.world 3 points 2 months ago* (last edited 2 months ago)

Yeah, 7B models are just not quite there.

There are tons of places to get free access to bigger models. I'd suggest Jamba, Kimi, Deepseek Chat, and Google AI Studio, and the new GLM chat app: https://chat.z.ai/

And depending on your hardware, you can probably run better MoEs at the speed of 8Bs. Qwen3 30B is so much smarter it's not even funny, and faster on CPU.
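A rough sketch of why an MoE can keep up with (or beat) a small dense model: token generation is mostly limited by how many bytes of weights have to be read per token, and an MoE only reads its active parameters. The 3B active figure is taken from the "A3B" in the model name; the 60 GB/s memory bandwidth and 4.5 bits per weight are made-up illustrative assumptions, and this ignores compute, attention, and KV cache traffic, so it is an optimistic ceiling at best.

```python
def decode_tokens_per_sec(active_params_billions: float,
                          bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Bandwidth-bound upper estimate of generation speed.

    Each generated token needs roughly one pass over the active weights,
    so speed is about memory bandwidth divided by active weight bytes.
    """
    active_weights_gb = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / active_weights_gb


# Illustrative assumptions, not measurements.
ASSUMED_BANDWIDTH_GB_S = 60.0   # hypothetical desktop dual-channel RAM
ASSUMED_BITS_PER_WEIGHT = 4.5   # typical 4-bit-class quant

dense_8b = decode_tokens_per_sec(8, ASSUMED_BITS_PER_WEIGHT, ASSUMED_BANDWIDTH_GB_S)
moe_a3b = decode_tokens_per_sec(3, ASSUMED_BITS_PER_WEIGHT, ASSUMED_BANDWIDTH_GB_S)

print(f"Dense 8B ceiling:      {dense_8b:.1f} tokens/s")
print(f"MoE, 3B active ceiling: {moe_a3b:.1f} tokens/s")
```

The catch is that the whole 30B still has to fit in memory; only the per-token read cost shrinks.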