That calculator is total nonsense. Don't trust anything like that; at best, it's obsolete the week after it's posted.
Yeah, that's a huge caveat. Blender performance on AMD might be better than you think, though, and you can use your RTX 4060 on a Strix Halo motherboard just fine. The CPU itself is incredible for any kind of workstation workload.
So far, NPUs have been useless. Don't buy into any of that marketing.
That's still 5 words/second (about 300 words a minute), which isn't a bad reading speed.
Whether it's enough? That depends. GLM 350B without thinking is smarter than most models with thinking, so I end up with better answers faster.
But anyway, I get more like 20 tokens a second with models that aren't squeezed into my rig within an inch of their life. If you buy an HEDT/server CPU with more RAM channels, it's even faster.
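If you want a rough feel for why RAM channels matter so much, here's a back-of-the-envelope sketch. CPU token generation is mostly memory-bandwidth-bound, so tokens/second is roughly bandwidth divided by the bytes of active weights read per token. Every number here (the bandwidths, 32B active params, 4 bits/weight, 0.75 words per token) is an illustrative assumption, not a benchmark of any real setup:

```python
# Back-of-the-envelope: CPU token generation is roughly memory-bandwidth-bound,
# so tokens/s ~= usable RAM bandwidth / bytes of active weights read per token.
# All numbers below are illustrative assumptions, not measurements.

def est_tokens_per_sec(bandwidth_gb_s: float, active_params_b: float,
                       bits_per_weight: float) -> float:
    """Bandwidth-bound upper bound: bytes/s divided by weight bytes per token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# A hypothetical MoE with ~32B active params at ~4 bits per weight, on
# dual-channel desktop DDR5 (~80 GB/s) vs. an 8-channel server board (~300 GB/s).
for label, bw in [("dual-channel desktop", 80), ("8-channel server", 300)]:
    tps = est_tokens_per_sec(bw, active_params_b=32, bits_per_weight=4)
    # ~0.75 words per token is a common rule of thumb for English text.
    print(f"{label}: ~{tps:.1f} tok/s = {tps * 0.75 * 60:.0f} words/min")
```

Same model, same quant; only the memory bandwidth changed.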
If you want to look into the bleeding edge, start with https://github.com/ikawrakow/ik_llama.cpp/
And all the models on Hugging Face with the ik tag: https://huggingface.co/models?other=ik_llama.cpp&sort=modified
You'll see instructions for running big models on a 4060 + RAM.
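Once a model is up, talking to it from Python is easy, since llama.cpp-family servers (including the ik_llama.cpp fork) expose an OpenAI-compatible HTTP API. A minimal sketch, assuming you started the server on its default port 8080; the port, model field, and prompt are placeholders to adjust:

```python
# Minimal client sketch against a llama.cpp-family server's
# OpenAI-compatible endpoint. Port 8080 is the usual default --
# match it to however you launched the server.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",  # single-model servers generally ignore this field
        "messages": [{"role": "user",
                      "content": "Summarize MoE CPU offloading in one sentence."}],
        "max_tokens": 200,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```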
If you're trying to batch-process documents quickly (so no CPU offloading), look at exl3s instead: https://huggingface.co/models?num_parameters=min%3A12B%2Cmax%3A32B&sort=modified&search=exl3
And run them with this: https://github.com/theroyallab/tabbyAPI
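tabbyAPI also speaks the OpenAI-compatible protocol, so a batch job is just a bunch of concurrent HTTP requests, which the backend can batch together on the GPU; that's exactly where exl3 with no CPU offload shines. A rough sketch, assuming tabbyAPI's default port 5000 and whatever API key you set in its config (both are assumptions; check your config.yml):

```python
# Sketch of batched document processing against tabbyAPI's
# OpenAI-compatible endpoint. Port 5000 and the API key are
# assumptions -- check your tabbyAPI config.yml for the real values.
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:5000/v1/chat/completions"
HEADERS = {"Authorization": "Bearer YOUR_TABBY_API_KEY"}  # key from config.yml

def summarize(doc: str) -> str:
    resp = requests.post(URL, headers=HEADERS, timeout=300, json={
        "messages": [{"role": "user", "content": f"Summarize:\n\n{doc}"}],
        "max_tokens": 256,
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

docs = ["first document...", "second document..."]  # your corpus here
# Firing requests concurrently lets the server batch them on the GPU.
with ThreadPoolExecutor(max_workers=8) as pool:
    for summary in pool.map(summarize, docs):
        print(summary)
```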
Ah, a lot of good info! Thanks, I’ll look into all of that!