Technology

84796 readers

3607 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

675

Mozilla lays off 60 people, wants to build AI into Firefox (arstechnica.com)

submitted 2 years ago by einat2346@lemmy.today to c/technology@lemmy.world

326 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] __matthew__@lemmy.world 0 points 2 years ago (1 children)

Sorry but has anyone in this thread actually tried running local LLMs on CPU? You can easily run a 7B model at varying levels of quantization (ie. 5 bit quantization) and get a generalized prompt-able LLM. Yeah, of course it's going to take ~4GB of RAM (which is mem-mapped and paged into memory), but you can easily fine tune smaller more specific models (like the translation one mentioned above) and have surprising intelligence at a fraction of the resources.

Take, for example, phi-2 which performs as well as 13B param models but with 2.7B params. Yeah, that's still going to take 1.5GB RAM which Firefox wouldn't reasonably ship, but many lighter weight specialized tasks could easily use something like a fine tuned 0.3B model with quantization.

[–] cley_faye@lemmy.world 1 points 2 years ago

Yes, I did. And yes, it is possible. It's terribly slow in comparison, making it less useful. It very quickly devolves into random mumbling or get stuck in weird loops. It also hogs resources that are actually used by other tasks you may be doing.

I mainly test dev AI solutions, and moving from 1B to 7B models made them vastly more pertinent. And moving from CPU implementation (Ryzen 7 3700X) to GPU (RTX 3080 Ti) made them fast enough to be used as quick completion and immediate suggestion without breaking workflow, in addition to freeing resources for IDE, building tools and the actual software being run, while running it on CPU had multi-seconds delay, which made this use case completely useless.