Someone made a GPT-like chatbot that runs locally on Raspberry Pi, and you can too (www.xda-developers.com)

submitted 2 years ago by throws_lemy@lemmy.nz to c/technology@lemmy.world

[–] QuadratureSurfer@lemmy.world 47 points 2 years ago (17 children)

Direct link to the GitHub repo:
https://github.com/nickbild/local_llm_assistant?tab=readme-ov-file

It's a small model by comparison. If you want something that runs offline and actually comes close to ChatGPT 3.5, you'll want the Mixtral 8x7B model instead (running on a beefy machine):

https://mistral.ai/news/mixtral-of-experts/

[–] DarkThoughts@fedia.io 1 points 2 years ago (3 children)

I tried llamafile for text gen too, but I couldn't get ROCm to work with it properly, so I couldn't run it on my GPU without building it myself, which I'm really not into. And CPU text gen is way too slow for anything. A Mixtral response took ~250 seconds or so for ~1k context tokens; I think Mistral took about 52 seconds, or something around that number.

https://github.com/Mozilla-Ocho/llamafile

Mixtral is definitely beefy; Mistral is quite a bit faster, and there are a few even smaller prebuilt ones. But the smaller you go, the less complex the responses will be. I think llamafile is a good step in the right direction, but it's still not a good out-of-the-box experience yet. At least I got farther with it than with oobabooga, which is the recommendation for SillyTavern and which would just crash whenever it generated anything, without even giving me an error.

[–] Flumpkin@slrpnk.net 0 points 2 years ago (1 children)

How fast are they with a good GPU?

[–] DarkThoughts@fedia.io 0 points 2 years ago (1 children)

Did you miss the first part, where I explained that I couldn't get it to run on my GPU? I only have a 6650 XT anyway, but even that would be significantly faster than my CPU. How much faster, I can't say without trying it, but I suspect that with longer chats and consequently larger context sizes it would still be too slow to be really usable. Unless you're okay with waiting ages for a response.

[–] Flumpkin@slrpnk.net 1 points 2 years ago

Sorry, I'm just curious in general how fast these local LLMs are. Maybe someone else can give some rough info.
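For anyone who wants rough numbers for their own hardware, one way to get them is to time a single generation call and divide the token count by the elapsed time. The sketch below is only an illustration: `measure_throughput` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in that sleeps briefly and splits the prompt on whitespace. It is not the API of llamafile or any other tool mentioned in this thread; you would swap in whatever call your local runner actually exposes.

```python
import time
from typing import Callable, List


def measure_throughput(generate: Callable[[str], List[str]], prompt: str) -> dict:
    """Time one call to `generate` and report tokens per second."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return {
        "tokens": len(tokens),
        "seconds": elapsed,
        # Guard against a zero-length interval on very coarse clocks.
        "tokens_per_second": len(tokens) / elapsed if elapsed > 0 else float("inf"),
    }


def fake_generate(prompt: str) -> List[str]:
    """Hypothetical stand-in for a real local model call (assumption, not a real API)."""
    time.sleep(0.01)  # pretend the model is doing some work
    return prompt.split()  # pretend each whitespace-separated word is a token


if __name__ == "__main__":
    print(measure_throughput(fake_generate, "how fast is a local model on this machine"))
```

Numbers from a single run like this are noisy, and as noted above, response time tends to grow with context size, so it is worth repeating the measurement with short and long prompts.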
