this post was submitted on 09 Aug 2024
69 points (93.7% liked)
Linux
48310 readers
645 users here now
From Wikipedia, the free encyclopedia
Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds. Linux is typically packaged in a Linux distribution (or distro for short).
Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy.
Rules
- Posts must be relevant to operating systems running the Linux kernel. GNU/Linux or otherwise.
- No misinformation
- No NSFW content
- No hate speech, bigotry, etc
Related Communities
Community icon by Alpár-Etele Méder, licensed under CC BY 3.0
founded 5 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
OpenCL is needed for me for non AI stuff, so that Darktable (an image program) can use my GPU; which is much faster. But for AI? No idea how they compare, as I did not use it for that purpose. ROCm itself also is troubling...
Do you have the new Llama 3.1 8B Instruct 128k model? It's quite slow on my GPU (I have a weak beginner class GPU with 8GB, but plan to upgrade). To the point its almost as slow as my CPU. I've read complains in the Github tracker from others too and wonder if its an issue with AMD cards. BTW the previous model Llama 3.0 8B Instruct is miles faster.
I have a fairly substantial 16gb AMD GPU, and when I load in Llama 3.1 8B Instruct 128k (Q4_0), it gives me about 12 tokens per second. That's reasonably fast enough for me, but only 50% faster than CPU (which I test by loading mlabonne's abliterated Q4_K_M version, which runs on CPU in GPT4All, though I have no idea if that's actually meant to be comparable in performance).
Then I load in Nous Hermes 2 Mistral 7B DPO (also Q4_0) and it blazes through at 50+ tokens per second. So I don't really know what's going on there. Seems like performance varies a lot from model to model, but I don't know enough to speculate why. I can't even try Gemma2 models, GPT4All just crashes with them. I should probably test Alpaca to see if these perform any different there...
Wow it got worse for me. Maybe through last update? Is this probably related to he application? Now I get 12 t/s on my CPU and switching to GPU it's only 1.5 t/s. Something is fishy. With Nous hermes 2 Mistral 7B DPO with q4 I get 33 t/s (I believe it was up to 44 before).
Now I'm curious if this will happen with a different application too, but I have nothing else than GPT4All installed.
Unfortunately I can't even test Llama 3.1 in Alpaca because it refuses to download, showing some error message with the important bits cut off.
That said, the Alpaca download interface seems much more robust, allowing me to select a model and then select any version of it for download, not just apparently picking whatever version it thinks I should use. That's an improvement for sure. On GPT4All I basically have to download the model manually if I want one that's not the default, and when I do that there's a decent chance it doesn't run on GPU.
However, GPT4All allows me to plainly see how I can edit the system prompt and many other parameters the model is run with, and even configure multiple sets of parameters for the same model. That allows me to effectively pre-configure a model in much more creative ways, such as programming it to be a specific character with a specific background and mindset. I can get the Mistral model from earlier to act like anything from a very curt and emotionally neutral virtual intelligence named Jarvis to a grumpy fantasy monster whose behavior is transcribed by a narrator. GPT4All can even present an API endpoint to localhost for other programs to use.
Alpaca seems to have some degree of model customization, but I can't tell how well it compares, probably because I'm not familiar with using ollama and I don't feel like tinkering with it since it doesn't want to use my GPU. The one thing I can see that's better in it is the use of multiple models at the same time; right now GPT4All will unload one model before it loads another.
That's quite unfortunate. ~~Alpaca needs to support those explicitly to work with the new 3.1 128k models; GPT4All was not compatible with it before update either. There was a bug in some library they was using and needed a patch. So maybe that's why you can't use the new Llama 3.1 in Alpaca.~~ (Edit: Never mind. On the webpage they advertise and talk about 3.1 being working, so a wrong guess by me probably.)
Actually that sounds very useful and I missed that option, to be able to select from a set of related models. One thing that GPT4All can also do is, analyzing text files and then using the data to ask questions about it. It will also output the exact lines of the file in relation to the answer. I only experimented a little bit with this, but sounds useful too. The team also experiments and works on a web search using, but no idea how that would work with a local model if ever.
Hi I just wanted let you know that I managed to get Gemma 2 model to work (didn't work previously too).
These are the new ones Gemma 2. I wasn't 100% sure first, so looked up at Gemma models list: https://ai.google.dev/gemma/docs/get_started ~~and the only 9b variants are the new Gemma 2 versions~~ (Edit: I mislooked. There are Gemma 1 versions with 9b too, so never mind this comment. ). If this works on my low end GPU, it should work on yours too.