Diabolo96

joined 1 year ago
[–] Diabolo96@lemmy.dbzer0.com 5 points 7 months ago* (last edited 7 months ago)

Easy. Don't run the Python files directly. Instead, create a launcher script that uses an md5 hash to check whether the Python file you want to run has changed, re-applies the patch if it has, and then runs the patched file. This avoids ever running the unpatched version.
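A minimal sketch of what that launcher could look like (the file names and the patch step are placeholders to adapt to your setup):

```python
import hashlib
import subprocess
import sys
from pathlib import Path

TARGET = Path("script.py")           # placeholder: the upstream file you want to run
PATCHED = Path("script_patched.py")  # where the patched copy lives
HASH_FILE = Path(".script.md5")      # last-seen hash of the upstream file

def md5(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()

current = md5(TARGET)
last_seen = HASH_FILE.read_text().strip() if HASH_FILE.exists() else ""

if current != last_seen or not PATCHED.exists():
    # Upstream file changed (or first run): re-apply your patch.
    # Placeholder patch step: copy the file and swap in your modification.
    PATCHED.write_text(TARGET.read_text().replace("old_value", "new_value"))
    HASH_FILE.write_text(current)

# Always run the patched copy, never the original.
sys.exit(subprocess.call([sys.executable, str(PATCHED), *sys.argv[1:]]))
```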

[–] Diabolo96@lemmy.dbzer0.com 2 points 7 months ago* (last edited 7 months ago)

I was hoping for something that I could use on a mobile app.

Record now, then transcribe later? You can try https://whisper.ggerganov.com (it runs in your browser, but nothing is sent anywhere, so it works even on your Android/iOS phone). The site's owner is a trusted dev who made whisper.cpp and llama.cpp, the latter basically being the backbone of the entire LLM industry.

I'm not sure what "adapting the model size" means so this might be more complicated than I'm looking for.

A bit of complexity is generally the price to pay for freedom from constant surveillance and data gathering. Plus, it's actually super easy. A bigger model means better transcription quality, but the smaller ones are really good already. The base.en model is probably all you need anyway.

On PC, you can generally try any app from GitHub; they basically all use the same backend.

I found a few: https://whishper.net/ and https://github.com/chidiwilliams/buzz
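They all wrap some flavor of Whisper under the hood. For reference, a minimal sketch of the same thing with the openai-whisper Python package (assuming `pip install openai-whisper`, ffmpeg installed, and a local audio.wav):

```python
import whisper

# Load the small English-only model; swap in "small.en" or "medium" for better accuracy.
model = whisper.load_model("base.en")

# Transcribe a local recording; nothing ever leaves your machine.
result = model.transcribe("audio.wav")
print(result["text"])
```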

[–] Diabolo96@lemmy.dbzer0.com 1 points 7 months ago* (last edited 7 months ago)

No. Quantization makes it go faster. Not blazing fast, but decent.

[–] Diabolo96@lemmy.dbzer0.com 1 points 7 months ago

Completely forgot to tell you to only use quantized models. Your PC can run 4-bit quantized versions of the models I mentioned. That's the key to running LLMs on consumer-level hardware. You can later read up on the different quantizations and toy with other ones like Q5_K_M and such.
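As a rough sketch, here's what loading one of those looks like with the llama-cpp-python bindings (the GGUF file name is a placeholder for whichever Q4 quant you download):

```python
from llama_cpp import Llama

# A 4-bit quantized GGUF fits in a few GB of RAM and runs on a plain CPU.
llm = Llama(model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: Explain quantization in one sentence. A:", max_tokens=64, stop=["\n"])
print(out["choices"][0]["text"])
```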

Just read that Phi-3 got released, and apparently it's a 4B model that reaches GPT-3.5 level. Follow the news and wait for it to be added to ollama/llama.cpp.

Thank you so much for taking the time to help me with that! I'm very new to the whole LLM thing, and sorta figuring it out as I go

I became fascinated with LLMs after the first AI boom, but all this knowledge is basically useless where I live, so I might as well make it useful by teaching people what I know.

[–] Diabolo96@lemmy.dbzer0.com 2 points 7 months ago* (last edited 7 months ago) (2 children)

The key is quantized models. A full-precision model wouldn't fit, but a 4-bit 8B Llama 3 would.
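Rough math: 8 billion parameters at 4 bits each is about 4 GB of weights, plus a bit more for the context cache and overhead, so it fits comfortably in 6-8 GB of RAM, whereas the same model at 16-bit would need roughly 16 GB.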

[–] Diabolo96@lemmy.dbzer0.com 15 points 7 months ago (2 children)

It's AI and your voice won't be used for training if you use a local model.

Use Whisper STT. It runs on your computer, so nothing ever leaves it. You can adapt the model size to how powerful your computer is. The bigger the model, the better it will be at transcribing.

[–] Diabolo96@lemmy.dbzer0.com 2 points 7 months ago* (last edited 7 months ago) (2 children)

Yeah, it's not a potato, but not that powerful either. Nonetheless, it should run 7B/8B/9B and maybe 13B models easily.

running them in Python with Huggingface's Transformers library (from local models)

That's your problem right here. Python is great for building LLMs but horrible at running them. With a computer as weak as yours, every bit of performance counts.

Just try ollama or llama.cpp. Their GitHub repos are also a goldmine for other projects you could try.

llama.cpp can partially run the model on the GPU for way faster inference.
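With the llama-cpp-python bindings, that offloading is a single parameter (the layer count below is a guess; tune it to your VRAM, and the GGUF file name is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # placeholder file name
    n_gpu_layers=20,  # layers pushed onto the GPU; -1 offloads everything, 0 is CPU-only
    n_ctx=2048,
)
print(llm("Hello!", max_tokens=32)["choices"][0]["text"])
```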

Piper is a pretty decent, very lightweight TTS engine that runs directly on your CPU, if you want to add TTS capabilities to your setup.
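If you go that route, a tiny sketch of driving the piper CLI from Python (this assumes the piper binary is on your PATH and you've downloaded a voice model such as en_US-lessac-medium.onnx):

```python
import subprocess

text = "Inference finished. Here are the results."

# Piper reads text on stdin and writes a wav file; everything runs locally on the CPU.
subprocess.run(
    ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", "reply.wav"],
    input=text.encode("utf-8"),
    check=True,
)
```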

Good luck and happy tinkering!

[–] Diabolo96@lemmy.dbzer0.com 11 points 7 months ago* (last edited 7 months ago) (2 children)

Teach kids programming by making games with them. Find a simple-to-make 'one tap, easy to control but hard to master' game like Flappy Bird on the Play Store, and try remaking it with the kid.
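If you want a concrete starting point, here's a bare-bones pygame sketch of the 'one tap' core (just gravity plus a jump; pipes, scoring, and art are the fun parts to build together):

```python
import pygame

# Minimal one-tap starter: a square falls under gravity, pressing space makes it jump.
pygame.init()
screen = pygame.display.set_mode((400, 600))
clock = pygame.time.Clock()

y, velocity = 300.0, 0.0
GRAVITY, FLAP = 0.5, -8.0

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN and event.key == pygame.K_SPACE:
            velocity = FLAP  # the "one tap"

    velocity += GRAVITY
    y = max(0.0, min(y + velocity, 580.0))  # keep the square on screen

    screen.fill((30, 30, 30))
    pygame.draw.rect(screen, (255, 200, 0), (50, int(y), 20, 20))
    pygame.display.flip()
    clock.tick(60)

pygame.quit()
```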

[–] Diabolo96@lemmy.dbzer0.com 1 points 7 months ago (4 children)

Specs? Try Mistral with llama.cpp.

[–] Diabolo96@lemmy.dbzer0.com 2 points 7 months ago (6 children)

It shouldn't happen with an 8B model. Even on CPU, it's supposed to be decently fast. There's definitely something wrong here.

[–] Diabolo96@lemmy.dbzer0.com 3 points 7 months ago (8 children)

Sadly, I can't really help you much. I have a potato PC, and the biggest model I've run on it was Microsoft's Phi-2 using the Candle framework. I used to tinker with llama.cpp on Colab, but it seems it doesn't handle Llama 3 yet. Ollama says it does, but I've never tried it. As for the speed, it's kinda expected for a 70B model to be really slow on CPU. How slow is too slow? I don't really know...

You can always try the 8B model. People say it's really great and has even replaced the 70B models they'd been using.

[–] Diabolo96@lemmy.dbzer0.com 18 points 7 months ago* (last edited 7 months ago) (14 children)

Run 70B Llama 3 on one and have a 100% local, GPT-4-level home assistant. Hook it up with Coqui AI's XTTS-v2 for astonishingly natural speech (100% local too) that can imitate anyone's voice. Now you've got yourself Jarvis from Iron Man.
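For the voice side, a rough sketch with the Coqui TTS Python package (the model id is XTTS-v2 from their docs; the reference wav is a placeholder for whatever voice you want it to imitate):

```python
from TTS.api import TTS

# Load XTTS-v2; the first run downloads the weights, after that it's fully local.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Clone the voice from a short reference clip and speak the assistant's reply.
tts.tts_to_file(
    text="Good evening. All systems are running locally.",
    speaker_wav="reference_voice.wav",  # placeholder: a few seconds of the target voice
    language="en",
    file_path="reply.wav",
)
```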

Edit: I thought they were some kind of beast machines with 192 GB of RAM and stuff. They're just regular mid-to-low-tier PCs.
