this post was submitted on 13 Feb 2026
27 points (80.0% liked)

Selfhosted

Not sure if this goes here or if this post will be hated on, but I want to host AI locally (LLMs and ComfyUI's newer models), and I'm not sure what type of setup or parts would work best on a slim budget. I'm also not sure whether now is even the time, with inflation and such.

I don't have a price in mind yet, but I'm wondering how much it would cost and what parts I might need.

If you have any questions or concerns, please leave a comment.

all 34 comments
[–] vegetaaaaaaa@lemmy.world 7 points 3 hours ago* (last edited 3 hours ago)
  • Small 4B models like gemma3 will run on anything (I have it running on a 2020 laptop with integrated graphics). Don't expect superintelligence, but it works for basic classification tasks, writing/reviewing/fixing small scripts, basic chat and writing, etc.
  • I use https://github.com/ggml-org/llama.cpp in server mode, pointing to a directory of GGUF model files downloaded from Hugging Face. I access it from the built-in web interface or the API (I wrote a small assistant script; a rough sketch of that kind of setup follows this list).
  • To load larger models you need more RAM (preferably fast VRAM on a GPU, but DDR5 on the motherboard will work, just noticeably slower). My gaming rig with a 16GB AMD 9070 runs 20-30B models at decent speeds. You can grab quantized (lower precision, lower output quality) versions of those larger models if the full-size/unquantized models don't fit. Check out https://whatmodelscanirun.com/
  • For image generation I found https://github.com/vladmandic/sdnext, which works extremely well and fast with Z-Image Turbo, FLUX.1-schnell, Stable Diffusion XL and a few other models.
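
To make that concrete, here is a minimal sketch of that kind of assistant script (not the exact one; the model filename and port are placeholders). llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, so a few lines of Python are enough:

```python
# Start the server in another terminal, pointing at a downloaded GGUF file, e.g.:
#   llama-server -m ./models/gemma-3-4b-it-Q4_K_M.gguf --port 8080
# (the model filename is just an example of a quantized download from Hugging Face)
import json
import urllib.request

def ask(prompt: str, url: str = "http://127.0.0.1:8080/v1/chat/completions") -> str:
    """Send one chat turn to llama-server's OpenAI-compatible endpoint."""
    payload = {"messages": [{"role": "user", "content": prompt}], "temperature": 0.7}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Write a one-line bash command to list the 5 largest files in a directory."))
```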

As for prices... the rig I bought for ~1500€ in September now goes for ~2200€ (a once-in-a-decade investment). It's not a beast, but it works: the primary use case was general computing and gaming, and I'm glad it also handles local AI. Costs for a dedicated, performant AI rig are ridiculously high right now, though, and it's not yet economically competitive against commercial LLM services for complex tasks, but that's not the point. Check https://old.reddit.com/r/LocalLLaMA/ (yeah, Reddit, I know); people there spend ~10k€ on hardware to run ~200-300B models, not counting the electricity bill.

[–] bandwidthcrisis@lemmy.world 2 points 8 hours ago* (last edited 8 hours ago)

I've run koboldcpp on a Steam Deck. You have to stick to small model files, of course (maybe 4GB), but you can get decent speed if you do.

And Edge Gallery on Android can run models locally on a phone.

[–] chicken@lemmy.dbzer0.com 6 points 13 hours ago

If your focus is LLMs, get a 3090 GPU. VRAM is the most important thing here because it determines what models you can load and run at a decent speed. Having 24GB lets you run the mid-range models that specifically target that amount of memory, since it's a very standard amount for hobbyists to have. These models are viable for coding; the smaller ones are less so. Looking at prices, it seems like you can get this card for 1-2k depending on whether you go used or refurbished. I don't know if better price options will become available soon, but with the RAM shortage and huge general demand, it doesn't seem like it.

If you want to focus on image or video generation instead, I understand there are advantages to newer-generation cards, because certain features and speed matter more than raw VRAM, but I know less about this.

[–] sj_zero@lotide.fbxl.net 1 points 11 hours ago

I'm running a 4B model on one of my machines, an old Surface Book 1.

It's a brutal machine: heat issues, and the GPU doesn't work in Linux. But pick a minimal enough model and it's good enough to give me LLM access in my Nextcloud if for some reason I want it.

The biggest thing really seems to be memory: most cheaper GPUs don't have enough to run a big model, and CPUs are dreadfully slow on larger models even if you can put enough RAM in the box.

[–] Atherel@lemmy.dbzer0.com 10 points 18 hours ago

As others said, it all depends on what you expect. I run Stable Diffusion on my gaming PC with 32GB of RAM and an AMD 9070 XT, and it works fine. It also ran on a 6800 XT before that one died. A GPU with 16GB of VRAM helps a lot; I'd say 12GB is the minimum. Anything lower will limit you in models and speed.
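
For anyone curious what that looks like outside a UI, here is a rough sketch using Hugging Face diffusers (the model ID and prompt are just examples; on an AMD card you'd need a ROCm build of PyTorch, which still exposes the GPU as a "cuda" device):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Downloads several GB of weights on first run; fp16 halves the memory footprint
# so SDXL fits comfortably on a 12-16GB card.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe.to("cuda")  # ROCm builds of PyTorch also expose the GPU as "cuda"

image = pipe(
    "a cozy self-hosted server rack, watercolor",  # example prompt
    num_inference_steps=30,
).images[0]
image.save("out.png")
```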

For LLMs, just try it out: smaller models work fine without special hardware, as long as you're the only user. Tools like Jan or LM Studio make them easy to run.

[–] KairuByte@lemmy.dbzer0.com 19 points 23 hours ago

It really comes down to what kind of speed you want. You can run some LLMs on older hardware “just fine” and many models without a dedicated GPU. The problem is that the time taken to generate responses gets to be crazy.

I ran DeepSeek on an old R410 for shits and giggles a while back, and it worked. It just took multiple minutes to actually give me a complete response.

[–] panda_abyss@lemmy.ca 15 points 22 hours ago (1 children)

High RAM for MoE models, high VRAM for dense models, and the highest GPU memory bandwidth you can get.

For Stable Diffusion models (ComfyUI), you want high VRAM and bandwidth. Diffusion is a GPU-heavy and memory-intensive operation.

Software/driver support is very important for diffusion models and ComfyUI, so you'll have the best experience with Nvidia cards.

I think realistically you need 80GB+ of RAM for things like Qwen-Image quants (40GB for the model, 20-40GB for LoRA adapters in ComfyUI to get output).

I run a 128GB AMD Ryzen AI Max+ 395 rig; Qwen-Image takes 5-20 minutes per 720p result in ComfyUI. Batching offers an improvement, and reducing iterations during prototyping makes a huge difference. I haven't tested since the fall though, and the newer models are more efficient.
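
As a rough back-of-the-envelope check on numbers like these (an assumption-heavy sketch, not an exact formula; real usage adds KV cache, activations and LoRA weights on top of the raw weights):

```python
def approx_weights_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rule of thumb: parameter count x bits per weight / 8, plus ~20% overhead."""
    return params_billion * bits_per_weight / 8 * overhead

# Illustrative only; quant formats and runtimes differ.
for params, bits in [(20, 8), (20, 4), (70, 4)]:
    print(f"{params}B model at {bits}-bit ~ {approx_weights_gb(params, bits):.0f} GB")
# 20B at 8-bit ~ 24 GB (borderline on a 24GB card)
# 20B at 4-bit ~ 12 GB (fits a 16GB card with room for context)
# 70B at 4-bit ~ 42 GB (needs multiple GPUs or lots of system RAM)
```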

[–] ikidd@lemmy.world 1 points 15 hours ago (1 children)
[–] panda_abyss@lemmy.ca 1 points 14 hours ago (1 children)
[–] ikidd@lemmy.world 1 points 14 hours ago (1 children)

I've really been mulling over one of those with 128GB. I'm on Claude Max and the $50 Cerebras plan, so I'm using a good chunk of $200/mo for coding and Openclaw. Is it worth it for light coding, or are you only doing SD with it?

[–] panda_abyss@lemmy.ca 3 points 12 hours ago (1 children)

It would not be worth it as a replacement for Claude.

80% of my issue is that it's AMD and their drivers are still awful. 20% is that token generation speed is very slow, especially compared to commercial models running on dedicated hardware. MoE models are fine; dense models are too slow for meaningful workflows. ComfyUI is decent, but I'm not seriously into image gen.

I have a lot of fun with it, but I have not been able to use it for any actual AI dev.

[–] ikidd@lemmy.world 2 points 2 hours ago (1 children)

Thanks for the feedback. That was precisely my worry: outlaying that money and not being happy with the result.

[–] panda_abyss@lemmy.ca 1 points 1 hour ago

It's still a fantastic computer.

I use it as a server and it's very, very fast, especially for threaded workloads, and IO is fast.

Just don't buy it expecting to replace paid AI services, and don't buy it for AI dev: on paper it should be good, but there are driver issues. A DGX Spark is better if you want an AI dev machine.

[–] possiblylinux127@lemmy.zip 3 points 17 hours ago (1 children)
[–] Matty_r@programming.dev 0 points 14 hours ago

Best I can do is $2.

[–] derjules@lemmy.world 6 points 23 hours ago

I'm running gpt-oss-20b fine on my M3 Mac mini.

[–] slazer2au@lemmy.world 6 points 23 hours ago (1 children)

Depends on how fast you want it to run. A Raspberry Pi with an AI hat runs well enough.

[–] illusionist@lemmy.zip 6 points 23 hours ago (1 children)

What's an AI hat? Like a Red Hat? Or a fedora?

[–] shyguyblue@lemmy.world 9 points 22 hours ago (1 children)

HATs are little modules you can stick on your Pi for extra functionality!

And they probably do have a Fedora hat...

[–] illusionist@lemmy.zip 5 points 21 hours ago (1 children)

Crazy! I thought that was a joke. Thanks!

[–] slazer2au@lemmy.world 5 points 20 hours ago (1 children)

A lot of expansion boards for the Pi are called HATs for some reason.

https://www.raspberrypi.com/products/ai-hat/

[–] prettygorgeous@aussie.zone 3 points 18 hours ago

How's the performance on something like this?

[–] one_old_coder@piefed.social 6 points 23 hours ago (1 children)

AI said:

To run AI models locally, you'll need a computer with a capable CPU, sufficient RAM, and a powerful GPU

While it's possible to run some AI models on a laptop, a dedicated desktop setup with a powerful GPU will generally offer better performance. The cost of building a dedicated AI PC can range from around $800 for a budget build to $2,500 for a performance-oriented system

Hope that helps /s

[–] paper_moon@lemmy.world 11 points 22 hours ago

I wonder if, when generating that price estimate, it took into account all the hikes in RAM pricing that it itself is causing... 🤔

Stupid fucking AI data centers...

[–] ShellMonkey@piefed.socdojo.com 5 points 22 hours ago (1 children)

I was using an Nvidia 3060 for a while, then had two in one box, then switched to a 3090.

The amount of VRAM is a big factor for decent performance. Getting it to not sound like a predictably repetitive bot, though, is a whole separate thing that's still kind of elusive.

[–] surewhynotlem@lemmy.world 1 points 14 hours ago (1 children)

Do multiple GPUs help? I could get a cheap 970 to toss in my rig.

[–] ShellMonkey@piefed.socdojo.com 2 points 13 hours ago

My go-to for messing with chatbots is Kobold, which lets you split the work between multiple GPUs. I get the impression the actual processing is only done on one, but it lets you load larger models with the extra memory.

[–] vala@lemmy.dbzer0.com 3 points 21 hours ago

FYI, diffusion models are not really LLMs.

[–] fubarx@lemmy.world 2 points 20 hours ago (1 children)

Alex Ziskind on YT tests a number of devices for local AI: https://youtu.be/QbtScohcdwI

[–] ikidd@lemmy.world 1 points 15 hours ago

This was the channel I was going to suggest. A lot of what he shows is pretty pricey, but some would make sense if you weren't too concerned about speed.

[–] Decronym@lemmy.decronym.xyz 1 points 22 hours ago* (last edited 1 hour ago) (1 children)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Git: Popular version control system, primarily for code
NAS: Network-Attached Storage
NUC: Next Unit of Computing, a brand of Intel small computers
NVR: Network Video Recorder (generally for CCTV)
PSU: Power Supply Unit
Plex: Brand of media server package
PoE: Power over Ethernet
RAID: Redundant Array of Independent Disks, for mass storage
SSD: Solid State Drive, mass storage
Unifi: Ubiquiti WiFi hardware brand
VPS: Virtual Private Server (as opposed to shared hosting)

11 acronyms in this thread; the most compressed thread commented on today has 5 acronyms.

[Thread #91 for this comm, first seen 13th Feb 2026, 17:50] [FAQ] [Full list] [Contact] [Source code]

[–] Telorand@reddthat.com 1 points 22 hours ago

You forgot the acronym "EVIL."

[–] oeuf@slrpnk.net 0 points 19 hours ago

I'm running a couple of smaller chat models on my mid-range new-ish laptop and they're fairly quick. Try out Jan with something like their jan-nano model on whatever you've already got and get a feel for what you can do.