Selfhosted

58781 readers

320 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
No spam posting.
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
No trolling.
No low-effort posts. This is subjective and will largely be determined by the community member reports.

Resources:

selfh.st Newsletter and index of selfhosted software and apps
awesome-selfhosted software
awesome-sysadmin resources
Self-Hosted Podcast from Jupiter Broadcasting

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago

MODERATORS

HybridSarcasm@lemmy.world

HybridSarcasm@lemmy.hybridsarcasm.xyz

Ollama Server Component Recommendations (piefed.blahaj.zone)

submitted 13 hours ago by irotsoma@piefed.blahaj.zone to c/selfhosted@lemmy.world

8 comments fedilink hide all child comments

I'm looking to build a low-end ollama LLM server to improve home assistant voice control, Immich image recognition and a few other services. With the current cost of hardware components like memory, I'm looking to build something small, but somewhat expandable.

I have an old micro-atx form factor computer that I'm thinking will be a good option to upgrade. I'd love recommendations on motherboards, processors, and video card combos that would likely be compatible and sufficient to run a decent server while keeping costs lower, basically, the best bang for the buck. I have a couple of M.2 SSDs I can re-purpose. Would prefer the motherboard has 2.5Gbit Ethernet, but otherwise I'm open.

Also recommendations on sites to purchase good quality memory at reasonable prices that ship to the US. I'd be willing to look at lightly used components, too.

Any advice on any of these topics would be greatly appreciated. The advice I've found has all been out of date especially with crypto fading so video cards are not as expensive, but LLM data centers eating up and reserving memory before it's even manufactured.

top 8 comments

sorted by: hot top controversial new old

[–] vegetaaaaaaa@lemmy.world 1 points 1 hour ago* (last edited 1 hour ago)

I suggest using llama.cpp instead of ollama, you can easily squeeze +10% in inference speed and other memory optimizations from llama.cpp. With hardware prices nowadays I think every % saved on resources matters. Here is a simple ansible role to setup llama.cpp, it should give you a good idea of how to deploy it.

A dedicated inference rig is not gonna be cheap. What I did, since I need a gaming rig; is getting 32GB DDR5 (this was before the current RAMpocalypse, if I had known I would have bought 64) and an AMD 9070 (16GB VRAM - again if I had known how crazy prices would get I'd probably ahve bought a 24GB VRAM card). The home server runs the usual/non-AI stuff, and llamacpp runs on the gaming desktop (the home server just has a proxy to it). Yeah the gaming desktop has to be powered up when I want to run inference, this is my main desktop so it's powered on most of the time, no big deal

[–] chrash0@lemmy.world 8 points 12 hours ago (3 children)

honestly it’s hard to beat Macs these days in this space for two reasons:

unified memory means that you don’t have to load up on RAM just to load the model and then also shell out for a video card with barely enough VRAM to fit a basic language model
their supply chain is solid and has mostly avoided the constraints that other OEMs and parts manufacturers are struggling with

pricing is tough. sure, crypto is on its way out, but GPUs are still the platform of choice for most neural net workloads (outside of SoCs like Apple M-series). i built a PC in late 2024, and it’s easily worth twice what i paid for it.

[–] irotsoma@piefed.blahaj.zone 5 points 10 hours ago (2 children)

Yeah,but I dont want to get locked into a proprietary OS or have to put a lot of effort into hacking it to run Linux.

[–] chrash0@lemmy.world 2 points 8 hours ago

super fair. i am a Linux guy normally. i’m just being honest. i wish there was a better more open alternative.

if you want to go with the Linux alternative it’s going to cost. get at least 32GB of RAM and at least a 4090 to run the kind of models you’re asking for. it’s the way she goes

[–] ryokimball@infosec.pub -1 points 8 hours ago

The apple silicon is more energy efficient but the latest Intel and AMD CPUs deliver more processing power and can also share a significant amount of RAM to the GPU / AI components.

[–] curbstickle@anarchist.nexus 4 points 11 hours ago

Going to second this, its all my m2 does right now. Putting together a solution for the office with some m4s.

Its a lot of bang for the buck specifically for llm use despite being horribly overpriced otherwise.

[–] irmadlad@lemmy.world 1 points 10 hours ago* (last edited 10 hours ago)

i built a PC in late 2024, and it’s easily worth twice what i paid for it.

spoiler

I wrote the vendor and asked him if the decimal was in the right place or was this the model that was beta testing alien technology. Got to be a misprint.

[–] mierdabird@lemmy.dbzer0.com 1 points 9 hours ago

It's hard to say what exactly your requirements are in terms of VRAM/RAM from what you described here, but as a general recommendation whether AMD or Intel, I'd stick with DDR4 generation hardware. DDR5 is extremely expensive, but any non-MoE model that spills into system memory will still be frustratingly slow.

For GPU's the best bang for your buck if you want Nvidia is probably the 3060 12GB, it has 360GB/s memory bandwidth and one or more of those is a very reasonable starting point for local AI.
If you're okay with AMD there are some really unique cards floating around, I recently picked up a V620 off ebay for $350, it's an ex-datacenter card with 32GB GDDR6 @ 512GB/s bandwidth. It's a bit of a power hog but in my early testing it was running Qwen coder 3 30B at like 100 tokens/sec.

I run it on an ASUS X570 PRO board which is the cheapest AM4 board I could find with an optimal PCI-E setup: three x16 slots running 4.0x8, 4.0x8, 3.0x4. I have successfully tested it with the V620, a 9060XT, and a 3060 for 60 GB total VRAM, though the third x16 is only single slot so I had to borrow a pci extender cable to try it. I've found 48gb VRAM is plenty for me so I doubt I'll actually run a third card unless I find a good deal on a single slot one.

Kinda turned into a ramble but let me know if you got questions