this post was submitted on 24 May 2026

homelab

I consolidated my setup a bit. This is my local LLM hosting server. I took a gamble on a Chinese NVLink SXM2 mezzanine board from eBay, and it was surprisingly plug and play for my dual 16GB V100s 😂. I'm also running a Tesla P40 that I repasted with liquid metal; it sits at 24C idle 🥶.

[–] whatiswrongwithyou@lemmy.ml 2 points 1 month ago (1 children)

What kind of model and space limitations are you under with V100s?

Some of the most interesting computer music I’ve heard in years was composed on them but idk if it’s worth getting into a whole new generation of hardware if it can only really do that.

[–] pech@lemmy.world 1 points 1 month ago

Currently I'm running a Q6_K quant of Hermes 4 14B with a 32K context window via llama.cpp, and it works pretty well. Generation is a comfy ~50 tok/sec. These V100s are 16GB each, but there are 32GB versions available too.
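
For a rough sense of the space limits, here's a back-of-envelope sketch of where the VRAM goes for a setup like this. Every model number in it is an assumption rather than a measurement: roughly 14.8B parameters, about 6.56 bits per weight for Q6_K, an fp16 KV cache, and a layout of 40 layers with 8 KV heads of dimension 128. It also ignores compute buffers and other runtime overhead, so treat the result as a lower bound.

```python
# Rough VRAM estimate: quantized weights plus an fp16 KV cache.
# All model numbers below are assumptions for illustration, not measured values.

GIB = 1024 ** 3

def weights_bytes(n_params, bits_per_weight):
    """Size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """K and V each hold n_kv_heads * head_dim values per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Assumed: ~14.8B parameters at ~6.5625 bits/weight (Q6_K).
weights = weights_bytes(14.8e9, 6.5625)

# Assumed: 40 layers, 8 KV heads, head_dim 128, 32K context, fp16 cache.
kv = kv_cache_bytes(40, 8, 128, 32 * 1024)

total_gib = (weights + kv) / GIB
print(f"weights ~{weights / GIB:.1f} GiB, "
      f"KV cache ~{kv / GIB:.1f} GiB, "
      f"total ~{total_gib:.1f} GiB")
```

Under those assumptions the total lands a little over 16 GiB before any overhead, so it would be too tight for a single 16GB card but leaves headroom when spread across the 32GB of two.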

I'm running everything on NixOS and have to use package overrides to get inference engines to build against the right CUDA versions.

My goal is to get a cohesive environment set up for Hermes Agent to learn my system/lab/network and help me grow it over time.

Overall, I'm happy with them. The mezzanine board is good quality. I'm using PTM sheets under those massive heatsinks and some Arctic P9 fans to keep the cards at around 60C under load.
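
If you want to keep an eye on temperatures like these, one option is to poll `nvidia-smi` and parse its CSV output. This is a minimal sketch, not what's running on the server above: the function names and the 70C default limit are made up for illustration, and it assumes GPU names contain no commas.

```python
import subprocess

# nvidia-smi query that prints one "index, name, temperature" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,temperature.gpu",
    "--format=csv,noheader,nounits",
]

def parse_temps(csv_text):
    """Turn nvidia-smi CSV text into (index, name, temp_c) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        index, name, temp = [field.strip() for field in line.split(",")]
        readings.append((int(index), name, int(temp)))
    return readings

def hot_gpus(readings, limit_c=70):
    """Return the readings above limit_c (an arbitrary example threshold)."""
    return [r for r in readings if r[2] > limit_c]

def read_temps():
    """Run nvidia-smi and parse its output; needs the NVIDIA driver installed."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temps(result.stdout)

# Usage on a machine with NVIDIA GPUs:
#   for index, name, temp in read_temps():
#       print(f"GPU {index} ({name}): {temp}C")
```

Keeping the parsing separate from the `nvidia-smi` call means the parser can be checked against saved output on a machine without a GPU.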