In the next ~6 months I’m going to entirely overhaul my setup. Today I have a NUC6i3 running Home Assistant OS, and a NUC8i7 running OpenMediaVault with all the usual suspects via Docker.

I want to upgrade the hardware significantly, partly because I'd like to bring in some local LLM inference. Nothing crazy: 1-8B models hitting ~50 tokens/sec would make me happy. But even that is going to mean a beefy machine compared to today, which will be nice for everything else too, of course.

I’m still all over the place on hardware, part of what I’m trying to decide is whether to go with a single machine for everything or keep them separate.

Idea 1 is one beefy machine running Proxmox, with HA in a VM, OMV or TrueNAS in another, and maybe a third running straight Debian to keep all the Docker stuff separate. But I don't know if I want to add the complexity.

Idea 2 would be a beefy machine running straight OMV/TrueNAS with most stuff on it, and then just move HA over to the existing i7 for more breathing room (mostly for Frigate, which could also be split out to another machine, I guess).

I hear a lot of great things about Proxmox, but I’m not sold that it’s worth the new complexity for me. And keeping HA (which is “critical” compared to everything else) separated feels like a smart choice. But keeping it on aging hardware diminishes that anyway, so I don’t know.

Just wanting to hear various opinions I guess.

[–] tmjaea@lemmy.world 4 points 3 days ago (1 children)

Please elaborate. How does it handle SSH keys? And what is fragile about corosync?

[–] dbtng@eviltoast.org 4 points 2 days ago* (last edited 2 days ago) (1 children)

SSH key management in PVE is handled in a set of secondary files, while the original Debian files are replaced with symlinks. Well, it's still Debian underneath. In some circumstances the symlinks get b0rked or replaced with regular SSH files, the keys get out of sync, and one machine in the cluster can't talk to another. The really irritating thing is that the tool meant to fix it (pvecm updatecerts) doesn't work. I've got an elaborate set of procedures to gather the certs from the hosts and fix the files when it breaks, but it sucks badly enough that I've got two clusters I'm putting off fixing.
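If you want to check your own nodes, this is roughly the first pass I do (a sketch; the symlink targets are what PVE manages on the versions I run, so verify against a healthy node on your release):

# On each node, the SSH files should be symlinks into /etc/pve/priv
ls -l /root/.ssh/authorized_keys   # expect -> /etc/pve/priv/authorized_keys
ls -l /etc/ssh/ssh_known_hosts     # expect -> /etc/pve/priv/known_hosts

# The built-in repair tool (the one that doesn't always work for me)
pvecm updatecerts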

Corosync is the cluster glue. On top of it sits the shared file system (pmxcfs, everything under /etc/pve/) that immediately replicates any change to all members. Corosync is very sensitive to latency; I believe they ask for 10ms or less between hosts, so it can't work over a WAN connection. Stuff like VM restores or live migration between hosts can flood it out. It looks awful when it goes down: your whole cluster goes kaput.
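If you want to see how close to the edge you are, there are status tools you can run on any node (output details vary by version):

pvecm status            # quorum and membership overview
corosync-cfgtool -s     # per-link status as seen from this node
corosync-quorumtool -s  # quorum state in detail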

All corosync really does is push around that set of config files, so a dedicated NIC is overkill, but in busy environments you might wind up resorting to one. You can put corosync on its own network, but you obviously need a network for that. You can also throttle various types of host file transfer activity, but that's a balancing act I've only gotten right in our colos, where we only have 1Gb networks. I have my systems provisioned on a dedicated corosync VLAN, plus a secondary IP on a different physical interface, but corosync is too dumb to fall back to the secondary if the primary is still "up", regardless of whether it's actually communicating. So I get calls on my day off about "the cluster is down!!!1" when people restore backups.
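For reference, a two-link node entry in /etc/pve/corosync.conf looks roughly like this (a sketch with placeholder names and addresses; hand-edits to this file also need config_version bumped, so don't paste it blindly):

node {
  name: pve1
  nodeid: 1
  quorum_votes: 1
  ring0_addr: 10.10.10.1    # dedicated corosync VLAN, preferred link
  ring1_addr: 192.168.50.1  # secondary IP on a different physical NIC
}

And the transfer throttles live in /etc/pve/datacenter.cfg (values in KiB/s; these numbers are just an example, not a recommendation):

bwlimit: migration=102400,restore=51200,clone=51200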

[–] tmjaea@lemmy.world 2 points 2 days ago (1 children)

Thanks for your answer.

I've been using Proxmox since version 2.1 in my home lab, and since 2020 in production at work. We haven't had issues with the SSH files yet. Corosync is also working fine, although it shares its 10G network with Ceph.

In all that time I was never aware of how the certs are handled, despite having done two official Proxmox trainings. Ouch.

[–] dbtng@eviltoast.org 5 points 2 days ago* (last edited 2 days ago)

Cool.

Here: SSH key issues. There was a huge forum war.
https://forum.proxmox.com/threads/ssh-keys-in-a-proxmox-cluster-resolving-replication-host-key-verification-failed-errors.138102/
But it's still a thing, and it still needs to be fixed by a human. Today that's me.

Regarding Ceph and corosync on the same network ... well, I'm just getting started with that now. I do have them on different VLANs, but it's the same set of 10Gb NICs. I'm hoping that if it gets really lousy, my netadmin can prioritize the corosync VLAN. I'll burn that bridge when I come to it.
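If it does start getting lousy, knet keeps per-link latency stats you can watch while Ceph is busy (a sketch; I believe the stats map and key names have shifted a bit between corosync releases, so grep around):

# run during a backfill or a big restore and watch the numbers
corosync-cmapctl -m stats | grep -i latency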


EDIT ... The linked forum post above leads to the SSH key answer, but it's convoluted.
Here's what I put in my own wiki.

Get the right key from each server:
cat ~/.ssh/id_rsa.pub

Make sure they all match in here; fix 'em if they don't:
/etc/pve/priv/authorized_keys

There are a couple of symlinks to fix too (see below), but this should get it.
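And the symlinks, in case they got replaced with plain files (double-check the targets against a healthy node on your PVE version before forcing anything):

ln -sfn /etc/pve/priv/authorized_keys /root/.ssh/authorized_keys
ln -sfn /etc/pve/priv/known_hosts /etc/ssh/ssh_known_hosts
pvecm updatecerts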