Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
- Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
- No spam posting.
- Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
- Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
- Submission headline should match the article title (don't cherry-pick information from the title to fit your agenda).
- No trolling.
- No low-effort posts. This is subjective and will largely be determined by the community member reports.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
I'm not a huge fan of AI, but I consider myself pretty open-minded and have been considering doing a demo of Claude to at least gain an understanding of the tech I'm constantly talking shit about.
Is there anything self-hostable that compares in quality to what vibe coders claim Claude Opus is capable of?
The trash talking on AI is half people with legitimate concerns about the societal and ecological impact, and half people who just want to be in on the party and aren't interested in understanding it. It's useful the way googling things is useful: the results aren't always correct, but if you have a basic level of knowledge it'll help you get where you want to be much faster.
Nothing quite compares to Claude Opus in a cohesive package that I'd recommend for an average self-hoster, but I personally really like running Nemotron from Nvidia. It's not the best model, but in my experience it's consistently good enough, along with being fast and stable. If you're focused more on coding, I hear the Qwen series has some good models.
I actually did an experiment on doing just that. For context, I'm an experienced software engineer whose company buys him a ton of Claude usage, so I had time to test out what it can actually do, and I feel like I'm capable of judging where it's good and where it falls short.
How Claude Code works is that there are actually multiple models involved: one for doing the coding, one "reasoning" model to keep the chain of thought and the context going, and a bunch of small specialized ones for odd jobs around the thing.
The thing that doesn't work yet is that the big reasoning model still has to be big; otherwise it will hallucinate frequently enough to break the workflow. If you could get one of the big models to run locally, you'd be there. However, with recent advances in quantization and MoE models, it's actually getting near fast enough that I would expect it to be generally available in a year or two.
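To put rough numbers on why quantization matters here, a back-of-the-envelope sketch (the 120B parameter count is illustrative, not any specific model, and this counts weights only, ignoring KV cache and runtime overhead):

```python
def model_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory of an LLM: parameter count times
    bits per weight, converted to GiB. Ignores KV cache and overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

# A hypothetical 120B-parameter model at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{model_size_gib(120, bits):.0f} GiB")
```

At 4-bit you're at roughly a quarter of the FP16 footprint, which is what starts to bring big models within reach of a beefy workstation instead of a datacenter node.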
Today the best I could do was a setup that took 150 gigs of RAM, 24 gigs of VRAM, and AMD's top-of-the-line card to do in 30 minutes what takes Claude Code 1-2 minutes. But surprisingly, the output of the model was not bad at all.
You really only need a little more RAM than your GPU's VRAM (unless you're doing CPU offloading, which is extremely slow). Otherwise, I did the same thing recently too, and was surprised I was able to get a Qwen 9B model to fix a bug in a script I had. I think Sonnet would've fixed it in a lot fewer tries, but the 9B model was eventually able to fix it. I could've fixed it myself quicker and cleaner than both, but it was an interesting test.
Locally? You'd need a VERY powerful GPU to really be able to match the capabilities of Opus 4.6 online. I've played around with this stuff for the same reasons and while you can absolutely run a model with all of Claude's capabilities offline, very few people will have the hardware to let it actually run at an acceptable speed and with a sufficient context window. That last part is the most important thing for coding because it's what allows the model to operate across an entire project and not just a few functions at a time.
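The context window eats memory fast because the KV cache grows linearly with context length. A rough sketch of the arithmetic; the architecture numbers below are assumptions in the shape of a generic 70B-class dense model with grouped-query attention, not any specific one:

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV-cache memory: two tensors (K and V) per layer, each storing
    kv_heads * head_dim values per token, at FP16 (2 bytes) by default."""
    return (2 * layers * kv_heads * head_dim
            * context_len * bytes_per_elem) / 1024**3

# Assumed 70B-class shape: 80 layers, 8 KV heads (GQA), head_dim 128.
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: ~{kv_cache_gib(80, 8, 128, ctx):.1f} GiB")
```

So a full-project-sized context can cost tens of gigabytes on top of the model weights themselves, which is exactly why consumer cards fall over here.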
Nothing you can run with affordable hardware. The SOTA stuff requires hundreds of gigabytes of memory — and not system RAM, but GPU memory.
But you can try with stuff like gpt-oss or qwen coder
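If you do try one of those, both Ollama and llama.cpp's server expose an OpenAI-compatible chat endpoint, so a minimal client is only a few lines. The URL and model tag below are assumptions for a default Ollama install; adjust both for your setup:

```python
import json
import urllib.request

# Default Ollama port; llama.cpp's server typically listens on 8080.
BASE_URL = "http://localhost:11434/v1/chat/completions"

def chat_request(model: str, prompt: str,
                 max_tokens: int = 256) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request (not sent here)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

# Hypothetical model tag; use whatever you've pulled locally.
req = chat_request("qwen2.5-coder:7b", "Explain this stack trace: ...")
# To actually send it once the server is running:
#   urllib.request.urlopen(req)
```

The nice part of the OpenAI-compatible API is that you can point existing tooling at it just by swapping the base URL.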
If you just want the user side of an LLM rather than hosting one, then paying $20 for a one-month subscription would be my recommendation.
You will not be able to host anything like Sonnet or Opus.
The models that the commercial AIs use are not at all usable on consumer-grade hardware. The RTX Pro 6000 has 96 gigs of VRAM; your GPU probably has 8.
I’ve played with the models that run on 16 gigs and it’s alright. But I wouldn’t even try fully vibe coding. Need some help with something small? Sure. But I wouldn’t have it try to make a finished product.