Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
- Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
- No spam posting.
- Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
- Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
- Submission headline should match the article title (don't cherry-pick information from the title to fit your agenda).
- No trolling.
- No low-effort posts. This is subjective and will largely be determined by the community member reports.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
I'm not a huge fan of AI, but I consider myself pretty open-minded and have been considering doing a demo of Claude to at least gain an understanding of the tech I'm constantly talking shit about.
Is there anything self-hostable that compares in quality to what vibe coders claim Claude Opus is capable of?
The trash talking on AI is half people with legitimate concerns about the societal and ecological impact, and half people who just want to be in on the party and aren't interested in understanding it. It's useful the way googling things is useful: the results aren't always correct, but if you have a basic level of knowledge it'll help you get where you want to be much faster.
Nothing quite compares to Claude Opus in a cohesive package that I'd recommend for an average self-hoster, but I personally really like running Nemotron from Nvidia. It's not the best model, but in my experience it's consistently good enough, along with being fast and stable. If you're focused more on coding, I hear the Qwen series has some good models.
I actually did an experiment on doing just that. For context, I'm an experienced software engineer whose company buys him a ton of Claude usage, so I had time to test out what it can actually do, and I feel like I'm capable of judging where it's good and where it falls short.
How Claude Code works is that there are actually multiple models involved: one for doing the coding, one "reasoning" model to keep the chain of thought and the context going, and a bunch of small specialized ones for odd jobs around the thing.
The thing that doesn't work yet is that the big reasoning model still has to be big; otherwise it will hallucinate frequently enough to break the workflow. If you could get one of the big models to run locally, you'd be there. However, with recent advances in quantization and MoE models, it's actually getting near fast enough that I would expect it to be generally available in a year or two.
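To put rough numbers on why quantization matters here, a back-of-the-envelope sketch (the 120B parameter count is illustrative, not any specific model, and this counts weights only, ignoring KV cache and runtime overhead):

```python
def model_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory of an LLM: parameter count times
    bits per weight, converted to GiB. Ignores KV cache and overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

# A hypothetical 120B-parameter model at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{model_size_gib(120, bits):.0f} GiB")
```

At 4-bit you're at roughly a quarter of the FP16 footprint, which is what starts to bring big models within reach of a beefy workstation instead of a datacenter node.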
Today the best I could do was a setup that took 150 gigs of RAM, 24 gigs of VRAM, and AMD's top-of-the-line card to do in 30 minutes what takes Claude Code 1-2 minutes. But surprisingly, the output of the model was not bad at all.
You really only need a little more RAM than your GPU's VRAM (unless you're doing CPU offloading, which is extremely slow). Otherwise, I did the same thing recently too, and was surprised I was able to get a Qwen 9B model to fix a bug in a script I had. I think Sonnet would've fixed it in a lot fewer tries, but the 9B model was eventually able to fix it. I could've fixed it myself quicker and cleaner than both, but it was an interesting test.
Locally? You'd need a VERY powerful GPU to really be able to match the capabilities of Opus 4.6 online. I've played around with this stuff for the same reasons and while you can absolutely run a model with all of Claude's capabilities offline, very few people will have the hardware to let it actually run at an acceptable speed and with a sufficient context window. That last part is the most important thing for coding because it's what allows the model to operate across an entire project and not just a few functions at a time.
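The context window eats memory fast because the KV cache grows linearly with context length. A rough sketch of the arithmetic; the architecture numbers below are assumptions in the shape of a generic 70B-class dense model with grouped-query attention, not any specific one:

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV-cache memory: two tensors (K and V) per layer, each storing
    kv_heads * head_dim values per token, at FP16 (2 bytes) by default."""
    return (2 * layers * kv_heads * head_dim
            * context_len * bytes_per_elem) / 1024**3

# Assumed 70B-class shape: 80 layers, 8 KV heads (GQA), head_dim 128.
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: ~{kv_cache_gib(80, 8, 128, ctx):.1f} GiB")
```

So a full-project-sized context can cost tens of gigabytes on top of the model weights themselves, which is exactly why consumer cards fall over here.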
Nothing you can run with affordable hardware. The SOTA stuff requires hundreds of gigabytes of memory — and not system RAM, but GPU memory.
But you can try with stuff like gpt-oss or qwen coder
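If you do try one of those, both Ollama and llama.cpp's server expose an OpenAI-compatible chat endpoint, so a minimal client is only a few lines. The URL and model tag below are assumptions for a default Ollama install; adjust both for your setup:

```python
import json
import urllib.request

# Default Ollama port; llama.cpp's server typically listens on 8080.
BASE_URL = "http://localhost:11434/v1/chat/completions"

def chat_request(model: str, prompt: str,
                 max_tokens: int = 256) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request (not sent here)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

# Hypothetical model tag; use whatever you've pulled locally.
req = chat_request("qwen2.5-coder:7b", "Explain this stack trace: ...")
# To actually send it once the server is running:
#   urllib.request.urlopen(req)
```

The nice part of the OpenAI-compatible API is that you can point existing tooling at it just by swapping the base URL.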
If you just want the user side of an LLM rather than hosting one, then paying $20 for a one-month subscription would be my recommendation.
You will not be able to host anything like Sonnet or Opus.
The models that the commercial AIs use are not at all usable on consumer-grade hardware. The RTX Pro 6000 has 96 gigs of VRAM; your GPU probably has 8.
I’ve played with the models that run on 16 gigs and it’s alright. But I wouldn’t even try fully vibe coding. Need some help with something small? Sure. But I wouldn’t have it try to make a finished product.