this post was submitted on 10 Sep 2025
Selfhosted
Ollama does have some features that make it easier for a first-time user, including:
- Automatically calculating how many layers fit in VRAM, loading that many onto the GPU, and leaving the rest in main memory for the CPU. llama.cpp can't do that automatically yet.
- Automatically unloading the model from VRAM after a period of inactivity.
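For anyone curious what those two conveniences boil down to, here's a rough sketch. To be clear, this is not Ollama's actual logic: `estimate_gpu_layers` and `IdleUnloader` are made-up names, the per-layer size is something you'd have to estimate yourself, and a real implementation also has to budget for KV cache, context length and compute buffers (lumped into `reserve_bytes` here).

```python
from dataclasses import dataclass


def estimate_gpu_layers(free_vram_bytes: int, n_layers: int,
                        bytes_per_layer: int, reserve_bytes: int = 0) -> int:
    """Return how many layers fit in VRAM after setting aside reserve_bytes.

    Hypothetical helper: assumes every layer is the same size, which is
    only roughly true for real models.
    """
    if bytes_per_layer <= 0:
        raise ValueError("bytes_per_layer must be positive")
    usable = max(0, free_vram_bytes - reserve_bytes)
    # Never report more layers than the model actually has.
    return min(n_layers, usable // bytes_per_layer)


@dataclass
class IdleUnloader:
    """Tracks the last request time and decides when a model counts as idle."""
    timeout_s: float
    last_used: float = 0.0
    loaded: bool = False

    def use(self, now: float) -> None:
        # Any request (re)loads the model and resets the idle clock.
        self.loaded = True
        self.last_used = now

    def tick(self, now: float) -> bool:
        """Unload if idle for at least timeout_s; return True if that happened."""
        if self.loaded and now - self.last_used >= self.timeout_s:
            self.loaded = False
            return True
        return False
```

With plain llama.cpp you'd do the first part by hand and pass the number via `--n-gpu-layers` (`-ngl`). On the Ollama side, the idle timeout is what the `keep_alive` setting controls.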
I had an easier time setting up ollama than other stuff, and OP does apparently already have it set up.
Yeah. But it also messes stuff up relative to the llama.cpp baseline, hides or doesn't support some features/optimizations, and definitely doesn't support the more efficient iq_k quants of ik_llama.cpp or its specialized MoE offloading.
And that's not even getting into the various controversies around ollama (like broken GGUFs or indications they're going closed source in some form).
...It just depends on how much performance you want to squeeze out and how much time you want to spend on the endeavor. Small LLMs are kinda marginal though, so IMO that extra performance is important if you really want to try; otherwise one is probably better off spending a few bucks on an API that doesn't log requests.