this post was submitted on 23 May 2024

21 points (92.0% liked)

Monitoring software for a wide array of hw and sw (lemmy.world)

submitted 1 year ago by anamethatisnt@lemmy.world to c/selfhosted@lemmy.world

I'm looking into setting up some monitoring combined with simple automation for my selfhosting. Currently I'm thinking about using Zabbix.
I want to:
Track bandwidth usage on a router/fw and on a managed switch, and track CPU/RAM/disk usage on my VMs.
Simple monitoring (up/down/maintenance) on the router, switch, my VMs as well as on Linux services (jellyfin/forgejo/etc) and Windows services (lab for studying work-related tools).
I'm also interested in doing simple https checks on my web UIs (I've had a service running while the website returned 403 or 404 before) and testing nslookup on my internal DNS (if the service is up but the lookups time out I still want to try restarting the service).

Are there any FOSS/FLOSS alternatives that I should look into before diving into Zabbix?
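
For illustration, the https and DNS checks described above can be sketched in a few lines of Python, independent of whichever tool ends up scheduling them. This is only a rough sketch using the standard library. The function names are made up for this example, and socket.getaddrinfo goes through the system resolver, so it only exercises the internal DNS if the machine running the check is configured to use it.

```python
import socket
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Optional


def status_is_healthy(status: int) -> bool:
    """Treat 2xx/3xx as healthy; 403, 404 and other 4xx/5xx count as failures."""
    return 200 <= status < 400


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just not with a success code
    except OSError:
        return None  # refused connection, TLS failure, timeout, ...


def dns_resolves(hostname: str, timeout: float = 3.0) -> bool:
    """Return True if hostname resolves within timeout seconds."""
    # getaddrinfo has no timeout parameter, so run it in a worker thread.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(socket.getaddrinfo, hostname, None)
    try:
        return bool(future.result(timeout=timeout))
    except (FutureTimeout, OSError):
        return False
    finally:
        pool.shutdown(wait=False)
```

A check would then combine the two, e.g. flag a web UI as broken when http_status(...) returns None or a code for which status_is_healthy(...) is false, even though the service process itself is still running.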


[–] Max_P@lemmy.max-p.me 8 points 1 year ago (2 children)

Prometheus/VictoriaMetrics/Grafana are pretty good, I've had no issues with them and there's an exporter for damn near anything. Custom exporters are pretty easy to write too.
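
To give an idea of how small a custom exporter can be, here is a rough sketch that serves two gauges in the Prometheus text format using only the Python standard library. The metric names, the port and the choice of disk usage as the example are all assumptions for illustration; in practice the official prometheus_client library would normally handle the formatting.

```python
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict


def render_metrics(metrics: Dict[str, float]) -> str:
    """Render name -> value pairs as Prometheus text exposition lines."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect() -> Dict[str, float]:
    """Collect example gauges; disk usage of / is just a stand-in."""
    usage = shutil.disk_usage("/")
    return {
        "demo_disk_total_bytes": float(usage.total),
        "demo_disk_free_bytes": float(usage.free),
    }


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9200) -> None:
    """Block forever, answering scrapes on the given (hypothetical) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a scrape job at port 9200 would be enough to get these values into Prometheus or VictoriaMetrics.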

[–] mbirth@lemmy.mbirth.uk 2 points 1 year ago (1 children)

But these 3 are all about metrics, right? While they’re great to monitor and analyse numbers (ping times, disk space, memory, etc.), they aren’t that great with e.g. plaintext error messages in log files. That’s how I remember it from a few years ago, at least.

[–] sociableporcupine@lemmy.world 4 points 1 year ago

Grafana/Loki does logs. Still early days for me but it’s solid so far.

[–] anamethatisnt@lemmy.world 1 points 1 year ago (1 children)

Cheers! I've heard of Prometheus/Grafana but VictoriaMetrics was a new one. Gonna look into it!

[–] 4am@lemm.ee 3 points 1 year ago (1 children)

Yeah VictoriaMetrics is the new favorite since Influx keeps reinventing their wheels and trying to move everyone to the cloud.

[–] keyez@lemmy.world 1 points 1 year ago

May have to explore this, I still run influxdb and telegraf for a push metrics operation instead of pull like Prometheus. Things have been smooth for a while, but a couple of months ago disk temps and metrics stopped working, with no errors or missing plugins.

[–] pax0707@lemmy.world 6 points 1 year ago (1 children)

I’ve been using Zabbix for ages now. It has issues but I got used to it.

[–] Tetsuo@jlai.lu 3 points 1 year ago* (last edited 1 year ago)

Zabbix still remains a good choice imo. It works fine with Grafana, and the Zabbix-Grafana plugin is now officially supported by Grafana.

Zabbix without Grafana is pretty weak in terms of visualization.

[–] catloaf@lemm.ee 4 points 1 year ago (1 children)

I've used Zabbix for that before. I hope you like SNMP, though!

[–] anamethatisnt@lemmy.world 1 points 1 year ago

I've used SNMP a lot together with Nagios so I should be able to handle it. :D

[–] johntash@eviltoast.org 4 points 1 year ago (1 children)

Uptime Kuma is great for simple up/down and web checks. LibreNMS is worth looking at too for other metrics.

[–] anamethatisnt@lemmy.world 2 points 1 year ago

I'll have a look! Cheers!

[–] SeeJayEmm@lemmy.procrastinati.org 3 points 1 year ago (1 children)

I'm using Checkmk for pretty much all of that. Personally I found Zabbix to have too much overhead.

[–] mbirth@lemmy.mbirth.uk 4 points 1 year ago

For me it’s the other way around. In Check_MK I was constantly writing new custom checks, it was all manual code, and overall it felt like Nagios on steroids (which is what it was back then) - just not in a good way.

In Zabbix you can do everything in the UI without messing around in the file system. And things like translating SNMP results to readable text works throughout the system without having to include a Python file and then call it from within your various other checks. All the alerting logic can be clicked together and easily amended in the UI. It’s so much more comfortable once you’ve figured it out.

[–] vegetaaaaaaa@lemmy.world 2 points 1 year ago (1 children)

I use netdata (the FOSS agent only, not the cloud offering) on all my servers (physical, VMs...) and stream all metrics to a parent netdata instance. It works extremely well for me.

Other solutions are too cumbersome and heavy on maintenance for me. You can query netdata from prometheus/grafana [1] if you really need custom dashboards.

I guess you wouldn't be able to install it on the router/switch, but there is an SNMP collector which should be able to query bandwidth info from the network appliances.

[–] anamethatisnt@lemmy.world 1 points 1 year ago (1 children)

Gonna check it out!
Is it easy to set up automatic responses to the alerts, e.g. restarting a service if it isn't answering requests in a timely manner?
Have you used it together with Windows Servers too?

[–] vegetaaaaaaa@lemmy.world 1 points 1 year ago

Windows Servers

setup automatic responses to the alerts

It should be possible using the setting script to execute on alarm = /your/custom/remediation-script (see https://learn.netdata.cloud/docs/alerts-&-notifications/notifications/agent-dispatched-notifications/agent-notifications-reference). I have not experimented with this yet, but soon will (I'm implementing a custom notification channel for specific alarms).
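
As a rough idea of what such a remediation script might decide, here is an untested sketch in Python. It deliberately does not parse the arguments netdata passes to the script (check the linked reference for those); the service name, the "CRITICAL" status string and the 10 minute cooldown are all assumptions made for this example. The cooldown is there so a flapping alarm cannot restart a service in a loop.

```python
import subprocess
import time
from typing import List, Optional

COOLDOWN_SECONDS = 600.0  # hypothetical: at most one restart per 10 minutes


def should_restart(status: str, last_restart: Optional[float], now: float,
                   cooldown: float = COOLDOWN_SECONDS) -> bool:
    """Restart only on a critical alarm, and only once the cooldown has elapsed."""
    if status != "CRITICAL":
        return False
    return last_restart is None or now - last_restart >= cooldown


def restart_command(service: str) -> List[str]:
    """Build the systemd restart command for the given service."""
    return ["systemctl", "restart", service]


def remediate(service: str, status: str, last_restart: Optional[float],
              dry_run: bool = True) -> Optional[List[str]]:
    """Return the command that was (or would be) run, or None if nothing is done."""
    if not should_restart(status, last_restart, time.time()):
        return None
    cmd = restart_command(service)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

With dry_run left at True the function only reports what it would do, which makes it easy to log decisions for a while before letting it touch anything.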

restarting a service if it isn’t answering requests

I'd rather find the root cause of the downtime/malfunction instead of blindly restarting the service, just my 2 cents.

[–] lemann@lemmy.dbzer0.com 2 points 1 year ago (1 children)

I used to use MQTT, static_status and Healthchecks.io, and have that data passed through to Home Assistant, but it started to get pretty cumbersome as the number of machines I had grew.

I now use just Zabbix and Healthchecks.io. I did need to spend some time writing new templates for some additional data I wanted to collect (like SMART data for SSDs that provide health metrics in non-standard attributes, and Healthchecks.io so I could see the status of various checks on my Zabbix dashboard).

Zabbix also has some additional features I found appealing, like proxies that can continue recording data when the main server is down, and built-in encryption. Some checks, like open ports or ICMP responses, can be run from the local agent, the remote server, or both, which helps quickly diagnose things like firewall config issues.

I did look at some other solutions, but I wanted something integrated to hit the ground running. Mobile apps are very limited, and there is no official one to my knowledge. I use Moobix, which I don't believe is FOSS, but I could be wrong there.

Try each solution out and see what works best for you!

[–] mbirth@lemmy.mbirth.uk 2 points 1 year ago* (last edited 1 year ago)

You know you can basically implement Healthchecks.io completely in Zabbix using zabbix-sender or any compatible implementation of it? (Or find a better way, e.g. querying the timestamp of a logfile or even check the logfile for "OK" or "ERROR" lines... lots of ways possible.)
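
A rough sketch of the zabbix_sender variant, in Python: wrap the job, then push "OK" or "ERROR" to a trapper item. The server name, host name and item key are placeholders, and this assumes a trapper item with that key already exists on the Zabbix side. A trigger that fires when no value has arrived for a while (for example with the nodata function) would then cover the "job never ran" case that Healthchecks.io handles.

```python
import subprocess
from typing import List


def outcome_value(returncode: int) -> str:
    """Map a job's exit code to the value sent to Zabbix."""
    return "OK" if returncode == 0 else "ERROR"


def sender_command(server: str, host: str, key: str, value: str) -> List[str]:
    """Build a zabbix_sender call: -z server, -s monitored host, -k item key, -o value."""
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", value]


def run_and_report(job: List[str], server: str, host: str, key: str) -> int:
    """Run the job, report its outcome to Zabbix, and return the job's exit code."""
    result = subprocess.run(job)
    value = outcome_value(result.returncode)
    # check=False: a failed report should not mask the job's own exit code.
    subprocess.run(sender_command(server, host, key, value), check=False)
    return result.returncode
```

For example, run_and_report(["/usr/local/bin/backup.sh"], "zabbix.example.lan", "vm1", "backup.status") would run a (hypothetical) backup script and report its result.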

[–] pax0707@lemmy.world 1 points 1 year ago

Also Uptime Kuma for fast and easy up/down, web services, etc.