this post was submitted on 15 Mar 2024
Selfhosted
Load average of 400???
You could install sysstat (or similar) and use the output from sar to watch for thresholds, then reboot if they are exceeded.
The upside of doing this is that you may also be able to narrow down what exactly is going on when this happens, since sar records stats for CPU, memory, disk, etc. So you can go back after the fact and might be able to see whether it is just a CPU thing or more than that (unless the problem happens instantly rather than building up gradually).
PS: rather than using cron, you could run a script as a daemon that runs sar at 1 sec intervals.
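A rough sketch of that kind of always-running watcher, in Python. To keep it self-contained it reads the 1-minute load average with `os.getloadavg()` instead of parsing sar output, so it is a stand-in for the sar approach, not the same thing. The limit of 50, the 30 consecutive samples, and the `systemctl reboot` action are all assumptions to tune for your box.

```python
import os
import subprocess
import time

LOAD_LIMIT = 50.0     # assumed threshold; pick something sane for your core count
STRIKES_NEEDED = 30   # assumed: consecutive over-limit samples before acting
INTERVAL_S = 1.0      # sample once per second, as suggested above


def update_strikes(strikes, load, limit=LOAD_LIMIT):
    """Return the new count of consecutive over-limit samples."""
    return strikes + 1 if load > limit else 0


def reboot():
    """Assumed action: ask systemd to reboot (needs root)."""
    subprocess.run(["systemctl", "reboot"], check=False)


def watch(read_load=lambda: os.getloadavg()[0], on_trip=None, sleep=time.sleep):
    """Loop forever, sleeping between samples; act once the limit holds."""
    strikes = 0
    while True:
        strikes = update_strikes(strikes, read_load())
        if strikes >= STRIKES_NEEDED:
            (on_trip or reboot)()
            return
        sleep(INTERVAL_S)
```

Calling `watch()` from a systemd service would keep it running and sleeping between samples. Requiring several bad samples in a row avoids rebooting on a short spike.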
Another thought is some kind of external watchdog. Curl a webpage on the server, and if the response takes too long, power cycle it with a smart home outlet? Idk. Just throwing crazy ideas out there.
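For what it's worth, the external watchdog idea could look roughly like this, run from a different machine. It is only a sketch: the 10 second timeout and the "3 failures in a row" rule are assumptions, and `power_cycle` is a hypothetical placeholder because every smart outlet has its own API.

```python
import http.client
import urllib.request


def probe(url, timeout_s=10.0):
    """True if the URL answers within timeout_s, False on any error or timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read(1)
        return True
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and socket timeouts are OSError; a malformed URL is ValueError.
        return False


def should_power_cycle(results, failures_needed=3):
    """True when the most recent failures_needed probes all failed."""
    recent = results[-failures_needed:]
    return len(recent) == failures_needed and not any(recent)


def power_cycle():
    """Hypothetical placeholder: toggle the outlet via whatever API it offers."""
    raise NotImplementedError("depends on your smart outlet")
```

A loop on the watchdog machine would append each `probe(...)` result to a list, sleep for a while, and call `power_cycle()` once `should_power_cycle(...)` turns true.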
Thank you for these ideas, I will read up on sysstat + sar and give it a go.
Also smart to have the script always running, sleeping, rather than launching it at intervals.
I know all of this is a poor hack, and I must address the cause - but so far I have no clue what's causing it. I'm running a bunch of Docker containers, so it is very likely one of them painting itself into a corner, but after a reboot there's nothing to see, so I am now starting by logging the top process. Your ideas might work better.
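Logging the top processes could be sketched like this. It assumes the procps `ps` found on most Linux systems, and the log path passed in is whatever you choose. Each snapshot is fsynced so it has a better chance of surviving a hard reboot.

```python
import os
import subprocess
import time

# Assumed command: procps ps, sorted by CPU usage, highest first.
PS_CMD = ["ps", "-eo", "pid,pcpu,pmem,comm", "--sort=-pcpu"]


def top_lines(ps_output, n=5):
    """Keep the header plus the first n process rows of ps output."""
    lines = [ln.rstrip() for ln in ps_output.splitlines() if ln.strip()]
    return lines[: n + 1]


def log_snapshot(path, n=5):
    """Append a timestamped list of the top n CPU consumers to path."""
    out = subprocess.run(PS_CMD, capture_output=True, text=True, check=True).stdout
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as fh:
        fh.write(f"--- {stamp}\n")
        fh.write("\n".join(top_lines(out, n)) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # push to disk before the box locks up
```

Calling `log_snapshot(...)` every few seconds from a long-running script leaves a trail to read after the next reboot.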