this post was submitted on 15 Mar 2024
Selfhosted
The answer is to create a short script that periodically queries the load, makes a decision, and then triggers a reboot. Run it as a systemd service and give it the privileges to do the reboot. Bash or Python would be useful languages for the script.
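A minimal sketch of such a script in Python, in case it helps. The threshold, the interval, the number of consecutive high readings, and the `--run` flag are all my own assumptions, not anything from your setup, so tune them to your machine. It only reboots after the 1-minute load average has been above the threshold several checks in a row, so a short spike doesn't trigger it.

```python
#!/usr/bin/env python3
"""Reboot the machine if the load average stays too high (illustrative sketch)."""
import os
import subprocess
import sys
import time

LOAD_THRESHOLD = 8.0   # assumed value; pick something relative to your core count
CHECK_INTERVAL = 60    # assumed: seconds between load samples
STRIKES_NEEDED = 5     # assumed: consecutive high samples before rebooting


def update_strikes(strikes, load, threshold=LOAD_THRESHOLD):
    """Return the new count of consecutive over-threshold samples."""
    return strikes + 1 if load > threshold else 0


def main():
    strikes = 0
    while True:
        load_1min = os.getloadavg()[0]  # 1-minute load average
        strikes = update_strikes(strikes, load_1min)
        if strikes >= STRIKES_NEEDED:
            # Needs root (or equivalent privileges) to succeed.
            subprocess.run(["systemctl", "reboot"], check=False)
            return
        time.sleep(CHECK_INTERVAL)


# Hypothetical safety switch: the loop only starts when --run is passed,
# so executing the file by accident cannot reboot anything.
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    main()
```

A systemd service would then start it with the `--run` argument in its ExecStart line. System services run as root unless you set a different user, which covers the reboot privilege.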
It's a silly way to handle it, though. You're probably quicker and better off solving the actual issue, because it's not normal for this to happen. Have a look at the logs, or install monitoring software like netdata to get to the root of it. It's probably some software you installed that is looping, or that has a memory leak and is then swapping and hogging the IO until the OOM killer kicks in. All of that will show up in the logs. And if it's a leak, you'll see the memory graphs slowly rising in netdata.
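If you don't want to install a full monitoring stack just to check for a leak, a rough alternative is to log a memory reading every minute and see whether available memory keeps shrinking. This sketch reads the standard Linux /proc/meminfo file. The function names and the output format are made up for illustration.

```python
#!/usr/bin/env python3
"""Print one timestamped memory reading from /proc/meminfo (illustrative sketch)."""
import os
import time

MEMINFO_PATH = "/proc/meminfo"  # standard location on Linux


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: number} dict."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])  # most fields are in kB
    return values


def sample(path=MEMINFO_PATH):
    """Return one log line with the available memory and free swap."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return (f"{stamp} MemAvailable={info.get('MemAvailable', 0)}kB "
            f"SwapFree={info.get('SwapFree', 0)}kB")


if __name__ == "__main__" and os.path.exists(MEMINFO_PATH):
    print(sample())
```

Run it from cron or a systemd timer and append the output to a file. If MemAvailable drops steadily and SwapFree follows until the machine locks up, that points to a leak.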
journalctl -b -1 shows you messages from the previous boot. (To debug after you've pressed reset.) You can use a pastebin service to ask for more help if you can't make sense of the output.
Other solutions: Some server boards have dedicated hardware, a watchdog to detect something similar to that.
You can solder a microcontroller (an ESP32 with wifi) to the reset button and program that to be a watchdog.
Edit: But in my experience it's usually a similar amount of effort either way: delving down and solving the underlying problem once and for all, or writing scripts around it and putting a band-aid on it. With the band-aid the issue is still there, and you're bound to spend additional time on it once side effects and quirks become obvious.