Others mentioned that it sounds hardware-related. That reminded me: visually inspect all ports, especially unused ones. I once had a system locking up constantly, and it turned out debris in a USB port on one of the monitors was causing a short. That kind of fault won't necessarily produce logged errors, and it can cause seemingly random behaviour that has you chasing problems that don't exist. USB ports and DVI plugs are especially hard to inspect because of how they're constructed; audio jacks too. Also make sure there are no breaks in any cords. It seems too simple to be true, and then it happens and you feel foolish for not having thought of it.
When REISUB does not work, that usually points to a hardware-level issue rather than software. Here is my debugging checklist for hard freezes:
Step 1: Rule out RAM
- Boot a live USB and run `memtest86+` overnight. Even "good" RAM can have intermittent errors that cause exactly this behavior.
Step 2: Check thermals
- Install `lm-sensors` and run `sensors` before/during heavy loads
- Also check GPU temps if you have a dedicated GPU: `nvidia-smi`, or for AMD: `cat /sys/class/drm/card0/device/hwmon/hwmon*/temp1_input`
- A CPU hitting thermal throttle then failing = instant freeze
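To catch the temperature at the exact moment of a freeze, it helps to log readings continuously rather than checking by hand. A minimal sketch, assuming `sensors` from lm-sensors is installed (the log path is arbitrary):

```shell
#!/bin/sh
# Append one timestamped sensor snapshot to a log file; loop it in a spare
# terminal so the log's last lines show the temps right before a freeze.
# Assumes `sensors` from lm-sensors; falls back to a note if it's absent.
log_temps() {
    { date '+%F %T'; sensors 2>/dev/null || echo '(sensors not available)'; echo; } >> "$1"
}

log_temps /tmp/temps.log
# Continuous logging (run in a spare terminal or tmux):
# while :; do log_temps /tmp/temps.log; sleep 5; done
tail -n 3 /tmp/temps.log
```

After a lockup, the tail of the log tells you whether temps were climbing in the final seconds.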
Step 3: GPU driver
- If you are using Nvidia proprietary drivers, try switching to nouveau temporarily. Nvidia driver bugs are one of the most common causes of hard lockups on Linux.
- Check `dmesg | grep -i nvidia` or `dmesg | grep -i gpu` after reboot
Step 4: Kernel logs from previous boot
- `journalctl -b -1 -p err` shows errors from the last boot before the crash
- `journalctl -b -1 | tail -100` shows the last 100 lines before the crash, often revealing the culprit
Step 5: SSH test
- Set up SSH from another device. Next time it freezes, try to SSH in. If SSH works but display is dead = GPU/display issue. If SSH also fails = kernel panic or hardware.
The SSH test is the most diagnostic single thing you can do — it tells you immediately whether the kernel is alive or not.
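To pin down exactly when the machine dies, the ping half of that test can be scripted from the second device; a minimal sketch (the IP address and log path are placeholders, substitute your own):

```shell
#!/bin/sh
# Run on a *second* machine. Logs a timestamped up/DOWN line per probe;
# the last "up" entry tells you exactly when the freeze hit.
probe() {
    host=$1; log=$2
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        echo "$(date '+%F %T') $host up" >> "$log"
    else
        echo "$(date '+%F %T') $host DOWN" >> "$log"
    fi
}

# Single probe against localhost for illustration:
probe 127.0.0.1 /tmp/freeze-watch.log
# Continuous: while :; do probe 192.168.1.50 freeze-watch.log; sleep 5; done
cat /tmp/freeze-watch.log
```

If the log shows the box still answering pings while the display is dead, you're looking at a GPU/display problem rather than a full kernel lockup.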
I had this happen when a game would, at random times, fill up the available memory so quickly that the system froze completely before any OOM watchdog could catch it.
Both times something like this happened to me, it was a RAM issue. Once it was XMP; the other time I had forgotten to set up swap, so the memory simply ran out.
This does sound like a hardware problem:
- Find the motherboard brand/model number and the memory brand/model.
- Check the motherboard manufacturer's support page for memory that is compatible with that specific board. Also check their forum for similar problems and solutions.
- Unplug all non-critical peripherals (it might be a driver issue).
- Swap the two (or more) memory sticks and see if that changes the freezing at all.
- Check the motherboard manufacturer for an updated BIOS, especially if the new BIOS addresses memory concerns.
- If the BIOS doesn't have a memory test, try MemTest86+.
- If it's not the hardware, the BIOS, or the BIOS settings: boot a live USB stick and see if the problem persists (it might be a corrupt install somewhere; back up your data and install a different distro, on a different drive if available, or stick a backup of the boot drive in a different machine).
- Dig into the logs, as mentioned elsewhere.
There was (or is) a specific bug with earlier Ryzen CPU generations that causes system freezes when the CPU enters the power-saving C-state C6. If that applies to your machine, try the kernel parameter processor.max_cstate=5 for a while. I had one PC build where this completely resolved the system freezing after several minutes. Note that this is roughly six-year-old info and the issue may have been fixed in the meantime, but it's probably worth a try if you have an older Ryzen CPU.
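For anyone who wants to try a kernel parameter: with GRUB it goes on the default command line. A sketch, working on a sample copy of the config here; on a real system you'd edit /etc/default/grub itself and regenerate the config as root:

```shell
# Demo on a sample copy; on a real system this line lives in /etc/default/grub.
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' > /tmp/grub-default
# Prepend the C-state cap to the default kernel command line:
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&processor.max_cstate=5 /' /tmp/grub-default
grep GRUB_CMDLINE_LINUX_DEFAULT /tmp/grub-default
# On the real system, follow up with:  sudo update-grub
#   (or grub2-mkconfig -o /boot/grub2/grub.cfg on Fedora/openSUSE)
# After reboot, verify with:           grep -o 'processor.max_cstate=5' /proc/cmdline
```

The `processor.max_cstate=` spelling is the one documented in the kernel's kernel-parameters.txt; checking `/proc/cmdline` after reboot confirms it actually took effect.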
I suggest installing/enabling sysstat if you haven't already; those metrics will give you a great starting point on which resource may be the culprit next time it happens, and will help you pinpoint whether there is a hardware-related issue.
Do you have kdump enabled? If so, you can try to force a crash dump when the system freezes, so you can later analyze what the issue is. This path is harder to follow, since you'll need to analyze the dump, but it can help you identify issues on the software side as well as the hardware side.
Both tools are our bread and butter for RCAs/postmortems
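For reference, on a systemd distro both of those are a couple of commands to turn on; service names and log paths vary slightly between distros, so treat this as a sketch rather than exact syntax for your system:

```shell
# Enable periodic sysstat collection (sar data lands under /var/log/sysstat
# or /var/log/sa depending on distro):
sudo systemctl enable --now sysstat

# After the next freeze, look back through today's samples, e.g.:
sar -r   # memory usage
sar -u   # CPU usage

# Check whether kdump is armed (the service may be named kdump or kdump-tools):
systemctl status kdump
```

If `sar` shows memory climbing toward exhaustion right before each freeze, that points at the OOM scenario others in the thread described.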
I've had these issues during high-intensity GPU usage on an Nvidia GPU. Those are the only times REISUB didn't work and I had to do a hard reset.
Not much else I can contribute, other than: don't rule out an Nvidia driver problem.
Setting up a watchdog and crash kernel might help or let you diagnose it
Does your machine have a PS/2 port by chance? That should give you magic keys when USB keyboards won't.
Does the previous boot log show normal or error lines when it happens?
journalctl -o short-precise -k -b -1 shows you kernel messages from the last boot; this should be a good start.
That doesn't work in my experience, or I'm typing it wrong. I can use the journalctl boot filter to show the current boot, or the boot two boots ago, but not the previous boot where the system crashed.
So I end up filtering by time instead with --since
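For anyone else hitting that: `journalctl --list-boots` shows which boot indices the journal actually kept, and the time filter looks like this (the timestamps below are placeholders):

```shell
# See which boots the journal recorded, with their time ranges and indices:
journalctl --list-boots

# Then pull kernel messages from a window around the crash:
journalctl -k --since "2024-05-01 14:00" --until "2024-05-01 14:30"
```

If older boots are missing from the list entirely, the journal may be running in volatile mode; setting `Storage=persistent` in /etc/systemd/journald.conf makes it survive reboots.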
I had / have a similar issue that started at some point on my Ryzen 7 laptop with Kubuntu 24.04. I haven't tried REISUB yet, but otherwise same symptoms.
RAM is the usual suspect. I ran memtest for 24+ hours with no errors. I also tailed dmesg and journalctl to a remote machine and checked journalctl after reboot. No errors reported, presumably because the system hard-locked before it had a chance to log the error.
I never found a root cause, but after I changed the KDE Power Profile from Eco to either Balanced or Power (I don't remember which) the random freezing reduced from 1-3 times per day to once every few weeks of continuous uptime.
So my guess is some kernel driver bug related to power states of the CPU (or the GPU, an Nvidia 3060 with the 590 drivers).
For actual advice:
- Run memtest to verify RAM. Do multiple passes, at least overnight.
- Check cooling. RAM and other components can overheat and cause locks, not just the CPU.
- Can you throw a couple of different distros on there, to try different permutations of kernels and drivers, old and new?
- Run journalctl --follow (with sudo) and dmesg -w. I ran these over SSH from a remote machine; even better if you can run them on a local second monitor. The point is to have them open the whole time, since it's too late once the system has locked up.
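A sketch of the remote variant (the hostname is a placeholder); piping through tee keeps local copies whose last lines survive the freeze:

```shell
# From a second machine: the local files keep the final messages even
# after the frozen box stops answering. Reading the full journal/dmesg
# usually requires the remote user to be in the adm or systemd-journal group.
ssh user@freezing-box 'journalctl --follow' | tee journal.log
ssh user@freezing-box 'dmesg --follow'      | tee dmesg.log
```

After the crash, `tail journal.log` on the second machine shows whatever the kernel managed to say before it died.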
My system completely locks up every few hours. It’s not just a DE crash; the entire machine becomes unresponsive. The mouse and keyboard are completely dead (no cursor movement, Caps Lock key doesn’t toggle).
Before you rule out a DE (or Wayland issue), are you 100% sure the entire system is unresponsive? Like is it still online and responding to ping or SSH? Just to be sure try enabling SSH on the system - then set up a spare laptop/computer on the same network that can normally ping or SSH to your Linux system. Next time the issue occurs test to see if the Linux system is truly unresponsive by checking if it is still responding to pings and allowing you to SSH into it.
If you don't have a spare laptop/desktop but do have an Android phone you could do the same with Termux.
Also, if you can SSH into it, you should be able to force-logout your own user; that would bring your Linux system back to the login screen, and you'd then be able to use the mouse/keyboard normally again. (Run "who" to view logged-in users and "pkill -u your-username" to kill the session and log the user out; you may need to run those with sudo.)
Only reason I mention it is that I have an ancient desktop that exhibits similar behavior occasionally but my system is still alive on the network. So far for me it seems like it might be a Wayland + Nvidia + GNOME issue. Once I switched back to X11 it doesn't seem like the issue occurs anymore.
The Caps Lock LED being unresponsive is a good indicator that the CPU is locked up hard. USB keyboards aren't interrupt-driven the way PS/2 is, so the bar for the LED toggling is slightly higher, but a dead Caps Lock still signals something is very wrong with the system.
First I always check with sudo journalctl -r
Check journalctl --help for more options or do sudo journalctl --since "2015-06-26 23:15:00" --until "2015-06-26 23:20:00"
Then search errors online or come back with more questions.
Sounds a lot like RAM problems. Have you tried changing those to some you know work?
To pile on to your (excellent) suggestion, OP might try enabling the full POST (power-on self-test) memory check in the BIOS. It takes a while on modern RAM, but I've had it identify failing RAM sticks for me.
Ye, when I had the problem it was the RAM.