Hi all. I was so excited to roll out my brand new X15. I purchased 1 x 24 TB (parity) and 4 x 20 TB (data) Toshiba drives. Had two crashes in 4 days.
I configured Unraid 7.3.1 (which was preinstalled) and decided to let the parity build complete before creating shares, etc. The parity build finished at night (it took approx. 26 hours), but when I tried to access the server approx. 4 hours later, Linux was unpingable from two different boxes. The MegaRAC SP-X was fine, so I restarted from there and Unraid booted normally. The parity check completed fine without any server crashes.
I then started a batch copy of about 2 TB of files. The script completed OK, but when I got home, Linux was again unreachable. The MegaRAC SP-X was fine again, so I did another restart.
I worked with ChatGPT after both incidents to check Linux, Unraid, and BIOS settings but found nothing of interest.
I’m at my wit’s end. I’m not even sure I should call these events “crashes”, as they feel to me more like system shutdowns when idle. I’m trying to figure out how to contact 45HomeLab support, but I thought I’d ask here to see if anyone can suggest what to check next.
I find it really odd that the system is fine whenever it’s doing something, but within a few hours of being idle it crashes/shuts down.
info@45homelab.com, unless they have set up something special for the Unraid line.
I don’t know why your system would be different from others (although 45HL builds don’t necessarily seem as consistent as one might think, judging by some posts here), and I’m not an Unraid user, but I would check the power states for the PCIe slots in the BIOS, specifically the settings for C-states and ASPM. From your description, it sounds like the OS might be putting the NIC to sleep.
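The BIOS is where these get changed, but Linux will also tell you what it ended up with. Here is a minimal sketch, assuming Python 3 is available on the host (it may need to be installed separately on Unraid) and the standard sysfs layout. It prints the active ASPM policy from `/sys/module/pcie_aspm/parameters/policy` and lists the idle states the kernel exposes for CPU 0, along with whether each is disabled at runtime. The helper names are hypothetical, just for illustration.

```python
# Sketch: show what the kernel ended up with for PCIe ASPM and CPU idle
# (C-)states. Assumes Linux with sysfs at /sys; either path may be missing
# if the kernel or firmware has ASPM/cpuidle disabled.
from __future__ import annotations

from pathlib import Path

ASPM_POLICY = Path("/sys/module/pcie_aspm/parameters/policy")
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def active_aspm_policy(text: str) -> str | None:
    """Return the bracketed (active) policy, e.g. '[default] performance' -> 'default'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def cpu0_idle_states() -> list[tuple[str, bool]]:
    """Return (state name, disabled?) pairs for CPU 0, in state-number order."""
    states = []
    state_dirs = sorted(CPUIDLE_DIR.glob("state*"), key=lambda p: int(p.name[len("state"):]))
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        disabled = (state_dir / "disable").read_text().strip() == "1"
        states.append((name, disabled))
    return states


if __name__ == "__main__":
    if ASPM_POLICY.exists():
        print("ASPM policy:", active_aspm_policy(ASPM_POLICY.read_text()))
    else:
        print("ASPM policy file not found")
    for name, disabled in cpu0_idle_states():
        print(f"{name}: {'disabled' if disabled else 'enabled'}")
```

Save it under any name (e.g. `aspm_check.py`, just an example) and run it with `python3`. If the ASPM policy reads `powersave` or `powersupersave`, that would be consistent with the NIC-sleep theory, though it wouldn't prove it.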
Unless 45HL or Unraid suggests otherwise, I would also be sure you are on the latest BIOS for the Gigabyte MW34-SP0 given the history of Vcore issues on 14th gen Intel.
Finally, my understanding is that Unraid keeps the system logs in memory unless told otherwise, so if you are looking at dmesg logs and such after rebooting, you have probably lost the history. To properly save the logs and work out what is causing the system to lock up, I think there is a setting like “Mirror syslog to boot drive” that you need to enable in Unraid.
I’m not sure what support Unraid has, but I’d also search through their forums for posts about system lockups on idle. It seems to be a common issue with certain hardware and Linux kernel versions.
Is the entire system really crashed/shut down, or is it just the network stack? It sounds to me like it might just be the network that’s the problem. I’m not an Unraid user, but I assume there’s an underlying console you can access through the X15’s remote management. Is that right?
And speaking of remote management, I had an issue with similar symptoms on my custom HL8 build, which used a ROMED8-2T and ran TrueNAS. The system would boot up just fine, but after a few hours I’d lose all network connectivity. In my case, the problem turned out to be the IPMI network configuration. The default settings had the dedicated IPMI NIC and the first onboard NIC bonded together, with the onboard NIC in a “shared” mode so both the IPMI and the host could use it. The sharing of the NIC would eventually confuse TrueNAS and render network access unusable. I haven’t had a problem since configuring the IPMI to use only the dedicated NIC.
Thanks for the reply. I emailed them as you suggested, hopefully they can shed some light on this.
I did check the BIOS for any power settings and came up empty. I forgot to confirm that I’m on the latest BIOS, but I have this from one of my screen captures. I asked ChatGPT to check, but he came up empty. I’ll search the web for updates myself as soon as I get a chance.
I did set the syslog to mirror to my boot drive. I passed all the logs through ChatGPT and he didn’t find anything. The syslog entries stop almost 8 hours before my backup script completed successfully, but given that the only thing going on was the backup, and I’m not sure a copy process should generate syslog entries anyway, I’m not too concerned by this.
Hi and thanks for the reply. I also wondered originally if it was a network issue.
The X15 does come with a pretty cool IPMI. I was able to connect there and the KVM showed no signal, so I’m pretty sure it’s not just a network thing. I’m new to using IP KVMs, but it worked really well both times I babysat the boot process, and it let me get into the BIOS easily. So I’m pretty sure the “no signal” message was legit.
Forgot to mention: my syslog entries stopped well before my backup completed. While I’m debugging, I have a background process sending a heartbeat message to the syslog every 5 minutes.
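To turn that heartbeat into a rough time of death, you only need the last heartbeat line that made it into the saved log. Here is a minimal sketch, assuming Python 3 is available and that the heartbeat is written with a tag such as `heartbeat` (hypothetical here; e.g. `logger -t heartbeat alive`). It also assumes lines begin with the classic `Mon DD HH:MM:SS` syslog timestamp. The helper names are hypothetical as well.

```python
# Sketch: find the last heartbeat line in a saved syslog to bracket when the
# host stopped logging. Assumes a hypothetical heartbeat tag "heartbeat"
# (e.g. written via `logger -t heartbeat alive`) and lines that start with a
# classic 15-character "Mon DD HH:MM:SS" syslog timestamp.
import sys
from pathlib import Path

LOG_PATH = Path("/boot/logs/syslog-previous")  # path as used later in this thread
HEARTBEAT_TAG = "heartbeat:"                    # hypothetical tag; adjust to match


def last_heartbeat(lines):
    """Return the timestamp (first 15 chars) of the last heartbeat line, or None."""
    last = None
    for line in lines:
        if HEARTBEAT_TAG in line:
            last = line[:15]
    return last


def last_entry(lines):
    """Return the last non-empty line, or None if there is none."""
    non_empty = [ln for ln in lines if ln.strip()]
    return non_empty[-1] if non_empty else None


if __name__ == "__main__":
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else LOG_PATH
    if path.exists():
        lines = path.read_text(errors="replace").splitlines()
        print("last heartbeat:", last_heartbeat(lines))
        print("last entry:    ", last_entry(lines))
    else:
        print(f"{path} not found")
```

With a 5-minute heartbeat, the outage should fall within about 5 minutes after the last heartbeat printed. If the last entry is much later than the last heartbeat, that would suggest the heartbeat process itself stopped first.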
I’m a bit concerned that even this minor debugging logging might constitute enough activity to prevent the unit from going into “sleep mode”, if that is what in fact is happening here, but I felt that the risk of masking the issue is outweighed by the benefits of getting more precise data on when the events occur.
I doubt that I’ll get this lucky, but imagine if repeated incidents occur at roughly consistent intervals after the system load ends, e.g.:
backup script finishes at 10:10, system goes down at 10:42
parity check finishes at 16:06, system goes down at 16:31
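Once a few real incidents are logged, comparing the gaps is simple arithmetic. Here is a minimal sketch, assuming Python 3 and same-day `HH:MM` times on a 24-hour clock, with a wrap past midnight if needed. The two incidents below are the hypothetical example times from this post, not measured data.

```python
# Sketch: compute the idle gap between "load finished" and "system went down"
# for each incident so the gaps can be compared. Times are "HH:MM" on a
# 24-hour clock; a gap that crosses midnight wraps around (gaps < 24 h only).
def idle_minutes(load_end: str, went_down: str) -> int:
    """Minutes from load_end to went_down, wrapping past midnight if needed."""
    def to_minutes(hhmm: str) -> int:
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    return (to_minutes(went_down) - to_minutes(load_end)) % (24 * 60)


# Hypothetical example incidents from the post above.
incidents = [
    ("backup script", "10:10", "10:42"),
    ("parity check", "16:06", "16:31"),
]

for label, end, down in incidents:
    print(f"{label}: down {idle_minutes(end, down)} min after load ended")
```

For the two example incidents this gives 32 and 25 minutes. A cluster of similar gaps across several real incidents would be the kind of pattern that argues against random events.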
I’m trying to find some sort of “smoking gun” to prove that these aren’t just random events.
Do you have any Docker containers running, or apps like Dynamix S3 Sleep that try to power down the drives when the system is idle? I assume you’d have mentioned it if you did, just checking.
What sort of commands are you using to look at the log? Something like:
grep -iE "aer|ixgbe|suspend|sync|mce" /boot/logs/syslog-previous
and
tail -n 100 /boot/logs/syslog-previous
Unfortunately, the documentation for mobo BIOSes is typically slim and documentation of any sort for the MW34-SP0 specifically is particularly lacking, so you’ll need someone else with that motherboard to help with where the ASPM and power settings are. I’m not sure if asking Gigabyte support about that would help while you wait on 45HL.