HL15 Full Build - Machine Check errors

I’m hoping I’ve tagged this post appropriately. I’ve got a new HL15 full build and am running into some potential issues with “machine check events” getting logged. Right now these are being corrected automatically, but I’m a bit leery about it.

Info about the system:
CPU: Xeon Bronze 3204
Mobo: X11SPH-nCTF
RAM: 8x 16GB DDR ECC, HMA42GR7MFR4N-TF
OS: Unraid 6.12.8

  • I already ran memtest86, passed w/o an issue
  • No errors at startup
  • Seems to start happening once I start a copy from my old NAS to this server
  • SMB mounted folder, using rsync -avzhp /location1 /location2
  • Array is formatted with ZFS
  • I’ve got output from /var/log/mcelog but am having trouble determining which module(s) are at fault. I’m not even sure mcelog is compatible with this motherboard
  • As for potential culprits, I’m seeing entries like these logged there (I can post the entire mcelog, but I think I need approval first):
    CPU 0 BANK 7 TSC 9f96bc1f192
    MISC 200000c000601086 ADDR 203fdee2c0
    MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
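
For anyone wanting to see whether the events cluster on one bank or channel, here is a rough sketch that tallies mcelog records. It assumes every record looks like the three lines above (a "CPU n BANK n" line followed later by an "MCA:" line); the function name summarize_mce is made up for illustration. It only counts events and does not map them to a physical DIMM slot.

```python
import re
from collections import Counter

# Assumed record layout, based on the excerpt in this post:
#   CPU 0 BANK 7 TSC 9f96bc1f192
#   MISC 200000c000601086 ADDR 203fdee2c0
#   MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
HEADER = re.compile(r"CPU (\d+) BANK (\d+)")
MCA = re.compile(r"MCA:\s*(.+)")


def summarize_mce(text):
    """Count events per (cpu, bank, MCA description) in mcelog text."""
    counts = Counter()
    current = None  # (cpu, bank) of the record being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = (int(header.group(1)), int(header.group(2)))
            continue
        mca = MCA.match(line)
        if mca and current is not None:
            counts[(current[0], current[1], mca.group(1).strip())] += 1
            current = None  # wait for the next header line
    return counts


def print_summary(counts):
    """Print the most frequent event signatures first."""
    for (cpu, bank, desc), n in counts.most_common():
        print(f"{n:6d}  CPU {cpu} BANK {bank}  {desc}")
```

To use it on the real log, something like print_summary(summarize_mce(open("/var/log/mcelog").read())) should work. If nearly all events share one channel, that at least narrows down which sticks to pull first.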

The version of memtest86 included with Unraid is actually on the older side. I got the latest version, and 25 min into the test I’m getting some ECC errors. Hoping to figure out which slots are at fault.
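
One possible way to narrow down the slot from a running Linux system is the kernel's EDAC sysfs counters. This is only a sketch: it assumes an EDAC driver for this platform is loaded on the OS (I have not confirmed that for Unraid) and that the per-DIMM files dimm_label, dimm_location and dimm_ce_count are present under /sys/devices/system/edac/mc. The function name read_dimm_counts is made up for illustration.

```python
from pathlib import Path


def _read(path, default=""):
    """Return the stripped contents of a sysfs file, or default if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return default


def read_dimm_counts(root="/sys/devices/system/edac/mc"):
    """List per-DIMM corrected/uncorrected error counts, worst first.

    Assumes the EDAC layout mc<N>/dimm<M>/ with dimm_ce_count etc.
    Returns an empty list if no EDAC driver has populated the tree.
    """
    rows = []
    for dimm in sorted(Path(root).glob("mc*/dimm*")):
        if not dimm.is_dir():
            continue
        rows.append({
            "path": f"{dimm.parent.name}/{dimm.name}",
            "label": _read(dimm / "dimm_label"),
            "location": _read(dimm / "dimm_location"),
            "ce": int(_read(dimm / "dimm_ce_count", "0") or "0"),
            "ue": int(_read(dimm / "dimm_ue_count", "0") or "0"),
        })
    rows.sort(key=lambda r: r["ce"], reverse=True)
    return rows


def print_dimm_counts(rows):
    """Print one line per DIMM with its corrected and uncorrected counts."""
    for r in rows:
        print(f'{r["path"]:12s} ce={r["ce"]:<8d} ue={r["ue"]:<4d} '
              f'{r["label"]} [{r["location"]}]')
```

Running print_dimm_counts(read_dimm_counts()) during the copy and again a few minutes later should show whether one DIMM's corrected count keeps climbing. The labels come from the driver and may not match the silkscreen names on the board, so check them against the motherboard manual.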

In the end, the issue was that one of the sticks was indeed producing a ton of corrected ECC errors. I swapped it with another stick and it’s been fine for a few hours now.
